After reading many of the comments in this thread, I suspect many (not all) issues come from lack of planning and poor prompting.
For anything moderately complex, use Claude's plan mode; you get to approve the plan before turning it loose. The planning phase is where you want to use a more sophisticated model or use extended thinking mode.
Once you have a great plan, you can use a less sophisticated model to execute it.
Even if you're a great programmer, you may suck at prompting. There's an art and a science to prompting; perhaps learn about it? [1]
Don't forget; in addition to telling Claude or any other model what to do, you can also tell them what not to do in the CLAUDE.md or equivalent file.
"Now we don't need to hire a founding engineer! Yippee!" I wonder all these people who are building companies that are built on prompts (not even a person) from other companies. The minute there is a rug pull (and there WILL be one), what are you going to do? You'll be in even worse shape because in this case there won't be someone who can help you figure out your next move, there won't be an old team, there will just be NO team. Is this the future?
Probably similar to the guy who was gloating on Twitter about building a service with vibe coding and without any programming knowledge around the peak of the vibe coding madness.
Only for people to start screwing around with his database and API keys because the generated code just stuck the keys into the Javascript and he didn't even have enough of a technical background to know that was something to watch out for.
IIRC he resorted to complaining about bullying and just shut it all down.
I don't actually hear people call it vibe coding as much as I did back in late 2024/early 2025.
Sure there are many more people building slop with AI now, but I meant the peak of "vibe coding" being parroted around everywhere.
I feel like reality is starting to sink in a little by now as the proponents of vibe coding see that all the companies telling them that programming as a career is going to be over in just a handful of years, aren't actually cutting back on hiring. Either that or my social media has decided to hide the vibe coding discourse from me.
Honestly i'm less scared of claude doing something like that, and more scared of it just bypassing difficult behavior. Ie if you chose a particularly challenging feature and it decided to give up, it'll just do things like `isAdmin(user) { /* too difficult to implement currently */ true }`. At least if it put a panic or something it would be an acceptable todo, but woof - i've had it try and bypass quite a few complex scenarios with silently failing code.
Sounds like a prompting/context problem, not a problem with the model.
First, use Claude's plan mode, which generates a step-by-step plan that you have to approve. One tip I've seen mentioned in videos by developers: plan mode is where you want to increase to "ultrathink" or use Opus.
Once the plan is developed, you can use Sonnet to execute the plan. If you do proper planning, you won't need to worry about Claude skipping things.
Seems like he's still going on about being able to replicate billion dollar companies' work quickly with AI, but at least he seems a little more aware that technical understanding is still important.
Any cost/benefit analysis of whether to use AI has to factor in the fact that AI companies aren't even close to making a profit, and are primarily funded by investment money. At some point, either the cost to operate these AI models needs to go down, or the prices will go up. And from my perspective, the latter seems a lot more likely.
Rug pulls from foundation labs are one thing, and I agree with the dangers of relying on future breakthroughs, but the open-source state of the art is already pretty amazing. Given the broad availability of open-weight models within under 6 months of SotA (DeepSeek, Qwen, previously Llama) and strong open-source tooling such as Roo and Codex, why would you expect AI-driven engineering to regress to a worse state than what we have today? If every AI company vanished tomorrow, we'd still have powerful automation and years of efficiency gains left from consolidation of tools and standards, all runnable on a single MacBook.
The problem is the knowledge encoded in the models. It's already pretty hit and miss, hooking up a search engine (or getting human content into the context some other way, e.g. copy pasting relevant StackOverflow answers) makes all the difference.
If people stop bothering to ask and answer questions online, where will the information come from?
Logically speaking, if there's going to be a continuous need for shared Q&A (which I presume), there will be mechanisms for that. So I don't really disagree with you. It's just that having the model just isn't enough, a lot of the time. And even if this sorts itself out eventually, we might be in for some memorable times in-between two good states.
Excellent discussion in this thread, captures a lot of the challenges. I don't think we're a peak vibe coding yet, nor have companies experienced the level of pain that is possible here.
The biggest 'rug pull' here is that the coding agent company raises there price and kills you're budget for "development."
I think a lot of MBA types would benefit from taking a long look at how they "blew up" IT and switched to IaaS / Cloud and then suddenly found their business model turned upside down when the providers decided to up their 'cut'. It's a double whammy, the subsidized IT costs to gain traction, the loss of IT jobs because of the transition, leading to to fewer and fewer IT employees, then when the switch comes there is a huge cost wall if you try to revert to the 'previous way' of doing it, even if your costs of doing it that way would today would be cheaper than the what the service provider is now charging you.
That's why I stick to what I can run locally. Though for most of my tasks there is no big difference between cloud models and local ones, in half the cases both produce junk but both are good enough for some mechanical transformations and as a reference book.
It get even darker - I was around in the 1990s and a lot of people who ran head on into that generation’s problems used those lessons to build huge startups in the 2000s. If we have outsourced a lot of learning, what do we do when we fail? Or how we compound on success?
My Claude Code usage would have been $24k last month if I didn't have a max plan, at least according to Claude-Monitor.
I've been using a tool I developed (https://github.com/stravu/crystal) to run several sessions in parallel. Sometimes I will run the same prompt multiple times and pick the winner, or sometimes I'll be working on multiple features at once, reviewing and testing one while waiting on the others.
Basically, with the right tooling you can burn tokens incredibly fast while still receiving a ton of value from them.
This is why unlimited plans are always revoked eventually - a small fraction of users can be responsible for huge costs (Amazon's unlimited file backup service is another good example). Also whilst in general I don't think there's much to worry about with AI energy use, burning $24k of tokens must surely be responsible for a pretty large amount of energy
Looked at your tool several times, but haven't answered this question for myself: does this tool fundamentally use the Anthropic API (not the normal MAX billing)? Presuming you built around the SDK -- haven't figured out if it is possible to use the SDK, but use the normal account billing (instead of hitting the API).
Love the idea by the way! We do need new IDE features which are centered around switching between Git worktrees and managing multiple active agents per worktree.
Edit: oh, do you invoke normal CC within your tool to avoid this issue and then post-process?
Claude code has an SDK, where you specify the path to the CC executable. So I believe thats how this works. Once you have set up claude code in your environment and authed with however you like, this will just use that executable in a new UI
I'm on $100 and i'm shocked how much usage i get out of Sonnet, while Opus feels like no usage at all. I barely even bother with Opus since most things i want to do just runout super quick.
Interesting, I'm fairly new to using these tools and am starting with Claude Code but at the $20 level. Do you have any advice for when I would benefit from stepping up to $100? I'm not sure what gets better (besides higher usage limits).
No clue as i've not used Claude Code on Pro to get an idea of usage limits. But, if you get value out of Claude Code and ever run into limits, Max is quite generous for Sonnet imo. I have zero concern about Sonnet usage atm, so it's definitely valuable there.
Usage for Opus is my only "complaint", but i've used it so little i don't even know if it's that much better than Sonnet. As it is, even with more generous Opus limits i'd probably want a more advanced Claude Code behavior - where it uses Opus to plan and orchestrate, and Sonnet would do the grunt work for cheaper tokens. But i'm not aware of that as a feature atm.
Regardless, i'm quite pleased with Claude Code on $100 Max. If it was a bit smarter i might even upgrade to $200, but atm it's too dumb to give it more autonomy and that's what i'd need for $200. Opus might be good enough there, but $100 Opus limits are so low i've not even gotten enough experience with it to know if it's good enough for $200
I recently switched from Pro to $100 Max, and the only difference I've found so far is higher usage limits.
Antropic tends to give shiny new features to Max users first, but as of now, there is nothing Max-only.
For me, it's a good deal nonetheless, as even $100 Max limits are huge.
While on Pro, I hit the limits each day that I used Claude Code. Now I rarely see the warning, but I never actually hit the limit.
Early stage founder here. You have no idea how worth it $200/month is as a multiple on what compensation is required to fund good engineers. Absolutely the highest ROI thing I have done in the life of the company so far.
At this point, question is when does Amazon tell Anthropic to stop because it’s gotta be running up a huge bill. I don’t think they can continue offering the $200 plan for too long even with Amazon’s deep pocket.
Based on people around me and anecdotal evidence of when Claude struggles, a lot more than you think. I’ve done some analysis on personal use between Openrouter, Amp, Claude API and $200 subscription, I probably save around $40-50/day. And I am a “light” user. I don’t run things in parallel too much.
I don't know, I have to figure out another way to count money I guess, but that $200 gives me a lot of worth, far more than 200. I guess if you like sleeping and do other stuff than drive Claude Code all the time, you might have a different feeling. For us it works well.
My question wasn't if the $200 was worth it to the buyer. Renting an H100 for a month is gonna cost around $1000 ($1.33+/hr). Pretend the use isn't bursty (but really it is). If you could get 6 people on one, the company is making money selling inference.
I don't understand. Obviously I can't run Opus on an H100, only Anthropic can do that since they are the only ones with the model. I am assuming they are using H100s, and that an all-in cost for an H100 comes to less then $1000/month, and doing some back of the envelope math to say if they had a fleet of H100s at their disposal, that it would take six people running it flat out, for the $200/month plan to be profitable.
Is $200/month a lot of money when you can multiply your productivity? It depends but the most valuable currency in life is time. For some, spending thousands a month would be worth it.
As I said elsewhere... $200/month etc is potentially not a lot for an employer to pay (though I've worked for some recently who balk at just stocking a snacks tray or drink fridge...).
But $200/month is unbearable for open source / free software developers.
It's wild when a company has another department and will shell out $200/month per-head for some amalgamation of Salesforce and other SaaS tools for customer service agents.
At a previous job, my department was getting slashed because marketing was moving over to using Salesforce instead of custom software written in-house. Everything was going swimmingly, until the integration vendor for Salesforce just kept billing, and billing and billing.
Last I checked no one is still there who was there originally, except the vendor. And the vendor was charging around $90k/mo for integration services and custom development in 2017 when my team was let go. My team was around $10k/mo including rent for our cubicles.
That was another weird practice I've never seen elsewhere, to pay rent, we had to charge the other departments for our services. They turned IT and infrastructure into a business, and expected it to turn a profit, which pissed off all the departments who had to start paying for their projects, so they started outsourcing all development work to vendors, killing our income stream, which required multiple rounds of layoffs until only management was left.
This is really interesting because I was in business school almost thirty years and a cost accounting professor used almost this exact example, only with photocopiers and fax machines to illustrate how you can cost a company to death.
He would have considered that company to be running a perfectly controlled cost experiment. Though it was so perfectly controlled they forgot that humans actually did the work. With cost accounting projects, you pay morale and staffing charges well after the project itself was costed.
I hadn’t thought of that since the late 90s. Good comment but how the heck did I get that old??? :)
I've seen it too - not uncommon. A frustrating angle is vendor lockin. You are required to only use the internal IT team for everything, even if they're far more expensive and less skilled. They can 'charge' whatever they want, and you're stuck with their skills, prices and timeline. Going outside of that requires many levels of signoffs/approvals, and untold amounts of time making your case. There's value in having some central purchasing process, but when you limit your vendors to one (internal or external) you'll creating a lot more problems that you don't need to have.
I suspect there's some accounting magic where salaries and software licenses are in one box and "Diet Coke in the fridge" is in another, and the latter is an unbearable cost but the former "OK"
But yeah, doesn't explain non-payment for AI tools.
Current job "permits" Claude usage, but does not pay for it.
That's your problem, or your company or your country.
Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
So when I'm finished with my work, HO of course, I just work on my "contractor" projects.
Honestly, I wouldn't sign a full time contract banning me from other work.
And if you have enough customers, you just drop full time job. And just pay social security and health insurance, which you must pay by law anyway.
And specially in my country, it's even more ridiculous that as self-employed you pay lower taxes than full time employees, which truth to be told are ridiculously high.
Nearly 40% of your salary.
> Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
First time I'm hearing this. Where in the EU are you? I don't know anybody doing this, but it could depend on the country (I'm in the nordics).
> Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
Absolutely not a common thing in my corner of the EU.
If you're salaried, you are not a task-based worker. The company pays you a salary for your full day's worth of productive time. If you can suddenly get 5x more done in that time, negotiate a higher salary or leave. If you're actually more productive, they will fight to keep you.
Your salary is not determined by your productivity, it's determined by market rates. 5X productivity does not mean 5X salary. Employers prey on labor market inefficiencies to keep the market rates low.
Any employer with 2 brain cells will figure out that you are more productive as a developer by using AI tools, they will mandate all developers use it. Then that's the new bar and everyone's salary stays the same.
Yeah a 20$ plan is prob enough for the AI slop you need to fill in your 8h working time. Unless you have many projects that require more AI slop that is.
Communism is an ideal but never a reality. What you see in reality is at best an attempt at communism which is quickly derailed by corruption and greed. I mean, it's great to have ideals, but you should also recognize when those ideals are completely impractical given the human condition.
By the way, this also applies to the "Free market" ideal...
Importantly, problems with the ideal shouldn't preclude good actions that take us in a direction.
There being problems with absolute libertarian free markets doesn't mean all policies that evoke the free market ideal must be disregarded, nor does the problems with communism mean that all communist actions must be ignored.
We can see a problem with an ideal, but still wish to replicate the good parts.
Sure. The issue for me is when people intentionally mislabel something to make it look worse.
For example, mislabelling socialism as communism. The police department, fire department, and roads are all socialist programs. Only a moron would call this communism and yet for some reason universal healthcare...
There's also this nonsense when someone says "That's the free market at work", and I'm like, if we really lived in a free market then you'd be drinking DuPont's poison right now.
Using the words "Communism" and "Free market" just show a (often intentional) misunderstanding of the nuance of how things actually work in our society.
The communism label must be the most cited straw man in all of history at this point.
There is nothing ideal about communism. I'd rather own my production tools and be as productive as I want to be. I'd rather build wealth over trading opportunities, I'd rather hire people and reinvest earnings. That is ideal.
I think you're missing the point. Communism doesn't actually exist in the real world. In fact you are right now using it as a straw man (my entire point).
Who in the actual real world with any authority at all is telling you you can't be as productive as you want to be, build wealth, hire people, and reinvest your earnings?
Productivity multiplies x2
You keep your job x0.5
Your salary x0.8 (because the guy we just fired will gladly do your job for less)
Your work hours x1.4 (because now we expect you to do the work of 2 people, but didn’t account for all the overhead that comes with it)
Equity is a lottery ticket. Is sacrificing my happiness or life balance in the near term worth the gamble that A) my company will be successful, and B) that my equity won’t have been diluted to worthlessness by the time that happens? At higher levels of seniority/importamce/influence this might make sense, but for most people I seriously doubt it does, especially early in their careers.
As a non-founder / not a VC you max get a few percentage points, and its mostly paper toilet money until there's an exit or IPO, and the founders will always try to squeeze you if they can, not because they're bad people, but because the system incentivises it. (you'll keep getting diluted in future rounds)
tbh, if im gonna bust my ass I'd rather own the thing.
Capitalism is exactly about amassing capital to make others reliant on capitalist providing capital for the tools necessary to do the work, then extracting rent from the value produced.
In true capitalist market you end up with oligarchy.
Has anyone else done this and felt the same? Every now and then I try to reevaluate all the models. So far it still feels like Claude is in the lead just because it will predictably do what I want when given a mid-sized problem. Meanwhile o3 will sometimes one-shot a masterpiece, sometimes go down the complete wrong path.
This might also just be a feature of the change in problem size - perhaps the larger problems that necessitate o3 are also too open-ended and would require much more planning up front. But at that point it's actually more natural to just iterate with sonnet and stay in the driver's seat a bit. Plus sonnet runs 5x faster.
I really hope we can avoid metered stuff for the long-term. One of the best aspects of software development is the low capital barrier to entry, and the cost of the AI tools right now is threatening that.
I'm fortunate in that my own use of the AI tools I'm personally paying for is squished into my off-time on nights and weekends, so I get buy with a $20/month Claude subscription :).
> Use boring technology: LLMs do much better with well-documented and well-understood dependencies than obscure, novel, or magical ones. Now is not the time to let Steve load in a Haskell-to-WebAssembly pipeline.
If we all go that way, there might be no new haskells and webassemblies in the future.
LLMs can read documentations for a language and use it as well as human engineers.
"given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content"
I think there certainly will, it will just mean that only people who can function independently of the AI will have access to them a few years before everyone else.
Interesting. Though it seems they are themselves building Agentic AI tooling. It's vibe coding all the way down - when's something real going to pop out the bottom?
An LLM salesman assuring us that $1000/mo is a reasonable cost for LLMs feels a bit like a conflict of interests, especially when the article doesn't go into much detail about the code quality. If anything, their assertion that one should stick to boring tech and "have empathy for the model" just reaffirms that anybody doing anything remotely innovative or cutting-edge shouldn't bother too much with coding agents.
I can see how pricing at 100 to 200$ per month per employee could make sense for companies, it’s a clear value proposition at that scale. But for personal projects and open source work, it feels out of reach. I’d really like to see more accessible pricing tiers for individuals and hobbyists. Pay per token models don’t work for me either; earlier this year, I racked up almost $1,000 in a month just experimenting with personal projects, and that experience has made me wary of using these tools since.
> But for personal projects and open source work, it feels out of reach
Is it? Many hobbies cost much more money. A nice bike (motorbike or road bike, doesn't matter), a sailing boat, golf club/trips, a skiing season pass ... $100/month is significantly less than what you'd burn with those other things. Sure you can program in your free time without such a subscription, and if you enjoy that then by all means, but if it takes away the grunt work and you are having more fun, I don't see the issue.
Gym memberships are in that order of magnitude too, even though you could use some outdoor gym in a city park for free. Maybe those indoor perks of heating, lights, roof and maintained equipment are worth sth? Similar with coding agents for personal projects...
> Pay per token models don’t work for me either; earlier this year, I racked up almost $1,000 in a month just experimenting with personal projects, and that experience has made me wary of using these tools since.
Can't have your cake and eat it too.
Behold the holy trifecta of: Number of Projects - Code Quality - Coding Agent Cost
Charging $200/month is economically only possible if there is not a true market for LLMs or some sort of monopoly power. Currently there is no evidence that this will be the case. There are already multiple competitors and the barrier to entry is relatively low (compared to e.g. the car industry or other manufacturing industries), there are no network effects (like for social networks) and no need to get the product 100% right (like compatibility to Photoshop or Office) and the prices for training will drop further. Furthermore $200 is not free (like Google).
Can anyone name one single widely-used digital product that does _not_ have to be precisely correct/compatible/identical to The Original and that everyone _does_ pay $200/month for?
Therefore, should prices that users pay get anywhere even close to that number, there will naturally be opportunities for competitors to bring prices down to a reasonable level.
Barrier to entry is actually very very high. Just because we have “open source” models doesn’t mean anyone can enter. And the gap is widening now. I see Anthropic/OpenAI as clear leaders. Opus 4 and its derivative products are irreplaceable for coders since Spring 2025. Once you figure it out and have your revelation, it will be impossible to go back. This is an iPhone moment right now and the network effect will be incredible.
And that’s how it’s been forever. If your competitor is doing 10x your work, you will be compelled to learn. If someone has a nail gun and you’re using a hammer, no one’s saying “it’s all nails.” You will go buy a nail gun.
Network affects come from people building on extra stuff. There's no special sauce with these models, as long as you have an inference endpoint you can recreate anything yourself with any of the models.
As to the nailgun thing, that's an interesting analogy, I'm actually building my own house right now entirely with hand tools, it's on track to finish in 1/5 the time some of this mcmansions do with 1/100th of the cost because I'm building what I actually need and not screwing around with stuff for business reasons. I think you'll find software projects are more similar to that than you'd expect.
My point was not that AI will necessarily be cheaper to run than $200, but that there is not much profit to be made. Of course the cost of inference will form a lower bound on the price as well.
I am blown away that you can get a founding engineer for $10k / month. I guess that is not counting stock options, in which case it makes sense. But I think if you include options the opportunity cost is much higher. IMO great engineers are worth a lot, no shade.
* It's not clear on how much revenue or new customers is generated by using a coding agent
* It's not clear on how things are going on production. There's only talks about development in the article
I feel ai coding agents will give you the edge. Just this article doesn't talk about revenue or PnL side of things, just perceived costs saved from not employing an engineer.
Yes. A company needs measurable ROI and isn't going to spend $200 a month per seat on Claude.
It will instead sign a deal with Microsoft for ai that is 'good enough' and limit expensive ai to some. Or being in the big consultancys as usual to do the projects.
So what we do at NonBioS.ai is to use a cheaper model to do routine tasks, but switch to a higher thinking model seamlessly if the agent get stuck. Its most cost efficient, and we take that switching cost away from the engineer.
But broadly agree to the argument of the post - just spending more might still be worth it.
I get a lot of value out of Claude Max at $100 USD/month. I use it almost exclusively for my personal open source projects. For work, I'm more cautious.
I worry, with an article like this floating around, and with this as the competition, and with the economics of all this stuff generally... major price increases are on the horizon.
Businesses (some) can afford this, after all it's still just a portion of the costs of a SWE salary (tho $1000/m is getting up there). But open source developers cannot.
I worry about this trend, and when the other shoe will drop on Anthropic's products, at least.
I have not invested time on locally-run, I'm curious if they could even get close to approaching the value of Sonnet4 or Opus.
That said, I suspect a lot of the value in Claude Code is hand-rolled fined-tuned heuristics built into the tool itself, not coming from the LLM. It does a lot of management of TODO lists, backtracking through failed paths, etc which look more like old-school symbolic AI than something the LLM is doing on its own.
Where do you see the major price increases coming from?
The underlying inference is not super expensive. All the tricks they're pulling to make it smarter certainly multiply the price, but the price being charged almost certainly covers the cost. Basic inference on tuned base models is extremely cheap. But certainly it looks like Anthropic > OpenAI > Google in terms of inference cost structure.
Prices will only come up if there's a profit opportunity; if one of the vendors has a clear edge and gains substantial pricing power. I don't think that's clear at this point. This article is already equivocating between o3 and Opus.
Serious question, how do you justify paying for any of this without feeling like it's a waste?
I occasionally use ChatGPT (free version without logging in) and the amount of times it's really wrong is very high. Often times it takes a lot of prompting and feeding it information from third party sources for it to realize it has incorrect information and then it corrects itself.
All of these prompts would be using money on a paid plan right?
I also used Cursor (free trial on their paid plan) for a bit and I didn't find much of a difference. I would say whatever back-end it was using was possibly worse. The code it wrote was busted and over engineered.
I want to like AI and in some cases it helps gain insight on something but I feel like literally 90% of my time is it prodiving me information that straight up doesn't work and eventually it might work but to get there is a lot of time and effort.
Depends on how much you use. I use AI to think through code and other problems, and write the dumb parts of code. Claude definitely works much better than the free offerings. I use OpenRouter [1] and spend only a couple of dollar per month on AI usage. It's definitely worth it.
The AI agents that run on your machine where they have access to the code, and tools to browse/edit the code, or even run terminal commands are much more powerful than a simple chatbot.
It took some time for me to learn how to use agents, but they are very powerful once you get the hang of it.
I think it's a serious question because something really big is being missed here. There seem to be very different types of developers out there and/or working on very different kinds of codebases. Hypothetically, maybe you have devs or specific contexts where the dev can just write the code really fast where having to explain it to a bot is more time consuming, vs. devs /contexts where lots of googling and guessing goes on and it's easier to get the AI to just show you how to do it.
I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks. I'm sort of getting them to help me but I'm still really not being blown away and still tending to prefer not to bother with them with things I'm frequently iterating on, they are more useful when I have to learn some totally new platform/API. Why is that? do we think there's something wrong with me?
> I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks
I think a lot of this comes down to the context management. I've found that these tools work worse at my current employer than my prior one. And I think the reason is context - my prior employer was a startup, where we relied on open source libraries and the code was smaller, following public best practices regarding code structure in Golang and python. My current employer is much bigger, with a massive monorepo of custom written/forked libraries.
The agents are trained on lots of open source code, so popular programming languages/libraries tend to be really well represented, while big internal libraries are a struggle. Similarly smaller repositories tend to work better than bigger ones, because there is less searching to figure out where something is implemented. I've been trying some coding agents with my current job, and they spend a lot more time searching through libraries looking to understand how to implement or use something if it relies on an internal library.
I think a lot of these struggles and differences are also present with people, but we tend to discount this struggle because people are generally good at reasoning. Of course, we also learn from each task, so we improve over time, unlike a static model.
I'd try out cursor with either o3 or Claude 4 Opus. The free version of ChatGPT and Claude in Cursor are much better. That's also what this article claims and is true in my experience.
> Serious question, how do you justify paying for any of this without feeling like it's a waste?
I would invert the question, how can you think it's a waste (for OP) if they're willing to spend $1000/mo on it? This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
> free version... the amount of times it's really wrong is very high... it takes a lot of prompting and feeding it information from third party
Respectfully, you're using it wrong, and you get what you paid for. The free versions are obviously inferior, because obviously they paywall the better stuff. If OP is spending $50/day, why would the company give you the same version for free?
The original article mentions Cursor. With (paid) cursor, the tool automatically grabs all the information on behalf of the user. It will grab your code, including grepping to find the right files, and it will grab info from the internet (eg up to date libraries, etc), and feed that into the model which can provide targeted diffs to update just select parts of a file.
Additionally, the tools will automatically run compiler/linter/unit tests to validate their work, and iterate and fix their mistakes until everything works. This write -> compile -> unit test -> lint loop is exactly what a human will do.
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
If someone spends a lot of money on something but they don't derive commensurate value from that purchase, they will experience cognitive dissonance proportional to that mismatch. But ceasing or reversing such purchases are only some of the possibilities for resolving that dissonance. Another possibility is adjusting one's assessment of the value of that purchase. This can be subconscious and automatic, but it an also involve validation-seeking behaviors like reading positive/affirming product reviews.
In this present era of AI hype, purchase-affirming material is very abundant! Articles, blog posts, interviews podcasts, HN posts.. there's lots to tell people that it's time to "get on board", to "invest in AI" both financially and professionally, etc.
How much money people have to blow on experiments and toys probably makes a big difference, too.
Obviously there are limits and caveats to this kind of distortion. But I think the reality here is a bit more complicated than one in which we can directly read the derived value from people's purchasing decisions.
> Respectfully, you're using it wrong, and you get what you paid for.
I used the paid (free trial) version of Cursor to look at Go code. I used the free version of ChatGPT for topics like Rails, Flask, Python, Ansible and various networking things. These are all popular techs. I wouldn't describe either platform as "good" if we're measuring good by going from an idea to a fully working solution with reasonable code.
Cursor did a poor job. The code it provided was mega over engineered to the point where most of the code had to be thrown away because it missed the big picture. This was after a lot of very specific prompting and iterations. The code it provided also straight up didn't work without a lot of manual intervention.
It also started to modify app code to get tests to pass when in reality the test code was the thing that was broken.
Also it kept forgetting things from 10 minutes ago and repeating the same mistakes. For example when 3 of its solutions didn't work, it started to go back and suggest using the first solution that was confirmed to not work (and it even output text explaining why it didn't work just before).
I feel really bad for anyone trusting AI to write code when you don't already have a lot of experience so you can keep it in check.
So far at best I barely find it helpful for learning the basics of something new or picking out some obscure syntax of a tool you don't well after giving it a link to the tool's docs and source code.
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
This is not born out in my personal experience at all. In my experience, both in the physical and software tool worlds, people are incredibly emotional about their tools. There are _deep_ fashion dynamics within tool culture as well. I mean, my god, editors are the prima donna of emotional fashion running roughshod over the developer community for decades.
There was a reason it was called "Tool Time" on Home Improvement.
I find it kind of boggling that employers spend $200/month to make employees lives easier, for no real gain.
That’s right. Productivity does go up, but most of these employees aren’t really contributing directly to revenue. There is no code to dollar pipeline. Finishing work faster means some roadmap items move quicker, but they just move quicker toward true bottlenecks that can’t really be resolved quickly with AI. So the engineers sit around doing nothing for longer periods of time waiting to be unblocked. Deadlines aren’t being estimated tighter, they are still as long as ever.
Enjoy this time while it lasts. Someday employers might realize they need to hire less and just cram more work into individual engineers schedules, because AI should supposedly make work much easier.
Coding an actual solution is what, 5-10% of the overall project time?
I dont talk about some SV megacorps where better code can directly affect slightly revenue or valuation and thus more time is spend coding and debugging, I talk about basically all other businesses that somehow need developers.
Even if I would be 10x faster project managers would barely notice that. And I would lose a lot of creative fun that good coding tends to bring. Also debugging, 0 help there its all on you and your mind and experience.
Llms are so far banned in my banking megacorp and I aint complaining.
After reading many of the comments in this thread, I suspect many (not all) issues come from lack of planning and poor prompting.
For anything moderately complex, use Claude's plan mode; you get to approve the plan before turning it loose. The planning phase is where you want to use a more sophisticated model or use extended thinking mode.
Once you have a great plan, you can use a less sophisticated model to execute it.
Even if you're a great programmer, you may suck at prompting. There's an art and a science to prompting; perhaps learn about it? [1]
Don't forget; in addition to telling Claude or any other model what to do, you can also tell them what not to do in the CLAUDE.md or equivalent file.
[1]: https://docs.anthropic.com/en/docs/build-with-claude/prompt-...
"Now we don't need to hire a founding engineer! Yippee!" I wonder all these people who are building companies that are built on prompts (not even a person) from other companies. The minute there is a rug pull (and there WILL be one), what are you going to do? You'll be in even worse shape because in this case there won't be someone who can help you figure out your next move, there won't be an old team, there will just be NO team. Is this the future?
Probably similar to the guy who was gloating on Twitter about building a service with vibe coding and without any programming knowledge around the peak of the vibe coding madness.
Only for people to start screwing around with his database and API keys because the generated code just stuck the keys into the Javascript and he didn't even have enough of a technical background to know that was something to watch out for.
IIRC he resorted to complaining about bullying and just shut it all down.
> around the peak of the vibe coding madness.
I thought we are currently in it now ?
Yeah, I kind of doubt we've hit the peak yet.
I don't actually hear people call it vibe coding as much as I did back in late 2024/early 2025.
Sure there are many more people building slop with AI now, but I meant the peak of "vibe coding" being parroted around everywhere.
I feel like reality is starting to sink in a little by now as the proponents of vibe coding see that all the companies telling them that programming as a career is going to be over in just a handful of years, aren't actually cutting back on hiring. Either that or my social media has decided to hide the vibe coding discourse from me.
The Karpathy tweet came out 2025-02-02. https://x.com/karpathy/status/1886192184808149383
...my perception of time is screwed... it feels like it's been longer than that...
>> back in late 2024/early 2025
As an old man, this is hilarious.
Honestly i'm less scared of claude doing something like that, and more scared of it just bypassing difficult behavior. Ie if you chose a particularly challenging feature and it decided to give up, it'll just do things like `isAdmin(user) { /* too difficult to implement currently */ true }`. At least if it put a panic or something it would be an acceptable todo, but woof - i've had it try and bypass quite a few complex scenarios with silently failing code.
Sounds like a prompting/context problem, not a problem with the model.
First, use Claude's plan mode, which generates a step-by-step plan that you have to approve. One tip I've seen mentioned in videos by developers: plan mode is where you want to increase to "ultrathink" or use Opus.
Once the plan is developed, you can use Sonnet to execute the plan. If you do proper planning, you won't need to worry about Claude skipping things.
This is by far the most crazy how thing I look out for with Claude Code in particular.
> Tries to fix some tests for a while > Fails and just .skip the test
What service was this?
Looks like I misremembered the shutting down bit, but it was this guy: https://twitter.com/leojr94_/status/1901560276488511759
Seems like he's still going on about being able to replicate billion dollar companies' work quickly with AI, but at least he seems a little more aware that technical understanding is still important.
Any cost/benefit analysis of whether to use AI has to factor in the fact that AI companies aren't even close to making a profit, and are primarily funded by investment money. At some point, either the cost to operate these AI models needs to go down, or the prices will go up. And from my perspective, the latter seems a lot more likely.
They are not making money as they are all competing to push the models further and this R&D spending on salaries and cloud/hardware costs.
Unless models get better people are not going to pay more.
Rug pulls from foundation labs are one thing, and I agree with the dangers of relying on future breakthroughs, but the open-source state of the art is already pretty amazing. Given the broad availability of open-weight models within under 6 months of SotA (DeepSeek, Qwen, previously Llama) and strong open-source tooling such as Roo and Codex, why would you expect AI-driven engineering to regress to a worse state than what we have today? If every AI company vanished tomorrow, we'd still have powerful automation and years of efficiency gains left from consolidation of tools and standards, all runnable on a single MacBook.
The problem is the knowledge encoded in the models. It's already pretty hit and miss, hooking up a search engine (or getting human content into the context some other way, e.g. copy pasting relevant StackOverflow answers) makes all the difference.
If people stop bothering to ask and answer questions online, where will the information come from?
Logically speaking, if there's going to be a continuous need for shared Q&A (which I presume), there will be mechanisms for that. So I don't really disagree with you. It's just that having the model just isn't enough, a lot of the time. And even if this sorts itself out eventually, we might be in for some memorable times in-between two good states.
Excellent discussion in this thread, captures a lot of the challenges. I don't think we're a peak vibe coding yet, nor have companies experienced the level of pain that is possible here.
The biggest 'rug pull' here is that the coding agent company raises there price and kills you're budget for "development."
I think a lot of MBA types would benefit from taking a long look at how they "blew up" IT and switched to IaaS / Cloud and then suddenly found their business model turned upside down when the providers decided to up their 'cut'. It's a double whammy, the subsidized IT costs to gain traction, the loss of IT jobs because of the transition, leading to to fewer and fewer IT employees, then when the switch comes there is a huge cost wall if you try to revert to the 'previous way' of doing it, even if your costs of doing it that way would today would be cheaper than the what the service provider is now charging you.
That's why I stick to what I can run locally. Though for most of my tasks there is no big difference between cloud models and local ones, in half the cases both produce junk but both are good enough for some mechanical transformations and as a reference book.
It get even darker - I was around in the 1990s and a lot of people who ran head on into that generation’s problems used those lessons to build huge startups in the 2000s. If we have outsourced a lot of learning, what do we do when we fail? Or how we compound on success?
My Claude Code usage would have been $24k last month if I didn't have a max plan, at least according to Claude-Monitor.
I've been using a tool I developed (https://github.com/stravu/crystal) to run several sessions in parallel. Sometimes I will run the same prompt multiple times and pick the winner, or sometimes I'll be working on multiple features at once, reviewing and testing one while waiting on the others.
Basically, with the right tooling you can burn tokens incredibly fast while still receiving a ton of value from them.
This is why unlimited plans are always revoked eventually - a small fraction of users can be responsible for huge costs (Amazon's unlimited file backup service is another good example). Also whilst in general I don't think there's much to worry about with AI energy use, burning $24k of tokens must surely be responsible for a pretty large amount of energy
70,000,000 just last week ;P
But based on my costs, yours sounds much much higher :)
Looked at your tool several times, but haven't answered this question for myself: does this tool fundamentally use the Anthropic API (not the normal MAX billing)? Presuming you built around the SDK -- haven't figured out if it is possible to use the SDK, but use the normal account billing (instead of hitting the API).
Love the idea by the way! We do need new IDE features which are centered around switching between Git worktrees and managing multiple active agents per worktree.
Edit: oh, do you invoke normal CC within your tool to avoid this issue and then post-process?
Claude code has an SDK, where you specify the path to the CC executable. So I believe thats how this works. Once you have set up claude code in your environment and authed with however you like, this will just use that executable in a new UI
Interesting, the docs for auth don't mention it: https://docs.anthropic.com/en/docs/claude-code/sdk#authentic...
Surprised that this works, but useful if true.
https://docs.anthropic.com/en/docs/claude-code/sdk#typescrip...
`pathToClaudeCodeExecutable`!
Thanks for showing!
Max $100 or $200?
I'm on $100 and i'm shocked how much usage i get out of Sonnet, while Opus feels like no usage at all. I barely even bother with Opus since most things i want to do just runout super quick.
Interesting, I'm fairly new to using these tools and am starting with Claude Code but at the $20 level. Do you have any advice for when I would benefit from stepping up to $100? I'm not sure what gets better (besides higher usage limits).
No clue as i've not used Claude Code on Pro to get an idea of usage limits. But, if you get value out of Claude Code and ever run into limits, Max is quite generous for Sonnet imo. I have zero concern about Sonnet usage atm, so it's definitely valuable there.
Usage for Opus is my only "complaint", but i've used it so little i don't even know if it's that much better than Sonnet. As it is, even with more generous Opus limits i'd probably want a more advanced Claude Code behavior - where it uses Opus to plan and orchestrate, and Sonnet would do the grunt work for cheaper tokens. But i'm not aware of that as a feature atm.
Regardless, i'm quite pleased with Claude Code on $100 Max. If it was a bit smarter i might even upgrade to $200, but atm it's too dumb to give it more autonomy and that's what i'd need for $200. Opus might be good enough there, but $100 Opus limits are so low i've not even gotten enough experience with it to know if it's good enough for $200
I recently switched from Pro to $100 Max, and the only difference I've found so far is higher usage limits. Antropic tends to give shiny new features to Max users first, but as of now, there is nothing Max-only. For me, it's a good deal nonetheless, as even $100 Max limits are huge. While on Pro, I hit the limits each day that I used Claude Code. Now I rarely see the warning, but I never actually hit the limit.
>My Claude Code usage would have been $24k last month if I didn't have a max plan, at least according to Claude-Monitor.
In their dreams.
There is no way those companies don't loose ton of money on max plans.
I use and abuse mine, running multiple agents, and I know that I'd spend the entire month of fees in a few days otherwise.
So it seems like a ploy to improve their product and capture the market, like usual with startups that hope for a winner-takes-all.
And then, like uber or airbnb, the bait and switch will raise the prices eventually.
I'm wondering when the hammer will fall.
But meanwhile, let's enjoy the free buffet.
Does Claude Max allow you to use 3rd-party tools with an API key?
Early stage founder here. You have no idea how worth it $200/month is as a multiple on what compensation is required to fund good engineers. Absolutely the highest ROI thing I have done in the life of the company so far.
At this point, question is when does Amazon tell Anthropic to stop because it’s gotta be running up a huge bill. I don’t think they can continue offering the $200 plan for too long even with Amazon’s deep pocket.
Inference is cheap to run though, and how many people do you think are getting their $200 worth of it?
Based on people around me and anecdotal evidence of when Claude struggles, a lot more than you think. I’ve done some analysis on personal use between Openrouter, Amp, Claude API and $200 subscription, I probably save around $40-50/day. And I am a “light” user. I don’t run things in parallel too much.
I don't know, I have to figure out another way to count money I guess, but that $200 gives me a lot of worth, far more than 200. I guess if you like sleeping and do other stuff than drive Claude Code all the time, you might have a different feeling. For us it works well.
My question wasn't if the $200 was worth it to the buyer. Renting an H100 for a month is gonna cost around $1000 ($1.33+/hr). Pretend the use isn't bursty (but really it is). If you could get 6 people on one, the company is making money selling inference.
Let me know when you can run Opus on H100.
I don't understand. Obviously I can't run Opus on an H100, only Anthropic can do that since they are the only ones with the model. I am assuming they are using H100s, and that an all-in cost for an H100 comes to less then $1000/month, and doing some back of the envelope math to say if they had a fleet of H100s at their disposal, that it would take six people running it flat out, for the $200/month plan to be profitable.
Right but it probably takes like 8-10 H100s to run Claude Opus for inference just memory wise? I'm far from an expert just asking.
Does "one" Claude Opus instance count as the full model being loaded onto however many GPUs it takes ?
Is $200/month a lot of money when you can multiply your productivity? It depends but the most valuable currency in life is time. For some, spending thousands a month would be worth it.
As I said elsewhere... $200/month etc is potentially not a lot for an employer to pay (though I've worked for some recently who balk at just stocking a snacks tray or drink fridge...).
But $200/month is unbearable for open source / free software developers.
It's wild when a company has another department and will shell out $200/month per-head for some amalgamation of Salesforce and other SaaS tools for customer service agents.
At a previous job, my department was getting slashed because marketing was moving over to using Salesforce instead of custom software written in-house. Everything was going swimmingly, until the integration vendor for Salesforce just kept billing, and billing and billing.
Last I checked no one is still there who was there originally, except the vendor. And the vendor was charging around $90k/mo for integration services and custom development in 2017 when my team was let go. My team was around $10k/mo including rent for our cubicles.
That was another weird practice I've never seen elsewhere, to pay rent, we had to charge the other departments for our services. They turned IT and infrastructure into a business, and expected it to turn a profit, which pissed off all the departments who had to start paying for their projects, so they started outsourcing all development work to vendors, killing our income stream, which required multiple rounds of layoffs until only management was left.
This is really interesting because I was in business school almost thirty years and a cost accounting professor used almost this exact example, only with photocopiers and fax machines to illustrate how you can cost a company to death.
He would have considered that company to be running a perfectly controlled cost experiment. Though it was so perfectly controlled they forgot that humans actually did the work. With cost accounting projects, you pay morale and staffing charges well after the project itself was costed.
I hadn’t thought of that since the late 90s. Good comment but how the heck did I get that old??? :)
IT charging other departments is standard practice at every large company I've been at.
I've seen it too - not uncommon. A frustrating angle is vendor lockin. You are required to only use the internal IT team for everything, even if they're far more expensive and less skilled. They can 'charge' whatever they want, and you're stuck with their skills, prices and timeline. Going outside of that requires many levels of signoffs/approvals, and untold amounts of time making your case. There's value in having some central purchasing process, but when you limit your vendors to one (internal or external) you'll creating a lot more problems that you don't need to have.
Well that leads to shadow IT and upper management throwing a shit fit when we can't fix their system we don't know anything about.
I suspect there's some accounting magic where salaries and software licenses are in one box and "Diet Coke in the fridge" is in another, and the latter is an unbearable cost but the former "OK"
But yeah, doesn't explain non-payment for AI tools.
Current job "permits" Claude usage, but does not pay for it.
> Current job "permits" Claude usage, but does not pay for it.
That seems like the worst of all worlds from their perspective.
By not paying for it they introduce a massive security concern.
> Is $200/month a lot of money when you can multiply your productivity?
My read was the article takes it as a given that $200/m is worth it.
The question in the article seems more: is an extra $800/m to move from Claude Code to an agent using o3 worth it?
My butt needs to be in this chair 8 hours a day. Whether it takes me 20 hours to do a task or 2 doesn't really matter.
That's your problem, or your company or your country.
Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
So when I'm finished with my work, HO of course, I just work on my "contractor" projects.
Honestly, I wouldn't sign a full time contract banning me from other work.
And if you have enough customers, you just drop full time job. And just pay social security and health insurance, which you must pay by law anyway.
And specially in my country, it's even more ridiculous that as self-employed you pay lower taxes than full time employees, which truth to be told are ridiculously high. Nearly 40% of your salary.
In my country France, your contact May state hours, so you're paid to sit in the chair
Freelancing as a side hustle may be forbidden if your employer refuses
And it makes sense to pay more taxes since you also have more social benefits (paid leaves, retirement money and unemployment money), nothing is free
Hmm, not a practice I’ve come across in the EU. What countries specifically are you talking about?
> Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
First time I'm hearing this. Where in the EU are you? I don't know anybody doing this, but it could depend on the country (I'm in the nordics).
> Here in EU, if not stated in your work agreement, it's pretty common people work full time job and also as a self-employed contractor for other companies.
Absolutely not a common thing in my corner of the EU.
If you're salaried, you are not a task-based worker. The company pays you a salary for your full day's worth of productive time. If you can suddenly get 5x more done in that time, negotiate a higher salary or leave. If you're actually more productive, they will fight to keep you.
Your salary is not determined by your productivity, it's determined by market rates. 5X productivity does not mean 5X salary. Employers prey on labor market inefficiencies to keep the market rates low.
Any employer with 2 brain cells will figure out that you are more productive as a developer by using AI tools, they will mandate all developers use it. Then that's the new bar and everyone's salary stays the same.
Yeah a 20$ plan is prob enough for the AI slop you need to fill in your 8h working time. Unless you have many projects that require more AI slop that is.
This is why communism doesnt work lmao
Communism is an ideal but never a reality. What you see in reality is at best an attempt at communism which is quickly derailed by corruption and greed. I mean, it's great to have ideals, but you should also recognize when those ideals are completely impractical given the human condition.
By the way, this also applies to the "Free market" ideal...
Importantly, problems with the ideal shouldn't preclude good actions that take us in a direction.
There being problems with absolute libertarian free markets doesn't mean all policies that evoke the free market ideal must be disregarded, nor does the problems with communism mean that all communist actions must be ignored.
We can see a problem with an ideal, but still wish to replicate the good parts.
Sure. The issue for me is when people intentionally mislabel something to make it look worse.
For example, mislabelling socialism as communism. The police department, fire department, and roads are all socialist programs. Only a moron would call this communism and yet for some reason universal healthcare...
There's also this nonsense when someone says "That's the free market at work", and I'm like, if we really lived in a free market then you'd be drinking DuPont's poison right now.
Using the words "Communism" and "Free market" just show a (often intentional) misunderstanding of the nuance of how things actually work in our society.
The communism label must be the most cited straw man in all of history at this point.
for all the lip service capitalists give to the free market, they hate it. their revealed preference is for a monopoly.
> Communism is an ideal but never a reality
There is nothing ideal about communism. I'd rather own my production tools and be as productive as I want to be. I'd rather build wealth over trading opportunities, I'd rather hire people and reinvest earnings. That is ideal.
I think you're missing the point. Communism doesn't actually exist in the real world. In fact you are right now using it as a straw man (my entire point).
Who in the actual real world with any authority at all is telling you you can't be as productive as you want to be, build wealth, hire people, and reinvest your earnings?
[dead]
maybe the issue is capitalism where even if your productivity multiplies x100
your salary stays x1
and your work hours stay x1
More accurate representation is this:
Productivity multiplies x2 You keep your job x0.5 Your salary x0.8 (because the guy we just fired will gladly do your job for less) Your work hours x1.4 (because now we expect you to do the work of 2 people, but didn’t account for all the overhead that comes with it)
But aren't you supposed to be incentivized to work harder by having equity?
Equity is a lottery ticket. Is sacrificing my happiness or life balance in the near term worth the gamble that A) my company will be successful, and B) that my equity won’t have been diluted to worthlessness by the time that happens? At higher levels of seniority/importamce/influence this might make sense, but for most people I seriously doubt it does, especially early in their careers.
As a non-founder / not a VC you max get a few percentage points, and its mostly paper toilet money until there's an exit or IPO, and the founders will always try to squeeze you if they can, not because they're bad people, but because the system incentivises it. (you'll keep getting diluted in future rounds)
tbh, if im gonna bust my ass I'd rather own the thing.
A recent job offer for a startup was a 5 year vest with a 2 year cliff. Seriously?
That doesn’t happen anywhere outside of Silicon Valley.
And even in Silicon Valley you get the survivor ship bias of the 1% of companies getting to IPO and making their employees decent exit stories...
99% of startups die off worthless and your equity never realises.
Quite literally not.
Capitalism encourages you to put your butt in your own seat and reap the rewards of your efforts.
Of course it also provides you the decision making to keep your butt in someone else’s seat if the risk vs. reward of going your own isn’t worth it.
And then it allows your employer to put another butt in your seat if you don’t adopt efficiency patterns.
So: capitalism is compatible with communism as an option, but it’s generally a suboptimal option for one or both parties.
No it doesn't. People tell that story but the system is incredibly heavily leveraged to prevent that.
[dead]
Maybe in a true -capitalistic- market that'd happen.
but the state keeps meddling and making oligarchs and friends have unfair advantages.
It's hard to compete when the system is rigged from the start.
also a fair point :)
Capitalism is exactly about amassing capital to make others reliant on capitalist providing capital for the tools necessary to do the work, then extracting rent from the value produced.
In true capitalist market you end up with oligarchy.
[dead]
I am literally describing my life in a capitalist society....
I think that was the joke
[dead]
Has anyone else done this and felt the same? Every now and then I try to reevaluate all the models. So far it still feels like Claude is in the lead just because it will predictably do what I want when given a mid-sized problem. Meanwhile o3 will sometimes one-shot a masterpiece, sometimes go down the complete wrong path.
This might also just be a feature of the change in problem size - perhaps the larger problems that necessitate o3 are also too open-ended and would require much more planning up front. But at that point it's actually more natural to just iterate with sonnet and stay in the driver's seat a bit. Plus sonnet runs 5x faster.
I really hope we can avoid metered stuff for the long-term. One of the best aspects of software development is the low capital barrier to entry, and the cost of the AI tools right now is threatening that.
I'm fortunate in that my own use of the AI tools I'm personally paying for is squished into my off-time on nights and weekends, so I get buy with a $20/month Claude subscription :).
It's pretty damn capital intensive to be a productive farmer today. That said, AI will likely, hopefully, get cheaper over time.
> Use boring technology: LLMs do much better with well-documented and well-understood dependencies than obscure, novel, or magical ones. Now is not the time to let Steve load in a Haskell-to-WebAssembly pipeline.
If we all go that way, there might be no new haskells and webassemblies in the future.
LLMs can read documentations for a language and use it as well as human engineers.
"given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content"
Source: Gemini 1.5's paper from March 2024 https://storage.googleapis.com/deepmind-media/gemini/gemini_...
I think there certainly will, it will just mean that only people who can function independently of the AI will have access to them a few years before everyone else.
Interesting. Though it seems they are themselves building Agentic AI tooling. It's vibe coding all the way down - when's something real going to pop out the bottom?
An LLM salesman assuring us that $1000/mo is a reasonable cost for LLMs feels a bit like a conflict of interests, especially when the article doesn't go into much detail about the code quality. If anything, their assertion that one should stick to boring tech and "have empathy for the model" just reaffirms that anybody doing anything remotely innovative or cutting-edge shouldn't bother too much with coding agents.
[dead]
I can see how pricing at 100 to 200$ per month per employee could make sense for companies, it’s a clear value proposition at that scale. But for personal projects and open source work, it feels out of reach. I’d really like to see more accessible pricing tiers for individuals and hobbyists. Pay per token models don’t work for me either; earlier this year, I racked up almost $1,000 in a month just experimenting with personal projects, and that experience has made me wary of using these tools since.
Sources
> But for personal projects and open source work, it feels out of reach
Is it? Many hobbies cost much more money. A nice bike (motorbike or road bike, doesn't matter), a sailing boat, golf club/trips, a skiing season pass ... $100/month is significantly less than what you'd burn with those other things. Sure you can program in your free time without such a subscription, and if you enjoy that then by all means, but if it takes away the grunt work and you are having more fun, I don't see the issue.
Gym memberships are in that order of magnitude too, even though you could use some outdoor gym in a city park for free. Maybe those indoor perks of heating, lights, roof and maintained equipment are worth sth? Similar with coding agents for personal projects...
I’ve seen some people describe getting pretty good value out of the Claude $20 plan with Claude Code?
Pro is fine for medium sized projects, stick to 1 terminal.
Github Copilot has unlimited GPT-4.1 for $10/month.
And you can use it as an API, so you can plug it as an OpenAI compatible LLM provider into any 3rd party tool that uses AI, for free.
That's the only reason I subscribed to GitHub Copilot. Currently using it for Aider.
is GPT-4.1 decent for coding?
> Pay per token models don’t work for me either; earlier this year, I racked up almost $1,000 in a month just experimenting with personal projects, and that experience has made me wary of using these tools since.
Can't have your cake and eat it too.
Behold the holy trifecta of: Number of Projects - Code Quality - Coding Agent Cost
Charging $200/month is economically only possible if there is not a true market for LLMs or some sort of monopoly power. Currently there is no evidence that this will be the case. There are already multiple competitors and the barrier to entry is relatively low (compared to e.g. the car industry or other manufacturing industries), there are no network effects (like for social networks) and no need to get the product 100% right (like compatibility to Photoshop or Office) and the prices for training will drop further. Furthermore $200 is not free (like Google).
Can anyone name one single widely-used digital product that does _not_ have to be precisely correct/compatible/identical to The Original and that everyone _does_ pay $200/month for?
Therefore, should prices that users pay get anywhere even close to that number, there will naturally be opportunities for competitors to bring prices down to a reasonable level.
Barrier to entry is actually very very high. Just because we have “open source” models doesn’t mean anyone can enter. And the gap is widening now. I see Anthropic/OpenAI as clear leaders. Opus 4 and its derivative products are irreplaceable for coders since Spring 2025. Once you figure it out and have your revelation, it will be impossible to go back. This is an iPhone moment right now and the network effect will be incredible.
It's all text and it's all your text. There's zero network effect.
And that’s how it’s been forever. If your competitor is doing 10x your work, you will be compelled to learn. If someone has a nail gun and you’re using a hammer, no one’s saying “it’s all nails.” You will go buy a nail gun.
Network affects come from people building on extra stuff. There's no special sauce with these models, as long as you have an inference endpoint you can recreate anything yourself with any of the models.
As to the nailgun thing, that's an interesting analogy, I'm actually building my own house right now entirely with hand tools, it's on track to finish in 1/5 the time some of this mcmansions do with 1/100th of the cost because I'm building what I actually need and not screwing around with stuff for business reasons. I think you'll find software projects are more similar to that than you'd expect.
I think you forgot to consider the cost of providing the inference.
Well, that could be an additional problem.
My point was not that AI will necessarily be cheaper to run than $200, but that there is not much profit to be made. Of course the cost of inference will form a lower bound on the price as well.
I can't imagine using something like this and not self hosting. Moving around in your editor costs money? That would completely crush my velocity.
> literally changing failing tests into skipped tests to resolve “the tests are failing.”
Wow. It really is like a ridiculous, over-confident, *very* junior developer.
How does GitHub Copilot stack against API access directly from OpenAI, etc.? Is it faster to use API keys than Copilot?
I am blown away that you can get a founding engineer for $10k / month. I guess that is not counting stock options, in which case it makes sense. But I think if you include options the opportunity cost is much higher. IMO great engineers are worth a lot, no shade.
Since this is a business problem.
* It's not clear on how much revenue or new customers is generated by using a coding agent
* It's not clear on how things are going on production. There's only talks about development in the article
I feel ai coding agents will give you the edge. Just this article doesn't talk about revenue or PnL side of things, just perceived costs saved from not employing an engineer.
Yes. A company needs measurable ROI and isn't going to spend $200 a month per seat on Claude.
It will instead sign a deal with Microsoft for ai that is 'good enough' and limit expensive ai to some. Or being in the big consultancys as usual to do the projects.
So what we do at NonBioS.ai is to use a cheaper model to do routine tasks, but switch to a higher thinking model seamlessly if the agent get stuck. Its most cost efficient, and we take that switching cost away from the engineer.
But broadly agree to the argument of the post - just spending more might still be worth it.
No need to use the most expensive models for every query? Use it for the ones the cheaper models don't do well.
Q: Can you tell in advance whether your query is one that's worth paying more for a better answer?
Most programmers are not asking ai to re-write the whole app or convert C to Rust.
You wouldn't gain anything from asking the most expensive model to adjust some css.
I get a lot of value out of Claude Max at $100 USD/month. I use it almost exclusively for my personal open source projects. For work, I'm more cautious.
I worry, with an article like this floating around, and with this as the competition, and with the economics of all this stuff generally... major price increases are on the horizon.
Businesses (some) can afford this, after all it's still just a portion of the costs of a SWE salary (tho $1000/m is getting up there). But open source developers cannot.
I worry about this trend, and when the other shoe will drop on Anthropic's products, at least.
Those market forces will push the thriftier devs to find better ways to use the lesser models. And they will probably share their improvements!
I'm very bullish on the future of smaller, locally-run models, myself.
I have not invested time on locally-run, I'm curious if they could even get close to approaching the value of Sonnet4 or Opus.
That said, I suspect a lot of the value in Claude Code is hand-rolled fined-tuned heuristics built into the tool itself, not coming from the LLM. It does a lot of management of TODO lists, backtracking through failed paths, etc which look more like old-school symbolic AI than something the LLM is doing on its own.
Replicating that will also be required.
If it weren't for the Chinese, the prices would have been x10.
Where do you see the major price increases coming from?
The underlying inference is not super expensive. All the tricks they're pulling to make it smarter certainly multiply the price, but the price being charged almost certainly covers the cost. Basic inference on tuned base models is extremely cheap. But certainly it looks like Anthropic > OpenAI > Google in terms of inference cost structure.
Prices will only come up if there's a profit opportunity; if one of the vendors has a clear edge and gains substantial pricing power. I don't think that's clear at this point. This article is already equivocating between o3 and Opus.
Just a matter of time before AI coding becomes commodity and prices drop. 2027
I love how paying for prompts stuck. Like, if someone's going to do your homework for you, they should get compensated.
I must be holding OpenAI wrong.
Everyone time I try it I find it to be useless compared to Claude or Gemini.
Serious question, how do you justify paying for any of this without feeling like it's a waste?
I occasionally use ChatGPT (free version without logging in) and the amount of times it's really wrong is very high. Often times it takes a lot of prompting and feeding it information from third party sources for it to realize it has incorrect information and then it corrects itself.
All of these prompts would be using money on a paid plan right?
I also used Cursor (free trial on their paid plan) for a bit and I didn't find much of a difference. I would say whatever back-end it was using was possibly worse. The code it wrote was busted and over engineered.
I want to like AI and in some cases it helps gain insight on something but I feel like literally 90% of my time is it prodiving me information that straight up doesn't work and eventually it might work but to get there is a lot of time and effort.
Try with serious models. Here's what I would suggest:
1. Go to https://aider.chat/docs/leaderboards/ and pick one of the top (but not expensive) models. If unsure, just pick Gemini 2.5 Pro (not Flash).
2. Get API access.
3. Find a decent tool (hint: Aider is very good and you can learn the basics in a few minutes).
4. Try it on a new script/program.
5. (Only after some experience): Read people's detailed posts describing how they use these tools and steal their ideas.
Then tell us how it went.
Depends on how much you use. I use AI to think through code and other problems, and write the dumb parts of code. Claude definitely works much better than the free offerings. I use OpenRouter [1] and spend only a couple of dollar per month on AI usage. It's definitely worth it.
[1] https://openrouter.ai No affiliation
The AI agents that run on your machine where they have access to the code, and tools to browse/edit the code, or even run terminal commands are much more powerful than a simple chatbot.
It took some time for me to learn how to use agents, but they are very powerful once you get the hang of it.
> much more powerful than a simple chatbot
Claude Pro + Projects is a good middle ground between the two. Things didn't really "click" for me as a non-developer until I got access to both.
I can't believe people are still writing comments like this lol how can it be
I think it's a serious question because something really big is being missed here. There seem to be very different types of developers out there and/or working on very different kinds of codebases. Hypothetically, maybe you have devs or specific contexts where the dev can just write the code really fast where having to explain it to a bot is more time consuming, vs. devs /contexts where lots of googling and guessing goes on and it's easier to get the AI to just show you how to do it.
I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks. I'm sort of getting them to help me but I'm still really not being blown away and still tending to prefer not to bother with them with things I'm frequently iterating on, they are more useful when I have to learn some totally new platform/API. Why is that? do we think there's something wrong with me?
> I'm actually employer mandated to continue to try/use AI bots / agents to help with coding tasks
I think a lot of this comes down to the context management. I've found that these tools work worse at my current employer than my prior one. And I think the reason is context - my prior employer was a startup, where we relied on open source libraries and the code was smaller, following public best practices regarding code structure in Golang and python. My current employer is much bigger, with a massive monorepo of custom written/forked libraries.
The agents are trained on lots of open source code, so popular programming languages/libraries tend to be really well represented, while big internal libraries are a struggle. Similarly smaller repositories tend to work better than bigger ones, because there is less searching to figure out where something is implemented. I've been trying some coding agents with my current job, and they spend a lot more time searching through libraries looking to understand how to implement or use something if it relies on an internal library.
I think a lot of these struggles and differences are also present with people, but we tend to discount this struggle because people are generally good at reasoning. Of course, we also learn from each task, so we improve over time, unlike a static model.
I'd try out cursor with either o3 or Claude 4 Opus. The free version of ChatGPT and Claude in Cursor are much better. That's also what this article claims and is true in my experience.
> Serious question, how do you justify paying for any of this without feeling like it's a waste?
I would invert the question, how can you think it's a waste (for OP) if they're willing to spend $1000/mo on it? This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
> free version... the amount of times it's really wrong is very high... it takes a lot of prompting and feeding it information from third party
Respectfully, you're using it wrong, and you get what you paid for. The free versions are obviously inferior, because obviously they paywall the better stuff. If OP is spending $50/day, why would the company give you the same version for free?
The original article mentions Cursor. With (paid) cursor, the tool automatically grabs all the information on behalf of the user. It will grab your code, including grepping to find the right files, and it will grab info from the internet (eg up to date libraries, etc), and feed that into the model which can provide targeted diffs to update just select parts of a file.
Additionally, the tools will automatically run compiler/linter/unit tests to validate their work, and iterate and fix their mistakes until everything works. This write -> compile -> unit test -> lint loop is exactly what a human will do.
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
If someone spends a lot of money on something but they don't derive commensurate value from that purchase, they will experience cognitive dissonance proportional to that mismatch. But ceasing or reversing such purchases are only some of the possibilities for resolving that dissonance. Another possibility is adjusting one's assessment of the value of that purchase. This can be subconscious and automatic, but it an also involve validation-seeking behaviors like reading positive/affirming product reviews.
In this present era of AI hype, purchase-affirming material is very abundant! Articles, blog posts, interviews podcasts, HN posts.. there's lots to tell people that it's time to "get on board", to "invest in AI" both financially and professionally, etc.
How much money people have to blow on experiments and toys probably makes a big difference, too.
Obviously there are limits and caveats to this kind of distortion. But I think the reality here is a bit more complicated than one in which we can directly read the derived value from people's purchasing decisions.
> Respectfully, you're using it wrong, and you get what you paid for.
I used the paid (free trial) version of Cursor to look at Go code. I used the free version of ChatGPT for topics like Rails, Flask, Python, Ansible and various networking things. These are all popular techs. I wouldn't describe either platform as "good" if we're measuring good by going from an idea to a fully working solution with reasonable code.
Cursor did a poor job. The code it provided was mega over engineered to the point where most of the code had to be thrown away because it missed the big picture. This was after a lot of very specific prompting and iterations. The code it provided also straight up didn't work without a lot of manual intervention.
It also started to modify app code to get tests to pass when in reality the test code was the thing that was broken.
Also it kept forgetting things from 10 minutes ago and repeating the same mistakes. For example when 3 of its solutions didn't work, it started to go back and suggest using the first solution that was confirmed to not work (and it even output text explaining why it didn't work just before).
I feel really bad for anyone trusting AI to write code when you don't already have a lot of experience so you can keep it in check.
So far at best I barely find it helpful for learning the basics of something new or picking out some obscure syntax of a tool you don't well after giving it a link to the tool's docs and source code.
> I feel really bad for anyone trusting AI to write code when you don't already have a lot of experience so you can keep it in check.
You definitely should be skilled in your domain to use it effectively.
> This isn't some emotional or fashionable thing, they're tools, so you'd have to assume they derive $1000 of value.
This is not born out in my personal experience at all. In my experience, both in the physical and software tool worlds, people are incredibly emotional about their tools. There are _deep_ fashion dynamics within tool culture as well. I mean, my god, editors are the prima donna of emotional fashion running roughshod over the developer community for decades.
There was a reason it was called "Tool Time" on Home Improvement.
[dead]
I find it kind of boggling that employers spend $200/month to make employees lives easier, for no real gain.
That’s right. Productivity does go up, but most of these employees aren’t really contributing directly to revenue. There is no code to dollar pipeline. Finishing work faster means some roadmap items move quicker, but they just move quicker toward true bottlenecks that can’t really be resolved quickly with AI. So the engineers sit around doing nothing for longer periods of time waiting to be unblocked. Deadlines aren’t being estimated tighter, they are still as long as ever.
Enjoy this time while it lasts. Someday employers might realize they need to hire less and just cram more work into individual engineers schedules, because AI should supposedly make work much easier.
> Someday employers might realize they need to hire less and just cram more work into individual engineers schedules
We are already past that point. The high water mark for Devs was ironically in late 2020 during Covid, before RTO when we were in high demand.
There's been pretty widespread layoffs in tech for a few years now.
Coding an actual solution is what, 5-10% of the overall project time?
I dont talk about some SV megacorps where better code can directly affect slightly revenue or valuation and thus more time is spend coding and debugging, I talk about basically all other businesses that somehow need developers.
Even if I would be 10x faster project managers would barely notice that. And I would lose a lot of creative fun that good coding tends to bring. Also debugging, 0 help there its all on you and your mind and experience.
Llms are so far banned in my banking megacorp and I aint complaining.
[dead]
I just pay $20/month on ChatGPT and spend the entire day coding with its help, no need to pay for tokens, no need to integrate it on your IDE.