… because programming languages are the right level of precision for specifying a program you want. Natural language isn’t it. Of course you need to review and edit what it generates. Of course it’s often easier to make the change yourself instead of describing how to make the change.
I wonder if the independent studies that show Copilot increasing the rate of errors in software have anything to do with this less bold attitude. Most people selling AI are predicting the obsolescence of human authors.
Transformers can be used to automate testing, create deeper and broader specification, accelerate greenfield projects, rapidly and precisely expand a developer's knowledge as needed, navigate unfamiliar APIs without relying on reference, build out initial features, do code review and so much more.
Even if code is the right medium for specifying a program, transformers act as an automated interface between that medium and natural language. Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
> Most people selling AI are predicting the obsolescence of human authors.
It's entirely possible that we do become obsolete for a wide variety of programming domains. That's simply a reality, just as weavers saw massive layoffs in the wake of the automated loom, or scribes lost work after the printing press, or human calculators became pointless after high-precision calculators became commonplace.
This replacement might not happen tomorrow, or next year, or even in the next decade, but it's clear that we are able to build capable models. What remains to be done is R&D around things like hallucinations, accuracy, affordability, etc. as well as tooling and infrastructure built around this new paradigm. But the cat's out of the bag, and we are not returning to a paradigm that doesn't involve intelligent automation in our daily work; programming is literally about automating things and transformers are a massive forward step.
That doesn't really mean anything, though; you can still be as involved in your programming work as you'd like. Whether you can find paid, professional work depends on your domain, skill level, and compensation preferences. But you can always program for fun or personal projects, and decide how much or how little automation you use. I will recommend, though, that you take these tools seriously and aren't too dismissive, or you could find yourself left behind in a rapidly evolving landscape, much as happened with the advent of personal computing and the internet.
> Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
It will also still happily turn your whole codebase into garbage rather than undo the first thing it tried to try something else. I've yet to see one that can back itself out of a logical corner.
This is it for me. If you ask these models to write something new, the result can be okay.
But the second you start iterating with them... the codebase goes to shit, because they never delete code. Never. They always bolt new shit on to solve any problem, even when there's an incredibly obvious path to achieve the same thing in a much more maintainable way with what already exists.
Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them. Until then, I remain a hater, because they only seem capable of the opposite :)
You'd be surprised what a combination of structured review passes and agent rules (even simple ones such as "please consider whether old code can be phased out") might do to your agentic workflow.
> Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them.
They can already do this. If you have any specific code examples in mind, I can experiment for you and return my conclusions if it means you'll earnestly try out a modern agentic workflow.
That's a combination of current context limitations and a lack of quality tooling and prompting.
A well-designed agent can absolutely roll back code if given proper context and access to tooling such as git. Even flushing context/message history becomes viable for agents if the functionality is exposed to them.
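To make that concrete, here is a minimal sketch of what "exposing git to the agent" could look like; the helper names and the idea of registering these functions as tools with whatever harness you use are assumptions for illustration, not any particular product's API:

    import subprocess

    def git(*args: str) -> str:
        """Run a git command in the current repository and return its trimmed output."""
        result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
        return result.stdout.strip()

    def checkpoint(message: str = "agent checkpoint") -> str:
        """Tool: commit the current working tree and return the commit hash for later rollback."""
        git("add", "-A")
        git("commit", "--allow-empty", "-m", message)
        return git("rev-parse", "HEAD")

    def rollback(commit_hash: str) -> None:
        """Tool: throw away everything after the given checkpoint, restoring the tree exactly."""
        git("reset", "--hard", commit_hash)

An agent that calls checkpoint() before each attempt can then be told to call rollback() instead of piling new code on top of a failed approach.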
> It will also still happily turn your whole codebase into garbage rather than undo the first thing it tried to try something else.
That's not true at all.
...
It's only pretending to be happy.
I don't disagree exactly, but the AI that fully replaces all the programmers is essentially a superhuman one. It's matching human output, but will obviously be able to do some tasks, like calculations, much faster, and won't need a lunch break.
At that point it's less "programmers will be out of work" and more "most work may cease to exist".
> That's simply a reality, just as weavers saw massive layoffs in the wake of the automated loom, or scribes lost work after the printing press, or human calculators became pointless after high-precision calculators became commonplace.
See, this is the kind of conception of a programmer I find completely befuddling. Programming isn't like those jobs at all. There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys." Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
> Programming isn't like those jobs at all
I'm sure you understand the analogy was about automation and reduction in workforce, and that each of these professions has both commonalities and differences. You should assume good faith and interpret comments on Hacker News in the best reasonable light.
> There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys."
Strange. My experience is that "code monkeys" don't give a crap about the quality of their code or its impact with regards to the product, which is why they remain programmers and don't move on to roles which incorporate management or product responsibilities. Actually, the people who are "overly attached to code" tend to be computer scientists who are deeply interested in computation and its expression.
> Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
Of course not. It led to a reduction in draftsmen, as now draftsmen can work more quickly and engineers can take on work that used to be done by draftsmen. The US Bureau of Labor Statistics states[0]:
> Expected employment decreases will be driven by the use of computer-aided design (CAD) and building information modeling (BIM) technologies. These technologies increase drafter productivity and allow engineers and architects to perform many tasks that used to be done by drafters.
Similarly, the other professions I mentioned were absorbed into higher-level professions. It has been stated many times that the future focus of software engineers will be less about programming and more about product design and management.
I saw this a decade ago at the start of my professional career and from the start have been product and design focused, using code as a tool to get things done. That is not to say that I don't care deeply about computer science; I find coding and product development to each be incredibly creatively rewarding, and I find that a comprehensive understanding of each unlocks an entirely new way to see and act on the world.
[0] https://www.bls.gov/ooh/architecture-and-engineering/drafter...
My father-in-law was a draftsman. Lost his job when the defense industry contracted in the '90s. When he was looking for a new job, everything required CAD, which he had no experience in (he also had a learning disability, which made learning CAD hard).
He couldn't land a job that paid more than minimum wage after that.
Wow, that's a sad story. It really sucks to spend your life mastering a craft and suddenly find it obsolete and your best years behind you. My heart goes out to your father-in-law.
This is a phenomenon that seems to be experienced more and more frequently as the industrial revolution continues... The craft of drafting goes back to 2000 B.C.[0] and while techniques and requirements gradually changed over thousands of years, the digital revolution suddenly changed a ton of things all at once in drafting and many other crafts. This created a literacy gap many never recovered from.
I wonder if we'll see a similar split here with engineers and developers regarding agentic and LLM literacy.
[0] https://azon.com/2023/02/16/rare-historical-photos/
Doesn't AI have diminishing returns on its pseudo-creativity? Throw all the training output of LLMs into a circle. If all input comes from other LLM output, the circle never grows. Humans constantly step outside the circle.
Perhaps LLMs can be modified to step outside the circle, but as of today, it would be akin to monkeys typing.
I think you're either imagining the circle too small or overestimating how often humans step outside it. The typical programming job involves lots and lots of work, and yet none of it creates wholly original computer science. Current LLMs can customize well-known UI/IO/CRUD/REST patterns with little difficulty, and these make up the vast majority of commercial software development.
In my experience the best use of AI is to stay in the flow state when you get blocked by an API you don't understand or a feature you don't want to implement for whatever reason.
Right level for exactly specifying program behavior in a global domain without context.
But once you add repo context, domain knowledge, etc., programming languages are far too verbose.
AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.
Here's a fresh example that I stumbled upon just a few hours ago. I needed to refactor some code that first computes the size of a popup, and then separately, the top left corner.
For brevity, one part used an "if", while the other one had a "switch":
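(Roughly this kind of shape; the snippet below is an illustration in Python with invented names, not the actual code.)

    # Illustration only: "popup" stands in for whatever widget object the real code uses.
    def compute_popup_size(kind, content_w, content_h):
        # one branch construct decides the size...
        if kind == "tooltip":
            return (min(content_w, 320), content_h)
        return (content_w, content_h)

    def apply_popup_position(kind, anchor_x, anchor_y, popup):
        # ...and a different construct does the parallel job for the top-left corner
        match kind:
            case "tooltip":
                popup.move_to(anchor_x, anchor_y + 24)
            case "menu":
                popup.move_to(anchor_x, anchor_y)
            case _:
                popup.center_on_screen()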
I wanted the LLM to refactor it to store the position rather than applying it immediately. Turns out, it just could not handle different things (if vs. switch) doing a similar thing. I tried several variations of prompts, but it was very strongly leaning toward either two ifs or two switches, despite rather explicit instructions not to do so.
It sort of makes sense: once the model has "completed" an if, and then encounters the need for a similar thing, it will pick an "if" again, because, well, it is completing the previous tokens.
Harmless here, but in many slightly less trivial examples, it would just steamroll over nuance and produce code that appears good, but fails in weird ways.
That said, splitting tasks into smaller parts devoid of such ambiguities works really well. Way easier to say "store size in m_StateStorage and apply on render" than manually editing 5 different points in the code. Especially with stuff like Cerebras, which can chew through complex code at several kilobytes per second, expanding simple thoughts faster than you could physically type them.
Yeah that's one model that you happen to be using in June 2025.
Give it to o3 and it could definitely handle that today.
Sweeping generalizations about how LLMs will never be able to do X, Y, or Z coding task will all be proven wrong with time, imo.
Sweeping generalizations about how LLMs will always (someday) be able to do arbitrary X, Y, and Z don't really capture me either.
In response to your sweeping generalization, I posit a sweeping generalization of my own, said the bard:
Whatever can be statistically predicted
by the human brain
Will one day also be
statistically predicted by melted sand
Until the day that thermodynamics kicks in.
Or the current strategies to scale across boards instead of chips get too expensive in terms of cost, capital, and externalities.
I mean fair enough, I probably don't know as much about hardware and physics as you.
Just pointing out that there are limits and there’s no reason to believe that models will improve indefinitely at the rates we’ve seen these last couple of years.
There is reason to believe that humans will keep trying to push the limitations of computation and computer science, and that recent advancements will greatly accelerate our ability to research and develop new paradigms.
Look at how well Deepseek performed with the limited, outdated hardware available to its researchers. And look at what demoscene practitioners have accomplished on much older hardware. Even if physical breakthroughs ceased or slowed down considerably, there is still a ton left on the table in terms of software optimization and theory advancement.
And remember just how young computer science is as a field, compared to other human practices that have been around for hundreds of thousands of years. We have so much to figure out, and as knowledge begets more knowledge, we will continue to figure out more things at an increasing pace, even if it requires increasingly large amounts of energy and human capital to make a discovery.
I am confident that if it is at all possible to reach human-level intelligence at least in specific categories of tasks, we're gonna figure it out. The only real question is whether access to energy and resources becomes a bigger problem in the future, given humanity's currently extraordinarily unsustainable path and the risk of nuclear conflict or sustained supply chain disruption.
I am working on a GUI for delegating coding tasks to LLMs, so I routinely experiment with a bunch of models doing all kinds of things. In this case, Claude Sonnet 3.7 handled it just fine, while Llama-3.3-70B just couldn't get it. But that is literally the simplest example that illustrates the problem.
When I tried giving top-notch LLMs harder tasks (scan an abstract syntax tree coming from a parser in a particular way, and generate nodes for particular things) they completely blew it. The output didn't even compile, to say nothing of logical errors and missed points. But once I broke the problem down into making lists of relevant parsing contexts and generating one wrapper class at a time, it saved me a whole ton of work. It took me a day to accomplish what would normally take a week.
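As a toy illustration of that kind of decomposition, using Python's ast module as a stand-in for whatever parser is actually involved (the input and names here are invented): first list the relevant contexts, then generate one wrapper at a time.

    import ast

    # Invented stand-in input; the real code base and node types are not shown here.
    SOURCE = "def load(path): ...\ndef save(path, data): ...\n"

    # Subtask 1: list the relevant parsing contexts (here, top-level function definitions).
    tree = ast.parse(SOURCE)
    contexts = [node for node in tree.body if isinstance(node, ast.FunctionDef)]

    # Subtask 2: generate one wrapper class at a time from each context.
    def make_wrapper(fn: ast.FunctionDef) -> str:
        args = ", ".join(a.arg for a in fn.args.args)
        return (
            f"class {fn.name.title()}Wrapper:\n"
            f"    def __call__(self, {args}):\n"
            f"        return {fn.name}({args})\n"
        )

    for fn in contexts:
        print(make_wrapper(fn))

Each subtask is small and unambiguous, which is exactly the kind of work the models handle well.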
Maybe they will figure it out eventually, maybe not. The point is, right now the technology has fundamental limitations, and you are better off knowing how to work around them, rather than blindly trusting the black box.
Yeah exactly.
I think it's a combination of
1) wrong level of granularity in prompting
2) lack of engineering experience
3) autistic rigidity regarding a single hallucination throwing the whole experience off
4) subconscious anxiety over the threat to their jerbs
5) unnecessary guilt over going against the tide; anything pro AI gets heavily downvoted on Reddit and is, at best, controversial as hell here
I, for one, have shipped like literally a product per day for the last month and it's amazing. Literally 2,000,000+ impressions, paying users, almost 100 sign ups across the various products. I am fucking flying. Hit the front page of Reddit and HN countless times in the last month.
Idk if I break down the prompts better or what. But this is production grade shit and I don't even remember the last time I wrote more than two consecutive lines of code.
If you are launching one product per day, you are using LLMs to convert unrefined ideas into proof-of-concept prototypes. That works really well, that's the kind of work that nobody should be doing by hand anymore.
Except, not all work is like that. Fast-forward to product version 2.34, where a particular customer needs a change that could break 5000 other customers because of non-trivial dependencies between different parts of the design, and you will either have humans rewrite the entire thing or watch it collapse under its own weight.
But out of 100 products launched on the market, only 1 or 2 will ever reach that stage, and having 100 LLM prototypes followed by 2 thoughtful redesigns is way better than seeing 98 human-made products die.
If you need a model per task, we're very far from AGI.
The interesting questions happen when you define X, Y, and Z, and a time frame. For example, will LLMs be able to solve the P=NP problem in two weeks, 6 months, 5 years, or a century? And then explore why or why not.
> AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.
AI stands for Artificial Intelligence. There are no inherent limits around what AI can and can't do or comprehend. What you are specifically critiquing is the capability of today's popular models, specifically transformer models, and accompanying tooling. This is a rapidly evolving landscape, and your assertions might no longer be relevant in a month, much less a year or five years. In fact, your criticism might not even be relevant between current models. It's one thing to speak about idiosyncrasies between models, but any broad conclusions drawn outside of a comprehensive multi-model review with strict procedure and controls is to be taken with a massive grain of salt, and one should be careful to avoid authoritative language about capabilities.
It would be useful to be precise in what you are critiquing, so that the critique actually has merit and applicability. Even saying "LLM" is a misnomer, as modern transformer models are multi-modal and trained on much more than just textual language.
What a ridiculous response, to scold the GP for criticising today's AI because tomorrow's might be better. Sure, it might! But it ain't here yet buddy.
Lots of us are interested in technology that's actually available, and we can all read date stamps on comments.
You're projecting that I am scolding OP, but I'm not. My language was neutral and precise. I presented no judgment, but gave OP the tools to better clarify their argument and express valid, actionable criticism instead of wholesale criticizing "AI" in a manner so imprecise as to reduce the relevance and effectiveness of their argument.
> But it ain't here yet buddy . . . we can all read date stamps on comments.
That has no bearing on the general trajectory that we are currently on in computer science and informatics. Additionally, your language is patronizing and dismissive, trading substance for insult. This is generally frowned upon in this community.
You failed to actually address my comment, both by failing to recognize that it was mainly about using the correct terminology instead of criticizing an entire branch of research that extends far beyond transformers or LLMs, and by failing to establish why a rapidly evolving landscape does not mean that certain generalizations cannot yet be made, unless they are presented with several constraints and caveats, which includes not making temporally-invariant claims about capabilities.
I would ask that you reconsider your approach to discourse here, so that we can avoid this thread degenerating into an emotional argument.
The GP was very precise in the experience they shared, and I thought it was interesting.
They were obviously not trying to make a sweeping comment about the entire future of the field.
Are you using ChatGPT to write your loquacious replies?
> They were obviously not trying to make a sweeping comment about the entire future of the field
OP said “AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.”
I'm not going to patronize you by explaining why this is not "very precise", or why its lack of temporal caveats is an issue, as I've already done so in an earlier comment. If you're still confused, you should read the sentence a few times until you understand. OP did not even mention which specific model they tested, and did not provide any specific prompt example.
> Are you using ChatGPT to write your loquacious replies?
If you can't handle a few short paragraphs as a reply, or find it unworthy of your time, you are free to stop arguing. The Hacker News guidelines actually encourage substantive responses.
I also assume that in the future, accusing a user of using ChatGPT will be against site guidelines, so you may as well start phasing that out of your repertoire now.
Here are some highlights from the Hacker News guidelines regarding comments:
- Don't be snarky
- Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
- Assume good faith
- Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken.
https://news.ycombinator.com/newsguidelines.html
I had a fun result the other day from Claude. I opened a script in Zed and asked it to "fix the error on line 71".
Claude happily went and fixed the error on line 91....
1. There was no error on line 91, it did some inconsequential formatting on that line
2. More importantly, it just ignored the very specific line I told it to go to. It's like I was playing telephone with the LLM which felt so strange with text-based communication.
This was me trying to get better at using the LLM while coding and seeing if I could "one-shot" some very simple things. Of course me doing this _very_ tiny fix myself would have been faster. Just felt weird and reinforces this idea that the LLM isn't actually thinking at all.
LLMs probably have bad awareness of line numbers
I suspect if OP highlighted line 71 and added it to chat and said fix the error, they’d get a much better response. I assume Cursor could create a tool to help it interpret line numbers, but that’s not how they expect you to use it really.
How is this better from just using a formal language again?
Who said it's better? It's a design choice. Someone can easily write an agent that takes instructions in any language you like.
> This was me trying to get better at using the LLM while coding
And now you've learned that LLMs can't count lines. Next time, try asking it to "fix the error in function XYZ" or copy/paste the line in question, and see if you get better results.
> reinforces this idea that the LLM isn't actually thinking at all.
Of course it's not thinking, how could it? It's just a (rather big) equation.
As shared by Simon in https://news.ycombinator.com/item?id=44176523, a better agent will prepend the line numbers as a workaround, e.g. Claude Code:
54 def dicts_to_table_string(
55 headings: List[str], dicts: List[Dict[str, str]]
56 ) -> List[str]:
57 max_lengths = [len(h) for h in headings]
58
59 # Compute maximum length for each column
60 for d in dicts:
That’s not what he’s saying there. There’s a separate tool that adds line numbers before feeding the prompt into the LLM. It’s not the LLM doing it itself.
The separate tool is called the agent.
It's a tool. It's not a human. A line number works great for humans. Today, they're terrible for LLMs.
I can choose to use a screwdriver to hammer in a nail and complain about how useless screwdrivers are. Or I can realize when and how to use it.
We (including marketing & execs) have made a huge mistake in anthropomorphizing these things, because then we stop treating them like tools that have specific use cases to be used in certain ways, and start treating them like smart humans that don't need that.
Maybe one day they'll be there, but today they are screwdrivers. That doesn't make them useless.
AI is still not there yet. For example, I constantly have to deal with mis-referenced documents and specifications. I think part of the problem is that it was trained on outdated materials and is unable to actively update on the fly. I can get around this problem but…
The problem with today’s LLM and GI solutions is that they try to solve all n steps when solving the first i steps would be infinitely more useful for human consumption.
Personally, I define the job of a software engineer as transforming requirements into software.
Software is not only code. Requirements are not only natural language.
At the moment I can't manage to be faster with the AI than working manually, unless it's a simple task or simple software.
In my experience, AIs are currently junior- or mid-level developers. And in the last two years, they haven't gotten significantly better.
Most of the time, the requirements are not spelled out. Nobody even knows what the business logic is supposed to be. A lot of it has to be decided by the software engineer based on available information. It sometimes involve walking around the office asking people things.
It also requires a fair bit of wisdom to know where the software is expected to grow, and how to architect for that eventuality.
I can't picture an LLM doing a fraction of that work.
I think that's my problem with AI. Let's say I have all the requirements, down to the smallest detail. Then I make my decisions at a micro level. Formulate an architecture. Take all the non-functionals into account. I would have to write a book as a prompt, and it still could not express my thoughts as accurately as if I were writing code right away. Apart from that, the prompt is generally a superfluous intermediate step in which I struggle to express an exact programming language in an imprecise natural language, with a result that is not reproducible.
AI is less useful when I'm picking through specific, careful business logic.
But we still have lots of boilerplate code that's required, and it's really good at that. If I need a reasonably good-looking front-end UI, Claude will put that together quickly for me. Or if I need it to use a common algorithm against some data, it's good at applying that.
What I like about AI is that it lets me slide along the automation-specificity spectrum as I go through my work. Sometimes it feels like I've opened up a big stretch of open highway (some standard issue UI work with structures already well-defined) and I can "goose" it. Sometimes there's some tricky logic, and I just know the AI won't do well; it takes just as long to explain the logic as to write it in code. So I hit the brakes, slow down, navigate those tricky parts.
Previously, in the prior attempt to remove the need for engineers, no-code solutions instead trapped people. They pretended like everything is open highway. It's not, which is why they fail. But here we can apply our own judgment and skill in deciding when a lot of "common-pattern" code is required versus the times where specificity and business context is required.
That's also why these LLMs get over-hyped: people over-hype them for the same reason they over-hyped no-code; they don't understand the job to be done, and because of that, they actually don't understand why this is much better than the fanciful capabilities they pretend it has ("it's like an employee and automates everything!"). The tools are much more powerful when an engineer is in the driver's seat, exercising technical judgment on where and how much to use it for every given problem to be solved.
At a certain point, huge prompts (which you can readily find available on GitHub, used by large LLM-based corps) are just... files of code? Why wouldn't I just write the code myself at that point?
Almost makes me think this "AI coding takeover" is strictly pushed by non-technical people.
One of the most useful properties of computers is that they enable reliable, eminently reproducible automation. Formal languages (like programming languages) not only allow one to unambiguously specify the desired automation to the utmost level of precision, they also allow humans to reason about the automation with precision and confidence. Natural language is a poor substitute for that. The ground truth of programs will always be the code, and if humans want to precisely control what a program does, they’ll be best served by understanding, manipulating, and reasoning about the code.
“Manual” has a negative connotation. If I understand the article correctly, they mean “human coding remains key”. It’s not clear to me the GitHub CEO actually used the word “manual”, that would surprise me. Is there another source on this that’s either more neutral or better at choosing accurate words? The last thing we need is to put down human coding as “manual”; human coders have a large toolbox of non-AI tools to automate their coding.
I think so many devs fail to realise that to your product manager / team lead, the interface between you and the LLM is basically the same. They write a ticket/prompt and they get back a bunch of code that undoubtedly has bugs and misinterpretations in it, which will probably go through a few rounds of back and forth until it's good enough to ship (i.e. they tested it black-box style and it worked once); then they can move on to the next thing until whatever this ticket was about rears its ugly head again at some point in the future. If you aren't used to writing user stories / planning, you're really gonna be obsolete soon.
My guess is they will settle on 2x the productivity of a before-AI developer as the skill floor, but then not take a look at how long meetings and other processes take.
Why not look at Bob who takes like 2 weeks to write tickets on what they actually want in a feature? Or Alice who's really slow getting Figma designs done and validated? How nice would having a "someone's bothered a developer" metric be and having the business seek to get that to zero and talk very loudly about it as they have about developers?
It's interesting to see a CEO express thoughts on AI and coding go in a slightly different direction.
Usually the CEO or investor says 30% (or some other made up number) of all code is written by AI and the number will only increase, implying that developers will soon be obsolete.
It's implied that 30% of all code submitted and shipped to production is from AI agents with zero human interaction. But of course this is not the case; it's the same developers as before, using tools to write code more rapidly.
And writing code is only one part of a developer's job in building software.
He’s probably more right than not. But he also has a vested interest in this (just like the other CEOs who say the opposite), being in the business of human-mediated code.
Presumably you're aware that the full name of Microsoft's Copilot AI code authoring tool is "GitHub Copilot", that GitHub developed it, and that he runs GitHub.
Imo this is a misunderstanding of what AI companies want AI tools to be and where the industry is heading in the near future. The endgame for many companies is SWE automation, not augmentation.
To expand -
1. Models "reason" and can increasingly generate code given natural language. It's not just fancy autocomplete; it's like having an intern- to mid-level engineer at your beck and call to implement some feature. Natural language is generally sufficient when I interact with other engineers, so why is it not sufficient for an AI, which (in the limit) approaches an actual human engineer?
2. Business-wise, companies will not settle for augmentation. Software companies pay tons of money in headcount; it's probably most mid-sized companies' top or second line item. The endgame for leadership at these companies is to do more with less. This necessitates automation (in addition to augmenting the remaining roles).
People need to stop thinking of LLMs as "autocomplete on steroids" and actually start thinking of them as a "24/7 junior SWE who doesn't need to eat or sleep and can do small tasks at 90% accuracy with some reasonable spec." Yeah you'll need to edit their code once in a while but they also get better and cost less than an actual person.
That is the ideal for the management types and the AI labs themselves yes. Copilots are just a way to test the market, and gain data and adoption early. I don't think it is much of a secret anymore. We even see benchmarks created (e.g. OpenAI recently did one) that are solely about taking paid work away from programmers and how many "paid tasks" it can actually do. They have a benchmark - that's their target.
As a standard pleb, though, I can understand why this world, where the people with connections, social standing, and capital have an advantage, isn't appealing to many on this forum. If anyone can do something, other advantages that aren't as easy to acquire matter relatively more.
Too late, I am seeing developer after developer doing copy/paste from AI tools and when asked, they have no idea how the code works coz "it just works"
Google itself said 30% of their code is AI generated, and yet they had a recent outage worldwide, coincidence??
Going by the content of the linked post, this is very much a misleading headline. There is nothing in the quotes that I would read as an endorsement of "manual coding", at least not in the sense that we have used the term "coding" for the past decades.
I think most programmers would agree that thinking represents the majority of our time. Writing code is no different than writing down your thoughts, and that process in itself can be immensely productive -- it can spark new ideas, grant epiphanies, or take you in an entirely new direction altogether. Writing is thinking.
I think an over-reliance, or perhaps any reliance, on AI tools will turn good programmers into slop factories, as they consistently skip over a vital part of creating high-quality software.
You could argue that the prompt == code, but then you are adding an intermediary step between you and the code, and something will always be lost in translation.
I think this misses the point. You're right that programmers still need to think. But you're wrong thinking that AI does not help with that.
With AI, instead of starting with zero and building up, you can start with a result and iterate on it straight away. This process really shines when you have a good idea of what you want to do, and how you want it implemented. In these cases, it is really easy to review the code, because you knew what you wanted it to look like. And so, it lets me implement some basic features in 15 minutes instead of an hour. This is awesome.
For more complex ideas, AI can also be a great idea sparring partner. Claude Code can take a paragraph or two from me, and then generate a 200-800 line planning document fleshing out all the details. That document: 1) helps me to quickly spot roadblocks using my own knowledge, and 2) helps me iterate quickly in the design space. This lets me spend more time thinking about the design of the system. And Claude 4 Opus is near-perfect at taking one of these big planning specifications and implementing it, because the feature is so well specified.
So, the reality is that AI opens up new possible workflows. They aren't always appropriate. Sometimes the process of writing the code yourself and iterating on it is important to helping you build your mental model of a piece of functionality. But a lot of the time, there's no mystery in what I want to write. And in these cases, AI is brilliant at speeding up design and implementation.
Based on your workflow, I think there is considerable risk of you being wooed by AI into believing what you are doing is worthwhile. The plan AI offers is coherent, specific, it sounds good. It's validation. Sugar.
I know the tools and environments I am working in. I verify the implementations I make by testing them. I review everything I am generating.
The idea that AI is going to trick me is absurd. I'm a professional, not some vibe coding script kiddie. I can recognise when the AI makes mistakes.
Have the humility to see that not everyone using AI is someone who doesn't know what they are doing and just clicks accept on every idea from the AI. That's not how this works.
More complaining & pessimism means better signal for teams building the AI coding tools! Keep it up!
The ceiling for AI is not even close to being met. We have to be practical with what's reasonable, but getting 90% complete in a few prompts is magic.
"He warned that depending solely on automated agents could lead to inefficiencies. For instance, spending too much time explaining simple changes in natural language instead of editing the code directly."
Lots of changes where describing them in English takes longer than just performing the change. I think the most effective workflow with AI agents will be a sort of active back-and-forth.
Yeah, I can’t count the number of times I’ve thought about a change, explained it in natural language, pressed enter, then realized I’ve already arrived at the exact change I need to apply just by thinking it through. Oftentimes I even beat the agent at editing it, if it’s a context-heavy change.
How much back-and-forth are you OK with, or do you want? I've just joined an agent tooling startup (yesh... I wrote that, huh...), and it's something we talk a lot about internally. We're thinking it's fine to go back and forth, tell it frankly it's not doing it right, etc., but some stuff might be annoying? Do you have a sense of how this might work, to your mind? Thanks! :)
just wait til it starts virtue signalling and posting contrived anecdotes about b2b sales and the benefits of RTO on LinkedIn. the vibe-HR department is gonna have to rein things in fast!
I think these are coordinated posts by Microsoft execs. First their director of product, now this. It's like they're trying to calm the auto-coding hype until they catch up and thus keep OpenAI from running away.
Hopefully a CEO finally tempering some expectations and the recent Apple paper bring some sanity back into the discourse around these tools.[^1]
Are they cool and useful? Yes.
Do they reason? No. (Before you complain, please first define reason).
Are they end all be all of all problem solving and the dawn of AGI? Also no.
Once we actually bring some rationality back into the radius of discourse maybe we'll actually begin to start figuring out how these things actually fit into an engineering workflow and stop throwing around ridiculous terms like vibe coding.
If you are an engineer, you are signing up to build rigorously verified and validated systems, preferably with some amount of certainty about their behavior under boundary conditions. All the current hype-addled discussion around LLMs seems to have had everything but correctness as its focus.
[^1]: It shouldn't take a CEO, but many people, even technologists who should be more rational about whose opinions they deem worthy of consideration, seem to overvalue the opinions of the C-suite for some bizarre, inexplicable reason.
We've generated a lot of code with Claude Code recently... then we've had to go back and rationalize it... :) fun times... you absolutely must have a well-defined architecture established before using these tools.
It was only 2 years ago we were still talking about GPTs making up complete nonsense, and now hallucinations are almost gone from the discussions. I assume it will get even better, but I also think there is an inherent plateau. Just like how machines solved mass manufacturing work, but we still have factory workers and overseers. Also, "manually" hand-crafted pieces like fashion and watches continue to be the most expensive luxury goods. So I don't believe good design architects and consulting will ever be fully replaced.
Hallucinations are now plausibly wrong which is in some ways harder to deal with. GPT4.1 still generates Rust with imaginary crates and says “your tests passed, we can now move on” to a completely failed test run.
It would be nice if he would back that up by increasing focus on quality of life issues on GitHub. GitHub's feature set seems to get very little attention unless it overlaps with Copilot.
For me, too. On the poorly-documented stuff that I get into, where I really need help, it flails.
I wanted some jq script and it made an incomprehensible thing that usually worked. And every fix it made broke something else. It just kept digging a deeper hole.
After a while of that, I bit the bullet and learned jq script. I wrote a solution that was more capable, easier to read, and, importantly, always worked.
The thing is, jq is completely documented. Everything I needed was online. But there just aren't many examples out there for LLMs, so they choke.
Start asking it questions about complexity classes and you can get it to contradict itself in no time.
Most of the code in the world right now is very formulaic, and it's great at that. Sorry, WordPress devs.
… because programming languages are the right level of precision for specifying a program you want. Natural language isn’t it. Of course you need to review and edit what it generates. Of course it’s often easier to make the change yourself instead of describing how to make the change.
I wonder if the independent studies that show Copilot increasing the rate of errors in software have anything to do with this less bold attitude. Most people selling AI are predicting the obsolescence of human authors.
Transformers can be used to automate testing, create deeper and broader specification, accelerate greenfield projects, rapidly and precisely expand a developer's knowledge as needed, navigate unfamiliar APIs without relying on reference, build out initial features, do code review and so much more.
Even if code is the right medium for specifying a program, transformers act as an automated interface between that medium and natural language. Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
> Most people selling AI are predicting the obsolescence of human authors.
It's entirely possible that we do become obsolete for a wide variety of programming domains. That's simply a reality, just as weavers saw massive layoffs in the wake of the automated loom, or scribes lost work after the printing press, or human calculators became pointless after high-precision calculators became commonplace.
This replacement might not happen tomorrow, or next year, or even in the next decade, but it's clear that we are able to build capable models. What remains to be done is R&D around things like hallucinations, accuracy, affordability, etc. as well as tooling and infrastructure built around this new paradigm. But the cat's out of the bag, and we are not returning to a paradigm that doesn't involve intelligent automation in our daily work; programming is literally about automating things and transformers are a massive forward step.
That doesn't really mean anything, though; You can still be as involved in your programming work as you'd like. Whether you can find paid, professional work depends on your domain, skill level and compensation preferences. But you can always program for fun or personal projects, and decide how much or how little automation you use. But I will recommend that you take these tools seriously, and that you aren't too dismissive, or you could find yourself left behind in a rapidly evolving landscape, similarly to the advent of personal computing and the internet.
> Modern high-end transformers have no problem producing code, while benefiting from a wealth of knowledge that far surpasses any individual.
It will also still happily turn your whole codebase into garbage rather than undo the first thing it tried to try something else. I've yet to see one that can back itself out of a logical corner.
This is it for me. If you ask these models to write something new, the result can be okay.
But the second you start iterating with them... the codebase goes to shit, because they never delete code. Never. They always bolt new shit on to solve any problem, even when there's an incredibly obvious path to achieve the same thing in a much more maintainable way with what already exists.
Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them. Until then, I remain a hater, because they only seem capable of the opposite :)
You'd be surprised what a combination of structured review passes and agent rules (even simple ones such as "please consider whether old code can be phased out") might do to your agentic workflow.
> Show me a language model that can turn rube goldberg code into good readable code, and I'll suddenly become very interested in them.
They can already do this. If you have any specific code examples in mind, I can experiment for you and return my conclusions if it means you'll earnestly try out a modern agentic workflow.
That's a combination of current context limitations and a lack of quality tooling and prompting.
A well-designed agent can absolutely roll back code if given proper context and access to tooling such as git. Even flushing context/message history becomes viable for agents if the functionality is exposed to them.
> It will also still happily turn your whole codebase into garbage rather than undo the first thing it tried to try something else.
That's not true at all.
...
It's only pretending to be happy.
I don't disagree exactly, but the AI that fully replaces all the programmers is essentially a superhuman one. It's matching human output, but will obviously be able to do some tasks like calculations much faster, and won't need a lunch break.
At that point it's less "programmers will be out of work" as "most work may cease to exist".
> That's simply a reality, just as weavers saw massive layoffs in the wake of the automated loom, or scribes lost work after the printing press, or human calculators became pointless after high-precision calculators became commonplace.
See, this is the kind of conception of a programmer I find completely befuddling. Programming isn't like those jobs at all. There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys." Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
> Programming isn't like those jobs at all
I'm sure you understand the analogy was about automation and reduction in workforce, and that each of these professions have both commonalities and differences. You should assume good faith and interpret comments on Hacker News in the best reasonable light.
> There's a reason people who are overly attached to code and see their job as "writing code" are pejoratively called "code monkeys."
Strange. My experience is that "code monkeys" don't give a crap about the quality of their code or its impact with regards to the product, which is why they remain programmers and don't move on to roles which incorporate management or product responsibilities. Actually, the people who are "overly attached to code" tend to be computer scientists who are deeply interested in computation and its expression.
> Did CAD kill the engineer? No. It didn't. The idea is ridiculous.
Of course not. It led to a reduction in draftsmen, as now draftsmen can work more quickly and engineers can take on work that used to be done by draftsmen. The US Bureau of Labor Statistics states[0]:
Similarly, the other professions I mentioned were absorbed into higher-level professions. It has been stated many times that the future focus of software engineers will be less about programming and more about product design and management.I saw this a decade ago at the start of my professional career and from the start have been product and design focused, using code as a tool to get things done. That is not to say that I don't care deeply about computer science, I find coding and product development to each be incredibly creatively rewarding, and I find that a comprehensive understanding of each unlocks an entirely new way to see and act on the world.
[0] https://www.bls.gov/ooh/architecture-and-engineering/drafter...
My father in law was a draftsman. Lost his job when the defense industry contracted in the '90s. When he was looking for a new job everything required CAD which he had no experience in (he also had a learning disability, it made learning CAD hard).
He couldn't land a job that paid more than minimum wage after that.
Wow, that's a sad story. It really sucks to spend your life mastering a craft and suddenly find it obsolete and your best years behind you. My heart goes out to your father.
This is a phenomenon that seems to be experienced more and more frequently as the industrial revolution continues... The craft of drafting goes back to 2000 B.C.[0] and while techniques and requirements gradually changed over thousands of years, the digital revolution suddenly changed a ton of things all at once in drafting and many other crafts. This created a literacy gap many never recovered from.
I wonder if we'll see a similar split here with engineers and developers regarding agentic and LLM literacy.
[0] https://azon.com/2023/02/16/rare-historical-photos/
Doesn't AI have diminishing returns on it's pseudo creativity? Throw all the training output of LLM into a circle. If all input comes from other LLM output, the circle never grows. Humans constantly step outside the circle.
Perhaps LLM can be modified to step outside the circle, but as of today, it would be akin to monkeys typing.
I think you're either imagining the circle too small or overestimating how often humans step outside it. The typical programming job involves lots and lots of work, and yet none of it creating wholly original computer science. Current LLMs can customize well known UI/IO/CRUD/REST patterns with little difficulty, and these make up the vast majority of commercial software development.
In my experience the best use of AI is to stay in the flow state when you get blocked by an API you don't understand or a feature you don't want to implement for whatever reason.
Right level for exactly specifying program behavior in a global domain without context.
But once you add repo context, domain knowledge etc... programming languages are far too verbose.
AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.
Here's a fresh example that I stumbled upon just a few hours ago. I needed to refactor some code that first computes the size of a popup, and then separately, the top left corner.
For brevity, one part used an "if", while the other one had a "switch":
I wanted the LLM to refactor it to store the position rather than applying it immediately. Turns out, it just could not handle different things (if vs. switch) doing a similar thing. I tried several variations of prompts, but it very strongly leaning to either have two ifs, or two switches, despite rather explicit instructions not to do so.It sort of makes sense: once the model has "completed" an if, and then encounters the need for a similar thing, it will pick an "if" again, because, well, it is completing the previous tokens.
Harmless here, but in many slightly less trivial examples, it would just steamroll over nuance and produce code that appears good, but fails in weird ways.
That said, splitting tasks into smaller parts devoid of such ambiguities works really well. Way easier to say "store size in m_StateStorage and apply on render" than manually editing 5 different points in the code. Especially with stuff like Cerebras, that can chew through complex code at several kilobytes per second, expanding simple thoughts faster than you could physically type them.
Yeah that's one model that you happen to be using in June 2025.
Give it to o3 and it could definitely handle that today.
Sweeping generalizations about how LLMs will never be able to do X, Y, or Z coding task will all be proven wrong with time, imo.
Sweeping generalizations about how LLMs will always (someday) be able to do arbitrary X, Y, and Z don't really capture me either
In response to your sweeping generalization, I posit a sweeping generalization of my own, said the bard:
Whatever can be statistically predicted
by the human brain
Will one day also be
statistically predicted by melted sand
Until the day that thermodynamics kicks in.
Or the current strategies to scale across boards instead of chips gets too expensive in terms of cost, capital, and externalities.
I mean fair enough, I probably don't know as much about hardware and physics as you
Just pointing out that there are limits and there’s no reason to believe that models will improve indefinitely at the rates we’ve seen these last couple of years.
There is reason to believe that humans will keep trying to push the limitations of computation and computer science, and that recent advancements will greatly accelerate our ability to research and develop new paradigms.
Look at how well Deepseek performed with the limited, outdated hardware available to its researchers. And look at what demoscene practitioners have accomplished on much older hardware. Even if physical breakthroughs ceased or slowed down considerably, there is still a ton left on the table in terms of software optimization and theory advancement.
And remember just how young computer science is as a field, compared to other human practices that have been around for hundreds of thousands of years. We have so much to figure out, and as knowledge begets more knowledge, we will continue to figure out more things at an increasing pace, even if it requires increasingly large amounts of energy and human capital to make a discovery.
I am confident that if it is at all possible to reach human-level intelligence at least in specific categories of tasks, we're gonna figure it out. The only real question is whether access to energy and resources becomes a bigger problem in the future, given humanity's currently extraordinarily unsustainable path and the risk of nuclear conflict or sustained supply chain disruption.
I am working on a GUI for delegating coding tasks to LLMs, so I routinely experiment with a bunch of models doing all kinds of things. In this case, Claude Sonnet 3.7 handled it just fine, while Llama-3.3-70B just couldn't get it. But that is literally the simplest example that illustrates the problem.
When I tried giving top-notch LLMs harder tasks (scan an abstract syntax tree coming from a parser in a particular way, and generate nodes for particular things) they completely blew it. Didn't even compile, let alone logical errors and missed points. But once I broke down the problem to making lists of relevant parsing contexts, and generating one wrapper class at a time, it saved me a whole ton of work. It took me a day to accomplish what would normally take a week.
Maybe they will figure it out eventually, maybe not. The point is, right now the technology has fundamental limitations, and you are better off knowing how to work around them, rather than blindly trusting the black box.
Yeah exactly.
I think it's a combination of
1) wrong level of granularity in prompting
2) lack of engineering experience
3) autistic rigidity regarding a single hallucination throwing the whole experience off
4) subconscious anxiety over the threat to their jerbs
5) unnecessary guilt over going against the tide; anything pro AI gets heavily downvoted on Reddit and is, at best, controversial as hell here
I, for one, have shipped like literally a product per day for the last month and it's amazing. Literally 2,000,000+ impressions, paying users, almost 100 sign ups across the various products. I am fucking flying. Hit the front page of Reddit and HN countless times in the last month.
Idk if I break down the prompts better or what. But this is production grade shit and I don't even remember the last time I wrote more than two consecutive lines of code.
If you are launching one product per day, you are using LLMs to convert unrefined ideas into proof-of-concept prototypes. That works really well, that's the kind of work that nobody should be doing by hand anymore.
Except, not all work is like that. Fast-forward to product version 2.34 where a particular customer needs a change that could break 5000 other customers because of non-trivial dependencies between different parts of the design, and you will be rewriting the entire thing by humans or having it collapse under its own weight.
But out of 100 products launched on the market, only 1 or 2 will ever reach that stage, and having 100 LLM prototypes followed by 2 thoughtful redesigns is way better than seeing 98 human-made products die.
If you need a model per task, we're very far from AGI.
The interesting questions happen when you define X, Y and Z and time. For example, will llms be able to solve the P=NP problem in two weeks, 6 months, 5 years, a century? And then exploring why or why not
> AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.
AI stands for Artificial Intelligence. There are no inherent limits around what AI can and can't do or comprehend. What you are specifically critiquing is the capability of today's popular models, specifically transformer models, and accompanying tooling. This is a rapidly evolving landscape, and your assertions might no longer be relevant in a month, much less a year or five years. In fact, your criticism might not even be relevant between current models. It's one thing to speak about idiosyncrasies between models, but any broad conclusions drawn outside of a comprehensive multi-model review with strict procedure and controls is to be taken with a massive grain of salt, and one should be careful to avoid authoritative language about capabilities.
It would be useful to be precise in what you are critiquing, so that the critique actually has merit and applicability. Even saying "LLM" is a misnomer, as modern transformer models are multi-modal and trained on much more than just textual language.
What a ridiculous response, to scold the GP for criticising today's AI because tomorrow's might be better. Sure, it might! But it ain't here yet buddy.
Lots of us are interested in technology that's actually available, and we can all read date stamps on comments.
You're projecting that I am scolding OP, but I'm not. My language was neutral and precise. I presented no judgment, but gave OP the tools to better clarify their argument and express valid, actionable criticism instead of wholesale criticizing "AI" in a manner so imprecise as to reduce the relevance and effectiveness of their argument.
> But it ain't here yet buddy . . . we can all read date stamps on comments.
That has no bearing on the general trajectory that we are currently on in computer science and informatics. Additionally, your language is patronizing and dismissive, trading substance for insult. This is generally frowned upon in this community.
You failed to actually address my comment, both by failing to recognize that it was mainly about using the correct terminology instead of criticizing an entire branch of research that extends far beyond transformers or LLMs, and by failing to establish why a rapidly evolving landscape does not mean that certain generalizations cannot yet be made, unless they are presented with several constraints and caveats, which includes not making temporally-invariant claims about capabilities.
I would ask that you reconsider your approach to discourse here, so that we can avoid this thread degenerating into an emotional argument.
The GP was very precise in the experience they shared, and I thought it was interesting.
They were obviously not trying to make a sweeping comment about the entire future of the field.
Are you using ChatGPT to write your loquacious replies?
> They were obviously not trying to make a sweeping comment about the entire future of the field
OP said “AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.”
I'm not going to patronize you by explaining why this is not "very precise", or why its lack of temporal caveats is an issue, as I've already done so in an earlier comment. If you're still confused, you should read the sentence a few times until you understand. OP did not even mention which specific model they tested, and did not provide any specific prompt example.
> Are you using ChatGPT to write your loquacious replies?
If you can't handle a few short paragraphs as a reply, or find it unworthy of your time, you are free to stop arguing. The Hacker News guidelines actually encourage substantive responses.
I also assume that in the future, accusing a user of using ChatGPT will be against site guidelines, so you may as well start phasing that out of your repertoire now.
Here are some highlights from the Hacker News guidelines regarding comments:
- Don't be snarky
- Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.
- Assume good faith
- Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like. It degrades discussion and is usually mistaken.
https://news.ycombinator.com/newsguidelines.html
I had a fun result the other day from Claude. I opened a script in Zed and asked it to "fix the error on line 71". Claude happily went and fixed the error on line 91....
1. There was no error on line 91; it just did some inconsequential formatting on that line.
2. More importantly, it just ignored the very specific line I told it to go to. It's like I was playing telephone with the LLM, which felt so strange with text-based communication.
This was me trying to get better at using the LLM while coding and seeing if I could "one-shot" some very simple things. Of course me doing this _very_ tiny fix myself would have been faster. Just felt weird and reinforces this idea that the LLM isn't actually thinking at all.
LLMs probably have bad awareness of line numbers
I suspect if OP highlighted line 71 and added it to chat and said fix the error, they’d get a much better response. I assume Cursor could create a tool to help it interpret line numbers, but that’s not how they expect you to use it really.
How is this better from just using a formal language again?
Who said it's better? It's a design choice. Someone can easily write an agent that takes instructions in any language you like.
> This was me trying to get better at using the LLM while coding
And now you've learned that LLMs can't count lines. Next time, try asking it to "fix the error in function XYZ" or copy/paste the line in question, and see if you get better results.
> reinforces this idea that the LLM isn't actually thinking at all.
Of course it's not thinking, how could it? It's just a (rather big) equation.
As shared by Simon in https://news.ycombinator.com/item?id=44176523, a better agent will prepend the line numbers as a workaround, e.g. Claude Code:
That’s not what he’s saying there. There’s a separate tool that adds line numbers before feeding the prompt into the LLM. It’s not the LLM doing it itself.
The separate tool is called the agent.
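For what it's worth, here is a minimal sketch of what such an agent-side helper might look like (hypothetical code, not Claude Code's actual tool): the agent numbers each line before handing the file to the model, so a reference like "line 71" in the prompt maps onto text the model can actually see.

    # Hypothetical agent-side helper: prepend line numbers so the model can
    # resolve references like "fix the error on line 71".
    def read_with_line_numbers(path: str) -> str:
        with open(path, encoding="utf-8") as f:
            lines = f.readlines()
        width = len(str(len(lines)))
        return "".join(f"{i:>{width}}| {line}" for i, line in enumerate(lines, start=1))

    # The numbered text is then placed into the prompt next to the user's request,
    # e.g. "Fix the error on line 71.\n\n" + read_with_line_numbers("script.py")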
Sounds like operator error to me.
You need to give LLMs context. Line number isn't good context.
> Line number isn't good context.
a line number is plenty of context - it's directly translatable into a range of bytes/characters in the file
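To illustrate the translatability point, a quick sketch (illustrative only, not part of any particular tool) that maps a 1-based line number to its character-offset range within a file:

    # Illustrative: convert a 1-based line number into (start, end) character
    # offsets within the file, i.e. the "range of bytes/characters" mentioned above.
    def line_to_char_range(path: str, lineno: int) -> tuple[int, int]:
        offset = 0
        with open(path, encoding="utf-8") as f:
            for i, line in enumerate(f, start=1):
                if i == lineno:
                    return offset, offset + len(line)
                offset += len(line)
        raise ValueError(f"file has fewer than {lineno} lines")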
It's a tool. It's not a human. A line number works great for humans. Today, they're terrible for LLMs.
I can choose to use a screwdriver to hammer in a nail and complain about how useless screwdrivers are. Or I can realize when and how to use it.
We (including marketing & execs) have made a huge mistake in anthropomorphizing these things, because then we stop treating them like tools with specific use cases, meant to be used in certain ways, and start treating them like smart humans that don't need any of that.
Maybe one day they'll be there, but today they are screwdrivers. That doesn't make them useless.
AI is still not there yet. For example, I constantly have to deal with mis-referenced documents and specifications. I think part of the problem is that it was trained on outdated materials and is unable to actively update on the fly. I can get around this problem but…
The problem with today’s LLM and GI solutions is that they try to solve all n steps when solving the first i steps would be infinitely more useful for human consumption.
Personally, I define the job of a software engineer as transforming requirements into software. Software is not only code. Requirements are not only natural language. At the moment I can't manage to be faster with AI than working manually, unless it's a simple task or simple software. In my experience, today's AIs are at the level of junior or mid-level developers. And in the last two years, they haven't gotten significantly better.
Most of the time, the requirements are not spelled out. Nobody even knows what the business logic is supposed to be. A lot of it has to be decided by the software engineer based on available information. It sometimes involves walking around the office asking people things.
It also requires a fair bit of wisdom to know where the software is expected to grow, and how to architect for that eventuality.
I can't picture an LLM doing a fraction of that work.
I think that's my problem with AI. Let's say I have all the requirements, down to the smallest detail. Then I make my decisions at a micro level, formulate an architecture, take all the non-functionals into account. I would have to write a book as a prompt, and it still couldn't express my thoughts as accurately as writing the code right away. Apart from that, the prompt is generally a superfluous intermediate step in which I struggle to produce exact code from imprecise natural language, with a result that isn't reproducible.
AI is less useful when I'm picking through specific, careful business logic.
But we still have lots of boilerplate code that's required, and it's really good at that. If I need a reasonably good-looking front-end UI, Claude will put that together quickly for me. Or if I need it to use a common algorithm against some data, it's good at applying that.
What I like about AI is that it lets me slide along the automation-specificity spectrum as I go through my work. Sometimes it feels like I've opened up a big stretch of open highway (some standard issue UI work with structures already well-defined) and I can "goose" it. Sometimes there's some tricky logic, and I just know the AI won't do well; it takes just as long to explain the logic as to write it in code. So I hit the brakes, slow down, navigate those tricky parts.
Previously, in the prior attempt to remove the need for engineers, no-code solutions instead trapped people. They pretended like everything is open highway. It's not, which is why they fail. But here we can apply our own judgment and skill in deciding when a lot of "common-pattern" code is required versus the times where specificity and business context is required.
That's also why these LLMs get over-hyped: people over-hype them for the same reason they over-hyped no-code; they don't understand the job to be done, and because of that, they actually don't understand why this is much better than the fanciful capabilities they pretend it has ("it's like an employee and automates everything!"). The tools are much more powerful when an engineer is in the driver's seat, exercising technical judgment on where and how much to use it for every given problem to be solved.
This is a really good point.
At a certain point, huge prompts (which you can readily find on GitHub, used by large LLM-based corps) are just... files of code? Why wouldn't I just write the code myself at that point?
Almost makes me think this "AI coding takeover" is strictly pushed by non-technical people.
One of the most useful properties of computers is that they enable reliable, eminently reproducible automation. Formal languages (like programming languages) not only allow us to unambiguously specify the desired automation to the utmost level of precision, they also allow humans to reason about the automation with precision and confidence. Natural language is a poor substitute for that. The ground truth of programs will always be the code, and if humans want to precisely control what a program does, they'll be best served by understanding, manipulating, and reasoning about the code.
“Manual” has a negative connotation. If I understand the article correctly, they mean “human coding remains key”. It’s not clear to me the GitHub CEO actually used the word “manual”, that would surprise me. Is there another source on this that’s either more neutral or better at choosing accurate words? The last thing we need is to put down human coding as “manual”; human coders have a large toolbox of non-AI tools to automate their coding.
(Wow I sound triggered! sigh)
It's almost as bad as "manual" thinking!
What is the distinction between manual coding and human coding?
Analog coding
Acoustic coding
> Wow I sound triggered! sigh
this is okay! it's a sign of your humanity :)
How about “organic coding”? ;)
> “Manual” has a negative connotation. If I understand the article correctly, they mean “human coding remains key”.
A man is a human.
Humanual coding? ;)
“Manual” comes from Latin manus, meaning “hand”: https://en.wiktionary.org/wiki/manus. It literally means “by hand”.
I think so many devs fail to realise that, to your product manager or team lead, the interface between you and the LLM is basically the same. They write a ticket/prompt and get back a bunch of code that undoubtedly has bugs and misinterpretations in it, and it will probably go through a few rounds of back and forth until it's good enough to ship (i.e. they tested it black-box style and it worked once). Then they can move on to the next thing until whatever this ticket was about rears its ugly head again at some point in the future. If you aren't used to writing user stories and planning, you're really gonna be obsolete soon.
My guess is they will settle on 2x the productivity of a pre-AI developer as the new skill floor, but never take a look at how long meetings and other processes take.
Why not look at Bob, who takes two weeks to write tickets on what they actually want in a feature? Or Alice, who's really slow getting Figma designs done and validated? How nice would it be to have a "someone's bothered a developer" metric, with the business seeking to drive it to zero and talking very loudly about it, the way they have about developers?
It's interesting to see a CEO express thoughts on AI and coding go in a slightly different direction.
Usually the CEO or investor says 30% (or some other made up number) of all code is written by AI and the number will only increase, implying that developers will soon be obsolete.
It's implied that 30% of all code submitted and shipped to production is from AI agents with zero human interaction. But of course this is not the case; it's the same developers as before, using tools to write code more rapidly.
And writing code is only one part of a developer's job in building software.
He’s probably more right than not. But he also has a vested interest in this (just like the other CEOs who say the opposite), being in the business of human-mediated code.
Presumably you're aware that the full name of Microsoft's Copilot AI code authoring tool is "GitHub Copilot", that GitHub developed it, and that he runs GitHub.
Yea, which is why I was surprised too when he said this.
Copilot. Not Pilot.
Well, I suspect GitHub's income is a function of the number of developers using it so it is not surprising that he takes this position.
Imo this is a misunderstanding of what AI companies want AI tools to be and where the industry is heading in the near future. The endgame for many companies is SWE automation, not augmentation.
To expand -
1. Models "reason" and can increasingly generate code given natural language. It's not just fancy autocomplete; it's like having an intern-to-mid-level engineer at your beck and call to implement some feature. Natural language is generally sufficient when I interact with other engineers, so why is it not sufficient for an AI, which (in the limit) approaches an actual human engineer?
2. Business-wise, companies will not settle for augmentation. Software companies pay tons of money in headcount; it's probably most mid-sized companies' top or second line item. The endgame for leadership at these companies is to do more with less. This necessitates automation (in addition to augmenting the remaining roles).
People need to stop thinking of LLMs as "autocomplete on steroids" and actually start thinking of them as a "24/7 junior SWE who doesn't need to eat or sleep and can do small tasks at 90% accuracy with some reasonable spec." Yeah you'll need to edit their code once in a while but they also get better and cost less than an actual person.
That is the ideal for the management types and the AI labs themselves yes. Copilots are just a way to test the market, and gain data and adoption early. I don't think it is much of a secret anymore. We even see benchmarks created (e.g. OpenAI recently did one) that are solely about taking paid work away from programmers and how many "paid tasks" it can actually do. They have a benchmark - that's their target.
As a standard pleb, though, I can understand why this world, where the people with connections, social standing, and capital have an advantage, isn't appealing to many on this forum. If anyone can do something, then other advantages that aren't as easy to acquire matter relatively more.
Folks who believe this are going to lose a lot of money fixing broken software and shipping products that piss off their users.
Good for outsourcing companies. India used Y2K to build its IT sector and lift up its economy, hopefully Africa can do the same fixing AI slop.
Too late, I am seeing developer after developer doing copy/paste from AI tools and when asked, they have no idea how the code works coz "it just works"
Google itself said 30% of their code is AI-generated, and yet they recently had a worldwide outage. Coincidence??
You tell me.
Going by the content of the linked post, this is very much a misleading headline. There is nothing in the quotes that I would read as an endorsement of "manual coding", at least not in the sense that we have used the term "coding" for the past decades.
I think most programmers would agree that thinking represents the majority of our time. Writing code is no different than writing down your thoughts, and that process in itself can be immensely productive -- it can spark new ideas, grant epiphanies, or take you in an entirely new direction altogether. Writing is thinking.
I think an over-reliance, or perhaps any reliance, on AI tools will turn good programmers into slop factories, as they consistently skip over a vital part of creating high-quality software.
You could argue that the prompt == code, but then you are adding an intermediary step between you and the code, and something will always be lost in translation.
I'd say just write the code.
I think this misses the point. You're right that programmers still need to think. But you're wrong thinking that AI does not help with that.
With AI, instead of starting with zero and building up, you can start with a result and iterate on it straight away. This process really shines when you have a good idea of what you want to do, and how you want it implemented. In these cases, it is really easy to review the code, because you knew what you wanted it to look like. And so, it lets me implement some basic features in 15 minutes instead of an hour. This is awesome.
For more complex ideas, AI can also be a great idea sparring partner. Claude Code can take a paragraph or two from me, and then generate a 200-800 line planning document fleshing out all the details. That document: 1) helps me to quickly spot roadblocks using my own knowledge, and 2) helps me iterate quickly in the design space. This lets me spend more time thinking about the design of the system. And Claude 4 Opus is near-perfect at taking one of these big planning specifications and implementing it, because the feature is so well specified.
So, the reality is that AI opens up new possible workflows. They aren't always appropriate. Sometimes the process of writing the code yourself and iterating on it is important to helping you build your mental model of a piece of functionality. But a lot of the time, there's no mystery in what I want to write. And in these cases, AI is brilliant at speeding up design and implementation.
Based on your workflow, I think there is considerable risk of you being wooed by AI into believing what you are doing is worthwhile. The plan AI offers is coherent, specific, it sounds good. It's validation. Sugar.
That is a very weak excuse to avoid these tools.
I know the tools and environments I am working in. I verify the implementations I make by testing them. I review everything I am generating.
The idea that AI is going to trick me is absurd. I'm a professional, not some vibe coding script kiddie. I can recognise when the AI makes mistakes.
Have the humility to see that not everyone using AI is someone who doesn't know what they are doing and just clicks accept on every idea from the AI. That's not how this works.
More complaining & pessimism means better signal for teams building the AI coding tools! Keep it up! The ceiling for AI is not even close to being reached. We have to be practical about what's reasonable, but getting 90% of the way there in a few prompts is magic.
> In an appearance on “The MAD Podcast with Matt Turck,” Dohmke said that
> Source: The Times of India
what in the recycled content is this trash?
"He warned that depending solely on automated agents could lead to inefficiencies. For instance, spending too much time explaining simple changes in natural language instead of editing the code directly."
There are lots of changes where describing them in English takes longer than just making the change. I think the most effective workflow with AI agents will be a sort of active back-and-forth.
Yeah, I can’t count the number of times I’ve thought about a change, explained it in natural language, pressed enter, then realized I’ve already arrived at the exact change I need to apply just by thinking it through. Oftentimes I even beat the agent at editing it, if it’s a context-heavy change.
Rubber duck. I’ve kept one on my desk for over a decade. It was also like a dollar, which is more than I’ve spent on LLMs. :)
https://en.m.wikipedia.org/wiki/Rubber_duck_debugging
How active a back-and-forth are you OK with, or do you actually want? I've just joined an agent tooling startup (yeesh... I wrote that, huh...), and it's something we talk a lot about internally. We think it's fine to go back and forth, tell it frankly that it's not doing it right, etc., but some of that might get annoying? Do you have a sense of how this should work, to your mind? Thanks! :)
CEOs are possibly the last person you should listen to on any given subject.
I'm just vibe-CEOing my new company. It's amazing how productive it is!
just wait til it starts virtue signalling and posting contrived anecdotes about b2b sales and the benefits of RTO on LinkedIn. the vibe-HR department is gonna have to rein things in fast!
So do we ignore the Github CEO or all the other ones?
Code monkeys who don't understand the limits of LLMs and can't solve the problems where the LLM fails are not needed in the world of tomorrow.
Why wouldn't your boss ask ChatGPT directly?
This seems to be an AI summary of a (not linked) podcast.
I think these are coordinated posts by Microsoft execs. First their director of product, now this. It's like they're trying to calm the auto-coding hype until they catch up, and thus keep OpenAI from running away.
Microsoft is the largest shareholder in OpenAI.
Amazingly, so does air and water. What AI salesman could have predicted this?
Hopefully a CEO finally tempering some expectations and the recent Apple paper bring some sanity back into the discourse around these tools.[^1]
Are they cool and useful? Yes.
Do they reason? No. (Before you complain, please first define reason).
Are they end all be all of all problem solving and the dawn of AGI? Also no.
Once we actually bring some rationality back into the discourse, maybe we'll start figuring out how these things actually fit into an engineering workflow and stop throwing around ridiculous terms like vibe coding.
If you are an engineer, you are signing up to build rigorously verified and validated systems, preferably with some amount of certainty about their behavior under boundary conditions. All the current hype-addled discussion around LLMs seems to have had everything but correctness as its focus.
[^1]: It shouldn't take a CEO, but many people, even technologists who should be more rational about whose opinions they deem worthy of consideration, seem to overvalue the opinions of the C-suite for some bizarre, inexplicable reason.
Not gonna lie, first time I've heard of manual coding.
I get a 403 forbidden error when trying to view the page. Anyone else get that?
We've generated a lot of code with Claude Code recently... then we've had to go back and rationalize it... :) fun times... you absolutely must have a well-defined architecture established before using these tools.
It was only two years ago that we were still talking about GPTs making up complete nonsense, and now hallucinations are almost gone from the discussions. I assume it will get even better, but I also think there is an inherent plateau. Just like how machines solved mass-manufacturing work, but we still have factory workers and overseers. Also, "manually" hand-crafted pieces like fashion and watches continue to be among the most expensive luxury goods. So I don't believe good design, architecture, and consulting will ever be fully replaced.
Hallucinations are now plausible-looking rather than obvious, which is in some ways harder to deal with. GPT-4.1 still generates Rust with imaginary crates and says "your tests passed, we can now move on" after a completely failed test run.
It would be nice if he would back that up by increasing focus on quality of life issues on GitHub. GitHub's feature set seems to get very little attention unless it overlaps with Copilot.
Water’s wet, fire’s hot. News at 5.
That's him out the Illuminati then.
I wonder how much coding he does and how does he know which code is human written and which by machine.
No fucking shit Sherlock.
AI is good for boilerplate, suggestions, nothing more.
For you, perhaps.
For me, too. On the poorly-documented stuff that I get into, where I really need help, it flails.
I wanted some jq script and it made an incomprehensible thing that usually worked. And every fix it made broke something else. It just kept digging a deeper hole.
After a while of that, I bit the bullet and learned jq script. I wrote a solution that was more capable, easier to read, and, importantly, always worked.
The thing is, jq is completely documented. Everything I needed was online. But there just aren't many examples out there for LLMs, so they choke.
Start asking it questions about complexity classes and you can get it to contradict itself in no time.
Most of the code in the world right now is very formulaic, and it's great at that. Sorry, WordPress devs.
It's a powerful tool, but it's not that powerful.