
Some thoughts on coding with AI

As you probably know, AI tools are here, and expectations range from “This is going to destroy the environment and make everyone unemployed” to “This is going to usher in a golden age freeing humanity from drudgery”. The reality is, of course, somewhere between the two, and I don’t think we’re really going to know exactly where for many years.

I believe strongly in learning and understanding the tools available to us so that we can make pragmatic choices about which tools to use when. So for quite a while I’ve made various attempts at using AI for coding. Most of my attempts have involved going to ChatGPT and asking it to write some code for me or answer questions about API specs. And it’s never been a great experience. Generally the code it generated wouldn’t work, and it would confidently lie about what the specs required. Clearly some people find this useful, but I’ve never quite worked out why.

Then something happened over the past few days that I found interesting. Last week one of the other engineers at Mozilla gave a demo to some of the other technical leads, showing OpenAI’s Codex doing some work in a project. It looked interesting. Most problems I deal with involve working across a large enough codebase that LLMs previously would not have had enough context to cope. I wanted to give it a try.

I have a side project that is a Rust-based DNS server. It has a problem with an edge case that I know about but have never figured out the best approach to solve. So I added Copilot to VS Code and asked its agent mode to solve it. It dutifully went away, thought about it, and then proposed a patch. I wasn’t quite sure about it, but I applied it and ran the automated tests. A bunch failed. “Hmm,” said Copilot, and adjusted its approach. The tests failed again. These weren’t new tests or tests for this specific edge case, but the tests for all the other cases that need to keep passing. I kept suggesting it try something else, but it never found a solution, eventually seeming to give up and stop responding. Another failure of AI to help me. I probably need to get better at prompting, but that feels like time I could better spend writing code myself.

That wasn’t the interesting thing though; that was the set-up. A couple of days later I started on something new. I needed a script to attempt to parse bank account data out of PDF statements, because banks are horrible and never provide decent machine-readable access to data. Some do for the last year, or if you’re lucky two, but I was trying to gather together at least five years of data. So I started working on a Python script to do it. And I had forgotten that I had enabled Copilot. Very quickly it started making suggestions for the next line of code I should write. And it was for the most part right. It felt magical. I already knew what code I wanted to write; Copilot was just helping me type it faster. I turned on next edit suggestions and found it making good suggestions. I would change the capture groups in a regular expression and it would suggest changing the group numbers I was extracting later in the code. It definitely made some wrong suggestions, but because I had already decided what I was planning to do I could spot very quickly whether a suggestion was right or wrong.
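
To make that concrete, here’s a toy sketch of the kind of parsing code I mean. The line format, pattern, and field names are made up for illustration, not taken from my actual script:

```python
import re

# A made-up statement line format: date, description, amount.
LINE_RE = re.compile(
    r"(\d{2}/\d{2}/\d{4})\s+"   # group 1: transaction date
    r"(.+?)\s+"                  # group 2: description
    r"(-?\d[\d,]*\.\d{2})$"      # group 3: amount
)

def parse_line(line):
    match = LINE_RE.match(line)
    if match is None:
        return None
    # If a capture group is added or removed in the pattern above,
    # these group numbers have to change to match -- exactly the
    # kind of follow-on edit Copilot kept suggesting for me.
    return {
        "date": match.group(1),
        "description": match.group(2),
        "amount": float(match.group(3).replace(",", "")),
    }

print(parse_line("03/01/2023  COFFEE SHOP LTD  -4.50"))
# {'date': '03/01/2023', 'description': 'COFFEE SHOP LTD', 'amount': -4.5}
```

The group numbers in `parse_line` are coupled to the pattern above it, so editing one implies a matching edit to the other, and that follow-on edit is the sort of thing Copilot kept predicting correctly.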

And I think that’s maybe the crucial distinction. Asking AI to implement something from scratch seems doomed to failure. AIs don’t reason, they predict, and unless one has seen your problem before, how likely is it to predict the correct solution? Autocomplete, on the other hand, is by its nature a prediction about what I am going to do next. That seems like a perfect fit for LLMs. This was a fairly small script I was working on, but it absolutely made me more productive. I would have got there without the AI; it just made it faster to get the code I knew I wanted into the file.

With small enough suggestions I am pretty confident that I can recognise whether a suggestion is right or not. My main complaint is that occasionally it would offer quite a long suggestion, which I would dismiss out of hand because I didn’t want to bother checking it over. It would be nice if there were a setting to limit the length of suggestions offered. Whether this continues to be useful in larger projects I don’t yet know, but I’m definitely going to leave it enabled and find out.