Code co-pilots work quite well… for tasks where you would consider hiring a mob of interns. Treat them like experienced developers, and you will be disappointed in their output. Want to use them to write a ton of unit tests? Sure. Should they architect a large system? Sorry to say, that’s not a job for an intern. The “co” part of “co-pilot” is key. Just as you wouldn’t set an intern loose without supervision, co-pilots need all of their work checked with a particularly critical eye. Co-pilots provide lots of value, especially when they write the code and you then edit their output. This saves time, particularly when the code is largely formulaic and tedious to write.
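To make that concrete, here is a hypothetical sketch of the kind of formulaic, tedious code a co-pilot drafts well: a small parametrized test suite. The slugify function and its cases are illustrative assumptions, and the expected values are exactly the part you would check with that critical eye.

```python
# Illustrative sketch: formulaic test code a co-pilot can draft quickly.
# The function under test (slugify) and its cases are assumptions for
# this example; a reviewer's job is to verify every expected value.
import pytest

def slugify(title: str) -> str:
    """Lowercase a title and join its words with hyphens."""
    return "-".join(title.lower().split())

@pytest.mark.parametrize(
    "title, expected",
    [
        ("Hello World", "hello-world"),
        ("  Leading and trailing  ", "leading-and-trailing"),
        ("Already-slugged", "already-slugged"),
        ("", ""),
    ],
)
def test_slugify(title, expected):
    assert slugify(title) == expected
```

Drafting the table of cases is the tedious part; verifying each expected value is the part that stays with you.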
Again, interns (and code co-pilots) can be absolutely fabulous, but they need to be set up for success. I give similar advice for working with both.
Of course, you also want to ensure the tasks you give them, whether human or model, fit their capabilities. People often hand the LLM statistics and ask it to summarize them in a way that just requires repeating information from the input. Usually this is kicking the tires, checking whether it can handle basic tasks for which you already know the answer. But there are better tools for that, and it’s a waste of time to use the wrong ones. If you want to find the functions defined in a piece of code, use grep or a parser. If you then want to understand what those functions do, an intern or an LLM is a fine choice. This seems like a statement of the obvious, but people will often use the LLM for everything and then be surprised when it misses something or makes something up. And please, please don’t pass the LLM a list of functions and ask it to spit the list back out. It can only mess it up.
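As a concrete example, here is a minimal sketch of the deterministic route, using Python’s ast module rather than an LLM to enumerate the functions defined in a file. The sample source is an illustrative assumption.

```python
import ast

def list_functions(source: str) -> list[str]:
    """Return the names of all functions defined in Python source code."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

# Illustrative sample source; in practice you would read a real file.
sample = """
def load_data(path):
    pass

async def fetch(url):
    pass
"""

print(list_functions(sample))  # -> ['load_data', 'fetch']
```

The point is that a parser’s output is exact, so there is nothing for a model to garble; hand the LLM this list if you want, but never ask it to produce the list itself.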
Questions need to be scaffolded: broken down into smaller pieces, both to ensure the question is understood and to achieve consistent results. For example, instead of asking “Describe the purpose of this code,” ask questions about the code’s structure, inputs, outputs, and so on. Also, keep distractions to a minimum, and give them examples of what a good answer looks like, including the output format you’d like.
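For instance, here is a hypothetical sketch of what that scaffolding might look like in code. The sub-questions, the JSON answer format, and the build_prompt helper are all illustrative assumptions, not any particular model’s API.

```python
# Illustrative sketch: scaffold one vague question into smaller,
# structured ones, and show the model the shape of a good answer.

SUB_QUESTIONS = [
    "What inputs does this code take (parameters, files, environment)?",
    "What outputs does it produce (return values, files, side effects)?",
    "What are its main data structures and code structures?",
    "Given the answers above, what is the purpose of this code? Two sentences.",
]

# An example answer keeps the output format consistent across runs.
EXAMPLE_FORMAT = """{
  "inputs": ["path: str"],
  "outputs": ["list of parsed records"],
  "structures": ["Record dataclass", "parse_line helper"],
  "purpose": "Parses a log file into structured records."
}"""

def build_prompt(code: str) -> str:
    """Assemble the sub-questions, the example format, and the code itself."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(SUB_QUESTIONS, 1))
    return (
        "Answer these questions about the code below.\n"
        f"{numbered}\n\n"
        f"Respond in this format:\n{EXAMPLE_FORMAT}\n\n"
        f"Code:\n{code}"
    )

print(build_prompt("def add(a, b):\n    return a + b"))
```

Asking the purpose question last, after the model has already committed to concrete answers about inputs and outputs, is what makes the final summary consistent from run to run.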
Even if you follow these rules, sometimes LLMs are just going to mess up. Ruthlessly checking the code, not just for correctness but also for adherence to best practices, is vital. That’s the price you pay for the productivity gains everywhere else. If, for a given task, fixing the mess-ups makes the whole co-pilot experience not worth it, accept your losses and move on. The key is to let it fail early on these tasks so you find out quickly what it is good at. And ensure it can’t mess things up too badly: if its code is in the critical path, have a policy of reviewing its work (this advice comes from someone who dropped a production table the first week of her internship).
Here at Flux, we understand this. While you might not want to put in the effort to break down the questions, we do.
But, also, like with interns, have patience. It’s still early days.
Rachel Lomasky is the Chief Data Scientist at Flux, where she continuously identifies and operationalizes AI so Flux users can understand their codebases. In addition to a PhD in Computer Science, Rachel brings 15+ years of professional experience augmenting generative AI with classic machine learning. She regularly organizes and speaks at AI conferences internationally; keep up with her on LinkedIn.