Code Co-Pilots Are Mobs of Interns
Rachel Lomasky · Chief Data Scientist
January 23, 2025
About this blog
  • Co-Pilots as Interns: AI code co-pilots are great for repetitive, formulaic tasks like unit tests but require supervision, clear instructions, and careful review—similar to managing interns.
  • Effective Guidance: To get the best results, tasks must be broken down, patterns provided, outputs checked for best practices, and feedback given to improve understanding.
  • Task Alignment: Co-pilots excel when used for appropriate tasks, but relying on them for everything can lead to mistakes and inefficiency.
  • Iterative Improvement: Success with co-pilots requires patience, scaffolding questions, and accepting early failures to identify their strengths for your needs.
    Code co-pilots work quite well… for tasks where you would consider hiring a mob of interns. Treat them like experienced developers, and you will be disappointed in their output. Want to use them to write a ton of unit tests? Sure. Should they architect a large system? Sorry to say, not a job for an intern. The “co” part of “co-pilot” is key. Just as you wouldn’t set an intern loose without supervision, co-pilots need all of their work checked with a particularly critical eye. Co-pilots provide a lot of value, especially in situations where they write the code and you then edit their output. This saves time, especially when the code is largely formulaic and tedious to write.

    Again, interns (and code co-pilots) can be absolutely fabulous, but they need to be set up for success. I give similar advice for working with both.

    1. Set clear expectations for the work, and break it down into small tasks.
    2. Check their work after each task to ensure they understood correctly. Give clear feedback on what they got right and where there was room for improvement. If things are wrong, redirect: rephrase the request, break the work down even smaller, and so on. It may be tempting to just fix it yourself rather than giving feedback, but it’s worth investing the time.
    3. Point out similar patterns to follow. This is often called “multi-shot learning” for code co-pilots, but it’s actually closer to “multi-shot comprehension.” The easiest way to write new code is to follow an existing pattern.
    4. If the desired output is at a higher level, e.g. a module or a repo, decompose the questions you ask. Make sure the component pieces, e.g. files and classes, are understood first, and then have them summarized. If you give an intern the whole repo, they’ll likely just be lost (as I explored in this previous blog post).
    5. Predefined test cases can help them understand what to write; do not have them write both the code and the tests that evaluate that code (good engineering practice in general, of course). This is especially useful if you include edge cases that a less experienced engineer might overlook (see the sketch after this list).
    6. Ask them to explain why they chose a certain approach. If the explanation doesn’t make much sense, there’s a good chance the code doesn’t either.
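
    To make item 5 concrete, here is a minimal sketch of what “tests first, implementation second” can look like in Python. The `durations` module and `parse_duration` function are hypothetical, used only for illustration; the point is that you write the tests, including the edge cases, and ask the co-pilot to write only the implementation that makes them pass.

```python
# Hypothetical test file the human writes *before* asking the co-pilot for code.
# The co-pilot's job is to implement durations.parse_duration until these pass.
import pytest

from durations import parse_duration  # hypothetical module the co-pilot will write


def test_basic_units():
    assert parse_duration("90s") == 90
    assert parse_duration("2m") == 120
    assert parse_duration("1h") == 3600


def test_edge_cases_an_intern_might_miss():
    # Edge cases a less experienced "intern" often overlooks.
    assert parse_duration("0s") == 0
    with pytest.raises(ValueError):
        parse_duration("")  # empty input
    with pytest.raises(ValueError):
        parse_duration("10 parsecs")  # unknown unit
```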

    Of course, you also want to ensure the tasks you give them, whether human or model, fit their capabilities. People often give the LLM statistics and ask it to summarize them in a way that just requires repeating information from the inputs. Often this is just kicking the tires, checking whether it can handle basic tasks for which you already know the answer. But there are better tools for that, and it’s a waste of time to use the wrong one. If you want to find the functions defined in a piece of code, use grep or a parser. If you then want to understand what those functions do, an intern or an LLM is a fine choice. This seems like a statement of the obvious, but people often use the LLM for everything and are then surprised when it misses something or makes something up. And, please, please don’t pass the LLM a list of functions and then ask it to spit the list back out. It can only mess it up.
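
    As a sketch of “the right tool for each job,” here is one way to split the work, assuming a Python codebase: a parser extracts the function list deterministically, and the LLM is only asked to explain what each function does. The file name `billing.py` is hypothetical, and the prompt-sending step is left abstract because it depends on which co-pilot or LLM client you use.

```python
# Use a parser (or grep) for the deterministic part; save the LLM for explanation.
import ast
from pathlib import Path


def list_functions(path: str) -> list[str]:
    """Return the names of the functions defined in a Python file."""
    tree = ast.parse(Path(path).read_text())
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


if __name__ == "__main__":
    # `billing.py` is a hypothetical file used only for illustration.
    # (grep -n "^def " billing.py gets you most of the way there, too.)
    functions = list_functions("billing.py")
    # Don't ask the LLM to repeat the list back; hand it the list and ask only
    # for the part it is actually good at: explanation.
    prompt = (
        "For each of these functions, explain in one sentence what it does:\n"
        + "\n".join(f"- {name}" for name in functions)
    )
    print(prompt)  # send this to your LLM or co-pilot client of choice
```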

    Questions need to be scaffolded: broken down into smaller pieces to ensure the question is understood and to get consistent results. For example, instead of asking “Describe the purpose of this code,” ask about the code’s structure, inputs, outputs, and so on. Also, keep distractions to a minimum, and give examples of what a good answer looks like, including the output format you’d like.
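
    One possible shape for that scaffolding, purely illustrative: a fixed sequence of narrow questions asked about each file, plus an example of the answer format you expect. The question wording and the JSON fields (`functions`, `dependencies`, `summary`) are assumptions for this sketch, not a prescribed schema.

```python
# Illustrative scaffolding: narrow questions asked in order, each building on
# the last, instead of one broad "describe this code" prompt.
SCAFFOLDED_QUESTIONS = [
    "List the classes and functions defined in this file.",
    "For each function, what are its inputs and outputs?",
    "Which external libraries or services does this file depend on?",
    "Given the answers above, summarize the purpose of this file in two sentences.",
]

# Showing the expected answer format keeps results consistent across files.
# The field names and example values here are made up for illustration.
EXAMPLE_OUTPUT_FORMAT = """\
{
  "functions": [
    {"name": "load_config", "inputs": ["path"], "outputs": ["Config"]}
  ],
  "dependencies": ["boto3"],
  "summary": "Loads and validates service configuration."
}
"""
```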

    Even if you follow these rules, sometimes LLMs are just going to mess up. Ruthlessly checking the code, not just for correctness but also for best practices, is vital. That’s the price you pay for the productivity gains elsewhere. If, for a given task, fixing the mess-ups makes the whole co-pilot experience not worth it, accept your losses and move on. The key is to let it fail early on those tasks, so you find out what it is actually good at. Also ensure that they can’t mess things up too badly: for example, if they are on the critical path, have a policy of reviewing their work (advice that comes from someone who dropped a production table during the first week of her internship).

    Here at Flux, we understand this. You might not want to put in the effort to break the questions down, but we do.

    But also, as with interns, have patience. It’s still early days.

    About Rachel

    Rachel Lomasky is the Chief Data Scientist at Flux, where she continuously identifies and operationalizes AI so Flux users can understand their codebases. She holds a PhD in Computer Science and applies her 15+ years of professional experience to augmenting generative AI with classic machine learning. She regularly organizes and speaks at AI conferences internationally; keep up with her on LinkedIn.

    About Flux
    Flux is more than a static analysis tool - it empowers engineering leaders to triage, interrogate, and understand their team's codebase. Connect with us to learn more about what Flux can do for you, and stay in Flux with our latest info, resources, and blog posts.