This is the start of a few posts about AI coding tools. This one sets the scene and gives a taste of what I found. In later posts I’ll show more of what I got from using them as is, explore what the sweet spots might be, and end with a harder look at getting value out of them.
A long time ago, in a company far, far away…
In the very early 90s I had the, well, honour is not quite the word I want, the experience maybe, to use a prototype of one of the first viable commercial speech-to-text and speech-to-control tools. I used it at work; I used it at home. It was a disaster in all ways. With the first prototype, the training for learning your voice was arduous, demanding, and literally painful. The second prototype would often pick up the sound of the computer’s hard disk and interpret it as text, to much hilarity for all.
Hidden in this experience was something we all do: without realising it, I imbued this software with life. By moving from communicating via keyboard to talking to the computer, I somehow gave the computer sentience. It was a person. I got cross with it, and I swore at it. The act of dictation was a challenge; it felt deliberately difficult. Asking it to open files or to print something was a nightmare. Why wouldn’t it understand?
Of course, nowadays, that disconnect in communication is a common experience – we all have frustrations with Alexa or Siri, with whatever voice-activated devices we own. How much do you, even for a moment, consider that these tools are aware and (often) appear to be trying to thwart your desires?
As technologists, our secret desire is often a world where we can live in our favourite sci-fi movies (hopefully without the dangerous robots, etc.). That frequently means we’re the first on the hype curve. We’re early adopters, and we crash and burn when the tech doesn’t live up to the promise that we secretly believed was being offered.
Is that the same with Generative AI?
Are we making the same mistakes again? Are we assuming that this unique technology can, in one giant step, take us to the future we desire? Well, yes, we are. We’re doing the same thing that happened to me years ago, repeating what most of us did when first encountering Siri or Alexa. For some reason, we have moments when we behave as if there is some self-aware being behind the curtain. It’s not just us modern folk who do this. It’s a common theme throughout history with novel mechanical behaviour – think automatons through the ages, think chess-playing machines and games. If we a) can’t work out how it’s done, or b) it behaves as if it is thinking, then we believe (even if it’s just a little) that a mind is attached.
How far have we travelled? – is AI for code truly useful?
Undeniably, the new AI technologies available are a leap ahead of what we had before. Generative AI, the general-purpose ChatGPT type of tool, is significantly better than anything we’ve ever seen. The purpose-specific tooling, especially in image generation, seems to have crossed a threshold: telling real from unreal will become a challenge for us all. But regardless of how far along the hype curve we want to think we are, we know that the technology is unlikely to live up to all our inflated expectations.
I’ve glibly said before that AI won’t take your coding job, but a developer using an AI tool might. So far, I think that’s true for the general-purpose ChatGPT style tool, but is it valid for AI tools that purport to understand and create code?
Rather than score these AIs against the hype, against our (my) inflated imagination, I want to be pragmatic and work out what I can do with them, where they offer value, and what to guard against.
Assessing ‘coding’ AIs
How do you work out if an AI coding tool is any good? Well, we have two use cases that are deeply connected – understanding code and generating code. You might use code generation stand-alone, but more often, I suspect you’ll want to use it in conjunction with existing code. That means you need a tool that can read (and understand) code and use that knowledge to generate related elements – more code, test cases, documentation, etc.
I had some grand ideas for testing these tools, but in the end, the most straightforward step was to take some of my code and see if I could use these AI tools to do three things for me.
1 – Create the documentation (in this case Javadoc)
2 – Review the code and identify bugs and weaknesses
3 – Generate unit test cases
Which tools to assess?
There are many AI coding tools out there. Some are purely code-understanding tools, but others can generate code. Most of these tools are cloud-based and require some payment (or at least a credit card). For my initial foray into this space, I have no intention of paying for something I can’t try first, and I have a natural aversion to companies that want a credit card for a trial. Before looking at the more specific tooling, I also realised I had to assess the general-purpose AI tools as well to understand where the ‘bar’ could be for the code-specific AIs.
The initial lineup, then, is two general-purpose AIs and three code-focused AIs:
1: ChatGPT https://chatgpt.com/
2: Claude https://claude.ai/
3: Amazon Q Developer https://aws.amazon.com/q/developer/
4: Codeium https://codeium.com/
5: JetBrains AI Assistant https://www.jetbrains.com/ai/
My scenario
As I mentioned in other articles, I’m writing a tool to help with API evolution. In that tool, I have a class called HSClassFile that represents some basic elements of a real Java class. The class is large—about 1000 lines (including comments) and 11 inner classes. It uses the Builder pattern for instantiation.
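To show what I mean by the Builder pattern here, below is a minimal sketch of the shape such a class takes. To be clear, this is not my actual code: the fields, accessors, and builder methods are invented for illustration, and the real HSClassFile is far larger.

```java
// Illustrative sketch of a Builder-pattern class in the style of HSClassFile.
// All names below are hypothetical; the real class is ~1000 lines with 11 inner classes.
public final class HSClassFile {

    private final String className;       // hypothetical field
    private final String superClassName;  // hypothetical field

    private HSClassFile(Builder builder) {
        this.className = builder.className;
        this.superClassName = builder.superClassName;
    }

    public static Builder builder() {
        return new Builder();
    }

    public String className() {          // hypothetical accessor
        return className;
    }

    public static final class Builder {
        private String className;
        private String superClassName;

        public Builder className(String className) {
            this.className = className;
            return this;
        }

        public Builder superClassName(String superClassName) {
            this.superClassName = superClassName;
            return this;
        }

        public HSClassFile build() {
            // Fail fast if mandatory data is missing
            if (className == null) {
                throw new IllegalStateException("className is required");
            }
            return new HSClassFile(this);
        }
    }
}
```

The point of the pattern is that instantiation goes through the builder rather than a public constructor, which is exactly the kind of detail a generated usage example needs to get right.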
I intend to ask each tool to do the following.
- “Generate the main Javadoc to describe this class.”
- “Assess the class for bugs and coding weaknesses and provide corrections or guidance.”
- “Generate JUnit test cases to provide around 90% code coverage, especially for edge conditions, etc.”
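To give a sense of what that third prompt is asking for, here is a minimal, hand-written sketch of the kind of JUnit test I’d hope the tools produce. The method names follow the hypothetical builder sketch above, not my actual API.

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

// Illustrative only: these tests target the hypothetical sketch above,
// not the real HSClassFile API.
class HSClassFileTest {

    @Test
    void builderPopulatesClassName() {
        HSClassFile cf = HSClassFile.builder()
                .className("com.example.Foo")
                .build();
        assertEquals("com.example.Foo", cf.className());
    }

    @Test
    void builderRejectsMissingClassName() {
        // Edge condition: building without mandatory data should fail fast
        assertThrows(IllegalStateException.class,
                () -> HSClassFile.builder().build());
    }
}
```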
First Results
In alphabetical order, here are screenshots of the Javadoc generated by the tools.
Amazon Q Developer
It’s quite short but accurate. Amazon Q spotted the builder pattern being used and called it out. The example usage is technically correct but missing crucial pieces.
ChatGPT
It is much more comprehensive but also a little inaccurate. My code cannot support bytecode analysis, for example. ChatGPT also spotted the builder pattern; the provided sample is similar to the previous one: technically correct, but still not helpful.
Claude
I found this one to be quite terse. Like the previous examples, the tool seems to get confused between my class and others intended to provide full Java classfile manipulation. References to the creation and manipulation of Java class files using this code are incorrect.
Codeium
This tool is the most inaccurate so far. It claims that my code can write classfiles back to disk (not true). The examples contain references to methods that do not exist: there is no read() method, and all the get* methods are fictitious.
JetBrains AI Assistant
It’s not often that Javadoc makes me laugh, but JetBrains’ AI Assistant has decided it knows what the HS stands for. Though my code may not be the best, it’s certainly not part of the “High School code environment”! This tool has also hallucinated methods – the constructor in the example doesn’t exist, nor do the read() or getClassName() methods. Yet again, the tool has decided that my code can perform bytecode analysis without there being any part of my code that reads or stores bytecode.
Javadoc Summary
So far, there’s little to say in favour of the code-focused AI tools. All of the tools show some level of inaccuracy. I’m disappointed that Codeium and JetBrains AI Assistant were wrong about the examples they generated.
At this stage, my money is on ChatGPT. The Javadoc it produced was the most comprehensive and the least inaccurate.
Coming up in Part 2
I’ll share the results for the next question: “Assess the class for bugs and coding weaknesses and provide corrections or guidance.” We’ll see if ChatGPT can stay ahead.