A visual chronicle of the JDK’s journey

Richard Gross

The Java Development Kit (JDK) has undergone significant transformations since its inception. Its journey leads from the “write once, run anywhere” beginnings, over the applet wars, through the first release of the OpenJDK, to the new release cadence that has already kept us occupied for the last eight years. In 30 years a lot has happened.

Most of that journey is chronicled only in, arguably, dry textual representation. The text tells us how many features were added, changed or, very seldom, removed. It does not tell us what impact that had on the code. What were the key architectural changes, what were the refinements and how were the complexities of such a large code base handled?

We can visualise this journey by mapping the JDK. There are multiple tools that can do so. For this article we are using CodeCharta because it is free, open-source, and maintained since 2017. I am also very familiar with it, because I helped develop it.

The code base we’ll analyse is the OpenJDK. It has been the reference implementation of Java SE since version 10 (2018) and has been available on GitHub since version 16 (2021). The first OpenJDK was version 7 (2011) but we can go back quite a bit further, since the first commit was made on December 1st 2007. We cannot however go back to JDK 1 (1996). In addition, I am not a member of, or affiliated with, the developers of the JDK. Most insights are based on dry textual representation, educated guesses and a nice visualisation.

JDK 7

With CodeCharta we can visualise the JDK as a 3D city map. Each file turns into a building. The size of the building (✥) represents the real lines of code (RLoC), that is the lines that are not comments or whitespace and that we actually have to read to understand the code. In the following map packages that belong to sun are marked in purple, java standard extensions (such as Swing) are marked in cyan while standard java is marked in blue. The red tint is given to the commonly used java.lang, java.nio and java.util packages for size comparison. Additionally we are only colouring files whose names include the keyword *test*.

JDK 7 (filtered for *test*)

We can already see that even in 2011 Java is massive with 3,4 million lines. At the same time it is probably also not very well covered with tests, as the jdk/test package contains only 283K lines, or 12% of the total code. Later JDK versions will show us a much higher percentage. Additionally the sun-specific code is quite large and its presence is creating problems for the JDK people to this day. Finally we can also see a folder labelled hotspot. This is the java virtual machine (JVM) that executes the java code and fulfils the promise “write once, run anywhere”. It is written mostly in C++ and has been the default VM since version 1.3 in 2000.

JDK 7 was supposed to contain two massive changes: anonymous functions, aka lambdas, as well as a modularised structure. But in 2010 a plan B was formed to release a smaller JDK 7 first, one that is rather light on features. This is why the map only focuses on the rough structure.

JDK 8

JDK 8, released in 2014, has jumped significantly in size to 5,2 million lines. We have zoomed in a bit to take a look at some newly added features.

JDK 8 (zoomed in)

The map highlights streams (10k lines, red tint), the functional interfaces necessary for anonymous functions (373 lines, tiny red tint to the right of the purple block) and java.time (19k lines). Interestingly the tests for streams have a good 1:1 relation (11k lines) but java.time requires a lot of tests (45k lines). To me this just shows how hard dealing with time actually is and how many cases the tests have to cover.
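To make these three additions a bit more tangible, here is a minimal sketch that uses a lambda, a stream pipeline and java.time together. The class name Jdk8Features and the sample values are made up for illustration. The leap-day method hints at why java.time needs so many tests.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the JDK.
class Jdk8Features {

    // A lambda stored in one of the functional interfaces from java.util.function.
    static final Predicate<String> IS_LONG = word -> word.length() > 4;

    // A stream pipeline: filter with the lambda, transform, collect into a list.
    static List<String> longWordsUpperCase(List<String> words) {
        return words.stream()
                .filter(IS_LONG)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // java.time handles calendar edge cases: one year after a leap day
    // is clamped to the last valid day of February.
    static LocalDate oneYearAfter(LocalDate date) {
        return date.plusYears(1);
    }

    public static void main(String[] args) {
        System.out.println(longWordsUpperCase(Arrays.asList("java", "stream", "lambda")));
        // prints [STREAM, LAMBDA]
        System.out.println(oneYearAfter(LocalDate.of(2024, 2, 29)));
        // prints 2025-02-28
    }
}
```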

These features also show the limits of our analysis capabilities. Adding anonymous functions to the JDK involved much more than writing 373 lines of functional interfaces but the required changes are much more widespread and not discernible without first-hand knowledge.

JDK 9+10

JDK 9, released September 2017, grew again and reached 7,3M lines. Its most ambitious feature was JPMS (Java Platform Module System), and the JDK itself was modularised with it. This modularisation changed the structure completely. Some restructurings, particularly those concerning the hotspot VM, were not completed with version 9 but left to the next version. Perhaps appropriate as 10 was also the version that had a particular enhancement: “consolidate the JDK forest into a single repository”. Since the modularisation structure is stable with JDK 10 (March 2018) we’ll target it with our next map.

JDK 8+10 with respective folder structure

At 7,2 million lines, version 10 is actually a bit smaller than 9 but still 2 million lines larger than 8. Most of that growth is supported with tests though. They now amount to 29% of the code base. We can see this relation clearly because tests have become a top-level concern.

The src folder is even more interesting than the test growth however. What was once a large monolith has been split into 20 java modules and 37 jdk modules. These modules have received a purpose-based grouping. UI-frameworks like AWT and Swing have been grouped into a java.desktop module. A java.base module exists that contains just the code that all other modules rely upon. It prominently contains collections, math, time and I/O (Input/Output).

Even jdk-internal concerns have received their own modules. The Nashorn JavaScript engine (yes this JDK can execute ECMAScript 5.1 code) is one such module. The Java code that hotspot uses was also given modules such as jdk.hotspot or jdk.internal.vm.compiler. Previously those were located in the hotspot package. The remaining 738k lines in the hotspot package are written in C++.

But that is just how the code was structured. The utilised module system provided another, much larger benefit. The internals of all the new modules were no longer accessible from the outside. All the module developers had to guarantee was that the exposed API of each module remained stable, leaving them free to change all the internal implementation details without worrying it might break someone’s code.
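This split between exposed API and hidden internals can be observed at run time. The following sketch asks the java.base module which of its packages are exported to everyone. The class name ModuleEncapsulationCheck is made up, and jdk.internal.misc is picked only as one example of an internal package that, as far as I can tell, is exported to selected JDK modules only.

```java
// Hypothetical demo class; requires Java 9 or later.
class ModuleEncapsulationCheck {

    // Module.isExported(String) is true only for an unconditional export,
    // i.e. a package that every other module is allowed to use.
    static boolean isPublicApi(Module module, String packageName) {
        return module.isExported(packageName);
    }

    public static void main(String[] args) {
        Module base = Object.class.getModule();
        System.out.println(base.getName());                          // java.base
        System.out.println(isPublicApi(base, "java.util"));          // exposed API
        System.out.println(isPublicApi(base, "jdk.internal.misc"));  // internal package
    }
}
```

On the JDKs I know, the second line prints true and the third prints false, which is exactly the freedom the module developers gained: the internal package can change without any outside code being allowed to depend on it.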

Some code was not hidden from the public though, even though the JDK maintainers would have liked it. One file in particular enables writing performance-critical but unsafe code. The appropriately named class sun.misc.Unsafe is to this day part of the JDK. Replacing it with viable but safe alternatives has been a challenge that is still not complete. 79 out of 87 methods are finally deprecated with JDK 23 (September 2024) though. Finally removing Unsafe is part of an ongoing effort the developers are calling “Integrity by default”. The idea is that “developers expect that their code and data is protected against use that is unwanted or unwise.”

JDK 11

JDK 11 was released September 2018. It and its two predecessors are part of a new release cadence where a new JDK is released every six months. Instead of tying JDKs to a specific feature set, and having endless meetings about key feature progress or lack thereof, the maintainers are now following a release-train model. Every six months the train leaves. The features that are ready get released. If your feature is not ready then do not stress about it. You have another train six months later.

With the release of a new JDK, the previous version won’t receive any support any more. But upgrading the JDK every six months is not something large enterprises want. They want their stable feature set that only gets security or bug patches, not risky new features. This is where so-called long-term support (LTS) releases come in. Every couple of years, the JDK releases a new JDK that it promises to support for years with fixes but without adding features. The JDK maintainers call this the tip & tail model and they suggest library maintainers should follow it as well.

JDK 11 is such an LTS release. The free support from Oracle ended September 2023, five years after it was released (other vendors such as Adoptium still provide free support though). This gives enterprises plenty of time to use the stable LTS-version before having to upgrade to the next version.

From a feature perspective, LTS releases are not treated differently to the short-term releases. Both are governed by the JDK Enhancement-Proposal (JEP) process. Such enhancements do not have to be final, they can also be:

  • Experimental: Test-bed mechanism used to gather feedback on nontrivial HotSpot enhancements. Experimental JEPs need to be enabled individually via JVM options -XX:+UnlockExperimentalVMOptions
  • Incubating: Non-final API or tool, intended to gather early feedback. Incubating JEPs need to be added individually via the JVM option --add-modules jdk.incubator.xyz
  • Preview: API or tool intended to be final, still want to gather feedback. Preview JEPs are not enabled individually but all or nothing via JVM option --enable-preview.

Since LTS releases are not treated differently from non-LTS releases, they can also contain non-final enhancements. Such is the case for version 11, which contains an experimental garbage collector called ZGC (Z for short). The following map shows the new garbage collector but also some code removals. A JEP does not have to be a new API or tool, it can also be the removal of one.

JDK 11 vs 10

This map compares JDK 10 with JDK 11. Green buildings show files that have more lines than in version 10, red buildings have fewer lines. At 8 million lines version 11 has increased again. The tests now amount to 36% of the code base. Because we zoomed in a bit and gained more floor labels, we can now also see that the tests are not just one big ball of tests. Like the src folder, the tests are split into folders depending on whether they target the jdk, hotspot, nashorn etc. The hotspot tests have gotten a huge boost with a lot of files added (the vmTestbase folder).

The jdk.localedata has also seen a huge boost. It contains the xml files of the Unicode Common Locale Data Repository (CLDR). The CLDR “provides key building blocks for software to support the world’s languages, with the largest and most extensive standard repository of locale data available”. The JDK uses it “to format dates, times, currencies, languages, countries, and time zones in the standard Java APIs”. The jdk.localedata takes up quite a lot of space with its 1,1 million lines. But since those lines are mostly xml-files, we’ll generally exclude it in future maps to focus on code changes.

In the map we can also see that some modules have not gotten a boost but were instead removed. JEP 320 delivered the removal of Java EE and CORBA modules. They were removed because maintaining them was no longer worth the effort compared to how little the features were used. In total this means the developers no longer have to maintain 357k lines.

One thing that was only seemingly removed was the incubator.httpclient. In actuality it was moved, modified and standardised into the module java.net.http. This process is special for incubating modules. They always start out separate from standardised modules. Depending on the progress they can later turn into preview features or immediately become stable like the http client. They might also be removed during the incubating phase, if they do not provide enough value. This process is different for experimental hotspot JEPs. These do not start out as separate modules. The new scalable garbage collector “Z” is located directly in the hotspot code, even though it is not stable. The fact that hotspot is C++ code and cannot be controlled by the JPMS could have something to do with this.

JDK 12 to 17

Following the release of version 11 we see five more releases before September 2021 gives us the next “big” release in the form of JDK 17. Big here only means that 17 has LTS status, which means enterprise developers can finally use long-standardised language features like switch expressions, records, sealed classes, sealed interfaces and pattern matching for instanceof. The maintainers are clearly making continuous changes to the JDK. This becomes even more obvious when we map the hotspot and java.base areas of the JDK. Every release these areas increase (Δ+) or decrease (Δ-) by several thousand lines of code. In fact almost every file changes as can be seen in the following collage.

Hotspot&java.base Collage: JDK 12 to 17

That java.base can change so much is already amazing. java.base is the module we all depend on and it is 586k lines big. That almost all of hotspot is changed with every release is even more impressive. These are the 806k lines that run the world. Well, a big part of the world at least. And no one has noticed all the internal changes.

But why change the JVM at all? Two reasons immediately come to mind. First, there are significant performance improvements happening in hotspot all the time. Depending on your use case, version 17 can be up to 20% faster than version 11. Second, hotspot has to change to accommodate new enhancements. In particular new language features such as records.

At this level of detail it is however hard to locate these new language features, at least without prior knowledge. It is very likely they had an effect on the hotspot module. The 104k lines of the jdk.compiler were also changed with every release. How much they had to change just for records is hard to say though. No obvious class called CompileRecord.java exists in there. Records might also be used in java.base and lead to code reductions. In short language changes are harder to pinpoint and visualize, as they do not create obvious structure changes. So although I’d like to talk more about them, I currently do not have the knowledge to investigate further.
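Even if the map cannot locate them, the language features themselves are easy to show. The following sketch combines the features named above that are stable in JDK 17: a sealed interface, records, pattern matching for instanceof and a switch expression. All type names (Shape, Circle, Rectangle, Jdk17Features) are invented for this example.

```java
// Sealed interface: only the listed records may implement it.
sealed interface Shape permits Circle, Rectangle {}

// Records: immutable data carriers with generated accessors, equals and hashCode.
record Circle(double radius) implements Shape {}
record Rectangle(double width, double height) implements Shape {}

// Hypothetical demo class; requires Java 17 or later.
class Jdk17Features {

    // Pattern matching for instanceof tests and binds the variable in one step.
    static double area(Shape shape) {
        if (shape instanceof Circle c) {
            return Math.PI * c.radius() * c.radius();
        }
        if (shape instanceof Rectangle r) {
            return r.width() * r.height();
        }
        throw new IllegalArgumentException("unknown shape: " + shape);
    }

    // Switch expression: yields a value and has no fall-through.
    static String sizeLabel(double area) {
        int bucket = area < 10 ? 0 : area < 100 ? 1 : 2;
        return switch (bucket) {
            case 0 -> "small";
            case 1 -> "medium";
            default -> "large";
        };
    }

    public static void main(String[] args) {
        double area = area(new Rectangle(2, 3));
        System.out.println(area + " is " + sizeLabel(area)); // prints 6.0 is small
    }
}
```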

JDK 18 to 21

After version 17 the time until the next LTS was shortened to every 2 years, down from every 3 years. Which is why September 2023 marks the LTS release of JDK 21. By this point the maintainers have really adopted the idea to gather feedback in production before stabilising enhancements. Neither version 19 nor version 20 had a user-visible JEP. Instead the enhancements were all experimental, incubating or preview. After multiple rounds of feedback however, many of these enhancements were stabilised with JDK 21. Language-wise we got stable record patterns and pattern matching for switch, two features that paved the way for what the maintainers call data-oriented programming. It is an amazing new way to program in Java.
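As a minimal sketch of that style, assume a tiny expression tree modelled as data. The types Expr, Num, Add and Mul and the class DataOrientedDemo are made up for illustration. The switch is exhaustive because the interface is sealed, and the record patterns take the data apart without any casts or accessor calls.

```java
// The data model: a closed set of immutable records.
sealed interface Expr permits Num, Add, Mul {}

record Num(int value) implements Expr {}
record Add(Expr left, Expr right) implements Expr {}
record Mul(Expr left, Expr right) implements Expr {}

// Hypothetical demo class; requires Java 21 or later.
class DataOrientedDemo {

    // Pattern matching for switch with record patterns. No default branch is
    // needed: the compiler knows all permitted subtypes of the sealed interface.
    static int eval(Expr expr) {
        return switch (expr) {
            case Num(int value) -> value;
            case Add(Expr left, Expr right) -> eval(left) + eval(right);
            case Mul(Expr left, Expr right) -> eval(left) * eval(right);
        };
    }

    public static void main(String[] args) {
        // 2 + (3 * 4)
        Expr expr = new Add(new Num(2), new Mul(new Num(3), new Num(4)));
        System.out.println(eval(expr)); // prints 14
    }
}
```

If a new record is later added to the permits list, every such switch stops compiling until the new case is handled, which is a large part of the appeal.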

What we also got after two previews was virtual threads. Virtual threads are an alternative to the classic platform threads. They are very useful to write high-throughput concurrent applications when the tasks to be done are I/O-bound (e.g. they interact a lot with the network). They are also a drop-in replacement for platform threads as they actually use the same API. A great thing for sure but that makes finding the necessary modifications harder again. We can imagine that supporting virtual threads required quite heavy modification to hotspot. But there is no package called virtual or virtualthread. We can search for *Thread*, *Executor*, *Continuation* and indeed we see a few modifications. But quite few compared to what was most likely changed. In lang the class Thread.java received 377 additional lines, VirtualThread.java with 778 lines was added as well as ThreadBuilders.java with 357 lines. In util.concurrent the class Executors.java received another 20 new lines. In internal.vm a new file called Continuation.java was added. And of course hotspot received a huge update, but at this level we cannot isolate what change was motivated by virtual threads.
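From the user’s side the change is correspondingly small. The sketch below runs a number of blocking tasks, each on its own virtual thread, through the familiar ExecutorService API. The class name VirtualThreadDemo and the sleep that stands in for network I/O are assumptions made for this example.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; requires Java 21 or later.
class VirtualThreadDemo {

    // Starts one virtual thread per task and returns how many tasks
    // observed that they were running on a virtual thread.
    static int runBlockingTasks(int taskCount) {
        List<Future<Boolean>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                results.add(executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(10)); // stands in for blocking I/O
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits until all submitted tasks have finished

        int virtualCount = 0;
        try {
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    virtualCount++;
                }
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        }
        return virtualCount;
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(1_000));
    }
}
```

Swapping Executors.newVirtualThreadPerTaskExecutor() for a platform thread pool would leave the rest of the code unchanged, which illustrates the drop-in nature described above.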

Virtual Threads: JDK 18 vs 21

An enhancement where we can isolate the changes (or at least I think we can) is the new foreign function & memory API. It allows java programmers to safely invoke foreign functions (code outside the JVM) and access foreign memory (memory not managed by the JVM) and supersedes the JNI (Java Native Interface).

Foreign Function & Memory API: JDK 17 to 22

The FF&M API started out as an incubator and was made a preview feature in 19. We can see that the incubator code was moved, changed and made available as preview. Interestingly the bulk of its logic is internal to the JDK (11k lines in internal/foreign). Only a select few of the classes are visible to the outside world (1k lines in lang/foreign). It is quite hard to see this tiny stretch of exposed code even though we zoomed in on java.base. This structure stayed the same over the following previews. The code was optimised with each release until it was stabilised with JDK 22, a non-LTS release. It is obvious that the JDK maintainers take their time to get it right. They don’t rush to hit a release train, even a long-term support train. In turn I don’t split the FF&M analysis into this chapter and the one where it was actually stabilised but place it where it fits best 🙂
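To give an impression of the small exposed surface in lang/foreign, here is a sketch that calls the C standard library function strlen from Java, in the spirit of the example in the JEP for the final API. It requires JDK 22 or later. The class name ForeignStrlen is made up, and the sketch assumes a 64-bit platform where size_t maps to a Java long and where strlen is found by the default lookup.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical demo class; requires Java 22 or later.
class ForeignStrlen {

    static long strlen(String text) {
        Linker linker = Linker.nativeLinker();
        // Foreign function: look up strlen and describe its signature,
        // size_t strlen(const char *), assuming size_t is 64 bits wide.
        MethodHandle strlen = linker.downcallHandle(
                linker.defaultLookup().find("strlen").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));

        // Foreign memory: the arena frees the off-heap string when it is closed.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(text); // NUL-terminated copy
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("native call failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(strlen("Hello")); // prints 5
    }
}
```

Newer JDKs may print a warning about native access being enabled for the calling module when this runs.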

JDK 22 to 25

JDK 25, with an expected release date of September 2025, is not just the latest LTS release. It is also the first time where release version and release year match. This has only whimsical importance but since we are celebrating 30 years of java why not add another fun fact to the list.

At the time of writing it is unclear what enhancements will be stable in JDK 25 but we can guess. Virtual threads will most likely be improved further. Version 24 already delivered the enhancement “Synchronize Virtual Threads without Pinning”. Perhaps we’ll see scoped values in 25, the easier-to-reason-about replacement for thread-local variables. We won’t see structured concurrency being stabilised, as a fifth preview is planned for JDK 25.

We will probably see some new stable language features, such as “Primitive Types in Patterns, instanceof, and switch”, “Flexible Constructor Bodies”, “Module Import Declarations” and “Simple Source Files and Instance Main Methods”. The last of these JEPs would be especially great to have in 25, since it is “paving the on-ramp”.

The idea of “paving the on-ramp” is to make it easier for people to learn java. Currently writing a simple HelloWorld requires learning oh so many keywords and concepts just to print two words to the screen. It would be great if we could reduce our class HelloWorld { public static void main ... to just the essentials: void main() { println("Hello World"); }. This is exactly what the JEP does. The other concepts are not gone but they can be introduced gradually. Whether or not that is finalised with 25 is speculation though. The maintainers do not let arbitrary LTS deadlines tie them down.

What is not speculation is that JDK 24 delivered one enhancement that has been needed for at least 10 years, the Class File API. Up to its release, the JDK provided no official way to process the byte code it generated. It could of course always execute the byte code, but there was no official way to parse, generate and transform the class files that contain the byte code. This is something that frameworks often do to add functionality. It is also something that the JDK does, for example to support lambda expressions at run time.
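As a small illustration of the parsing side, the following sketch reads the class file of a loaded class and lists the method names found in it. It requires JDK 24 or later. The class name ClassFileInspector is made up, and locating the bytes through getResourceAsStream is just one convenient way to get hold of a class file for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.lang.classfile.ClassFile;
import java.lang.classfile.ClassModel;
import java.lang.classfile.MethodModel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; requires Java 24 or later.
class ClassFileInspector {

    // Parses the class file of the given class and returns its method names.
    // Constructors show up under the special name <init>.
    static List<String> methodNames(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("no class file found for " + type);
            }
            ClassModel model = ClassFile.of().parse(in.readAllBytes());
            List<String> names = new ArrayList<>();
            for (MethodModel method : model.methods()) {
                names.add(method.methodName().stringValue());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(methodNames(Object.class));
    }
}
```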

In the past the JDK has bundled its own version of the popular open-source bytecode processor ASM to process its own bytecode. This creates multiple problems. As described by the JEP for the Class File API, one of them is a vicious circle: “The ASM version for JDK N cannot finalize until after JDK N finalizes, so tools in JDK N cannot handle class-file features that are new in JDK N, which means javac cannot safely emit class-file features which are new in JDK N until JDK N+1.” When you pair this problem with a 6-month release cadence, it becomes very complicated for the maintainers to keep up, and arguably even more so for framework developers and the ASM developers. All the more reason to have not only an API for the JDK internally but also for users of the JDK.

Class File API: JDK 23 vs 24

Like the FF&M API, the largest chunk of code is internal and thus hidden. The Class File API also wants to keep the exposed API surface as small as possible. At 6k lines the visible API is still quite large. At the same time it is an achievement because without the module system it would have been 4 times as large. The other 21k lines are located in internal/classfile but they do not add to the API surface. The visible API and the internal logic together are roughly the size of ASM. Just based on size, we can surmise that the new API has roughly the same features as the bundled ASM. And indeed, the plan is to eventually remove this copy from the JDK.

The Journey so far

The JDK is now almost 30 years old. We started this visual journey with JDK 7 in 2011 at the half-way point. Back then the JDK was 3,4 million lines heavy and only 12% of those were tests. With JDK 9 (September 2017) and 10 (March 2018) the JDK was changed to the new structure that has been kept until today. Version 10 contained 7,2 million lines and 29% of them were tests. We are now at JDK 24 with a release date of March 2025. The code today amounts to 9,4 million lines. 40% of them are tests. Without coverage data there is of course no way to say how that translates to path coverage. In my experience the src/test code split is healthy at roughly 50:50 though.

JDK 10+24

When we zoom in on the 4,3 million lines of src code we can see the familiar modules. java.desktop amounts to 25% of src. Today we mostly rely on that code to render IntelliJ, Eclipse and other IDEs. Another 22% go to hotspot, 15% is java.base and 6% are taken up by java.xml. All modules that were already around in JDK 10. The big newcomer is the incubating module for vector operations that takes up 12% of all src code.

In short, the JDK is massive. A fact that is also reflected in the size of files which we coloured in the map. Files above 500 lines are large and yellow, anything above 1500 lines is very large and red (note that we are counting the real lines of code, empty lines or JavaDoc comments are excluded). The maintainers do not share the same size sentiment. I think a big reason for that is that the JDK has to have a lot of convenience code to make the usage nicer. The List.java interface has some 50 methods for all the various interactions you might want to do. Arrays.java has above 100 methods which brings in 2k lines.

Even by these numbers the big blocks of red code in the center of incubator.vector are special. These are assembly files for linux and windows with 3k to 20k lines each. 402 thousand lines in total. I have no idea if they are hand-written or generated from some other source. It would be strange to have generated code committed to version control. I do hope they are all generated though because writing them by hand and keeping them in sync would be… challenging.

What the sum of these files does is clear. They support the new vector operations. Vector operations are in themselves nothing special. Multiplying vectors and matrices is a common math operation after all. It is so common that CPU architectures often provide extensions that can make vector operations significantly faster. Hotspot even has a feature to auto-vectorize appropriate code but this is happening without programmer intent.

The Vector API now makes these operations accessible to performance critical code and it has been incubated nine times already. The first incubator was delivered with JDK 16 in March 2021. In the last couple of releases the assembly code has not changed but the java API receives additions to almost every file in almost every release. The maintainers are still working on the API to make it as clear and concise as possible. They will continue to do so until “necessary features of Project Valhalla become available as preview features”. But what is Valhalla and when will it be available?

The road ahead

Project Valhalla’s goal is to introduce value objects into Java that “code like a class, work like an int”. This would allow objects with the performance characteristics of primitives (i.e. fast) but having the full modelling possibilities of classes. The performance is also the reason why the Vector API wants to build on the work of Valhalla.

If these value objects existed in the JDK, then there would be no difference between a primitive and a value object. We could unify the type system. For instance boxing/autoboxing from int to Integer and back would be a thing of the past. List<Integer> would be the same as List<int>. You would also be able to forbid null being put where only primitives should be allowed. Currently an int[] array forbids null but a List<Integer> allows it. The type system is not unified.
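The current split can be shown with today’s Java. In the sketch below a List<Integer> happily accepts null and only fails later, when the value is unboxed, while an int[] can never hold null in the first place. The class name BoxingDemo is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class illustrating the non-unified type system.
class BoxingDemo {

    // A List<Integer> allows null. The problem only surfaces at run time,
    // when auto-unboxing calls intValue() on the null reference.
    static boolean unboxingNullThrows() {
        List<Integer> boxed = new ArrayList<>();
        boxed.add(null); // compiles and runs without complaint
        try {
            int value = boxed.get(0); // auto-unboxing from Integer to int
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // An int[] cannot contain null; its elements default to 0.
    static int defaultPrimitiveElement() {
        int[] primitives = new int[1];
        return primitives[0];
    }

    public static void main(String[] args) {
        System.out.println(unboxingNullThrows());      // prints true
        System.out.println(defaultPrimitiveElement()); // prints 0
    }
}
```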

With Valhalla we would be able to choose to declare a value class, defined only by the values it contains, or the familiar class, defined by its identity in the heap. Defining the latter is actually done by writing identity class but identity is the default keyword for classes, so we can omit it. Both class types can be written in essentially the same way but value classes have more constraints than identity classes have. The constraints are what allows hotspot to optimise the performance at runtime. One of these constraints, which we developers will be able to define, is whether or not the object is nullable (“JEP draft: Null-Restricted Value Class Types”).

Clearly having value classes would be great but getting them into the JDK has already been an 11-year effort that was started around the release of JDK 8. Some improvements have already been merged back to the JDK but most of the work is still being done in a separate openjdk/valhalla repository. The mainline jdk is being merged into the repository at certain points in time, allowing us to see what Valhalla will change.

JDK 25 vs Valhalla

Valhalla currently adds 124k lines of code. 17% in hotspot and 81% in tests. The tests consist of 39% more hotspot tests and 31% more micro/org. The latter are probably the performance benchmarks. Whether that is the full extent of Valhalla remains to be seen.

At its release, Valhalla will certainly be the most significant transformation of the JDK yet. Which is quite an achievement when you consider what journey the JDK has already been on. It has kept the maintainers occupied for 10 years already and it is one of the puzzle pieces to get Java ready for the next 30 years. And Valhalla will not be the only enhancement to the JDK. A lot will happen in the following years.

Reproducing the maps

If this article was interesting to you but you want even more details, then I encourage you to reproduce the results.

  1. Clone the OpenJDK
  2. Switch to the tag of the JDK you want to analyse. I picked the version of the JDK releases that was generally available (GA). The correct build number is listed on the page of the reference implementation. The jdk-xx-ga tags seem to have exactly the same purpose as the explicit ones below.
    1. For JDK 21 it is jdk-21+35
    2. For JDK 17 it is jdk-17+35
    3. For JDK 11 it is jdk-11+28
  3. Download CodeCharta
  4. Generate a map of the OpenJDK with CodeCharta. Merge Tokei metrics with git metrics as shown in the docs. There is also a script for automated simple analysis.
  5. Visualise the map in the Web Studio.

And the final step:
Ask your colleagues for help analysing the code 🙂 In my case I had enormous help from Stephan Schneider and Hans Spielvogel. I could not have written this article without their analysis. Thank you.
