
Some time ago, I noticed many developers confuse value types and records. Admittedly, that happens to me all the time, too, but these two concepts are fundamentally different. The term "value record" adds to the confusion. Plus, it seems the idea has evolved quite a bit since 2017, when I first heard about it. So I invite you to join my deep dive.

In a nutshell

  • Java records are syntactic sugar. You use a record to store immutable data. Records make programming easier and more expressive, but they don't improve performance. They improve your semantics.
  • The value types of Project Valhalla enable the compiler to generate faster code. They trade a few features you don't need anyway for improved performance.
  • Project Valhalla also plans to introduce a slightly more restricted value type called "primitive classes," allowing for a streamlined memory layout and blurring the difference between primitive types and objects.
  • There's also a design pattern called value objects. That's probably where the name comes from, but the value objects of Project Valhalla have a different scope and definition, so the design pattern only adds to the confusion. Forget about it, at least for the next ten minutes. In this article, "value class" is a synonym for "value type," and value objects are the instances of these value classes.

Records and value types are different concepts, but they aren't opposites. You can have a value record. Well, not now, unfortunately. Both value and primitive types are still under development.

Better performance and a smaller memory footprint are always exciting. But the real fun is learning why value classes and primitive classes perform better.

Setting the stage

When you google "Java value types," you'll learn about identities. That's important, and in 2023, it's the cornerstone of the current definition of value types, but let me point out a different topic first: the improved memory layout of primitive classes. I know that's a bit odd because primitive classes are an extension of value types. But the story of primitive classes shows nicely that Java data structures have a problem.

Let's have a look at an array. It doesn't matter if we're talking about native arrays or ArrayList. Under the hood, an ArrayList is a fancy native array, trading additional memory and performance overhead for flexibility. Today, we're interested in the underlying array. Something like this:

int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

Most of the time, you're using just a handful of operations. You iterate over an array, pick individual elements, and re-assign them. Ninety-nine percent of the time, that's all you want to do with such an array. You'd expect Java to generate optimized code for these three use cases. Well, it does, as long as we stick to primitive data types.

for (var number : numbers) {
    System.out.println(number);
}
System.out.println(numbers[5]);
numbers[5] = 6;

Now let's have a look at an array of objects. The syntax changes and the code becomes more cumbersome, but it's still an array. From a programmer's point of view, nothing changes. You'll want to iterate over the array, read arbitrary elements, and assign new values. So here's the counterpart of the example above. Note that I didn't run the code. Bear with me if there's a syntax error. This article is about concepts, not actual code.

import java.util.stream.IntStream;

@lombok.Data
@lombok.AllArgsConstructor // @Data alone only generates a no-arg constructor here
class ComplexNumber {
    private double real;
    private double imaginary;
}

ComplexNumber[] numbers = IntStream.rangeClosed(1, 10)
        .mapToObj(n -> new ComplexNumber(n, 0))
        .toArray(ComplexNumber[]::new);
for (var number : numbers) {
    System.out.println(number);
}
System.out.println(numbers[5]);
numbers[5] = new ComplexNumber(0, 5);

I've added Lombok to keep the example short. If you're not familiar with Lombok, give it a try. The @Data annotation generates getters, setters, equals(), hashCode(), toString(), and a constructor for the required (final or @NonNull) fields. It's almost (but not quite) a Java record.
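For comparison, here is a minimal sketch of the same example written as a Java record (Java 16 or later) instead of a Lombok class. The class name RecordArrayDemo and the helper buildNumbers are made up for this illustration. The record has no setters, so "changing" an element means replacing it with a new instance.

```java
import java.util.stream.IntStream;

// Sketch: the ComplexNumber example as a record. The record generates the
// canonical constructor, accessors, equals(), hashCode(), and toString().
class RecordArrayDemo {
    record ComplexNumber(double real, double imaginary) {}

    // Hypothetical helper: builds ten numbers 1 + 0i ... 10 + 0i.
    static ComplexNumber[] buildNumbers() {
        return IntStream.rangeClosed(1, 10)          // 1..10 inclusive
                .mapToObj(n -> new ComplexNumber(n, 0))
                .toArray(ComplexNumber[]::new);
    }

    public static void main(String[] args) {
        ComplexNumber[] numbers = buildNumbers();
        for (var number : numbers) {
            System.out.println(number);
        }
        System.out.println(numbers[5].real());       // accessor, not getReal()
        numbers[5] = new ComplexNumber(0, 5);        // replace, don't mutate
    }
}
```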

Back to the topic. We're comparing an array of int to an array of ComplexNumber.

Anatomy of a simple array

As it turns out, we're using both arrays more or less the same way. The arrays contain different values, but that's not a big deal. However, a look at your computer's memory reveals a different picture. Again, we start with the array of integers. Simplifying things a bit[1], our memory model looks like so:

That's pretty much what you'd expect. The numbers are stored nicely in consecutive memory cells. The only surprise is the curved line. That's a pointer[2], and it's a peculiarity of Java. It separates the variable from its value, connecting them by a pointer. Let that sink in if you're not familiar with low-level programming. This pointer plays a crucial role in our discussion.

Why does Java separate the variable from its value? Well, that's simple. This feature allows us to re-assign the variable later. In theory, you can implement a programming language that does without the pointer. As far as I remember, Pascal did it until they introduced "dynamic arrays". Java's approach makes it possible to replace the array with a larger array later. That, in turn, is a feature ArrayList depends on.

int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
numbers = new int[20]; // <- requires a pointer because
                       //    the new array needs more memory
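The pointer is easy to observe from Java code. In this small sketch (ArrayReferenceDemo and reassign are hypothetical names), a second variable keeps a copy of the original reference. After the reassignment, the two variables point to two different arrays, which shows that the variable never held the elements themselves.

```java
// Sketch: an array variable holds a reference (pointer) to the array.
// Reassigning the variable re-points it; the old array is untouched and
// still reachable through any other reference to it.
class ArrayReferenceDemo {
    static int[][] reassign() {
        int[] numbers = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[] alias = numbers;   // copies the reference, not the elements
        numbers = new int[20];   // numbers now points to a new, larger array
        return new int[][] {alias, numbers};
    }

    public static void main(String[] args) {
        int[][] result = reassign();
        System.out.println(result[0].length); // 10: the original array
        System.out.println(result[1].length); // 20: the new array
    }
}
```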

Memory model of an array of objects

If you're like me, you expect the array of complex numbers to look the same. Complex numbers are tuples of two numbers, so I'd assume all those tuples populate consecutive memory cells.

Oops. Reality looks very different. It's like buying a cupboard for each plate you own. The array itself is yet another cupboard, containing ten Post-it notes telling us where to find each plate.

There's nothing wrong with it, and it's one of the reasons why Java has become such a successful language. Java is both flexible and powerful. This approach works for arbitrarily complex objects and supports sparsely populated arrays well.

Nonetheless, power always comes at a price, and so does flexibility. Dense Java arrays suffer from inefficient memory usage and a performance penalty. Iterating an array of objects sounds simple, but it requires the CPU to follow all these pointers. The closer you look, the worse it gets. It's bad enough that there are pointers, possibly pointing to memory locations scattered wildly. The CPU has a hard time iterating all those scattered items, running into cache misses all the time.
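Until Java can flatten object arrays itself, you can approximate the flat layout by hand. The following sketch stores all complex numbers in a single double[], with real parts at even indices and imaginary parts at odd ones. This manual "flattening" is a common workaround, not how Valhalla implements it. FlattenedComplexDemo and its methods are hypothetical names.

```java
import java.util.stream.IntStream;

// Sketch: the same data in two layouts. The object array needs one pointer
// dereference per element; the interleaved double[] is one contiguous block.
class FlattenedComplexDemo {
    record ComplexNumber(double real, double imaginary) {}

    static ComplexNumber[] buildNumbers(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(i -> new ComplexNumber(i, 0))
                .toArray(ComplexNumber[]::new);
    }

    // Today's layout: each iteration follows a reference to a heap object.
    static double sumOfRealParts(ComplexNumber[] numbers) {
        double sum = 0;
        for (ComplexNumber c : numbers) {
            sum += c.real();
        }
        return sum;
    }

    // Hand-flattened layout: [re0, im0, re1, im1, ...].
    static double[] flatten(ComplexNumber[] numbers) {
        double[] flat = new double[numbers.length * 2];
        for (int i = 0; i < numbers.length; i++) {
            flat[2 * i] = numbers[i].real();
            flat[2 * i + 1] = numbers[i].imaginary();
        }
        return flat;
    }

    // Reads consecutive memory; no pointers to follow.
    static double sumOfRealParts(double[] interleaved) {
        double sum = 0;
        for (int i = 0; i < interleaved.length; i += 2) {
            sum += interleaved[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        ComplexNumber[] objects = buildNumbers(10);
        double[] flat = flatten(objects);
        System.out.println(sumOfRealParts(objects)); // 55.0
        System.out.println(sumOfRealParts(flat));    // 55.0
    }
}
```

The price of the workaround is readability: the type ComplexNumber disappears from the flat version, which is exactly what value types promise to avoid.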

Value types - infant edition

But what's stopping us from storing all array elements side-by-side in consecutive memory cells?

That's precisely the story of value types in Java. Reading State of the Values - Infant Edition, you'll learn a lot about identities and stuff like that, but the core idea is about making the Java memory model more efficient. Important keywords are "flattening" and "vectorization."

Flattening means getting rid of all those pointers, which gives your code a performance boost by reducing indirections and cache misses. Vectorization is the afterburner, and it's very important: if we manage to store an entire array orderly in consecutive memory locations, the CPU can process several elements with a single instruction. Modern CPUs support such vector operations. Have a look at this whitepaper to learn more about vector operations; it boasts a 16-fold performance boost. Keep in mind that the compiler can vectorize only a tiny part of your application, so the overall boost is much smaller. Java has already picked the low-hanging fruit, as Martin Stypinski shows in his excellent article: operations on primitive arrays (including strings) already use vectorization.
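For illustration, here is the kind of loop that benefits. It's a simple counted loop over primitive arrays, the shape HotSpot's C2 JIT compiler is able to auto-vectorize. Whether it actually emits SIMD instructions depends on the JVM version, the CPU, and whether the method gets hot enough to be compiled, so treat this as a sketch of the pattern rather than a guarantee. VectorizableLoopDemo and add are hypothetical names.

```java
// Sketch: element-wise addition over primitive arrays. The data sits in
// consecutive memory, so the JIT may process several elements per instruction.
class VectorizableLoopDemo {
    // c[i] = a[i] + b[i] for every index.
    static void add(double[] a, double[] b, double[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    public static void main(String[] args) {
        int n = 1_000;
        double[] a = new double[n];
        double[] b = new double[n];
        double[] c = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        add(a, b, c);
        System.out.println(c[10]); // 30.0
    }
}
```

The same loop over a ComplexNumber[] would have to chase a pointer per element, which is why object arrays miss out on this optimization today.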

That leaves the question of how to optimize arrays of objects. Can we teach Java to use a linear memory layout for them?

Sure we can. That's the thesis of State of the Values - Infant Edition, first published in 2015. Eight years later, we're still waiting for value types. It's not as easy as it seems, and the idea has evolved. While researching this article, I was surprised that the meaning of the term "value types" has changed.[3] What I've been describing is now called "primitive objects". Value types are an intermediate step toward primitive classes.

As far as I can see, the Valhalla project has split into several smaller tasks:

What's all the talk about identity?

I'm not entirely sure I've understood the problem with identity yet, so take the following section with a grain of salt. However, there's one keyword in Java that's hardly ever used, yet it causes a lot of trouble for language designers: synchronized. Traditionally, you can use any object as a multithreading lock. To allow that, the JVM needs a pointer (aka reference) to the object. The object has to be allocated on the heap, and the garbage collector observes it.
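Here's what that looks like in practice: a completely ordinary Object serves as a lock. In this sketch (AnyObjectIsALockDemo and incrementConcurrently are hypothetical names), four threads increment a shared counter, and synchronizing on the object guarantees no update is lost. The lock is tied to the identity of the object, which is precisely the feature value classes give up.

```java
// Sketch: any Java object can act as a monitor for synchronized blocks.
class AnyObjectIsALockDemo {
    // A plain Object used purely as a lock; only its identity matters.
    private static final Object LOCK = new Object();
    private static int counter = 0;

    static int incrementConcurrently(int threads, int incrementsPerThread) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    synchronized (LOCK) { // monitor is tied to LOCK's identity
                        counter++;
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' updates visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(incrementConcurrently(4, 10_000)); // 40000
    }
}
```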

The JVM can optimize the code if it knows that an object is never used for synchronization. It can move the object from the heap to the stack, for example. It can go even further if you drop the identity comparison ==. That, in turn, allows it to store the object's fields in CPU registers. After eliminating both synchronized and ==, there's no use for the reference to the object.

In almost every case, losing synchronized and == is a small price to pay because you don't need either feature.

In many cases, the JVM optimizer detects that your object is never synchronized and generates optimized code even today. The idea of value types is to let the programmer declare this explicitly, because the JVM optimizer has limits and its analysis isn't free. It's better to generate optimized code from the start.
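A sketch of the situation where today's optimizer can already help: each temporary object lives for a single loop iteration and never leaves the method. HotSpot's escape analysis may then scalar-replace it, keeping its two fields in registers instead of allocating it on the heap. Nothing in the source code forces this; it only makes it possible. The class Complex and the method sumOfSquaredMagnitudes are hypothetical.

```java
// Sketch: a short-lived object that never escapes its method.
class EscapeAnalysisDemo {
    // Small immutable class, for illustration only.
    static final class Complex {
        final double real;
        final double imaginary;

        Complex(double real, double imaginary) {
            this.real = real;
            this.imaginary = imaginary;
        }

        double squaredMagnitude() {
            return real * real + imaginary * imaginary;
        }
    }

    // Each Complex is created, used once, and dropped within one iteration.
    // It is never stored, returned, or locked on, so the JIT may skip the
    // heap allocation entirely.
    static double sumOfSquaredMagnitudes(int n) {
        double total = 0;
        for (int i = 1; i <= n; i++) {
            Complex c = new Complex(i, 0);
            total += c.squaredMagnitude();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaredMagnitudes(10)); // 385.0
    }
}
```

If you want to see the effect, compare allocation rates in a profiler with and without the HotSpot flag -XX:-DoEscapeAnalysis. The outcome depends on inlining decisions, which is exactly the kind of limit value types are meant to sidestep.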

Value classes / value objects in a future Java

The draft of the JEP (JDK Enhancement Proposal) also adds immutability. In a nutshell, Java value objects are immutable objects you can't synchronize on. Nor can you use such an object in a class hierarchy.[4] As for equality, there's a twist: the language designers decided to keep the == operator but to define it differently. Two value objects are considered == if their fields have identical values.

This redefinition has unexpected side effects. You can't simply add the new value keyword to your classes. Instead, you need to check if the new semantics of the == operator make a difference.
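Here's a hypothetical example of code that would be affected: a "sentinel" object that marks a missing value and is recognized by identity. Today, a freshly created Point(0, 0) is not == to the sentinel, even though the fields match. Under the value semantics described in the draft, == would compare field values, and the second check would flip to true. Point, SENTINEL, and isUnset are made-up names.

```java
// Sketch: code that relies on == meaning "same instance".
final class SentinelDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Marker meaning "no point set yet", distinguished by identity only.
    static final Point SENTINEL = new Point(0, 0);

    static boolean isUnset(Point p) {
        return p == SENTINEL; // identity comparison today
    }

    public static void main(String[] args) {
        System.out.println(isUnset(SENTINEL));        // true
        System.out.println(isUnset(new Point(0, 0))); // false today
    }
}
```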

The draft mentions another surprising pitfall:

Assumptions about unique ownership of an instance may be violated (for example, an identical instance may be created at two different program points)

I'm not entirely sure what this means, so allow me to make educated guesses. The definition is clear enough. The JVM or the application may clone an object or create a new object with identical values. Traditionally, we consider them different because all these objects have different references (or different addresses in memory). Value types change that: all these objects are identical. However, I wonder what it means at the machine level. I assume it indicates that the JVM is allowed to reuse an existing instance of an object instead of creating a new one. That might also be the rationale behind making value objects immutable. It might even enable the JVM to move the object from main memory to CPU registers. You can't define a pointer to a register, so giving up references helps move objects into the super-fast internal memory of the CPU. If you're not familiar with registers, memory layout, and CPU design, have a look at my primer on assembly language.

You can already opt in to a similar feature (without the register twist). String deduplication reduces memory footprint by letting equal strings share the same backing array. Generally speaking, String is a textbook example of a value class: it's final, it's immutable, it allows for deduplication, and I've never seen any program using a string as a concurrency lock.
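The "identical instance created at two different program points" effect is easy to see with strings. Two equal strings built separately are distinct objects, yet String.intern() maps both to one canonical instance. Note that interning is an explicit API and a related idea, not the same thing as string deduplication (-XX:+UseStringDeduplication), which shares backing arrays behind the scenes. StringIdentityDemo and compare are hypothetical names.

```java
// Sketch: same content, different identities, one canonical interned copy.
class StringIdentityDemo {
    // Returns {a == b, a.equals(b), a.intern() == b.intern()}.
    static boolean[] compare(String text) {
        String a = new String(text); // a fresh instance
        String b = new String(text); // another fresh instance
        return new boolean[] {a == b, a.equals(b), a.intern() == b.intern()};
    }

    public static void main(String[] args) {
        boolean[] r = compare("value");
        System.out.println(r[0]); // false: two distinct objects
        System.out.println(r[1]); // true: same characters
        System.out.println(r[2]); // true: intern() returns one canonical instance
    }
}
```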

Another textbook example of value types is Optional. It's a container, so we're not interested in the identity of the Optional. When it gives up its identity, you won't even notice.

Primitive classes

That leaves primitive classes. They also give up their reference. A primitive object can never be null, and you can't build a tree or a circular graph of primitive objects. The advantage is that you can represent structures and arrays of primitive objects with a streamlined memory model. JEP 401 lists some key properties of primitive classes:

  • Like value types, they don't have an identity.
  • Primitive objects are a flattened sequence of their field values.
  • In particular, they don't have the header of a traditional Java object.
  • Arrays of primitive objects feature a flattened memory layout.
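Today's int and Integer already show the difference the list describes, so they make a useful analogy (only an analogy; primitive classes don't exist in current Java). An int[] stores the values directly and can't contain null, while an Integer[] stores references that default to null. PrimitiveVsBoxedDemo and its methods are hypothetical names.

```java
// Sketch: values stored inline (int[]) versus references to objects (Integer[]).
class PrimitiveVsBoxedDemo {
    static int[] primitives(int size) {
        return new int[size];     // flat block of ints, every element starts at 0
    }

    static Integer[] boxed(int size) {
        return new Integer[size]; // block of references, every element starts as null
    }

    public static void main(String[] args) {
        System.out.println(primitives(3)[0]); // 0
        System.out.println(boxed(3)[0]);      // null
    }
}
```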

We haven't talked about the object header yet. It's a small data structure every instance of a class has. It contains the identity hash code, several bits used for locking, four bits used by the garbage collector (the object's age), and a reference to the class. On a 64-bit HotSpot JVM, the header takes 12 bytes with compressed class pointers (the default) and 16 bytes without them. Since objects are aligned to 8 bytes, even an instance of a class without fields occupies 16 bytes. Getting rid of the header is an attractive goal. The disadvantages are obvious, too. You can't store an object without a header on the heap by itself, because the garbage collector has little or no support for it. However, such an object can be part of a full-featured object on the heap, reducing memory footprint and cache misses significantly. And you can store a primitive object on the stack.

If you want to learn more about headers, Baeldung covers it, Aleksey Shipilёv gives you a deep dive, and of course you can jump directly to the source code of the JVM.

Here's the complete drawing of an array of objects including headers:

Primitive classes are designed to get rid of both the object headers and all the arrows of my drawings. Object headers typically take 8 to 16 bytes, and each pointer takes another 4 to 8 bytes, so I expect a noticeable reduction in memory usage. Plus, the linear memory layout reduces the number of cache misses, helps the CPU to predict memory accesses, and sometimes the JVM may even use vector operations. Primitive objects also reduce the strain on the garbage collector. It's not surprising the Java language team has been diligently working on the topic since at least 2015!
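A back-of-the-envelope calculation makes the difference tangible. The sketch below estimates the size of ten ComplexNumber objects in today's layout versus a fully flattened one. It assumes a 64-bit HotSpot JVM with compressed references and compressed class pointers: 12-byte object header, 16-byte array header, 4-byte references, 8-byte alignment. These are assumptions, and real layouts can differ; a tool such as JOL (Java Object Layout) measures an actual JVM. MemoryEstimate is a hypothetical name.

```java
// Rough memory estimate under the assumptions stated above.
class MemoryEstimate {
    static final int OBJECT_HEADER = 12; // mark word + compressed class pointer
    static final int ARRAY_HEADER = 16;  // mark word + class pointer + length
    static final int REFERENCE = 4;      // compressed reference
    static final int ALIGNMENT = 8;      // objects start at 8-byte boundaries

    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // Today: an array of references plus one separate object per element.
    static long arrayOfObjects(int n, int fieldBytes) {
        long array = align(ARRAY_HEADER + (long) n * REFERENCE);
        long oneObject = align(OBJECT_HEADER + fieldBytes);
        return array + n * oneObject;
    }

    // Flattened: one array header, then the raw field values, no per-element header.
    static long flattenedArray(int n, int fieldBytes) {
        return align(ARRAY_HEADER + (long) n * fieldBytes);
    }

    public static void main(String[] args) {
        int n = 10;
        int fieldBytes = 2 * Double.BYTES; // real + imaginary
        System.out.println(arrayOfObjects(n, fieldBytes)); // 376
        System.out.println(flattenedArray(n, fieldBytes)); // 176
    }
}
```

Under these assumptions, each ComplexNumber object needs 32 bytes (28 rounded up), so the object array totals 56 + 10 × 32 = 376 bytes, while the flat layout needs 16 + 10 × 16 = 176 bytes, less than half.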

Nonetheless, the target memory layout looks attractive enough to keep on working:

Limitations of primitive objects

You can't replace every class with a primitive object because primitive classes have a set of limitations. For example, they can't be null. You can't derive from a primitive class. And - most troubling - read accesses aren't thread safe.

The Valhalla papers are also concerned about "tearing". Consider a primitive object consisting of two fields. Your algorithm reads field A first, and field B immediately after that. However, another thread may overwrite the primitive object with a new value between these two reads. In this case, your algorithm reads an invalid combination of A and B. Immutability solves this issue, and that's probably the reason why value objects are immutable by definition. On the other hand, value types and primitive types are still under development, so everything's subject to change, even immutability.
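Real tearing needs two threads and bad luck, so this sketch simulates the interleaving deterministically in a single thread instead. The reader reads field A, a "writer" then overwrites both fields, and the reader reads field B: the result is a pair that was never written. The contrast case reads a single reference to an immutable record. Java guarantees that reference reads and writes are atomic, so that snapshot stays consistent. TearingDemo, MutablePair, and Pair are hypothetical names.

```java
// Sketch: a simulated torn read versus a consistent snapshot.
class TearingDemo {
    // Mutable pair overwritten field by field, standing in for a
    // multi-field value that is updated in place.
    static final class MutablePair {
        long a;
        long b;
    }

    // Immutable pair: a new state means a new object.
    record Pair(long a, long b) {}

    // Interleaving: reader reads A, writer updates both fields, reader reads B.
    static long[] tornRead() {
        MutablePair shared = new MutablePair();
        shared.a = 1;
        shared.b = 1;                     // consistent state (1, 1)
        long readA = shared.a;            // reader: A = 1
        shared.a = 2;
        shared.b = 2;                     // writer: new state (2, 2)
        long readB = shared.b;            // reader: B = 2
        return new long[] {readA, readB}; // (1, 2) was never written
    }

    // Same interleaving, but the reader copies one reference to an immutable Pair.
    static Pair snapshotRead() {
        Pair[] slot = {new Pair(1, 1)};
        Pair snapshot = slot[0];          // reader: one atomic reference read
        slot[0] = new Pair(2, 2);         // writer: replaces the whole object
        return snapshot;                  // still (1, 1)
    }

    public static void main(String[] args) {
        long[] torn = tornRead();
        System.out.println(torn[0] + ", " + torn[1]); // 1, 2
        System.out.println(snapshotRead());
    }
}
```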

Java Records

I'm so excited about compiler technology I almost forgot to tell you about records. At first glance, Java records are similar to value classes. They are immutable, final data classes.

However, records do have an identity. When value classes are released sometime in the future, we'll probably have "value records" as an option. But for now, the unique selling point of records is the simplified API and, even more important, their semantics. Nicolai Parlog explains it in a short video and in more detail in his blog. Records are ideal tools for storing data. When you see a traditional class, it may process and modify data. Records, in contrast, do not. You can add methods and additional constructors to records, but I believe that's a bad idea because it weakens the idea of a data object. It's better to consider records as immutable carriers of data. If you've got a different use case, use a traditional class.
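A quick sketch shows both sides of a record: the generated equals() and hashCode() compare components, but == still compares identities, and you can still synchronize on a record. Point, RecordIdentityDemo, and compare are hypothetical names.

```java
// Sketch: records get value-based equals()/hashCode() but keep their identity.
class RecordIdentityDemo {
    record Point(int x, int y) {}

    // Returns {p.equals(q), p == q, hash codes equal} for two equal records.
    static boolean[] compare() {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        return new boolean[] {p.equals(q), p == q, p.hashCode() == q.hashCode()};
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println(r[0]); // true: generated equals() compares components
        System.out.println(r[1]); // false: two distinct identities
        System.out.println(r[2]); // true: hashCode() is consistent with equals()
        Point p = new Point(1, 2);
        synchronized (p) {        // allowed, because a record has an identity
            System.out.println("locked on " + p);
        }
    }
}
```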

Wrapping it up

Wow, that's been a deep dive into compiler technology, which is always interesting.

However, the key takeaways of this article are down-to-earth: Records are useful because they improve the semantics of your source code. Use them wisely, and you'll be rewarded with improved readability and lower maintenance cost. Value types are something completely different. They allow the compiler to generate better code. Primitive types are value types on steroids, allowing the compiler to generate code that's close to what an assembly language programmer writes. Unfortunately, both primitive and value types have their sets of limitations, so don't expect too much. You can't use them everywhere.

In any case, records and value types aren't opposites. They're orthogonal. Value records are the best of both worlds: improved semantics and better performance.

Dig deeper

My 2017 version of this article

Jesper de Jong describing Valhalla in 2015/2016

Value Types - infant edition from 2015

Value Types (JEP draft)

JEP 401: primitive classes (preview)

State of Valhalla - Part 1: Road to Valhalla by Brian Goetz in December 2021

State of Valhalla - Part 2: The Language Model

State of Valhalla - Part 3: The JVM Model

  1. I'm omitting the object header. I'll cover it later.
  2. In this article, I'm using the term pointer and reference as synonyms. There are subtle differences, but they're not important here. In a nutshell, I'm calling it a pointer when I'm thinking about bits and bytes, and a reference when I'm thinking about Java objects and the Java type system.
  3. Or maybe I got it wrong seven years ago.
  4. Except for abstract classes. I'm puzzled about that.