
Value types

This morning Brian Goetz revealed interesting news at the end of his talk at the JAX conference: Oracle is considering extending Java's type system. Did you ever wonder why you can't pass an object by value to a Java method? Why there's no Array<int> in Java? Why methods return one - and only one - value at a time? Why there are no tuples in Java? Why object arrays are laid out in such a weird way that their elements end up scattered all over the memory? Brian Goetz told us they might fix all this after adding value types to the language, which allow for an efficient implementation of these ideas. Truth be told, he didn't really promise value types. In particular, he said Java 9 will almost certainly not include the feature. However, judging from the quality and level of detail of the "State of the Values" paper, I'm positive they are going to make it into a future version of Java.

I'd also like to point out that value types are a big deal in that they require new JVM byte codes. Remember how rarely new byte codes are added to the JVM; they are obviously considered a valuable resource. If value types are worth spending new byte codes on, chances are they're worth having.

What's it all about?

Java contains two different type systems: primitives, such as int and double, and "upper-case" objects, such as Integer and Double. The two type systems are utterly incompatible, a fact lamented time and again. Java 5 applied a little plaster to the wound: autoboxing converts primitives back and forth to the corresponding object types. This allows Java programmers to write code like this:

ArrayList<Integer> myList = new ArrayList<>();
myList.add(5);
int sum = 0;
for (int element : myList) {
    sum += element;
}
System.out.println("The sum of the list is " + sum);

Can you imagine what a pain this was back when autoboxing had yet to be invented? Let's indulge ourselves in the good old times:

ArrayList myList = new ArrayList();
myList.add(new Integer(5)); // convert an int to an object
int sum = 0;
for (int index = 0; index < myList.size(); index++) {
    Integer oElement = (Integer) myList.get(index);
    int element = oElement.intValue(); // convert the object back to an int
    sum += element;
}
System.out.println("The sum of the list is " + sum);

Apparently, we've found a convenient way to deal with the two type systems. Mission accomplished.

So what the heck does Brian Goetz mean by "Value types enable you to use ArrayList<int> at last"?

Well, precisely the opposite of what Java 5 did. Autoboxing solves the problem by silently adding a lot of conversions to the byte code, which makes the code much less efficient. Value types, in turn, are all about efficiency.

Value types are about stripping objects of their identity.

Java objects have an identity!

Putting it in a nutshell, value objects are regular objects with their identity stripped.

Now you're none the wiser, I guess. What's an object's identity?

In a way, object identity is a surprising feature. Actually, it's a feature almost nobody is aware of. If anything, we notice it because it gets in our way by making Java code clumsy. Consider an integer array:

int[] myArray = {1, 2, 3};

What does it look like in memory? Exactly the way you'd expect: it's a consecutive sequence of 12 bytes in memory. Chances are the array is put on the call stack, or even - if your CPU has enough registers - kept in CPU registers.[1]

Now for the "upper-case" Integer array:

Integer[] myArray = {1, 2, 3};

That's a completely different story. The array is still a consecutive sequence of 12 bytes (possibly 24, depending on whether the JVM uses 4-byte or 8-byte references). But it doesn't contain the numbers. It just contains pointers to the numbers. These, in turn, are objects that may reside anywhere in memory. Autoboxing hides what's really happening. In fact, I was a little surprised the short-hand code compiles. In Java 1.4 we had to write something like this:

Integer[] myArray = {new Integer(1), new Integer(2), new Integer(3)};

Each array element is allocated individually. That's a lot of strain on the memory management and the garbage collector. Lots and lots of tiny chunks of memory have to be allocated.[2] Plus, it's a lot of wasted memory. The Integer class has been implemented very efficiently: it contains merely a single value, the four-byte int. But there's the overhead that comes with every Java object. I didn't mention it when I talked about the int array, but every Java object has an 8 to 16-byte header.[3] So the three boxed numbers take three times 8 to 16 bytes of header plus 4 bytes of payload each - that's 36 to 60 bytes - and the array of three references adds another 12 to 24 bytes, which is where the total of 48 to 84 bytes comes from. Plus the pointer myArray itself. And there are the static components of the classes. They're allocated only once, but they're still there.

So every object in Java has a unique memory address and can only be accessed via a pointer. Java objects have equals() and hashCode() methods, and they can be used as locks in a synchronized block. All of this is what's called "object identity".
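
A tiny demo makes the notion tangible. The class name IdentityDemo is made up for this example; everything else is plain, everyday Java:

public class IdentityDemo {
    public static void main(String[] args) {
        Integer a = new Integer(1000); // each new creates a distinct object
        Integer b = new Integer(1000);

        System.out.println(a.equals(b)); // true:  same value
        System.out.println(a == b);      // false: two different identities (memory addresses)

        // Identity is also what allows an object to serve as a lock:
        synchronized (a) {
            System.out.println("Only threads synchronizing on exactly this instance are mutually excluded.");
        }
    }
}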

We need roughly 50 to 100 bytes just to store an array of three numbers - in other words, to store 12 bytes of payload. Brian Goetz says we can do a lot better.

Put'em on the stack!

Mind you, good old C simply dumps the three numbers on the stack. Java can't do so because there are operations relying on object identity: equals(), hashCode(), wait(), and synchronized, among others.

All we have to do is either forbid using these operations or find alternative implementations. That's not as simple as it sounds: the "State of the Values" document runs to 17 pages in print.
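
To give you a rough idea, the proposal sketches declarations along the following lines. Mind you, this is hypothetical syntax to be taken with a grain of salt - the exact shape of the feature is anything but final:

// Hypothetical syntax - not valid Java today.
// A value type gives up its identity: no meaningful ==, no locking,
// no mutation after construction.
value class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

// The JVM would then be free to flatten a Point[] into one contiguous
// block of ints instead of an array of pointers to scattered objects.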

I don't want to dive further into the technical details of value types. I'd rather point you to the original document. Instead, I'd like to answer the most obvious question:

What's in store for us?

The "state of the values" document lists a couple of features that can be implemented using value types. Of course, that doesn't mean the features will be implemented, but it's simpler to implement them.

Java methods return at most a single value because the language has no notion of tuples. Everything in Java has to have an identity, and you just can't implement tuples efficiently without dropping that identity. Of course, you can implement tuples and multiple return values on the JVM - Scala does - but it's not really efficient. You can't simply put a return value of type (int, int) in two CPU registers, or put it on the stack. You have to wrap it in an object. Value types, in turn, can be put on the stack, making it possible to implement multiple return values in a very efficient way.
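
Here's a sketch of what that means today. The names MinMaxDemo, MinMax and minMax are made up for the example; the point is the wrapper object we're forced to allocate just to return two ints:

public class MinMaxDemo {

    // Today, a small identity-bearing wrapper class is the only way
    // to hand two ints back to the caller.
    static class MinMax {
        final int min;
        final int max;
        MinMax(int min, int max) { this.min = min; this.max = max; }
    }

    static MinMax minMax(int[] values) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new MinMax(min, max); // a heap allocation just to return two ints
    }

    public static void main(String[] args) {
        MinMax result = minMax(new int[] {3, 1, 4, 1, 5});
        System.out.println(result.min + " .. " + result.max);
    }
}

If MinMax were a value type, the JVM could - at least in principle - return it in two CPU registers instead of allocating it on the heap.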

Value type arrays have predictable memory locations. That allows for a lot of optimizations. JVMs can iterate over such arrays much faster if they merely have to add a fixed offset to a pointer. Plus, they don't have to leave the cache line as often. Today, each element of an object array is a pointer leading to an unpredictable memory location, and that pointer has to be followed, potentially resulting in a lot of cache misses. The situation gets worse on CPUs featuring vector instructions. Value types would allow the JVM to make use of these advanced vector instructions.
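
The following sketch illustrates the difference - it's an illustration, not a benchmark. Summing the int[] walks one contiguous block of memory; summing the Integer[] chases one pointer per element:

public class ArrayLayoutDemo {
    public static void main(String[] args) {
        int[] primitives = new int[1_000_000];
        Integer[] boxed = new Integer[1_000_000];
        for (int i = 0; i < primitives.length; i++) {
            primitives[i] = i;
            boxed[i] = i; // each element becomes a separate object somewhere on the heap
        }

        long sum = 0;
        for (int value : primitives) { // contiguous memory, cache-friendly
            sum += value;
        }

        long boxedSum = 0;
        for (Integer value : boxed) {  // one pointer dereference per element
            boxedSum += value;
        }

        System.out.println(sum + " " + boxedSum);
    }
}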

Java's numeric types are poorly aligned with the CPU's native types. In many cases, they are simply different. For instance, on x86 processors the floating point unit deals with 80-bit numbers, while Java's float values have 32 bits and doubles have 64 bits. That probably[4] means Java can't use the processor's FPU directly. If this is true, Java's floating point performance can be improved considerably. Even if it does use the FPU, there's a lot of conversion needed, which does the performance no good.

Your program might run a lot faster with value types. Can you imagine how many optimizations are omitted just because you might use an object to synchronize your threads? You rarely will, but the JVM can't know that, so it prefers the conservative approach. Value types can't be synchronized on (they are defined that way), so chances are your programs will run a lot faster. In fact, value objects can often be held in two or three CPU registers. Regular Java objects can't: you can't acquire a lock on a CPU register.

Update Dec 25, 2017: Minimal value types

In the meantime, the Java team have developed an early prototype called "minimal value types" (MVT). The idea is to implement a subset of the feature to see how the idea works in practice. You can already download an early access version. Also see the project page for more details.

Conclusion

Demanding an ArrayList<int> doesn't sound like a big deal, but it is. There are at least two different strategies. One is to wrap everything in an object and let the compiler or the JVM figure out how to unwrap it into really efficient code. That's one of the strategies of Scala - and don't get me wrong: as far as I know, the Scala compiler does an excellent job when it comes to dealing with primitives. The resulting byte code is as good as hand-written Java code. The other strategy is to fix or extend the JVM itself. That's the strategy Oracle is considering for a Java version to come, and it's a strategy offering a whole lot of new opportunities.

One of my readers told me there's also a third strategy, which is used by Scala. According to Srdjan Mitrovic, Scala uses dedicated data types for arrays of primitive types. This strategy allows for very efficient memory usage. However, it doesn't optimize the memory usage of complex types (or arrays of complex types), so even Scala might benefit from value types.

As I mentioned above, you have to take the story with a grain of salt: at the moment it's unclear when - and maybe even if - value types are going to be added to Java and the JVM.

If this cursory article piqued your curiosity, carry on to read the full story by the Oracle developers themselves.


Further reading

"State of the Values" by John Rose, Brian Goetz, and Guy Steele

Java documentation on primitive data types

StackOverflow on Java array sizes


  1. Actually, I don't think that's true, because the array itself has an identity, which prevents it from being put on the stack.↩
  2. Unless the compiler or the JVM recognizes what's happening and starts to optimize. As far as I know, this is not the case.↩
  3. Including the int array, which raises its memory footprint from 12 to 20-28 bytes.↩
  4. I didn't analyze the implementation. I'm just making an educated guess.↩
