Value Types: Revamping Java’s Type System

Value types

This morning Brian Goetz revealed interesting news at the end of his talk at the JAX conference: Oracle is considering extending Java’s type system. Did you ever wonder why Java methods can’t receive objects by value? Why there’s no ArrayList<int> in Java? Why methods return one – and only one – value at a time? Why there are no tuples in Java? Why arrays of objects are created in such a weird way, making them effectively scattered all around the memory? Brian Goetz told us they might fix all of this after adding value types to the language, which allow for an efficient implementation of these ideas. Truth be told, he didn’t really promise value types. In particular, he said Java 9 will almost certainly not include the feature. However, judging from the quality and level of detail of his “State of the Values” paper, I’m positive they are going to make it into a future version of Java.

I’d also like to point out that value types are a revolution in that they require new JVM byte codes. Remember how rarely new byte codes are added: they are obviously considered a valuable resource. If value types are worth spending new byte codes on, chances are the feature itself is worth it.

What’s it all about?

Java contains two different type systems: primitives, such as int and double, and “upper-case” objects. The two type systems are utterly incompatible, a fact lamented time and again. Java 5 put a little plaster on the wound: autoboxing converts primitives back and forth to the corresponding object types. This allows Java programmers to write code like this:

ArrayList<Integer> myList = new ArrayList<>();
myList.add(5);
int sum = 0;
for (int element: myList) {
   sum += element;
}
System.out.println("The sum of the list is " + sum);

Can you imagine what a pain this was back when autoboxing had yet to be invented? Let’s indulge in the good old times:

ArrayList myList = new ArrayList();
myList.add(new Integer(5)); // convert an int to an object
int sum = 0;
for (int index = 0; index < myList.size(); index++) {
   Integer oElement = (Integer) myList.get(index);
   int element = oElement.intValue(); // convert the object back to an int
   sum += element;
}
System.out.println("The sum of the list is " + sum);

Obviously we’ve found a way to deal with the two type systems efficiently. Mission accomplished.

So what the heck does Brian Goetz mean by “Value types enable you to use ArrayList<int> at last”?

Well, exactly the opposite of what Java 5 did. Autoboxing solves the problem by automatically adding a lot of conversions to the byte code, which makes the code much less efficient. Value types, in turn, are about efficiency.
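Those hidden conversions are easy to make visible by writing them out by hand. The sketch below shows what you write and what the compiler actually emits (the class name is just for illustration; `javap -c` on the compiled class shows the real byte code):

```java
import java.util.ArrayList;

public class AutoboxingDemo {
    public static void main(String[] args) {
        ArrayList<Integer> myList = new ArrayList<>();
        myList.add(5);                  // what you write
        myList.add(Integer.valueOf(5)); // what the compiler actually emits
        int first = myList.get(0);      // compiles to myList.get(0).intValue()
        System.out.println(first);      // prints 5
    }
}
```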

Value types are about stripping objects from their identity.

Java objects have an identity!

Putting it in a nutshell, value objects are regular objects with their identity stripped.

Now you’re none the wiser, I guess. What’s an object’s identity?

In a way, object identity is a surprising feature. Actually, it’s a feature almost nobody is aware of. If anything, we notice it because it gets in our way by making Java code clumsy. Consider an integer array:

int[] myArray = {1, 2, 3};

What does it look like in memory? Right: it’s a consecutive sequence of 12 bytes in memory. Chances are the array is put on the call stack, or even – if your CPU has enough registers – in the CPU registers1.

Now for the “upper-case” Integer array:

Integer[] myArray = {1, 2, 3};

That’s a completely different story. The array is still a consecutive sequence of 12 bytes (or 24 on a 64-bit JVM with uncompressed pointers). But it doesn’t contain the numbers. It just contains pointers to the numbers. These, in turn, are objects that may reside anywhere in memory. Autoboxing hides what’s really happening. In fact, I was a little surprised the short-hand code compiles. In Java 1.4 we had to write something like this:

Integer[] myArray = {new Integer(1), new Integer(2), new Integer(3)};

Each array element is allocated individually. That’s a lot of strain on the memory management and the garbage collector. Lots and lots of tiny chunks of memory have to be allocated2. Plus, it’s a lot of wasted memory. The Integer class has been implemented very efficiently: it contains merely a single field, the four-byte int. But there’s the overhead that comes with every Java object. I didn’t mention it when I talked about the int array, but every Java object has an 8- to 16-byte header3. So it all adds up to 48 to 84 bytes. Plus the pointer myArray itself. And there are the static parts of the classes. They’re allocated only once, but they’re still there.

So every object in Java has a unique memory address and can only be accessed via a pointer. Java objects have an equals() and a hashCode() method, and they can be used as locks in a synchronized block. This is called “object identity”.
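Identity is easy to observe: two objects can hold the same value and still be distinct, and any of them can serve as a lock. A minimal demo (the class name is made up for this post; the deprecated Integer constructor is used on purpose to force two distinct objects):

```java
public class IdentityDemo {
    public static void main(String[] args) {
        Integer a = new Integer(1000); // two distinct objects ...
        Integer b = new Integer(1000); // ... holding the same value
        System.out.println(a.equals(b)); // true  - same value
        System.out.println(a == b);      // false - different identities
        // identity also makes every object usable as a lock:
        synchronized (a) {
            System.out.println("locked on a");
        }
    }
}
```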

We need roughly 50 to 100 bytes just to store an array of three numbers. In other words, to store 12 bytes of payload. Brian Goetz says we can do a lot better.

Put’em on the stack!

Mind you, good old C simply dumps the three numbers on the stack. Java can’t do so because there are operations relying on object identity. Among others, that’s equals(), hashCode(), wait() and synchronized.

All we have to do is either forbid using these methods or find alternative implementations. That’s not as simple as it seems: the “State of the Values” document has 17 pages in print.

I don’t want to dive further into the technical details of value types. I’d rather point you to the original document. Instead, I’d like to answer the most obvious question:

What’s in store for us?

The “State of the Values” document lists a number of features that can be implemented using value types. Of course that doesn’t mean these features will be implemented, but value types make them much simpler to implement.

Java methods return at most a single value because the language has no notion of tuples. Everything in Java has to have an identity, and you simply can’t implement tuples efficiently without dropping that identity. Of course you can implement tuples and multiple return values on the JVM – Scala does so – but it’s not really efficient. You can’t simply put the return value (int, int) in two CPU registers, or put it on the stack. You have to wrap it in an object. Value types, in turn, can be put on the stack, making it possible to implement multiple return values in a very efficient way.
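Today’s workaround is exactly that wrapper object. A sketch of how two return values have to be emulated right now (IntPair and divMod are hypothetical names invented for this example):

```java
public class TupleDemo {
    // Today, two return values must be wrapped in a heap-allocated object.
    static final class IntPair {
        final int quotient;
        final int remainder;
        IntPair(int quotient, int remainder) {
            this.quotient = quotient;
            this.remainder = remainder;
        }
    }

    // Returns quotient and remainder at once - at the cost of an allocation.
    static IntPair divMod(int a, int b) {
        return new IntPair(a / b, a % b);
    }

    public static void main(String[] args) {
        IntPair result = divMod(17, 5);
        System.out.println(result.quotient + " " + result.remainder);
    }
}
```

With value types, a pair like this could live in two CPU registers instead of on the heap.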

Value type arrays have predictable memory locations. That allows for a lot of optimizations: the JVM can iterate over an array much faster if it simply has to add a fixed number to a pointer. Plus, it doesn’t have to leave the cache line. Today, each array element is a pointer leading to an unpredictable memory location. It has to be followed, potentially resulting in a lot of cache misses. The situation gets worse on CPUs featuring vector instructions. Value types allow the JVM to make use of such advanced vector instructions.
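The two loops below look identical in source code, but they differ in memory access: the primitive loop reads consecutive bytes, while the boxed loop chases a pointer per element (a sketch; the class name is invented for this post):

```java
public class ArrayLayoutDemo {
    public static void main(String[] args) {
        int[] primitives = {1, 2, 3};  // 3 ints, side by side in memory
        Integer[] boxed  = {1, 2, 3};  // 3 pointers to objects living elsewhere

        int sum1 = 0;
        for (int value : primitives) {
            sum1 += value;             // sequential reads, cache friendly
        }

        int sum2 = 0;
        for (Integer value : boxed) {
            sum2 += value;             // each element dereferences a pointer
        }

        System.out.println(sum1 + " " + sum2);
    }
}
```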

Java’s numeric types are badly aligned with the CPU’s types. In many cases they are different. For instance, on x86 processors the floating point unit deals with 80-bit numbers, while Java’s float values have 32 bits and doubles have 64 bits. That probably4 means Java can’t use the processor’s FPU directly. If this is true, Java’s floating point performance can be improved considerably. Even if it does use the FPU, a lot of conversion is needed, which does the performance no good.

Your program might run a lot faster with value types. Can you imagine how many optimizations are omitted just because you might use an object to synchronize your threads? You almost never will, but the JVM can’t know that, so it prefers the conservative approach. Value types can’t be synchronized on (they are defined that way), so chances are your programs will run a lot faster. In fact, value objects can often be held in two or three CPU registers. Regular Java objects can’t: you can’t acquire a lock on a CPU register.
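The reason the JVM has to be conservative is that code like the following is legal for any object whatsoever, so every object needs a monitor (class name invented for this example):

```java
public class LockDemo {
    public static void main(String[] args) throws InterruptedException {
        Object anyObject = new Object(); // every Java object carries a monitor
        Thread worker = new Thread(() -> {
            synchronized (anyObject) {   // the JVM must support this for all objects
                System.out.println("worker holds the lock");
            }
        });
        synchronized (anyObject) {
            worker.start();              // worker now blocks on the monitor
            System.out.println("main holds the lock");
        }                                // lock released here, worker proceeds
        worker.join();
    }
}
```

A value type, by definition, could never appear in such a synchronized block, which frees the JVM from keeping it addressable in memory.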

Conclusion

Demanding an ArrayList<int> doesn’t sound like a big deal, but it is. There are at least two different strategies. One is to wrap everything in an object and let the compiler or the JVM figure out how to unwrap it into really efficient code. That’s one of Scala’s strategies. Don’t get me wrong: as far as I know, the Scala compiler does a really good job when it comes to dealing with primitives. The resulting byte code is as good as hand-written Java code. The other strategy is to fix or extend the JVM itself. That’s the strategy Oracle is considering for a Java version to come, and it’s a strategy offering a whole lot of new opportunities.

One of my readers told me there’s also a third strategy which is used by Scala. According to Srdjan Mitrovic Scala uses dedicated data types for arrays of primitive types. This strategy allows for very efficient memory usage. However, it doesn’t optimize the memory usage of complex types (or arrays of complex types), so even Scala might benefit from value types.

As I mentioned above, you have to take the story with a grain of salt: at the moment it’s unclear when – and maybe even if – value types are going to be added to Java and the JVM.

If this cursory article piqued your curiosity, carry on to read the full story by the Oracle developers themselves.


Further reading

State of the values by John Rose, Brian Goetz and Guy Steele
Java documentation on primitive data types
StackOverflow on Java array sizes


  1. Actually, I don’t think that’s true, because the array itself has an identity, preventing it from being put on the stack.
  2. unless the compiler or the JVM recognizes what’s happening and starts to optimize. As far as I know, this is not the case.
  3. including the int array, raising the memory footprint from 12 to 20-28 bytes
  4. I didn’t analyze the implementation. I’m just making an educated guess.

6 thoughts on “Value Types: Revamping Java’s Type System”

  1. Srdjan Mitrovic

    You said that there are two different strategies (Scala and JVM). There is a third strategy: specialization!
    For example, ArrayList<int> would compile to a completely different type with a different signature than the raw ArrayList. That specialization would use an int array instead of an object array. Scala has that option. There are only 8 primitive types, so we could have specializations for these 8 primitive types.

  2. Pingback: Le Touilleur Express » Blog Archive » Devoxx 2014 – mercredi

  3. Tomas Gradin

    This:
    Integer[] myArray = { 1, 2, 3 };

    will actually compile to this:

    ICONST_3
    ANEWARRAY Integer
    DUP
    ICONST_0
    ICONST_1
    INVOKESTATIC Integer.valueOf (int) : Integer
    AASTORE
    DUP
    ICONST_1
    ICONST_2
    INVOKESTATIC Integer.valueOf (int) : Integer
    AASTORE
    DUP
    ICONST_2
    ICONST_3
    INVOKESTATIC Integer.valueOf (int) : Integer
    AASTORE

    Integer.valueOf(int) returns pre-cached objects for the values -128 to 127 as well, so it’s not really the case that lots of tiny chunks of memory get allocated here 🙂

    Using new Integer(int) is a completely different story though!

    1. Stephan Rauh Post author

      You’re right, of course. Java already does some nice tricks to gain efficiency. However, Brian Goetz was referring to arrays of objects in his talk. Maybe my example is a bit misleading because Integers can be converted to ints. You’ve chosen a native array, which is the most efficient way to store data in every language I know. But what about popular types such as java.util.ArrayList? What about a native array storing complex objects like, say, a CustomerBean? These are far more common scenarios than the native array of integers.

  4. w0rp

    Your claim about not being able to use the FPU isn’t correct. ‘float’ is a single-precision 32-bit binary floating point number, and ‘double’ is a double-precision 64-bit binary floating point number. The same is true of C. The 80-bit x86 floating point number type can typically be used in C with `long double`, which may be just a double-precision floating point number on other architectures, and is something Java does not support.

