Recent Improvements of Java’s String Implementation

Posted in Java 8

Three days ago, I wrote an article about Strings in Java 8. I claimed they’d rewritten java.lang.String from scratch, supporting a 64-bit encoding that can be processed by the processor’s FPU. Of course, it was an April Fools’ prank. So I was pretty surprised to learn there really are improvements to Java’s string implementation. They aren’t even subtle.

Java 7 Update 6 introduced a new implementation of java.lang.String that’s better suited for large numbers of Strings. As you may or may not know, the former implementation had been designed with optimizing the substring() method in mind. It was super-fast because it didn’t copy the characters of the substring. Instead, the new String shared the character array of the old String and merely stored an offset and a length into it. Sometimes this prevented effective garbage collection, because a tiny substring could keep a huge character array alive, and it made the intern() method extremely expensive.
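To make the sharing idea concrete, here is a minimal sketch of how such a design can work. SharedString is a hypothetical class I made up for illustration, not the JDK source; the field names value, offset and count and the helper retainedChars() are assumptions of this sketch.

```java
import java.util.Arrays;

/** Hypothetical model of a substring-sharing string. Not the JDK source. */
final class SharedString {
    private final char[] value; // backing array, possibly shared with other instances
    private final int offset;   // index of this string's first character in value
    private final int count;    // number of characters belonging to this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Constant time: no characters are copied, only the window changes. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    int length() {
        return count;
    }

    /** Size of the array this instance keeps reachable (illustration only). */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        SharedString big = new SharedString(new String(chars));

        SharedString tiny = big.substring(0, 3);
        System.out.println(tiny.length());        // 3
        System.out.println(tiny.retainedChars()); // 1000000
    }
}
```

Even if big becomes unreachable, tiny still references the million-character array, so the garbage collector cannot reclaim it. That is the garbage collection problem in a nutshell.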

So they opted to implement a new version that’s simpler. As a result, substring() now has to copy the characters it returns. In the age of multi-gigabyte PCs and servers, that’s more efficient than trying to save a few bytes.
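A copying substring() might look roughly like the following sketch. Again, CopyingString is a hypothetical class for illustration only, and retainedChars() is a helper I invented to show how much memory each instance holds on to; the real java.lang.String differs in many details.

```java
import java.util.Arrays;

/** Hypothetical model of a string whose substring() copies. Not the JDK source. */
final class CopyingString {
    private final char[] value; // private array, never shared with other instances

    CopyingString(String s) {
        this(s.toCharArray());
    }

    private CopyingString(char[] value) {
        this.value = value;
    }

    /** Linear time in the substring length: the characters are copied. */
    CopyingString substring(int begin, int end) {
        if (begin < 0 || end > value.length || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new CopyingString(Arrays.copyOfRange(value, begin, end));
    }

    int length() {
        return value.length;
    }

    /** Size of the array this instance keeps reachable (illustration only). */
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        char[] chars = new char[1_000_000];
        Arrays.fill(chars, 'x');
        CopyingString big = new CopyingString(new String(chars));

        CopyingString tiny = big.substring(0, 3);
        System.out.println(tiny.length());        // 3
        System.out.println(tiny.retainedChars()); // 3
    }
}
```

The price is an extra copy per substring() call. The benefit is that every string owns exactly the characters it needs, and no offset or length field is required.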

In Java 7 they also abandoned 8-bit Strings, the optional compressed representation that Java 6 offered through the -XX:+UseCompressedStrings flag. Now 16-bit char arrays (UTF-16) are always used. That’s funny because it almost matches my April prank, where I claimed they abandoned the 16-bit character encoding in favor of a 64-bit encoding. I wonder if java.lang.String makes use of the FPU?

Read the full story in Attila Balazs’s article on the state of strings in Java.
