Category Archives: Performance

Newsflash: Sometimes Using Float is Faster Than Using Int in Java

Performance considerations in the eighties

Time and again, I’m surprised by the performance of modern CPUs. Actually, the speed boost astonishing me most took place a decade ago, give or take a few years. Before that, floating point operations were a lot slower than integer operations. When I learned my first assembly language, the 6502 processor was state of the art. This processor was so simple that you had to write a program to implement integer multiplication. Multiplying floating point numbers was an even longer program. I’ve forgotten the actual numbers, but as a rule of thumb, multiplying two integers was ten times slower than adding them, and multiplying floating point numbers was another order of magnitude slower.

… compared to 2016

Nowadays, an x86 CPU can multiply four floating point numbers in a single CPU cycle (under optimal conditions). In theory, the fastest possible operation takes one CPU cycle, so it boils down to saying that multiplying floating point numbers is every bit as fast as adding integers. The complexity of the algorithm hasn’t gone away, so this is absolutely stunning. Not only have hardware designers managed to implement the complex algorithm in hardware, they’ve also managed to implement it in a fashion resembling parallel programming. I suppose the final algorithm isn’t that complicated. Actually, I’ve got an idea what it looks like. But it took hardware designers years to get there, so it obviously wasn’t low-hanging fruit.

What about Java?

As Grey Panther shows in his article, the effect also shows up in Java programs. Under certain circumstances, integer operations can be slower than floating point operations. I don’t think that holds true for individual operations, because in that case the latency induced by the CPU pipeline plays a major role. But when you do a lot of floating point operations, the JIT compiler and its optimizations kick in, allowing for efficient CPU use. The net result is that it may pay to prefer floats over integers.
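If you want to get a feeling for the effect yourself, here’s a minimal sketch (class and method names are mine, and a naive loop like this is easily distorted by JIT warm-up and dead-code elimination – use a harness like JMH for serious measurements):

```java
// Naive micro-benchmark sketch comparing int and float multiplication.
// Caveat: this only illustrates the idea; use JMH for real benchmarks.
public class FloatVsInt {

    static long multiplyInts(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i * 3;            // integer multiplication
        }
        return sum;
    }

    static float multiplyFloats(int n) {
        float sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i * 3.0f;         // floating point multiplication
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;

        long t0 = System.nanoTime();
        long intResult = multiplyInts(n);
        long intTime = System.nanoTime() - t0;

        t0 = System.nanoTime();
        float floatResult = multiplyFloats(n);
        long floatTime = System.nanoTime() - t0;

        System.out.println("int:   " + intTime + " ns (result " + intResult + ")");
        System.out.println("float: " + floatTime + " ns (result " + floatResult + ")");
    }
}
```

Don’t read too much into a single run – the point is merely that the two loops end up in the same ballpark, which would have been unthinkable on a 6502.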

Wrapping it up

However, what’s more important is that every data type is blazing fast. You can choose the data type that suits your problem. In earlier times, performance considerations often dictated the choice of data type. Luckily, that’s a thing of the past.

Dig deeper

Benchmarking the cost of primitive operations on the JVM
Intel® 64 and IA-32 Architectures Optimization Reference Manual

OmniFaces CombinedResourceHandler Gives Your Application a Boost

OmniFaces 2.1 is soon to be released. Time to present a useful OmniFaces component which has been improved in version 2.1: the CombinedResourceHandler.

The CombinedResourceHandler reduces page load times significantly by combining the countless resource files of a JSF page into one or two larger files, which can be loaded and processed much more efficiently. Since version 2.1, these files can optionally be cached on the server side, giving the application another boost. Server side caching reduces the time needed to display the BootsFaces showcase by a second. If that doesn’t sound like much: believe me, your customers will notice the difference. It makes a difference whether the page takes one second or two to display.
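For reference, activating the handler takes a single entry in your faces-config.xml – this is the standard registration described in the OmniFaces documentation:

```xml
<application>
    <resource-handler>org.omnifaces.resourcehandler.CombinedResourceHandler</resource-handler>
</application>
```

After that, OmniFaces combines the CSS and Javascript resources of each page automatically.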
Continue reading

A Java Programmer’s Guide to Assembler Language

Imagine a programming language without variables. A language without function parameters and return values. A language without floating point numbers, maybe even without multiplication and division. A language without type casting. Without types, actually. Not to mention classes or objects. Join me in a journey to the world of assembler languages.
Continue reading

Java 8 Update 20: String Deduplication

Remember my last April prank? I spun a yarn about the great way Oracle managed to squeeze every conceivable String into a 64-bit number. The funny thing is my vision has sort of come true. Of course, not the way I claimed. That was nonsense meant to be easily seen through (“There are only so many Strings a gal or a guy can think of – so all you have to do is assign each of them a 64-bit number”). But it’s true that a lot of work at Oracle is dedicated to optimizing String management.

Java 7 Update 6 improved the speed of String.substring() by sacrificing a little memory efficiency (see my article Recent Improvements of Java’s String Implementation). Java 8 Update 20 takes the opposite approach: it sacrifices a little CPU efficiency in order to reduce the memory footprint. In the long run this should reduce the strain on the CPU, too. In other words: most real-world programs should run faster.

Everybody who analyzes a Java program in a profiler can’t avoid noticing: Java programs create incredible quantities of character arrays. Most of them are part of String objects. Java represents its Strings as objects, meaning there’s a pointer to a character array. That’s not exactly the most efficient way to represent a sequence of characters. It’s just what you get if you want to represent a String as an object instead of defining it as a native primitive type. However, the developers of Java became aware of the problem a long time ago, so they invented the String.intern() method. In general, it’s a bad idea to call this method yourself (because you’re trying to outperform the JVM’s optimization), but sometimes it reduces your application’s memory footprint tremendously.
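A quick sketch of what intern() does – the identity comparisons are the whole point here:

```java
public class InternDemo {
    public static void main(String[] args) {
        // Two distinct String objects with equal content:
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a == b);                   // false: two different objects
        System.out.println(a.equals(b));              // true: same content
        // intern() returns the canonical instance from the String pool,
        // so both calls yield the very same object:
        System.out.println(a.intern() == b.intern()); // true
    }
}
```

Every interned duplicate is one character array less on the heap.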

Putting it in a nutshell, String deduplication is a clever way of calling String.intern() as part of the garbage collection. Since Java 8 U20 the garbage collector recognizes duplicate Strings and merges them. Or rather, it merges the underlying character arrays, as our reader LukeU kindly remarks in the comment section. Replacing the String objects themselves bears the risk of unwanted side effects, so calling String.intern() manually still frees a bit more memory. This approach costs some CPU power, but that shouldn’t be much of a concern because the garbage collector runs in its own thread. Plus, in the long run the reduced memory footprint makes the garbage collector run faster.

There’s a discussion on reddit indicating String deduplication really works, but sometimes you have to adjust the JVM parameters. I suppose that’s one of the reasons why the feature isn’t activated by default yet. You have to activate it manually by starting the JVM with the parameters

-XX:+UseG1GC -XX:+UseStringDeduplication

Still, I sometimes wonder why a Java String has to be an object. Back in the seventies or the eighties, when the first languages became powerful enough to represent a String by means of the language instead of defining it as a primitive type, it quickly became fashionable to do so. Probably there are other reasons, but I always had the impression language designers considered this a sign of their language’s maturity. But being able to express a String as a library object doesn’t mean you have to do so. If Java were to use zero-terminated character arrays (the way C does) or if it were to use a character array with a preceding length byte1, hardly any developer would notice. But the implementation gets rid of a pointer, simplifying memory management and garbage collection. The only convincing advantage of making Strings part of the language libraries is the ability to derive custom classes. Sadly, the Java developers prohibited this very early in Java’s history by making String a final class. They did it for a good reason – but still, it’s a pity. Groovy’s GStrings show what you can do when you allow deriving from String.

That said, I’d like to point you to the Java Performance Tuning Guide. They’ve written an in-depth article about String deduplication.

Dig deeper:

Java Performance Tuning Guide on String deduplication
discussion on reddit on the topic (well, one of them – possibly the most interesting one)

  1. or a 64-bit word in the age of Gigabyte memories – and the characters should be at least 16-bit to support UTF-16. I used “byte” and “character” as a figure of speech. Is there a catchy word for 16 or 32-bit integers?

BabbageFaces 1.0 RC2 Now Available for Download

My guts tell me BabbageFaces is ready to be released into the wild. Most of the time that’s a reliable measure. Today I’ve tested it with the new PrimeFaces 5 showcase. BabbageFaces passed the test without problems. There’s also a decent showcase example containing most of the elements of a typical business application, running both on Apache MyFaces and Mojarra. I’m optimistic all that’s missing is a couple of source code comments, tidying and documentation to call it a final version.

The showcase demo also shows BabbageFaces allows you to stop scattering ids all around your JSF code just to make AJAX efficient. BabbageFaces compares the response with the current DOM tree and sends only the minimum set of changes to the clients. If network load is giving you headaches, BabbageFaces may be your tool of choice. It reduces network traffic. At the same time it simplifies the programming model.
Continue reading

Newsflash: Can You Rely on System.nanoTime()?

If you’re like me, you’re concerned about your application’s performance. So you’re familiar with profiling your application.

The simplest way to measure your application’s performance is to time how long a method call takes. Most of you probably know System.currentTimeMillis(), but there’s a better alternative:

public long measure() {
  long startTime = System.nanoTime();
  // ... run the code you want to measure here ...
  return System.nanoTime() - startTime;
}

System.nanoTime() is a great function, but one thing it’s not: accurate to the nanosecond. The accuracy of your measurement varies widely depending on your operating system, your hardware and your Java version. As a rule of thumb, you can expect microsecond resolution (and a lot better on some systems).
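Given that resolution, it’s a good idea to measure a task many times and look at the average instead of trusting a single reading. A minimal helper might look like this (the class and method names are mine):

```java
public class Timing {

    // Runs the task repeatedly and returns the average duration in nanoseconds.
    // Averaging over many runs smooths out timer granularity and OS jitter.
    static long averageNanos(Runnable task, int repetitions) {
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / repetitions;
    }

    public static void main(String[] args) {
        long avg = averageNanos(() -> Math.sqrt(42), 1_000_000);
        System.out.println("average: " + avg + " ns");
    }
}
```

Even this only gets you so far – warm-up effects and JIT compilation still distort the first thousands of iterations, which is why dedicated harnesses exist.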

Alexey Shipilev wrote a great article on the dos and don’ts of measuring performance. Among other things he shows how multithreading can spoil the accuracy of performance measurements, reducing the resolution of System.nanoTime() to 15 milliseconds under adverse conditions.

Newsflash: React Speeds Up AngularJS Rendering

Today I’ve read about a small but interesting framework called React.js that convinced me to start a new series on this blog. Newsflashes are small articles, just two or three sentences, describing an interesting idea and providing a link to read on. They are less thoroughly researched than my full-fledged articles. Instead, I’ll go with my guts to choose interesting bits of information.

React is a lightweight Javascript framework focusing on the UI. According to the project page, a virtual DOM difference algorithm makes it very fast. Thierry Nicola describes in his article how to combine AngularJS and React to make your AngularJS application faster.

While the effect is impressive, chances are you’re going to benefit from React without having to use it yourself. My bet is that many frameworks, possibly even browsers, are going to use a virtual DOM by default.

AngularFaces 2: Even Better Steroids for JSF

AngularJS and other fancy Javascript frameworks are stirring up the Java world. The Java world has been dominated by server-side frameworks for more than a decade now. Times they are changing. Now everybody’s moving to the client side. The server’s reduced to a faint memory, such as a REST service.

Everybody? Not quite. A lot of developers want to steer a middle course. That’s AngularFaces for you.

By the way, the interest in AngularFaces is tremendous. The AngularFaces announcement has become the second most-read article of this blog, being read a hundred times each day.

So I decided AngularFaces needs a strong foundation. You know, the first version hadn’t really been designed carefully. It just happened. Originally it was just a little experiment. Granted, it works fine, but it’s difficult to add new features to it. The second version is a complete rewrite designed for the future. AngularFaces 2.0 has a lot of advantages over its predecessor.

And the nice thing about AngularFaces 2.0 is it’s already available as a preview version on GitHub.

Update Sept 16, 2014
Actually, the final version of AngularFaces 2.0 looks completely different from the AngularFaces 2.0 I envisioned four months ago. It doesn’t define components; it’s a plugin to existing components. In other words, it adds functionality to most components of Mojarra, MyFaces, PrimeFaces and probably (although I haven’t run a test yet) to most other JSF widget libraries. Nonetheless I think the ideas described in this article are good ideas worth implementing. The AngularFaces widgets have moved to a project called AngularFacesWidgets.
Have a look at the tutorial if you’re interested in AngularFaces 2.0 and above.
Continue reading

Tweaking the Hot Spot Compiler’s Settings

A couple of days ago Chris Newland published two interesting articles. He’s looking at the JVM’s Hot Spot compiler settings, looking for possible optimizations.

Regular readers of my blog may remember I analyzed the JVM optimizations two years ago. At the time I found out you can help the optimizing compiler get up to speed, but it doesn’t matter in the long run. After 10,000 invocations methods are compiled to machine code, and after that there’s nothing left to optimize. However, if you happen to work on one of the vast number of algorithms that call methods frequently, but less than – say – 1,000 times, low-level optimizations may help.
Continue reading

Java 8: Major Speed Boost by Overhauled String API

April 1st, 20141 2 – Java 8 has been out for a week or two now, and we have all been puzzled by the tremendous speed improvements of Oracle’s newest coup. String operations in particular have improved a lot. It’s time to look into the guts of Java and analyze what’s going on.

Along the way, I also found out why Pattern.compile() has been deprecated in Java 8. Please stop using it. It’s a big performance penalty.

It took me a while to find out. Only when I looked into the byte code generated by Java 8 did I realize what had happened. Oracle has overhauled its String implementation. Actually, they have rewritten it from scratch. They needed to retain backward compatibility, so it wasn’t that easy to see the trick: they abandoned 16-bit Unicode, replacing it with a more modern encoding. A major step towards a more memory-efficient design! The Java community has been longing for this move for ages.
Continue reading

  1. Yes, this was this year’s April fool’s hoax.
  2. But read my follow-up article to learn about the real improvements of the string implementation.

Google Adds Asm.js Test To Its Octane Benchmark

Google has enhanced its Octane Benchmark, which measures Javascript performance, in an interesting way. It adds two benchmarks measuring technologies developed by Google’s competitors: Typescript has been developed by Microsoft, while asm.js is a Mozilla project aiming at speeding up a subset of Javascript. Google does not support asm.js. According to them, it’s better to optimize the entire Javascript runtime instead of focusing on a subset of it. In my eyes this is a convincing argument: asm.js code is designed to be generated by compilers, not by humans. As of now, compile-to-Javascript languages like Dart, Ceylon, Kotlin or Typescript haven’t had that much of an impact. The majority of Javascript code is written by humans.
Continue reading

My Computer’s Too Fast!

Programmers need fast computers to work efficiently. Give them the fastest gear you can get, and they’ll give you a lot of bang for the buck. It’s not only that their motivation is elevated by being flattered1: programmers’ tools tend to consume a lot of computing power. You don’t want to pay them for waiting, do you?

These days my computer’s way too fast. Its superior speed is a major obstacle to the task at hand: tweaking performance. I need a slow computer.
Continue reading

  1. Sure, motivation and flattering do play an important role. But that’s another day’s story.

asm.js – A Reduced Instruction Set Makes Javascript Run Faster

The day after I finished my article claiming Javascript to be the new assembler language, I heard about the asm.js project. This project strongly supports my previous article’s theory. Even the project’s name suggests asm.js programs are sort of assembler programs.

The goal of asm.js is to define a subset of Javascript that can be executed very efficiently. The idea is to identify expensive operations and replace them with cheaper ones. But there’s a catch: you don’t really want to write an asm.js program. It looks a little weird, and it requires a lot of additional keystrokes ordinary Javascript doesn’t require. So why should we bother?
Continue reading

The Future of JavaScript: an Assembler Language?

Behind the scenes, widely unnoticed, there’s an interesting development going on. The role Javascript plays in the computer industry is starting to change. The first time I became aware of the new role was when I learned about GWT. The heart of GWT is a cross-compiler translating Java code to Javascript.

Then I learned about Dart.

Continue reading

The Influence of CPU Caches on Java or Scala Programs

Today, I’ve listened to a pretty astonishing talk given by Jamie Allen. Both Java and Scala have become fast enough that some people are beginning to care about CPU caches. The JVM’s JIT compiler does such a good job that it’s possible to feel the impact of the CPU’s cache in your Java programs. Sometimes, it pays to rearrange your variables to get a performance boost. Incredible, isn’t it?
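The classic demonstration of the effect is traversing a two-dimensional array in two different orders. Both methods below do exactly the same arithmetic, but the row-major version walks memory sequentially, so the cache prefetcher can keep up (this is my own sketch of the kind of effect the talk describes, not code from it):

```java
public class CacheEffects {

    // Row-major traversal: consecutive memory accesses, cache friendly.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int row = 0; row < m.length; row++)
            for (int col = 0; col < m[row].length; col++)
                sum += m[row][col];
        return sum;
    }

    // Column-major traversal: each access jumps to a different inner array,
    // defeating the cache line prefetcher.
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        for (int col = 0; col < m[0].length; col++)
            for (int row = 0; row < m.length; row++)
                sum += m[row][col];
        return sum;
    }

    public static void main(String[] args) {
        int n = 4096;
        int[][] m = new int[n][n];
        for (int[] row : m) java.util.Arrays.fill(row, 1);

        long t = System.nanoTime();
        long a = sumRowMajor(m);
        System.out.println("row-major:    " + (System.nanoTime() - t) + " ns");

        t = System.nanoTime();
        long b = sumColumnMajor(m);
        System.out.println("column-major: " + (System.nanoTime() - t) + " ns");
        // Both sums are identical; only the memory access pattern differs.
    }
}
```

On my machine the column-major version is noticeably slower once the array no longer fits into the cache – pure memory layout, no algorithmic difference.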

Continue reading

Aparapi: Run Java Applications on Your Graphics Accelerator Card

What could you achieve if you had a thousand CPU cores at hand? Actually, you probably have. All you need is a dedicated graphics accelerator card. For example, my Radeon™ 6870 GPU has 1120 stream processors and delivers up to 2,000 GFlops of single precision floating point performance. That’s at least 10 times the performance of the i7 CPU in my computer. Wouldn’t it be nice to unleash this power?

Aparapi allows Java programmers to do so. If you’ve got the right type of problem, that is.
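The programming model is pleasantly simple. The canonical vector-addition example from the Aparapi documentation looks roughly like this (assuming the Aparapi library is on the classpath; recent releases use the com.aparapi package, the original AMD release used com.amd.aparapi):

```java
import com.aparapi.Kernel;
import com.aparapi.Range;

public class VectorAdd {
    public static void main(String[] args) {
        final int size = 1_000_000;
        final float[] a = new float[size];
        final float[] b = new float[size];
        final float[] sum = new float[size];

        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int i = getGlobalId();   // index of this parallel invocation
                sum[i] = a[i] + b[i];
            }
        };
        // Aparapi translates the run() method's byte code to OpenCL and
        // executes it on the GPU; if that fails, it falls back to a
        // Java thread pool.
        kernel.execute(Range.create(size));
        kernel.dispose();
    }
}
```

The “right type of problem” is exactly this shape: the same simple operation applied independently to every element of a large array.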
Continue reading