
Java Classpath Tales – or: The Mystery of the Empty Beans.xml

Image source: Pixabay, published under a CC0 license
The age of a programming language shows in the clever tricks programmers use. One such trick is used by CDI. Did you ever wonder why you have to put an empty beans.xml into the META-INF directory?

By the way, you may not even be aware that the beans.xml is allowed to be empty. Many developers aren’t. You can use the beans.xml to configure a couple of things, leading developers to believe they have to do that. Plus, IDEs tend to complain about an empty XML file because they consider it ill-formed. As far as I know, NetBeans generates a non-empty beans.xml by default. But still, even if you haven’t seen it yet, it’s allowed to put an empty beans.xml into the META-INF folder of your jar (or the WEB-INF folder of your *.war file). Empty meaning really empty: it’s a file of zero bytes length.

What good is such a weird file?

We’ll answer this question in a minute. Plus, this article shows you how to read arbitrary files hidden in a jar file, or – more generally speaking – somewhere in the classpath. By the way, this is one of the few situations in which the JUnit test succeeds but the real application fails. That is interesting enough in itself.

Reading resource files from the classpath

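Our journey started when we decided to hide our resource files in the classpath of our web application. Maven sort of recommends this approach by providing a directory dedicated to that purpose: src/main/resources. Plus, files hidden in the classpath are protected from unauthorized access. In a web application they end up below WEB-INF (in WEB-INF/classes or in a jar file in WEB-INF/lib), and the servlet container doesn’t serve that folder directly. So there’s no way a hacker can download such a file simply by typing a URL.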

The simple approach to access such a file is using this.getClass().getResource(String name) or this.getClass().getResourceAsStream(String name). In a JUnit test, that’s already the end of the story. The file is returned if it’s in the classpath of the application.
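To make the simple approach concrete, here is a minimal sketch. The class name ResourceReader and the helper readResource are made up for illustration, and readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class ResourceReader {

    /**
     * Reads a classpath resource into a String.
     * With Class.getResourceAsStream, a leading slash makes the name absolute;
     * without it, the name is resolved relative to the package of the class.
     */
    public static String readResource(String name) throws IOException {
        try (InputStream in = ResourceReader.class.getResourceAsStream(name)) {
            if (in == null) {
                // getResourceAsStream signals "not found" by returning null
                throw new IOException("Resource not found in classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }
}
```

A call like ResourceReader.readResource("/config.properties") would read a file stored as src/main/resources/config.properties (a hypothetical file name). The null check matters: a missing resource doesn’t throw an exception by itself.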

Reality check

However, a real-world application running in an application server looks a bit different. There’s more than one classloader. Accessing the classloader of a class yields precisely the classloader that originally loaded this class. If you’re using an application server, you’d rather use the classloader of the current thread (i.e. Thread.currentThread().getContextClassLoader()). That’s because it’s usually the classloader of the deployed application itself, and it also consults its ancestors when it looks for the resource file.
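A sketch of that lookup might look like this. ContextResources and find are hypothetical names, and the fallback to the system classloader is my own assumption about what’s sensible if no context classloader has been set.

```java
import java.net.URL;

// Hypothetical helper class, not part of any library.
public class ContextResources {

    /** Looks up a resource via the context classloader of the current thread. */
    public static URL find(String name) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Assumption: fall back to the system classloader if no context classloader is set
            loader = ClassLoader.getSystemClassLoader();
        }
        // Unlike Class.getResource, ClassLoader.getResource expects names WITHOUT a leading slash
        return loader.getResource(name); // null if the resource can't be found
    }
}
```

Note the subtle difference in naming: Class.getResource("/config.properties") corresponds to ClassLoader.getResource("config.properties").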

That’s (probably) the reason why our JUnit test succeeded in loading the resource file. If you’re running a JUnit test, there’s usually only one classloader. But a real JavaEE application has to deal with more than one classloader. So our real-world JavaEE application failed to load the resource file, even though the JUnit test indicated everything was ok.

Multiple jar files

Things get even more complicated if there are several jar files. In this case, even Thread.currentThread().getContextClassLoader() won’t find every resource. It’ll fail to find resources hidden in “sister” jar files.

Luckily, there’s an alternative: getClass().getClassLoader().getResources(name). This method yields every URL a particular file or folder is stored in. Putting it the other way round: it returns the URL of the file you’re looking for, no matter which jar file it’s stored in. In most cases, there’s only a single result (if any), but there can be multiple results because we’re talking about multiple jar files. Each jar file can contain the same file in the same folder, such as META-INF/beans.xml or META-INF/MANIFEST.MF.

As soon as you’ve got the URL, you can read the file using one of the methods suggested by

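// needs java.io.InputStream, java.net.URL, java.util.Enumeration
Enumeration<URL> urls = getClass().getClassLoader().getResources(name); // one URL per match
URL url = urls.nextElement(); // first match; throws NoSuchElementException if there is none
byte[] fileBytes;
// Paths.get(url.toURI()) fails for entries inside a jar file, so read via a stream (Java 9+)
try (InputStream in = url.openStream()) { fileBytes = in.readAllBytes(); }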
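If you’re interested in every match instead of just one, you can collect the whole enumeration. The following sketch uses the hypothetical names AllResources and findAll; it uses the context classloader, but getClass().getClassLoader() works the same way.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class AllResources {

    /** Returns the URLs of every classpath location containing the given resource. */
    public static List<URL> findAll(String name) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader(); // assumption: sensible fallback
        }
        // getResources returns an Enumeration<URL>; Collections.list turns it into a List
        return Collections.list(loader.getResources(name));
    }
}
```

Calling AllResources.findAll("META-INF/MANIFEST.MF") typically returns one URL per jar file in the classpath, because almost every jar file contains a manifest.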

The mystery of the empty beans.xml

Now for the clever trick. Having learned about getResources(), can you imagine why CDI demands you to put an empty file into a particular directory?

When CDI is booting, it (roughly speaking) calls the method getResources("META-INF/beans.xml"), which returns the URL of every beans.xml the classloader can find.

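A close look at such a URL reveals that it contains a file name: the name of the jar file containing the beans.xml file. For a file inside a jar, the URL looks roughly like jar:file:/path/to/library.jar!/META-INF/beans.xml. Everything in front of the exclamation mark is the location of the jar file.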
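You don’t have to parse that URL by hand. Here’s a small sketch using java.net.JarURLConnection; the names JarLocator and containingJar are made up for illustration, and I don’t claim that any CDI implementation does it exactly this way.

```java
import java.io.IOException;
import java.net.JarURLConnection;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper class, not part of any library.
public class JarLocator {

    /** Returns the URL of the jar file containing the resource, or null if it isn't inside a jar. */
    public static URL containingJar(URL resourceUrl) throws IOException {
        // openConnection() only creates the connection object; it doesn't read the file yet
        URLConnection connection = resourceUrl.openConnection();
        if (connection instanceof JarURLConnection) {
            // For jar:file:/path/to/library.jar!/META-INF/beans.xml this yields file:/path/to/library.jar
            return ((JarURLConnection) connection).getJarFileURL();
        }
        return null; // e.g. a file: URL pointing into a classes directory
    }
}
```

Resources that aren’t packed into a jar file (such as files in WEB-INF/classes) show up as plain file: URLs, which is why the sketch returns null in that case.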

We’re almost there. Why do we need a list of jar file names?

The missing package reflection API

For some reason unknown, Java doesn’t have an exhaustive package reflection API. Almost every method of the reflection API requires you to know the class you’re interested in. You can’t ask Java to enumerate the classes in a package for you. Nor can you ask it to list the packages in a jar file.

However, if you know the URL of the jar file, you can simulate the missing package reflection API. All you have to know is that the jar file is basically a zip file. Unzip it to learn about the folders and files within.

In Java, there’s (mostly) a one-to-one match between the class names and the file names. So knowing the folder name and the file name allows you to access the class file using the reflection API. That’s precisely what the CDI implementations do: they build a list of all files in the classpath and inspect the corresponding classes, looking for annotations like @Named.
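The idea can be sketched in a few lines. This is my own simplified illustration, not the code of an actual CDI implementation (real implementations may well inspect the bytecode instead of loading every class). JarClassLister, listClassNames and annotatedWith are hypothetical names.

```java
import java.io.IOException;
import java.lang.annotation.Annotation;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper class simulating a tiny "package reflection API".
public class JarClassLister {

    /** Lists the fully qualified names of all classes stored in the jar file, sorted by name. */
    public static List<String> listClassNames(Path jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(n -> n.endsWith(".class"))
                    .filter(n -> !n.endsWith("module-info.class") && !n.endsWith("package-info.class"))
                    // com/example/Foo.class -> com.example.Foo
                    .map(n -> n.substring(0, n.length() - ".class".length()).replace('/', '.'))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Returns the classes in the jar file carrying the given (runtime-visible) annotation. */
    public static List<Class<?>> annotatedWith(Path jarPath, Class<? extends Annotation> annotation,
                                               ClassLoader loader) throws IOException {
        List<Class<?>> result = new ArrayList<>();
        for (String className : listClassNames(jarPath)) {
            try {
                // false: load the class without running its static initializers
                Class<?> candidate = Class.forName(className, false, loader);
                if (candidate.isAnnotationPresent(annotation)) {
                    result.add(candidate);
                }
            } catch (ClassNotFoundException | LinkageError e) {
                // The class can't be loaded by this classloader; skip it
            }
        }
        return result;
    }
}
```

The classloader passed to annotatedWith has to be able to see the jar file, which is the case if the jar was found via getResources() of that very classloader.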

It doesn’t matter whether the beans.xml is empty or not. The simulated package reflection API doesn’t care about the content of the file. As long as the file exists, it’s returned by getResources("META-INF/beans.xml").

Wrapping it up

Truth to tell, I’m not sure whether I’m happy about this clever trick or not. The good news is that it works. The bad news is that there’s still no package reflection API. That’s a bit odd. I guess the language designers consider a package reflection API superfluous because there’s such a clever workaround. Even so, it’s a gap in the standard Java libraries.

Returning to my opening words, the situation reminds me of the late days of the C64. This home computer was popular for ten years or so. During these ten years, the hardware changed a lot, but the programmer API never changed. It always ran at one megahertz, it always had 64 kilobytes of memory, and neither the sound controller nor the graphics chip changed. Programmers became very familiar with this computer. Heck, even today I know some of the assembly language opcodes by heart. That familiarity allowed programmers to push the C64 far beyond the original limits imposed by its designers. It almost felt like magic.

Putting an empty file into a jar file in order to simulate the missing package reflection API belongs to the same category.

5 thoughts on “Java Classpath Tales – or: The Mystery of the Empty Beans.xml”

  1. Note that Java EE 7 made most deployment descriptors optional. CDI 1.1 changed the default discovery mode from “all” to “annotated”, and made beans.xml optional. An empty beans.xml is equivalent to the original “all” discovery mode so backward compatibility is preserved. I imagine the original intention when requiring the beans.xml was to avoid problems with third party libraries not compatible with CDI, but that’s just an assumption.

  2. No idea to be honest, but the same happened to e.g. web.xml. I suppose they iterate over all files on all classpath locations… a hard, but still possible work to do I guess.

  3. I think it scans all the classes for annotations. There are warnings in console that startup time can be decreased by manually defining which beans need to be registered but I haven’t had time to check out how to do it.
