Most build tools misuse javac

Over the past couple of years, I've been looking more closely at the build tools I use and trying to imagine what the ultimate build tool would look like. In doing so, I regularly stumble on mistakes in one tool that seem to have been copied, cargo-cult style, into most (if not all) of the others. Recently, I've had gripes with javac, or more accurately with how build tools use it, and started investigating further. But first, let's see how javac works.

EDIT(2015-03-08): the source path issue has been fixed in Buck and Gradle.

How does javac work?

javac takes as inputs a boot class path, a class path, a source path, a processor path, extension dirs, a list of source files to compile, and a list of class names to run annotation processors on.

Let's ignore annotation processing and cross-compilation for now.

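You can then control the behavior of javac with several options. Let's look at a few of them:

-d sets the directory where class files are written.

-encoding sets the character encoding of the source files. It defaults to the platform's default encoding, so a build that doesn't set it may behave differently from one machine to another.

-source and -target set the language version accepted and the class file version generated (the default -target depends on -source). They don't change the APIs your code is compiled against, though: unless you also set the boot class path and extension dirs to those of the targeted JDK, you may produce classes that reference APIs missing from that version.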

What's interesting, and counter-intuitive, is that when javac looks up information on types referenced by the input source files, it may implicitly compile other source files, looked up in the source path. Even less intuitive, javac looks for both a compiled class (in the class path) and a source file (in the source path) for each type and, by default, prefers the newer one when both are found. You can control this behavior using the -Xprefer:source and -Xprefer:newer options. Note that there's no -Xprefer:class, and this is where many build tools start to get things wrong. Finally, when such types are implicitly compiled, class files are generated for them by default; you can control that with the -implicit:class and -implicit:none options (-implicit:class being the default).
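To make implicit compilation concrete, here is a small sketch that drives javac through the standard `javax.tools.ToolProvider` API (it assumes a JDK 11+, not a bare JRE, since `getSystemJavaCompiler()` returns null without the compiler). The class and file names (`ImplicitCompilationDemo`, `A`, `B`) are made up for the demo: only A.java is passed to javac, yet B.java, found through the source path, gets compiled too, unless -implicit:none is given.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public final class ImplicitCompilationDemo {
    private ImplicitCompilationDemo() {}

    /**
     * Compiles only A.java, which references B; B.java is only reachable
     * through the source path. Returns whether B.class ended up in the output.
     */
    public static boolean bClassGenerated(String... extraOptions) {
        try {
            Path src = Files.createTempDirectory("src");
            Path out = Files.createTempDirectory("out");
            Files.writeString(src.resolve("A.java"), "public class A { B b; }\n");
            Files.writeString(src.resolve("B.java"), "public class B { }\n");

            List<String> args = new ArrayList<>(List.of(
                    "-d", out.toString(),
                    "-sourcepath", src.toString()));
            args.addAll(List.of(extraOptions));
            args.add(src.resolve("A.java").toString()); // B.java is NOT listed

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            int status = javac.run(null, null, null, args.toArray(new String[0]));
            if (status != 0) {
                throw new IllegalStateException("javac failed with status " + status);
            }
            return Files.exists(out.resolve("B.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default:        B.class generated = " + bClassGenerated());
        System.out.println("-implicit:none: B.class generated = "
                + bClassGenerated("-implicit:none"));
    }
}
```

With -implicit:none, B is still parsed and type-checked (A couldn't compile otherwise); javac just doesn't write B.class.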

So, javac isn't without flaws, and as I said, this is where build tools start to get things wrong. To fully understand how and why, let's look at how those tools work and what their expectations are.

How do build tools work?

First, I'll only talk about modern, high-level build tools, namely Maven, Gradle and Buck. I know there are many others around (Buildr, Pants, etc.; even Ant would be worth a look); I just haven't taken the time to look at how they deal with this.

Maven and Gradle use similar project layouts: each project has its own folder where all sources live, and those are generally compiled as a whole (in at least two phases, actually, as tests are compiled separately). With a bit more work, you can however partition source files using globs in include and exclude patterns. And of course each project declares its external dependencies, with different scopes depending on where they'll be used (compile time, runtime, tests, etc.). The result of the compilation goes to a specific directory: Maven uses the same output directory as the one it copies resources to, whereas Gradle uses a specific output directory for each task (which greatly helps with incremental builds, but that's not today's subject).

With Buck, however, one (or two) giant source trees are split into several compilation tasks, generally one per package but not necessarily: one task could include subpackages except a few, and you can use include/exclude patterns to partition a package, which is sometimes used to put both production classes and tests in the same source tree. FWIW, resources also live in that same source tree. And just like Gradle, Buck uses a specific output directory for each compilation task.

Generally speaking, all three tools have a well-defined list of inputs (both dependencies and source files) and an output directory (remember the -d option of javac we talked about above?). Knowing that, you'd think all three work in quite similar ways. You'd be wrong. Let's see how, but first let's define how we'd expect those tools to use javac.

How do I expect build tools to use javac?

Let's get the easy things out of the way first: dependencies go in the class path, the output directory is passed as -d, and each input source file is passed as its own argument.

As seen above, -encoding should also always be explicit (and I suggest using UTF-8 by default).
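As an illustration of those "easy things", here is a minimal sketch of how a build tool could assemble that part of the javac command line for one compilation task. `JavacArgs` and `basicArgs` are hypothetical names, not any tool's actual API; the source path is deliberately left out here, as it gets its own discussion next.

```java
import java.io.File;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the "easy" part of a javac command line for one compile task. */
public final class JavacArgs {
    private JavacArgs() {}

    public static List<String> basicArgs(List<Path> dependencies, Path outputDir,
                                         List<Path> sources) {
        List<String> classpath = new ArrayList<>();
        for (Path dependency : dependencies) {
            classpath.add(dependency.toString());
        }
        List<String> args = new ArrayList<>();
        // Dependencies go in the class path, joined with ':' or ';' depending on the OS.
        args.add("-classpath");
        args.add(String.join(File.pathSeparator, classpath));
        // Class files go to the task's own output directory.
        args.add("-d");
        args.add(outputDir.toString());
        // Never rely on the platform's default encoding.
        args.add("-encoding");
        args.add("UTF-8");
        // Each input source file is passed as its own argument.
        for (Path source : sources) {
            args.add(source.toString());
        }
        return args;
    }
}
```

For example, `basicArgs(List.of(Path.of("guava.jar")), Path.of("out"), List.of(Path.of("Foo.java")))` yields `[-classpath, guava.jar, -d, out, -encoding, UTF-8, Foo.java]`.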

If you only do that, though, you risk compiling source files implicitly loaded from the class path: when no -sourcepath is given, javac also searches the class path for source files. It has happened before (and remember that there's no -Xprefer:class, and no way to prevent javac from implicitly loading source files). So you need to pass an explicit source path.

What should go in the source path? If you use your source roots, you risk your include/exclude patterns not being respected, as javac may implicitly compile files those patterns were meant to leave out. It has happened before. I propose that the source path be empty (it's as easy as using -sourcepath : or -sourcepath "").
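The following sketch (again using `javax.tools.ToolProvider`, JDK 11+) shows why the source path matters. It simulates a dependency that happens to contain sources by putting a made-up B.java in a class path directory. Without -sourcepath, javac finds B.java through the class path and silently compiles it into the output directory; with an empty source path, the compilation fails instead, which is what you want since B was never declared as an input.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public final class SourcePathDemo {
    private SourcePathDemo() {}

    /** Outcome of compiling A.java against a class path entry that holds B.java. */
    public static final class Outcome {
        public final int status;
        public final boolean bClassGenerated;

        Outcome(int status, boolean bClassGenerated) {
            this.status = status;
            this.bClassGenerated = bClassGenerated;
        }
    }

    public static Outcome compileA(boolean emptySourcePath) {
        try {
            Path src = Files.createTempDirectory("src");
            Path lib = Files.createTempDirectory("lib");
            Path out = Files.createTempDirectory("out");
            Files.writeString(src.resolve("A.java"), "public class A { B b; }\n");
            // Stand-in for a dependency that ships sources but no B.class.
            Files.writeString(lib.resolve("B.java"), "public class B { }\n");

            List<String> args = new ArrayList<>(List.of(
                    "-d", out.toString(), "-classpath", lib.toString()));
            if (emptySourcePath) {
                args.add("-sourcepath");
                args.add(""); // empty: only the listed sources get compiled
            }
            args.add(src.resolve("A.java").toString());

            // Capture javac's messages instead of printing them.
            ByteArrayOutputStream messages = new ByteArrayOutputStream();
            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            int status = javac.run(null, messages, messages, args.toArray(new String[0]));
            return new Outcome(status, Files.exists(out.resolve("B.class")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

`compileA(false)` succeeds and leaves an unexpected B.class in the output; `compileA(true)` reports a non-zero status because B can't be resolved.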

As seen above, -target (and -source, which determines the default -target) should never be used without setting the boot class path and extension dirs. Even a tool like animal sniffer doesn't guarantee your code will run without issues. What build tools should do is let you easily require a minimum JDK version for development and a specific version for releases (unless you properly set up cross-compilation by also setting the boot class path and extension dirs), and use animal sniffer to make sure you at least don't call APIs from a newer Java version.
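Here is a rough sketch of both recommendations, under the assumption that the build tool knows where the targeted JDK's jars live. `CrossCompileArgs`, `requireJdk` and `crossCompileArgs` are hypothetical names; the first method fails fast on a too-old JDK (using `Runtime.version()`, available since Java 10), the second refuses to emit -source/-target without the matching boot class path and extension dirs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class CrossCompileArgs {
    private CrossCompileArgs() {}

    /** Fails fast when the JDK running the build is older than required. */
    public static void requireJdk(int minimumFeatureVersion) {
        int running = Runtime.version().feature();
        if (running < minimumFeatureVersion) {
            throw new IllegalStateException("JDK " + minimumFeatureVersion
                    + "+ required, but running JDK " + running);
        }
    }

    /**
     * Emits -source/-target only together with the targeted JDK's boot class
     * path and extension dirs, refusing to "half cross-compile".
     */
    public static List<String> crossCompileArgs(String version, List<String> bootClassPath,
                                                List<String> extDirs) {
        if (bootClassPath.isEmpty()) {
            throw new IllegalArgumentException("-target " + version
                    + " needs the boot class path of a JDK " + version);
        }
        List<String> args = new ArrayList<>(List.of("-source", version, "-target", version));
        args.add("-bootclasspath");
        args.add(String.join(File.pathSeparator, bootClassPath));
        // The targeted JDK's extension dirs; may be an empty list.
        args.add("-extdirs");
        args.add(String.join(File.pathSeparator, extDirs));
        return args;
    }
}
```

For instance, `crossCompileArgs("1.7", List.of("rt.jar"), List.of())` returns `[-source, 1.7, -target, 1.7, -bootclasspath, rt.jar, -extdirs, ]`, while an empty boot class path throws. Note that JDK 9 and later offer --release, which bundles this cross-compilation setup into a single option.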

Now that we know how to use javac, and how we'd want our build tools to use it, we can start looking at what they actually do, and specifically what they get wrong.

How do build tools misuse javac then?

What does Maven do wrong?

I'll start with Maven.

The maven-compiler-plugin has default values for -source and -target, and very old ones at that (1.5 in the latest version, released two years ago, but it was 1.4 not so long ago; EDIT(2022-01-12): it was bumped to 1.6 in 2018, and just updated to 1.7 in early 2022). This more or less forces you to redefine the values in every project. You can enforce a JDK version using the maven-enforcer-plugin's requireJavaVersion rule, and using profiles you could easily have different settings for development and releases; and animal sniffer is designed with Maven as its first consumer. Maven has also supported toolchains for quite a while, which let you use a specific JDK version, different from the one you run Maven with. They're rather limited and incomplete though (for example, the toolchain applies to the whole lifecycle, not just to one plugin; and developers have to create the appropriate toolchain definitions on their machine to be able to build a project that uses them).

The maven-compiler-plugin (or is it the plexus-compiler?) sets the source roots as the source path, which we've seen above is wrong as soon as you use include/exclude patterns. Strangely, that was fixed long ago, but was then reintroduced (for bad reasons) a few years later, and has since been reported again as a bug. I made a repro case if you want to try it: https://gist.github.com/tbroyer/d3ddd1851beeff5868cc

Beware too that the maven-compiler-plugin 3.2 also has an issue when used with annotation processing: it adds the generated sources output directory to the source path (more accurately to its source roots, which it uses as the source path), causing compilation errors.

Finally, maven-compiler-plugin 3.x uses a new approach to incremental builds that recompiles all source files as soon as it detects one changed file (it previously only recompiled the changed sources, which caused issues when you changed an API used by other, untouched, classes). But it apparently still tries to somehow match .java and .class files to each other, which won't work for package-info.java files, as they don't always generate a package-info.class by default. maven-compiler-plugin should use -Xpkginfo:always, but that only works starting with Java 7, and Maven has committed to supporting very old JDKs…
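The package-info.java pitfall is easy to reproduce. In this sketch (JDK 11+, `javax.tools.ToolProvider`; the package name `foo` and class `PackageInfoDemo` are made up), a package-info.java without annotations produces no package-info.class under javac's default -Xpkginfo:legacy behavior, while -Xpkginfo:always makes javac write one, which is what any .java-to-.class matching would need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.ToolProvider;

public final class PackageInfoDemo {
    private PackageInfoDemo() {}

    /** Compiles an annotation-free package-info.java; reports whether a class file appeared. */
    public static boolean packageInfoClassGenerated(String... extraOptions) {
        try {
            Path src = Files.createTempDirectory("src");
            Path out = Files.createTempDirectory("out");
            Path pkgInfo = Files.createDirectories(src.resolve("foo"))
                    .resolve("package-info.java");
            // Only a package declaration, no package annotations.
            Files.writeString(pkgInfo, "package foo;\n");

            List<String> args = new ArrayList<>(List.of("-d", out.toString()));
            args.addAll(List.of(extraOptions));
            args.add(pkgInfo.toString());
            int status = ToolProvider.getSystemJavaCompiler()
                    .run(null, null, null, args.toArray(new String[0]));
            if (status != 0) {
                throw new IllegalStateException("javac failed with status " + status);
            }
            return Files.exists(out.resolve("foo").resolve("package-info.class"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default:           " + packageInfoClassGenerated());
        System.out.println("-Xpkginfo:always:  " + packageInfoClassGenerated("-Xpkginfo:always"));
    }
}
```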

What does Gradle do wrong?

Gradle doesn't pass -source and -target by default, which is good, but it lets you easily set them. However, it also lets you easily set the boot class path and extension dirs, so it's not so bad. Gradle is gaining support for toolchains similar to Maven's, but it's not usable yet. It'll likely be more flexible than Maven's take on it, but I fear the toolchain definitions will live right in the build script, despite not being portable across developer machines, particularly when developers don't use the same operating system (each developer will probably have different paths to their JDKs).
You can quite easily enforce a particular JDK version using snippets like the following in your build script:

assert JavaVersion.current().isJava8Compatible()

or

assert JavaVersion.current() == JavaVersion.VERSION_1_8

And there's a plugin for animal sniffer.

Gradle doesn't pass a source path, which we've seen is wrong. Here's a small repro case if you want: https://gist.github.com/tbroyer/d8174f5eb99bdb7f291b and I've reported the issue. Edit(2015-03-08,2015-05-14): Gradle has been fixed; the fix shipped in 2.4.

No -encoding by default, but it's easy to configure.

What does Buck do wrong?

Buck always passes -source and -target (defaulting to 1.7, though). It lets you define the boot class path (even though that's undocumented) but not extension dirs. You can however do that easily thanks to extra_arguments. Note that this configuration is global to your project. There's also no way to plug in animal sniffer or similar tools, and no way to enforce a JDK version (so by default, if you use JDK 8, you risk producing classes that won't run on Java 7 despite the default target_level=7 configuration). I suppose you could do something using a gen_rule, though that'd be a hack if you ask me.

Like Gradle, Buck doesn't pass a source path. Here's a small repro case: https://gist.github.com/tbroyer/512941cd798e1ccba4b4 and I've reported the issue. Edit(2015-03-06): Buck has been fixed.

No -encoding either: Buck assumes your environment is already UTF-8 (a not-so-wrong assumption), but it can be passed using extra_arguments if you prefer being explicit.

Conclusion

As we've seen, all three build tools studied above misuse javac, each in a different way. Fortunately, there are workarounds for most flaws (if not all). I'd be happy to get feedback on how other build tools behave.

What's next?

I've looked a bit at Pants, and it uses JMake under the hood, with a customized in-process javac call that tracks the mapping between inputs and outputs (to the extent that javac provides that information). Unlike the three build tools above, JMake tries to make a truly incremental build, recompiling only the changed classes and all those that depend on them (JMake even claims that it parses the code and looks at changed APIs and which classes call them). I haven't yet investigated how JMake actually calls javac, and Pants is currently not easy to set up (at least according to its documentation).
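The basic rule, "recompile the changed classes and all those that depend on them", boils down to a transitive closure over a reverse dependency graph. The sketch below illustrates only that coarse rule; it is not JMake's implementation, which claims a finer analysis of which APIs actually changed. `DependentClosure` is a hypothetical name, and the graph is assumed to come from whatever input/output tracking the tool does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public final class DependentClosure {
    private DependentClosure() {}

    /**
     * @param dependents for each class, the classes that directly reference it
     * @param changed    classes whose sources changed
     * @return the changed classes plus every class that transitively depends on them
     */
    public static Set<String> toRecompile(Map<String, Set<String>> dependents,
                                          Set<String> changed) {
        Set<String> result = new TreeSet<>(changed);
        Deque<String> pending = new ArrayDeque<>(changed);
        while (!pending.isEmpty()) {
            String current = pending.removeFirst();
            for (String dependent : dependents.getOrDefault(current, Set.of())) {
                // add() returns false for classes already scheduled, which also handles cycles.
                if (result.add(dependent)) {
                    pending.addLast(dependent);
                }
            }
        }
        return result;
    }
}
```

With A using B and B using C (`Map.of("B", Set.of("A"), "C", Set.of("B"))`), a change to C schedules A, B and C, while a change to A only schedules A.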

I'll probably have a quick look at Ant too. It's a bit low-level but provides some high-level features for incremental builds. I suppose however that whether your build is broken or not would depend a lot on how you use Ant, not just how Ant uses javac.

Also, Takari provides a custom Maven lifecycle replacing most (if not all) standard plugins with their own. I haven't yet looked at how they call javac, nor how they use ECJ/JDT for incremental compilation (it looks like they could use JMake, and possibly the custom Twitter Compiler, to bring their incremental compilation to javac and get annotation processing working at the same time).