Maven is broken by design

The other day, someone asked about the status of the GWT Mavenization, saying he loves Maven and would like to help. I replied that “I used to really like it” but wasn’t “so sure nowadays.” This was obviously followed by “I know it could cause issues if not used in the proper way […] Do you mind telling me why you don’t love it anymore?” I’ve ranted a bit already on Twitter, in blog comments, and in my last post about Buck, so here’s a digest of it all that I can link to the next time I’m asked about my thoughts on Maven.

Disclaimer: I’m writing this while on sick leave, suffering from dizziness and other niceties.

Maven’s model is mutable

A few weeks ago, in reaction to Tesla Polyglot bringing a Scala DSL instead of XML to describe your project, Arnaud Héritier tweeted:

@emmanuelbernard @lescastcodeurs :-) I'm not against a simplest /less verbose config file (json for example), but I'm against a dev language

— Arnaud Héritier (@aheritier) September 3, 2013

To which I replied:

@aheritier @emmanuelbernard I initially was too (after looking at Rake, Buildr, Gradle and SBT), but Buck made me change my mind.

— Thomas Broyer (@tbroyer) September 3, 2013

@aheritier @emmanuelbernard …particularly when considering how Maven's supposedly declarative approach is actually so imperative…

— Thomas Broyer (@tbroyer) September 3, 2013

Think about this not-uncommon scenario: a pom.xml can only list a single source folder. If you have more than one (for whichever reason), you have to use the build-helper-maven-plugin to add it dynamically at some phase of the build that comes before the one where it’ll be used (generally by the maven-compiler-plugin at the compile phase, so you’d add the source folder at the generate-sources phase, for instance). Now imagine you’re building an IDE and have to import such a project: to discover the existence of the second source folder, you have to either:

M2Eclipse uses the former, while IntelliJ IDEA seems to be using a mix of the same and heuristics (but hardcodes the build-helper-maven-plugin and doesn’t seem to be pluggable).

Conversely, Gradle, for instance, has an immutable model. The project model is built first, and hooks are provided for plugins to dynamically augment it, then it’s frozen and the build can be executed. This allows IDEs to inspect the project’s model without duplicating work, without executing (part of) the build, and without heuristics.
And yet, Gradle projects are described using a “dev language” (to reuse Arnaud’s words). This is because that code doesn’t build anything, but rather constructs a representation of the project in memory.

Surprisingly (or not), Maven has such a model (which BTW is the basis of Tesla Polyglot); what it lacks is a clean distinction between constructing that description and then using it to build the project. I don’t think this can be fixed without breaking backwards compatibility with almost all plugins out there that generate sources or resources.
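To make the distinction concrete, here’s a minimal Python sketch of a two-phase model, in the spirit of what Gradle does but not Gradle’s actual API: every name below (ProjectModel, configure, codegen_plugin, etc.) is made up for illustration. Plugins get to augment the model during configuration, the model is then frozen, and both the IDE importer and the build only ever read it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProjectModel:
    """Hypothetical in-memory project description (not Gradle's real API)."""
    name: str
    source_dirs: List[str] = field(default_factory=lambda: ["src/main/java"])
    _frozen: bool = field(default=False, init=False, repr=False)

    def add_source_dir(self, path: str) -> None:
        # Plugins may only change the model during the configuration phase.
        if self._frozen:
            raise RuntimeError(f"cannot add {path!r}: the model is frozen")
        self.source_dirs.append(path)

    def freeze(self) -> None:
        self._frozen = True


Plugin = Callable[[ProjectModel], None]


def configure(name: str, plugins: List[Plugin]) -> ProjectModel:
    """Phase 1: build the model, let plugins augment it, then freeze it."""
    model = ProjectModel(name)
    for plugin in plugins:
        plugin(model)
    model.freeze()
    return model


def ide_source_dirs(model: ProjectModel) -> List[str]:
    """What an IDE importer needs: read the model, execute nothing."""
    return list(model.source_dirs)


def execute(model: ProjectModel) -> str:
    """Phase 2: the build itself only reads the frozen model."""
    return "compiling " + ", ".join(model.source_dirs)


def codegen_plugin(model: ProjectModel) -> None:
    """Hypothetical code generator: declares its output folder up front, rather
    than injecting it mid-build the way build-helper-maven-plugin has to."""
    model.add_source_dir("build/generated-sources")
```

With such a model, an IDE importer can call ide_source_dirs() and see the generated-sources folder without running any part of the build, and a plugin trying to add a folder after configuration fails loudly instead of silently changing the project behind the IDE’s back.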

Incremental builds are either nonexistent or broken

The state of affairs of Maven’s incremental builds is rather bad.

First, it has to be done by each plugin. This gives more flexibility to the plugins, but once again makes it impossible to infer the inputs and outputs from the outside to build the project model in an IDE. Again, Gradle’s approach looks better, and I’m not even talking about Buck, which now even has a build cache (inputs are hashed and the output is stored in a cache – local or shared – with that hash as the key; when you run the build again, the cache is checked first, and because it can be shared on the network, everyone in the team can benefit from others’ builds!)
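For the curious, the build cache idea boils down to something like the following Python sketch. It’s an assumption-laden toy, not Buck’s implementation: the cache key is a SHA-256 over the task’s command and its input files, the “cache” is an in-memory dict standing in for a local or shared store, and the task itself is a caller-supplied function.

```python
import hashlib
from typing import Callable, Dict


def cache_key(command: str, inputs: Dict[str, bytes]) -> str:
    """Hash everything that can influence the output: the command and every input file."""
    h = hashlib.sha256()
    h.update(command.encode("utf-8") + b"\0")
    for path in sorted(inputs):  # sorted, so the key doesn't depend on dict order
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class BuildCache:
    """In-memory stand-in for a local or shared (networked) build cache."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.misses = 0

    def build(self, command: str, inputs: Dict[str, bytes],
              run: Callable[[Dict[str, bytes]], bytes]) -> bytes:
        key = cache_key(command, inputs)
        if key not in self._store:
            # Cache miss: actually run the task and remember its output.
            self.misses += 1
            self._store[key] = run(inputs)
        # Cache hit (or freshly stored): reuse the output without running anything.
        return self._store[key]
```

The important property is that the key is computed from declared inputs only, so it can be checked before running anything; that’s exactly what Maven can’t do when each plugin decides on its own, at execution time, what its inputs and outputs are.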

Then, Maven’s approach is that many plugins write to the same output directory, making it impossible to accurately check for staleness (if a first plugin writes to a file, and a second plugin overwrites it, it really doesn’t matter whether the sources for the first plugin are stale or not, as the result would be overwritten anyway).
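If tasks had to declare their outputs up front, the build tool could at least detect this situation. A minimal Python sketch, assuming each task declares the set of files it writes (the task names and paths in the usage note are hypothetical):

```python
from collections import defaultdict
from typing import Dict, List, Set


def shared_outputs(declared: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return each output file written by more than one task, with those tasks.

    For such files, "are this task's inputs stale?" has no useful answer:
    whichever task runs last determines the content.
    """
    writers: Dict[str, List[str]] = defaultdict(list)
    for task, outputs in declared.items():
        for path in outputs:
            writers[path].append(task)
    return {path: sorted(tasks) for path, tasks in writers.items() if len(tasks) > 1}
```

For instance, if both a hypothetical "copy-resources" task and a "generate-config" task write target/classes/app.properties, shared_outputs() reports that file with both tasks, which means checking whether copy-resources’ inputs changed tells you nothing about the final content of that file. Giving each task its own output directory makes the problem disappear.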

Finally, due to the above rule, staleness is too often managed at too fine-grained a level (the file level), which can lead to rebuilding too little: if class B uses class A, and class A changes, the maven-compiler-plugin will recompile class A but won’t recompile class B, possibly leading to errors at runtime rather than at compile time. I’m not even talking about deleted or renamed files (if you rename A to C, A.class won’t be deleted unless you mvn clean) or annotation processors (even if deleted files were tracked, tracking the files generated by annotation processors would be harder).
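The problem with per-file staleness is easy to reproduce in a few lines. This Python sketch approximates a timestamp-based check (recompile X.java only if it’s newer than X.class); it’s not the maven-compiler-plugin’s actual code, and modification times are passed in as plain numbers to keep it self-contained.

```python
from typing import Dict, List


def stale_by_timestamp(sources: Dict[str, float], classes: Dict[str, float]) -> List[str]:
    """Per-file check: X is recompiled only if X.java is newer than X.class (or X.class is missing)."""
    return sorted(
        name for name, mtime in sources.items()
        if name not in classes or mtime > classes[name]
    )


def orphaned_classes(sources: Dict[str, float], classes: Dict[str, float]) -> List[str]:
    """Classes whose source was deleted or renamed; a per-file check never looks at these."""
    return sorted(set(classes) - set(sources))
```

With sources {"A": 20.0, "B": 5.0} and classes {"A": 10.0, "B": 10.0}, only A is considered stale even though B uses it; and after renaming A to C, the leftover A.class only shows up through something like orphaned_classes(), which a per-file check has no reason to call.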

AFAICT, most other build tools suffer from the same issues (Ant, SCons, etc.) Buck is a notable exception here, but is probably not alone.

It could possibly be fixed, but would have to be done in each and every plugin, so it’s not going to happen, and it would probably be too fragile to be trusted anyway. So let’s just say it’s not fixable.

As far as the maven-compiler-plugin is concerned, JDK 8 introduces a new tool, jdeps, that can give you the class dependencies.

Could be used to make a true incremental Java build tool (except maybe when annotation processors come in the play)

— Thomas Broyer (@tbroyer) September 10, 2013

The maven-compiler-plugin could then build a graph of the dependencies and rebuild B whenever A changed.
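The graph part is the easy bit. Assuming the class-level dependencies have already been extracted (e.g. from jdeps output; parsing it is left out here), this Python sketch computes what to recompile: the changed classes plus everything that depends on them, transitively, by walking the reversed dependency graph breadth-first.

```python
from collections import deque
from typing import Dict, Iterable, Set


def to_recompile(uses: Dict[str, Set[str]], changed: Iterable[str]) -> Set[str]:
    """Changed classes plus every class that depends on them, transitively.

    `uses` maps a class to the classes it references (assumed to come from jdeps).
    """
    # Invert the graph: for each class, which classes use it?
    used_by: Dict[str, Set[str]] = {}
    for cls, deps in uses.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(cls)

    result = set(changed)
    queue = deque(result)
    while queue:
        cls = queue.popleft()
        for dependent in used_by.get(cls, ()):
            if dependent not in result:
                result.add(dependent)
                queue.append(dependent)
    return result
```

Given uses = {"B": {"A"}, "C": {"B"}}, changing A yields {"A", "B", "C"}, whereas changing C yields just {"C"}.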

@tbroyer Now the question is: would it be worth it? I'd bet no. This is not what's slowing our builds.

— Thomas Broyer (@tbroyer) September 10, 2013

Is it really worth keeping track of all those things and trying to recompile only the few things that have changed (and the things that depend on them, transitively) when compiling is not the most time-consuming task in a build? (Note that you’d first have to fix the handling of deleted sources, etc.) Everything’s so much easier if you treat compilation as an atomic (non-incremental) task, the way Buck does.

Reactor builds are half-baked

Most other issues with Maven are related to reactor builds, aka multi-module projects. The problems with reactor builds are many, but basically stem from two design decisions: the linear build lifecycle and snapshots.

Here are some use cases that Maven cannot handle without compromising the reproducibility of your build:

For all these cases, the only (somewhat) reliable way is to mvn install the modules you don’t want to rebuild constantly and then run everything in offline mode, to make sure the snapshots you just installed won’t be overwritten by ones coming from your repo manager, provisioned by your CI server. This poses other problems, as some plugins don’t support running offline.

This is exacerbated by DVCSs and their lightweight local branches: each time you switch branches, you have to rebuild everything from scratch if you want to make sure you’re not mixing things from the previously checked-out branch.

Lex Spoon already pointed this out last year in his “Recursive Maven considered harmful” piece.

One other issue is the linear build lifecycle: mvn test won’t package anything; downstream modules will use the target/classes of the modules they depend on. But what if one of the module dependencies contains an annotation processor? This is where you start to use mvn package and mvn package -DskipTests every time you want to test or compile something.
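To make the linear lifecycle concrete, here’s a simplified Python model of it (only a subset of the default lifecycle’s phases, and ignoring which plugins are bound to them): invoking a phase runs every phase before it, and the same phase is applied to every module in the reactor.

```python
from typing import List

# A simplified subset of Maven's default lifecycle, in order (plugin bindings ignored).
PHASES = ["validate", "generate-sources", "compile", "test-compile",
          "test", "package", "verify", "install", "deploy"]


def phases_run(requested: str) -> List[str]:
    """Invoking a phase runs every phase up to and including it, and nothing after."""
    return PHASES[: PHASES.index(requested) + 1]


def reactor_plan(modules: List[str], requested: str) -> List[str]:
    """The same requested phase applies to every module of the reactor, in build order."""
    return [f"{module}:{phase}" for module in modules for phase in phases_run(requested)]
```

phases_run("test") stops before package, so no JAR is produced for downstream modules to use, while phases_run("package") necessarily goes through test; -DskipTests only tells the test plugin not to run the tests, the phase is still part of the sequence.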

Finally, linked to the above two issues, is one with the command line: back to the second use case above (the first one could be used too, but it’s not much about the build). Let’s say I have a feature branch cut from master a couple of days ago. I’m working on module C, which depends on A and B, but I don’t really need to test A and B as the CI server said they were OK at the time I cut the branch; I just want to use them, in their state corresponding to the revision I checked out. I do want to run tests for the C module that I’m working on, though. You simply cannot tell Maven to package everything up to (and excluding) C, without running the tests, and then compile (and maybe package) and run tests for the C module. No, you only have two choices:

In any case, you’re asking Maven to do too much work. Combined with Maven’s tendency to already do too much work (see incremental builds above), this is a real productivity killer.

AFAICT, Maven didn’t initially have multi-module support, and this was first contributed as a plugin before being built in. That would explain why reactors and the linear lifecycle don’t play well together, but it’s not an excuse.

And again, I don’t think this can be fixed without breaking a whole lot of builds out there; this is just how Maven works and has been designed: broken by design.

POMs have two uses

The POM in a Maven project has two uses: it describes how to build the project, and how to use the artifacts it produced. The main thing in common is the list of dependencies, and Maven’s scopes are too limiting: there’s no “this is only needed at compile-time” scope (you’d use an optional dependency, or the provided scope), and test dependencies are not transitive (in the sense that if you need to expose a testing framework, you cannot just say “use the test artifact”, as you wouldn’t get its transitive dependencies; you have to split the tests into a separate module, which generally means you’ll also move some tests out of the module they’re supposed to test; this can be seen as a limitation of the “one artifact per module” rule too).
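For reference, here’s how Maven decides which scope a transitive dependency ends up with, as a Python lookup table transcribed from the scope table in Maven’s “Introduction to the Dependency Mechanism” documentation (simplified: no exclusions, optional dependencies, or version mediation).

```python
from typing import Optional

# Rows: scope of your direct dependency.
# Columns: scope that dependency's own POM gives to *its* dependency.
# Value: resulting scope in your project; missing entries mean "not inherited".
_SCOPE_TABLE = {
    "compile":  {"compile": "compile",  "runtime": "runtime"},
    "provided": {"compile": "provided", "runtime": "provided"},
    "runtime":  {"compile": "runtime",  "runtime": "runtime"},
    "test":     {"compile": "test",     "runtime": "test"},
}


def transitive_scope(direct: str, declared: str) -> Optional[str]:
    """Scope a transitive dependency ends up with, or None if it is dropped."""
    # provided- and test-scoped dependencies of a dependency are never inherited.
    return _SCOPE_TABLE[direct].get(declared)
```

transitive_scope("test", "compile") is "test", but transitive_scope("test", "test") and transitive_scope("compile", "test") are both None: whatever testing libraries a module declares in test scope never reach the modules that depend on it, which is why a reusable testing framework ends up in its own module.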

Do we really need to share in Central the recipes we used to build our artifacts? Who benefits from it? Compare that to how much it weighs.

I’ll succumb to the siren call of version ranges and add that artifact metadata in repositories should be updatable in some ways: what if you could update your old artifact to explicitly say that it’s not compatible with some new version of one of its dependencies? And how about adding information about known security vulnerabilities? Marking the artifact as deprecated if you’ve made an “emergency release” to fix a bug?

This isn’t going to happen, as it would break everyone’s expectations that a released version is immutable. (One would have to explain to me why Maven tracks the source of the downloaded artifacts then: if you don’t trust your repositories for providing “the right artifacts”, then you probably have a bigger problem than what Maven is trying to work around here)

The project layout isn’t human- or even tooling-friendly

Let’s finish with a small additional rant. Maven has standardized the src/{main,test}/{java,resources} layout for projects. It was made with the best of intentions (everyone always had excludes="**/*.java", so segregating those files into separate folders seemed like a good idea), but it really gets in the way of developers, and tooling isn’t there:

Repositories are all checked for all dependencies

You cannot tell Maven to download some dependency from some repository and some other dependency from some other repository. No, Maven will instead always check all the listed repositories, including (quite obviously) those from POMs of your dependencies and their transitive dependencies, i.e. things you don’t really have a hand in. This can become a real pain when one of those repositories is down (temporarily or permanently) as Maven will keep checking it, slowing your builds even more than they already (artificially) are.
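Here’s a toy Python model of that behavior, next to a hypothetical per-dependency routing (the routes parameter is made up for illustration). Repositories are in-memory dicts and None stands for a repository that is down; real resolution (metadata, snapshots, checksums, timeouts) is far more involved.

```python
from typing import Dict, List, Optional, Tuple

# Stand-in for a remote repository: coordinates -> artifact bytes; None = repository is down.
Repo = Optional[Dict[str, bytes]]


def resolve_unrouted(coords: str, repos: List[Tuple[str, Repo]]) -> Tuple[Optional[bytes], List[str]]:
    """Try every listed repository in order until one has the artifact."""
    contacted: List[str] = []
    for name, repo in repos:
        contacted.append(name)  # a down repository still costs a request (and a timeout)
        if repo is not None and coords in repo:
            return repo[coords], contacted
    return None, contacted


def resolve_routed(coords: str, repos: List[Tuple[str, Repo]],
                   routes: Dict[str, str]) -> Tuple[Optional[bytes], List[str]]:
    """Hypothetical alternative: a groupId prefix pins an artifact to one repository."""
    group_id = coords.split(":", 1)[0]
    by_name = dict(repos)
    for prefix, name in routes.items():
        if group_id == prefix or group_id.startswith(prefix + "."):
            repo = by_name[name]
            found = repo.get(coords) if repo is not None else None
            return found, [name]
    return resolve_unrouted(coords, repos)  # no route: fall back to trying them all
```

With a dead repository listed first, resolve_unrouted() contacts it for every single artifact, while resolve_routed() only touches the repository an artifact is pinned to, and only falls back to trying them all for artifacts with no route.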

The answer from the Maven developers and community is to set up a repository manager in your local network to serve as a proxy and never ever configure any repository in your POMs. This is nothing more than a workaround though, almost looking like they’re apologizing for their broken tool (hey, but you get some nice features out of that new tool!). For reference, Ivy recommends such a setup for enterprises, but for completely different reasons (that are also solved by a Maven repository manager BTW). An Ivy enterprise repository is also an entirely different beast than a Maven repository manager: it’s not a new piece of software, it can be a network filesystem, a WebDAV server (or simply an HTTP server if it should be read-only), or a remote filesystem accessible via SFTP or SSH; and you have to explicitly publish to it (it’s not a proxy to public repositories).

It gets even worse if you have a laptop: you’ll have to switch your settings.xml depending on whether you’re at work (and want to use your enterprise repo manager) or at home or on the go (and want to use public repos). Most of the time, you won’t work on the same projects in those different places, so the settings you’ll need are per project, but Maven doesn’t let you do it. It’s all or nothing. And guess what the Maven community’s answer to this issue is? Install a repo manager on your laptop to proxy all those repositories! (including your enterprise repo manager that already proxies most of them, but just isn’t always available as you move), and of course pay twice the price for storing all those artifacts (the repo manager cache, and your local repository).

This can probably be fixed by enhancing the POM (and current Maven versions would reject it), or possibly using a plugin/extension to swap artifact resolvers. The bad news is that Sonatype tries to get everyone to standardize on Aether for resolving artifacts, and I believe that the issue is there, rather than in Maven proper (haven’t checked though).

Maven is broken, it’s by design, and unfixable

Maven wasn’t initially designed with incremental builds and multi-module projects in mind. The implications for Maven’s way of working are so deep that it’s impossible to fix it without breaking almost everyone. Unfixable and unfixable.

Similarly, and as Lex said, SNAPSHOTs are harmful for builds (they can be useful as dependencies in other projects though, I don’t deny it), but unfortunately you have to use them or keep rebuilding and retesting things you shouldn’t need to. Because the outcome of a build cannot be accurately predicted, and because of the linear lifecycle where you’re not forced to package your artifacts, it’s hard, if not impossible, to add a build cache to Maven. Unfixable.

And we’ve seen that Maven’s internal representation of the project and its lifecycle isn’t IDE-friendly (among other things), and that cannot be fixed either without breaking almost everyone. Unfixable.

The standardized project layout isn’t much of an improvement relative to the already common src and test source roots. This is of course fixable (Maven can be configured this way), but we’re talking about “convention over configuration” here. Unfixable.

The POM as used in repositories is too verbose for its intended use, and could be vastly improved. Slimming it down would be possible, but enhancing it by making it no longer immutable would break everyone. Unfixable.

And repository management is, er, dumb? plain broken? Unfixable.

So what’s left? Er… not much I’d say… Good intentions? The road to hell is paved with good intentions.

And the worst part: it’s not just Maven, as almost everyone jumped on the bandwagon (mostly for compatibility’s sake, at least I hope so), and people keep advertising Maven as the One True Way™ despite all its issues (sometimes not even denying them, sometimes because they don’t know what they’re missing, and other times just suffering from Stockholm syndrome).

All that said…

…I’d love to be proved wrong, given Maven’s market share. From now on, I’ll invest some time trying out Gradle on a real project to better understand how it works and what its pitfalls are. Buck+Ivy+scripts are still an appealing combo, in case nothing else works the way it should, particularly as Buck is going to get plugins to contribute new tasks (rather than hack around macros and genrules).
