Reverse-engineering J2CL–Bazel integration

J2CL is a tool by Google that transpiles Java code to Closure-compatible JavaScript. It was started in 2015 to eventually replace GWT at Google, for various reasons explained in the project README, and will be used as the basis for GWT 3. As a Google project, it's deeply integrated with their build tool: Bazel. We'll look at how it all works, with the goal of creating an equivalent Gradle plugin.

This reverse-engineering work has already been done several times (I had done it a couple of times myself a few years ago), but I don't think it's ever been documented, so here it is.

EDIT(2020-07-28): j2cl-maven-plugin now serves tests using a web server.

Starting points

From a user's point of view, the main starting points are the Bazel rules j2cl_library and j2cl_application.

The j2cl_application rule is actually just a macro around the rules_closure rules. It takes as input closure_js_library and j2cl_library dependencies, and entry point Closure namespaces (that must be present in the dependencies), and produces optimized JavaScript through a closure_js_binary rule, with a handful of configuration options. It also generates a web_library rule for running your code during development; we'll come back to it later.

As we just saw, a j2cl_application doesn't directly have sources, but takes them as dependencies. All the code of your application would thus actually be in a j2cl_library.

A j2cl_library rule is what actually translates your Java source to JS. It takes Java source files as input and produces a JAR of compiled Java classes, and a ZIP of the transpiled JS (known as a JSZip).
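To make this concrete, here's a minimal BUILD sketch of how the two rules fit together (the target names and the helloworld.app entry point namespace are made up for illustration; the load statement reflects how the J2CL repository exposes its rules, but check your setup):

```python
load("@com_google_j2cl//build_defs:rules.bzl", "j2cl_application", "j2cl_library")

# Transpiles the Java sources to JS (and also compiles them to a JAR).
j2cl_library(
    name = "applib",
    srcs = glob(["*.java"]),
    # deps would list other j2cl_library or closure_js_library targets.
)

# Has no sources of its own: bundles the dependencies' JS, starting from
# the entry point namespace, into an optimized closure_js_binary.
j2cl_application(
    name = "app",
    entry_points = ["helloworld.app"],
    deps = [":applib"],
)
```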

Finally, there are rules for tests (we'll get to them later on), and for importing third-party libraries, including from Maven repositories.

j2cl_library

Let's look closer at what a j2cl_library does.

A j2cl_library is somehow both a java_library and a closure_js_library. It takes as input a set of Java and JS source files, along with dependencies that can be either closure_js_library or other j2cl_library rules. Note that in Bazel, Java code that's also used on the server, for example, would additionally be consumed by a java_library.

A j2cl_library will start by removing from the Java source files all the code that's annotated with @GwtIncompatible (technically, it replaces that code with spaces, so that line numbers are preserved). The stripped sources will then be compiled with javac; this step could also produce new Java sources through annotation processing (technically, it could even generate JS files). The Java and JS source files, and the ones generated during compilation, will then be passed to J2CL to generate a Closure JS library.

To transpile Java code, J2CL needs the dependencies as compiled Java classes to resolve the Java APIs (specifically method overloads, implicit casts, or even type inference through Java 10's var keyword); those need to be the stripped variants of the dependencies, which is why j2cl_library rules depend on other j2cl_library rules, and not on standard java_library rules. J2CL also resolves .native.js files sibling to the .java files, which generally contain JS code implementing native methods (somewhat equivalent to GWT's JSNI), and concatenates their content into the generated .js files. It should be noted that the javac step replaces the bootstrap classpath with the Java Runtime Emulation library (which is almost 100% shared with GWT).
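As an illustration, the .native.js convention surfaces in the build file simply as additional sources next to the Java ones (a sketch, with made-up target names):

```python
load("@com_google_j2cl//build_defs:rules.bzl", "j2cl_library")

j2cl_library(
    name = "mylib",
    srcs = glob([
        "*.java",
        # A Foo.native.js sits next to Foo.java and provides the JS bodies
        # of Foo's native methods; J2CL concatenates its content into the
        # JS generated for Foo.
        "*.native.js",
    ]),
)
```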

So, a j2cl_library outputs both a JAR, like a java_library (except the sources have first been stripped of any @GwtIncompatible code), and a ZIP of JS files, like a closure_js_library.

Technically, in Bazel, a j2cl_library, just like a closure_js_library, also type-checks the JS (this is probably a bit redundant in J2CL's case, though there are the .native.js files too), and outputs metadata files about the library. Those metadata files help speed up downstream Closure JS libraries (by not rebuilding / re-type-checking them when their upstream API hasn't changed; this is similar to how Bazel, and Gradle, extract an API-only JAR/info from Java classes) and pass diagnostic suppressions down to the Closure compilation. This is quite specific to Bazel and rules_closure, though it could possibly be implemented in Gradle as well (the way Gradle extracts and checks the API-only info from Java classes is entirely different from how Bazel does it though: Gradle apparently fingerprints the whole classpath of a Java compilation, whereas Bazel generates an ijar for each java_library and then compares those files' checksums as it would any other input).

As we saw, a j2cl_library leverages three tools:

- the @GwtIncompatible stripper,
- javac (which may also run annotation processors),
- and the J2CL transpiler itself.

And last, but not least, those tools are all run in persistent worker processes, so you don't have to spawn a JVM every time (which, with Bazel's design of having a rule for almost every Java package, would mean a lot of times).

Importing JARs

There are three Bazel rules for importing JARs in J2CL.

The j2cl_import rule is a simple bridge for when you need a j2cl_library to depend on a java_library; this should only be used for annotation-only JARs though (J2CL doesn't need the sources for annotations, and they don't generate JS code; they're only useful to trigger annotation processors, or configure static analysis tools, such as ErrorProne).
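As a sketch, bridging such an annotation-only JAR could look like this (the jsr305 repository label and target names are illustrative):

```python
load("@com_google_j2cl//build_defs:rules.bzl", "j2cl_import", "j2cl_library")

# Annotation-only JAR: no JS will ever be generated from it, so the
# compiled classes are all J2CL needs.
j2cl_import(
    name = "jsr305-j2cl",
    jar = "@com_google_code_findbugs_jsr305//jar",
)

j2cl_library(
    name = "mylib",
    srcs = glob(["*.java"]),
    deps = [":jsr305-j2cl"],
)
```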

The j2cl_import_external rule takes a set of alternative URLs to a JAR (and its SHA256 checksum), and generates either a j2cl_import for annotation-only JARs, or a j2cl_library. In the latter case, the JAR should then contain Java sources. It is actually expected to be a GWT library, with super-sources in super subpackages. The j2cl_library will then use the super-sources (ignoring the files they emulate, which are by definition GWT-incompatible), and also ignore any *_CustomFieldSerializer* file (as Google doesn't use GWT-RPC anymore, and thus didn't port it to J2CL).

Finally, the j2cl_maven_import_external rule is a wrapper around j2cl_import_external that simply generates the URLs from Maven coordinates and a set of Maven repository URLs. It should be noted that this macro uses the sources JAR, i.e. it replaces the classifier and packaging with sources and jar respectively.
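Put together, importing an annotation-only JAR from Maven Central could look roughly like this (the attribute values are illustrative and the checksum is a placeholder):

```python
load("@com_google_j2cl//build_defs:rules.bzl", "j2cl_maven_import_external")

# Generates a j2cl_import under the hood because of annotation_only;
# without it, the sources JAR would be downloaded instead and wrapped
# in a j2cl_library, recompiling and transpiling it as described below.
j2cl_maven_import_external(
    name = "com_google_code_findbugs_jsr305-j2cl",
    annotation_only = True,
    artifact = "com.google.code.findbugs:jsr305:3.0.2",
    artifact_sha256 = "<checksum of the downloaded JAR>",
    server_urls = ["https://repo1.maven.org/maven2/"],
)
```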

Yes, you read that right: those last two rules will only consider source files in the JARs (unless they are annotation-only JARs) and will therefore strip their @GwtIncompatible code and (re)compile them with javac, before finally translating them to JS. It is thus expected that the JARs either include all the source code (most importantly the sources generated by annotation processing), or that, possibly, their dependencies are configured in such a way that the javac step will be able to process the annotations and generate the missing files. GWT libraries would fall in the former bucket, but their sources JARs as deployed in Maven repositories might not, so when reproducing this outside Bazel, we'd possibly make different choices.

Actually, if you look at the J2CL repository, you'll see that it has to use a different technique when importing the jbox2d library, as that one puts super-sources in a gwtemul subpackage (and includes GWT-incompatible code in another package). In this specific case, it grabs the sources from GitHub, filters out the super-sourced or J2CL-incompatible files, and then declares a j2cl_library for those source files (therefore really rebuilding the library from sources).
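A sketch of that kind of rebuild-from-sources setup, as a BUILD file overlaid onto the fetched sources (the paths and excluded packages are illustrative, not the actual jbox2d layout):

```python
load("@com_google_j2cl//build_defs:rules.bzl", "j2cl_library")

j2cl_library(
    name = "jbox2d-j2cl",
    srcs = glob(
        ["src/main/java/**/*.java"],
        exclude = [
            # Super-sources: emulated variants shadowing files elsewhere.
            "src/main/java/**/gwtemul/**",
            # Hypothetical package containing the J2CL-incompatible code.
            "src/main/java/org/jbox2d/profile/**",
        ],
    ),
)
```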

j2cl_application

As we briefly saw above, this macro is only a (rather simple) helper to generate Closure rules, and you could just use the Closure rules directly. This means the rule is actually not concerned at all with J2CL, except for the few configuration options it passes to the Closure compiler.

The inputs to the Closure compiler are the transitive closure of all the closure_js_library dependencies and the JS output of the j2cl_library dependencies. This is the only place where the JS output of the j2cl_library rules is used; when a j2cl_library depends on another j2cl_library, as we saw above, it only uses its JAR output.

It's interesting to look at how users would run and debug their code though: the j2cl_application will generate a non-optimized JS version (mostly as a closure_js_binary with different configuration), and an HTML file to load it, and will launch a development server to serve them all. The HTML page and dev server are also ibazel and livereload aware, such that if run through ibazel the page will livereload whenever a source file is changed.

There are notes in the code about applications with custom dev servers; one could imagine servers that guard the page behind authentication, or need to somehow inject dynamic things into the page. This is left undocumented for now though. In any case, for those who know how GWT's super dev mode works, this is quite different: it's much more like parcel serve than parcel watch (for those who know JS development with Parcel), and without hot module replacement (HMR); though we don't really know how GWT development works at Google with Bazel, rules_gwt being developed and maintained by a Googler who's not on the GWT team.

Tests

Tests are defined by j2cl_test rules, whose implementation has not actually been open sourced (yet). Those rules can be generated through a gen_j2cl_tests macro, from which we can learn a bit about them, though we'll actually learn more by looking at how they're used in J2CL's own tests.

Each j2cl_test rule corresponds to only one test class, which can be a test case or a test suite. It should be noted that J2CL supports both JUnit 3 and JUnit 4 test cases, but only JUnit 4 test suites.

Tests can be run in two flavors: compiled or not; I suppose it means whether to use optimized or unoptimized Closure output. And they can be run in several specific browsers, or, I suppose, against a set of globally-defined ones.

In the J2CL Git repository, we can also find an unused j2cl_generate_jsunit_suite macro, which is probably used internally at Google by the j2cl_test rule. It takes as input a test class name, generates a dummy Java file that references it in a @J2clTestInput annotation, and compiles it with an annotation processor that will generate support files. So, the dummy Java file is only used to trigger the annotation processor, but is otherwise completely useless. The processor will generate a test_summary.json file describing the tests, a JS file for each test case, and a Java file that needs to be processed by J2CL and is used by the JS files. Compilation is done through a j2cl_library, so the JS files actually have a .testsuite extension such that they're not picked up by J2CL. Technically, the dummy Java file could be processed using javac -proc:only, and the generated Java file then processed by J2CL without needing to be compiled to a Java class first. In the Bazel macro, the output of the j2cl_library is then processed to generate a ZIP containing only JS (with .testsuite renamed to .js) and the test_summary.json. The jsunit_test referenced in the macro is probably similar to the closure_js_test rule from rules_closure, but we can't really know.
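Based on that description, the macro's shape would roughly be the following; this is purely a conceptual sketch, not the actual (unreleased) implementation, and the processor target label is hypothetical:

```python
def j2cl_generate_jsunit_suite(name, test_class, deps):
    # Generate the dummy Java file; its only purpose is to carry the
    # @J2clTestInput annotation and thereby trigger the annotation processor.
    native.genrule(
        name = name + "_dummy",
        outs = [name + "Dummy.java"],
        cmd = "echo '@com.google.j2cl.junit.apt.J2clTestInput(%s.class) class %sDummy {}' > $@" % (test_class, name),
    )

    # Compiling it runs the processor, which emits test_summary.json, one
    # .testsuite JS file per test case (named so they're not picked up by
    # J2CL), and a Java support file that J2CL does transpile.
    j2cl_library(
        name = name + "_lib",
        srcs = [":" + name + "_dummy"],
        deps = deps + ["//junit:processor-j2cl"],  # hypothetical label
    )

    # A final step (elided here) repackages the j2cl_library output into a
    # ZIP containing only JS (.testsuite renamed to .js) and test_summary.json.
```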

To actually learn more about how testing works with J2CL, we have to look at the ongoing j2cl-maven-plugin effort, whose dev team asked Google how to actually do it.

In the Maven plugin, test cases (or test suites) are expected to be annotated with @J2clTestInput directly referencing the annotated class (whereas in Bazel, the annotation is entirely kept as an implementation detail). The plugin will itself compile the classes (despite them already having been compiled by the maven-compiler-plugin), because it will also, as Bazel does, first strip @GwtIncompatible code, and add the annotation processor to the compilation process. It will then read the generated test_summary.json and, for each test, copy the generated .testsuite file to .js, run the Closure compiler on it, similar to a j2cl_application with that script as the entry point, generate a simple HTML file loading the resulting script, and finally load that page in a browser to actually run the tests. To detect that the tests have finished running, the plugin will poll the page for specific JS state; this is actually the same as the phantomjs_test plumbing in rules_closure. AFAICT, the main difference from rules_closure is that the HTML page is loaded as a file:// URL in the Maven plugin, whereas it's served through HTTP in the rules_closure test harness, and this can have real consequences when dealing with cookies, resources, or HTTP requests.
EDIT(2020-07-28): the Maven plugin now uses an HTTP server.

Closing thoughts

Gradle should have all that's needed to build tooling similar to the J2CL–Bazel integration described here, with comparable performance: built-in up-to-date checks for incremental builds, variants (JS vs Java classes for a J2CL library), worker processes, artifact transforms for J2CL'ing external dependencies (with built-in caching), continuous builds for rerunning the whole build pipeline on file change, etc.

I'll try to design such a Gradle plugin in a follow-up post. Stay tuned!

Discuss: Dev.to