Exploring using Protobuf in the browser

I was about to write about Protobuf and GWT, but I thought I'd rather start with a more general post about Protobuf (or Thrift or Avro or…) in the browser. I'll post the Protobuf and GWT piece as a follow-up.

In a new webapp project at work, our client wants to handle scalability by moving some parts of the app to other machines instead of clustering; i.e. the server is composed of modules that could be distributed across several machines. Back in September I started looking around for protocols and solutions to help us implement this, and found Protocol Buffers and Thrift, among others. Now that the project is becoming more concrete, I've started revisiting that earlier research. I found out that Thrift now has an official release and that protobuf now has an (experimental) generator plugin mechanism. The latter, along with parallel research on Comet (which made me think about Wave and remember that it was said to use protobuf), made me wonder whether and how protobuf could be used in the browser. My Comet research also led me to Closure Library, which has a goog.proto2 namespace, confirming that using protobuf in the browser isn't totally dumb. And finally, last week I was reminded of gwt-rpc-plus, which happens to have something about Thrift.

With the context laid out, let's discuss it in Q&A form.

Q: Isn't protobuf about serializing objects to a compact binary format? (which you couldn't read from JavaScript)

A: Yes, but not only that. Protobuf is two things: an IDL with a code generator, and a compact format for object serialization.

Q: But then which serialization format would you use?

A: The most efficient on the browser side is certainly JSON. That's also what Closure Library uses, in different flavors: an array indexed by the field tags, an object keyed by the field tags, or an object keyed by the field names.
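To make those three flavors concrete, here is a minimal TypeScript sketch. The Person message, its field names and its tags are made up for illustration, and the three encoder functions are hand-written stand-ins, not Closure Library's own serializers (whose exact array layout may differ, e.g. in how index 0 is treated).

```typescript
// Hypothetical message, roughly equivalent to:
//   message Person { string name = 1; int32 id = 2; string email = 3; }
interface FieldSpec {
  tag: number;
  name: string;
}

const PERSON_FIELDS: FieldSpec[] = [
  { tag: 1, name: "name" },
  { tag: 2, name: "id" },
  { tag: 3, name: "email" },
];

type Value = string | number | boolean | null;
type Message = { [name: string]: Value | undefined };

// Flavor 1: array indexed by field tag. Index 0 is unused in this sketch
// (an assumption), and unset fields are encoded as null.
function toTagArray(msg: Message, fields: FieldSpec[]): Value[] {
  const out: Value[] = [];
  for (const f of fields) {
    while (out.length <= f.tag) out.push(null);
    const v = msg[f.name];
    if (v !== undefined) out[f.tag] = v;
  }
  return out;
}

// Flavor 2: object keyed by field tag; unset fields are simply omitted.
function toTagObject(msg: Message, fields: FieldSpec[]): { [tag: string]: Value } {
  const out: { [tag: string]: Value } = {};
  for (const f of fields) {
    const v = msg[f.name];
    if (v !== undefined) out[String(f.tag)] = v;
  }
  return out;
}

// Flavor 3: object keyed by field name; the most readable and the most verbose.
function toNameObject(msg: Message, fields: FieldSpec[]): { [name: string]: Value } {
  const out: { [name: string]: Value } = {};
  for (const f of fields) {
    const v = msg[f.name];
    if (v !== undefined) out[f.name] = v;
  }
  return out;
}

const person: Message = { name: "Ada", id: 42 }; // email left unset

const asTagArray = JSON.stringify(toTagArray(person, PERSON_FIELDS));
// [null,"Ada",42,null]
const asTagObject = JSON.stringify(toTagObject(person, PERSON_FIELDS));
// {"1":"Ada","2":42}
const asNameObject = JSON.stringify(toNameObject(person, PERSON_FIELDS));
// {"name":"Ada","id":42}
```

The array form is the shortest on the wire but the hardest to read by eye; the name-keyed form is the opposite.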

Q: OK, so if it's not about the binary format, it's about the IDL; but why not go with JSON Schema then?

A: JSON Schema would certainly be an alternative, but my first goal was to use protobuf (or Thrift or…) for server-to-server communications, so I'd rather use the same technology everywhere. On top of that, to my knowledge there's no existing tool to generate Java classes from a JSON Schema (Jackson can generate a JSON Schema from an annotated Java class, but given that the client side will be GWT, i.e. Java, there would be no point in generating such a schema).

Q: Er, if you're using GWT and server-side Java, then why not just use GWT-RPC?

A: GWT-RPC by default tightly couples the client and the server, as it passes around a checksum for each serialized class, based on the class name, its parent classes' names and all their fields' names. This means that as soon as you add or remove a field from a class, you have to redeploy both the client and the server side or they will fail to communicate. GWT-RPC also enforces primitive type equivalence between the client and the server: if you change a byte into an int on one side, the other side will raise an error if it receives a value that cannot fit into the expected byte type. Protobuf is all about allowing you to update the messages (to a certain extent) without having to redeploy all involved parties. This matters most for high-availability services (such as Google Apps), where you upgrade servers in the cluster one at a time, and/or upgrade the server while the client is still loaded in users' browsers and has to keep communicating with the upgraded server without error or reload. In our case, high availability isn't a primary goal, but I think this might help during development, and particularly help us focus on "protocols" (including how much data goes on the wire) rather than method calls. That's particularly true when you start mixing in an ORM with lazy loading on the server side: you don't really think about it on the database communication side, but you have to keep it in mind when it comes to browser-server communications; and it's even worse when you pass JDO/JPA entities through GWT-RPC!
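The coupling described above can be illustrated with a deliberately simplified TypeScript sketch. This is not GWT-RPC's actual algorithm or wire format: ClassShape, signatureOf and strictAccepts are hypothetical names, and where a real framework would hash the class description into a checksum, the sketch just compares the description string directly.

```typescript
// Hypothetical description of a serializable class, as seen by one side.
interface ClassShape {
  className: string;
  superClassNames: string[];
  fields: { name: string; type: string }[];
}

// Builds a signature from the class name, its parents' names and all field
// names and types. A real framework would checksum this string; the sketch
// keeps the raw string so the behavior is easy to follow.
function signatureOf(shape: ClassShape): string {
  const parts = [shape.className].concat(shape.superClassNames);
  for (const f of shape.fields) {
    parts.push(f.name + ":" + f.type);
  }
  return parts.join("|");
}

// Strict scheme: the receiver rejects any payload whose class signature
// differs from the one it was compiled against.
function strictAccepts(sender: ClassShape, receiver: ClassShape): boolean {
  return signatureOf(sender) === signatureOf(receiver);
}

// The client was compiled against version 1 of the class...
const clientV1: ClassShape = {
  className: "com.example.Person",
  superClassNames: ["java.lang.Object"],
  fields: [
    { name: "name", type: "String" },
    { name: "id", type: "int" },
  ],
};

// ...and the server is later redeployed with one extra field.
const serverV2: ClassShape = {
  className: "com.example.Person",
  superClassNames: ["java.lang.Object"],
  fields: clientV1.fields.concat([{ name: "email", type: "String" }]),
};

const sameVersionOk = strictAccepts(clientV1, clientV1); // true
const afterUpgradeOk = strictAccepts(serverV2, clientV1); // false: must redeploy both
```

With a tag-based format, by contrast, a reader can skip the fields it doesn't know about, so the old client would keep working against the upgraded server.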

Q: OK, OK, but that's a GWT-RPC issue. You said this post wasn't specifically about GWT but more generally about protobuf in the browser, that you'd use JSON as the serialization format, and that Closure Library has goog.proto2 support. But Closure is JavaScript, so why isn't it just using plain JSON (on the client side at least)?

A: With plain JSON, you have to handle default values in your code each time you retrieve a property. I also suppose there's a rule within Google to sometimes change the name of a field, such as adding an "OBSOLETE_" prefix when obsoleting it. This works with protobuf because the binary format only deals with field tags. In JSON though, you'd likely use the field name as the key in an object, so you'd have to "break the rule" and not rename fields, or code defensively on the client side (looking into both "foo" and "OBSOLETE_foo", which is still fragile, as the developer could have made a typo and named the field OBSOLTE_foo), or serialize using field tags as the keys. In all those cases, either the code is fragile, depending only on some informal rule, or it becomes less readable and harder to maintain. Client-side developers would probably start writing functions to retrieve field values, then eventually wrap the decoded JSON object into a higher-level object so they can turn their functions into methods, etc. That is exactly what goog.proto2.Message provides, and where a code generator becomes very helpful. (The next step is to realize that your JSON object's property names add bytes to be transmitted over the network, and to switch to a lighter representation, such as PB-Lite or field tags as keys, now that you have wrapper objects to access field values.)
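Here is a minimal TypeScript sketch of the kind of wrapper a client-side developer would end up writing by hand. It is not goog.proto2.Message's API: PersonMessage, its tags and its default values are all hypothetical, and the wire form assumed here is a JSON object keyed by field tag.

```typescript
// Decoded JSON object keyed by field tag, e.g. {"1": "Ada", "2": 42}.
type Wire = { [tag: string]: unknown };

class PersonMessage {
  // Tags are the stable contract. A field can be renamed in the schema
  // (say, to OBSOLETE_nickname) without changing anything on the wire.
  private static readonly NAME_TAG = 1;
  private static readonly ID_TAG = 2;
  private static readonly NICKNAME_TAG = 3;

  private readonly wire: Wire;

  constructor(wire: Wire) {
    this.wire = wire;
  }

  static fromJson(json: string): PersonMessage {
    return new PersonMessage(JSON.parse(json) as Wire);
  }

  // Default handling lives in one place instead of at every call site.
  private stringField(tag: number, defaultValue: string): string {
    const v = this.wire[String(tag)];
    return typeof v === "string" ? v : defaultValue;
  }

  private numberField(tag: number, defaultValue: number): number {
    const v = this.wire[String(tag)];
    return typeof v === "number" ? v : defaultValue;
  }

  getName(): string {
    return this.stringField(PersonMessage.NAME_TAG, "");
  }

  getId(): number {
    return this.numberField(PersonMessage.ID_TAG, 0);
  }

  hasNickname(): boolean {
    return typeof this.wire[String(PersonMessage.NICKNAME_TAG)] === "string";
  }

  getNickname(): string {
    return this.stringField(PersonMessage.NICKNAME_TAG, "anonymous");
  }
}

// Tag 99 stands for a field added by a newer server; this client ignores it.
const decoded = PersonMessage.fromJson('{"1":"Ada","99":"from a newer server"}');
const decodedName = decoded.getName(); // "Ada"
const decodedId = decoded.getId(); // 0, the default, since tag 2 is absent
const decodedNickname = decoded.getNickname(); // "anonymous", the default
```

Writing this by hand for every message gets tedious quickly, which is where generating it from the .proto file pays off.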

Q: Let's say you convinced me. Now how about services you can declare in a .proto file?

A: Closure Library (the only JavaScript library I know of that deals with protobuf) doesn't seem to provide anything. With GWT I'm expecting to reuse the GWT-RPC APIs, or even its infrastructure and wire protocol. More in the follow-up post about Protobuf and GWT…