A few weeks ago, while chatting with our technical lead, I heard about a way of serializing structured data, for use in data storage or communications protocols, that comes from Google and was new to me. You may have heard about it: it’s called protocol buffers.

Protocol buffers, wha...? Basically, it’s a platform-independent, language-neutral, extensible approach to serializing structured data.

Think XML, but smaller, faster, and simpler.

I won’t dig into the past for equivalents of protocol buffers, since the only real one seems to be XML. I don’t count JSON as an equivalent, since it’s only a data format. It’s safe to say you have probably heard of both.

Protocol buffers have many advantages over XML for serializing structured data.

According to Google’s own overview, protocol buffers are simpler, 3 to 10 times smaller, 20 to 100 times faster, and less ambiguous, and they generate data access classes that are easier to use programmatically.

I have my own reason for not liking XML: my history with SOAP and my love for REST, even though the two do not ultimately serve the same purpose.

But, to make you a little bit more interested, let’s have a look at the difference:

In XML,

<employee>
    <name>Jane Doe</name>
    <email>jdoe@example.com</email>
    <position>Mobile Application Developer</position>
</employee>

The corresponding protocol buffer message (in protocol buffer text format):

# This is *not* the binary format used on the wire.
# The text format below is just a human-readable representation for debugging and editing
employee {
  name: "Jane Doe"
  email: "jdoe@example.com"
  position: "Mobile Application Developer"
}

As for parsing the two: according to Google’s documentation, the binary-encoded protocol buffer message would take around 100–200 nanoseconds to parse, whereas the XML version would take around 5,000–10,000 nanoseconds.

Protocol buffer messages are also much easier to work with in code:

System.out.println("Name: " + employee.getName());

With XML, you have to do something like this instead:

System.out.println("Name: " + employee.getElementsByTagName("name").item(0).getTextContent());

Let’s stop picking on the old-but-gold XML, and set aside the question of ‘which should we go with’.

The example I’m going to use will be in Go and is an “employee info” application which can write and read the basic information of an employee. Each of them has a name, an ID, a position, an e-mail address, and one or more phone numbers.

syntax = "proto2";
// there's also "proto3"

package employee;

message Employee {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  optional string position = 4;

  enum PhoneType {
    WORK = 0;
    MOBILE = 1;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = WORK];
  }

  // Field numbers must be unique within a message; 3 and 4 are already taken.
  repeated PhoneNumber phones = 5;
}

message EmployeeInfo {
  repeated Employee employees = 1;
}

Let’s look at the modifiers in the sample above:

required: a value for the field must be provided; otherwise the message is considered “uninitialized”. In Java, trying to build an uninitialized message throws a RuntimeException; in Go, trying to serialize one returns an error instead.

optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example.

repeated: the field may be repeated any number of times (including zero). You can think of it as a dynamically sized array.

The required modifier is a little tricky to use; see the big red warning in the documentation, and dive into the language guide for more detail.

That’s it for now. What we have here is, as Google puts it, their lingua franca for data.

I’m extremely HYPED for this, and in my next post, I’ll show you how to compile this example as well as show you how I feel about the process!

Keep on keepin’ on!

Thanks to David Okun for editing.