Serialization - Continued


August 13, 2014
Tags: java serialization externalize


As a continuation of my previous serialization posts, I plan to dig further into this topic. I like protobuf a lot, and I also like FST serialization because it can compact the serialized bytes in O(1). Given how much smaller FST's output is than protobuf's, I would say FST wins if we look at the end-to-end pipeline.

After learning the best from both of these, I decided to write my own API and see if I can do better. What can I offer that is not already in one of them? I plan to add the features below on top of the FST framework.

  1. Use of ByteBuffer to allocate memory for the objects being serialized. This is particularly important because it widens the range of targets that serialization can use:

    DirectByteBuffer - slow to allocate today, but it might become much faster soon, since the OS handles the memory allocation. It is currently slow because the JVM controls the availability of the buffer memory for security and exclusive access.

    Array-backed byte buffer - the normal heap ByteBuffer. This is the fastest ByteBuffer implementation to use.

    Memory-mapped byte buffer - the coolest option, since it lets the serialized data be written to disk asynchronously.

  2. Pre-calculating the size of an object, so we do not incur the cost of resizing the array on the fly. With FST or ByteArrayOutputStream we would pay both the array-resizing cost and the GC cost. The same ByteBuffer can be reused to serialize multiple objects that belong to the same connection, as long as the size matches.

  3. Introduction of LazyString: a wrapper type that stores a string's bytes and converts from String to bytes, and vice versa, only when necessary.

  4. Works only with known types: primitive types and their boxed counterparts, List, Map, and a few more objects specific to Banyan. This looks custom-built for Banyan, but it is very easy to extend.

  5. Provides compression automatically, and the compression achieved is better than applying LZ4 to Java-serialized bytes.
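Items 1 and 2 can be sketched with standard JDK calls alone. This is not the Banyan implementation: it assumes a hypothetical record made of a `long` id plus a UTF-8 name, computes the record's exact size before writing, and then allocates exactly that many bytes from a heap, direct, or memory-mapped buffer. `BufferKind`, `sizeOf`, `allocate`, and `write` are illustrative names, not part of any existing API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PresizedBuffers {

    // The three kinds of backing memory from item 1 (hypothetical enum).
    enum BufferKind { HEAP, DIRECT, MAPPED }

    // Item 2: exact size of a (long id, String name) record, known before writing:
    // 8 bytes for the id, 4 bytes for the name length, then the UTF-8 name bytes.
    static int sizeOf(long id, byte[] nameUtf8) {
        return Long.BYTES + Integer.BYTES + nameUtf8.length;
    }

    // Allocates exactly `size` bytes; `file` is only used for MAPPED.
    static ByteBuffer allocate(BufferKind kind, int size, Path file) throws IOException {
        if (kind == BufferKind.HEAP) {
            return ByteBuffer.allocate(size);        // array-backed, on the Java heap
        }
        if (kind == BufferKind.DIRECT) {
            return ByteBuffer.allocateDirect(size);  // native memory outside the heap
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Writes land in the OS page cache and are flushed to the file later
            // (MappedByteBuffer.force() flushes explicitly). The mapping stays
            // valid after the channel is closed.
            MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            return mapped;
        }
    }

    // Sizes first, allocates once, then writes: the buffer is never resized.
    static ByteBuffer write(BufferKind kind, long id, String name, Path file) throws IOException {
        byte[] nameUtf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = allocate(kind, sizeOf(id, nameUtf8), file);
        buf.putLong(id).putInt(nameUtf8.length).put(nameUtf8);
        buf.flip(); // switch to reading: remaining() == bytes written
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("record", ".bin");
        file.toFile().deleteOnExit();
        for (BufferKind kind : BufferKind.values()) {
            ByteBuffer buf = write(kind, 42L, "leaf", file);
            System.out.println(kind + " -> " + buf.remaining() + " bytes"); // 16 bytes for each kind
        }
    }
}
```

Because the size is known up front, no intermediate array is grown and copied, which avoids the resizing and GC cost that item 2 attributes to ByteArrayOutputStream.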

With the benefits of the new API explained, it is time to show the code that does the actual serialization and deserialization for Banyan and to check its performance.
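The following is an illustrative sketch, not Banyan's code, of how items 2 to 4 could fit together for a list of known types: each element gets a one-byte type tag, the exact size is computed in a first pass, and strings travel as a stand-in `LazyString` that encodes or decodes only when asked. The tag values, the wire layout, and all class and method names are assumptions; the real Banyan format (and its compression from item 5) may differ entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class KnownTypeListCodec {

    // Hypothetical one-byte tags: only these known types can be encoded (item 4).
    static final byte TAG_INT = 1, TAG_LONG = 2, TAG_DOUBLE = 3, TAG_STRING = 4;

    // Stand-in for the LazyString idea (item 3): holds either the String or its
    // UTF-8 bytes and converts to the other form only on first request, then caches it.
    static final class LazyString {
        private String value;
        private byte[] utf8;

        static LazyString ofString(String s) { LazyString l = new LazyString(); l.value = s; return l; }
        static LazyString ofBytes(byte[] b)  { LazyString l = new LazyString(); l.utf8 = b; return l; }

        byte[] bytes() {
            if (utf8 == null) utf8 = value.getBytes(StandardCharsets.UTF_8);
            return utf8;
        }

        @Override public String toString() {
            if (value == null) value = new String(utf8, StandardCharsets.UTF_8);
            return value;
        }
    }

    // First pass (item 2): exact encoded size, so the buffer is allocated once.
    static int sizeOf(List<?> list) {
        int size = Integer.BYTES;                  // element count
        for (Object o : list) {
            size += 1;                             // type tag
            if (o instanceof Integer) size += Integer.BYTES;
            else if (o instanceof Long) size += Long.BYTES;
            else if (o instanceof Double) size += Double.BYTES;
            else if (o instanceof LazyString) size += Integer.BYTES + ((LazyString) o).bytes().length;
            else throw new IllegalArgumentException("Unsupported type: " + o);
        }
        return size;
    }

    static ByteBuffer serialize(List<?> list) {
        ByteBuffer buf = ByteBuffer.allocate(sizeOf(list)); // never resized
        buf.putInt(list.size());
        for (Object o : list) {
            if (o instanceof Integer) buf.put(TAG_INT).putInt((Integer) o);
            else if (o instanceof Long) buf.put(TAG_LONG).putLong((Long) o);
            else if (o instanceof Double) buf.put(TAG_DOUBLE).putDouble((Double) o);
            else {                                   // sizeOf already rejected anything else
                byte[] b = ((LazyString) o).bytes(); // reuses the bytes cached by sizeOf
                buf.put(TAG_STRING).putInt(b.length).put(b);
            }
        }
        buf.flip();
        return buf;
    }

    static List<Object> deserialize(ByteBuffer buf) {
        int n = buf.getInt();
        List<Object> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            byte tag = buf.get();
            switch (tag) {
                case TAG_INT:    out.add(buf.getInt()); break;
                case TAG_LONG:   out.add(buf.getLong()); break;
                case TAG_DOUBLE: out.add(buf.getDouble()); break;
                case TAG_STRING:
                    byte[] b = new byte[buf.getInt()];
                    buf.get(b);
                    out.add(LazyString.ofBytes(b)); // not decoded until toString()
                    break;
                default: throw new IllegalStateException("Unknown tag " + tag);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> items = new ArrayList<>();
        items.add(7);
        items.add(123456789L);
        items.add(2.5);
        items.add(LazyString.ofString("leaf"));
        ByteBuffer buf = serialize(items);
        System.out.println(buf.remaining());  // 36
        System.out.println(deserialize(buf)); // [7, 123456789, 2.5, leaf]
    }
}
```

The deserialized strings stay as raw bytes until something calls `toString()`, so a value that is only passed along (for example, re-serialized unchanged) never pays for UTF-8 decoding.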

There are two kinds of objects I commonly use in Banyan: an ArrayList for storing single-item attributes or leaf values, and a Map for storing values that live in a branch, which can potentially be deep.
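To make the two shapes concrete, here is a small sketch using plain JDK collections; the keys and values are made up for illustration and are not Banyan's data model. A leaf is a flat `ArrayList` of known types, while a branch nests `Map`s. The hypothetical `depth` helper shows why serializing a branch needs recursion: its nesting level is not fixed in advance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BanyanShapes {

    // Depth of a node: a non-Map value counts as 0, each nested Map adds one level.
    static int depth(Object node) {
        if (!(node instanceof Map)) return 0;
        int deepest = 0;
        for (Object child : ((Map<?, ?>) node).values()) {
            deepest = Math.max(deepest, depth(child));
        }
        return 1 + deepest;
    }

    public static void main(String[] args) {
        // Leaf values of a single item: a flat list of known types.
        List<Object> leaf = new ArrayList<>();
        leaf.add(42L);
        leaf.add("red");

        // A branch: maps nested inside maps, with leaf lists at the bottom.
        Map<String, Object> inner = new HashMap<>();
        inner.put("attributes", leaf);
        Map<String, Object> branch = new HashMap<>();
        branch.put("child", inner);
        branch.put("count", 2);

        System.out.println(depth(leaf));   // 0
        System.out.println(depth(branch)); // 2
    }
}
```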

In this post I will work only with ArrayList and my own RandomItem class. Its schema, with getters and setters, looks like this:

Next is the code fragment used for the performance test of the new BinaryObjectSerializer:
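As a hypothetical sketch of how an average time per call can be measured, the harness below times any `Function<List<Long>, byte[]>` with `System.nanoTime()` after an untimed warm-up. It assumes a `List<Long>` input instead of RandomItem, and uses plain Java serialization and a presized fixed-width ByteBuffer writer as stand-ins for the serializers under test; `averageNanos`, `javaSerialize`, and `rawLongs` are illustrative names. For publishable numbers, a harness such as JMH handles JIT warm-up and dead-code elimination more rigorously.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class SerializerTiming {

    // Written after timing so the JIT cannot treat the serialized output as dead code.
    static volatile long blackhole;

    // Average nanoseconds per call over `calls` runs, after `warmup` untimed runs.
    static double averageNanos(Function<List<Long>, byte[]> serializer,
                               List<Long> input, int warmup, int calls) {
        long sink = 0;
        for (int i = 0; i < warmup; i++) sink += serializer.apply(input).length;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) sink += serializer.apply(input).length;
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return (double) elapsed / calls;
    }

    // Baseline: standard Java serialization through a growing ByteArrayOutputStream.
    static byte[] javaSerialize(List<Long> list) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(list);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Presized alternative: element count followed by fixed-width longs, allocated once.
    static byte[] rawLongs(List<Long> list) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Long.BYTES * list.size());
        buf.putInt(list.size());
        for (long v : list) buf.putLong(v);
        return buf.array();
    }

    public static void main(String[] args) {
        List<Long> input = new ArrayList<>();
        for (long i = 0; i < 1000; i++) input.add(i);

        System.out.printf("java: %d bytes, %.1f us/call%n",
                javaSerialize(input).length,
                averageNanos(SerializerTiming::javaSerialize, input, 2_000, 20_000) / 1_000);
        System.out.printf("raw:  %d bytes, %.1f us/call%n",
                rawLongs(input).length,
                averageNanos(SerializerTiming::rawLongs, input, 2_000, 20_000) / 1_000);
    }
}
```

Dividing the total elapsed time by the call count gives a per-call average, which is presumably what the AVG column in the results reports.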

Now for the actual results: average time and serialized data size, for a total of 10,000,000 calls.

For a list of 1,000 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     111,965        4 ms
Protobuf   247,981        4 ms
FST        120,490        7 ms
Java       204,135        45 ms

For a list of 750 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     78,333         4 ms
Protobuf   184,404        3 ms
FST        84,467         5 ms

For a list of 500 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     52,401         2 ms
Protobuf   123,468        2 ms
FST        56,537         2 ms

For a list of 250 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     26,288         1 ms
Protobuf   61,471         1 ms
FST        28,211         1 ms

Now for the more realistic use cases.

For a list of 100 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     10,440         550 µs
Protobuf   24,884         450 µs
FST        11,244         1 ms

For a list of 25 items, the metrics are:

API        Size (bytes)   Avg time
Banyan     2,567          120 µs
Protobuf   6,203          100 µs
FST        2,798          154 µs

For a single item, the metrics are:

API        Size (bytes)   Avg time
Banyan     103            7 µs
Protobuf   238            7 µs
FST        140            10 µs

As the metrics above show, all three are comparable in performance. Protobuf is a little faster, as it is more declarative. I will update the post with more fine-tuning to achieve better performance.

Now I just need to add support for serializing map types with these performance numbers, and then I can move on to my next solution. I am very excited to see these metrics. I would be willing to make this general-purpose if anyone is interested. Based on these metrics, I decided to write my own serialization APIs as listed above, and they will be available in the Banyan repository soon.


