Serialization - Continues....

August 13, 2014
Tags: java serialization externalize

As a continuation of the previous serialization posts, I am going to bore you further on this topic. I like protobuf a lot, and I also like FST serialization because it can compact the serialized bytes in O(1). Given how much smaller the FST output is compared to protobuf, I would say FST wins if we look at the end-to-end pipeline.

After learning the best of both these ideas, I planned to write my own API and see if I could do better. What do I have to offer that is not already in one of these? I am planning to add the features below on top of the FST framework.

The use of ByteBuffer for allocating memory for the objects to be serialized. This is particularly important because it widens the range of buffers that serialization can use:

DirectByteBuffer - Though slow right now, it might become much faster in the future, since the OS handles the memory allocation. Today it is slow because the JVM controls the availability of the buffer memory for security and exclusive access.

Array backed byte buffer - This is the normal ByteBuffer. It is the fastest byte buffer implementation to use.

Memory mapped byte buffer - This is the coolest feature, as it allows writing the serialized data asynchronously to disk.
Pre-calculating the size of an object, so we do not incur the cost of array resizing on the fly. We would incur the array resizing cost and the GC cost if we used FST or ByteArrayOutputStream. It is possible to reuse the same ByteBuffer to serialize multiple objects that are part of the same connection, as long as the size matches.
Introduction of LazyString. This is a wrapper type that stores the bytes of a string and converts between string and bytes only when necessary.
Only works with known types: primitive types and their respective boxed objects, List, Map, and a few more objects specific to Banyan. This looks custom built for Banyan, but it is very easy to develop.
Automatically provides compression, and the compression achieved is better than LZ4 compression on top of Java serialized bytes.
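
To make the pre-sizing and LazyString ideas a bit more concrete, here is a minimal sketch. It is not the Banyan implementation: the class name PresizedBufferSketch, the methods sizeOf, write and read, and the wire layout (a 4-byte count, then a 4-byte length plus UTF-8 bytes per item) are all assumptions made for illustration. The only part taken from the list above is the idea itself: compute the exact size first, allocate a ByteBuffer once, and keep strings as bytes until someone actually asks for the String.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch for illustration; not Banyan's actual classes.
final class PresizedBufferSketch {

    /** Holds a string as UTF-8 bytes and/or as a String, converting only on demand. */
    static final class LazyString {
        private byte[] utf8;   // null until encoded or supplied
        private String value;  // null until decoded or supplied

        LazyString(String value) { this.value = value; }
        LazyString(byte[] utf8) { this.utf8 = utf8; }

        byte[] bytes() {
            if (utf8 == null) utf8 = value.getBytes(StandardCharsets.UTF_8);
            return utf8;
        }

        boolean isDecoded() { return value != null; }

        @Override public String toString() {
            if (value == null) value = new String(utf8, StandardCharsets.UTF_8);
            return value;
        }
    }

    /** Exact byte count: 4-byte item count, then 4-byte length + payload per item. */
    static int sizeOf(List<LazyString> items) {
        int size = Integer.BYTES;
        for (LazyString s : items) size += Integer.BYTES + s.bytes().length;
        return size;
    }

    /** Writes into any ByteBuffer (heap, direct or mapped); no resizing is needed. */
    static void write(List<LazyString> items, ByteBuffer out) {
        out.putInt(items.size());
        for (LazyString s : items) {
            byte[] b = s.bytes();
            out.putInt(b.length);
            out.put(b);
        }
    }

    /** Reads the items back without creating any String objects. */
    static List<LazyString> read(ByteBuffer in) {
        int count = in.getInt();
        List<LazyString> items = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[in.getInt()];
            in.get(b);
            items.add(new LazyString(b));
        }
        return items;
    }

    public static void main(String[] args) {
        List<LazyString> items = Arrays.asList(new LazyString("leaf"), new LazyString("branch"));
        int size = sizeOf(items);
        ByteBuffer[] buffers = { ByteBuffer.allocate(size), ByteBuffer.allocateDirect(size) };
        for (ByteBuffer buf : buffers) {
            write(items, buf);
            buf.flip(); // switch from writing to reading
            List<LazyString> back = read(buf);
            String kind = buf.isDirect() ? "direct" : "heap";
            System.out.println(back.get(1) + " read from a " + kind + " buffer of " + size + " bytes");
        }
    }
}
```

Because write only depends on the ByteBuffer type, a MappedByteBuffer obtained from FileChannel.map could be passed in the same way. Calling clear() on a buffer after use would let it be reused for the next object of the same size.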

Having explained the benefits of the new API, it is time to show the code that does the actual serialization and deserialization for Banyan and check out its performance.

There are two kinds of objects I commonly use in Banyan: an ArrayList for storing single items such as attribute or leaf values, and a Map for storing the values that live in a branch, which can potentially be deep.

In this post I shall work only with ArrayList and my own RandomItem. Its schema, with getters and setters, looks like the following.

Now for the code fragment used for the perf test of the new BinaryObjectSerializer
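
As a rough idea of what such a test looks like, here is a minimal sketch of a timing loop. It is not the Banyan perf test and does not use BinaryObjectSerializer: the names SerializerTimingSketch, Codec and measure are hypothetical, the items are plain strings rather than RandomItem, and plain Java serialization is used only as a stand-in codec so that the example runs on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical harness for illustration; not the Banyan perf test.
final class SerializerTimingSketch {

    /** Anything that turns a list into bytes. */
    interface Codec {
        byte[] encode(List<String> items);
    }

    /** Plain Java serialization, used here only as a stand-in codec. */
    static final Codec JAVA = items -> {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(items));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // stream is closed, so everything is flushed
    };

    /** Returns {serialized size in bytes, average nanoseconds per call}. */
    static long[] measure(Codec codec, List<String> items, int warmup, int calls) {
        // Warm-up calls are not timed, so the JIT has a chance to compile the hot path.
        for (int i = 0; i < warmup; i++) codec.encode(items);
        int size = 0;
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) size = codec.encode(items).length;
        long elapsed = System.nanoTime() - start;
        return new long[] { size, elapsed / calls };
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) items.add("item-" + i);
        long[] result = measure(JAVA, items, 1_000, 10_000);
        System.out.println("Java: " + result[0] + " bytes, avg " + result[1] / 1_000 + " µs per call");
    }
}
```

Any other serializer can be measured by supplying another Codec. A hand-rolled loop like this is only a rough guide, and a benchmarking tool such as JMH would give more trustworthy numbers.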

Now for the actual results and the data size after serialization, for a total call count of 10,000,000.

For a list size of 1000, the metrics are below.

API	Byte Size	Performance AVG
Banyan	111,965	4 ms
Protobuf	247,981	4 ms
Fast	120,490	7 ms
Java	204,135	45 ms

For a list size of 750, the metrics are below.

API	Byte Size	Performance AVG
Banyan	78,333	4 ms
Protobuf	184,404	3 ms
Fast	84,467	5 ms

For a list size of 500, the metrics are below.

API	Byte Size	Performance AVG
Banyan	52,401	2 ms
Protobuf	123,468	2 ms
Fast	56,537	2 ms

For a list size of 250, the metrics are below.

API	Byte Size	Performance AVG
Banyan	26,288	1 ms
Protobuf	61,471	1 ms
Fast	28,211	1 ms

Now for the more realistic use cases.

For a list size of 100, the metrics are below.

API	Byte Size	Performance AVG
Banyan	10,440	550 µs
Protobuf	24,884	450 µs
Fast	11,244	1 ms

For a list size of 25, the metrics are below.

API	Byte Size	Performance AVG
Banyan	2,567	120 µs
Protobuf	6,203	100 µs
Fast	2,798	154 µs

For a list size of 1, the metrics are below.

API	Byte Size	Performance AVG
Banyan	103	7 µs
Protobuf	238	7 µs
Fast	140	10 µs

As you can see from the metrics above, all three are comparable on performance. Protobuf is a little faster as it is more declarative. I shall update the post after more fine tuning to achieve better performance.

Now I just need to add support for serializing map types with these great performance numbers, and then I get to jump to my next great solution. I am super excited to see these metrics. I would be willing to make this general purpose if anyone is interested in learning more about it. Based on these metrics I decided to write my own serialization APIs as listed above, and they will be available in the Banyan repository soon.
