How to do async journaling from your server code
August 7, 2014
Tags: java, journaling
I want to write journal or log entries of transactions to disk without the logging slowing the system down. I want a rolling log, and I should be able to recover when something goes wrong. Recovery is not covered here.
The simple option is to use a customized version of log4j, which itself supports async logging, or to use Flume. Whichever you choose, keep in mind the footprint it adds to your system and what it was designed for.
You can also use a slight variation of the code I sketch below. It creates a rolling file with a timestamp in its name, roughly fixed at 360 MB in size, and uses very fast compression (LZF). Writes land first on a LinkedBlockingQueue and are later drained to a memory-mapped file on disk. Memory-mapped files live outside the JVM heap, so the file contents need not be held in heap memory while the writes happen.
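To make that concrete, here is a minimal sketch of the flow just described, assuming the Ning compress-lzf library (com.ning.compress.lzf) for the LZF compression. The class name AsyncJournal, the timestamped file-naming scheme, and the length-prefixed entry format are just illustrative choices, not a fixed design.

```java
import com.ning.compress.lzf.LZFEncoder;   // assumes the compress-lzf library is on the classpath

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncJournal {
    private static final long FILE_SIZE = 360L * 1024 * 1024;  // ~360 MB per journal file

    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private final MappedByteBuffer buffer;

    public AsyncJournal(String dir) throws Exception {
        // Rolling file named with a timestamp, pre-sized and memory-mapped (off-heap)
        String name = dir + "/journal-"
                + new SimpleDateFormat("yyyyMMdd-HHmmss").format(new Date()) + ".lzf";
        try (RandomAccessFile file = new RandomAccessFile(name, "rw")) {
            buffer = file.getChannel().map(FileChannel.MapMode.READ_WRITE, 0, FILE_SIZE);
        }

        // Background thread drains the queue and appends compressed entries to the mapped file
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    byte[] entry = queue.take();
                    byte[] compressed = LZFEncoder.encode(entry);
                    if (buffer.remaining() < compressed.length + 4) {
                        break;  // file full: roll to a fresh file here (omitted)
                    }
                    buffer.putInt(compressed.length);  // length prefix so entries can be read back
                    buffer.put(compressed);
                }
            } catch (InterruptedException ignored) { }
        }, "journal-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Callers never block on disk: the entry just lands on the queue
    public void append(byte[] entry) {
        queue.offer(entry);
    }
}
```

Because the MappedByteBuffer lives off-heap, the OS page cache decides when pages actually hit disk; a buffer.force() call could be added wherever stronger durability is needed.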
We can use either a circular buffer or a LinkedBlockingQueue; both give O(1) performance. Let us look at the advantages and disadvantages of a circular buffer versus a LinkedBlockingQueue (a minimal ring-buffer sketch follows the list).

- No time is spent resizing or allocating new memory, hence no garbage. This is a huge advantage, as the GC never kicks in to reclaim unused entries from the buffer.
- Since we never resize or grow the fixed buffer, no broad locking is needed; a lock is taken only where we increment the index.
- Being circular, the buffer has the potential to overwrite existing entries. As long as the consumer drains fast enough that we never wrap past unread entries, overwriting is not a concern.
- You have to allocate a fixed amount of memory for it up front.
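For comparison, here is a minimal ring-buffer sketch along those lines: fixed capacity allocated up front, no resizing, and a lock held only while the index is incremented. The class name and layout are illustrative.

```java
// Fixed-capacity ring buffer: all slots allocated up front, so no resizing and no garbage
public class RingBuffer {
    private final byte[][] slots;
    private int writeIndex = 0;

    public RingBuffer(int capacity) {
        slots = new byte[capacity][];   // static allocation decided up front
    }

    public void add(byte[] entry) {
        int i;
        synchronized (this) {           // lock only around the index increment
            i = writeIndex;
            writeIndex = (writeIndex + 1) % slots.length;
        }
        // On wrap-around this overwrites the oldest entry, so the consumer
        // must drain faster than producers fill the buffer
        slots[i] = entry;
    }
}
```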
Since the journal entries are compressed, one file would probably hold about 10 hours' worth of journal entries. The code can then chunk the file and write a full snapshot to disk every 10 hours, making the system very efficient and reducing disk lag (latency on writing to disk).
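A rough sketch of what that roll check could look like follows; the shouldRoll helper, the 10-hour constant, and the caller's duty to write the snapshot and map a fresh file are assumptions, not code lifted from the system described here.

```java
import java.util.concurrent.TimeUnit;

public class JournalRoller {
    private static final long ROLL_INTERVAL_MS = TimeUnit.HOURS.toMillis(10);
    private long lastRoll = System.currentTimeMillis();

    // Called from the writer thread before each append: roll when the 360 MB
    // file is nearly full or the 10-hour window has elapsed
    boolean shouldRoll(long bytesRemaining, int nextEntrySize) {
        boolean full = bytesRemaining < nextEntrySize + 4;  // 4 bytes for the length prefix
        boolean stale = System.currentTimeMillis() - lastRoll > ROLL_INTERVAL_MS;
        if (full || stale) {
            lastRoll = System.currentTimeMillis();
            return true;  // caller writes the full snapshot and maps a fresh timestamped file
        }
        return false;
    }
}
```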
The NoSQL system I wrote uses this logic for journaling: it writes a full snapshot to disk whenever the journal reaches the 360 MB mark. Once the full snapshot is made, the journal entries are used only to help outdated masters come back to life. Transferring a file of 360 MB or less between masters is far easier than incurring multiple reads on the system. Whenever a different server needs a fresh snapshot of the information, the system hands it the snapshot data plus the journal to bring the new server to life. On my 3-year-old Mac laptop it takes 10 to 16 seconds to load the snapshot and journal data from disk, and 25 seconds to write the snapshot.
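Recovery itself is out of scope for this post, but for completeness, here is a rough sketch of how the length-prefixed LZF entries from the journal sketch above could be read back, say when shipping a journal to an outdated master. LZFDecoder comes from the same compress-lzf library; the apply step is left as a comment.

```java
import com.ning.compress.lzf.LZFDecoder;

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;

public class JournalReader {
    // Reads back the length-prefixed LZF entries written by the journal sketch above
    public static void replay(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();        // length prefix
                } catch (EOFException done) {
                    break;
                }
                if (len <= 0) {
                    break;                     // hit the zeroed tail of the pre-sized file
                }
                byte[] compressed = new byte[len];
                in.readFully(compressed);
                byte[] entry = LZFDecoder.decode(compressed);
                // apply(entry): hand the entry to whatever rebuilds server state
            }
        }
    }
}
```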
I need to do more testing to fine-tune this; I will update this post if I find any issues.
If you like my blog, drop me a line. I am always happy to learn and share more.