KodeKabuki

Welcome, my name is Harish Mallipeddi. I work for Amazon Web Services (AWS). This blog is mostly a dump of interesting articles that I come across on the web. Topics span multiple areas including algorithms and data structures, NoSQL stores, database internals, web-scale challenges, and functional languages.

August 19, 2009 at 5:46pm

LZO compression for Hadoop-0.20+

As of Hadoop-0.20, all code related to LZO compression has been removed from the Hadoop source tree. This is because the LZO code is licensed under the GPL, which is incompatible with Hadoop’s Apache license. One more thing you should know: LZO compression is only supported via a native library (AFAIK there’s no pure Java implementation of it). LzoCodec and LzopCodec are almost the same; the difference is that LzopCodec is compatible with the output from the lzop unix utility.
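To make the lzop compatibility concrete: files written by the lzop utility start with a fixed 9-byte magic header, so you can sanity-check whether a file is in lzop format by looking at its first bytes. Here is a minimal sketch in Python; the function names are my own and not part of Hadoop or lzop.

```python
# The 9-byte magic header that begins every file written by the lzop utility.
LZOP_MAGIC = bytes([0x89, 0x4C, 0x5A, 0x4F, 0x00, 0x0D, 0x0A, 0x1A, 0x0A])


def looks_like_lzop(data: bytes) -> bool:
    """Return True if the given bytes begin with the lzop magic header."""
    return data.startswith(LZOP_MAGIC)


def file_looks_like_lzop(path: str) -> bool:
    """Read only the first few bytes of a file and check them for the lzop magic."""
    with open(path, "rb") as f:
        return looks_like_lzop(f.read(len(LZOP_MAGIC)))
```

This only inspects the header; it does not validate or decompress the rest of the file.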

Here are the steps to get LzopCodec working with Hadoop-0.20 (see the gist embed below). I’m assuming you’ve already downloaded and installed the Hadoop-0.20 release tarball. We’ll be adding the compiled library to Hadoop-0.20’s lib/ folder. Then repackage the Hadoop directory into a tarball, push it to your cluster using whatever magic you use, and you should have LZO compression working.
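Once the library is in lib/, Hadoop also needs to know the codecs exist. Here is a rough sketch of the core-site.xml entries that register them. I'm assuming the codec classes come from the hadoop-gpl-compression project and therefore live in the com.hadoop.compression.lzo package; if your build uses different class names, substitute those.

```xml
<!-- core-site.xml fragment (sketch). The com.hadoop.compression.lzo class names
     assume the hadoop-gpl-compression build; adjust them to match your library. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Since the codecs depend on the native library, it has to be loadable on every node, not just the one you submit jobs from.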