KodeKabuki

Jan 27

Absolute consistency -

Excellent rant on “consistency” in Dynamo-inspired systems like Riak.

Jan 03

DLNA with a Panasonic Viera

I bought a Panasonic Viera LED TV before the holidays and the TV is DLNA certified. I finally figured out how to get DLNA to work properly with the TV.

Serviio

I installed Serviio on a Linux (Ubuntu) box. Serviio is an Opensource DLNA server written in Java. Running it launches a daemon in the background which listens for requests on a specific port, and it also ships with a console app to configure it.

The TV should automatically discover the DLNA server, and you should be able to browse the shared folders on the server. I was able to play some of the videos directly without any problems, but playing other videos failed.

“Cannot read file” errors

Serviio supposedly uses ffmpeg underneath to transcode the videos on-the-fly if necessary. I couldn’t get this to work properly. I basically ended up manually transcoding videos into the NTSC-DVD format with ffmpeg and then dropping the transcoded videos into the shared folder. Now the videos started working fine. I was able to pause/skip videos correctly as well.

ffmpeg -i source.avi -target ntsc-dvd -ps 2000000000 -aspect 16:9 target.mpeg

Dec 06

http://blog.corensic.com/2011/11/28/virtual-machines-memory/ -

Great article describing how virtual->physical address translation works in x86.

Summary

The author has another article explaining how this works when you’re running virtualized operating systems on top of a hypervisor.

Dec 03

http://www.cloudera.com/resource/hadoop-world-2011-presentation-slides-hadoop-and-performance -

Video of @tlipcon’s talk on Hadoop Performance from Hadoop World 2011 is now available on the Cloudera website. Todd walks through a bunch of performance fixes he did to Hadoop recently, and it’s an interesting list of common perf optimization tricks:

Oct 07

Rsync internals

I’m officially back from a two month blogging hiatus. A few weeks ago, I got really curious about how rsync works and decided to dig deeper.

The objective of the rsync algorithm is to minimize the amount of data that needs to be sent over the wire in order to sync two versions of a file. The algorithm assumes sender has the latest version and we want to override the version on the receiver.

You say merkle tree? Maybe not…

The first technique that came to my mind is what copy-on-write systems do - build a hash tree (merkle tree) over the contents of the file by dividing the file into fixed-size chunks. This works fine in systems where the chunks are fixed-size and chunk boundaries don’t change due to inserts/deletes. In rsync’s case, the user could have inserted (or deleted) bytes into the middle of the file, and if you use fixed-size chunks, then the hashes of the chunks from that point onwards would not match with the hashes on the receiver’s end.

Rolling checksums

The trick which rsync has up its sleeve is the use of rolling checksums. A rolling checksum is a checksum algorithm which given the checksum for byte range [n, n+S-1] for chunk size S, byte n, and byte n+S, can compute the checksum for the byte-range [n+1, n+S] very efficiently. In other words, you can keep computing checksums for a sliding window over a byte stream very efficiently. Rsync actually uses the Adler-32 checksum which is also used in zlib.

But a rolling checksum is cryptographically weak. So just to be safe rsync uses both the MD5 hash and the rolling checksum for each chunk of size S. The receiver computes MD5+rolling-sum for each chunk in its version of the file and sends that across to the sender. The sender then compares the rolling-sums for each chunk, and if they match also compares the MD5sums, and if they match it can safely assume these chunks haven’t changed. Now at some point, the rolling-sums might not match. At this point, the sender can start sliding the window of chunk S over subsequent bytes in its version and try to see the next matching rolling-sum. This operation will be fast due to the nature of the rolling-sum. Once it finds a matching rolling-sum, then the algorithm can proceed as before, and it can also mark the exact byte-sequence which needs to be inserted in the receiver’s version of the file.

Gzip & rsync

The problem with gzipped files is that even if a single byte changes in the uncompressed file, then the rest of the contents of the compressed file from that byte onwards would change completely. This completely messes up rsync’s algorithm.

--rsyncable option - gzip supports this option and when you use it, it writes compressed files in a manner which is conducive to rsync’s algorithm. And what it does is use the same “rolling checksum” idea to define “chunk boundaries”. Once you move onto the next chunk, gzip resets its internal dictionary and starts compressing the next chunk freshly and so any changes to the bytes in the previous chunk don’t affect the compressed bytes in subsequent chunks. This idea is actually quite similar to the SequenceFile idea in Hadoop which lets you store data in compressed format but still be splittable so you can assign file-splits to mappers efficiently.

How are chunk boundaries identified? Whenever the last ‘n’ bits of the rolling checksum become all ‘0’ (statistically this has to happen at least once every 2^n times), and since you compute 1 rolling checksum for each new byte, you roughly get a chunk every 2^n bytes.

More info

There’s an audio recording and a full transcript of an hour-long talk by Dr. Andrew Tridgell who developed the rsync algorithm as part of his PhD thesis. It goes into further detail about the core algorithm itself and is a very entertaining read.

Jul 28

http://www.scribd.com/doc/53197944/Linux-and-H-W-optimizations-for-MySQL -

Comprehensive list of things to think about both at the hardware and Linux level in order to run and maintain MySQL servers.

Some interesting things I didn’t know before:

A quick tutorial on generating a huffman tree -

Huffman {en|de}coding is really simple once you know how to build the Huffman tree.

Jun 30

http://www.lighterra.com/papers/modernmicroprocessors/ -

This article provides an excellent overview of the current state of modern microprocessor design for ‘software engineers’.

Key concepts/ideas

Jun 25

“ext-2 and ext-3 lock a per-inode mutex for the duration of a write. This means that ext-2 and ext-3 do not allow concurrent writes to a file and that can prevent you from getting the write throughput you expect when you stripe a file over several disks with RAID. XFS does not do this which is one of the reasons many people prefer XFS for InnoDB.” —

XFS, ext and per-inode mutexes

I didn’t know this. So if you’re using InnoDB with O_DIRECT on a server with RAID, you’d really be well off using XFS. I’m guessing if you don’t use O_DIRECT, your writes just end up getting cached in the buffer cache, and the write throughout issue won’t be pronounced. Domas made a benchmark to test this behavior.

Jun 10

Recipe: encrypt/decrypt clipboard contents on OS X

I’ve started using the following whenever I need to store sensitive stuff in Evernote/Dropbox/GMail/etc.


encrypt_aes128() {
    pbpaste | openssl enc -e -aes128 -base64 -pass "pass:$1" | pbcopy
}
decrypt_aes128() {
    pbpaste | openssl enc -d -aes128 -base64 -pass "pass:$1" | pbcopy
}

Jun 01

http://papilio.cc -

An Arduino-like project for FPGA programming.

Mar 31

http://www.cloudera.com/blog/2011/03/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-3/ -

Great series of blog posts on some HBase GC optimization work by Todd Lipcon at Cloudera.

Problem: Long GC pauses were being observed for write-heavy HBase workloads on the RegionServer. One RegionServer is responsible for several regions, and all writes to a Region go to a MemStore which gets flushed to HDFS only after a certain threshold (which means the objects in MemStore make it to tenured generation). Assuming random distribution of writes to regions, memory gets fragmented in the tenured generation because memory for different regions gets interleaved.

Solution: Allocate memory for KVs in MemStore from Region-specific chunks. A chunk is merely just a contiguous byte array. So when a Region’s MemStore gets flushed, it doesn’t fragment memory much. Here’s the original JIRA which also has the patch.

Feb 22

http://www.toao.com/posts/finding-similar-items-key-store-minhashing.html -

Article introduces what minhashing is and proves that the probability of 2 sets being similar is actually equal to the probability of their minhashes matching. So you can actually calculate the minhashes of sets and use that to determine if the sets are similar/dissimilar without having to compare each and every element.

Feb 15

http://bartoszmilewski.wordpress.com/2010/09/11/beyond-locks-software-transactional-memory/ -

Bartosz Milewski writes a great article on how STMs are implemented at a high-level.

http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ -

Proposed redesign of Hadoop by the Y! Hadoop team. In short, HDFS stays the same, but MapReduce becomes an application-level library, and so the existing JobTracker and TaskTrackers get replaced by more generic ResourceManager and NodeManagers.