KodeKabuki

Welcome, my name is Harish Mallipeddi. I work for Amazon Web Services (AWS). This blog is mostly a dump of interesting articles that I come across on the web. Topics span multiple areas, including algorithms/data structures, NoSQL stores, database internals, web-scale challenges, and functional languages.

February 7, 2011 at 3:33am

If your ideas are not being rejected at least 50% of the time, you are playing it way too safe (or too political).

— Summation: Dealing with rejection is a core competency

January 31, 2011 at 11:36am

http://www.moserware.com/2010/03/computing-your-skill.html →

High-level description of the math behind Microsoft’s TrueSkill algorithm used in Xbox player scores.

1:37am

http://www.javalimit.com/2011/01/understanding-vector-clocks.html →

The best explanation of vector clocks I’ve seen so far.

December 23, 2010 at 1:48am

http://www.infoq.com/presentations/LMAX →

Very insightful talk by a couple of engineers from LMAX (UK). They build high-throughput, low-latency financial systems in Java. They go to extreme lengths to avoid locks, relying instead on CAS operations and the memory barriers of the Java Memory Model (JMM) for concurrency control, and to avoid cache misses. Read the comments below the video on the InfoQ page as well.

They emphasize the importance of having “mechanical sympathy” - knowing and appreciating how things work in the layers underneath and taking advantage of that know-how for good (apparently that’s what the motor racing world calls it). This is much like the “full-stack programmer” idea in the previous post by the Facebook engineer.

December 3, 2010 at 12:31pm

http://calendar.perfplanet.com/2010/the-full-stack/ →

Well-written article by Carlos Bueno (who works for Facebook) describing how a full-stack programmer would approach thinking/reasoning about a large-scale system (in this case a web application, but this sort of thinking can be applied to reason about any system).

Acquiring full-stack experience & knowing the internals of everything is immensely powerful when designing systems. You probably cannot master everything but I have started noticing that once you know the internals of a bunch of systems, you can very quickly guess how something else might work internally. You begin with a handful of potential guesses and then rule out wrong ones by some quick experimentation/hacking.

November 29, 2010 at 1:15pm

http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html →

Interesting article - the author sets out to write a micro-benchmark to measure the cost of a context switch in Linux on different x86 hardware.

  • The cost of a context switch cannot be measured simply by making syscalls to enter/leave kernel mode, because in modern Linux kernels that apparently doesn’t cause a full context switch.
  • Benoit decides to use futexes - parent and child processes waiting on the futex take turns waking each other up (thereby context switching).
  • Setting processor affinity has a very visible impact.
  • In real-life scenarios, the L1/L2 cache pollution caused by context switches is more detrimental than the direct overhead of the switch itself.
  • Estimated cost of a context switch: 30µs

Overall it’s a very interesting article. Also check out his benchmark code on GitHub.
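
To get a feel for this kind of measurement, here is a minimal sketch in Python. It is not Benoit's benchmark: instead of futexes it bounces a single byte between a parent and a child process over two pipes, so every hop forces one side to block and the other to wake up. The function name and the iteration count are my own choices, and the number it reports includes pipe, syscall and interpreter overhead, so treat it as a rough upper bound rather than the cost of a switch. It needs a Unix-like OS (os.fork).

```python
import os
import time


def pingpong_round_trip_us(iterations=10000):
    """Average round-trip time, in microseconds, of a one-byte ping-pong
    between a parent and a child process over two pipes."""
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back, then exit without
        # running any of the parent's code or cleanup handlers.
        try:
            os.close(p2c_w)
            os.close(c2p_r)
            for _ in range(iterations):
                os.read(p2c_r, 1)
                os.write(c2p_w, b"x")
        finally:
            os._exit(0)

    # Parent: close the pipe ends that belong to the child.
    os.close(p2c_r)
    os.close(c2p_w)

    start = time.perf_counter()
    for _ in range(iterations):
        os.write(p2c_w, b"x")  # wake the child
        os.read(c2p_r, 1)      # block until the child answers
    elapsed = time.perf_counter() - start

    os.waitpid(pid, 0)
    os.close(p2c_w)
    os.close(c2p_r)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"{pingpong_round_trip_us():.2f} µs per round trip")
```

Running it once as-is and once pinned to a single core (for example under taskset on Linux) is a quick way to see the affinity effect mentioned above.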

November 16, 2010 at 3:36pm

MySQL TechTalk @ Facebook

I found this recording of a talk that Facebook hosted recently. Their MySQL team presents a bunch of interesting projects they’ve worked on at Facebook. I’ve been doing a lot of MySQL-related projects at work as well.

There’s an interesting section where one of the engineers talks about his ‘non-stored procedures’, as he calls them. He reveals that their entire relationship graph is stored pretty much entirely in MySQL tables, just heavily sharded, and they seem to have gotten it to work well. It looks like the majority of their data lives in MySQL - Cassandra usage is pretty minimal.

What happens when replication delay becomes very high across a WAN link? After it crosses a certain threshold, they actually start forwarding DB requests to the datacenter which has the master DB until replication catches up. They also mentioned they’re working on Master-Master replication with conflict detection, so they can have masters in multiple datacenters.
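
The failover rule they describe is easy to picture in code. This is only a sketch of the idea as I understood it from the talk, not Facebook's implementation: the Datacenter class, the route_read function and the 5-second threshold are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    # Hypothetical model of a datacenter, invented for this sketch.
    name: str
    is_master: bool
    replication_lag_s: float = 0.0  # how far the local replica is behind


def route_read(local, master, max_lag_s=5.0):
    """Pick the datacenter that should serve a DB request.

    max_lag_s is an assumed threshold; the talk does not give a number.
    """
    if local.is_master:
        return local
    if local.replication_lag_s > max_lag_s:
        # The replica is too stale: forward to the master's datacenter
        # until replication catches up.
        return master
    return local


if __name__ == "__main__":
    master = Datacenter("dc-master", is_master=True)
    replica = Datacenter("dc-remote", is_master=False, replication_lag_s=0.4)
    print(route_read(replica, master).name)  # dc-remote
    replica.replication_lag_s = 42.0
    print(route_read(replica, master).name)  # dc-master
```

The price of forwarding is a WAN round trip on every request, which is presumably why it only kicks in past a threshold.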

Flashcache hasn’t been put into production yet - still experimental.

MySQL tools:

Facebook seems to be using Percona’s XtraDB and that makes binary backups easy. We opted to use LVM snapshots on a slave at work.

  • Percona’s maatkit - useful set of command-line utils
  • Percona’s aspersa - set of Perl scripts for debugging MySQL; ioprofiler is useful
  • innotop
  • qpress - QuickLZ compression for transferring MySQL dumps; supposedly even faster than LZO.

October 22, 2010 at 9:53am

Core dumps on OS X

Core dumps are switched off by default.

Make OS X do a core-dump upon a segmentation fault:

ulimit -c unlimited

Unlike on Linux, core dumps on OS X end up in /cores instead of the cwd. Load one into gdb along with the binary that produced it:

gdb /path/to/your/binary /cores/core.XYZ

October 18, 2010 at 9:05am

Sharding gone wrong →

Foursquare Ops published a post-mortem of their recent outage on their blog. The post made by a MongoDB engineer in the Google group is actually more interesting, since it reveals more of the technical details and what went wrong specifically because of MongoDB’s behavior.

October 15, 2010 at 2:18pm

In bureaucracies many people have the authority to say no, not the authority to say yes. So you end up with products with compromises. This goes back to Steve’s philosophy that the most important decisions are the things you decide NOT to do, not what you decide to do. It’s the minimalist thinking again.

— John Sculley On Steve Jobs, The Full Interview Transcript | Cult of Mac

September 29, 2010 at 11:06am

http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ →

The article discusses how Linux handles memory on a NUMA system, especially when you have a single process (like mysqld) trying to take up 90% of the physical memory on the box.

September 26, 2010 at 10:06pm

http://www.tokyohackerspace.org/akihabara/ →

Hacker heaven - video tour of interesting shops in the Akihabara district of Tokyo

September 20, 2010 at 10:35am

http://blog.extracheese.org/2010/05/the-tar-pipe.html →

Fascinating 10-minute tour of everything that goes on in Unix when you type (cd src && tar -cf - .) | (cd dest && tar -xpf -) in a bash terminal.
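
A rough way to see the plumbing is to rebuild the same pipeline by hand. The sketch below uses Python's subprocess module in place of bash: two tar processes, each started in its own working directory, joined by one pipe. The tar_pipe helper is my own name, and it assumes a tar binary on the PATH.

```python
import os
import subprocess
import tempfile


def tar_pipe(src, dest):
    """Copy the tree under src into dest, roughly like
    (cd src && tar -cf - .) | (cd dest && tar -xpf -)."""
    # Producer: runs inside src and writes the archive to its stdout,
    # which is the write end of a pipe.
    producer = subprocess.Popen(
        ["tar", "-cf", "-", "."], cwd=src, stdout=subprocess.PIPE
    )
    # Consumer: runs inside dest and reads the archive from its stdin,
    # which is the read end of the same pipe.
    consumer = subprocess.Popen(
        ["tar", "-xpf", "-"], cwd=dest, stdin=producer.stdout
    )
    # Drop our own copy of the read end so the producer gets SIGPIPE
    # if the consumer exits early.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    return producer_rc == 0 and consumer_rc == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dest:
        with open(os.path.join(src, "hello.txt"), "w") as f:
            f.write("hi")
        print(tar_pipe(src, dest), os.listdir(dest))
```

The cwd= argument stands in for the two cd commands: each child changes directory after the fork and before exec'ing tar, so the parent's working directory never moves, just like with the subshells in the original one-liner.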

September 11, 2010 at 10:17pm

http://www.yosefk.com/blog/my-history-with-forth-stack-machines.html →

Yossi Kreinin writes about his experiences with the Forth programming language and concludes that he was never able to scale Forth to solve a real-life problem. I tried to teach myself Factor, a modern Forth-inspired stack language, and quickly came to the same conclusion. The simplicity of implementing a naive interpreter for a stack language was very exciting, but unlike when I learnt Erlang, there was no sudden excitement about how I could use this new language to build a whole bunch of cool things.
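
To show how little a naive interpreter needs, here is a toy one in Python. It is neither Forth nor Factor, just a sketch with integer literals, three arithmetic operators and the classic dup/swap/drop words; the run function and the word table are my own invention.

```python
import operator

# Words that pop two values and push one result.
BINARY_WORDS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
}


def run(program, stack=None):
    """Evaluate a whitespace-separated program and return the data stack."""
    stack = [] if stack is None else stack
    for word in program.split():
        if word in BINARY_WORDS:
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY_WORDS[word](left, right))
        elif word == "dup":
            stack.append(stack[-1])  # copy the top of the stack
        elif word == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "drop":
            stack.pop()
        else:
            stack.append(int(word))  # anything else must be an integer literal
    return stack


if __name__ == "__main__":
    print(run("2 3 + 4 *"))  # [20]
    print(run("5 dup *"))    # [25]
```

The whole language is one loop over tokens and a list. The hard part, as the article argues, starts when you try to write real programs this way.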

August 30, 2010 at 10:59am

JRuby Hacking Guide (RubyKaigi talk by @nahi) →

Also check out his JRuby Source Reading Guide.