http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/ -
Proposed redesign of Hadoop by the Y! Hadoop team. In short, HDFS stays the same, but MapReduce becomes an application-level library, and so the existing JobTracker and TaskTrackers get replaced by more generic ResourceManager and NodeManagers.
If your ideas are not being rejected at least 50% of the time, you are playing it way too safe (or too political). — Summation: Dealing with rejection is a core competency
http://www.moserware.com/2010/03/computing-your-skill.html -
High-level description of the math behind Microsoft’s TrueSkill algorithm, which is used for Xbox player scores.
http://www.javalimit.com/2011/01/understanding-vector-clocks.html -
The best explanation of vector clocks I’ve seen so far.
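For a rough feel of the idea, here is a minimal sketch of a vector clock as a plain dict of per-node counters. The function names (`increment`, `merge`, `compare`) are my own for illustration and are not taken from the linked article.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def compare(a, b):
    """Classify how clock `a` relates to clock `b`."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b
    if b_le_a:
        return "after"       # a happened after b
    return "concurrent"      # neither dominates: a conflict to resolve


# Two replicas update the same starting version independently.
v1 = increment({}, "a")
v2 = increment(v1, "b")
v3 = increment(v1, "c")
print(compare(v1, v2))  # v2 descends from v1
print(compare(v2, v3))  # siblings, so neither is newer
```

The useful case is the last one: when neither clock dominates, the two versions are siblings and something (the application, usually) has to reconcile them, after which `merge` gives the clock for the reconciled value.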
http://www.infoq.com/presentations/LMAX -
Very insightful talk by a couple of engineers from LMAX (UK). They build high-throughput, low-latency financial systems in Java. They go to extreme lengths to avoid locks, relying instead on CAS and the memory barriers defined by the JMM for concurrency control, and they work just as hard at avoiding cache misses. Read the comments below the video on the InfoQ page as well.
They emphasize the importance of having “mechanical sympathy”: knowing and appreciating how things work in the layers underneath and taking advantage of that know-how (apparently that’s what the motor racing world calls it). This is pretty much the “full-stack programmer” idea from the previous post by the Facebook engineer.
http://calendar.perfplanet.com/2010/the-full-stack/ -
Well-written article by Carlos Bueno (who works for Facebook) describing how a full-stack programmer would approach thinking/reasoning about a large-scale system (in this case a web application, but this sorta thinking can be applied to reason about any system).
Acquiring full-stack experience & knowing the internals of everything is immensely powerful when designing systems. You probably cannot master everything but I have started noticing that once you know the internals of a bunch of systems, you can very quickly guess how something else might work internally. You begin with a handful of potential guesses and then rule out wrong ones by some quick experimentation/hacking.
http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html -
Interesting article - the author sets out to write a micro-benchmark to measure the cost of a context switch in Linux on different x86 hardware.
Also check out his benchmark code on GitHub.
I found this recording of a talk that Facebook hosted recently. Their MySQL team presents a bunch of interesting projects they’ve worked on at Facebook. I’ve been doing a lot of MySQL-related projects at work as well.
There’s an interesting section where one of the engineers talks about what he calls ‘non-stored procedures’. He reveals that their entire relationship graph is stored pretty much in MySQL tables, just heavily sharded, and they seem to have gotten it to work well. It looks like the majority of their data lives in MySQL - Cassandra usage is pretty minimal.
What happens when replication delay becomes very high across a WAN link? After it crosses a certain threshold, they actually start forwarding DB requests to the datacenter which has the master DB until replication catches up. They also mentioned they’re working on Master-Master replication with conflict detection, so they can have masters in multiple datacenters.
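The lag-based fallback described above can be sketched roughly like this. It is only an illustration of the idea, not Facebook’s code: the `Datacenter` type, the `choose_read_target` function and the 30-second threshold are all made up, since the talk gives no actual numbers or names.

```python
from dataclasses import dataclass

# Hypothetical threshold; the talk does not state the real value.
MAX_REPLICA_LAG_SECONDS = 30.0


@dataclass
class Datacenter:
    name: str
    is_master: bool
    # How far this datacenter's replicas trail the master (0 for the master).
    replication_lag_seconds: float = 0.0


def choose_read_target(local, master, max_lag=MAX_REPLICA_LAG_SECONDS):
    """Pick the datacenter that should serve a request issued in `local`."""
    if local.is_master:
        return local
    if local.replication_lag_seconds > max_lag:
        # Local replicas are too stale: pay the WAN round trip to the master.
        return master
    return local


master = Datacenter("dc-master", is_master=True)
replica = Datacenter("dc-replica", is_master=False, replication_lag_seconds=5.0)
print(choose_read_target(replica, master).name)   # lag is small: stay local

replica.replication_lag_seconds = 120.0
print(choose_read_target(replica, master).name)   # lag too high: go to master
```

Once replication catches up and the lag drops back under the threshold, the same check sends requests to the local replicas again.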
Flashcache hasn’t been put into production yet - still experimental.
MySQL tools:
Facebook seems to be using Percona’s XtraDB and that makes binary backups easy. We opted to use LVM snapshots on a slave at work.
Core dumps are switched off by default.
Make OS X write a core dump on a segmentation fault:
ulimit -c unlimited
Unlike Linux, in OS X core dumps end up in /cores instead of the cwd.
gdb /path/to/your/binary /cores/core.XYZ
Foursquare Ops published a post-mortem of their recent outage on their blog. The post by a MongoDB engineer in the Google group is actually more interesting, since it reveals more of the technical details and what went wrong specifically because of MongoDB’s behavior.
In bureaucracies many people have the authority to say no, not the authority to say yes. So you end up with products with compromises. This goes back to Steve’s philosophy that the most important decisions are the things you decide NOT to do, not what you decide to do. It’s the minimalist thinking again. — John Sculley On Steve Jobs, The Full Interview Transcript | Cult of Mac
http://jcole.us/blog/archives/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ -
The article discusses how Linux handles memory on a NUMA system, especially when you have a single process (like mysqld) trying to take up 90% of the physical memory on the box.
http://www.tokyohackerspace.org/akihabara/ -
Hacker heaven - video tour of interesting shops in the Akihabara district of Tokyo
http://blog.extracheese.org/2010/05/the-tar-pipe.html -
Fascinating 10-minute tour of everything that goes on in Unix when you type (cd src && tar -cf - .) | (cd dest && tar -xpf -) in a bash terminal.
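To make the moving parts explicit, here is a rough Python equivalent of that pipeline using `subprocess`. It assumes a `tar` binary is on the PATH; the `tar_pipe` function name is mine, and `cwd=` stands in for the `cd` that each subshell does in the original.

```python
import subprocess


def tar_pipe(src, dest):
    """Copy the tree under `src` into `dest` by piping one tar into another."""
    # Left side of the pipe: archive the current directory to stdout.
    producer = subprocess.Popen(
        ["tar", "-cf", "-", "."], cwd=src, stdout=subprocess.PIPE
    )
    # Right side: read the archive from stdin and unpack it, keeping permissions.
    consumer = subprocess.Popen(
        ["tar", "-xpf", "-"], cwd=dest, stdin=producer.stdout
    )
    # Drop our copy of the read end so the producer sees a broken pipe
    # if the consumer exits early, just as it would in the shell.
    producer.stdout.close()
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"tar pipe failed (create={producer_rc}, extract={consumer_rc})"
        )
```

Calling `tar_pipe("src", "dest")` does what the one-liner does: two processes, one kernel pipe between them, and no intermediate archive on disk.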
http://www.yosefk.com/blog/my-history-with-forth-stack-machines.html -
Yossi Kreinin writes about his experiences with the Forth programming language and concludes that he was never able to scale Forth to solve a real-life problem. I tried to teach myself Factor, a modern Forth-inspired stack language, and quickly came to the same conclusion. The simplicity of implementing a naive interpreter for a stack language was very exciting, but unlike when I learnt Erlang, there was no sudden excitement about how I could use this new language to build a whole bunch of cool things.