Jul 16, 2014

Mar 6, 2014

Links from my reading list

Feb 5, 2014

Links from my reading list

Nov 21, 2013

Links from my reading list

Oct 9, 2013

Links from my reading list

Sep 22, 2013

May 2, 2013

Understanding shared mem usage of multi-proc apps on Linux

Recently I had to debug a potential memory leak in an application that I’ve been working on. The problem was extra hard because the application is a multi-process application (it spawns ~600 processes) that uses a large mmap’d segment (~100G) for IPC. Running top and sorting by ‘rss’ was not helpful since the top pids are not necessarily the ones leaking memory. The rss values do not account for sharing of pages among processes: if a page is shared by n pids, it’s counted n times (in fact, try summing the rss values of all the running pids and it will be way over the physical memory on the box!).

Fortunately newer Linux kernels track the proportional set size (pss) values for different memory segments of a pid. This can be obtained via /proc/<pid>/smaps. So if a page has been faulted in by 4 pids, then the page only contributes 0.25 to the pss of each pid (vs 1.0 to the rss of the pid). So this gives you a more realistic estimate of the memory usage of each pid.

top doesn’t display the pss values, but luckily someone already wrote a python tool called smem that parses the /proc/<pid>/smaps output of each running pid on the box and spits out the aggregated pss and uss values for each pid. Uss is the unshared set size (basically all the memory used by a process that is not shared with any other process). The uss value is actually computed by summing the Private_Clean and Private_Dirty fields in the /proc/<pid>/smaps output.
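The aggregation smem does is simple enough to sketch in a few lines. Here’s a minimal version (assuming a Linux box with /proc mounted; field names are the ones the kernel emits in smaps):

```python
def smaps_usage(pid):
    """Return (pss_kb, uss_kb) aggregated over all mappings of `pid`.

    pss is summed straight from the kernel's per-mapping accounting;
    uss is Private_Clean + Private_Dirty, the same definition smem uses.
    All values in smaps are reported in kB.
    """
    pss = uss = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            field, sep, value = line.partition(":")
            if not sep:
                continue  # mapping header lines have no "field: value" form
            if field == "Pss":
                pss += int(value.split()[0])
            elif field in ("Private_Clean", "Private_Dirty"):
                uss += int(value.split()[0])
    return pss, uss
```

Since pss counts private pages fully plus a fraction of each shared page, pss should always be at least as large as uss for any pid.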

I wrote a quick python script to experiment with smem. The script mmaps a 5G segment and spawns off a bunch of child pids which do different things (one only maps the 5G segment, another faults in 50% of the pages, another faults in 100% of the pages, another creates a private 5G segment, and so on).

View the python script smem_test.py on GitHub.
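The core of the experiment is just an anonymous shared mapping plus fork. Here’s a stripped-down sketch (not the real smem_test.py; I’ve shrunk the segment to 64M so it’s cheap to run):

```python
import mmap
import os
import time

SEG_SIZE = 64 * 1024 * 1024  # stand-in for the 5G segment in the real script

# Anonymous shared mapping: every forked child shares these pages.
seg = mmap.mmap(-1, SEG_SIZE)

def spawn(fault_fraction):
    """Fork a child that faults in `fault_fraction` of the shared pages,
    then sleeps so smem can observe its pss/uss while it's alive."""
    pid = os.fork()
    if pid == 0:
        page = mmap.PAGESIZE
        npages = int(SEG_SIZE // page * fault_fraction)
        for i in range(npages):
            seg[i * page] = 1  # writing touches the page -> shows up in rss
        time.sleep(60)
        os._exit(0)
    return pid
```

Run smem while the children sleep: the child that faults in 100% of the pages shows a big rss, but its pss shrinks as more siblings fault in the same pages, and its uss stays near zero since nothing is private.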

Apr 8, 2013

Feb 28, 2013

Links from my reading list

Feb 6, 2013

Jan 24, 2013

Links from my reading list

Dec 29, 2012

Links from my reading list

Building a newsfeed-like feature

  • Real-Time Delivery Architecture at Twitter - one interesting thing which I didn’t realize they had started doing is opening persistent connections from the official desktop client and just streaming tweets to users (essentially they have streaming servers which apply filters on the global firehose and push to the clients). The website is still rendered by reading the write-materialized inbox that’s stored as a list of tweet ids in redis for each user.
  • Petabyte Scale Data at Facebook - Dhruba Borthakur talks about the different kinds of storage systems they’ve built at Facebook for different needs. Good overview of the BigData landscape.
  • Facebook News Feed: Social Data at Scale - interestingly they don’t materialize the newsfeed during writes like Twitter. I think this also helps them because they do ranking on the feed items and return only a subset of interesting items back to the user.
  • Scaling Tumblr - talk on scaling tumblr’s Dashboard feature (they used to do materialization during reads, but seem to be switching to materialization during writes like Twitter).

More scaling talks


This is probably my last post for 2012. Happy new year!

Dec 20, 2012

Onion routing

Recently came across a really interesting talk by developers on the Tor project. The talk is about how different countries across the world have tried to block Tor. Go watch it. Most countries, except for China, really just buy a deep-packet-inspection (DPI) solution that’s sold by a handful of companies (mostly based in the US) and hope it blocks Tor (Tor pretends to be SSL, so you have to look for patterns to distinguish Tor from real SSL traffic).

I dug a little deeper to learn how Tor works, and I’ve been running a Tor non-exit relay on one of my boxes just for fun (interestingly Akamai seems to run a bunch of exit relays!).


In Tor, you want the Onion Routers (OR) that form a circuit to only know about the previous hop and the next hop. They should never have a full picture of the entire circuit. That way if someone compromises one OR and records all traffic, they won’t be able to identify the end-user or the contents of the traffic with some exceptions.

  • Exception 1 - if a compromised OR is the exit relay (i.e. the last hop before exiting the Tor network), then the attacker can see the traffic (if the traffic is plaintext to begin with; if it’s HTTPS traffic then obviously even the exit relay cannot inspect it). But they cannot identify the user’s IP address. There’s a big caveat of course - the traffic itself shouldn’t contain any user-identifying information. This is why they encourage you to use HTTPS everywhere. Imagine logging into email over plain HTTP with your email id flying around in plain text. It’s pointless to hide your IP address when you’ve let your email address go out in plain text. You’re not anonymous anymore.
  • Exception 2 - if a compromised OR is the entry relay (i.e. the beginning of the circuit), then an attacker knows who the user is but not the contents. This is because the traffic is encrypted several times by the end-user such that only at the last hop is it fully decrypted to reveal the plaintext payload, like peeling an onion (hence the name).

So Tor typically creates a 3-hop circuit, and you should be safe even if one relay has been compromised and the attacker wants to break your anonymity.

Circuit creation

The Tor client (Onion Proxy) will pick a bunch of relays to create a circuit. There are several mechanisms to get the list of available relays (there’s a global directory list available via HTTP, there are unpublished bridge relays, and then there are secret relays which friends give to their friends only). In countries like Iran/China, they obviously block all IP addresses listed in the global directory list, and sometimes more. Watch the talk I mentioned above to learn more.

Coming back to how Tor works: relays are used to create a circuit, and a circuit multiplexes several TCP streams. The circuit is established serially. The client talks to relay A and does a DH exchange, so they have a secret session key that they’ll use for encryption. The client then picks another relay B and tells A to extend the circuit to B. The client does a DH exchange with B via A acting as a relay (A cannot see B’s session key because DH provides that guarantee; similarly B does not know the identity of the client since it only sees A as the source of the traffic). From then on, the client encrypts the TCP stream with B’s session key followed by A’s session key. So when a packet arrives at A, it cannot see the original payload without knowing B’s session key. The original payload can be recovered only if you know all the session keys, or if you’re the exit relay and the payload is not itself encrypted.
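The layering itself is easy to demo. Below is a toy sketch of onion wrapping/peeling using a throwaway XOR keystream in place of a real cipher (real Tor uses AES-CTR with keys negotiated per hop; the function names here are made up for illustration):

```python
import hashlib
from itertools import count

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Applying it twice with the same key recovers the input."""
    stream = bytearray()
    for block in count():
        if len(stream) >= len(data):
            break
        stream += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
    return bytes(b ^ k for b, k in zip(data, stream))

def onion_wrap(payload: bytes, hop_keys) -> bytes:
    """Client side: encrypt innermost-first, so the entry relay's
    layer ends up outermost (hop_keys ordered entry -> exit)."""
    for key in reversed(hop_keys):
        payload = keystream_xor(key, payload)
    return payload

def onion_peel(cell: bytes, hop_keys) -> bytes:
    """Each relay strips exactly one layer with its own session key;
    only after the exit relay's layer is removed is the payload visible."""
    for key in hop_keys:
        cell = keystream_xor(key, cell)
    return cell
```

After any single relay peels its layer, the payload is still ciphertext under the remaining hops’ keys - which is exactly why compromising one relay reveals at most the previous and next hop.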

Nov 27, 2012

Links from my reading list


November has been a crazy month of debugging hard problems at work while preparing for a launch. Not sure if it’s pure coincidence, but I came across some excellent articles/talks on debugging this month.


Oct 29, 2012

Welcome, my name is Harish Mallipeddi. I'm a software engineer currently living in San Francisco. This blog is mostly a dump of interesting articles that I come across on the web. Topics span across multiple areas including algorithms/datastructures, NoSQL stores, database internals, web-scale challenges, and functional languages. Subscribe via RSS.