Caching woes…over!

Four months ago, I mentioned here that I was facing problems because the backend caching service was stopped, citing excessive resource usage. This service, written in Perl, would consume 24MB of memory instantly upon invocation. Usage would later grow and shrink, mostly fluctuating between 34MB and 40MB, and when there were feeds to be merged it would consume more than 90% of the CPU.

I started writing a new caching service in C, using libcurl, libmysqlclient and libxml2. I was excited, but I had no idea how to use any of these libraries. And before I could dive deeper, I got busy with organizing FUDCon India 2011. After FUDCon, there was other pending work which needed immediate attention. By the time I got back to writing the caching service again, December was half gone.

The first step was to read the list of feeds from the database, which was fairly straightforward: connect to the MySQL server and run a query – done. The second was to use libcurl to download each feed to the local disk. Though the libcurl project has decent documentation and examples available, finding the right set of options to set took some time – done. The final step, which took most of the time, was to parse and merge the XML feeds. The problem here was that though libxml2 has lots of documentation published, none of it was readily useful, and the other tutorials I found were just as confusing. In the end it was gdb(1) which helped me realise that the XML nodes are arranged like a linked list, with separate pointers pointing to their sibling and child nodes. This was enough to read and traverse the XML structure, but not enough to actually read the feed content. I have one request for the folks who write the RFCs and standards: please do not have optional fields and parameters in the standard; this will only lead to non-uniform documents and inconsistent implementations.

After catching errors like missing blog links, dates and times of publication, authors' names, etc., I had merged all three feed formats – RSS, RDF and Atom – with CPU usage below 5% for the majority of the time. Still, there were two problems:

  1. A couple of elusive segmentation faults, which I could not reproduce no matter how I tried.
  2. Memory consumption: it was still more than 15MB and growing.

After debugging for close to a week, I finally found the glitch that was leading to one of the segmentation faults. It was a bug in the merging algorithm which would creep in while merging the last node of the list. For the memory consumption issue, I decided to try Valgrind to find memory leaks in the service. Valgrind(1) not only helped me fix the memory leaks but also helped me find another problem in the merge function.

With most of the leaks fixed, the server's memory consumption now fluctuates between 7MB and 10MB, starting from 3.9MB upon invocation.

There are still a few leaks; they mostly seem to be in one of the linked libraries. I'm yet to figure out how to go about fixing them.

==31922== 213,216 bytes in 6,663 blocks are possibly lost in loss record 721 of 721
==31922== at 0x4A05BB4: calloc (vg_replace_malloc.c:467)
==31922== by 0x7A94449: nss_ZAlloc (arena.c:892)
==31922== by 0x7A85371: pem_CreateObject (pobject.c:1157)
==31922== by 0x7A89688: nssCKFWSession_CreateObject (session.c:1353)
==31922== by 0x7A8E4F9: NSSCKFWC_CreateObject (wrap.c:1994)
==31922== by 0x3FF0446419: PK11_CreateNewObject (pk11obj.c:412)
==31922== by 0x3FF04478B2: PK11_CreateGenericObject (pk11obj.c:1359)
==31922== by 0x397D40984F: nss_create_object (nss.c:350)
==31922== by 0x397D40993C: nss_load_cert (nss.c:412)
==31922== by 0x397D43F238: Curl_nss_connect (nss.c:1129)
==31922== by 0x397D436758: Curl_ssl_connect (sslgen.c:199)
==31922== by 0x397D40F70F: Curl_http_connect (http.c:1307)
==31922== LEAK SUMMARY:
==31922== definitely lost: 312 bytes in 3 blocks
==31922== indirectly lost: 0 bytes in 0 blocks
==31922== possibly lost: 2,000,465 bytes in 17,637 blocks
==31922== still reachable: 602,346 bytes in 2,609 blocks
==31922== suppressed: 0 bytes in 0 blocks
==31922== Reachable blocks (those to which a pointer was found) are not shown.
==31922== To see them, rerun with: --leak-check=full --show-reachable=yes
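Until then, library-internal records like the NSS allocation above can at least be silenced with a Valgrind suppression file, so that leaks in my own code stay visible. A sketch of an entry matching that stack (the name is arbitrary, and the frame list may need adjusting to match the actual report):

```
{
   curl-nss-cert-object
   Memcheck:Leak
   fun:calloc
   fun:nss_ZAlloc
   fun:pem_CreateObject
   ...
}
```

Passed via `--suppressions=nss.supp`, Valgrind then skips any leak whose stack begins with those frames.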

Another possible way to save memory could be to de-link from some of the numerous libraries. Even though I directly link with just 10 libraries at compile time, ldd(1) and pmap(1) show that the server is actually linked with some 38 different dynamic libraries, all of which add up to the total memory consumption of the server.

Hope this works out well now. 🙂
