Of patches, a review request and a Jasmine plant.

Hi,

The past week was intriguing. Last weekend I spent time digging into the source of LibNSS to find memory leaks that were reported by Valgrind(1) against the caching server of feedmug.com. Valgrind(1) produces the call stack and shows precisely where the memory was allocated; finding where it was leaked is yet another exercise. It took some time, and much jumping from one function to another, to find the exact point where it was leaked. It’s very easy to lose track while manually unfolding the call stack like this.

===
mozilla/security/nss/lib/nss/nssinit.c:687
mozilla/security/nss/lib/nss/nssinit.c:719
mozilla/security/nss/lib/base/error.c:281
mozilla/security/nss/lib/ckfw/instance.c:245
mozilla/security/nss/lib/ckfw/wrap.c:205
===
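
None of the LibNSS code is reproduced here, but a contrived example shows why that unfolding is necessary: Valgrind(1)’s stack trace always ends at the allocation site, while the leak actually happens wherever the last pointer to the block is lost. All the functions below are made up purely for illustration.

===
#include <stdlib.h>

static char *config;

static char *make_buffer(void)
{
    return malloc(64);      /* Valgrind's trace points here... */
}

static void reload_config(void)
{
    /* ...but the leak is here: the previous buffer is overwritten
     * without a free(config) first. */
    config = make_buffer();
}

int main(void)
{
    reload_config();
    reload_config();        /* the first buffer is now unreachable */
    free(config);
    return 0;
}
===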

Just as I submitted this patch to LibNSS, I received another one from Jose (jmalv04), offering a systemd(1) unit file for the dnscache(1) server of New djbdns. It was one pending task Rahul had asked for. It’s really nice to receive these patches for New djbdns. Every now and then I get mails from people asking for configuration help, or saying that they use this package every day and find it really helpful. 🙂
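
For anyone who hasn’t written one, a systemd(1) unit file is a small declarative config. Below is only a minimal sketch of what a dnscache(1) unit could look like; the ExecStart path and the chosen directives are my assumptions, the real unit file is the one shipped with the package.

===
[Unit]
Description=dnscache - DNS caching server (New djbdns)
After=network.target

[Service]
# The binary path is an assumption, for illustration only.
ExecStart=/usr/sbin/dnscache
Restart=on-failure

[Install]
WantedBy=multi-user.target
===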

I’ve added the new systemd(1) unit files for the dnscache(1) as well as the tinydns(1) server, and have also updated the long-standing review request

at -> https://bugzilla.redhat.com/show_bug.cgi?id=480724.

This review request has almost become a case study by itself. I filed it more than two years ago; people have had intense arguments over it, some liked the effort while others criticised it a little. It is ironic how users want to use this package, like it, even defend it at times, yet nobody wants to approve it, just because it was originally conceived and written by a notorious professor, who then left it and moved on. 😦

You can access the updated source and F16 RPMs from

-> http://pjp.dgplug.org/djbdns/ndjbdns-1.05.4.tar.gz
-> http://pjp.dgplug.org/djbdns/ndjbdns-1.05.4-4.fc16.src.rpm
-> http://pjp.dgplug.org/djbdns/ndjbdns-1.05.4-4.fc16.x86_64.rpm

For the concluding note – after a long time, I went to the nursery today. It looked strangely deserted of plants; I guess they are doing some restructuring there. I went to get a Chilli plant but couldn’t find one. The lady there said, why not take a Jasmine plant, sir? I smiled to myself. The thing with Jasmine is, I LOVE it, and I’ve had many of them so far; they just don’t stay for long. I guess they need some direct sunlight, which is never available in my balcony throughout the year. But it’s Jasmine, why not try once more?! 🙂

The radio plays excellent music on Sunday nights, no RJs. 🙂

Caching woes…over!

Four months ago, I mentioned here that feedmug.com was facing problems because the backend caching service had been stopped, citing excessive resource usage. This service, which was written in Perl, would consume 24MB of memory instantly upon invocation. This would later grow and shrink, but mostly fluctuated between 34MB and 40MB. When there were feeds to be merged, it used to consume more than 90% of the CPU cycles.

I started writing a new caching service in C, using libcurl, libmysqlclient and libxml2. I was excited, but I had no idea how to use any of these libraries. And before I could dive deeper, I got busy with organizing FUDCon India 2011. After FUDCon, there was other pending work which needed immediate attention. By the time I got back to writing the caching service again, December was half gone.

The first step was to read the list of feeds from the database. Fairly straightforward: connect to the MySQL server and do a query – done. Second was to use libcurl and download each feed to the local disk. Though the libcurl project has decent documentation and examples available, finding the list of options to set took some time – done.

The final step, which took most of the time, was to parse and merge the XML feeds. The problem here was that, though xmlsoft.org has lots of documentation published, none of it was readily useful, and the other tutorials I found were just as confusing. In the end it was gdb(1) which helped me realise that the XML nodes are arranged like a linked list, with separate pointers pointing to a node’s sibling and children nodes. This was enough to read and traverse the XML structure, but not enough to actually read the feed content.

I’ve one request for the folks who write the RFCs and standards: please do not have optional fields and parameters in the standard; they only lead to non-uniform documents and inconsistent implementations.
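
To make those two trickier steps concrete, here is a minimal sketch of the download with libcurl. It is not the actual feedmug.com code; the function name and error handling are mine, purely for illustration. It relies on libcurl’s default write callback, which fwrite()s to whatever FILE pointer is set via CURLOPT_WRITEDATA.

===
#include <stdio.h>
#include <curl/curl.h>

/* Fetch one feed URL to a file on the local disk. */
int fetch_feed(const char *url, const char *path)
{
    CURL *curl = curl_easy_init();
    FILE *out;
    CURLcode rc;

    if (curl == NULL)
        return -1;

    out = fopen(path, "w");
    if (out == NULL) {
        curl_easy_cleanup(curl);
        return -1;
    }

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); /* follow redirects */
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 30L);       /* don't hang on dead hosts */
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);     /* default callback fwrite()s here */

    rc = curl_easy_perform(curl);

    fclose(out);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : -1;
}
===

And here is a sketch, again only an illustration, of the traversal that gdb(1) revealed: every xmlNode has a `next` pointer to its sibling and a `children` pointer to its first child, so the whole document walks like a linked list. The file name is a placeholder.

===
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

/* Recursively print element names: siblings via `next`,
 * descendants via `children`. */
static void walk(xmlNode *node, int depth)
{
    for (; node != NULL; node = node->next) {
        if (node->type == XML_ELEMENT_NODE)
            printf("%*s%s\n", depth * 2, "", (const char *)node->name);
        walk(node->children, depth + 1);
    }
}

int main(void)
{
    xmlDoc *doc = xmlReadFile("feed.xml", NULL, 0);

    if (doc == NULL)
        return 1;

    walk(xmlDocGetRootElement(doc), 0);

    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}
===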

After catching errors like missing blog links, date & time of publication, author names etc., I had all three feed formats merged: RSS, RDF and Atom, with CPU usage below 5% for the majority of the time. Still, there were two problems:

  1. A couple of elusive segmentation faults, which I could not reproduce no matter how I tried.
  2. Memory consumption: it was still more than 15MB and growing.

After debugging for close to a week, I finally found the glitch that was leading to one of the segmentation faults. It was a bug in the merging algorithm which would creep in while merging the last node of the list. For the memory consumption issue, I decided to try Valgrind(1) to find memory leaks in the service. Valgrind(1) not only helped me fix the memory leaks, but also helped me find another problem in the merge function.
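
The real merge code isn’t shown here, but a minimal sketch of a sorted linked-list merge, with a made-up node layout, marks the exact spot where such last-node bugs like to hide:

===
#include <stdlib.h>

struct node {
    int key;                /* e.g. publication time of a feed entry */
    struct node *next;
};

/* Merge two sorted lists into one, reusing the nodes. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node head = { 0, NULL };     /* dummy head simplifies edge cases */
    struct node *tail = &head;

    while (a != NULL && b != NULL) {
        if (a->key <= b->key) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }

    /* The easy-to-miss step: once one list runs out, the remainder of
     * the other must be attached. Forgetting this drops the last
     * node(s), or leaves tail->next dangling, and the next traversal
     * can crash. */
    tail->next = (a != NULL) ? a : b;

    return head.next;
}
===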

With most of the leaks fixed, the server’s memory consumption now fluctuates between 7MB and 10MB, starting from 3.9MB upon invocation.

There are still a few leaks; they mostly seem to be in one of the linked libraries. I’m yet to figure out how to go about fixing them.

===
...
==31922== 213,216 bytes in 6,663 blocks are possibly lost in loss record 721 of 721
==31922==    at 0x4A05BB4: calloc (vg_replace_malloc.c:467)
==31922==    by 0x7A94449: nss_ZAlloc (arena.c:892)
==31922==    by 0x7A85371: pem_CreateObject (pobject.c:1157)
==31922==    by 0x7A89688: nssCKFWSession_CreateObject (session.c:1353)
==31922==    by 0x7A8E4F9: NSSCKFWC_CreateObject (wrap.c:1994)
==31922==    by 0x3FF0446419: PK11_CreateNewObject (pk11obj.c:412)
==31922==    by 0x3FF04478B2: PK11_CreateGenericObject (pk11obj.c:1359)
==31922==    by 0x397D40984F: nss_create_object (nss.c:350)
==31922==    by 0x397D40993C: nss_load_cert (nss.c:412)
==31922==    by 0x397D43F238: Curl_nss_connect (nss.c:1129)
==31922==    by 0x397D436758: Curl_ssl_connect (sslgen.c:199)
==31922==    by 0x397D40F70F: Curl_http_connect (http.c:1307)
==31922==
==31922== LEAK SUMMARY:
==31922==    definitely lost: 312 bytes in 3 blocks
==31922==    indirectly lost: 0 bytes in 0 blocks
==31922==      possibly lost: 2,000,465 bytes in 17,637 blocks
==31922==    still reachable: 602,346 bytes in 2,609 blocks
==31922==         suppressed: 0 bytes in 0 blocks
==31922== Reachable blocks (those to which a pointer was found) are not shown.
==31922== To see them, rerun with: --leak-check=full --show-reachable=yes
==31922==
===

Another possible way to save memory could be to de-link from some of the numerous libraries. Even though I directly link with just 10 libraries at compile time, ldd(1) and pmap(1) show that the server is actually linked with some 38 different dynamic libraries, all of which add up to the total memory consumption of the server.

Hope this works out well now. 🙂