Archive for February, 2007

My previous entry about rsync segfaults was obviously following the wrong lead.

I again stumbled upon the segfault problem when syncing a hundred-something thousand files. It was hard to debug the problem since it was not easy to reproduce. Furthermore, the multiple forks of the rsync process do not allow for easy debugging.

I tried to follow the child process in gdb, but I kept getting switched to the parent when the error occurred (a SIGSEGV in the child and a SIGPIPE in the parent… the former taking precedence, I guess), so I was left with no way of knowing where the segmentation fault came from.

I eventually managed to debug both parent and child with two gdb instances. The problem: a segfault in rsync_acl_free (or some similar function… not sure anymore). One thing’s for sure, though: running rsync without -A completes without a hitch. I’ll have to debug this properly later on.
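The SIGSEGV/SIGPIPE pairing described above can be reproduced in miniature. The following is only a sketch of the general pattern, not rsync's actual code: the function name `crash_child_and_observe` is made up for illustration, the crash is simulated by the child sending itself SIGSEGV, and it assumes a Unix-like system. Python ignores SIGPIPE by default, so the parent sees a BrokenPipeError where a C program would receive the signal.

```python
import os
import signal


def crash_child_and_observe():
    """Fork a child that dies with SIGSEGV and report what the parent sees."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep the read end open, then simulate a crash.
        os.close(write_fd)
        signal.signal(signal.SIGSEGV, signal.SIG_DFL)
        os.kill(os.getpid(), signal.SIGSEGV)
        os._exit(1)  # only reached if the signal was somehow not delivered

    # Parent: drop its own copy of the read end so the child is the only reader.
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    child_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None

    # Writing to a pipe with no readers left triggers SIGPIPE/EPIPE in the parent.
    try:
        os.write(write_fd, b"more file data")
        parent_error = None
    except BrokenPipeError:
        parent_error = "EPIPE"
    finally:
        os.close(write_fd)
    return child_signal, parent_error


if __name__ == "__main__":
    print(crash_child_and_observe())
```

The parent's broken pipe is only a symptom here; the real fault is in the child, which is why attaching a debugger to the child was what finally showed where the crash happened.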

I’ve been using my two 500G drives in a mixed RAID1/RAID5 setup. Each drive is partitioned in three: 100M, 10G, and 460G. The first partition of every drive is in a RAID1; this is my boot partition. The 10G partitions are also in a RAID1 (the root partition), and the big partitions are assembled into a RAID5. All was fine, and testing each RAID device with hdparm showed read performance equal to that of a single hard disk, about 60MB/s.

Today I added another disk, with the same partitioning. I grew the RAID5 array to double its size and also added another device to each of the RAID1 arrays, so that there are now three mirrors. Curious as I am, I again tested the RAID devices with hdparm. The results were intriguing: the RAID5 array read at 120MB/s while the RAID1 arrays were still stuck at 60MB/s. It is also interesting that when there were only two drives in the RAID5 array, hdparm -t reported 60MB/s.
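The "double the size" figure follows from how these RAID levels use their members. A small sketch of the arithmetic, assuming equal-sized members and ignoring metadata overhead (the function name `raid_usable_capacity` is made up for illustration):

```python
def raid_usable_capacity(level, devices, member_size_gb):
    """Return the usable capacity in GB of a RAID1 or RAID5 array of equal members."""
    if devices < 2:
        raise ValueError("need at least two member devices")
    if level == 1:
        # Every member holds a full copy, so extra mirrors add no space.
        return member_size_gb
    if level == 5:
        # One member's worth of space is spent on parity.
        return (devices - 1) * member_size_gb
    raise ValueError("only RAID1 and RAID5 are handled in this sketch")


if __name__ == "__main__":
    print(raid_usable_capacity(5, 2, 460))  # 460
    print(raid_usable_capacity(5, 3, 460))  # 920
    print(raid_usable_capacity(1, 3, 10))   # 10
```

Going from two to three 460G members takes the RAID5 array from 460G to 920G, while the RAID1 arrays keep their size and only gain a third copy.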

I figure that the 64k chunks on the RAID5 array force reading from different drives, whereas on RAID1, where the kernel is free to choose which mirror to read from, it simply goes for a single non-busy drive. Interleaving the reads on RAID1 would have been nice, but I guess I’ll figure that out some other time.
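To make the chunk argument concrete, here is a rough sketch that maps logical offsets to member drives. It assumes the left-symmetric layout that Linux md uses by default for RAID5 (the post does not say which layout the array uses), and the helper names `raid5_data_disk` and `disks_touched` are made up for illustration.

```python
CHUNK_SIZE = 64 * 1024  # 64k chunks, as mentioned above


def raid5_data_disk(offset, disks, chunk_size=CHUNK_SIZE):
    """Map a logical byte offset to the member disk holding it (left-symmetric)."""
    data_disks = disks - 1
    chunk = offset // chunk_size
    stripe, index_in_stripe = divmod(chunk, data_disks)
    # Parity starts on the last disk and moves one disk to the left per stripe.
    parity_disk = data_disks - (stripe % disks)
    # Data chunks start right after the parity disk and wrap around.
    return (parity_disk + 1 + index_in_stripe) % disks


def disks_touched(length, disks, chunk_size=CHUNK_SIZE):
    """List the member disk for each chunk of a sequential read from offset 0."""
    return [raid5_data_disk(offset, disks, chunk_size)
            for offset in range(0, length, chunk_size)]


if __name__ == "__main__":
    print(disks_touched(6 * CHUNK_SIZE, 3))  # [0, 1, 2, 0, 1, 2]
    print(disks_touched(4 * CHUNK_SIZE, 2))  # [0, 1, 0, 1]
```

Under this model a long sequential read rotates through every member, so all drives end up busy at once. If one further assumes that skipping a parity chunk costs a spinning disk about as much as reading it, the model would predict roughly (n - 1) times the single-disk rate, which happens to fit the 60MB/s and 120MB/s readings, though that is a guess rather than a measurement.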

Reading performance of hard disks and RAID arrays

This figure illustrates the read performance when reading from the first, second and third hard disk and then from the three RAID arrays in turn. It is interesting to see that when reading from md1 (the root partition, a RAID1 array) the reads switch from one drive to another in sequence: it starts reading from sdb and then switches to sda. When reading from md2 (the RAID5 array), however, we see concurrent reading at a constant speed from all drives.

I recently discovered that my coding skills have atrophied in the last year. The problem — I haven’t had a chance to do anything at work. There are two possible solutions that I see: either get another job or find some projects to work on in my free time (plenty of that). I am thus looking for project ideas. Here is a quick list to serve me as a reminder:

  • Web based multi-protocol IM
    I am thinking this one can be done with libpurple for the IM protocol communication. This idea may be stupid and/or useless. The plan would be to use a J2EE application server and write Java bindings for libpurple (possibly with JNI). It’s overkill, but I need to practice my Java. Or I could use jClaim, which is also GPL, but it is not maintained and I couldn’t find the source for the separate protocols.
  • Enhance Ekiga.
    I have previously contributed to this project, so maybe it is time to get back to working on it. I used to want to implement support for separate drivers for audio input and output. This one should be a no-brainer, but I don’t have the required hardware at present. Maybe I could borrow some.

Update: I actually went for solution number one — got another job.

MS Word error message
MS Word at work just complained that it cannot find the file that I told it to open. The solution was to use the File -> Open menu and choose the “Open and repair” button.