Expanding a leading tilde in C/C++

If you’re writing an app that accepts a file path as user input or in config files, you’ll have to be able to parse the famous leading tilde and expand it to the home directory of the correct user. For example, if I enter “~/.vimrc”, it needs to be expanded to the file in my home directory, “/home/david/.vimrc”, before you can do anything with it. You can use wordexp (“word expand”), declared in <wordexp.h>, to accomplish this.

Here’s a sample application showing how:

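#include <stdio.h>
#include <wordexp.h>

int main(int argc, char* argv[]) {
	wordexp_t exp_result;
	if (argc < 2)
		return 1; /* expects the path to expand as its first argument */
	/* wordexp expands ~, ~user and other shell-style constructs. */
	if (wordexp(argv[1], &exp_result, 0) != 0)
		return 1;
	if (exp_result.we_wordc > 0)
		printf("%s\n", exp_result.we_wordv[0]); /* never use input as a format string */
	wordfree(&exp_result);
	return 0;
}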

That should pretty much tell you everything you need to know. Here are some of the results output by this app:

~/.vimrc becomes /home/david/.vimrc
.vimrc becomes .vimrc
~.vimrc becomes ~.vimrc
~blacky/.vimrc becomes /home/admin/blacky/.vimrc (blacky’s homedir is /home/admin/blacky)

As you can see, it handles pretty much every situation correctly.
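wordexp does more than tilde expansion, though: it also expands $VARIABLES, performs pathname globbing, splits its input on whitespace (so “~/My Documents” yields two words and the program above prints only the first), and runs $(command) substitutions unless you pass WRDE_NOCMD. If a config file only needs the tilde handled, a minimal sketch using getenv("HOME") and getpwnam() could look like this. The function expand_tilde is a hypothetical name, not a library API, and the sketch leaves an unknown “~user” untouched, mirroring the “~.vimrc” result above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: expands a leading "~" or "~user" in `path` into `out`.
 * Paths without a leading tilde, and "~user" prefixes naming an unknown user,
 * are copied unchanged. Returns 0 on success, -1 if `out` is too small. */
int expand_tilde(const char *path, char *out, size_t outsz)
{
    const char *home = NULL;
    const char *rest = path;

    if (path[0] == '~') {
        /* The user name runs from after '~' up to the first '/' (or the end). */
        const char *slash = strchr(path, '/');
        size_t namelen = slash ? (size_t)(slash - path - 1) : strlen(path + 1);
        const char *after = slash ? slash : path + 1 + namelen;

        if (namelen == 0) {
            /* Plain "~": prefer $HOME, fall back to the password database. */
            home = getenv("HOME");
            if (home == NULL) {
                struct passwd *pw = getpwuid(getuid());
                home = pw ? pw->pw_dir : NULL;
            }
        } else {
            char name[256];
            if (namelen < sizeof name) {
                memcpy(name, path + 1, namelen);
                name[namelen] = '\0';
                struct passwd *pw = getpwnam(name);
                home = pw ? pw->pw_dir : NULL;
            }
        }
        if (home != NULL)
            rest = after; /* replace "~" or "~user" with the home directory */
    }

    int n = snprintf(out, outsz, "%s%s", home ? home : "", rest);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

Unlike wordexp, this never splits, globs or runs commands, so a path with spaces or a stray “$(...)” in a config file comes through literally.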

Gmail Carbon Copy

Today I’d like to introduce Gmail Carbon Copy, an application I’ve written over the last couple of months. The latest version is stable and works, so I’m deeming it fit for public consumption.

Gmail Carbon Copy, or Gmailcc, simply creates a backup of your Gmail. It differs from existing alternatives thanks to two clever tricks: each mail is downloaded only once instead of once for every label, while the labels are still saved, and the mails are stored in an actually usable, sparse Maildir format.

Gmail’s IMAP implementation is unique in that it maps labels to folders. The same mail will appear in a different folder for every label attached to it. Regular IMAP clients like Thunderbird or getmail think each copy of the mail in a different folder is a new mail and will download it again, even though it might just be a copy of a mail they downloaded earlier. Gmailcc detects these “doubles” and downloads each mail just once. Because of this, backups, especially the initial one, finish much faster and use far less traffic.

Saving it in a usable Maildir format has the advantage that any regular mail server like Courier can access your backup. It’s very practical: I’m using Gmailcc and Roundcube to access my mail through a web interface when Gmail is down. It’s sparse because every mail is saved only once, while for every label a sizeless link is created instead of a true copy. This minimizes the space needed to store the backup.
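To illustrate the “sizeless link” idea, here is a minimal sketch that assumes hard links. The post doesn’t say whether Gmailcc uses hard or symbolic links, and this is not Gmailcc’s code. The message file is written once, and every further label only adds a directory entry pointing at the same inode. The helper names and paths are hypothetical, and a real Maildir also needs its cur/new/tmp subfolders and unique file names.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: the message was already saved once at `stored_path`;
 * make it appear under another label by adding a hard link at `label_path`.
 * Both names then share one inode, so no message data is duplicated. */
int add_label_link(const char *stored_path, const char *label_path)
{
    if (link(stored_path, label_path) != 0) {
        perror("link");
        return -1;
    }
    return 0;
}

/* Number of directory entries (labels) pointing at the same stored message,
 * or -1 if the file can't be examined. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

Hard links only work within one filesystem, so under this assumption the whole backup has to live on a single partition.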

There are still some issues but it shouldn’t make your PC explode or kill your Gmail account. If you encounter bugs or would like to have features added, I encourage you to sign up and add a ticket.

Gmail Carbon Copy is open source (C++), licensed under the MIT license and works only on Linux at this time.

Munin and Apache: Can’t locate object method

If you’re using Munin to track statistics on your server and you’re trying to use any of the Apache plugins, you might have some trouble getting it working. If Munin won’t display any statistics on Apache, and the munin-node.log logfile is filled with lines like these:

Can't locate object method "new" via package "LWP::UserAgent"
at /etc/munin/plugins/apache_processes line 152.
2009/07/08-17:00:02 Plugin "apache_processes" exited with status 512. ----
Can't locate object method "new" via package "LWP::UserAgent"
at /etc/munin/plugins/apache_accesses line 130.
2009/07/08-17:00:03 Plugin "apache_accesses" exited with status 512. ----
Can't locate object method "new" via package "LWP::UserAgent"
at /etc/munin/plugins/apache_volume line 130.
2009/07/08-17:00:03 Plugin "apache_volume" exited with status 512. ----

then the solution is to install the libwww-perl package, which provides the required LWP::UserAgent module.

Make sure to restart munin-node afterwards:

$ sudo /etc/init.d/munin-node restart

On the Redundancy of the Password Inputbox

We all know and love the password inputbox. It hides all the characters you type with stars, and encrypts the contents stored in memory. It’s about the only constant in the potpourri of user registration pages. It’s the part no site ever gets wrong — use a password inputbox when asking the user for their password. But what function does it serve? It’s simple:

To hide your password from bystanders, innocent or otherwise.

That’s the one and only reason why we obscure the characters with stars. That co-worker sitting next to you, or the coffee-lady casually walking by: if it weren’t for the trusty password field, they could have spotted and accidentally memorized your password while you were entering it. It’s a great solution to a very real problem.

Redundancy @ WordPress.com


All this is common knowledge, of course. So why am I repeating it? Because, surprisingly, for most sites it’s redundant. All the websites out there that send your password by email, or show it when you’ve clicked the “activate account” link, are nullifying the password field’s sole reason for existence.

Since the user’s password is displayed on the screen in an e-mail, that coffee-lady can look at the password anyway. Worse: oftentimes the user doesn’t know what’s coming when opening the mail or clicking the activation link. He can’t first check whether anyone is nearby before unknowingly revealing the password on his screen, which he can do when typing the password into a regular inputbox.

The conclusion is simple: if you think you can send the user his password by mail or show it in clear text on his profile, stop using the password inputbox. It won’t increase the level of security. At that point, it only serves to annoy the user, who has to enter his password blindly, twice even, possibly making an error along the way and having to try again. It’ll also show the user the real degree of security you’re using, instead of fooling him with asterisks.

(The real conclusion is of course to never show the password in cleartext, anywhere.)

Hide Data in Bad Blocks

This is part 3 in a series on how to hide your data.

First of all, the methods explained in this series are not secure. Anyone with some low-level knowledge of filesystems can tell there’s hidden data when looking at a raw image of your disk. Always complement these methods using encryption and plausible deniability methods. TrueCrypt is an excellent way to do this.

Introduction

When a sector on a disk gets damaged, it becomes unusable. Modern disks have spare sectors that are used to replace these bad sectors, so they’re handled and fixed automatically. If you’re young enough, you might never have witnessed bad sectors at all, because modern hardware handles them transparently.

When the disk runs out of spare sectors, or never had any in the first place (like 3.5″ floppy disks, or very old hard disks), the filesystem is the second line of defense. Inside the filesystem a list of known bad blocks (blocks on bad sectors) is stored. The filesystem takes care not to use these blocks and just skips them.

We can’t force the disk to remap certain blocks to spare sectors, but we can tell the filesystem which blocks have (supposedly) gone bad. If the blocks aren’t really damaged, any data we put there will never be touched, because the filesystem thinks it’s garbage anyway. That is exactly what we’re going to do.

Practical

To keep it simple and fast, we’ll hide a whole partition inside a burst of bad blocks. The partition we’ll create has to be small and reside somewhere in the middle of the disk. We can’t put the partition at the beginning or the end of the disk, because most likely the filesystem requires an intact header at the start and end of the partition.

Partition inside Bad Blocks

The partition has to be small enough to be able to fit inside the non-secret partition while not arousing suspicion. Some operating systems mark bad blocks as used blocks, which means if we put a 100MB partition inside bad blocks, the “parent” filesystem will always have at least 100MB in use. This could arouse suspicion when there aren’t any files on it.

I’ll be using my trusty 256MB Compactflash card for this, which is excellent for illustrative purposes.

Here’s what sfdisk has to say about it:

$ sudo sfdisk -l /dev/sde

Disk /dev/sde: 1009 cylinders, 9 heads, 56 sectors/track
Units = cylinders of 258048 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sde1          0       -       0          0    0  Empty
/dev/sde2          0       -       0          0    0  Empty
/dev/sde3          0       -       0          0    0  Empty
/dev/sde4          0       -       0          0    0  Empty

We can see the card consists of 1009 cylinders. I want to create a partition of about 20MB, which is about 82 cylinders on this disk (see the second line of the sfdisk -l output). Because we can’t create the partition at the start of the disk, let’s put it 214 cylinders in:

$ sudo sfdisk /dev/sde << EOF
214,82,6
EOF
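The size-to-cylinders step is plain arithmetic: a cylinder holds heads times sectors per track times 512 bytes, and you round the wanted size up to whole cylinders. A small sketch, assuming 512-byte sectors and reading “20MB” as 20 MiB; the function names are made up for illustration:

```c
/* Bytes per cylinder for a CHS geometry, assuming 512-byte sectors.
 * For this card: 9 heads * 56 sectors/track * 512 = 258048 bytes. */
unsigned long cylinder_bytes(unsigned long heads, unsigned long sectors_per_track)
{
    return heads * sectors_per_track * 512UL;
}

/* Smallest number of whole cylinders that holds at least `bytes` bytes. */
unsigned long cylinders_for(unsigned long long bytes, unsigned long cyl_bytes)
{
    return (unsigned long)((bytes + cyl_bytes - 1) / cyl_bytes);
}
```

With this card’s geometry, 20 MiB works out to 81.3 cylinders, which rounds up to the 82 used in the sfdisk command.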

Just like before, put FAT16 on it and transfer your secret data.

$ sudo mkfs.vfat -F16 /dev/sde1
mkfs.vfat 2.11 (12 Mar 2005)

If you want, you can copy the current partition table to the back of the disk for easy restoring, just like in the previous article.

Unmount it, and remove the partition:

$ sudo sfdisk /dev/sde << EOF
0,0,0
EOF

Now create the parent partition. This should at least encompass the whole secret partition. If you’ve copied the partition table to the back of the disk, make sure to leave at least the last cylinder free.

$ sudo sfdisk /dev/sde << EOF
,,6
EOF

Creating Bad Blocks

We need to calculate which blocks our secret partition resides on so we can mark them as bad. We know it starts at cylinder 214 and is 82 cylinders in size. Since a cylinder on this disk is 258048 bytes, the secret partition starts at byte 214 times 258048, or 55222272. Divide this by the size of one block, which is 1024 bytes, and we get block 53928. Do the same for the size of the partition, and we find that 82 cylinders equal 20664 blocks. So our partition starts at block 53928 and ends right before block 74592. We’ll use a margin of 10 blocks on each side, just in case our calculations aren’t precise.
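The same calculation, plus writing the bad blocks file, as a small C sketch. It follows the numbers in the text (74592 is treated as the first block after the partition, and the margin is added on top, exactly like the seq command) and assumes the cylinder size is a whole number of blocks; bad_block_range and write_bad_blocks are hypothetical helpers:

```c
#include <stdio.h>

/* Inclusive range of filesystem blocks to list as bad. */
struct block_range {
    unsigned long first;
    unsigned long last;
};

/* Converts the secret partition's start and size (in cylinders) to blocks,
 * then widens the range by `margin` blocks on each side. Assumes that
 * cylinder_bytes is a multiple of block_bytes and that the partition starts
 * at least `margin` blocks into the disk. */
struct block_range bad_block_range(unsigned long start_cyl, unsigned long size_cyl,
                                   unsigned long cylinder_bytes,
                                   unsigned long block_bytes, unsigned long margin)
{
    unsigned long per_cyl = cylinder_bytes / block_bytes; /* 258048 / 1024 = 252 */
    unsigned long start = start_cyl * per_cyl;            /* 214 * 252 = 53928 */
    unsigned long end = start + size_cyl * per_cyl;       /* first block after it: 74592 */
    struct block_range r = { start - margin, end + margin };
    return r;
}

/* Writes one block number per line, the format mkfs.vfat -l reads
 * (the same output as `seq first last`). Returns the number of lines. */
unsigned long write_bad_blocks(FILE *out, struct block_range r)
{
    unsigned long n = 0;
    for (unsigned long b = r.first; b <= r.last; b++, n++)
        fprintf(out, "%lu\n", b);
    return n;
}
```

For this card, bad_block_range(214, 82, 258048, 1024, 10) gives blocks 53918 through 74602, which is 20685 lines, the same count mkfs.vfat reports further down.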

Since we’re putting a FAT16 filesystem on it, we need to tell mkfs.vfat what blocks have supposedly gone bad. This is done by using a bad blocks file, which is a text-file with the address of each bad block on a new line. Let’s create our bad blocks file:

$ seq 53918 74602 > /tmp/badblocks

If you open /tmp/badblocks, you should see something like this:

53918
53919
53920
53921
...

To create the filesystem, we pass the bad blocks file using the -l parameter:

$ sudo mkfs.vfat -n "Camera" -l /tmp/badblocks /dev/sde1
mkdosfs 2.11 (12 Mar 2005)
20685 bad blocks

That’s it! You can now use your disk to your heart’s delight, nothing will touch your secret partition. One awesome way is to put the card in your camera and take some pictures with it. Your data will remain safe, and there’ll be nothing suspicious about a 4GB card “missing” some megabytes.

Revert

Once you’ve smuggled your secret data across state borders, you’re ready to recover the secret partition. Just recreate the partition table so it contains the secret partition:

$ sudo sfdisk /dev/sde << EOF
214,82,6
EOF

That’s it! You can even reuse the setup: by switching partition tables you’re effectively changing which partition is “active” on your card, and changing data in either partition won’t affect the other.

Advantages

  • Pretty much undetectable
  • Infinitely reusable
  • Bad blocks are less suspicious than unallocated space

Disadvantages

  • Quite complex to set up
  • Possibly suspicious size discrepancy in empty filesystems

Hide Data in Invisible Partitions

This is part 2 in a series on how to hide your data.

First of all, the methods explained in this series are not secure. Anyone with some low-level knowledge of filesystems can tell there’s hidden data when looking at a raw image of your disk. Always complement these methods using encryption and plausible deniability methods. TrueCrypt is an excellent way to do this.

Introduction

In the first article we learned about the Partition Table and how it identifies the partitions on our storage device. We also saw how to hide a partition using the standard method of flipping the 5th bit of the partition ID. From here on we’re stepping off the beaten track and will use the tools at our disposal for things they weren’t intended for.

The Partition Table, Redux

Clever readers will have seen it coming when they read about the partition table in the previous article. Without those 64 bytes at the beginning of the disk, no one would know what partitions exist and where they are located. So that’s exactly what we’re going to fiddle with.

If we change the Partition Table, we don’t actually touch any of the real data on the disk. It’s the same with books: even if you remove the table of contents, you can still read the book; it’ll just be harder to find one specific chapter. If we remove a partition’s entry from the partition table, we’re not actually removing the partition, just the info needed to know where it is. If you memorize this info, which is only three numbers (start, size and ID), you can later add it back to the table and access your data again.

Practical

A card with no partitions at all is suspicious, so we’ll create two partitions, and hide one of them afterwards.

Once again, we’re using sfdisk:

$ sudo sfdisk /dev/sde << EOF
> 0,500,6
> ,508,6
> EOF

This is the result:

$ sudo sfdisk -l /dev/sde

Disk /dev/sde: 1009 cylinders, 4 heads, 62 sectors/track
Units = cylinders of 126976 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sde1          0+    499     500-     61999+   6  FAT16
/dev/sde2        500    1007     508      62992    6  FAT16
/dev/sde3          0       -       0          0    0  Empty
/dev/sde4          0       -       0          0    0  Empty

Put a FAT16 filesystem on the second partition…

$ sudo mkfs.vfat -F16 /dev/sde2
mkfs.vfat 2.11 (12 Mar 2005)

…mount it, and save your secret data on it.

Hang tight, here comes the dirty bit.

We know our secret partition starts right after the first partition, is exactly 508 cylinders in size, and has 0x6 as its ID. You can memorize this data, or just copy the whole partition table to the end of the drive:

$ sudo dd bs=1 count=64 skip=446 seek=128118720 \
> if=/dev/sde of=/dev/sde
64+0 records in
64+0 records out
64 bytes (64 B) copied, 0.0282496 s, 2.3 kB/s

The Partition Table always starts at byte 446, so we skip those first bytes. Byte 128118720 is the start of the last 64 bytes on my drive. You can calculate this by multiplying the size of a cylinder by the number of cylinders (both can be found in the output of sfdisk -l) and subtracting 64. Note that we made sure our two partitions don’t fully use the disk but leave 1 cylinder free, so the last 126976 bytes (about 124KB) at the end of the drive are ours to use.
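The arithmetic behind that dd offset, as a sketch using the geometry from the sfdisk -l output above (last64_offset is a hypothetical name). Note that this is the end of the last whole cylinder sfdisk reports; the device may have a few sectors beyond it, which this approach simply ignores:

```c
/* The partition table lives at bytes 446..509 of the first sector;
 * bytes 510 and 511 hold the 0x55 0xAA boot signature. */
enum { PART_TABLE_OFFSET = 446, PART_TABLE_SIZE = 64 };

/* Offset where the last PART_TABLE_SIZE bytes of the disk begin, using the
 * geometry printed by sfdisk -l: cylinder size times cylinder count, minus 64. */
unsigned long long last64_offset(unsigned long long cylinder_bytes,
                                 unsigned long long cylinders)
{
    return cylinder_bytes * cylinders - PART_TABLE_SIZE;
}
```

For this card, 126976 times 1009 minus 64 gives 128118720: the value passed to dd as seek= when making the backup and as skip= when restoring it.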

Now let’s remove the partition from the partition table:

$ sudo sfdisk /dev/sde -N2 << EOF
> 0,0,0
> EOF

Our partition has magically disappeared. No operating system will be able to find the missing partition, but there exist special tools to recover the partition table. They do this by scanning the whole drive and looking for patterns that look like the beginning of a partition.

The one visible partition will obviously be of a smaller size than the whole drive. If for example you’re using a 2GB SD-card and want to avoid suspicion, replace the label with one from a 1GB SD-card, and make sure the visible partition is 1GB in size. This way, the only way to notice something is amiss is to run a partition editor and notice there’s a large chunk of unallocated space at the end of your drive.

The Invisible Partition in GParted, not quite invisible.

Revert

When you want to access your data again, you can just use sfdisk to recreate exactly the same partition using the numbers you memorized:

$ sudo sfdisk /dev/sde -N2 << EOF
> ,508,6
> EOF

Or overwrite the partition table with the copy we made at the end of the drive:

$ sudo dd bs=1 count=64 skip=128118720 seek=446 \
> if=/dev/sde of=/dev/sde

Neither method touches any of the data on the actual partitions, so both are pretty safe to use, as long as you remember where your partition is located and don’t format it afterwards.

Advantages

  • Almost undetectable
  • Not accessible without changing the partition table (i.e. doing pretty advanced stuff)

Disadvantages

  • Possibly suspicious size discrepancy
  • Detectable using partition editor

Hide Data in Hidden Partitions

This is part 1 in a series on how to hide your data.

Introduction

First of all, the methods explained in this series are not secure. Anyone with some low-level knowledge of filesystems can tell there’s hidden data when looking at a raw image of your disk. Always complement these methods using encryption and plausible deniability methods. TrueCrypt is an excellent way to do this.

Second, these methods will destroy your data if you’re not careful. Use them at your own risk, and only on data you have backed up very well. These methods shouldn’t destroy your disk or memory card, since we’re purely toggling bits. However, I guarantee nothing. These methods should work on any general data storage device, be it hard disks, USB keys, or flash cards.

The Partition Table

The first sector on every disk contains the partition table. It is 64 bytes, divided into 4 records of 16 bytes, one for each primary partition. This explains the mystery of why you can only create 4 primary partitions on a disk. Like most arbitrary limitations, this is a remnant of history.

Besides parameters like the start and the size of the partition, these records also contain the partition-type descriptor, an 8-bit ID identifying the filesystem on the partition. We’ll call it the partition ID, or simply ID, from here on. In hexadecimal, the ID for FAT12 is 0x01. For ext2, reiserfs, and various other Linux filesystems the ID is 0x83. Here’s a list of all the partition IDs. Note that these are not regulated, and that filesystem creators can decide for themselves which ID their system uses. The OS uses the partition ID to check whether it can mount the filesystem on that partition, before actually trying to mount it.
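To make the 16-byte record concrete, here is a sketch that decodes one entry from a 512-byte first sector. It uses the classic MBR layout (boot flag at byte 0, partition ID at byte 4, starting sector at bytes 8 to 11 and sector count at bytes 12 to 15, both little-endian) and skips the legacy CHS fields; struct mbr_entry and mbr_read_entry are illustrative names, not a library API:

```c
#include <stdint.h>

/* One 16-byte primary partition record (classic MBR layout). */
struct mbr_entry {
    uint8_t  boot_flag;     /* 0x80 = bootable, 0x00 = not */
    uint8_t  type;          /* partition ID, e.g. 0x06 for FAT16 */
    uint32_t lba_start;     /* first sector of the partition */
    uint32_t sector_count;  /* size in 512-byte sectors */
};

/* Reads a little-endian 32-bit value. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decodes record `i` (0..3) of the table at bytes 446..509 of `sector`.
 * Bytes 1-3 and 5-7 of each record hold CHS addresses and are skipped. */
struct mbr_entry mbr_read_entry(const uint8_t sector[512], int i)
{
    const uint8_t *rec = sector + 446 + 16 * i;
    struct mbr_entry e;
    e.boot_flag = rec[0];
    e.type = rec[4];
    e.lba_start = le32(rec + 8);
    e.sector_count = le32(rec + 12);
    return e;
}
```

To try it on a real device you would read the first 512 bytes of, say, /dev/sdd (as root) into a buffer and pass it in; the IDs should match the Id column of sfdisk -l, while sizes come out in 512-byte sectors rather than 1024-byte blocks.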

Using sfdisk we can check out the partition table:

$ sudo sfdisk -l /dev/sdd

Disk /dev/sdd: 1009 cylinders, 9 heads, 56 sectors/track
Units = cylinders of 258048 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdd1          0+   1008    1009-    254267+   6  FAT16
/dev/sdd2          0       -       0          0    0  Empty
/dev/sdd3          0       -       0          0    0  Empty
/dev/sdd4          0       -       0          0    0  Empty

This partition table comes from a 256MB compactflash card (on my PC, device /dev/sdd). As you can see, it only has one partition, encompassing all 1009 cylinders (minus 1 sector; see the plus and minus signs), with ID 0x6, which is the standard for FAT16. This doesn’t mean that there’s a FAT16 filesystem on that partition, though. It just means that there’s probably one on there.

The Standard Method

As weird as it sounds, there’s actually some kind of “standard” on hidden partitions. Using this method you’re not really hiding the data as much as putting it in a corner where no one can see it unless they turn their heads. Every operating system and partition manager will recognize it as a ‘hidden partition’, and thus, it’s not really hidden. It even gets mounted by default in certain Linux distributions.

Why use this, then? It’s useful when you need to install multiple legacy operating systems that don’t like to work together (Windows, I’m looking at you). GRUB, a Linux bootloader, actually has the commands hide and unhide, which implement this method. It’s also a quick, easy and non-destructive way to make sure the data can’t be accessed without some effort. Useful for hiding data from a layperson.

The method is simple: flip the 5th least significant bit of the partition ID. The 0x6 (binary 00000110) for FAT16 becomes 0x16 (00010110). The 0x83 for Linux partitions becomes 0x93. Let’s say we want to hide the partition on my compactflash card:

$ sudo sfdisk --change-id /dev/sdd 1 16

Ta-da! You’ve now officially hidden your partition. The “1” parameter is the number of the partition on the specified disk you want to change, and “16” is the new ID in hexadecimal. Change the 1 to 2 if you want to change the second partition, etc.

Here’s what the table looks like now:

$ sudo sfdisk -l /dev/sdd

Disk /dev/sdd: 1009 cylinders, 9 heads, 56 sectors/track
Units = cylinders of 258048 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdd1          0+   1008    1009-    254267+  16  Hidden FAT16
/dev/sdd2          0       -       0          0    0  Empty
/dev/sdd3          0       -       0          0    0  Empty
/dev/sdd4          0       -       0          0    0  Empty

As you can see, it’s hidden, but sfdisk still knows it’s there.
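Since hiding only flips the 0x10 bit of the ID, hiding and unhiding are the same XOR operation. A one-function sketch (toggle_hidden is a hypothetical name):

```c
#include <stdint.h>

/* Bit 4 (value 0x10), the 5th least significant bit, marks a partition ID
 * as "hidden": 0x06 <-> 0x16, 0x83 <-> 0x93. XOR flips it both ways. */
uint8_t toggle_hidden(uint8_t id)
{
    return (uint8_t)(id ^ 0x10);
}
```

Applying it twice gives back the original ID, which is why a single “toggle” is enough to both hide and unhide.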

Advantages

  • Standard, supported by many OSes and applications
  • Easy and fast to hide and unhide

Disadvantages

  • Standard, thus easily detected
  • Mounted by default in Linux, which easily defeats the purpose

IRC Quote (2)

Zeus WPI has an IRC channel which at any one time contains two dozen geeks discussing a myriad of topics ranging from the latest XKCD to the physics behind not being able to reach absolute zero.

Such a cornucopia of madness wouldn’t be complete without a bot for certain administrative tasks such as keeping stats on each user.

The “riddle” I posted last year is a question we pondered for a few days back then:

We keep a log file of everything said on our IRC channel. What’s the fastest way to extract one random line said by a specified person from that file, with every line having an equal chance of being picked?

We would use this to implement a “quote” command in our custom-made bot, which returns a quote for the named person.

To keep the problem interesting, no “persistent” data can be kept in memory over multiple queries, such as an index or a counter.

Adhemar was the only person to propose a solution, but we also asked our professor for Data Structures & Algorithms, Gunnar Brinkmann. As it turns out, Adhemar’s solution was very close to the one Prof. Brinkmann suggested.

Brinkmann’s Algorithm

This is the algorithm we were using:

totallines = 1
while not eof(logfile) do
   currentline = readline(logfile)
   if (rand() mod totallines) == 0 then
      currentquote = currentline
   totallines++
done

In plain English:

For every line i, pick that line with chance 1/i.
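For the curious, Brinkmann’s rule written out in C. This single-pass technique is generally known as reservoir sampling (here with a reservoir of one line). The sketch assumes a hypothetical log format in which each line starts with “<nick>”, and keeps the pseudocode’s rand() mod i test, which is only approximately 1/i and stops being uniform once a person has more than RAND_MAX + 1 lines; pick_quote_modulo is an illustrative name:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Picks one line starting with `prefix` at random while reading `log` once.
 * Returns 1 and copies the line into `out` if any line matched, else 0. */
int pick_quote_modulo(FILE *log, const char *prefix, char *out, size_t outsz)
{
    char line[1024];
    unsigned long matches = 0;      /* the pseudocode's totallines */
    size_t plen = strlen(prefix);

    while (fgets(line, sizeof line, log) != NULL) {
        if (strncmp(line, prefix, plen) != 0)
            continue;               /* said by someone else */
        matches++;
        /* Keep the i-th matching line with probability (roughly) 1/i. */
        if ((unsigned long)rand() % matches == 0)
            snprintf(out, outsz, "%s", line);
    }
    return matches > 0;
}
```

Call srand() once at startup (for example with time(NULL)) so the bot doesn’t return the same sequence of quotes after every restart. Lines longer than the 1024-byte buffer are read in pieces, which this sketch doesn’t handle.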

Adhemar’s Algorithm

Adhemar’s solution, however, is a tad faster on a real-life system because it doesn’t need the relatively expensive mod operation for every line:

currenthighest = 0
while not eof(logfile) do
   currentline = readline(logfile)
   currentrand = rand()
   if (currentrand >= currenthighest) then
      currentquote = currentline
      currenthighest = currentrand
done

Or:

The player who rolls the highest die gets picked.
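Adhemar’s variant in C, under the same hypothetical “<nick>” log format (pick_quote_max is an illustrative name). One caveat for a real implementation: rand() only produces RAND_MAX + 1 distinct values, and with “>=” ties go to the later line, so the pick becomes slightly biased towards later lines when a person has many of them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* "Highest roll wins": every matching line gets a random number and the line
 * with the largest one is kept. Returns 1 and fills `out` if a line matched. */
int pick_quote_max(FILE *log, const char *prefix, char *out, size_t outsz)
{
    char line[1024];
    long highest = -1;              /* lower than anything rand() returns */
    size_t plen = strlen(prefix);

    while (fgets(line, sizeof line, log) != NULL) {
        if (strncmp(line, prefix, plen) != 0)
            continue;               /* said by someone else */
        long roll = rand();
        if (roll >= highest) {      /* ties go to the later line, as in the pseudocode */
            highest = roll;
            snprintf(out, outsz, "%s", line);
        }
    }
    return highest >= 0;
}
```

Starting from -1 instead of the pseudocode’s 0 lets the final value of highest double as a “found anything” flag.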

Empirical data suggests the second algorithm is about 1% faster than the first. This problem is clearly I/O-bound, so these algorithms are probably as good as it gets without storing any data in memory.

Although the problem is relatively simple, the interesting thing to remember here is how to pick an item uniformly at random from a set with an unknown number of items.

Zenity and rsync

Zenity is a neat little tool to create simple GUIs for your shell scripts. One of its most useful features is the progress dialog, which shows the progress of a command using the familiar GTK progress bar.


Zenity uses pipes to send commands to the dialogs. Any number sent to the Zenity instance while in progress mode will make the progress bar move to that number as the percentage completed. Any text that starts with # is set as the label above the progress bar.

Here’s an example shamelessly stolen and abbreviated from the manual:

        #!/bin/sh
        (
        echo "10" ; sleep 1
        echo "# Updating mail logs" ; sleep 1
        echo "20" ; sleep 1
        echo "# Resetting cron jobs" ; sleep 1
        echo "75" ; sleep 1
        echo "# Rebooting system" ; sleep 1
        echo "100" ; sleep 1
        ) |
        zenity --progress \
          --title="Update System Logs" \
          --text="Scanning mail logs..." \
          --percentage=0

To shape the output of a real application into data fit for Zenity mostly requires some creative awking. I couldn’t find an example to parse rsync output, so I made this awk-script to show the progress of an rsync operation:

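{
   if (index($0, "to-check=") > 0)
   {
	# rsync ends such a progress line with "to-check=LEFT/TOTAL)";
	# grab the "LEFT/TOTAL" part between "to-check=" and ")".
	split($0, pieces, "to-check=");
	term = substr(pieces[2], 1, index(pieces[2], ")") - 1);
	split(term, division, "/");
	# Fraction of files already checked, as a percentage for Zenity.
	print (1-(division[1]/division[2]))*100"%"
   }
   else
   {
	# Any other output becomes the label above the progress bar.
	print "#"$0;
   }
   fflush();
}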

Use it like this:

$ rsync -av --progress /media/disk/ ~/backup/usbstick/ |
   awk -f rsync.awk |
   zenity --progress --title "Backing up USB-Stick" \
      --text="Scanning..." --percentage=0

rsync.awk contains the awk script above. Note how we use rsync’s --progress option to report how far the transfer has progressed; the result is a Zenity progress dialog that follows the backup.

Autographed

Last Thursday I went on my annual visit to the Bookfestival in Expo, Ghent. While my dad was disappointed by the lack of good comic books this year, I was delighted to pick up Stephenson’s Confusion, Morrow’s The Last Witchfinder, Simmons’ Olympos and a boxed edition of Clarke’s Jonathan Strange & Mr. Norrell for less than €19. A steal!
Jonathan Strange & Mr. Norrell

I got even more excited when back home I turned the first page of Volume 1 of Jonathan Strange & Mr. Norrell: The signature of Susanna Clarke!
Susanna Clarke’s Signature
I have no idea if it’s real or pressed on. There’s no impression of the pen, but it is in blue ink. Nonetheless, a pleasant surprise.