Resurrecting Drezn Content

The original Drezn used a custom Berkeley DB/GnuDB backend which is described as much as it needs to be in the About text. Out of historical interest, or something, I’ve spent some time trying to extract that content and convert it so it can be republished. So this is about how that’s been going.

As can be inferred from that introduction, this is at best going to be a multipart article on how I did that. So far it’s not looking good.

Overview of the problem

The first major problem to overcome is just how long ago this site was up. This has a few different symptoms.

  • Twenty years ago is a long time to remember details about a hobby project that was in use for maybe a couple of years. (Hexdumping the database, I was surprised to see one of the last entries was a couple of years after I was sure I’d abandoned it.)

  • I haven’t used PHP for 17 years, at least. I can’t remember exactly what version I was using. After looking at the code a bit, and checking things like which PHP packages are available in modern distros, I remember it was either 4 or 5. I originally thought 4, but at some point it looks like I renamed all the files from .php4 to .php.

  • The database. To save $5 a month from the hosting service, I elected to use BDB/GDBM instead of SQLite, which I believe was the database option they had. This means at the very least I have to dump the database and figure out how I used the key-value pairs, but I think that’s straightforward, something like the keys being the field name concatenated with the article number. But it turns out to be wayyyyy worse than that.
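
To make that guessed layout concrete, here is a minimal Python sketch of how such keys could be regrouped into articles once the pairs are out of the file. The key pattern (a field name followed directly by the article number, like title12) is purely an assumption based on my recollection above, and group_by_article is a name made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical key layout: a field name followed directly by the article
# number, e.g. b"title12". The real Drezn keys may look nothing like this.
KEY_PATTERN = re.compile(rb"^([A-Za-z_]+)(\d+)$")


def group_by_article(pairs):
    """Group (key, value) byte pairs into {article_number: {field: value}}."""
    articles = defaultdict(dict)
    for key, value in pairs:
        match = KEY_PATTERN.match(key)
        if match is None:
            continue  # key does not follow the guessed layout; skip it
        field = match.group(1).decode("ascii")
        number = int(match.group(2))
        articles[number][field] = value
    return dict(articles)
```

Keys that don’t fit the pattern are silently skipped here; for real archaeology it would probably be better to collect them and look at them by hand.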

I don’t expect to do anything with the PHP, and in fact am hoping I don’t even need it to pull out the content. So long as I can get stuff out of the database files, I’ll just convert it to Markdown and go from there. There are some static-content pages I might do something with, but that won’t be PHP.

So, to unravel the database. Let’s have some fun with that.

(I’ll skip over where I have to search through various archives of old systems I have scattered over systems in my home data centre.)

First attempts: Python on Mac OS X

The first thing I do is copy the database files over to my regular-use workstation and see what I can learn about the file. I know I probably won’t go too far with it here unless I’m lucky—I don’t want to install a bunch of random and/or legacy software and then have to clean it up again. But let’s have a look.

$ file article.db
article.db: GNU dbm 1.x or ndbm database, little endian, old

For those of you not familiar with it, file is a utility on Unixish systems that tries to tell you what program is used for a file or what format it has. For example, for the main PHP script in my old blog software, file would report blog.php: PHP script text, ASCII text, with very long lines.

Attempts to use Python’s DBM module were not fruitful. I would get things of this ilk:

>>> import dbm
>>> db = dbm.open('article.db')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: Malformed database file header

Well that’s not a good look.

I used some hexdumping shiznitz to see if I could tell whether the file was corrupted or something, and to see if the format was obvious enough that I could reverse-engineer it. (I still might come back to semi-manually pulling the data out, if there’s documentation on it or I can at least figure out length-of-record pieces or whatnot.) I don’t see any obvious corruption and the file seems to be intact enough that file can read its magic markers.

(I have to admit at this point the file does contain, for unknown reasons, part of one of the PHP classes that managed the data, but I know this was in there early back when and probably was due to an early programming mistake.)
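
The magic-marker check that file does can be roughly sketched in Python. The magic numbers below are the ones I understand GDBM and Berkeley DB to use, going by the GDBM sources and the magic database that file ships with; treat them as assumptions and verify them before relying on this. The identify function is just an illustrative name.

```python
import struct

# Magic numbers as I understand them from the GDBM sources and the magic
# database used by file; assumptions to be verified, not gospel.
GDBM_MAGICS = {
    0x13579ACE: "GDBM, old format",
    0x13579ACD: "GDBM, 32-bit offsets",
    0x13579ACF: "GDBM, 64-bit offsets",
}
BDB_MAGICS = {
    0x00061561: "Berkeley DB hash",
    0x00053162: "Berkeley DB btree",
}


def identify(header):
    """Guess a DBM flavour from the first 16 bytes of a file."""
    if len(header) < 16:
        return "too short to tell"
    for order, name in (("<", "little endian"), (">", "big endian")):
        (at_0,) = struct.unpack_from(order + "I", header, 0)
        (at_12,) = struct.unpack_from(order + "I", header, 12)
        if at_0 in GDBM_MAGICS:
            return f"{GDBM_MAGICS[at_0]}, {name}"
        # Berkeley DB 1.85 keeps its magic at offset 0, later versions at 12.
        if at_0 in BDB_MAGICS:
            return f"{BDB_MAGICS[at_0]} (1.85 layout), {name}"
        if at_12 in BDB_MAGICS:
            return f"{BDB_MAGICS[at_12]}, {name}"
    return "unknown"
```

To use it, read the first 16 bytes of the database in binary mode and pass them in, for example identify(open('article.db', 'rb').read(16)). This only looks at the header, so it says nothing about whether the rest of the file is readable.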

Python’s DBM supports multiple DBM formats, but I don’t spend too much time on this. All signs point to Gnu:

>>> dbm.whichdb('article.db')
'dbm.gnu'

Of course this could be how it looks on a system without Berkeley DB installed. But like I said, I don’t want to install anything for this on my current workstation.
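
For anyone wanting to rule the other formats in or out more systematically, here is a small sketch that tries each of the standard library’s dbm backends in turn and records what happened. Which backends exist depends on how Python was built, so the sketch treats a missing module as just another result. The try_backends name is made up for illustration.

```python
import importlib

# The backends shipped with Python's dbm package; some may be missing
# depending on how the interpreter was built.
BACKENDS = ("dbm.gnu", "dbm.ndbm", "dbm.dumb")


def try_backends(path):
    """Try to open path read-only with each backend; report what happened."""
    results = {}
    for name in BACKENDS:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = "backend not available in this Python build"
            continue
        try:
            db = module.open(path, "r")
        except Exception as exc:  # each backend raises its own error type
            results[name] = f"failed: {exc}"
        else:
            results[name] = f"opened, {len(db.keys())} keys"
            db.close()
    return results
```

Note that none of these backends is Berkeley DB, so a clean sweep of failures here doesn’t say anything about that possibility.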

Spinning up an old distro

The version clue in the output from file as well as some stuff I found on the Internet, along with earlier suspicions given the age, led me to wonder if the format was too old to be read by more modern implementations of Gnu DBM. So maybe if I spun up a system running an OS from that era I would have better luck.

Finding one was no problem. Ubuntu and CentOS both keep archives of old distributions. Back when I started Drezn I was using Slackware—had a hat at one point, wish I still had it—but I stopped using the more bespoke distros around ten years ago and have been using Ubuntu for personal stuff for the past four, so I chose Warty Warthog. This is the oldest distribution they have available. I haven’t looked it up but given this is Ubuntu version 4 this sounds a couple of years more recent than what would have been running at the ISP in 2001 or so, when I started this blog.

I downloaded a 64-bit ISO and created a new VM in VirtualBox. Booting it up, I soon ran into my first problem in running an old distro.

Problem the first

The ISO booted up and everything was running smoothly until it was time to partition drives. There weren’t any drives available. That seemed odd—but I soon realized it was looking for IDE drives, which was the default back then. Rather than search around for SATA support, if that was a thing then, I just reconfigured the VM to mount its main drive as an IDE device.

This got me through the main install. I created a user and so on, and it was a less involved process than usual, probably because the install isn’t as sophisticated, there are fewer packages, I skipped stuff I usually don’t skip—and a lot of security stuff, like firewalling, was not in the default install back then.

Problem the first-and-a-half

Once the system was up and I was logged in, I did what I usually do first: update the system. This failed immediately. The mirrors no longer support an 18-year-old release, for some reason. I set this aside, however: if I didn’t need anything outside of what was on the ISO, I didn’t need to spend time on this, given the VM was going to be removed once it had fulfilled its purpose.

The next thing I do with a VM on my own machine is set it up so I can use SSH. That way I can not only use my preferred terminal program, but it’s also necessary for transferring stuff over. I had to install the SSH server off the ISO (because it wasn’t installed by default back then, and in case you’re wondering, neither were telnetd or ftpd, which frankly surprised me!) and started it up. And then I ran into an unexpected difficulty.

Problem the definitely second

I created the necessary configuration in ~/.ssh/config and tried to connect. I got this:

$ ssh warty
Unable to negotiate with port 2232: no matching key exchange method found. Their offer: diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1

Welp, that’s a new one, but it makes sense. Key exchange algorithms, like ciphers and, erm, message authentication code (MAC) algorithms, evolve as new and better ones are developed and old ones are demonstrated to have security issues or are just no longer needed. The only trouble I had here was figuring out how to make the older SSHd on the Warty VM tell me what KEXes were available; I assumed the newer SSH on my workstation no longer recognized the offered methods due to security issues. Turns out this wasn’t the case, though: the newer client still knows about them. Security issues are probably why they’re not accepted by default, but all I had to do was add the following to .ssh/config for that host, and I could get in:

KexAlgorithms +diffie-hellman-group1-sha1
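
For context, the whole host entry might look something like the fragment below. The warty alias and port 2232 come from the session above; the HostName is an assumption on my part here (a VirtualBox NAT port forward to localhost), so substitute whatever applies to your setup.

```
# Sketch of a ~/.ssh/config entry for the old VM.
Host warty
    # Assumption: the VM is reached through a port forward on localhost.
    HostName 127.0.0.1
    Port 2232
    # The leading + appends to the client's default list instead of replacing it.
    KexAlgorithms +diffie-hellman-group1-sha1
```

Scoping the option to one Host block means the weaker key exchange is only offered to this VM and not to everything else I connect to.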

With that done, I was able to upload the old site’s tarball, and see about dumping out these goofy database files.

I was somewhat reassured there might be hope by the different results I got for file on this old box:

$ file articleDB
articleDB: GNU dbm 1.x or ndbm database, little endian

Spot the difference? It doesn’t snootily add old to the end. But I was disheartened a little to see the 1.x. If this was an old-enough distro, it wouldn’t recognize there being a possibility of a 2.x version database.

Problems the third through fifth or somethingth

I tried a bunch of different things. Nothing worked.

  • A Perl script that dumps out key/value pairs from a DBM file complained it couldn’t open the file, same as elsewhere.

  • Couldn’t read it with Python either.

  • I built a suitably old version of PHP and couldn’t get anything out of it. It would fail with Driver initialization failed for handler: gdbm: File seek error.
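
For reference, the kind of dump all of these attempts were aiming for is only a few lines, here sketched in Python rather than Perl. It assumes Python’s dbm module can identify and open the file at all, which is exactly what kept failing for me. The dump_pairs name is made up for illustration.

```python
import dbm


def dump_pairs(path):
    """Return every (key, value) pair from a DBM file that Python can open."""
    # dbm.open() picks a backend by sniffing the file, and raises one of the
    # dbm.error exceptions if it cannot work out or read the format.
    with dbm.open(path, "r") as db:
        return [(key, db[key]) for key in db.keys()]
```

Keys and values come back as bytes, so printing them means picking an encoding; latin-1 is a safe choice for a first look since it never fails to decode.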

It was in building PHP that I found that Ubuntu actually still has APT repositories available. They are at old-releases.ubuntu.com. This is great, because we talk about open source being a way to keep old data in old formats available, but without access to the old software, we don’t realize the benefit.

Anyway, basically, everything I tried led to the same thing: the driver was unable to read the files.

I downloaded an older version of GDBM and tried to use that. Nope. It occurred to me at some point that I was running a 64-bit version of Warty at a time when 32-bit was still the norm and definitely what I was using and probably what the ISP used, so I built a second VM using 32-bit Ubuntu. No change except at some point in rebuilding PHP against a different version of GDBM, I started getting a different error message:

Standard Message: DB Error: extension not found

This was what I was trying to figure out when I crossed the more-yawns-than-fresh-ideas threshold, and stopped for the night.

One thing that’s tickling at the back of my head is that, although the drivers all say GDBM, I was pretty sure the ISP used Berkeley DB, which was a popular semi-commercial variant and at the time better respected and more widely used than Gnu DB, I recall. The fact that file and Python are detecting GDBM might simply be their best guesses based on the system configuration and what’s installed: programs can install their own helpers for file to recognize their formats. So the next step is to install and run Berkeley DB, which is still available even though Oracle bought it.

Final thoughts for this time around

One thing I’ve noticed as I’ve worked through this so far is that some of those old posts are pretty goofy. They still have value for me, though, and remind me of details about my life then that I have forgotten. There are old photographs as well, like of a motorcycle I spent some time on before we started a family.

It’s fun though, and it’s a puzzle, so that ought to be enough, right? Not everything I do has to have some grand purpose.