May 18, 2003
Kernel Upgrade

Sambal was down for a few hours yesterday while I upgraded the kernel on this server.

You know that old joke: "Doctor, doctor, it hurts when I do this!" "Then don't do that!". Okay, imagine that the second line is said by a 35-year-old Czech guy in feigned exasperation. "DON'T DO DAT!" Got it? Good, because you're going to need it in the following story.

Rather than upgrading the old-fashioned way (download kernel source, unpack, configure, build, install), I decided to use my distribution's features, so I downloaded and installed the Trustix kernel RPMs. (DON'T DO DAT!)

Actually, I was so distributionally correct that I decided to follow the instructions in the Trustix kernel upgrade document. (DON'T DO DAT!) So I printed the page out for handy reference.

Okay, get the files, check. I did that last week. Instructions for installing the files are at the bottom of the first page:
# rpm -Fvh kernel*.rpm
That's easy. Turn the page and read:

Note that this will remove the old kernel from your system. To only install the new kernel and not touch the source or anything else...
So I just blew away my current kernel. (DON'T DO DAT!)

Create a new initial ram disk, yeah, no problem, we don't use no stinking SCSI. Update lilo. Make a boot disk (with the new kernel; it's too late to make one with the old kernel). Boot.

First we're going to fsck everything, because I used to have 270 days of uptime, and 270 days exceeds the check interval on my partitions. Wait half an hour. Then we start booting in earnest. All disks are up. One out of three ethernet interfaces comes up (the internal one). Both external interfaces are toast. This means that it takes forever to start the network services, as each one waits five minutes before deciding it can't find a name server, and times out. Guess I shouldn't have passed on my chance to do interactive startup... (DON'T DO DAT!)

So I finally get into a root shell and start poking around. That's funny -- the NE driver won't load. I try to force it. That shell hangs. I try from another shell. That shell hangs. Great: anything that touches the network is causing a hang.

I realize that I am trying to use the modules from the old kernel version because my /etc/modules.conf explicitly refers to the old kernel version. I change that. modprobe ne. What do you mean, module not found?

I remember that Trustix doesn't ship any ISA net drivers. (Why? Because nobody uses ISA anymore. Except for me. F you too, you supercilious Norwegians!) So I start rebuilding the kernel and modules using the config from my last kernel (thankfully not deleted).

The build is taking too long, so I decide to stop it and reconfigure with fewer modules. I'm throwing modules overboard like a Titanic officer chucking poor people into the North Atlantic. Ftape? Don't need that. WiFi support? If I ever go WiFi, I'll recompile. SCSI -- toss it. make dep; make. Wait -- I need SCSI to use my CD-Rewriter. Argh! Stop the make, reconfigure, make dep; make.

Somehow during this time I manage to lock up the remaining three consoles, so I have only one console left and it's running the compile.

I try logging in from my workstation, but no dice: my workstation can't see the server.

So I stop the compile and try pinging the workstation from the server. Pause. Nothing is happening. I think: a ping is a network packet. I just used the network. My last remaining login shell is hung. (DON'T DO DAT!)

Push the reset button. fsck runs again. I boot into single-user mode and finish the kernel compile, modules compile, and installation of everything. Reboot.

Still only one ethercard. Try preloading the ethercard driver in the initial ram disk. Still only one ethercard.

I fiddle around for another ten minutes and figure out that the card moved its IO address from 0x330 to 0x300 without telling me. Thanks!
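One way to keep an ISA card from wandering again is to pin its address explicitly in /etc/modules.conf, since the ne driver needs an io= parameter for ISA cards anyway. A minimal sketch, assuming the affected card uses the ne driver and shows up as eth1 (eth1 is a hypothetical interface name here; the post doesn't say which interface it was):

# /etc/modules.conf (sketch; eth1 is an assumed interface name)
# Map the interface to the NE2000 driver
alias eth1 ne
# Tell the driver where the card actually lives now
options ne io=0x300

If more than one NE2000-compatible card is installed, the ne driver accepts a comma-separated list of addresses in the same io= option.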

So I get all the ethernet cards up. But now all my IP addresses have changed, because my ISPs (Evil Cable Co. and Evil Telephone Co.) have decided that by being down for three hours, I've abandoned my DHCP lease. I beat their DHCP servers with the knobby club that is dhcpcd until they give me my preferred IPs back.

We're up.

And we still don't have working equal cost multipath routing. After I forget how painful this upgrade was, I'm going to 2.4.

Posted by Sam at 12:02 PM