My spectacular screw-up and how to gain chroot access to a software RAID system from a LiveCD

2006-11-24 02:11:28 PST


Wow, did I ever mess up.

You may have noticed mindstab.net was missing for the last 7 hours. The cause was a chain of events and not enough care.

I’m not quite sure when it started, but sometime in the last few months I upgraded Kvasir’s baselayout from 1.9 to 1.11. Then I forgot about it.

Yesterday I noticed that Wildfire, a Java Jabber server, had been un-hardmasked. I thought I’d give it a try. How could a Java Jabber server take mindstab.net down, you ask? Wait. So I tried to install it, but unfortunately the install failed: version 3.1.0 installed, but not 3.1.1. I filed a bug. Today I got some advice suggesting it might have something to do with my old hardened kernel, 2.6.11. The suggestion was to upgrade to the recent stable 2.6.17, because some /proc access issues had been resolved and that might be related to my problem. So I installed the new kernel during the day but decided to wait until I got home before rebooting, so that if there was a problem I could at least reboot with the old kernel.

So I got home with a few minutes before I had to go to work, and I rebooted. The new kernel worked fine; however, as soon as init took over, everything went straight to hell. Init complained udev wasn’t installed (remember, new baselayout), and I’m also pretty sure that the newer kernels don’t provide the static /dev filesystem anymore. So now init couldn’t find any hard drives. Crash. The end. Time to go to work. The old kernel wouldn’t make a difference because it was init’s problem.

Ouch. Stupid on my part. Very stupid.

So I went to work, came home, and prepared to address the problem. I opened up Kvasir. Kvasir is the smallest 1U rackmount server not much money could buy. The case was designed to hold one hard drive and a CD drive/floppy drive combo. I had removed the CD drive/floppy drive combo and put a second hard drive in so I could have a RAID 1 setup (mirrored hard drives). Great, but now I had to unhook one of the hard drives and put in a CD drive, in a cramped space. Got it done though. However, I then had some trouble with my LiveCDs. My full 2006.1 didn’t boot for some reason. 2005.1 booted, but “humorously” when I gave it the -noX option, it still scrambled the root password and so wouldn’t let me log on. “Awesome”. I had another 2005 LiveCD, but its RAID support seemed sketchy.

I finally just downloaded the 2006.1 minimal ISO and burned it on Nika. That worked and booted, but there were no md* entries in /dev. I did some googling and found three pages that, combined, provided what I needed. First I had to make sure the modules I wanted were loaded.

modprobe dm-mod
modprobe raid1

Then to get the lay of the land and remember my setup I ran

mdadm -E --scan

I then remembered the basic layout of my RAID setup (mdN = hda(N+1) + hdc(N+1)). Next I had to manually create the /dev/md* devices, and in my case I had to force their creation because one of the two mirrored hard drives was “missing” (unplugged).

mdadm -A /dev/md0 /dev/hda1 --run
mdadm -A /dev/md2 /dev/hda3 --run
mdadm -A /dev/md4 /dev/hda5 --run
mdadm -A /dev/md5 /dev/hda6 --run

The --run option forces the mdadm command to ‘A’ssemble the array even with missing hard drives. This created the /dev/md* devices I needed to safely access my hard drive without screwing up the larger RAID setup.
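The four commands above follow one pattern, so they could be generated rather than typed. This is only a sketch: the function name assemble_cmd is made up for illustration, it assumes the mdN = hda(N+1) layout described above, and it prints the commands instead of running them so they can be checked first.

```shell
#!/bin/sh
# Print (do not run) the degraded-assembly command for one array number.
# Assumed mapping, taken from the layout above: mdN lives on hda(N+1).
assemble_cmd() {
    n=$1
    echo "mdadm -A /dev/md$n /dev/hda$((n + 1)) --run"
}

# The array numbers used on this box.
for n in 0 2 4 5; do
    assemble_cmd "$n"
done
```

Once the printed lines look right they can be pasted into the LiveCD shell, or the output can be piped to sh.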

Now that I finally had access to my hard drive, it wasn’t so hard to fix the problem. I chrooted into it, installed udev, and then rebooted with fingers crossed. And it worked! Still, there were a few other boot-up errors: the new kernel was giving iptables some guff, Courier IMAP wasn’t starting, and the conf.d/net syntax was wrong. But there were more pressing matters first.
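The chroot itself is not shown above. A minimal sketch of the usual Gentoo LiveCD sequence follows, under assumptions: which md device holds the root filesystem is not stated in this post, so ROOT_MD=/dev/md2 is a guess, and the mount point /mnt/gentoo and the helper names run and enter_chroot are illustrative. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only. ROOT_MD and MNT are assumptions, not taken from the post.
ROOT_MD=${ROOT_MD:-/dev/md2}
MNT=${MNT:-/mnt/gentoo}
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

enter_chroot() {
    run mkdir -p "$MNT"
    run mount "$ROOT_MD" "$MNT"                # root filesystem on the assembled array
    run mount -t proc none "$MNT/proc"         # /proc inside the chroot
    run mount -o bind /dev "$MNT/dev"          # reuse the LiveCD's /dev
    run cp -L /etc/resolv.conf "$MNT/etc/resolv.conf"  # DNS for fetching packages
    run chroot "$MNT" /bin/bash
}

enter_chroot
```

Run as root with DRY_RUN=0 to execute the steps for real. Inside the chroot, the fix in this case was simply to emerge udev.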

I powered down, unhooked the CD-ROM drive, plugged the second hard drive back in, and powered up again. Boot went as well as before, but a

cat /proc/mdstat

revealed that the second hard drive wasn’t being used yet (probably because it was out of sync and the system wasn’t sure what I wanted it to do with it).

Personalities : [raid1]
md2 : active raid1 hda3[0]
   10008384 blocks [2/1] [U_]

md4 : active raid1 hda5[0]
   20000032 blocks [2/1] [U_]
...

So, to get them re-enabled as mirrors, it turned out all I had to do was re-add them and the system would slowly re-mirror them.

mdadm /dev/md2 -a /dev/hdc3
mdadm /dev/md4 -a /dev/hdc5
...
cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 hdc3[1] hda3[0]
   10008384 blocks [2/2] [UU]

md4 : active raid1 hdc5[1] hda5[0]
   20000032 blocks [2/1] [U_]
   [====>....................] recovery = 32.2% (6200000/20000032) finish=21.3min speed=10588K/sec

md5 : active raid1 hdc6[1] hda6[0]
   80000052 blocks [2/1] [U_]
      resync=DELAYED
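To see at a glance which arrays are still running on one disk, the [U_] markers can be picked out of the mdstat text. The following is a rough sketch, not a robust parser: the function name degraded_arrays is made up, and it assumes the two-line-per-array layout shown in the output above.

```shell
#!/bin/sh
# Read /proc/mdstat-style text on stdin and print the name of every array
# whose status field (e.g. [U_]) contains an underscore, meaning a missing member.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                     # remember the current array
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }    # status like [U_] or [_U]
    '
}

# Demo on the degraded state shown earlier; prints md2 and md4.
degraded_arrays <<'EOF'
Personalities : [raid1]
md2 : active raid1 hda3[0]
   10008384 blocks [2/1] [U_]

md4 : active raid1 hda5[0]
   20000032 blocks [2/1] [U_]
EOF
```

On a live system the input would come from the kernel instead: degraded_arrays < /proc/mdstat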

That started, I moved on to sorting out my other problems by upgrading a few other packages and recompiling my kernel with some new netlink stuff for iptables.

And so mindstab.net is now returning to full capacity, but what a nightmare that was. I’m sorry for any inconvenience this caused anyone. I will start enforcing a stricter personal policy about when I can do low-level server work that requires reboots. And now I terribly need sleep.


Crude benchmarks of a G3 compared to some x86 boxes

2006-11-21 20:51:08 PST


Well, now that Bast was up and running I had to start testing it. I mean that’s why I bought a Mac. So I tossed on some of my old code, specifically, VM-Proto, Fast-Lang, and Primes.

VM-proto was a tiny virtual machine and compiler I wrote several years ago to learn about basic compilation to byte code, and execution of said byte code. I also then made it very portable in order to learn about low level portability issues like endian safe code. I was pleased when vm-proto compiled, was able to compile source (a prime number finder) to object code and execute that object code. Better yet, object code vm-proto compiled on Bast (PPC) was executable by vm-proto on Inferno (x86), and object code compiled by vm-proto on Inferno, was executable on Bast. A complete success.

Next up was fast-lang, a simpler interpreter that I wrote this summer just to see how fast I could make a simple language. It compiled and was able to execute a simple stack test program I wrote for it, but failed on the more complex prime number finder with a segfault. Clearly not so portable, although that was never a goal with it; I just wanted it to be fast on what I had, which was x86.

Finally came my Primes suite, which is a collection of prime number finders I wrote in about 37 languages and a simple perl program to run them all and store benchmarks on how fast they ran. It was a fun way to get introduced to a lot of languages and also fun to race them against each other.

I’ve often used Primes as a crude benchmark of languages’ computational power and now I was going to use it to see how well languages worked on a Mac as compared to x86.

I ran a subset of tests in C, Objective-C, C++, Java, C# (Mono), Awk, Perl, PHP, Python, and Common Lisp (SBCL). I ran the tests on four of my computers for comparison: Inferno, a 1.5GHz Athlon-XP; Kvasir, a 2.8GHz Pentium 4; Nika, a 1.5GHz Pentium M; and of course Bast, my new 350MHz PPC G3. The results are below:

        Inferno  Nika   Kvasir  Bast
C       1.19     0.79   0.45    2.83
ObjC    1.19     0.8    -       2.83
C++     1.93     1.06   1.1     4.76
Java    3.59     1.63   2.14    40.3
C#      3.69     1.87   -       10.5
Awk     32.1     27.1   30      199
Perl    38.2     21     23.3    145
PHP     15.1     8.89   13.4    64.9
Python  54       38     43.8    211
Lisp    10.4     5.19   -       36.3
Primes Test Results (lower is faster; "-" means the test was not run on that machine)

There were some surprising results in there, but let’s start with Bast, the G3. C, Objective-C, and C++ were about par for the course, but then look at the spike in time for Java. For the record, Nika and Inferno are both using Sun’s JDK 1.5, and Bast is using IBM’s JDK 1.5. There really shouldn’t be that much of a difference. Clearly Java is a lot less optimized for the PPC. (Hopefully now that it’s GPLed that can get some attention?) Next up was C# care of Mono, which looks good comparatively. Then a huge spike with Awk, which looks like it could be missing some PPC love. It’s a bit hard to tell with Perl, but it looks reasonable when compared with the PHP and Python results. Comparatively speaking, PHP is looking pretty good on the PPC. Python is slowest, but it’s slowest on x86 too, so no surprise. And finally Common Lisp care of SBCL, looking good, and beating Java on the PPC. Impressive. So what have we learned? Java and Awk need some PPC love. (I didn’t bother to include Ruby results because, until 1.9 becomes the stable 2.0, Ruby doesn’t officially have a VM implementation and is incomparably slow. Ruby 1.9/2.0 with YARV is competitive, but we’ll leave that for another day.)
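One rough way to read the Bast column is as a slowdown factor relative to Inferno, set against the clock-speed gap (1.5GHz vs 350MHz, roughly 4.3x). The snippet below is just that arithmetic on a few rows of the table; the function name slowdown is made up for illustration, and comparing against Inferno rather than Nika is an arbitrary choice.

```shell
#!/bin/sh
# Ratio of two timings from the table, rounded to one decimal place.
# $1 = time on Bast, $2 = time on Inferno.
slowdown() {
    LC_ALL=C awk -v bast="$1" -v inferno="$2" \
        'BEGIN { printf "%.1f\n", bast / inferno }'
}

printf 'clock  %sx\n' "$(slowdown 1500 350)"   # MHz of Inferno over MHz of Bast
printf 'C      %sx\n' "$(slowdown 2.83 1.19)"
printf 'C#     %sx\n' "$(slowdown 10.5 3.69)"
printf 'Awk    %sx\n' "$(slowdown 199 32.1)"
printf 'Java   %sx\n' "$(slowdown 40.3 3.59)"
```

On these numbers C comes out around 2.4x slower on Bast, well inside the clock-speed gap, while Java is around 11.2x and Awk around 6.2x, which is where the "needs some PPC love" impression comes from.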

Now to my surprise, aside from C, Kvasir lost to Nika on all tests, even though Nika has about half the clock speed. Nika also beat out Inferno, and they should be much more comparable. Also, on Inferno you can see that Perl actually performs worse than Awk. So I’ll try to guess at an explanation. Inferno and Kvasir both use GCC 3.4 and compiled all the languages with GCC 3.4, while Nika uses GCC 4.1. Inferno and Nika both use -O3 optimization while Kvasir uses -O2, and Kvasir is using the hardened tool chain with a hardened kernel. So GCC 4.1 seems to provide a solid edge over 3.4 (as seen from Nika beating Inferno), and GCC 3.4 + the hardened tool chain + a hardened kernel seems to give a massive performance hit compared to vanilla GCC 4.1, as seen from Nika beating the nearly twice as powerful Kvasir in every language but C. I’m a bit shocked to find that my laptop, which I would have guessed would be the clear loser at number crunching in interpreted languages, won out. Neat to know.

So now I’ve learned a few things.

You can grab Primes from my git server

git clone git://git.mindstab.net/git/primes

But be warned, the autoTest.pl code is old and beyond hideous. But it kind of gets the job done.

(And please no complaints that these tests aren’t fair. I’m not claiming they are anything remotely definitive. They are what they are: prime number finders in a bunch of languages)

Edit: I forgot to mention two things:

1. Getting Java for PPC is still a pain. Sun-SDK in portage isn’t available for PPC. All that is available is the IBM version, which is still fetch restricted. You have to sign up with IBM to get it, which involves telling them all about your business, whether you have one or not, and all about yourself, and full addresses for both, plus a phone number and email address. Extremely lame, and I hope the new GPL Java fixes this stat.

2. I didn’t include Fortran results because GCC 4.1 seems to have a new and improved Fortran compiler, my terrible old code doesn’t compile anymore, and I wasn’t in the mood to go brush up on Fortran to figure out why. Would have been neat though.

Howto setup and populate a git server on Gentoo

2006-11-20 13:09:19 PST


WARNING: This howto is now old and deprecated! Check the newer version linked to below for a current working howto:

http://www.mindstab.net/setting-up-a-remote-git-repository-with-just-git/


So I went back and got my git server working 100%. Reread some docs and now everything is working quite well. It’s pretty nice.

To get a git server set up, emerge dev-util/git and cogito. Cogito is a “front end” of sorts for git. It supplies some better commands in some cases. Then run

/etc/init.d/git-daemon start

and you now have a git server. Yes, it’s that easy.

Of course you don’t have anything in it and you aren’t quite set up to, but it’s fine to have the server running with a somewhat empty system. Next create a directory to store your git repositories, like /git. Now all you have to do is create repositories, which is made very easy by the command

cg-admin-setuprepo -g git /git/repoName

‘git’ being a user group that committers on the system will be members of so that they can all have write access. Now of course you have an empty repository. Also, you have a non-shared repository. Other system users (‘committers’) can access it, but it’s not exported by the git server. To do that, simply

touch /git/repoName/git-daemon-export-ok

To populate the repository, change into a directory where you have the code you want to go into the repository. This can be on any computer, not necessarily the server. We will set up more git stuff ‘client’ side and then ‘commit’ it to the git server. So wherever the code is, you need ssh access from that location to an account on the git server that is a member of the ‘git’ group and can write to the repository. Now, once in the code directory, set it up as a git repository with

cg-init

If all went well you now have a local git repository of the code. It might give you an error involving AUTHOR_NAME and COMMITTER_NAME. If this happens it is because the name field for your user account is blank. Edit /etc/passwd and add a name (it can be the same as the account name) to the 5th field of the account you are using. Now it should work (rm -r .git to restart). Now to connect the local repository to the server and update the server. First we add a branch to the local repository that is connected to the server.

cg-branch-add origin git+ssh://serverName.com/git/repoName

Now they are connected. ‘origin’ will be the default branch (I believe it’s a special branch, whose purpose should be obvious). Now just update the server.

cg-push

Done. Now you have a git server with a repository of your code. You can update the repository with further changes by

cg-commit
cg-push

Other non-system/non-commit users can access the repository with

git clone git://serverName.com/git/repoName

or

cg-clone git://serverName.com/git/repoName

and once in it they can update it from the server with

cg-pull

And there you have it, how to set up a git server on Gentoo and populate it.
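Since cogito is what makes this howto dated, here is a rough sketch of the same flow using only plain git commands. It is an approximation under assumptions, not a drop-in replacement for the steps above: everything runs in a scratch directory, a local path stands in for the git+ssh:// URL, and the names repoName.git, code and clone are placeholders. The commit passes a name and email explicitly, which is one way around the blank-name error mentioned earlier.

```shell
#!/bin/sh
# Sketch: server-side repo, client-side import, push, and clone, all local.
set -e
work=$(mktemp -d)

# "Server" side: a bare, group-writable repository, marked for git-daemon export.
git init --quiet --bare --shared=group "$work/repoName.git"
touch "$work/repoName.git/git-daemon-export-ok"

# "Client" side: turn a code directory into a repository and commit it.
mkdir "$work/code"
cd "$work/code"
echo 'hello' > README
git init --quiet
git add README
git -c user.name='Example User' -c user.email='user@example.com' \
    commit --quiet -m 'Initial import'

# Connect the local repository to the "server" and push the current branch.
git remote add origin "$work/repoName.git"
git push --quiet origin HEAD

# Anyone with read access can now clone it.
git clone --quiet "$work/repoName.git" "$work/clone"
cat "$work/clone/README"
```

On a real setup the remote would be an ssh URL to the server for committers and a git:// URL for read-only users, as in the commands above.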


Installing Gentoo on a Mac G3

2006-11-19 23:47:43 PST


Well, I now have Bast, my new Mac G3, booting under its own power. It took a bit more work than normal because I ran into a few problems. But it’s all up and running now, so let’s go over the install.

Bast came with MacOS 9 installed. I booted into it, and it was pretty worthless. Admittedly, it did auto detect the net and I got to look at mindstab.net in Mac IE 5. That was about it. None the less, I had intended to try and save the MacOS partition since it was only using a few hundred megs out of a 20GB harddrive.

So I put the Gentoo Minimal PPC 2006.1 disk in and rebooted. Unlike x86, where you can jump into the BIOS and change the boot order, on a Mac you just hold down the C key for a while when it boots until you get to the CD’s boot prompt. You can probably just press it once, but it’s hard to tell when, since there is no visual feedback, just a black screen, then a grey screen.

My first problems popped up here, or technically a little earlier. Even back in the 90s when this box was made, Apple had apparently moved to all-USB peripherals and, cockily enough, hadn’t even bothered to include PS/2 ports on their towers. Kind of lame, since I have a 4-port PS/2 KVM switch which this box was supposed to use. Thankfully I had picked up a cheap old Apple USB keyboard when I bought the box. Unfortunately, it had the shortest keyboard cable I’ve ever seen. I quickly borrowed my USB hub from another box and got a reasonable length from the two combined. So all is good? Not quite. The Mac boot loader refused to acknowledge the USB keyboard through the USB hub, even with the hub plugged into external power. So every time I rebooted the Mac I had to go stand over in the corner by the box, plug the keyboard in directly, hit C to boot from the CD-ROM and then ENTER to boot it, and then unplug the keyboard and plug it into the hub before I could sit down. Pretty lame.

Anyways, the LiveCD booted up and after the boot loaders, no one had any problem with the Mac keyboard through the hub. Except of course the keyboard was glitchy and old and kept disconnecting/reconnecting so I lost keys when I typed. Had to watch what I typed very carefully. So I immediately got sshd up and running and that was that for the keyboard.

Next I brought up parted and took a look at the partition table. I tried to resize and shrink MacOS 9’s HFS+ file system and partition, but parted failed on me every time. Everything on the net assured me it would be doable, but it seemed to be wrong.

From there things went more smoothly. I followed the Gentoo PPC install guide and everything worked pretty well. At times, though, I did feel like I was on a second-class architecture, because they’d give an x86-dependent example and then, below it, as though slapped on as an afterthought, some PPC instructions. Of course, if the instructions didn’t really differ, sometimes they only showed the x86 version.

I was pretty sure I had the kernel configured properly, but it would have paid off to read dmesg a little more closely. More about that shortly. The only recommended/stable bootloader for PPC is yaboot. Yaboot doesn’t really seem as advanced as grub, but it’s not bad. The docs probably could have elaborated on some points a little more, though, like what is OpenBoot, what are all the options related to it, and do I need to configure them? (Apparently not.)

Anyways, I thought I had everything and rebooted. Apparently not. I got a kernel panic because it couldn’t find the hard drive. It took me several days to figure that one out. First of all, the box seems to have two IDE controllers, one for the Zip drive and CD-ROM, and another for the hard drive. I had to look closer at dmesg to find the second, a CMD646. The other problem didn’t become apparent until more reading.

On the LiveCD the Apple IDE controller driver is compiled in, so those drives get hda and hdb. The CMD646 driver is a module and gets loaded later, so on the LiveCD the hard drive is hdc. However, once I compiled the CMD646 driver into my kernel it still wasn’t working, because now the kernel was detecting the hard drive first and calling it hda, and oddly the CD-ROM and Zip drive became hde and hdf. But I couldn’t see that unless I watched the system boot, because once the kernel panicked I couldn’t scroll back up.

This was a frustrating bug to diagnose and fix, but I finally got it and now the box booted just fine. I was very close but had one more thing to look into. Both on the LiveCD and the new system, /proc/cpuinfo was reporting 50 bogomips, for what should have been more like 700. This bothered me because I wasn’t sure if I’d misconfigured something and was missing out on a lot of performance. Finally though, again after a lot of searching, I found out that in recent kernels the timing/scheduling mechanism has been altered to be more sane on PPC, with the result that bogomips have become useless as a performance measurement. The Internet assures me I’m getting my full performance.

And so I now have Gentoo on a PPC computer, my alternate arch test box. Also, I’ve since picked up a Dynex PS/2 to USB converter and have the G3 hooked up to my KVM switch as it should be.

Well I bought a Mac…

2006-11-15 09:00:38 PST


I’ve been meaning to pick up a PPC powered computer for a bit, because I’m a nutter like that. Anyway, I finally found a reasonable deal at TheMacMarket.com and picked up a 350MHz G3 with a 20GB hard drive and 256MB ram for $139. This box is now officially my alternate arch test box (big endian yeah!). I’ve wiped Mac OS 9 off of it and am installing Gentoo Linux. Details about the install later when it’s done.

One question though, a G3 350MHz should be rating about 700ish bogomips, but I’m getting 50. Anyone know what might be up with that? Could it just be the LiveCD?

The mess that is Gentoo/Jabber

2006-11-13 23:10:31 PST

Tags: , ,

You know, it’s really pissy. Jabber is one of those big important technologies to the open source movement. So it should be easy for me to get a Jabber server up. I’ve had ‘set up a jabber server’ on my todo list for ages but haven’t gotten around to it because of its general forbiddingness. So today I gave it a stab… and failed.

There are several servers to choose from. Wildfire appears to be the most feature complete, but it runs on Java, is dual licenced (GPL/commercial), and is hardmasked in Gentoo. In close second on features is ejabberd, which is stable. Finally there is jabberd. Jabberd 1 gets a failing grade on features but is at least stable. Jabberd 2 is still far behind ejabberd and Wildfire; it at least passes on features, but it is unstable. So I picked ejabberd, the only stable and non-obsolete choice available for Gentoo.

Well, Gentoo.org doesn’t seem to have any docs on getting a Jabber server set up. So I moved on to the gentoo-wiki. It has a page on ejabberd, but it’s light to the point of being useless. It has a few pointers but not what you need to actually get, you know, a server up and running. Awesome. So I hit up the ejabberd site. They have little snippets all around their site that might one day add up to what you need, and a nice link to a super page that many of the little pages also reference. Sadly, the super page reads more like an API reference manual than a tutorial for actually setting up a server. I suspect I could read the whole thing and know how to configure many features but not actually assemble a working server.

And let’s not forget logs that read like stack traces, because that sure explains well what small or large configuration error I have made.

This really doesn’t make me feel like running my own Jabber server. Makes me feel like sticking with MSN which, you know, works. Or at best, signing up for a Google Talk account. This is lame. We can do better. Jabber is supposed to be important, so let’s show some love for it and make it easy to roll your own Gentoo/Jabber server 1-2-3!

Update: Filed bugs 295 (human readable errors from ejabberdctl and no core dump dropping) and 296 (log files appear as a cross between stack trace and config dump). Also filed a Gentoo bug about ejabberd 1.1.2 not installing the guide that its einfo mentions.
