User Tolerant Liveware: October 2010

2010/10/25

More Java

JBoss looks to be amazingly amazing. It's nearly exactly what I want in a Web Framework. In fact, it's close to what I thought JAAS would become, but never did.

It is in Java, though. And I think I'd rather stick my hand down my throat and rip out my lungs than live with Java, day in day out.

But maybe I could spend a lot of time finishing JAAS and turning it into what I wanted it to be in the first place. Or, if the big contract dream becomes the Big Signed Contract, I could hire some PFYs to do it while I cracked a whip. Yeah, that would be awesome.

TODO on JAAS:

Documentation;
Work Log4perl in;
Reflex layer;
Much better object lifetime;
Much faster/better session lifetime;
Speed! (See previous 2 items);
The Widget to HTML rendering stuff needs to be simplified and dekludged.

That, as a rough beginning...

2010/10/21

IPv6 right now.

Reading Slashdot I saw an article about the coming IPv4 Apocalypse. So I figured I should spend some time getting up to speed. And when I say "getting up to speed" I think I really mean "hack at it a little."

First I need some routable addresses. My ISP at home is Bell, which is probably going to be the last ISP in the universe to hand out IPv6 addresses. So I need a 6in4 tunnel. I found a handy guide for Linode which basically said "get a free tunnel from HE." Which I did.

The following was added to /etc/sysconfig/network on the VM I wanted to be my IPv6 router:

IPV6_DEFAULTDEV=sit1

IPV6FORWARDING=yes

IPV6_ROUTER=yes

Next I added the following to /etc/sysconfig/network-scripts/ifcfg-sit1:

DEVICE=sit1

BOOTPROTO=none

ONBOOT=yes

IPV6INIT=yes

IPV6TUNNELIPV4=IPv4

IPV6ADDR=A1:B1:C1::2

Where IPv4 is the IP Bell gave me and A1:B1:C1::2 the client IPv6 address from HE.

ifup sit1

ping6 -n ipv6.chat.freenode.net

yay! It works.

Next, I want other computers to be able to access this tunnel. This took some messing around, but by blindly stabbing at it I got it going.

Then I asked for a /48, and added this to /etc/sysconfig/network-scripts/ifcfg-eth0

IPV6INIT="yes"

IPV6ADDR=A2:B2:C2::2

Where A2:B2:C2::2 is part of my Routed /48 from HE.

Then service network restart.

Next, on my desktop computer, I added the following to /etc/sysconfig/network

NETWORKING_IPV6=yes

Then to /etc/sysconfig/network-scripts/ifcfg-eth0

IPV6INIT=yes

IPV6ADDR=A2:B2:C2::6

Then:

ifdown eth0 ; ifup eth0

ping6 A2:B2:C2::2

YAY! That works.

But... what about routing and forwarding and so on? This is where I stabbed around blindly. The solution was to use radvd.

/etc/radvd.conf:

interface eth0

{

    AdvSendAdvert on;

    MinRtrAdvInterval 30;

    MaxRtrAdvInterval 100;

    prefix A2:B2:C2::/64

    {

        AdvOnLink on;

        AdvAutonomous on;

        AdvRouterAddr on;

    };

};

Start the service, then restart eth0 on Corey, and bingo! I can surf to ipv6.google.com.

Todo: Set up reverse DNS for my /48. Set up IPv6 for all my computers. Figure out how to tell HE when Bell changes my IP. They have a tool to do this but my first attempt didn't work.

And of course understand what it is I'm doing. For instance, if IPv6 uses 128-bit IPs, surely the /64 would be enough.
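A rough sketch of the arithmetic behind that question, assuming the usual convention of one /64 per LAN segment (which is what the radvd prefix above advertises):

```shell
# How many /64 subnets fit inside a routed /48?
# Assumption: one /64 per LAN segment, as in the radvd.conf prefix above.
subnet_bits=$(( 64 - 48 ))          # bits left over to number the subnets
subnets=$(( 1 << subnet_bits ))     # 2^16
echo "a /48 holds $subnets /64 subnets"
```

So a single /64 should indeed be enough for one LAN; the /48 only starts to matter once there is more than one network segment to number.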

2010/10/20

Server-side javascript

Louis is looking into getting the contract for a big project. A really big project. Big enough that we'd have to hire more programmers, and user-friendly phone answerers.

This got me thinking about what language I'd use. In the last 15 years, I've written pretty much everything in Perl. I really like Perl. However, there are many things that annoy me about Perl. Many of these things are solved in Perl 6 and/or Moose. But Perl 6 isn't ready and never will be, and Moose.... well, let's just say that I like Perl!

Also if I'm hiring programmer(s) maybe Perl isn't the best language. This project will be used for 10 (read 20) years to come, by a diverse bunch of not very bright users. So what else would I want to program in? Erlang is dead cool, but not for common mortals.

I'll ignore Java as a bad joke, C++ or C wouldn't do for something of this complexity. Ruby, Python? I'd just as soon use Perl.

So how about Javascript?

Javascript is really my new favorite language. Remove the frustration of dealing with JS in MSIE, which is really the problem of MSIE's DOM implementation, and JS is a really nice language, with associative arrays, regexes, objects, inheritance, functions as data, closures, etc. It lacks low-level data manipulation like Perl's pack/unpack, and it has some silly legacy features.

What's more, if all validation is already written in JS, you can then do client-side and server-side validation of input for free.

So how is JavaScript on the server done? Turns out there are many ways.

yum --enablerepo=rpmforge install js will install SpiderMonkey, and I guess you could use #!/usr/bin/js in a CGI. But in 2010, you don't want to be implementing a framework from scratch, including things like MySQL access.

Jaxer is very very cool looking. It hugely shrinks the distance between the client and the server, as it were. The browsers' DOM is accessible from the server. Which is a very very cool idea that I've used in the past.

Jaxer goes one further too: the border between browser and server can be blurry when you use the runat="proxy-server" feature. It 'simply' turns a function call in the browser into a synchronous XMLHttpRequest which calls the function on the server. How cool is that? I'm going to have to implement something like this for POE::XUL.

One drawback of Jaxer is that it seems to need Aptana to compile. Having source code isn't very useful if you can't patch and rebuild it. Especially if you want to create an RPM. Or even do some heavy lifting in C++, and pipe-fitting in JavaScript. Another drawback is that Jaxer is dependent on Aptana to survive. Will Aptana be around in another 5 years?

The Apache foundation has been around for years and isn't going anywhere. They have bsf, which allows one to embed JS in a JavaBean, which brings us back to Java. On the one hand: Java! RUN AWAY! On the other hand: hiring Java programmers should be easy. And doing any heavy lifting in Java, then doing the high-level gluing in JavaScript might be OK.

Mind you, this is all speculation; we are still a year or more away from knowing if we get the contract.

A better backup

I do backups badly. Basically, rsync to a large partition somewhere. That's not really a backup. It protects against hardware failure, yes. But not against "oops! I deleted that file 3 weeks ago." What's more, I'm sure I'm not doing it as well as I could; when backing up to a hard disk, why not back up the entire OS and make the hard disk bootable? That would be complicated if multiple machines back up to one backup server, but for my clients, I most often have one server which backs up to one set of removable disks.

What's more, I moved all my VMs from Jimmy to George yesterday. When I say "moved VMs" I should say "moved all the services to new CentOS 5 VMs." Which sort of shows up another problem: keeping track of what you've set up where and why. Jimmy had lighttpd running on it. Why? Oh... to see the RRD graphs of the temperature and humidity in the attic. I should document all this, now that I "know it" but ideally it should be automated.

And conformance tests; a bit like unit tests, you run some scripts to see if everything in the new install is working as expected. After all was done, I realised that I hadn't copied over my subversion repositories, nor set them up.

One central issue, I suppose, is config files. Ideally, you just copy in the backed-up config file, start the service, run the test script, verify success. I notice that rpm provides a --configfiles option. Combined with rpm's verify options, maybe one could detect what config files have changed and keep a backup set of them. Of course, things like /var/spool/hylafax/etc/config.ttyS0 would have to be added by hand. As would stuff installed by hand into /opt and/or /usr/local.
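A minimal sketch of that idea, assuming the usual rpm -V output layout where a "c" in the second column marks a config file. The function name changed_configs and the sample file names are made up for illustration, and paths containing spaces would need more careful parsing:

```shell
# Filter `rpm -Va` style output down to the config files that differ
# from what the package shipped. Verify lines look like:
#   S.5....T  c /etc/ssh/sshd_config
# where the second field "c" marks a config file.
changed_configs() {
    awk '$2 == "c" { print $3 }'
}

# Illustrative input only (made-up file names).
# The real thing would be: rpm -Va | changed_configs
sample='S.5....T  c /etc/ssh/sshd_config
.......T    /usr/bin/example
..5....T  c /etc/example.conf'
printf '%s\n' "$sample" | changed_configs
```

The resulting list could then be fed to rsync or tar to build the backup set of modified config files.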

And a modified config file implies that the package is being used, so the package would get flagged as important. And then, maybe once a week say, you'd get email "hey, you don't have a conformance script for package X." Or "You didn't write a changelog for the latest changes to file Y."

2010/10/16

How not to spend a Friday night.

A cascade of stupidity caused me to drive to Montreal and back on a Friday night. In the dark. In heavy rain. If you know me, this isn't my idea of fun. While I am slightly to blame, most of the blame is elsewhere.

First, power outage at a client's in St-Constant at 13h00 (roughly). DAMN YOU HYDRO. Though really, black-outs are par for the course. Client has a UPS though. BUT the USB cable went missing 6 years ago, so no way for the computer to turn itself off cleanly. DAMN YOU APC. Why not just put a USB-B port on the back of your damn UPSes instead of using a secret-sacred RJ45 with 10 pins that costs way too much? Oh, yeah, that would be why. And DAMN YOU JEAN-PHILLIPP, a UPS without apcupsd (or equiv) is less than useful.

So anyway, battery eventually drains, BAM! Hard shutdown. Power comes back at some point, but I suspect not very cleanly. The on-off cycling causes the BIOS to lose its settings. BLACK EYES TO YOU ASUS. SERIOUSLY WHAT THE EF?! Also: DAMN YOU APC AGAIN! The UPS should be smart enough to wait for a few seconds of clean power before passing power through.

But, now that the BIOS has been reset to showing the SATA as IDE drives, GRUB can no longer load stage 2. Which might be damn stupid, or unavoidable. The bug report to me is "GRUB " on the screen, nothing further.

And this is where my blame comes in; I know that ASUS motherboards can reset the BIOS. But I'd just had a problem with /boot on a RAID1, so I'm thinking that was the problem. I eat supper, drive 2 hours, boot the computer with Knoppix, reinstall grub so it can find stage 2. Reboot, yay grub! But then initrd can't find md0, which has VolGroup00 on it! WHA! It's there! KNOPPIX can find it, why can't you?

Messing around for a while until the light goes on: BIOS RESET BECAUSE ASUS HATES LIFE! OK, set the SATA back to AHCI. Boot again. Still no go. FAH!

And then the other shoe drops. One of those things that if you've never pulled an initrd apart and poked at the init script inside one, you wouldn't notice: init was looking for md0, but KNOPPIX was calling it md127. And it turns out that md devices have a preferred minor device number. So when KNOPPIX was calling it md127, it was writing that to the array. Which means when init was trying to activate md0, it goes BUH CAN'T FIND IT.

BLACK EYES TO YOU, KNOPPIX, FOR CHANGING THAT! Seriously, changing the preferred name of an array is really bad form.

So how to change it back. First you deactivate the arrays :

mdadm --stop /dev/md126

mdadm --stop /dev/md127

BUH! That last doesn't work; LVM is still holding a lock on the array. I strongly suspect that deactivating the volume group (vgchange -an) would be enough to drop the lock, but there's no way I'm going to test that on live data.

So reboot, don't activate mdadm-raid. Do the following

mdadm --assemble --update=name --name=0 /dev/md0 /dev/sda2 /dev/sdb2 /dev/sdc2

mdadm --detail /dev/md0

mdadm --assemble --update=name --name=1 /dev/md1 /dev/sda1 /dev/sdb1 /dev/sdc1

mdadm --detail /dev/md1

The --detail lines should show Preferred Minor : 0 or 1 depending.

Note: Those are hairy commands. They could potentially kill your arrays if you get the partitioning wrong. DO NOT JUST CUT AND PASTE THEM IF YOU RUN INTO THE SAME PROBLEM AS ME! Read the docs, understand them, then adapt the commands to your setup.
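As a read-only sanity check before and after, here is a small sketch that pulls the Preferred Minor value out of mdadm --detail output and compares it with what init expects. The helper names preferred_minor and check_minor are hypothetical, and the sample input is illustrative text in the format quoted above, not real output from this machine:

```shell
# Pull the "Preferred Minor" value out of `mdadm --detail` output on stdin.
preferred_minor() {
    sed -n 's/^ *Preferred Minor *: *\([0-9][0-9]*\) *$/\1/p'
}

# Compare what the array reports with the minor that init expects.
# $1 = expected minor; `mdadm --detail` output arrives on stdin.
# Real use would be: mdadm --detail /dev/md0 | check_minor 0
check_minor() {
    actual=$(preferred_minor)
    if [ "$actual" = "$1" ] ; then
        echo "ok: preferred minor is $actual"
    else
        echo "MISMATCH: expected $1, array says ${actual:-nothing}"
        return 1
    fi
}

# Illustrative input only: an array that KNOPPIX has renamed.
printf '%s\n' '     Raid Level : raid1' ' Preferred Minor : 127' | check_minor 0 || true
```

This only reads; it does not touch the arrays, so it is safe to run at any point.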

PS : you pull initrd apart with

cd /boot

mkdir t ; cd t

gzip -dc ../initrd-$(uname -r).img | cpio -i

ls

Now go poke at init

2010/10/12

Reinventing the wheel. Badly

So, I had x11vnc all nice and tested with the TightVNC viewer on Linux and RealVNC on Windows but I get a bug report. User couldn't log on. -unixpw was displaying Username: and the user was entering a username, but pressing Enter wouldn't move to the Password field.

Looking at /var/log/x11vnc.log I see a bunch of:

12/10/2010 15:17:39 unixpw_keystroke: bad keysym4: 0xff8d

Looking in /usr/include/X11/keysymdef.h, I see that 0xff8d is XK_KP_Enter, that is, the Enter key to the right of most number pads. And not XK_Return, the Enter key that's just to the right of the alphabet.

Looking at x11vnc's source code, I see that:

x11vnc/unixpw.c only checks for XK_Return and XK_Linefeed;
what's more, it reimplements the huge bloody-effing-inputting-text-with-editing wheel;
x11vnc's -remap function doesn't happen in the code path that leads to unixpw_keystroke, so is bloody useless for this problem.

So the solution is to use TightVNC viewer on Windows. I already know that TightVNC is the better viewer for Linux. So now that's 2 out of 3.

Of course, the other solution would be to patch x11vnc. But I already have my time fully committed to reading Irregular Webcomic!

Third point being why was it working for my test setup but not in the field? Well, I had Windows Server 2003 as a VMware Server guest, via the VMware server console running on Linux. So something somewhere was remapping something somehow. "It is always possible to add another layer of indirection."

2010/10/08

Watch that command

I was reminded today of how useful watch is. I omitted it from my previous list of important and useful commands because I rarely use it. But today I had George open on the bench and was finding out what fan connectors on the motherboard corresponded to which speed sensor, as reported by lm-sensors.

watch -n 0 "sensors w83793-i2c-0-2f | grep fan"

Then, as I plugged and unplugged a 3-wire fan here and there, I could see on screen what was going on. This is especially important, cause you don't want to unplug the CPU fan for any length of time.

FYI, the sensor:fan port mapping for a DSBV-DX is as follows:

fan1  CPU_FAN1

fan2  CPU_FAN2

fan3  FRNT_FAN1

fan4  FRNT_FAN2

fan5  FRNT_FAN3

fan6  FRNT_FAN4

fan7  REAR_FAN1

fan8  REAR_FAN2

fan9  FBD_FAN1

fan10 N/C

fan11 N/C

fan12 N/C
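The table above could be kept as a small lookup function, for instance to label sensors output in a monitoring script. This is only a sketch; fan_header is a made-up name, and the mapping is copied from the list above for this one board:

```shell
# Map an lm-sensors fan name to the DSBV-DX header it was found on,
# using the table above. Prints "N/C" for the unconnected sensors.
fan_header() {
    case $1 in
        fan1)  echo CPU_FAN1 ;;
        fan2)  echo CPU_FAN2 ;;
        fan3)  echo FRNT_FAN1 ;;
        fan4)  echo FRNT_FAN2 ;;
        fan5)  echo FRNT_FAN3 ;;
        fan6)  echo FRNT_FAN4 ;;
        fan7)  echo REAR_FAN1 ;;
        fan8)  echo REAR_FAN2 ;;
        fan9)  echo FBD_FAN1 ;;
        fan10|fan11|fan12) echo N/C ;;
        *) echo "unknown sensor: $1" >&2 ; return 1 ;;
    esac
}

fan_header fan7
```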

2010/10/07

I take it back

The war on noise continues apace. Next up, remove Jimmy from service, replacing it with George.

# fdisk -H 224 -S 56 /dev/sdd  

# fdisk -H 224 -S 56 /dev/sde

# sfdisk -l /dev/sdd



Disk /dev/sdd: 182401 cylinders, 255 heads, 63 sectors/track

Warning: The partition table looks like it was made

  for C/H/S=*/224/56 (instead of 182401/255/63).

For this listing I'll assume that geometry.

Units = cylinders of 6422528 bytes, blocks of 1024 bytes, counting from 0



   Device Boot Start     End   #cyls    #blocks   Id  System

/dev/sdd1          0+ 233598  233599- 1465132900   fd  Linux raid autodetect

/dev/sdd2          0       -       0          0    0  Empty

/dev/sdd3          0       -       0          0    0  Empty

/dev/sdd4          0       -       0          0    0  Empty

sdd and sde are a pair of WD15EARSs. With 4k blocks. Normally I use sfdisk -d to copy partitions, but that failed on 4k blocks.

mdadm -C -n 2 -l 1 /dev/md2 /dev/sdd1 /dev/sde1

pvcreate /dev/md2

pvs -o name,pe_start

vgcreate -s 32M T00 /dev/md2

lvcreate -l 99%VG --name LV00 T00

mkfs -t ext4 -E stride=32 -m 1 -O extents,uninit_bg,dir_index,filetype,has_journal /dev/T00/LV00 

tune4fs -c 0 -i 0 /dev/T00/LV00

I got the formatting commands from I Do Linux. I read up on all those mkfs options. I'd never have guessed they were the "best" options to use.

The tune4fs is basically turning off the fsck that happens automatically every X days or Y reboots; fsck is SLOW, and the automatic fsck always happens when you least want it. And with journals, UPSes and so on, an unsafe shutdown isn't supposed to happen.

I left some space on the VG free so I could do snapshots.

But then I got to thinking. Blocking off one large LV means that the file server VM gets access to the entire disk and all that space can't be used by a different VM or the host for another purpose. With LVM, I can grow the LV if I need it. So why not give it 500G at a time?

e4fsck -f /dev/T00/LV00 

resize4fs -p /dev/T00/LV00 500G 

lvresize -L 500G -t -v /dev/T00/LV00 

resize4fs -p /dev/T00/LV00

This was as much a test of ext4 resizing as anything else. And it worked flawlessly. Btw, fsck on an empty FS is fast.

So what does it look like:

# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/mapper/G00-root  572G  537G  5.8G  99% /

/dev/md0              950M   31M  870M   4% /boot

tmpfs                 5.9G     0  5.9G   0% /dev/shm

/dev/mapper/T00-LV00  493G  493G     0 100% /T00

WHAT 100% FULL ALREADY?

Well, no. I wanted to make sure the 500G as resize4fs sees it and 500G as lvresize sees it were the same thing, so I filled the FS to the brim:

# dd if=/dev/zero of=/T00/HUGE

dd: writing to `/T00/HUGE': No space left on device

1031717425+0 records in

1031717424+0 records out

528239321088 bytes (528 GB) copied, 6608.14 seconds, 79.9 MB/s

See that? Bear in mind that this dd was happening during the initial RAID build. I hypothesize that the mobo in Corey (ASUS M2N-MX SE) that I was doing the previous tests on SUCKS. And that the mobo in George (ASUS DSBV-DX) does not.
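The numbers in that dd output can be cross-checked with a little shell arithmetic. This sketch assumes dd's default 512-byte block size (no bs= was given) and that 500G means 500 GiB to lvresize:

```shell
# Cross-check the dd figures against the 500G logical volume.
# Assumptions: dd used its default 512-byte blocks; 500G is 500 GiB.
records_out=1031717424
bytes=$(( records_out * 512 ))              # should match dd's byte count
lv_bytes=$(( 500 * 1024 * 1024 * 1024 ))    # size of the LV in bytes
overhead=$(( lv_bytes - bytes ))            # space not available for file data

echo "written:  $bytes bytes"
echo "LV size:  $lv_bytes bytes"
echo "overhead: $(( overhead / 1024 / 1024 )) MiB"
echo "rate:     $(( bytes / 6608 / 1000000 )) MB/s (integer division)"
```

The gap between the two sizes is presumably what ext4 keeps for itself (inode tables, journal and so on), which would mean resize4fs and lvresize agree on what 500G is.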

# time rm -f /T00/HUGE

real    0m31.020s

user    0m0.000s 

sys     0m20.302s

2010/10/06

Neat bash tricks

One idiom I like is:

ssh $host "commands;" 2>&1 | while read line ; do

    # react to any error messages or messages from commands in $line

done
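A runnable stand-in for this idiom, with a local function (the hypothetical fake_remote) in place of ssh. It also shows one thing to watch for: in bash's default configuration the loop on the right of a pipe runs in a subshell, so variables set inside it are gone afterwards. Process substitution is one way around that:

```shell
#!/usr/bin/env bash
# Stand-in for: ssh $host "commands;" 2>&1
# (fake_remote is a hypothetical local function so the sketch runs anywhere)
fake_remote() {
    echo "starting up"
    echo "ERROR: disk full" >&2
    echo "done"
}

# Pipeline form: the loop body runs in a subshell, so $errors is lost afterwards.
errors=0
fake_remote 2>&1 | while read -r line ; do
    [[ $line == ERROR:* ]] && errors=$(( errors + 1 ))
done
echo "after pipeline: errors=$errors"

# Process substitution form: the loop runs in the current shell, so the count survives.
errors=0
while read -r line ; do
    [[ $line == ERROR:* ]] && errors=$(( errors + 1 ))
done < <(fake_remote 2>&1)
echo "after process substitution: errors=$errors"
```

If the loop only needs to react (start a tunnel, send a mail), the plain pipeline form is fine; the second form matters only when results have to outlive the loop.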

For instance, say you were running x11vnc on a remote host. x11vnc has the annoying habit of using a port other than the one you specify, if the one you want is already taken. Very annoying. So:

ssh $host "x11vnc ...." 2>&1 | while read line ; do

    if [[ $line =~ PORT=([[:digit:]]+) ]] ; then

        port=${BASH_REMATCH[1]}

        # now set up some sort of port forwarding so that $port is a sane, known port

        ssh -L "*:9600:localhost:$port" $host

    fi

done

This has some problems, in that the second ssh can survive x11vnc exiting. I thought "hey, how about $!" but that has other problems; say the second ssh exits before its time. The PID you saved from $! could be reused. The kill you'd want to do would provoke hilarity. While complaining about this on IRC, a wise soul suggested that I open a lock file, and then any process with that file still open must be killed. I don't need locking, and didn't want to learn about flock in bash right away, so what I roughly did was:

child_kill () {

    if [[ ! $LOCKFILE ]] ; then

        return 0

    fi

    lsof -F '' $LOCKFILE | while read ppid ; do

            if [[ $ppid =~ ^p([[:digit:]]+)$ ]] ; then

                pid=${BASH_REMATCH[1]}

                if [[ $pid != $$ ]] ; then

                    kill -HUP $pid

                fi

            fi

        done

    rm -f $LOCKFILE

}



local LOCKFILE=$(mktemp -p /tmp)

trap "child_kill" EXIT

ssh ... | while read line ; do 

    ....

    ( exec 123>$LOCKFILE

      ssh -L ..... $host &

    )

done

child_kill

I wish there was a better way to deal with lsof's output, but this works so why complain?

What's more, I wish I could use SSH's ControlMaster to make the second connection that much faster. But the quick testing I did with 4.9p1 failed. Bugger.
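On the lsof output: lsof has a terse mode, -t, which prints bare PIDs one per line, so the p-prefix parsing could go away. A sketch of how the filtering might look with it (pids_except is a hypothetical helper name, and the kill loop is left as a comment because it needs lsof and a real lock file):

```shell
#!/usr/bin/env bash
# pids_except is a hypothetical helper: echo every PID read from stdin
# except the one passed as $1 (normally $$, the script itself).
pids_except() {
    local self=$1 pid
    while read -r pid ; do
        [[ $pid =~ ^[0-9]+$ ]] || continue    # ignore anything that is not a bare PID
        if [[ $pid != "$self" ]] ; then
            echo "$pid"
        fi
    done
}

# Intended use (not run here; needs lsof and a real $LOCKFILE):
#   lsof -t "$LOCKFILE" | pids_except $$ | while read -r pid ; do
#       kill -HUP "$pid"
#   done

# Stand-in input so the sketch runs anywhere:
printf '%s\n' 100 200 300 | pids_except 200
```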

2010/10/04

It's a server, not a telephone!

Dear the people at Red Hat,

I'm trying to set up a VMware server on top of CentOS 5.5. I want / to be LVM on top of RAID5, /boot as RAID also. I want as few packages installed as possible, only the bare minimum services running. This seems like a simple but widespread scenario.

But why did it take me 3 hours?

Why did LVM on RAID5 require a bleeding graphical install? This is a server we're talking about.

Why did a huge pile of useless stuff like bluetooth, pcsc, xfs (?!), nfs, cups, wpa_supplicant, tux, apache, nscd, IRDA get installed? If I need these things, yum allows me to install them in seconds. Instead I spent an hour hunting down those and other useless bits.

Maybe I'm just getting old. But I pine for Redhat 9, where everything was small, simple, understandable. Now everything is 5 levels of indirection, lots of smoke and a dash of magic.

Grrrrrr!

-Philip