In my travels I have used various wireless networks, both free and paid for, that have had varying levels of brokenness. The worst ones I have encountered are the ones that have their DNS servers set up to wildcard all domains AND prevent name lookups to any servers but their own; one rung up from these network bottom feeders are the ones that just have their DNS set up to wildcard.
Both of these setups screw me over on the laptop because I have a tunnel set up using stunnel that allows postfix on my laptop to securely deliver mail to my ISP's secure mail relay. To do this postfix wants to resolve localhost.localdomain, which these broken network setups resolve to something other than 127.0.0.1, which, of course, breaks the whole process.
To try and work around this problem I set up BIND on my laptop so that, rather than relying on (possibly broken) external DNS providers, I had my own. I added some functions to /etc/dhclient-enter-hooks to prevent my /etc/resolv.conf being overwritten. This worked well for everything but the worst network offenders, which prevent DNS lookups to anything but their own infrastructure. I was relenting, using their DNS and living with the fact that I couldn't send emails until I connected to a better configured network. That was until one day when I was moaning about this situation online and somebody suggested setting up BIND to use the DNS servers given to me as forwarders. This is a great idea: it allows me to protect myself from the broken idea of wildcarding .localdomain but still have DNS that works with even the most broken network setups. I set about configuring this.
Firstly in /etc/named.conf I added the following to the options section:
include "/etc/namedb/forwarders";
This will cause BIND to include the contents of the given file. To create the file we need to hook into the information retrieved by dhclient; this is done by adding the following function to /etc/dhclient-enter-hooks:
make_resolv_conf() {
    if [ -f /etc/namedb/forwarders ]
    then
        mv /etc/namedb/forwarders /etc/namedb/forwarders.old
    fi
    printf "forwarders { " > /etc/namedb/forwarders
    for nameserver in $new_domain_name_servers
    do
        printf "%s; " ${nameserver} >> /etc/namedb/forwarders
    done
    echo "};" >> /etc/namedb/forwarders
    echo "forward only;" >> /etc/namedb/forwarders
    pkill -HUP named
    return 0
}
The above shell function takes the DNS server list given in the variable new_domain_name_servers and formats a valid BIND forwarders statement. After writing the forwarders out to the file it sends named a HUP signal to get named to re-read the configuration. Of course, /etc/resolv.conf is untouched here and is simply configured to query localhost.
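As an illustration (the server addresses here are invented), if the network's DHCP server handed out 10.0.0.1 and 10.0.0.2 as name servers, the generated /etc/namedb/forwarders would contain:
forwarders { 10.0.0.1; 10.0.0.2; };
forward only;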
So far this configuration has allowed everything on my laptop to operate correctly irrespective of how badly set up the DNS is on the network I am connecting to. Win.
Saturday, October 27, 2012
Tuesday, September 11, 2012
Error messages
I love error messages, especially the ones that make you think that something else is wrong so you beat your head trying to bash the wrong thing into shape.
I spent quite a few hours trying to get a solaris jumpstart to actually do the install on a machine that is roughly 1200km from the jumpstart server. This was a slow, tedious process because the round trip time made everything take a long time. The error I was getting was something like:
Error: Unable to create 9680Mb slice (unnamed)
and then a dump of the mbr and partition table of the disk in the machine, giving one the impression that somehow the solaris installer was objecting to the partitions/mbr currently on the disk. No amount of clearing or manually setting up the partitions would make the error go away.
The real problem? The fact that I had foolishly ASSuMEd that leaving in the directive for creating a mirrored pool on a single disk machine would be harmless.... oh no. What I think the error really meant was "you told me to make a mirror and I can't find another device to do that on". It would have been nice if the error had been along the lines of "insufficient vdevs for mirror" or something similar. After I removed the mirror option from the pool setup directive the error went away... *sigh*.
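For illustration only (the device names are hypothetical and the syntax is from my memory of the JumpStart profile keywords), the offending directive looked something like:
pool rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
and the fix was dropping the mirror keyword and the second device, leaving something like:
pool rpool auto auto auto c0t0d0s0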
Monday, September 10, 2012
NetBSD on a stick
This is not really a new idea but after a couple of upgrade "oopses" that rendered my machine unbootable due to init getting upset about not finding MAKEDEV I decided I needed a safety net that was a little bit more portable than the USB hard disk enclosure that had saved me in the past.
I had a spare 4Gb USB memory stick, which is more than ample to hold a full NetBSD install including X and a few other bits and pieces, so I decided to put the memory stick to good use. The advantage of using a memory stick is that it is easy to update and we have the entire NetBSD toolset available, not some cut down version. The downside is that the memory stick has a limited number of writes, so one wouldn't last very long for full time day to day use, but as a portable rescue device, or even just to test new hardware before buying, it can be useful.
On my laptop the memory stick appeared as sd0, so I will use this device in all my examples; those following along at home may need to adjust the device naming.
Firstly, I needed to adjust the mbr. The memory stick had a single MSDOS partition on it, which needed to be changed to a NetBSD one (type 169). To do this I ran fdisk:
fdisk -u /dev/sd0d
The -u flag makes fdisk interactive, prompting you to update certain parameters as you go along. For my purposes I just accepted the defaults for everything until I was prompted to change a partition. At this point I selected the only partition there, changed the sysid to 169 and set a bootmenu label of NetBSD; all other parameters were left at their defaults. Once I had finished editing the partitions I accepted the default "none" at the partition prompt to move on. Fdisk prompted me to update the mbr, to which I said yes, and I also said yes to it updating the mbr type to bootsel since I had a boot menu.
Once the mbr was set it was time to edit the disklabel:
disklabel -e /dev/sd0d
This opens up the disklabel in an editor. I left most of the settings as is, only editing the e: partition: I changed the e: into an a:, changed the fstype to "4.2BSD", set the fsize to 2048, the bsize to 16384 and the cpg/sgs to 0. The line ended up looking like:
a: 7883696 1104 4.2BSD 2048 16384 0 # (Cyl. 0*- 3849)
Then I just saved the changes and quit the editor. With the new disklabel in place I could then create the filesystem:
newfs /dev/sd0a
This took a while but eventually completed without problems. Once the file system was built I mounted it:
mount /dev/sd0a /mnt
Then I unpacked the installation sets onto the memory stick. I had created my own release files using "build.sh release", but I could have just as easily downloaded the official sets; the files I needed were under the binary/sets directory. I just cd'ed to that directory and did:
cdir=`pwd`
for f in *.tgz
do
(cd /mnt && tar zxpf ${cdir}/${f})
done
and waited for a long time; the unpacking was very slow. Once this was done I picked the kernel-GENERIC.gz file from binary/kernel, uncompressed it and copied it as /mnt/netbsd.
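That step is just (a sketch, using the kernel file name from my release build):
gunzip -c binary/kernel/kernel-GENERIC.gz > /mnt/netbsd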
To make the memory stick bootable I did:
cp /usr/mdec/boot /mnt/boot
installboot -fv /dev/sd0d /usr/mdec/bootxx_ffsv1
I had to use the -f flag on installboot because I was getting an "Old BPB too big" error; the -f flag forced installboot to continue anyway, and the result seemed to work fine.
I created a new fstab:
/dev/sd0a / ffs rw,log 1 1
and edited /mnt/etc/rc.conf to change "rc_configured" to "YES" so the machine would come up multiuser.
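For reference, the edited line in /mnt/etc/rc.conf ends up reading:
rc_configured=YES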
After this it was just a matter of rebooting my machine; the BIOS picked up a USB device and started to boot from it. I was presented with a boot menu, and when I selected the only option my machine slowly booted from the memory stick. I was able to log in and do everything you would expect to be able to do on a clean NetBSD install.
Saturday, August 25, 2012
Linux Lab in the Lap
I am a RHCE and have been working toward getting a RHCA, which involves sitting five "certificate of expertise" exams; these are deep dive technical exams on various Red Hat products such as satellite, cluster, naming services and a couple of others. To help me prepare for the exams I wanted to set up a lab environment that emulated as closely as possible the one used in the training courses that are associated with the exams. Fortunately I had been generous with most of the specifications on my laptop so, apart from disk space, I had the capability to set up a virtual environment on my laptop. I upgraded the laptop hard disk to give me the room and developed the "lab in the lap".
Originally I developed this running on RHEL 5, which was reasonably challenging since it involved a bunch of manual tweaking and hacking on xen infrastructure scripts directly (even finding out what to do proved difficult). I have recently updated the environment to RHEL 6, which proved to be much easier to configure.
This picture shows what the lab configuration looks like:
There are three isolated networks: cluster, iscsi1 and iscsi2. Another network, appnet, is NAT'ed to the host's network interface to provide external access if required. Each machine has four network interfaces. All three VM nodes (node 1 - 3) are built by first creating a template node (call it node0) using kickstart. The subsequent nodes are created by cloning node0. To simplify the cloning process the IP addresses for all the clone nodes are assigned using DHCP - this means that there are no node specific actions to be done.
To create the setup I first set up all the networks using the virtual machine manager. This is a bit tedious to do but not difficult. In RHEL 6 you can also use the VMM to assign addresses to the virbr NICs that are created when you create an isolated network; this gives the host access to the isolated networks. In RHEL 5 setting up the network interfaces was a bit trickier: first I had to make sure I had enough "dummy" network interfaces by adding:
options dummy numdummies=5
to /etc/modprobe.conf. Then I created a script in /etc/xen/scripts called multi-network-bridge which set up the multiple bridges and the associated dummy interfaces:
#!/bin/sh
/etc/xen/scripts/network-bridge $@ vifnum=0 netdev=eth0 bridge=xenbr0
/etc/xen/scripts/network-bridge $@ vifnum=1 netdev=dummy0 bridge=virbr1
ifconfig dummy0 172.16.50.254 netmask 255.255.255.0 broadcast 172.16.50.255 up
/etc/xen/scripts/network-bridge $@ vifnum=2 netdev=dummy1 bridge=virbr2
ifconfig dummy1 172.17.1.254 netmask 255.255.255.0 broadcast 172.17.1.255 up
/etc/xen/scripts/network-bridge $@ vifnum=3 netdev=dummy2 bridge=virbr3
ifconfig dummy2 172.17.101.254 netmask 255.255.255.0 broadcast 172.17.101.255 up
/etc/xen/scripts/network-bridge $@ vifnum=4 netdev=dummy3 bridge=virbr4
ifconfig dummy3 172.17.201.254 netmask 255.255.255.0 broadcast 172.17.201.255 up
Finally, I modified /etc/xen/xend-config.sxp and replaced the argument for the network-script entry with my own multi-network-bridge script so that xend would run my script on start up.
For the isolated networks, iptables was blocking network access to the host. Libvirtd automatically configures iptables to provide the NAT facility for guests. I added a rule to allow the 172.17.0.0/16 network access on any port so that the guests could communicate with the host on the other ethernet interfaces. Unfortunately you cannot just reload the iptables rules, because doing so flushes the rules added by libvirtd, which breaks the virtual networking.
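The rule was along these lines (a sketch from memory, inserted ahead of the libvirt-managed rules; your chain layout may differ):
iptables -I INPUT -s 172.17.0.0/16 -j ACCEPT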
With the networks set up it was then time to kickstart node0, the template to be used to stamp out the other nodes. On my host I set up a directory with the contents of the RHEL distro under /var/www/html/rhel-5.6 and pointed the yum host config at this. I installed httpd to support a network kickstart. I dropped a basic kickstart file into /var/www/html/ks and then used:
virt-install --name=node0 --ram=512 --vcpus=1 --disk path=/var/lib/images/node0.img,size=8 --extra-args="ksdevice=eth0 ks=http://172.17.1.254/ks/node0.ks" --mac 00:16:3e:00:00:00 --network network:cluster --network network:appnet --network network:iscsi1 --network network:iscsi2 --location=/var/www/html/rhel-5.6/
Where 172.17.1.254 is the IP address of one of the dummy/virbr-nic interfaces on the host machine. The MAC address comes from the manually allocated xen ethernet range. To make things simple I allocated the MAC addresses with the second to last octet being the node number and the last octet being the interface number of the VM. This made the scripting of the cloning simpler. With this scheme I configured static DHCP allocations for all the nodes. This can be done using "virsh net-edit net_name", e.g.:
virsh net-edit cluster
to edit the cluster net and add the static definitions in the dhcp section of the xml, like this:
<network>
  <name>cluster</name>
  <uuid>f0ddb5c3-7db7-9943-2195-aa0454971d0d</uuid>
  <bridge name='virbr2' stp='on' delay='0' />
  <ip address='172.17.1.253' netmask='255.255.255.0'>
    <dhcp>
      <range start='172.17.1.128' end='172.17.1.250' />
      <host mac='00:16:3e:00:01:00' ip='172.17.1.1' name='node1' />
      <host mac='00:16:3e:00:02:00' ip='172.17.1.2' name='node2' />
      <host mac='00:16:3e:00:03:00' ip='172.17.1.3' name='node3' />
    </dhcp>
  </ip>
</network>
The host entries above are the static address allocations. Once this file is edited I found I had to manually copy the xml file from /etc/libvirt/qemu/networks into /var/lib/libvirt/network/; it seems that the VMM manages the files in both places but virsh doesn't, and without the copy the changes will not take effect. In RHEL 6 I found I could restart libvirtd (service libvirtd restart), then send a SIGHUP to dnsmasq, then restart libvirtd once again. The first restart writes out the dnsmasq control files, then we can kick dnsmasq, then we restart libvirtd again because it seems to lose contact with dnsmasq following the SIGHUP, which breaks DHCP. I don't know if this process works on RHEL 5; I just rebooted after making changes back then.
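In command form, the RHEL 6 sequence described above is simply:
service libvirtd restart
pkill -HUP dnsmasq
service libvirtd restart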
Once the template node was built I cloned it using:
virt-clone -o node0 --force --mac 00:16:3e:00:01:00 --mac 00:16:3e:00:01:03 --mac 00:16:3e:00:01:01 --mac 00:16:3e:00:01:02 --name node1 --file /var/lib/images/node1.img
Doing a clone is slightly faster than building the node using kickstart. Once the node is cloned I set the VM to autostart on boot and start it up:
virsh autostart node1
virsh start node1
That's it: a node ready for experimenting with. The labs I was doing called for up to three nodes running, which was a simple matter of creating more clones. I wrote a shell script that manages the building and cloning of the nodes; the command line arguments are similar to those of the Red Hat script used in the actual labs, though I have added extensions to build more/different nodes so I can use the lab setup for more than one course.
I found when building the RHEL 6.2 clones that the ethernet interfaces were numbered eth4 to eth7 instead of eth0 to eth3. This was due to a set of udev rules that are intended to attach a particular ethernet interface to a particular device. This makes sense on a real world machine because it keeps the interface numbering consistent even when an interface goes missing but, in my case, this feature was not desirable, so I had to add a post action to the RHEL 6.2 kickstart to remove the udev rules. I have also noticed that since upgrading the host to RHEL 6.2 my cloned machines no longer boot properly when the template machine is not built with a graphical console. This used to work fine when the host OS was RHEL 5.6. Trying to debug this is a bit awkward because there is no console access and the networking does not come up. I guess I could loop mount the disk image and see if there are any logs, but for the moment I can put up with making the template node with a graphical install.
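The kickstart addition is small; assuming the stock RHEL 6 rules file name, the %post section gains something like:
%post
rm -f /etc/udev/rules.d/70-persistent-net.rules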
Footnotes:
- For those 1337 h4x0rz out there interested in cracking the root password in the linked kickstart file, here's a hint... try redhat to save yourself a bit of time.
- The course I attended wasn't actually on RHEL 6, it was run on RHEL 5.8. Not that this really presented an issue, as I just added the distro files and set up a kickstarted template for that OS version. I was able to complete all the labs with this setup. The practice really helped in the exam: I was able to complete the exam in half the allotted time with a score of 100%.
Friday, August 03, 2012
lab in the lap adventures
To help me with practicing for a Red Hat certification exam I set up RHEL 5.6 on my laptop. I configured xen and was using a bunch of virtual machines, each with multiple network interfaces connected to isolated networks. For convenience all the interfaces were configured using DHCP; this way I could build one VM and just clone it to make the other machines - during the clone the network interfaces were given different MAC addresses and DHCP just handed out the static allocations for those MAC addresses. This all worked quite smoothly, though it did take a lot of work and searching to put all the bits together, and it helped me a lot in going over all the lab exercises before my exam.
I am going to do another course and exam soon, but the course will be using RHEL 6.2 for the OS, so I thought it would be good to update my install. Red Hat's recommendation is that you do a clean install between major versions, but I really didn't want to lose my xen configuration so I attempted to perform an upgrade instead, which you can do by hitting escape within the first 60 seconds of the cdrom boot and then adding:
linux upgradeany
to the boot command line. This forces the upgrade regardless of the version. The upgrade seemed to work ok, but the first hurdle was that the process had not written a new entry in grub.conf to boot the upgraded system. This was easy to fix: just boot up the cd in rescue mode and fix it. I then tried to get X working properly, but no matter what I did I would either get a black screen or X would come up and just hang. In the end I decided that it probably was going to take me more time to figure out what was wrong with X than it would to do a clean install and redo my VM setups. After a clean install X worked fine; I had some problems initially with the synaptics touchpad but that was just a matter of configuring it in xorg.conf.
I had a USB stick with what I thought was a copy of the old /etc directory plus my VM build scripts and the like. On closer inspection I found these files were a bit out of date. I had done a dd of the entire linux disk to a file on a NetBSD machine so I knew I could get the files back... the trick was how. I could have just put the image back onto my laptop disk but that would have been tedious, having to copy the data over, boot up the old OS, get what I wanted and then reinstall 6.2 again. Fortunately, the NetBSD machine with the dd image on it had enough room to hold the rhel image uncompressed. I thought that what I could do is use vnconfig to create a virtual drive from the file like this:
vnconfig -c vnd0 ./dd_image_file
and then use lvm on NetBSD to access the logical volumes. Unfortunately it looks like the version of the NetBSD kernel I have on that machine is too old and it had issues with lvm. I knew that NetBSD on my laptop worked fine with lvm, so what I ended up doing was NFS mounting the file system from the machine with the image (I didn't have enough room on the laptop to hold the disk image locally), then using vnconfig as above to create the virtual disk. I then did:
lvm vgscan
which found the lvm volgroups on vnd0, I then did:
lvm vgchange -a y
mount -t ext2fs /dev/mapper/volgroup-lvol00 /mnt
and there we have it. A disk image on one machine, nfs mounted to another, through a virtual disk driver into lvm. It is a bit tortuous (and a bit slow due to the NFS mount going over wireless) but I can access the files and pull off what I want without having to reinstall.
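For tidiness, unwinding the stack afterwards is the same steps in reverse (a sketch):
umount /mnt
lvm vgchange -a n
vnconfig -u vnd0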
When I get organised I will do a write up about the lab in the lap.
Sunday, July 29, 2012
Managed to update the usbasp with new firmware. I used a scrap bit of veroboard to put a couple of headers on and wired them pin for pin to create a joiner from my programming cable to the usbasp cable. I loaded up a slightly modified avrisp sketch (changed the reset pin to be D0) onto the leostick. Oddly, I found that if I used one particular usb port on my laptop for the leostick, testing the setup by reading the firmware from the usbasp would randomly hang part way through. When I changed to another USB port the process worked flawlessly. Once I had read the old firmware I uploaded the new firmware to the usbasp, which worked fine. Now there are no more messages from avrdude about not being able to set sclk when I upload sketches using the usbasp.
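For the record, reading the old firmware and writing the new one through the leostick-as-AVRISP went roughly like this (a sketch; the usbasp uses an ATmega8, but the serial device, baud rate and firmware file names here are assumptions from my setup, not gospel):
avrdude -c avrisp -P /dev/ttyU0 -b 19200 -p m8 -U flash:r:usbasp-old.hex:i
avrdude -c avrisp -P /dev/ttyU0 -b 19200 -p m8 -U flash:w:usbasp-new.hex:i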
Tuesday, July 24, 2012
More playing with arduinos. I bought an Etherten which I plan to use for my data logger project. It has pretty much all I need on a single board apart from the wireless interface. I managed to source a board with a Nordic nRF905 on it for the wireless - this is the same chip used in the little monitoring display that was supplied with the solar inverter. It talks using SPI, so there should be a good chance of getting it going with the arduino.
The first step was uploading sketches to the Etherten. This proved more difficult than I thought it would be after getting the leostick working. The Etherten steadfastly refused to be programmed in NetBSD (in Windows it was fine...). I tried updating the firmware for the usb->serial bridge chip by mostly following the instructions at https://andrewmemory.wordpress.com/2011/04/14/upgrading-the-arduino-uno-8u2-using-flip/ though I just used the raw button to get the file down, and I had to hold the main reset down and then short the bridge chip reset to get it into programming mode. That didn't help me. Next I tried updating the bootloader using optiloader on the leostick, for which I had to patch the optiloader code:
--- optiloader.ino 2012-07-18 20:53:16.000000000 +0930
+++ ../optiLoader.pde 2012-07-18 20:51:49.000000000 +0930
@@ -50,15 +50,6 @@
// 9 to power more complex Arduino boards that draw more than 40mA, such
// as the Arduino Uno Ethernet !
//
-// For a leostick the pins are:
-// 0: slave reset (arbitrary allocation)
-// 14: MISO
-// 15: SCK
-// 16: MOSI
-//
-// The pins are only available on the ICSP header, this header
-// also supplies the power to the other board too.
-//
// If the aim is to reprogram the bootloader in one Arduino using another
// Arudino as the programmer, you can just use jumpers between the connectors
// on the Arduino board. In this case, connect:
@@ -98,11 +89,11 @@
/*
* Pins to target
*/
-#define SCK 15
-#define MISO 14
-#define MOSI 16
-#define RESET 0
-#define POWER 1
+#define SCK 13
+#define MISO 12
+#define MOSI 11
+#define RESET 10
+#define POWER 9
// STK Definitions; we can still use these as return codes
#define STK_OK 0x10
This is due to the leostick not having the programming pins in the "normal" arduino place. I made up a cable and managed to program the Etherten bootloader but still no joy. While I was mucking around with this I also found a usbasp on ebay, an in-circuit programmer for atmel chips, which happen to be what the arduinos use. The programmer and a 10 to 6 pin adaptor came to less than $7 including postage, so I bought those.
The good news is that the usbasp works fine under NetBSD with the native avrdude; I can upload sketches to the Etherten, no problems. Avrdude is complaining that the usbasp needs a firmware update, which is another adventure. I have loaded up the AVRisp sketch onto the leostick but I need to make a cross-over adaptor to connect my cable to the usbasp cable; then I can update the firmware.
Friday, June 29, 2012
These are not the bugs you are looking for
There is something vaguely frustrating about finding and fixing a bug in some software thinking that you have found your problem and things will start working right only to find that the problem is still there - you found a bug but not the one you were searching for.
I had this today. I have been off and on trying to track down why aspell seg faults when NetBSD curses is used; the problem has the hallmarks of memory being overwritten. I built a version of libcurses with dmalloc and it was telling me that a boundary was being overrun. After a bit of digging I found that in __init_get_wch there was a memset used to clear a buffer, but the size argument was way too big, causing memset to stomp past the end of the array. I fixed this and dmalloc no longer complained when I ran my simple test code, but aspell still seg faults and the stack backtrace from the core file still looks as mangled as it did before my fix. So, yes, I definitely fixed a bug - just not the one I was aiming to fix.
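The bug was of this general shape (an illustrative C fragment, not the actual __init_get_wch code):
#include <string.h>
#include <wchar.h>

void
clear_input_buffer(void)
{
	wchar_t inbuf[12];

	/* BUG: sizeof(inbuf) is already a byte count; multiplying by
	 * sizeof(wchar_t) again stomps well past the end of the array */
	memset(inbuf, 0, sizeof(inbuf) * sizeof(wchar_t));

	/* fix: pass the byte count once */
	memset(inbuf, 0, sizeof(inbuf));
}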
Wednesday, June 27, 2012
Been distracted fixing a scrolling bug in the menu library (libmenu). When I did the code I only really tested it in "row major" mode, where the menu items are laid out left to right in the desired number of columns; the other mode, "column major" for want of a better term, lays the menu items out going down the columns - in both cases the number of rows in a menu is determined by the number of items to display. You are supposed to be able to navigate around the menu items by using up/down/left/right commands (there are a lot of other commands too...). To make the navigation easier I pre-calculated the neighbours when the menu is being created and stored them in the item structure. Lots of edge cases, literally, doubly so because you can have either "cyclic" navigation, which will wrap around the edges of the menu, or "non-cyclic", where navigation stops at the edge of the menu. I had to totally rewrite the neighbour calculation and tidy up the menu drawing. Now it works as it should in both modes.
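To give a flavour of the index arithmetic involved, here is a toy C sketch of one cyclic, row-major neighbour calculation; it deliberately ignores the partial-last-row edge cases that made the real fix painful, and it is not the libmenu code:
/* n items laid out row-major in cols columns; return the index of the
 * item above item i, wrapping cyclically.  Assumes a full last row. */
static int
up_neighbour(int i, int n, int cols)
{
	int rows = (n + cols - 1) / cols;
	int row = i / cols, col = i % cols;

	return ((row + rows - 1) % rows) * cols + col;
}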
I have also played around with getting rescanning of the usb bus working. I am not sure if I am happy with what I have: you can rescan an attached device and it will be detached and reattached; you can rescan an entire bus but I am not sure if that does anything. More testing is required.
Thursday, June 14, 2012
The fix is in....
I have committed the changes to umodem (plus usb.h) that allow the driver to attach the serial port on the leostick, I expect this will work for other arduinos too - they should "just work" in NetBSD-current now. The problem of the drivers detaching and not reattaching when the arduino gets reprogrammed needs to be worked on still.
Tuesday, June 12, 2012
Arduino success
I have managed to upload an example sketch to the leostick, it took a bit of sleuthing to get all the bits right.
step 1: I used the boot configuration editor (boot -c) to disable the uhidev driver so that it would not claim the leostick; this meant that the ugen driver claimed it instead, which is what I wanted. I already had a program that used the ugen ioctl USB_GET_CONFIG_DESC to get the configuration description of the device. It turned out the leostick had three interfaces: the first is a CDC ACM (usb serial), the second looks to be CDC Data but didn't make much sense, and the third is the HID that uhidev was latching on to. Given that the NetBSD umodem driver supports the CDC ACM interface type, it was a matter of working out why umodem was not attaching. It turns out that the umodem match was checking that the interface protocol attribute was set to a particular number; on the leostick the protocol was 0.
step 2: fix the umodem driver. I just modified the attach so that it would attach if the protocol was the AT protocol or if the protocol is 0. I rebuilt the kernel, installed, rebooted, and now when I plug the leostick in the umodem driver claims the serial port. I could connect to the serial port and see output from the leostick.
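In outline the match change was something like this (a sketch; the constant names are how I remember NetBSD's usb headers, so treat the exact identifiers as approximate rather than the committed diff):
/* accept interfaces that speak the AT command protocol, or that
 * declare no protocol at all, as the leostick does */
if (id->bInterfaceProtocol == UIPROTO_CDC_AT ||
    id->bInterfaceProtocol == 0)
	return (UMATCH_IFACECLASS_IFACESUBCLASS_IFACEPROTO);
return (UMATCH_NONE);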
step 3: I built the blink sketch from the examples in the arduino IDE. I had built a native version of avrdude and put it in the right place in the arduino tree. I tried using the "upload using programmer" menu item but this errored out. I replaced avrdude with a script that logged the command line parameters and tried to manually run avrdude with the logged parameters but had the same errors. After a bit of fumbling around I found that some of the avrdude parameters weren't right for my environment, plus I could only upload a sketch when the bootloader was active (the first 7 seconds after reset). So, I finally managed to upload a sketch by first pressing reset and then running the command:
/usr/pkg/bin/avrdude -C/home/user/blymn/arduino-1.0.1/hardware/tools/avrdude.conf -patmega32u4 -carduino -P/dev/ttyU1 -Uflash:w:/tmp/build1833870341762611968.tmp/Blink.cpp.hex:i -v
I was still stumped for a while because it seemed like I couldn't upload a sketch without doing a reset, but adding the option to force the baud rate to 1200 let me upload again without resetting:
/usr/pkg/bin/avrdude -C/home/user/blymn/arduino-1.0.1/hardware/tools/avrdude.conf -patmega32u4 -carduino -P/dev/ttyU1 -Uflash:w:/tmp/build1833870341762611968.tmp/Blink.cpp.hex:i -v -b 1200
The only problem now is that the NetBSD usb drivers detach from the leostick and don't reattach when a sketch is uploaded while another sketch is already running, so I still need to reset the leostick to get the devices back... inconvenient.
Sunday, June 10, 2012
Arduino on NetBSD
I have been toying with using an arduino board to create a low power data logger. Not surprisingly there is no pre-packaged arduino IDE for NetBSD, but there is for Linux and NetBSD can run Linux binaries. I downloaded the 32bit arduino-1.0.1 software and installed that. When I tried to run the arduino app it told me I was missing java. My previous attempts at getting java and NetBSD to play nicely have not turned out well, but I thought I may as well give it another go and downloaded java for 32bit Linux. I installed java and set my PATH to point at it and then ran arduino again. Much to my surprise the IDE actually runs and seems to mostly work, apart from steadfastly refusing to see any serial ports. I was able to build an example sketch but I cannot upload it to a board because of the serial port problem. The IDE gives you the option to upload using a programmer, in which case it just calls avrdude; this fails because libusb is missing. A quick look in NetBSD pkgsrc shows that avrdude is actually there, so I can build a native version of this tool. In fact, now that I look, it seems that all the "avr" tools required to crossbuild binaries for the arduino boards are available in pkgsrc. So I could build native tools and use a makefile to produce and upload the binaries. I think I will stick with just replacing avrdude for the moment.
As for a target... I have a leostick which was given away at the 2012 LinuxConf.au. I have tried plugging this into my laptop and it is attached as a keyboard and mouse, which is not very helpful; no serial port is attached. Apparently this works on some later versions of linux, I guess due to changes in the CDC ACM driver. In NetBSD this seems to be handled by umodem, but I haven't yet managed to get it to attach with this driver. We shall see how it goes.
Sunday, June 03, 2012
Oops
OK - This place was ignored for quite some time, let's see if we can do something regular from now on.
What has happened in the last 4 and a bit years? Well, verifiedexec per page signatures is still not in the tree. I merged the bottom end of the uvm routines but there was push-back from others on how I had done this, citing the long function argument list as not being a good thing. I have the actual code working, not only uvm but also the per-page verifiedexec stuff. I wrote a simple test that consisted of a binary stored on a NFS share - the binary just calls a couple of functions located in different memory pages (I inserted a bunch of unused code to get the functions on different pages). There are two versions of the binary: the "good" version, which has a matching veriexec signature and is the one initially run, and the "evil" version, which has a modified second page. The test consisted of starting the "good" binary and checking both function calls are the good ones, then overwriting the "good" binary with the "evil" one on the NFS server; once this is done the resident executable pages were flushed using msync (this is just a short cut, you could force resource starvation instead). With an unmodified veriexec, the "evil" function would be executed; with the per-page modification the binary was terminated as soon as the pager attempted to bring in the modified page. This is good but I really need to fix up the UVM modifications to make them less convoluted.
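The page flush in the test boils down to a single call (addr and len here are placeholders for the mapping of the binary's text):
/* force the cached pages out so the pager must fetch them afresh */
if (msync(addr, len, MS_INVALIDATE) == -1)
	err(1, "msync");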
Part of the reason veriexec hasn't progressed much in the intervening time is that I started working on automatically testing libcurses. This is quite a complex thing to do because curses expects to talk to a terminal. I developed a test framework that takes a simple list of commands and runs them against a test program and then verifies the output matches what is expected. This framework works with the NetBSD ATF (Automated Test Framework) and has been committed to the tree along with a small set of tests. More tests are on the way, there are an awful lot to write and it is a slow process but definitely worth it as a number of previously unreported bugs have been found and fixed.
I talked about both veriexec and the curses testing at BSDCan 2012; the actual talks are here for veriexec and here for curses testing, and there are papers and the slides from the presentations available at the aforementioned links.
Wednesday, September 19, 2007
Oh my, where did that year go?
Oops - I thought this would happen. Never was one for a regular diary, oh well.
The generic hook stuff looks like it won't be going anywhere, too bad. Wasn't really that much work to port it and it passed the time on the train.
Per-page veriexec is in a much better state now, it works fine and is stable but it depends on other modifications which I need to get in first.
I need to post up some diffs on some work I did to unify the bottom ends of genfs_getpages() and uvm_aio_iodone() into a single routine that is called from both. This gives me an ideal place to put a hook to check pages coming in from storage. This function has gone through a few rounds of refinement - teasing out a bit of code that is mostly the same but slightly different between the two uses is a bit hairy, getting a sane argument list (or even understanding the purpose of some of the arguments) has required a bit of work.
The thing that started me thinking about my blog again was this article
http://www.cbc.ca/news/background/tech/privacy/white-list.html
Way to go Symantec - NetBSD has been able to do this for years, great to see that the Windows world may actually be innovating their way to where we have been for a long time.
Other NetBSD stuff done over the last year includes integrating wide curses support. This was originally done by a student as a Google Summer of Code project and was supposed to be integrated soon after the SoC finished, but it never happened. I finally had a combination of spare time and will to make it happen; the code integration was not too hard and, all in all, there was not much pain and lossage resultant. We still have what looks like a refresh bug, but it seems to be relatively rare; I really need to look at it before the release with wcurses gets kicked out.
Also been messing with ACPI on my laptop recently; the nasty thing refuses to go to sleep or power off. After trawling through the debug it seems that it is spinning on trying to power down the PCI bus. I am unsure why the bus never sleeps; it works fine in Windows (no surprise). I followed the instructions and extracted the AML from memory and hacked it to take out the PCI bus shutdowns. With the modified code the machine sleeps and powers off, the only problem being that the back light does not turn on when recovering from a sleep (S3). Even the trick of setting a bios password does not help. Another developer has a tool that is supposed to reset the video card; I will try and get a copy of that and see if it helps - if it does then running the tool on wakeup is not hard to arrange.
Monday, September 04, 2006
Are we there yet?
Things have been slowly moving forward on the development front. I submitted a generic hooking infrastructure to NetBSD core for a decision as to whether or not it should be committed to the tree. A decision is still pending on that. What I have done is a direct port of the FreeBSD eventhandler stuff but there is a general dislike about the large macro needed to work some of the magic. We shall see what happens.
On the veriexec front, I updated my kernel sources and re-merged my per-page veriexec changes with the changes made by Elad when he added the fileassoc facility. I extended the fileassoc facility to allow a "hint" to be used instead of the implicit VOP_GETATTR() that the fileassoc code was using, since in the per-page code there were places where trying to get the file attributes was the wrong thing to do, and it also meant I could use the fileassoc facility as a generic hash table to do the vnode pointer <=> veriexec entry mapping. This change was not liked by some because it was seen as an attempt to sneak the fileid back into the interface. It was more a lack of choice: at the time, using filehandles (VOP_VPTOFH()) was not an option because the NFS client code was lacking support for the VOP_VPTOFH() call. This would have meant a major regression as I would not be able to support NFS - very bad. After a bit of a thrash around with another developer (yamt), the problem with NFS was raised and yamt kindly offered to make good this hole in the filehandle support. That support was added over the weekend. While I was waiting, I converted fileassoc to use filehandles and re-factored the per-page modifications to also use filehandles. The per-page veriexec patch is getting smaller and smaller, which is good. I will give the fileassoc code a final once-over and then commit it. At that point I should be ready to re-spin the per-page veriexec code and finally get it into the source tree - that day will be a great day for me.
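To make the filehandle version concrete, the lookup ends up looking something like this - a sketch only: veriexec_table_lookup() is a made-up name, and I am assuming a composefh-style helper that allocates the variable-length handle:

static struct veriexec_entry *
veriexec_lookup_fh(struct vnode *vp)
{
	fhandle_t *fh;
	struct veriexec_entry *vte;

	if (vfs_composefh_alloc(vp, &fh) != 0)
		return NULL;		/* fs cannot produce a handle */
	vte = veriexec_table_lookup(fh); /* hash on the handle bytes */
	vfs_composefh_free(fh);
	return vte;
}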
Wednesday, May 24, 2006
Long time between drinks
Been a while since I did this; a lot of things have happened.
Managed to fix up per-page verified exec. I was horribly abusing the UVM subsystem by setting flags I shouldn't have been touching, hence the horrible and weird crashes - I was claiming back pages that no longer belonged to me. Ugly. I found this with some help from Chuq. Fixing it was a bit more involved: I needed to unlock the pages to avoid a deadlock when I did a VOP_GETATTR, but I needed the pages back so I could check their fingerprints, which meant I needed to do a getpages to get the pages back - but the function I was in was being called by getpages... round and round in circles. The only reason I needed VOP_GETATTR was to get the device and inode numbers so I could look up the entry in the veriexec hash tables. Chuq suggested I could get around the circular references by setting up a hash table, indexed by the vnode pointer, that pointed to the details I needed. This neatly solved the issue but presented its own problems; firstly, I needed to associate the vnode with the device and file ids. I extended getnewvnode() to add two new arguments so the device and file id could be passed in; this way an association between the new vnode and the veriexec entry can be made in the vnode hash table. Doing this required touching a lot of file system code to get the right numbers passed in. The worst was NFS, where the device and file id are not available when the new vnode is allocated; in this case I defer the vnode->veriexec association until the VOP_GETATTR is done. This should be safe since any file that is executed or read needs to get the permissions first, so the association will be set up before it is needed. The other end of this is, of course, that when the vnode is no longer required and gets recycled for a new use, the old association needs to be broken. Chuq suggested a hook that gets run when the vnode gets cleaned (more on hooks later); at the moment I have hard-coded a clean up into the vnode recycle code, which works but is not the best. After doing all this and having a few false starts I finally have per-page veriexec where it should be: fs independent and functional. I have sent the diffs off to Elad so that he can look at them and also so I am not the only one with them (mmm paranoia).
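Conceptually the association table entry is nothing more than this (names made up; the real table hashes on the vnode pointer):

#include <sys/queue.h>

struct veriexec_assoc {
	struct vnode	*va_vp;		/* hash key: the vnode pointer */
	dev_t		va_dev;		/* device the file lives on */
	ino_t		va_fileid;	/* file id on that device */
	LIST_ENTRY(veriexec_assoc) va_hash; /* hash chain linkage */
};

/* made when the vnode is created (or at VOP_GETATTR time for NFS) */
void	veriexec_assoc_add(struct vnode *vp, dev_t dev, ino_t fileid);

/* broken when the vnode is cleaned and recycled */
void	veriexec_assoc_remove(struct vnode *vp);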
The follow-on project was to finally do the generic hook stuff so that I don't add yet another set of hook functions to the kernel just for the vnode association clean up. I posted a proposal to tech-kern and received a suggestion that we use the FreeBSD eventhandler code. I was happy with that idea; I have imported it into a private tree and modified things to suit NetBSD better. I have this all working now and have migrated some of the existing hooks over to the new code. I will update my private tree to the latest NetBSD, make sure nothing is broken, and then put up some diffs on tech-kern and see what happens. Once the generic hooks are in I will go back and modify the per-page veriexec stuff to use them to break the vnode->veriexec association.
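Once that is in, the hard-coded clean up in the vnode recycle code reduces to registering a handler - a sketch against the FreeBSD-style API, with made-up names matching the association code above:

/* called via the generic hook whenever a vnode is cleaned */
static void
veriexec_vnode_clean(void *arg, struct vnode *vp)
{
	veriexec_assoc_remove(vp);	/* drop the vnode->veriexec mapping */
}

EVENTHANDLER_REGISTER(vnode_clean, veriexec_vnode_clean, NULL,
    EVENTHANDLER_PRI_ANY);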
Saturday, January 14, 2006
of misdirection and other things
I managed to haul the UPS battery pack down to The Battery Bloke shop today - not an easy task, but taking the car seemed such a waste. The batteries are standard sealed lead acid jobs so no problems sourcing them. All up $140, including the labour involved in removing the plastic terminal covers and connector and attaching said items to the new batteries.
I have been trying to track down some annoying bugs in the page level verified exec stuff. There seem to be a few bugs that have been hiding each other. Once, after whacking on what I thought was the right bit of code, I eventually worked out the bug was in a totally different spot; fixing that bug made one of the crashes go away. I hate that: you have a segment of code you are not certain about and are sure there are bugs lurking in there, so you start whacking away, and after a concerted effort you work out that that bit of code is not involved at all in making things crash. Now I am at the point where things work exactly twice; on the third time the kernel either panics or weird stuff happens. The problem is that the panic or weird stuff happens in a seemingly unrelated place - the damage is done elsewhere but it only circles around and bites later on, which is rather difficult to debug. Again, for a long while I thought the problem was with the way I was releasing pages, but even totally disabling the page release code does not stop the kernel panic; the fault lies elsewhere in the code. I shall try just removing code until things start working so I can narrow down what is being done incorrectly. Tedious.
Monday, December 26, 2005
bug hunt
I received some help with qemu on the NetBSD mailing lists: if I add -monitor stdio to the qemu command line then I get the monitor command line in the xterm where I started qemu. This lets me send key events to the running emulation. I wanted this so I could drop NetBSD into ddb inside qemu and tweak variables in the kernel. With the added command line arguments I can now do this.
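For reference, the whole exercise looks like this (the image name is made up; ctrl-alt-esc is the usual NetBSD console sequence for dropping into ddb):

$ qemu -monitor stdio -hda netbsd-test.img
(qemu) sendkey ctrl-alt-esc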
I wanted ddb so I could turn on the UVM history printing. I tried running with the history printing on by default, but this produces so much output that it has a severe performance impact, and I was never sure what was happening because everything scrolled up the screen so fast. So I thought I could disable the printing by default and just enable it before I run my tests - hence the need for ddb. Unfortunately, even enabling the UVM history printing just prior to running my tests still produces so much output that it does not help. So I will try enabling the printing only in the veriexec page routines to keep the amount of information manageable.
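Inside ddb it is just a matter of poking the controlling variable (uvmhist_print_enabled is a guess at the name for illustration - substitute whatever the kernel actually calls it):

db> write uvmhist_print_enabled 1
db> continue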
Saturday, December 24, 2005
assaulted battery
Bleh. The batter(y|ies) in my UPS ha(s|ve) decided to finally die. Of course, this did not happen at a convenient time when I could wander down the road to "The Battery Bloke" and order some new ones; oh no, here we are in the middle of some craziness to do with the shortest day of the year in the Northern Hemisphere (the longest day of the year where I am), so everyone has shut down for some rampant consumerism. Fortunately, the power is pretty reliable where I am, so the UPS is really more for comfort value in the face of storms and the like. I can plug everything directly into the mains and soldier on until the new year.
Another day, another kernel panic
Been trying to get verified exec on NetBSD to work correctly at the page level. I have worked on this off and on for quite a while. I did have it working fine but, unfortunately, it relied on all the filesystems calling genfs_getpages(), which does not happen. I have shifted the code into the uvm getpages call, but now when I force a modified page detection and try to repeat the test things seem to fall apart: pages get flagged as modified when they are not, and sometimes the kernel panics in genfs_putpages(). Unfortunately the kernel core from this panic does not show much; I suspect that the cause of the panic is well in the past and it is just UVM tripping over something bad I have done to it or something I have omitted to do.
I managed to clean up some bugs by compiling with DIAGNOSTIC set in the kernel config. I wanted to use UVMHIST and UVMHIST_PRINT, but the output they produce is way too verbose. I disabled the uvm history printing by tweaking the controlling variable, with the idea that I would do the setup for my tests, tweak the uvm log printing on, and then run the test. This plan has fallen apart somewhat because I am using the qemu machine emulator to provide me with a crashbox machine without requiring extra hardware, which is always an advantage when you try hacking code during your daily commute to work on the train.
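For the record, DIAGNOSTIC and the UVMHIST knobs are plain kernel config options:

options 	DIAGNOSTIC	# internal consistency checking
options 	UVMHIST		# record a history of UVM operations
options 	UVMHIST_PRINT	# print the history as it is logged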
Qemu has actually worked very well for me. It has saved a lot of time by allowing me to keep my development environment running while the kernels panic in their qemu sandbox, and it also means I don't put my file systems at risk when the kernel crashes. Making a backup of the qemu hard disk is a simple copy and I can just copy the backup back if the machine gets trashed. I can manipulate the qemu hard disk by using the vnd file-as-a-disk-image driver (the vnode disk driver) to copy files to and from the qemu disk image, so installing a new kernel or retrieving the kernel core dumps is quite easy - just a matter of a few short scripts to make life convenient (the basic dance is shown below). The only problem I have is that the keystrokes for generating keyboard events for "special" keys, like dropping to ddb or changing virtual consoles, are a mystery to me - the documented ones don't work. For example, ctrl-alt is supposed to release the focus from the qemu window after it has been locked there by clicking in the window. This does not seem to work, and from a quick look at the qemu code I cannot see how these events can be generated in the SDL input handling code - trying to use a special key combo like ctrl-alt-f1 changes the virtual console on the host machine, not in qemu. I have put the question about this up on current-users@n.o; we shall see if there is an answer, otherwise I shall just hack something into qemu that will do what I want... ahhh the joys of having the source.
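The vnd shuffle for moving files in and out of the image is short (vnd0 and the 'a' partition depend on how the image is laid out):

vnconfig vnd0 netbsd-test.img	# attach the image to the vnode disk driver
mount /dev/vnd0a /mnt		# mount the root partition of the image
cp netbsd /mnt/netbsd		# install the freshly built kernel
umount /mnt
vnconfig -u vnd0		# detach the image again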