Sunday, March 02, 2014

Fixing what fsck cannot fix

Before I start, lets get this out of the way....

WARNING: The tools and techniques described below, if misapplied, will turn your file system consistency to that of warm custard.  If you are not careful you can irreparably damage your data and lose files.  Be warned.

Next, do a backup of your data - right now.  So you don't have to try and salvage a busted file system.  It is a lot less stressful.

Now that we have those public service announcements out of the way...  It came to pass that I wanted to use my NetBSD on a stick but it was having problems booting.  I booted from my HDD and did a fsck of the memory stick filesystem which found a few errors probably the result of an ungraceful shutdown.  One of the files it complained about was the kernel (/netbsd) which would explain the problems booting but it also said that the /mnt directory was corrupted.  Fsck complained about a missing . and .. entry then claimed everything was fixed.  I have a habit of doing a second fsck if the first one found errors and fixed them, just to be certain everything was caught and fixed.  Unfortunately for me fsck still did not like /mnt and went through the same motions as the first time.  Not a good look.

I thought the easiest way may be just to remove the bad directory so I mounted the memory stick and tried to rm the mnt directory on the stick.  Nope, rm said the directory was not empty.  I tried to cd into the directory *kapow* kernel panic due to an inconsistent file system (no surprise there).  So, what to do?  fsck won't fix the error, I could rebuild the stick but I didn't want to spend the time doing that.

Enter the fsdb(8) command, this command allows you to perform some low level manipulation of a file system.  It will allow you to do things to a file system that you are normally prevented from doing - in other words you have the power to make a real mess if you do the wrong thing.  What I wanted to do was just force a removal of the broken mnt directory which is quite easy to do.  For me my memory stick was identified as sd0 and the root file system therein was on the "a" partition so I used fsdb to open the file system:

fsdb -f /dev/rsd0a

which printed a bunch of information about the filesystem and put me at a fsdb prompt.  To remove the bad directory I wanted to do two things, firstly clear the inode associated with the mnt directory and then remove the mnt entry from the parent directory.  To clear the inode we need to know the inode number, there are quite a few ways to get it, "ls" inside fsdb will show the directory entries which contains the inode number, ordinary ls from the command line could be used too if the file system is mountable.  For me, I had the report from fsck that inode 10336 was corrupt and that is was the mnt directory.  I validated this information using the ls in fsdb.  So, knowing the inode, clearing it is a matter of using the "clri" command:

clri 10336

fsdb confirmed the inode is cleared.  Now remove the directory entry from the parent directory (in my case this was /):

cd /
rm mnt

this will invalidate the directory slot associated with the name given.  Once this is done I just quit fsdb:

quit

Fsdb prints a warning message that the file system has been marked dirty and that a fsck is required to clean up any damage.  Following that advice I ran:

fsck -y /dev/rsd0a

and let fsck clean up after the surgery.  The file system cleaned up with no major problems.  I mounted up the memory stick and copied a new kernel onto it since the old one appeared to be mangled.

After that NetBSD on a stick worked fine... much easier than recreating the whole thing from scratch.
Again, you really should not be running fsdb on a filesystem you care about, you should have good backups and not need to resort to this level of skullduggery to recover but fsdb is handy to know about when things go really bad.

Thursday, February 20, 2014

the fix for mvderwin is in

I have been quiet for a while trying to fix the deceptively named mvderwin() function in libcurses.  Going by the name you would imagine that if derwin() creates a sub-window with coordinates relative to the origin of the parent window then mvderwin() should move a sub-window with coordinates relative to the origin of the parent window.  Except it doesn't.  What mvderwin() really does is create a mapping of the specified portion of the parent window into the sub-window, the sub-window does not change location at all.

Fixing the mvderwin code mainly involved tweaking around the screen refresh code so that when it found a sub-window that had been changed using mvderwin() the refresh code copied the characters from the correct place in the parent window into the sub-window region.  Working on the refresh code in libcurses is probably the most difficult thing to do as it is quite complex and easy to break in strange ways.  Fortunately, I have the automated tests for curses to help pick up any problems which is exactly why I wrote the curses automated testing in the first place.  Being able to run a test suite enables me to check I have not broken previously working code and also check the behaviour of my fixes to ensure they are outputing exactly what I think they should be.  The latter is probably the hardest thing to do in curses testing.  It is quite easy to make some changes that displays correctly but is doing things badly in an invisible way.  Something like outputting blanks when it does not need to, just by looking it would be difficult to tell whereas the automated testing suite flags the output as unexpected right away.

Friday, January 03, 2014

Google Summer of Code Mentor Summit

The Google Summer of Code mentor summit happened on the 19th and 20th of October last year.  It has taken me this long to find enough time to sit down and write something about it.

I have been a mentor for the summer of code since its inception.  I have mentored a few students to a successful conclusion.  I don't really take any credit for this, the students did all the hard work and made things happen.  All I did was provide some guidance now and then.  It has been a great experience not only helping someone new work on NetBSD but also get some great outcomes for the NetBSD project.

Every year Google bring a couple of mentors from each of the participating organisations to their Mountain View headquarters for a summit where mentors can share their experiences and learn from each other.  It also is a chance for Google to say "thank you" to the mentors for their efforts in helping make the Google Summer of Code work.  I had been wanting to go to one of these summits for a long time but just did not have the opportunity to leave home for all the previous ones.  Finally, I was able to put my hand up to go.

I arrived in California a few days early so that I could get over the jet lag and catch up with other NetBSD people before the mentor summit.  As the time got closer to the mentor summit the hotel started filling up with mentors, you could see a lot of t-shirts from previous Google Summer of Codes which helped identify the mentors.  One of the great things I found early on was the sense of community.  Other mentors would recognise the t-shirts and start chatting to you.  I remember one time I was wandering around the hotel and another mentor said "oh, hey, you are a mentor too! We are going off to the computer history museum.  You want to come along?" - they guy didn't know me at all but was happy to invite me along on a trip.  I also ran into another mentor waiting at a bus stop and, like me, was heading into San Francisco for a bit of sight seeing.  We chatted on the bus and train about our projects and what we had done for the Summer of Code.  It was good to be part of such a friendly and inclusive bunch.  All through the summit the most common questions seemed to be What project are you from?  and Where are you from?

The summit itself started with a get together around the hotel pool on Friday night.  From that point on I felt totally looked after.  The people from Google had organised everything, be it food and drink friday night, to buses to and from the Google headquarters, all the food, drink and coffee one could wish for there and also the party saturday night.  It was great to be so spoilt.  A big thanks to all the Googlers who put all that together.

At first the summit itself was a bit hard to get my head around.  I am used to conferences where the schedule is fixed in advanced and you pick what you want to listen to and then find the room.  The summit is very different to this, it is an "unconference".  The first thing I realised is that it is important to actually vote on the talk proposals that were put up well in advance of the summit.  Anyone can propose a talk and the proposals that get the most votes get a room to hold that talk, the number of votes garnered determines the size room so it is important to vote (oops, I didn't).  The next is the time and location of the talk can be very fluid - if the time or location doesn't suit it can be changed on the day so checking the board with the talk locations on it regularly is a must.  Once I had the concepts sorted it was not so bad, I attended some very interesting talks.

The summit was over all too fast and now I am looking forward to attending another one.  It is truly a unique experience and one I can highly recommend.

As a footnote and on a totally different subject, travelling with a device that has GPS is such a boon.  I feel far more confident in wandering around know that I can easily work out where I am and how to get back to home base.  I have an android phone and was using OSMAND+ which allows me to navigate without needing an internet connection unlike a lot of other mapping apps.

Thursday, September 12, 2013

Sniffing the USB for answers

Previously I mentioned that part of what helped tracking down the problems with the arduino uno attach was sniffing the USB to see what the bus transactions were doing.  I thought I would write a quick piece as to how I did this.

The first bit was buying a logic analyser.  I looked at getting a dedicated USB analyser but they tend to be a bit on the pricey side and, besides, they are very specialised - I don't expect I will be doing too much USB sniffing so buying a special device just to sort out a particular bug was hard to justify.  Since USB uses standard logic level signals I could get away with capturing the logic levels and using some software to decode what that translated to in terms of USB data.  The added bonus is that I would have a tool for decoding other protocols such as SPI, one-wire, I2C, serial and so forth.  There are lots of options for buying a logic analyser and the price range varies widely depending on the capabilities of the analyser.  For me, I needed something that was fast enough to sample a full speed USB (that runs at 12MHz) and at least two channels, preferably more.

There are a lot of logic analysers based on the Cypress FX2 chips, these guys are just dumb samplers that feed the data back to a host PC over USB - all the triggering and data storage is done on the host.  There were cheaper options but I settled on a salea logic because it was housed in a decent case, had a nice set of probes, a carry case along with software that seemed decent - you can download and run the software without the analyser hardware to try it out which is good.  The salea gave me 8 inputs and just enough bandwidth to sample a full speed USB.

The next step was the software.  The salea came with software which is functional and has some decoding facilities but, unfortunately, those facilities don't extend to decoding USB transactions.  To decode the USB transactions I used the sigrok open source software to perform the decodes.  Sigrok has firmware for the salae logic device, the firmware is loaded when the device is opened (i.e. it is not stored in flash on the device, it is downloaded every time into RAM) so there is no harm in using their firmware.  I ended up not using this feature, because my NetBSD machine was the device under test I used another laptop running Windows and the salea software to perform the data captures.  The captures were exported as VCD (value change dump) files which I copied to my NetBSD machine and ran sigrok-cli  on the data.

To perform the captures I needed to break out the USB signals so I could attach the logic analyser.  To do this I bought a short USB extension cable, cut it in half and soldered the wires to a small piece of veroboard.  I soldered a bit of single line header to the veroboard so each wire in the cable was brought out to a header pin that I could connect the logic analyser probes to.  I wasn't going to bring out the +5V rail but when I was making up the board I forgot and did all four wires.. oh well.  The probe wires from the salae plug neatly onto the header which makes the arrangement safe as well as physically well connected.  Here is a snap of what the breakout board looks like with the probes attached:










Fortunately for me, the cable manufacturer followed the USB hardware specification and used the specified colours for the wires so it was easy to pick the correct wires for probing.  The spec says white is the D- signal, green is the D+ signal and black is ground.  So I connected probe 0 (black) to the white wire and probe 1 (brown) to the green wire.  As a side note - astute readers may have noticed that the probe wire colours correspond to the resistor colour code which makes it easy to know what probe number you are dealing with - ground is sleeved to and labelled to distinguish it from the probes.  Once the probes were connected, I plugged the extension cable into my NetBSD machine, hooked up the salea to the windows machine and started the sampling software.  I configured a trigger to start sampling on the falling edge of D+ as this transition will happen when a device is plugged in and to sample at 24MHz.  I then connected the arduino to the other end of the extension cable.  The analyser triggered and started saving samples.  I let the attach process for the arduino complete, then stopped the capture and exported the sample data as a VCD file.

Analysing the data was a challenge.  The sigrok software is capable of doing it but, to be blunt, their documentation is rather poor.  I ended up on their IRC channel to ask for help.  The guys there were friendly and guided me through what I needed to do.  Once you understand what they are doing it makes sense, the sigrok software uses layers of decoders to analyse the data stream.  For usb they start with a layer called usb_signalling that takes the raw data samples and converts them into USB data.  The next layer is called usb_protocol which takes the output from usb_signalling and interprets the data bytes as USB protocol information.  All the decoders are written in python, it is quite easy to modify existing decoders or write your own.  The command line I used for the decoding was:


sigrok-cli --rate 1000m -i netbsd.vcd -I vcd:compress -P usb_signalling:dm=0:dp=1,usb_protocol

Note that the --rate option is one of my own making, there is a bug currently in sigrok-cli in that it fails to determine the sample rate from a VCD file and will crash. I added --rate to allow me to set the sample rate, this is a hack only, the sigrok developers are looking at a more elegant solution. Also, the sample rate is 1GHz because that is the sample rate set in the VCD. The next argument is the input file (netbsd.vcd) and then the input format (vcd) with the compress flag set to speed up the file reading. Then the -P option specifies the protocol decoders - first usb_signalling which has two arguments that define which probe numbers were connected to what signals - dm is D- and, as above, this was probe 0, similarly dp is D+ and was probe 1. The next decoder layer is usb_protocol which performs the protocol decode. On the machines I have the decode was very slow, it took almost half an hour of churning before sigrok-cli started spitting out any decoded data and took many hours to decode about 30 seconds of sampled data. The end result was output that looks like this:

usb_protocol: "SOF 1781" 
usb_protocol: "SETUP DEV 0 EP 0" 
usb_protocol: "DATA0 00 05 02 00 00 00 00 00" 
usb_protocol: "ACK " 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "NAK " 
usb_protocol: "SOF 1782" 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "DATA1 ()" 
usb_protocol: "ACK " 
usb_protocol: "SOF 1783" 
usb_protocol: "SOF 1784" 
usb_protocol: "SOF 1785" 
usb_protocol: "SOF 1786" 

This actually shows the SetAddress USB request that NetBSD was sending to the device, not only that but shows the problem that we were having - the "IN" command there is the status request for the SetAddress.  The first try results in a "NAK" - negative acknowledgement, the device was too busy to answer, the usb controller retries the IN again and gets the status back - unfortunately by this time it seems that the atmel USB microcode is confused and things don't work right from then on.  What we observed with windows and linux is that the "IN" for the status was delayed until the next start-of-frame (SOF) which gave the device time to process the request before getting hit by a demand for status.  Modifying the NetBSD driver to do the same made things work.

The decoding was not always perfect, I think because the data rate of the USB was right at the limit of the sample rate for the logic analyser (remember Mr. Nyquist says you need a sample rate of at least twice your highest frequency to properly sample data) so sometimes I would see decode errors but, fortunately, the decode mostly worked right and provided us with valuable clues.

Friday, September 06, 2013

Arduino uno/Freetronics etherten attaches now

It took a lot of thrashing about and false trails but we now have the aduino uno (plus others) attaching and also the atmel processors will attach in DFU mode (used for flashing the boot loaders) whereas before they would fail with an address error.  I had to resort to sniffing the USB transactions using a salae logic and decoding with sigrok-cli.  What this showed was the uhci driver was was feeding the control request bus transactions in too fast.  By changing how the transactions were fed and also disallowing what is known as "short" transactions for the control requests.  After fixing these things the attach worked fine.  Many thanks to fellow NetBSD developer Nick who waded in, helped with suggestions and formulated the final working fix.  Hopefully a proper fix will hit NetBSD-current soon.

Sunday, August 04, 2013

Another Red Hat Cert

Just last week I sat and passed the dreaded RHS333 Enterprise Network Security Services exams.  The instructor I had said that he only knew of one person who passed these exams first sitting, that was himself.  Now he knows two...  Not that this matters now, Red Hat are deprecating this certification in favour of a new one - that doesn't invalidate what I have done but going forward there will be a new certification to take the place of RHS333.  Regardless, I thought I would put down my thoughts on how to prepare for these exams as it does apply to pretty much all the Red Hat certifications just in case it helps someone else.

Firstly, I set up a lab environment which was just an extension of what I had described in Linux lab in the lap posting.  I just added a couple of new networks but made them routed instead of NAT type networks because it seemed like cross network traffic was being NATed with the IP address of the bridge interface which messed up the IP address based access control exercises.  On one network (the "good" network) I kickstarted a couple of machines to perform the exercises on, on the other network (the "bad" network) I kickstarted another machine to use as the "bad guy" in the lab exercises.  Then I would just run through the lab exercises until I could do them without even thinking, rebuilding all the machines before going over the labs again so I would have clean machines each time.

Here are a few more tips:

  • If yu have the book from the associated course then do all the exercises in the labs and do them well, make sure you understand what you are doing.  Don't skip any because anything that is covered in that book can be in the exam.
  • If a lab does not work then find out why, this is very important.  If you screw up you need to know why so either you can immediately recognise the error in the exam and fix it or just avoid making the error in the first place.  I have seen someone screw up a lab exercise and went on to do the same error in the exam - they failed because of this.
  • You don't have to memorise everything.  One of my instructors said, "this is an open man exam" meaning that you have all the man pages and documentation that is part of the standard RHEL distribution during the exam.  You can install whatever you like on the exam machines.  So, knowing where to look for information instead of rote memorisation is a valid tactic.  A lot of the time the samples provided in the man pages, sample configuration files or under /usr/share/doc are enough to get you going during the exam.  Just cut and paste from the documentation into your config file and adjust to suit the exam requirements.  So make it part of the exam preparation to hunt down where the trickier configuration documentation is kept so you can immediately bring it up during the exam.  Going hunting during the exam wastes precious time.
  • If you are having problems with something during the exam that other things do not rely on then just move on.  You can always revisit the question later, perhaps you will have worked out what to do or have time to spare to fiddle around.  By moving on you may able to pick up extra marks on things you can get working instead of beating your head against a wall fruitlessly
  • Do the chkconfig on immediately after installing or starting any new service that has been asked for.  If you are asked to provide a service then it must come up on reboot - ensure it does.  Same with selinux settings and other system settings - make sure they are permanent from the get-go, saves having to rework later.  Same with firewall settings, if iptables is running then add the appropriate rules for the service as you go - I usually just edit /etc/sysconfig/iptables directly and reload the service, saves typing because you can copy another existing line (though, command line history would be just the same...)
  • Make sure you are careful with file modes and ownership - I must admit to being sloppy with this during the exam and it does cost me
  • Leave enough time for a couple of reboots and carefully check everything that has been asked for still works after the reboot.  I have found that after drilling on the lab exercises I have plenty of time for this, in all the exams I have faced after starting really working on the lab exercises in my own time I have found that I can easily complete the entire exam with plenty of time to spare.
Hopefully this stuff will make a difference for someone else...

Friday, July 12, 2013

NetBSD, sendmail and smtp auth

Firstly, I do like having a MTA on my laptop. It permits things like send-pr to work and allows me to queue up mails while I am off network and have them relay when a network appears. For better or worse, the default MTA that ships with recent versions of NetBSD is postfix.

I don't really like postfix much, I really prefer sendmail. I have used sendmail from back in the days when you had to hand craft your sendmail.cf, the m4 configuration stuff was not around. I am used to the rich debug options you get with sendmail, something I sorely miss with postfix. Being that postfix is the default MTA on NetBSD, I thought I would at least give it a chance and I have been using it for quite some time but a few days ago it did what I think is an unforgivable sin. Postfix refused to relay mail. It did so for no clear reason. Yes, initially there seemed to be a problem with my stunnel that securely tunnels a connection to my ISP smtps server but that resolved itself and yet mail still sat in the queue. A reboot (not that I rebooted due to that, my laptop gets rebooted a lot for other reasons) and also a restart of postfix did not move the mail. I had to tell postfix to push the mail out... that just sucks in my books, the MTA should try and flush the queue if there are pending messages when it starts up. So, time to bring sendmail back. This is how I configured sendmail to use the stunnel tunnel to my ISPs secure mail server and perform a smtp auth which my ISP requires.

Install stunnel from pkgsrc and then edit /usr/pkg/etc/stunnel/stunnel.conf. Add:

[smtps]
accept = 5000
connect = securemail.server.your.isp:smtps


setup stunnel to start at boot by adding an entry to the /etc/rc.conf and put a copy of /usr/pkg/share/examples/rc.d/stunnel into /etc/rc.d. Now stunnel should start on boot (actually, I have a dhclient hook script that restarts stunnel when I get an new IP). That should be it, if you start up stunnel then you should be able to test the connection by doing:

telnet localhost 5000


If all is well then telnet should connect and open a session, you should be able to see a smtp greeting. This is all you really need to do though it does make you vulnerable to a man in the middle attack where someone could manipulate your DNS to make stunnel think it is connecting to the secure mail server but is really connecting to the attackers host. Your ISP may provide a digital certificate that you can use to validate you are connecting to the correct machine (mine does). Put the certificate into /usr/pkg/etc/stunnel/certs.pem. My ISP only provided their certificate, I had to download the root certificate bundle from the certificate issuers web site myself. You can find out the issuer by running:

openssl x509 -in /usr/pkg/etc/stunnel/certs.pem -noout -issuer


Once you have the issuers certificate bundle, just append the certificates to the end of the certs.pem file (assuming all the certs are in PEM format of course). Then tell stunnel to start using the certificates by adding this to the stunnel.conf:

verify = 3
CAfile = /usr/pkg/etc/stunnel/certs.pem


Setting verify to 3 tells stunnel to verify the peer with a locally installed certificate. Restart stunnel and perform the telnet test again, you should again get a successful connection to your ISPs securemail server. If you don't then you have something broken, there is a debug option in the stunnel.conf if you set this on then a stunnel.log will be written into the location that stunnel is chroot'ed to. By default that is /var/chroot/stunnel/. If you see:

CERT: Verification error: unable to get local issuer certificate


Then your issuer certificate is either missing or incorrect.

Once stunnel configured and tested, it is time to move onto sendmail. Sendmail is available in pkgsrc, I normally build my own packages instead of installing binary ones. These are the steps I followed. To build sendmail with SASL support add this to /etc/mk.conf:

add PKG_OPTIONS.sendmail=   sasl


make and install sendmail. Copy the sendmail and smmsp in /usr/pkg/share/examples/rc.d/ to /etc/rc.d.
Also install cy2-plain otherwise you will get AUTH=client, available mechanisms do not fulfill requirements in the maillog if you try and use PLAIN for smtp authentication. I am guessing that if a different authentication scheme is used then you should install the matching cy2 package to get support for the scheme.

With sendmail installed we next turn to the sendmail config. Firstly create a file in /usr/pkg/share/sendmail/cf/. I normally use the convention of machinename.mc where machinename is the hostname of the machine the sendmail configuration is for. The file had the contents:

include(`../m4/cf.m4')
VERSIONID(`@(#)machine.mc $Revision: 1.1.1.1 $')
OSTYPE(bsd4.4)dnl
FEATURE(masquerade_envelope)dnl
FEATURE(`authinfo',`hash /home/sendmail/client-info')dnl
define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
define(`SMART_HOST', [127.0.0.1])dnl
define(`RELAY_MAILER_ARGS', `TCP $h 5000')dnl
define(`ESMTP_MAILER_ARGS', `TCP $h 5000')dnl
define(`confCHECKPOINT_INTERVAL', 10)dnl
define(`confMESSAGE_TIMEOUT', 3d/4h)dnl
MASQUERADE_AS(your.isp.domain)dnl
LOCAL_USER(root)
MAILER(local)dnl
MAILER(smtp)dnl

Usually people put the file for authinfo under /etc/mail, I have put mine in /home/sendmail mainly because my /home partition is encrypted using CGD so, without my CGD passphrase, people cannot steal my authentication details even if they have physical access to the machine. Note also that the number in the RELAY_MAILER_ARGS and ESMTP_MAILER_ARGS (in my case 5000) must match the number defined on the accept line in the stunnel.conf file.

Create the sendmail configuration file using:

make machinename.cf


Copy the resulting machinename.cf to /etc/mail/sendmail.cf. If this is a first time install of sendmail you may need to make and install the submit.cf file into /etc/mail too, the submit.mc is in the same directory as the custom configuration file source.

The above sendmail configuration expects a sendmail hash database in /home/sendmail/client-info, to create this we create a file /home/sendmail/client-info with the contents:

AuthInfo:127.0.0.1 "U:username@your.isp.domain" "I:username@your.isp.domain" "P:sekret!"


Since we have stunnel listening on port 5000 on localhost all we need to define is an entry for localhost. The U and I parameters are for authentication and authorisation, usually they are both the same and most places seem to use your email address. It is best to check these details with your mail provider. The P field is the password for the account.
After the client-info file is created, change directory to /home/sendmail (or where-ever you placed the file) and run:

makemap hash client-info < client-info


To create the hash database. Make sure the directory and all the files in it are only readable by root.

Next we need to update the /etc/mailer.conf to tell NetBSD to use sendmail instead of the default postfix. Just run this command:

ln -fs /usr/pkg/share/examples/sendmail/mailer.conf /etc/mailer.conf


Finally, edit /etc/rc.conf and add:

postfix=NO
sendmail=yes
smmsp=yes


To turn off postfix and start sendmail at boot. At this point you should be able to shut down postfix, start sendmail and test sending emails.