Sunday, March 02, 2014

Fixing what fsck cannot fix

Before I start, let's get this out of the way...

WARNING: The tools and techniques described below, if misapplied, will turn your file system to the consistency of warm custard.  If you are not careful you can irreparably damage your data and lose files.  Be warned.

Next, do a backup of your data - right now - so you don't have to try to salvage a busted file system.  It is a lot less stressful.

Now that we have those public service announcements out of the way...  It came to pass that I wanted to use my NetBSD on a stick but it was having problems booting.  I booted from my HDD and did a fsck of the memory stick filesystem, which found a few errors that were probably the result of an ungraceful shutdown.  One of the files it complained about was the kernel (/netbsd), which would explain the problems booting, but it also said that the /mnt directory was corrupted.  Fsck complained about missing . and .. entries then claimed everything was fixed.  I have a habit of doing a second fsck if the first one found errors and fixed them, just to be certain everything was caught and fixed.  Unfortunately for me, fsck still did not like /mnt and went through the same motions as the first time.  Not a good look.

I thought the easiest way might be just to remove the bad directory, so I mounted the memory stick and tried to rm the mnt directory on the stick.  Nope, rm said the directory was not empty.  I tried to cd into the directory and *kapow* - kernel panic due to an inconsistent file system (no surprise there).  So, what to do?  fsck won't fix the error, and I could rebuild the stick but I didn't want to spend the time doing that.

Enter the fsdb(8) command, which allows you to perform some low level manipulation of a file system.  It will allow you to do things to a file system that you are normally prevented from doing - in other words you have the power to make a real mess if you do the wrong thing.  What I wanted to do was just force a removal of the broken mnt directory, which is quite easy to do.  My memory stick was identified as sd0 and the root file system therein was on the "a" partition, so I used fsdb to open the file system:

fsdb -f /dev/rsd0a

which printed a bunch of information about the filesystem and put me at a fsdb prompt.  To remove the bad directory I wanted to do two things: first clear the inode associated with the mnt directory, then remove the mnt entry from the parent directory.  To clear the inode we need to know the inode number.  There are quite a few ways to get it: "ls" inside fsdb will show the directory entries, which contain the inode number, and ordinary "ls -i" from the command line could be used too if the file system is mountable.  For me, I had the report from fsck that inode 10336 was corrupt and that it was the mnt directory.  I validated this information using the ls in fsdb.  So, knowing the inode, clearing it is a matter of using the "clri" command:

clri 10336

fsdb confirmed the inode was cleared.  Now remove the directory entry from the parent directory (in my case this was /):

cd /
rm mnt

this will invalidate the directory slot associated with the name given.  Once this is done I just quit fsdb:

quit

Fsdb prints a warning message that the file system has been marked dirty and that a fsck is required to clean up any damage.  Following that advice I ran:

fsck -y /dev/rsd0a

and let fsck clean up after the surgery.  The file system cleaned up with no major problems.  I mounted up the memory stick and copied a new kernel onto it since the old one appeared to be mangled.

After that NetBSD on a stick worked fine... much easier than recreating the whole thing from scratch.
Again, you really should not be running fsdb on a filesystem you care about.  You should have good backups and not need to resort to this level of skullduggery to recover, but fsdb is handy to know about when things go really bad.

Thursday, February 20, 2014

the fix for mvderwin is in

Hello! If you have landed here because of the link from Mr. Dickey's page you may want to consider reading my blog post here

I have been quiet for a while trying to fix the deceptively named mvderwin() function in libcurses.  Going by the name you would imagine that if derwin() creates a sub-window with coordinates relative to the origin of the parent window then mvderwin() should move a sub-window with coordinates relative to the origin of the parent window.  Except it doesn't.  What mvderwin() really does is create a mapping of the specified portion of the parent window into the sub-window; the sub-window does not change location at all.
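To make the difference between "move" and "map" a bit more concrete, here is a toy model in plain C.  It is only a sketch and is not libcurses code: the toy_parent and toy_subwin structures and the toy_* functions are hypothetical names made up for illustration, and the model assumes the parent window sits at the screen origin.  All it shows is that the mvderwin-style call changes which part of the parent the sub-window reads from, while the place the sub-window is drawn stays put.

```c
#include <assert.h>

#define PARENT_ROWS 3
#define PARENT_COLS 8

/* Hypothetical stand-in for a parent window: just a grid of characters. */
struct toy_parent {
    char cells[PARENT_ROWS][PARENT_COLS];
};

/* Hypothetical stand-in for a sub-window created with derwin(). */
struct toy_subwin {
    int rows, cols;    /* size of the sub-window */
    int scr_y, scr_x;  /* where it is drawn; set once and never moved */
    int map_y, map_x;  /* origin of the mapped area inside the parent */
};

/* Fill the parent with 'a', 'b', 'c', ... so every cell is distinct. */
void toy_fill(struct toy_parent *p)
{
    for (int r = 0; r < PARENT_ROWS; r++)
        for (int c = 0; c < PARENT_COLS; c++)
            p->cells[r][c] = (char)('a' + r * PARENT_COLS + c);
}

/*
 * Model of derwin(): the position is relative to the parent origin and
 * the mapped area starts out at that same position.  Assumes the parent
 * is at the screen origin, so screen and parent coordinates coincide.
 */
struct toy_subwin toy_derwin(int rows, int cols, int begin_y, int begin_x)
{
    assert(begin_y >= 0 && begin_x >= 0);
    assert(begin_y + rows <= PARENT_ROWS && begin_x + cols <= PARENT_COLS);
    struct toy_subwin sw = { rows, cols, begin_y, begin_x, begin_y, begin_x };
    return sw;
}

/*
 * Model of mvderwin(): only the mapping changes, scr_y/scr_x are left
 * alone.  Returns 0 on success, -1 if the mapped area would not fit
 * inside the parent (in which case nothing is changed).
 */
int toy_mvderwin(struct toy_subwin *sw, int par_y, int par_x)
{
    if (par_y < 0 || par_x < 0 ||
        par_y + sw->rows > PARENT_ROWS || par_x + sw->cols > PARENT_COLS)
        return -1;
    sw->map_y = par_y;
    sw->map_x = par_x;
    return 0;
}

/* The character a refresh would show at (y, x) inside the sub-window. */
char toy_subwin_char(const struct toy_parent *p,
                     const struct toy_subwin *sw, int y, int x)
{
    assert(y >= 0 && y < sw->rows && x >= 0 && x < sw->cols);
    return p->cells[sw->map_y + y][sw->map_x + x];
}
```

With a 2 by 3 sub-window created at (1, 4), toy_subwin_char() for position (0, 0) gives 'm'.  After toy_mvderwin(&sw, 0, 0) the same call gives 'a', yet scr_y and scr_x are still 1 and 4.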

Fixing the mvderwin code mainly involved tweaking the screen refresh code so that when it found a sub-window that had been changed using mvderwin() the refresh code copied the characters from the correct place in the parent window into the sub-window region.  Working on the refresh code in libcurses is probably the most difficult thing to do as it is quite complex and easy to break in strange ways.  Fortunately, I have the automated tests for curses to help pick up any problems, which is exactly why I wrote the curses automated testing in the first place.  Being able to run a test suite enables me to check I have not broken previously working code and also check the behaviour of my fixes to ensure they are outputting exactly what I think they should be.  The latter is probably the hardest thing to do in curses testing.  It is quite easy to make a change that displays correctly but is doing things badly in an invisible way, something like outputting blanks when it does not need to.  Just by looking it would be difficult to tell, whereas the automated testing suite flags the output as unexpected right away.

Friday, January 03, 2014

Google Summer of Code Mentor Summit

The Google Summer of Code mentor summit happened on the 19th and 20th of October last year.  It has taken me this long to find enough time to sit down and write something about it.

I have been a mentor for the summer of code since its inception.  I have mentored a few students to a successful conclusion.  I don't really take any credit for this; the students did all the hard work and made things happen.  All I did was provide some guidance now and then.  It has been a great experience, not only helping someone new work on NetBSD but also getting some great outcomes for the NetBSD project.

Every year Google bring a couple of mentors from each of the participating organisations to their Mountain View headquarters for a summit where mentors can share their experiences and learn from each other.  It also is a chance for Google to say "thank you" to the mentors for their efforts in helping make the Google Summer of Code work.  I had been wanting to go to one of these summits for a long time but just did not have the opportunity to leave home for any of the previous ones.  Finally, I was able to put my hand up to go.

I arrived in California a few days early so that I could get over the jet lag and catch up with other NetBSD people before the mentor summit.  As the time got closer to the mentor summit the hotel started filling up with mentors.  You could see a lot of t-shirts from previous Google Summer of Codes, which helped identify the mentors.  One of the great things I found early on was the sense of community.  Other mentors would recognise the t-shirts and start chatting to you.  I remember one time I was wandering around the hotel and another mentor said "oh, hey, you are a mentor too! We are going off to the computer history museum.  You want to come along?" - the guy didn't know me at all but was happy to invite me along on a trip.  I also ran into another mentor waiting at a bus stop who, like me, was heading into San Francisco for a bit of sightseeing.  We chatted on the bus and train about our projects and what we had done for the Summer of Code.  It was good to be part of such a friendly and inclusive bunch.  All through the summit the most common questions seemed to be "What project are you from?" and "Where are you from?"

The summit itself started with a get together around the hotel pool on Friday night.  From that point on I felt totally looked after.  The people from Google had organised everything, from food and drink on Friday night, to buses to and from the Google headquarters, all the food, drink and coffee one could wish for there, and also the party on Saturday night.  It was great to be so spoilt.  A big thanks to all the Googlers who put all that together.

At first the summit itself was a bit hard to get my head around.  I am used to conferences where the schedule is fixed in advance and you pick what you want to listen to and then find the room.  The summit is very different to this; it is an "unconference".  The first thing I realised is that it is important to actually vote on the talk proposals that were put up well in advance of the summit.  Anyone can propose a talk and the proposals that get the most votes get a room to hold that talk.  The number of votes garnered determines the size of the room, so it is important to vote (oops, I didn't).  The next is that the time and location of a talk can be very fluid - if the time or location doesn't suit it can be changed on the day, so checking the board with the talk locations on it regularly is a must.  Once I had the concepts sorted it was not so bad, and I attended some very interesting talks.

The summit was over all too fast and now I am looking forward to attending another one.  It is truly a unique experience and one I can highly recommend.

As a footnote and on a totally different subject, travelling with a device that has GPS is such a boon.  I feel far more confident in wandering around knowing that I can easily work out where I am and how to get back to home base.  I have an Android phone and was using OSMAND+, which allows me to navigate without needing an internet connection, unlike a lot of other mapping apps.