Monday, September 04, 2006

Are we there yet?

Things have been slowly moving forward on the development front. I submitted a generic hooking infrastructure to NetBSD core for a decision on whether it should be committed to the tree; that decision is still pending. What I have done is a direct port of the FreeBSD eventhandler code, but there is a general dislike of the large macro needed to work some of the magic. We shall see what happens.

On the veriexec front, I updated my kernel sources and re-merged my per-page veriexec changes with the changes Elad made when he added the fileassoc facility. I extended fileassoc to allow a "hint" to be used instead of the implicit VOP_GETATTR() that the fileassoc code was doing. There were two reasons for this: in the per-page code there are places where trying to get the file attributes is the wrong thing to do, and with a hint I could also use fileassoc as a generic hash table to do the vnode pointer <=> veriexec entry mapping. Some people did not like this change because it looked as though I was trying to sneak the fileid back into the interface by stealth. It was more a lack of choice. At the time, using filehandles (VOP_VPTOFH()) was not an option because the NFS client code lacked support for the VOP_VPTOFH() call. That would have meant a major regression, as I would not have been able to support NFS - very bad. After a bit of a thrash around with another developer (yamt), the problem with NFS was raised and yamt kindly offered to fill this hole in the filehandle support. That support was added over the weekend. While I was waiting I converted fileassoc to use filehandles and re-factored the per-page modifications to use filehandles as well. The per-page veriexec patch is getting smaller and smaller, which is good. I will give the fileassoc code a final once-over and then commit it. At that point I should be ready to re-spin the per-page veriexec code and finally get it into the source tree - that day will be a great day for me.
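To make the filehandle idea a little more concrete, here is a rough userland sketch of a table keyed by an opaque filehandle. This is not the fileassoc code or its interface: the names (fa_add, fa_lookup, FH_MAX, FA_BUCKETS) are made up for illustration, the handle is treated as nothing more than a short run of bytes, and the hash is plain FNV-1a. The point is only that a lookup keyed this way never needs the file attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FH_MAX     64   /* hypothetical upper bound on handle size */
#define FA_BUCKETS 32   /* hypothetical table size */

/* One association: an opaque filehandle (just bytes) and caller data. */
struct fa_entry {
    unsigned char fh[FH_MAX];
    size_t fh_len;
    void *data;
    struct fa_entry *next;
};

static struct fa_entry *fa_tab[FA_BUCKETS];

/* 32-bit FNV-1a over the handle bytes, reduced to a bucket index. */
static size_t fa_hash(const unsigned char *fh, size_t len)
{
    unsigned long h = 2166136261UL;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= fh[i];
        h = (h * 16777619UL) & 0xffffffffUL;
    }
    return (size_t)(h % FA_BUCKETS);
}

/* Add an association; returns 0 on success, -1 on a bad length or no memory.
 * Duplicate handles are not detected in this sketch. */
int fa_add(const void *fh, size_t len, void *data)
{
    struct fa_entry *e;
    size_t b;

    assert(fh != NULL);
    if (len == 0 || len > FH_MAX)
        return -1;
    e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    memcpy(e->fh, fh, len);
    e->fh_len = len;
    e->data = data;
    b = fa_hash(e->fh, len);
    e->next = fa_tab[b];
    fa_tab[b] = e;
    return 0;
}

/* Look up by handle bytes alone; returns the data or NULL if not found. */
void *fa_lookup(const void *fh, size_t len)
{
    struct fa_entry *e;

    assert(fh != NULL);
    if (len == 0 || len > FH_MAX)
        return NULL;
    for (e = fa_tab[fa_hash(fh, len)]; e != NULL; e = e->next)
        if (e->fh_len == len && memcmp(e->fh, fh, len) == 0)
            return e->data;
    return NULL;
}
```

Comparing the length as well as the bytes matters because handles from different file systems need not be the same size.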

Wednesday, May 24, 2006

Long time between drinks

It has been a while since I did this; a lot of things have happened.

Managed to fix up per-page verified exec. I was horribly abusing the UVM subsystem by setting flags I shouldn't have been touching, hence the horrible and weird crashes: I was claiming back pages that no longer belonged to me. Ugly. I found this with some help from Chuq. Fixing it was a bit more involved. I needed to unlock the pages to avoid a deadlock when I did a VOP_GETATTR, but I needed the pages back so I could check their fingerprints, which meant I needed to do a getpages to get the pages back, but the function I was in was being called by getpages... round and round in circles. The only reason I needed VOP_GETATTR was to get the device and inode numbers so I could look up the entry in the veriexec hash tables. Chuq suggested I could get around the circular references by setting up a hash table, indexed by the vnode pointer, that pointed to the details I needed. This neatly solved the issue but presented its own problems. Firstly, I needed to associate the vnode with the device and file ids. I extended getnewvnode() with two new arguments so the device and file id could be passed in; this way an association between the new vnode and the veriexec entry can be made in the vnode hash table. Doing this required touching a lot of file system code to get the right numbers passed in. The worst was NFS, where the device and file id are not available when the new vnode is allocated. In this case I defer the vnode->veriexec association until the VOP_GETATTR is done. This should be safe, since any file that is executed or read needs to get the permissions first, so the association will be set up before it is needed. The other end of this is, of course, that when the vnode is no longer required and gets recycled for a new use, the old association needs to be broken. Chuq suggested a hook that gets run when the vnode gets cleaned (more on hooks later); at the moment I have hard-coded a clean-up into the vnode recycle code, which works but is not the best.
After doing all this and having a few false starts, I finally have per-page veriexec where it should be: fs independent and functional. I have sent the diffs off to Elad so that he can look at them and also so I am not the only one with them (mmm paranoia).
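
For anyone trying to picture the vnode pointer table, a minimal userland sketch of the idea follows. It is not the code in my diffs: struct veriexec_entry here is a stand-in, and vassoc_set, vassoc_get, vassoc_clear and VTAB_BUCKETS are names invented for the example. The vnode pointer is used purely as a key, so a lookup needs no VOP_GETATTR; the set call can happen at allocation time or be deferred (as for NFS), and the clear call is what the recycle path would run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VTAB_BUCKETS 64   /* hypothetical size; must be a power of two */

/* Stand-in for the real veriexec entry (fingerprints and so on). */
struct veriexec_entry {
    int id;
};

/* One vnode -> entry association, chained per bucket. */
struct vassoc {
    const void *vp;               /* vnode pointer, used only as a key */
    struct veriexec_entry *ve;
    struct vassoc *next;
};

static struct vassoc *vtab[VTAB_BUCKETS];

static size_t vtab_hash(const void *vp)
{
    /* Drop the low bits, which carry little information for aligned
     * allocations, then mask down to a bucket index. */
    return (size_t)(((uintptr_t)vp >> 4) & (VTAB_BUCKETS - 1));
}

/* Create or update the association; returns 0 on success, -1 if out of memory. */
int vassoc_set(const void *vp, struct veriexec_entry *ve)
{
    size_t h;
    struct vassoc *a;

    assert(vp != NULL);
    h = vtab_hash(vp);
    for (a = vtab[h]; a != NULL; a = a->next) {
        if (a->vp == vp) {
            a->ve = ve;
            return 0;
        }
    }
    a = malloc(sizeof(*a));
    if (a == NULL)
        return -1;
    a->vp = vp;
    a->ve = ve;
    a->next = vtab[h];
    vtab[h] = a;
    return 0;
}

/* Find the entry for a vnode, or NULL if none has been associated yet. */
struct veriexec_entry *vassoc_get(const void *vp)
{
    struct vassoc *a;

    for (a = vtab[vtab_hash(vp)]; a != NULL; a = a->next)
        if (a->vp == vp)
            return a->ve;
    return NULL;
}

/* Break the association; this is what would run when the vnode is recycled. */
void vassoc_clear(const void *vp)
{
    struct vassoc **pp = &vtab[vtab_hash(vp)];

    while (*pp != NULL) {
        if ((*pp)->vp == vp) {
            struct vassoc *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Forgetting the clear step is the dangerous case: a recycled vnode would keep pointing at the previous file's entry. A real kernel version would also need locking, which the sketch ignores.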

The follow-on project was to finally do the generic hook stuff so that I didn't generate yet another set of hook functions in the kernel just for the vnode association clean-up. I posted a proposal to tech-kern and received back a suggestion that we use the FreeBSD eventhandler code. I was happy with that idea, so I have imported the code into a private tree and modified things to suit NetBSD better. I have this all working now and have migrated some of the hooks over to using the code. I will update my private tree to the latest NetBSD, make sure nothing is broken, and then put up some diffs on tech-kern and see what happens. Once the generic hooks are in I will go back and modify the per-page veriexec stuff to use them to break the vnode->veriexec association.
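
As a rough illustration of what "generic hooks" means here, the sketch below keeps a priority-ordered list of callbacks that can be registered, removed and invoked. It is deliberately not the FreeBSD eventhandler interface (that one is built around macros and named lists); every name in it (hook_register, hook_deregister, hook_invoke, example_clean_hook) is invented for the example, and it ignores locking entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A hook gets the argument given at registration plus per-event data. */
typedef void (*hook_fn)(void *arg, void *event_data);

struct hook {
    hook_fn fn;
    void *arg;
    int priority;          /* lower values run first */
    struct hook *next;
};

struct hook_list {
    struct hook *head;     /* kept sorted by priority */
};

/* Register a hook; returns a handle for later removal, or NULL if out of memory. */
struct hook *hook_register(struct hook_list *hl, hook_fn fn, void *arg,
    int priority)
{
    struct hook *h;
    struct hook **pp;

    assert(hl != NULL && fn != NULL);
    h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->fn = fn;
    h->arg = arg;
    h->priority = priority;
    /* Walk past everything that should run before the new hook. */
    for (pp = &hl->head; *pp != NULL && (*pp)->priority <= priority;
        pp = &(*pp)->next)
        continue;
    h->next = *pp;
    *pp = h;
    return h;
}

/* Remove a previously registered hook; unknown handles are ignored. */
void hook_deregister(struct hook_list *hl, struct hook *h)
{
    struct hook **pp;

    for (pp = &hl->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;
            free(h);
            return;
        }
    }
}

/* Run every hook in priority order. */
void hook_invoke(struct hook_list *hl, void *event_data)
{
    struct hook *h, *next;

    for (h = hl->head; h != NULL; h = next) {
        next = h->next;    /* saved first so a hook may remove itself */
        h->fn(h->arg, event_data);
    }
}

/* Example handler: counts how many times it has been called. */
void example_clean_hook(void *arg, void *event_data)
{
    (void)event_data;
    ++*(int *)arg;
}
```

With something like this in place, the vnode recycle code would only call the invoke function for a "vnode cleaned" list, and veriexec would register a handler that breaks its association, instead of the clean-up being hard-coded in the recycle path.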

Saturday, January 14, 2006

of misdirection and other things

I managed to haul the UPS battery pack down to The Battery Bloke shop today - not an easy task, but taking the car seemed such a waste. The batteries are standard sealed lead-acid jobs, so there were no problems sourcing them. All up it was $140, including the labour involved in removing the plastic terminal covers and connector and attaching said items to the new batteries.

I have been trying to track down some annoying bugs in the page-level verified exec stuff. There seem to be a few bugs that have been hiding each other. Once, after whacking away at what I thought was the right bit of code, I eventually worked out the bug was in a totally different spot; fixing that bug made one of the crashes go away. I hate that: you have a segment of code you are not certain about and are sure there are bugs lurking in there, so you start whacking away. After a concerted effort you work out that the bit of code is not involved at all in making things crash. Now I am at the point where things work exactly twice; on the third time the kernel either panics or weird stuff happens. The problem is that the panic or weird stuff happens in a sort of unrelated place - the damage is done elsewhere but it only circles around and bites later on, which is rather difficult to debug. Again, for a long while I thought the problem was with the way I was releasing pages, but even totally disabling the page release code does not stop the kernel panic; the fault lies elsewhere in the code. I shall try just removing code until things start working so I can narrow down what is being done incorrectly. Tedious.