Wednesday, September 19, 2007
Oh my, where did that year go?
Oops - I thought this would happen. Never was one for a regular diary, oh well.
The generic hook stuff looks like it won't be going anywhere, too bad. Wasn't really that much work to port it and it passed the time on the train.
Per-page veriexec is in a much better state now, it works fine and is stable but it depends on other modifications which I need to get in first.
I need to post up some diffs on some work I did to unify the bottom ends of genfs_getpages() and uvm_aio_iodone() into a single routine that is called from both. This gives me an ideal place to put a hook to check pages coming in from storage. This function has gone through a few rounds of refinement - teasing out a bit of code that is mostly the same but slightly different between the two uses is a bit hairy, getting a sane argument list (or even understanding the purpose of some of the arguments) has required a bit of work.
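The shape of that refactor can be sketched in user-space C. This is only an illustration under loud assumptions, not the NetBSD code: `struct page`, `io_finish_common()`, `set_page_check_hook()` and the two callers are hypothetical names, and the first-byte test stands in for a real fingerprint comparison. The point it shows is that once both paths funnel into one routine, there is exactly one place to hang a check on incoming pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for a page of file data arriving from storage. */
struct page {
	uint8_t data[SKETCH_PAGE_SIZE];
	int     ok;	/* set by the common routine: 1 = accepted, 0 = rejected */
};

/* Hook type: return 0 if the page is acceptable, non-zero to reject it. */
typedef int (*page_check_fn)(const struct page *pg, void *arg);

static page_check_fn page_check_hook;	/* NULL means "no checking" */
static void *page_check_arg;

void
set_page_check_hook(page_check_fn fn, void *arg)
{
	page_check_hook = fn;
	page_check_arg = arg;
}

/*
 * The shared "bottom end": the work both callers used to duplicate once
 * the I/O had finished. Returns the number of pages rejected by the hook.
 */
int
io_finish_common(struct page **pgs, size_t npages)
{
	int rejected = 0;

	assert(pgs != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		int bad = 0;

		if (page_check_hook != NULL)
			bad = page_check_hook(pgs[i], page_check_arg);
		pgs[i]->ok = !bad;
		if (bad)
			rejected++;
	}
	return rejected;
}

/* Synchronous path (think: the tail of a getpages-style read). */
int
sync_read_done(struct page **pgs, size_t npages)
{
	return io_finish_common(pgs, npages);
}

/* Asynchronous path (think: an I/O completion callback). */
int
async_read_done(struct page **pgs, size_t npages)
{
	return io_finish_common(pgs, npages);
}

/* Toy check: reject any page whose first byte differs from the expected one. */
int
first_byte_check(const struct page *pg, void *arg)
{
	return pg->data[0] != *(const uint8_t *)arg;
}
```

The hard part described above, agreeing on an argument list that suits both callers, is exactly what this toy version sidesteps by passing only a page array and a count.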
The thing that started me thinking about my blog again was this article
http://www.cbc.ca/news/background/tech/privacy/white-list.html
Way to go Symantec - NetBSD has been able to do this for years, great to see that the Windows world may actually be innovating their way to where we have been for a long time.
Other NetBSD stuff done over the last year includes integrating wide curses support. This was originally done by a student as a Google Summer of Code project and was supposed to be integrated soon after the SoC finished, but it never happened. I finally had a combination of spare time and will to make it happen. The code integration was not too hard and, all in all, it caused little pain or lossage. We still have what looks like a refresh bug; it seems to be relatively rare, but I really need to look at it before the release with wcurses gets kicked out.
Also been messing with ACPI on my laptop recently; the nasty thing refuses to go to sleep or power off. After trawling through the debug output it seems that it is spinning on trying to power down the PCI bus. I am unsure why the bus never sleeps, it works fine in Windows (no surprise). I followed the instructions, extracted the AML from memory and hacked it to take out the PCI bus shutdowns. With the modified code the machine sleeps and powers off; the only problem is that the backlight does not turn on when recovering from a sleep (S3). Even the trick of setting a BIOS password does not help. Another developer has a tool that is supposed to reset the video card. I will try to get a copy of that and see if it helps - if it does then running the tool on wakeup is not hard to arrange.
Monday, September 04, 2006
Are we there yet?
Things have been slowly moving forward on the development front. I submitted a generic hooking infrastructure to NetBSD core for a decision as to whether or not it should be committed to the tree. A decision is still pending on that. What I have done is a direct port of the FreeBSD eventhandler stuff, but there is a general dislike of the large macro needed to work some of the magic. We shall see what happens.
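For anyone wondering what a generic hook facility boils down to, here is a minimal user-space sketch. To be clear about the assumptions: this is not the FreeBSD eventhandler API and not the code I submitted; `hook_register()`, `hook_deregister()`, `hook_invoke()` and the structures are hypothetical names made up for the illustration, and there is no locking. It only shows the core idea of a priority-ordered list of callbacks that any subsystem can add to without growing its own private set of hook functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Handler signature: arg is fixed at registration, event_data per invoke. */
typedef void (*hook_fn)(void *arg, void *event_data);

/* One registered handler on a hook list (all names here are hypothetical). */
struct hook_entry {
	hook_fn            fn;
	void              *arg;
	int                priority;	/* lower values run first */
	struct hook_entry *next;
};

/* A hook list is a singly linked list kept sorted by priority. */
struct hook_list {
	struct hook_entry *head;
};

/* Register a handler; returns a tag for later removal, or NULL on failure. */
struct hook_entry *
hook_register(struct hook_list *hl, hook_fn fn, void *arg, int priority)
{
	struct hook_entry *he, **pp;

	assert(hl != NULL && fn != NULL);
	he = malloc(sizeof(*he));
	if (he == NULL)
		return NULL;
	he->fn = fn;
	he->arg = arg;
	he->priority = priority;

	/* Stop at the first entry with a strictly higher priority value. */
	for (pp = &hl->head; *pp != NULL; pp = &(*pp)->next)
		if ((*pp)->priority > priority)
			break;
	he->next = *pp;
	*pp = he;
	return he;
}

/* Remove a handler by its tag; returns 0 on success, -1 if not found. */
int
hook_deregister(struct hook_list *hl, struct hook_entry *tag)
{
	struct hook_entry **pp;

	for (pp = &hl->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == tag) {
			*pp = tag->next;
			free(tag);
			return 0;
		}
	}
	return -1;
}

/* Call every handler in priority order; returns how many were called. */
int
hook_invoke(struct hook_list *hl, void *event_data)
{
	int n = 0;

	for (struct hook_entry *he = hl->head; he != NULL; he = he->next) {
		he->fn(he->arg, event_data);
		n++;
	}
	return n;
}

/* Example handler: append this handler's id character to a trace string. */
void
trace_handler(void *arg, void *event_data)
{
	char *trace = event_data;
	size_t len = 0;

	while (trace[len] != '\0')
		len++;
	trace[len] = *(const char *)arg;
	trace[len + 1] = '\0';
}
```

In this sketch every handler shares one untyped signature. Giving each hook list its own typed argument list is where macro machinery tends to creep in.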
On the veriexec front, I updated my kernel sources and re-merged my per-page veriexec changes with the changes made by Elad when he added the fileassoc facility. I extended the fileassoc facility to allow a "hint" to be used instead of the implicit VOP_GETATTR() that the fileassoc code was using, for two reasons: in the per-page code there were places where trying to get the file attributes was the wrong thing to do, and I could also use the fileassoc facility as a generic hash table for the vnode pointer <=> veriexec entry mapping. This change was not liked by some because it was seen as me trying to stealth the fileid back into the interface. It was more a lack of choice. At the time, using filehandles (VOP_VPTOFH()) was not an option because the NFS client code lacked support for the VOP_VPTOFH() call. This would have meant a major regression as I would not have been able to support NFS - very bad. After a bit of a thrash around with another developer (yamt), the problem with NFS was raised and yamt kindly offered to make good this hole in the filehandle support. This support was added over the weekend. While I was waiting I converted fileassoc to use filehandles and refactored the per-page modifications to also use filehandles. The per-page veriexec patch is getting smaller and smaller, which is good. I will give the fileassoc code a final once-over and then commit it. At that point I should be ready to re-spin the per-page veriexec code and finally get it into the source tree - that day will be a great day for me.
Wednesday, May 24, 2006
Long time between drinks
Been a while since I did this, a lot of things have happened.
Managed to fix up per page verified exec. I was horribly abusing the UVM subsystem by setting flags I shouldn't be touching. Hence the horrible and weird crashes: I was claiming back pages that no longer belonged to me. Ugly. I found this with some help from Chuq. Fixing it was a bit more involved. I needed to unlock the pages to avoid a deadlock when I did a VOP_GETATTR, but I needed the pages back so I could check their fingerprints, which meant I needed to do a getpages to get the pages back, but the function I was in was being called by getpages... round and round in circles. The only reason I needed VOP_GETATTR was to get the device and inode numbers so I could look up the entry in the veriexec hash tables. Chuq suggested I could get around the circular references by setting up a hash table with the vnode pointer as the index that pointed to the details I needed.
This neatly solved the issue but presented its own problems. Firstly, I needed to associate the vnode with the device and file IDs. I extended getnewvnode() with two new arguments so the device and file ID could be passed in; this way an association between the new vnode and the veriexec entry can be made in the vnode hash table. Doing this required touching a lot of file system code to get the right numbers passed in. The worst was NFS, where the device and file ID are not available when the new vnode is allocated. In this case I defer the vnode->veriexec association until the VOP_GETATTR is done, which should be safe since any file that is executed or read needs to get the permissions first, so the association will be set up before it is needed.
The other end of this is, of course, that when the vnode is no longer required and gets recycled for a new use the old association needs to be broken. Chuq suggested a hook that gets run when the vnode gets cleaned (more on hooks later). At the moment I have hard coded a clean up into the vnode recycle code, which works but is not the best.
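The vnode pointer table is simple enough to sketch in user-space C. Everything here is an assumption made for illustration rather than the kernel code: `struct vnode` and `struct veriexec_entry` are empty stand-ins, the `vtab_*` names and the bucket count are invented, and there is no locking. What it shows is the idea itself: the pointer is the key, so the per-page code can find its veriexec details without ever asking for file attributes, and the association is broken explicitly when the vnode is recycled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VTAB_BUCKETS 64	/* arbitrary size for the sketch */

/* Hypothetical stand-ins for the kernel structures. */
struct vnode { int dummy; };
struct veriexec_entry { uint64_t dev; uint64_t fileid; };

struct vtab_node {
	const struct vnode    *vp;	/* key: the vnode pointer itself */
	struct veriexec_entry *ve;	/* value: details resolved up front */
	struct vtab_node      *next;
};

static struct vtab_node *vtab[VTAB_BUCKETS];

/* Hash the pointer value; the shift drops low bits that alignment keeps zero. */
static size_t
vtab_hash(const struct vnode *vp)
{
	return (size_t)(((uintptr_t)vp >> 4) % VTAB_BUCKETS);
}

/* Associate vp with ve (done when the vnode is set up). 0 on success. */
int
vtab_associate(const struct vnode *vp, struct veriexec_entry *ve)
{
	struct vtab_node *n;
	size_t h;

	assert(vp != NULL);
	n = malloc(sizeof(*n));
	if (n == NULL)
		return -1;
	h = vtab_hash(vp);
	n->vp = vp;
	n->ve = ve;
	n->next = vtab[h];
	vtab[h] = n;
	return 0;
}

/* Look up the entry for vp without touching the file's attributes. */
struct veriexec_entry *
vtab_lookup(const struct vnode *vp)
{
	for (struct vtab_node *n = vtab[vtab_hash(vp)]; n != NULL; n = n->next)
		if (n->vp == vp)
			return n->ve;
	return NULL;
}

/* Break the association (done when the vnode is recycled). */
void
vtab_disassociate(const struct vnode *vp)
{
	struct vtab_node **pp = &vtab[vtab_hash(vp)];

	for (; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->vp == vp) {
			struct vtab_node *dead = *pp;

			*pp = dead->next;
			free(dead);
			return;
		}
	}
}
```

Forgetting the `vtab_disassociate()` step is the dangerous case: a recycled vnode pointer would then resolve to the previous file's entry.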
After doing all this and having a few false starts I finally have per page veriexec where it should be, fs independent and functional. I have sent the diffs off to Elad so that he can look at them and also so I am not the only one with them (mmm paranoia).
The follow on project was to finally do the generic hook stuff so that I didn't generate yet another set of hook functions in the kernel just for the vnode association clean up. I posted a proposal to tech-kern and received back a suggestion that we use the FreeBSD eventhandler code. I was happy with that idea, I have imported this into a private tree and modified things to suit NetBSD better. I have this all working now and have migrated some of the hooks over to using the code. I will update my private tree to the latest NetBSD and make sure nothing is broken and then put up some diffs on tech-kern and see what happens. Once the generic hooks are in I will go back and modify the per page veriexec stuff to use them to break the vnode->veriexec association.
Saturday, January 14, 2006
of misdirection and other things
I managed to haul the UPS battery pack down to The Battery Bloke shop today - not an easy task, but taking the car seemed such a waste. The batteries are standard sealed lead acid jobs so no problems sourcing them. All up $140 including the labour involved in removing the plastic terminal covers and connector and attaching said items to the new batteries.
I have been trying to track down some annoying bugs in the page level verified exec stuff. There seem to be a few bugs that have been hiding each other. Once, after whacking on what I thought was the right bit of code, I eventually worked out the bug was in a totally different spot; fixing that bug made one of the crashes go away. I hate that. You have a segment of code you are not certain about and are sure there are bugs lurking in there, so you start whacking away. After a concerted effort you work out that the bit of code is not involved at all in making things crash. Now I am at the point where things work exactly twice; on the third time the kernel either panics or weird stuff happens. The problem is that the panic or weird stuff happens in a sort of unrelated place - the damage is done elsewhere but it only circles around and bites later on, which is rather difficult to debug. Again, for a long while I thought the problem was with the way I was releasing pages, but even totally disabling the page release code does not stop the kernel panic, so the fault lies elsewhere in the code. I shall try just removing code until things start working so I can narrow down what is being done incorrectly. Tedious.
Monday, December 26, 2005
bug hunt
I received some help with qemu on the NetBSD mailing lists: if I add -monitor stdio to the qemu command line then I get the monitor command line in the xterm where I started qemu. This allows me to send key events to the running emulation. I wanted this so I could drop NetBSD to ddb in qemu and tweak variables in the kernel. With the added command args I can now do this.
I wanted ddb so I could turn on the UVM history printing, I tried running with the history printing on as a default but this produces so much output it has a severe performance impact and I was never sure what was happening because everything scrolled up the screen so fast. So, I thought that I could disable the printing by default and just enable it before I run my tests. Hence the need for ddb. Unfortunately, even enabling the UVM history printing just prior to running my tests still produces so much output that it does not help. So, I will try just enabling the printing in the veriexec page routines to try and keep the amount of information manageable.
Saturday, December 24, 2005
assaulted battery
Bleh. The batter(y|ies) in my UPS ha(s|ve) decided to finally die. Of course, this does not happen at a convenient time so that I could wander on down the road to "The Battery Bloke" and order some new ones, oh no, here we are in the middle of some craziness to do with the shortest day of the year in the Northern Hemisphere (longest day of the year where I am) so everyone has shut down for some rampant consumerism. Fortunately, the power is pretty reliable where I am so the UPS is really more for comfort value in the face of storms and the like. I can plug everything direct into the mains and soldier on until the new year.
Another day, another kernel panic
Been trying to get verified exec on NetBSD to work correctly at the page level. I have worked on this off and on for quite a while. I did have it working fine but, unfortunately, it relied on all the filesystems calling genfs_getpages(), which does not happen. I have shifted the code into the uvm getpages call, but now when I force a modified page detection and try to repeat the test things seem to fall apart: pages get flagged as being modified when they are not, and sometimes the kernel panics in genfs_putpages(). Unfortunately the kernel core from this panic does not show much. I suspect that the cause of the panic is well in the past and it is just UVM tripping over something bad I have done to it or something I have omitted to do.
I managed to clean up some bugs by compiling with DIAGNOSTIC set in the kernel config. I wanted to use UVMHIST and UVMHIST_PRINT but the output they produce is way too verbose. I disabled the uvm history printing by tweaking the controlling variable with the idea that I would do the setup for my tests, tweak the uvm log printing on and then run the test. This plan has fallen apart somewhat because I am using the qemu machine emulator to provide me with a crashbox machine without requiring extra hardware which is always an advantage when you try hacking code during your daily commute to work on the train.
Qemu has actually worked very well for me. It has saved a lot of time by allowing me to keep my development environment running while the kernels panic in their qemu sandbox, and it also means I don't put my file systems at risk when the kernel crashes. Making a backup of the qemu hard disk is a simple copy and I can just copy back the backup if the machine gets trashed. I can manipulate the qemu hard disk by using the vnd file-as-a-disk-image driver thing (vnode disk driver) to copy files to and from the qemu disk image, so installing a new kernel or retrieving the kernel core dumps is quite easy, just a matter of a few short scripts to make life convenient. The only problem I have is that the keystrokes for generating keyboard events for "special" keys like dropping to ddb or changing virtual consoles are a mystery to me - the documented ones don't work. For example, ctrl-alt is supposed to release the focus from the qemu window after it has been locked there by clicking in the window. This does not seem to work, and from a quick look at the qemu code I cannot see how these events can be generated in the sdl input handling code - just trying to use a special key combo like ctrl-alt-f1 changes the virtual console on the host machine, not in qemu. I have put the question about this up on current-users@n.o. We shall see if there is an answer; otherwise I shall just hack something into qemu that will do what I want... ahhh the joys of having the source.
Monday, December 05, 2005
Introduction
I suppose I will have to make the assumption that someone apart from myself will read this one day...you poor sod, taking an interest in my inane ramblings.
My day job is sysadmin'ing a bunch of Solaris machines for a local arm of a multinational company.
Outside of work, part of my "copious" spare time is spent hacking on NetBSD. I am a NetBSD developer and originally was made a developer to work on curses after submitting a patch that added the SYSV style keypad functionality to BSD curses. I have added more bits and pieces to curses and have done complete reimplementations of the ETI (Extended Terminal Interface) libraries libmenu and libform. Lately, I have turned into a wannabe kernel hacker by implementing verified exec (veriexec), which is a unique security feature that, of all the open source OSes, only NetBSD has - it prevents trojans and other executables from running and ensures what you do run is what you expect you are running... along with a bunch of other features. I am currently trying to clean up some patches that extend veriexec to the executable page level. I had this working fine but, unfortunately, only for filesystems that used the generic get pages function (genfs_getpages()). Making the feature available at a more universal level is causing some interesting times as I grapple with more of UVM's guts.