Tuesday, April 14, 2020

Thomas E. Dickey on NetBSD curses

I was recently pointed at a web page on Thomas E. Dickeys site talking about NetBSD curses.  It seems initially that the page was intended to be a pointer to some differences between ncurses and NetBSD curses and does appear to start off in this vein but it seems that the author has lost the plot as the document evolved and the tail end of it seems to be devolving into some sort of slanging match.  I don't want to go through Mr. Dickey's document point by point, that would be tedious but I would like to pick out some of the things that I believe to be the most egregious.  Please note that even though I am a NetBSD developer, the opinions below are my own and not the NetBSD projects.

The first thing that I would like to mention is that the author seems to be incapable of spelling my name correctly.  It isn't hard, 4 whole letters but the only time that it is spelt correctly is when the text is copied from a NetBSD commit.  If you ever do read this Mr. Dickey, I am not offended, plenty of people struggle with my last name, I am quite used to it.

Before going further, just a bit of background as to how I started working on the NetBSD curses library.  I originally was using a commercial x86 port of System V UNIX on my home PC but started looking elsewhere when I found that the SL/IP implementation was bugged.  At the time Linux had no networking at all but the BSD source code had just been ported to the PC in a project known as 386BSD.  Since this had networking I picked this up and then switched to NetBSD early on.  Being used to the SYSV curses I was disappointed to find that there was no real way of handling cursor and function keys in a terminal independent manner as there is in the SYSV curses.  I developed and submitted a patch to the NetBSD curses library that added the support for the keypad() function and modified getch() to perform the conversion of key sequences into key symbols.  It was on the basis of this patch that I was offered the opportunity to be a NetBSD developer.

When working on NetBSD curses my overall philosophy is to follow The Single UNIX® Specification, Version 2: X/Open Curses, Issue 4 Version 2 
(I will refer to this as SUSv2 from now on) where possible, if a behaviour is undefined then I look to ncurses and match the behaviour there if appropriate on the basis that this normally results in less issues for someone porting an application.  I add ncurses extensions in response to problem reports (PRs).

Let's now have a look at Mr. Dickey's comments (I am tempted to use the word diatribe but that seems a bit strong...).  I won't go over all of them but will pick up the topics that he has mentioned going down his page.

I would like to pick up on the comments about making the window structure opaque.  I don't apologise for this, I believe that the fact that WINDOW * was available for applications to poke at is a violation of software layering and just wrong.  Later in Mr. Dickey's page he takes NetBSD to task over error checking (more on this later) but by making the contents of WINDOW available for the application there is little use in error checking if the application can just write what they want anyway.  It took ncurses until 2007 to opaque WINDOW.  Mr. Dickey makes some ominous sounding statements about adding functions to compensate for the opaquing.  As far as I am concerned the interface to update a window is specified by SUSv2, if there are extensions that need to be added they will be added as they are needed.
Given I made WINDOW opaque nearly 20 years ago, I have yet to see a PR to indicate that more is needed.  The positives of making WINDOW opaque are not only enforcing correct layering of software but it also means that the contents of WINDOW can be modified without requiring a major bump of the library since nothing outside the curses library has access so backward compatibility is maintained.

Making the character string in scanw const just makes sense the function does not need to modify the string so it should be a constant.  The NetBSD project does not need to worry about the compiler objecting because the base compiler works with this statement.

Wide curses was a Google Summer of Code project that was done by Ruibiao Qiu.  I think that this was a great success and added some very useful functionality to NetBSD curses.  It seems that Mr. Dickey is a bit confused by this, in actual fact, he is totally wrong on a number of points:
  1. The non-spacing characters are not stored as a linked list, they are an array
  2. The interface for setting non-spacing characters is anything that uses the complex character type (cchar_t)
He seems a bit mystified by the term non-spacing character, this term comes from SUSv2 described here
The non-spacing characters are dynamically added when required, this was my design as I thought that simply having a fixed array for every character cell in the display was a waste of memory, it would be more efficient to only allocate memory as required for the non-spacing characters.

The function specification for ungetch is ambiguous in SUSv2, NetBSD will accept the output from getch on the basis that the input stream has already been processed and it makes no sense to save the last character in what could be a multi-byte sequence.

As for testing of curses, yes, this is mine.  I am happy that at least some of our curses functions are being tested to ensure that any changes made don't affect the on-screen output and that if there are changes that they are highlighted and can be analysed for correctness.  It is too easy with terminal output to have something that "looks right" but is doing the wrong thing in the background.

Now we get to something I wrote on a mailing list:
Re: curses: create panel from stdscr?

I stand by my comments there.  I was referring to my attempt to report what I thought was an issue with the ncurses libmenu implementation.  I no longer have the email nor the reply from Mr. Dickey but my clear recollection of the tone and attitude was that I would not be bothered raising issues about ncurses with Mr. Dickey ever again.  I wrote the libmenu and libform libraries for NetBSD using the book "Programmer's Guide: Character User Interface (FMLI and ETI)", this is UNIX System Laboratories book so I regard it as definitive for the libraries.  I noticed a variance in the ncurses libmenu implementation but as I have stated my efforts of reporting that were not fruitful.  Mr. Dickey did reach out to me in late 2017 asking for the email I was referencing, his comment was that my response was vague, actually, my exact words were:
I will try and dig it up.  It was a very long time ago.  I think it was
related to some behaviours in libmenu.  I will try and dig back through
my mail history but it may take a bit of time.
Alas I never found the original email but looking at my commits around libmenu I would have sent that email in 1999 some time so I don't think that this is very surprising.  At the time of the reply I thought that perhaps there was some interest in actually addressing the issue but the actual fact is that all it was was a fishing expedition to provide more ammunition for an attack on NetBSD curses which is disappointing.

Getting back to the email quoted above, in the book I have create_panel() is documented as taking a WINDOW as parameter.  Though, internally, stdscr is treated as a window the books description of create_panel() does make it clear that WINDOW is expected to be the return from newwin() not just stdscr.

As for missing functions, they will be added as and when the developers have time.  A PR may guide us in what is needed first but given that NetBSD is a volunteer project the time that the developers can devote to adding new features can be variable.  We are aiming to have something that is useful and not necessarily a clone of ncurses.

Mr. Dickey makes a point about portability.  I myself have never been concerned about making NetBSD curses portable, this is not an aim that I have ever had.  I would much rather be adding new features and fixing bugs for NetBSD than struggling with the complexities of making the NetBSD curses library portable.  If others wish to do this and feed back fixes then they will be considered but in the main, portability is not a focus.

He also turns on some random gcc flags and then berates NetBSD curses for producing extra warnings.  It is difficult to determine what flags he has turned on to produce these warnings.  The NetBSD build system treats any warnings as errors and the build terminates, the NetBSD curses library builds with the standard NetBSD gcc options without issue.  Without knowing what options were uses and what defines were in effect the information that Mr. Dickey has presented is useless.  He is welcome to submit a PR using the public interface and it will be looked into.

He then notes the output differences in the wide character implementations.  Unfortunately he has not reached out to me (at least) with this information.  It would be useful to have the tools to test this issue and rectify it.  I will note that he used a rather old version of NetBSD, it would be interesting to see if these issues still exists.

We then have a snide little bit about the number of error returns and error checking.  He says "Error-checking in NetBSD curses is seen as a problem by its developers" which is absolutely untrue and uncalled for.  He seems to be forgetting at this by that, by his own admission, that NetBSD curses does not have all the functions that ncurses has so there should be less returns of status simply from that.  Even if this were not true, if we look at the ratio of ERR/OK returns per number of statements - using the numbers from his page - NetBSD curses has 12987 statements which translates to a ERR/OK return every 25 statements where as ncurses has 24566 statements which translates to a ERR/OK return ever 38 statements.  So NetBSD curses actually assigns status codes more often than ncurses.  Regardless, I view this metric as specious - simply counting returns is rubbish, look at this:

if ((x > maxx) || (y > maxy))
         return ERR;


if (x > maxx)
        return ERR;

if (y > maxy)
        return ERR;

Both do the same thing but we get double the ERR returns in the second one.  I am not saying this is what is happening but just pointing out the fallacy of counting ERR/OK assignments and trying to draw a conclusion from it without properly analysing the code.

He lambasts NetBSD for commenting out unsupported functions in pycurses, this is just a reflection of the NetBSD curses not having all the functions at the moment.  This is a pkgsrc thing and locally applied, pycurses is unique in that it wants all the curses functions so it can provide bindings, since NetBSD curses doesn't have them then the unsupported ones are commented out.  It would be ideal if they all existed, hopefully one day they will.

The final thing I would like to pick up on is the comment:

"Without documentation, there is no point in making bug reports."

Which is just a lame excuse, even the lack of documentation is something that can be a bug report.  There is a public interface for reporting bugs.  Not using it and then complaining on your own website is not helpful to NetBSD but then again, perhaps that is the intention after all.

Sunday, March 02, 2014

Fixing what fsck cannot fix

Before I start, lets get this out of the way....

WARNING: The tools and techniques described below, if misapplied, will turn your file system consistency to that of warm custard.  If you are not careful you can irreparably damage your data and lose files.  Be warned.

Next, do a backup of your data - right now.  So you don't have to try and salvage a busted file system.  It is a lot less stressful.

Now that we have those public service announcements out of the way...  It came to pass that I wanted to use my NetBSD on a stick but it was having problems booting.  I booted from my HDD and did a fsck of the memory stick filesystem which found a few errors probably the result of an ungraceful shutdown.  One of the files it complained about was the kernel (/netbsd) which would explain the problems booting but it also said that the /mnt directory was corrupted.  Fsck complained about a missing . and .. entry then claimed everything was fixed.  I have a habit of doing a second fsck if the first one found errors and fixed them, just to be certain everything was caught and fixed.  Unfortunately for me fsck still did not like /mnt and went through the same motions as the first time.  Not a good look.

I thought the easiest way may be just to remove the bad directory so I mounted the memory stick and tried to rm the mnt directory on the stick.  Nope, rm said the directory was not empty.  I tried to cd into the directory *kapow* kernel panic due to an inconsistent file system (no surprise there).  So, what to do?  fsck won't fix the error, I could rebuild the stick but I didn't want to spend the time doing that.

Enter the fsdb(8) command, this command allows you to perform some low level manipulation of a file system.  It will allow you to do things to a file system that you are normally prevented from doing - in other words you have the power to make a real mess if you do the wrong thing.  What I wanted to do was just force a removal of the broken mnt directory which is quite easy to do.  For me my memory stick was identified as sd0 and the root file system therein was on the "a" partition so I used fsdb to open the file system:

fsdb -f /dev/rsd0a

which printed a bunch of information about the filesystem and put me at a fsdb prompt.  To remove the bad directory I wanted to do two things, firstly clear the inode associated with the mnt directory and then remove the mnt entry from the parent directory.  To clear the inode we need to know the inode number, there are quite a few ways to get it, "ls" inside fsdb will show the directory entries which contains the inode number, ordinary ls from the command line could be used too if the file system is mountable.  For me, I had the report from fsck that inode 10336 was corrupt and that is was the mnt directory.  I validated this information using the ls in fsdb.  So, knowing the inode, clearing it is a matter of using the "clri" command:

clri 10336

fsdb confirmed the inode is cleared.  Now remove the directory entry from the parent directory (in my case this was /):

cd /
rm mnt

this will invalidate the directory slot associated with the name given.  Once this is done I just quit fsdb:


Fsdb prints a warning message that the file system has been marked dirty and that a fsck is required to clean up any damage.  Following that advice I ran:

fsck -y /dev/rsd0a

and let fsck clean up after the surgery.  The file system cleaned up with no major problems.  I mounted up the memory stick and copied a new kernel onto it since the old one appeared to be mangled.

After that NetBSD on a stick worked fine... much easier than recreating the whole thing from scratch.
Again, you really should not be running fsdb on a filesystem you care about, you should have good backups and not need to resort to this level of skullduggery to recover but fsdb is handy to know about when things go really bad.

Thursday, February 20, 2014

the fix for mvderwin is in

Hello! If you have landed here because of the link from Mr. Dickey's page you may want to consider reading my blog post here

I have been quiet for a while trying to fix the deceptively named mvderwin() function in libcurses.  Going by the name you would imagine that if derwin() creates a sub-window with coordinates relative to the origin of the parent window then mvderwin() should move a sub-window with coordinates relative to the origin of the parent window.  Except it doesn't.  What mvderwin() really does is create a mapping of the specified portion of the parent window into the sub-window, the sub-window does not change location at all.

Fixing the mvderwin code mainly involved tweaking around the screen refresh code so that when it found a sub-window that had been changed using mvderwin() the refresh code copied the characters from the correct place in the parent window into the sub-window region.  Working on the refresh code in libcurses is probably the most difficult thing to do as it is quite complex and easy to break in strange ways.  Fortunately, I have the automated tests for curses to help pick up any problems which is exactly why I wrote the curses automated testing in the first place.  Being able to run a test suite enables me to check I have not broken previously working code and also check the behaviour of my fixes to ensure they are outputing exactly what I think they should be.  The latter is probably the hardest thing to do in curses testing.  It is quite easy to make some changes that displays correctly but is doing things badly in an invisible way.  Something like outputting blanks when it does not need to, just by looking it would be difficult to tell whereas the automated testing suite flags the output as unexpected right away.

Friday, January 03, 2014

Google Summer of Code Mentor Summit

The Google Summer of Code mentor summit happened on the 19th and 20th of October last year.  It has taken me this long to find enough time to sit down and write something about it.

I have been a mentor for the summer of code since its inception.  I have mentored a few students to a successful conclusion.  I don't really take any credit for this, the students did all the hard work and made things happen.  All I did was provide some guidance now and then.  It has been a great experience not only helping someone new work on NetBSD but also get some great outcomes for the NetBSD project.

Every year Google bring a couple of mentors from each of the participating organisations to their Mountain View headquarters for a summit where mentors can share their experiences and learn from each other.  It also is a chance for Google to say "thank you" to the mentors for their efforts in helping make the Google Summer of Code work.  I had been wanting to go to one of these summits for a long time but just did not have the opportunity to leave home for all the previous ones.  Finally, I was able to put my hand up to go.

I arrived in California a few days early so that I could get over the jet lag and catch up with other NetBSD people before the mentor summit.  As the time got closer to the mentor summit the hotel started filling up with mentors, you could see a lot of t-shirts from previous Google Summer of Codes which helped identify the mentors.  One of the great things I found early on was the sense of community.  Other mentors would recognise the t-shirts and start chatting to you.  I remember one time I was wandering around the hotel and another mentor said "oh, hey, you are a mentor too! We are going off to the computer history museum.  You want to come along?" - they guy didn't know me at all but was happy to invite me along on a trip.  I also ran into another mentor waiting at a bus stop and, like me, was heading into San Francisco for a bit of sight seeing.  We chatted on the bus and train about our projects and what we had done for the Summer of Code.  It was good to be part of such a friendly and inclusive bunch.  All through the summit the most common questions seemed to be What project are you from?  and Where are you from?

The summit itself started with a get together around the hotel pool on Friday night.  From that point on I felt totally looked after.  The people from Google had organised everything, be it food and drink friday night, to buses to and from the Google headquarters, all the food, drink and coffee one could wish for there and also the party saturday night.  It was great to be so spoilt.  A big thanks to all the Googlers who put all that together.

At first the summit itself was a bit hard to get my head around.  I am used to conferences where the schedule is fixed in advanced and you pick what you want to listen to and then find the room.  The summit is very different to this, it is an "unconference".  The first thing I realised is that it is important to actually vote on the talk proposals that were put up well in advance of the summit.  Anyone can propose a talk and the proposals that get the most votes get a room to hold that talk, the number of votes garnered determines the size room so it is important to vote (oops, I didn't).  The next is the time and location of the talk can be very fluid - if the time or location doesn't suit it can be changed on the day so checking the board with the talk locations on it regularly is a must.  Once I had the concepts sorted it was not so bad, I attended some very interesting talks.

The summit was over all too fast and now I am looking forward to attending another one.  It is truly a unique experience and one I can highly recommend.

As a footnote and on a totally different subject, travelling with a device that has GPS is such a boon.  I feel far more confident in wandering around know that I can easily work out where I am and how to get back to home base.  I have an android phone and was using OSMAND+ which allows me to navigate without needing an internet connection unlike a lot of other mapping apps.

Thursday, September 12, 2013

Sniffing the USB for answers

Previously I mentioned that part of what helped tracking down the problems with the arduino uno attach was sniffing the USB to see what the bus transactions were doing.  I thought I would write a quick piece as to how I did this.

The first bit was buying a logic analyser.  I looked at getting a dedicated USB analyser but they tend to be a bit on the pricey side and, besides, they are very specialised - I don't expect I will be doing too much USB sniffing so buying a special device just to sort out a particular bug was hard to justify.  Since USB uses standard logic level signals I could get away with capturing the logic levels and using some software to decode what that translated to in terms of USB data.  The added bonus is that I would have a tool for decoding other protocols such as SPI, one-wire, I2C, serial and so forth.  There are lots of options for buying a logic analyser and the price range varies widely depending on the capabilities of the analyser.  For me, I needed something that was fast enough to sample a full speed USB (that runs at 12MHz) and at least two channels, preferably more.

There are a lot of logic analysers based on the Cypress FX2 chips, these guys are just dumb samplers that feed the data back to a host PC over USB - all the triggering and data storage is done on the host.  There were cheaper options but I settled on a salea logic because it was housed in a decent case, had a nice set of probes, a carry case along with software that seemed decent - you can download and run the software without the analyser hardware to try it out which is good.  The salea gave me 8 inputs and just enough bandwidth to sample a full speed USB.

The next step was the software.  The salea came with software which is functional and has some decoding facilities but, unfortunately, those facilities don't extend to decoding USB transactions.  To decode the USB transactions I used the sigrok open source software to perform the decodes.  Sigrok has firmware for the salae logic device, the firmware is loaded when the device is opened (i.e. it is not stored in flash on the device, it is downloaded every time into RAM) so there is no harm in using their firmware.  I ended up not using this feature, because my NetBSD machine was the device under test I used another laptop running Windows and the salea software to perform the data captures.  The captures were exported as VCD (value change dump) files which I copied to my NetBSD machine and ran sigrok-cli  on the data.

To perform the captures I needed to break out the USB signals so I could attach the logic analyser.  To do this I bought a short USB extension cable, cut it in half and soldered the wires to a small piece of veroboard.  I soldered a bit of single line header to the veroboard so each wire in the cable was brought out to a header pin that I could connect the logic analyser probes to.  I wasn't going to bring out the +5V rail but when I was making up the board I forgot and did all four wires.. oh well.  The probe wires from the salae plug neatly onto the header which makes the arrangement safe as well as physically well connected.  Here is a snap of what the breakout board looks like with the probes attached:

Fortunately for me, the cable manufacturer followed the USB hardware specification and used the specified colours for the wires so it was easy to pick the correct wires for probing.  The spec says white is the D- signal, green is the D+ signal and black is ground.  So I connected probe 0 (black) to the white wire and probe 1 (brown) to the green wire.  As a side note - astute readers may have noticed that the probe wire colours correspond to the resistor colour code which makes it easy to know what probe number you are dealing with - ground is sleeved to and labelled to distinguish it from the probes.  Once the probes were connected, I plugged the extension cable into my NetBSD machine, hooked up the salea to the windows machine and started the sampling software.  I configured a trigger to start sampling on the falling edge of D+ as this transition will happen when a device is plugged in and to sample at 24MHz.  I then connected the arduino to the other end of the extension cable.  The analyser triggered and started saving samples.  I let the attach process for the arduino complete, then stopped the capture and exported the sample data as a VCD file.

Analysing the data was a challenge.  The sigrok software is capable of doing it but, to be blunt, their documentation is rather poor.  I ended up on their IRC channel to ask for help.  The guys there were friendly and guided me through what I needed to do.  Once you understand what they are doing it makes sense, the sigrok software uses layers of decoders to analyse the data stream.  For usb they start with a layer called usb_signalling that takes the raw data samples and converts them into USB data.  The next layer is called usb_protocol which takes the output from usb_signalling and interprets the data bytes as USB protocol information.  All the decoders are written in python, it is quite easy to modify existing decoders or write your own.  The command line I used for the decoding was:

sigrok-cli --rate 1000m -i netbsd.vcd -I vcd:compress -P usb_signalling:dm=0:dp=1,usb_protocol

Note that the --rate option is one of my own making, there is a bug currently in sigrok-cli in that it fails to determine the sample rate from a VCD file and will crash. I added --rate to allow me to set the sample rate, this is a hack only, the sigrok developers are looking at a more elegant solution. Also, the sample rate is 1GHz because that is the sample rate set in the VCD. The next argument is the input file (netbsd.vcd) and then the input format (vcd) with the compress flag set to speed up the file reading. Then the -P option specifies the protocol decoders - first usb_signalling which has two arguments that define which probe numbers were connected to what signals - dm is D- and, as above, this was probe 0, similarly dp is D+ and was probe 1. The next decoder layer is usb_protocol which performs the protocol decode. On the machines I have the decode was very slow, it took almost half an hour of churning before sigrok-cli started spitting out any decoded data and took many hours to decode about 30 seconds of sampled data. The end result was output that looks like this:

usb_protocol: "SOF 1781" 
usb_protocol: "SETUP DEV 0 EP 0" 
usb_protocol: "DATA0 00 05 02 00 00 00 00 00" 
usb_protocol: "ACK " 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "NAK " 
usb_protocol: "SOF 1782" 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "DATA1 ()" 
usb_protocol: "ACK " 
usb_protocol: "SOF 1783" 
usb_protocol: "SOF 1784" 
usb_protocol: "SOF 1785" 
usb_protocol: "SOF 1786" 

This actually shows the SetAddress USB request that NetBSD was sending to the device, not only that but shows the problem that we were having - the "IN" command there is the status request for the SetAddress.  The first try results in a "NAK" - negative acknowledgement, the device was too busy to answer, the usb controller retries the IN again and gets the status back - unfortunately by this time it seems that the atmel USB microcode is confused and things don't work right from then on.  What we observed with windows and linux is that the "IN" for the status was delayed until the next start-of-frame (SOF) which gave the device time to process the request before getting hit by a demand for status.  Modifying the NetBSD driver to do the same made things work.

The decoding was not always perfect, I think because the data rate of the USB was right at the limit of the sample rate for the logic analyser (remember Mr. Nyquist says you need a sample rate of at least twice your highest frequency to properly sample data) so sometimes I would see decode errors but, fortunately, the decode mostly worked right and provided us with valuable clues.

Friday, September 06, 2013

Arduino uno/Freetronics etherten attaches now

It took a lot of thrashing about and false trails but we now have the aduino uno (plus others) attaching and also the atmel processors will attach in DFU mode (used for flashing the boot loaders) whereas before they would fail with an address error.  I had to resort to sniffing the USB transactions using a salae logic and decoding with sigrok-cli.  What this showed was the uhci driver was was feeding the control request bus transactions in too fast.  By changing how the transactions were fed and also disallowing what is known as "short" transactions for the control requests.  After fixing these things the attach worked fine.  Many thanks to fellow NetBSD developer Nick who waded in, helped with suggestions and formulated the final working fix.  Hopefully a proper fix will hit NetBSD-current soon.

Sunday, August 04, 2013

Another Red Hat Cert

Just last week I sat and passed the dreaded RHS333 Enterprise Network Security Services exams.  The instructor I had said that he only knew of one person who passed these exams first sitting, that was himself.  Now he knows two...  Not that this matters now, Red Hat are deprecating this certification in favour of a new one - that doesn't invalidate what I have done but going forward there will be a new certification to take the place of RHS333.  Regardless, I thought I would put down my thoughts on how to prepare for these exams as it does apply to pretty much all the Red Hat certifications just in case it helps someone else.

Firstly, I set up a lab environment which was just an extension of what I had described in Linux lab in the lap posting.  I just added a couple of new networks but made them routed instead of NAT type networks because it seemed like cross network traffic was being NATed with the IP address of the bridge interface which messed up the IP address based access control exercises.  On one network (the "good" network) I kickstarted a couple of machines to perform the exercises on, on the other network (the "bad" network) I kickstarted another machine to use as the "bad guy" in the lab exercises.  Then I would just run through the lab exercises until I could do them without even thinking, rebuilding all the machines before going over the labs again so I would have clean machines each time.

Here are a few more tips:

  • If yu have the book from the associated course then do all the exercises in the labs and do them well, make sure you understand what you are doing.  Don't skip any because anything that is covered in that book can be in the exam.
  • If a lab does not work then find out why, this is very important.  If you screw up you need to know why so either you can immediately recognise the error in the exam and fix it or just avoid making the error in the first place.  I have seen someone screw up a lab exercise and went on to do the same error in the exam - they failed because of this.
  • You don't have to memorise everything.  One of my instructors said, "this is an open man exam" meaning that you have all the man pages and documentation that is part of the standard RHEL distribution during the exam.  You can install whatever you like on the exam machines.  So, knowing where to look for information instead of rote memorisation is a valid tactic.  A lot of the time the samples provided in the man pages, sample configuration files or under /usr/share/doc are enough to get you going during the exam.  Just cut and paste from the documentation into your config file and adjust to suit the exam requirements.  So make it part of the exam preparation to hunt down where the trickier configuration documentation is kept so you can immediately bring it up during the exam.  Going hunting during the exam wastes precious time.
  • If you are having problems with something during the exam that other things do not rely on then just move on.  You can always revisit the question later, perhaps you will have worked out what to do or have time to spare to fiddle around.  By moving on you may able to pick up extra marks on things you can get working instead of beating your head against a wall fruitlessly
  • Do the chkconfig on immediately after installing or starting any new service that has been asked for.  If you are asked to provide a service then it must come up on reboot - ensure it does.  Same with selinux settings and other system settings - make sure they are permanent from the get-go, saves having to rework later.  Same with firewall settings, if iptables is running then add the appropriate rules for the service as you go - I usually just edit /etc/sysconfig/iptables directly and reload the service, saves typing because you can copy another existing line (though, command line history would be just the same...)
  • Make sure you are careful with file modes and ownership - I must admit to being sloppy with this during the exam and it does cost me
  • Leave enough time for a couple of reboots and carefully check everything that has been asked for still works after the reboot.  I have found that after drilling on the lab exercises I have plenty of time for this, in all the exams I have faced after starting really working on the lab exercises in my own time I have found that I can easily complete the entire exam with plenty of time to spare.
Hopefully this stuff will make a difference for someone else...