Thursday, September 12, 2013

Sniffing the USB for answers

Previously I mentioned that part of what helped tracking down the problems with the arduino uno attach was sniffing the USB to see what the bus transactions were doing.  I thought I would write a quick piece as to how I did this.

The first bit was buying a logic analyser.  I looked at getting a dedicated USB analyser but they tend to be a bit on the pricey side and, besides, they are very specialised - I don't expect I will be doing too much USB sniffing so buying a special device just to sort out a particular bug was hard to justify.  Since USB uses standard logic level signals I could get away with capturing the logic levels and using some software to decode what that translated to in terms of USB data.  The added bonus is that I would have a tool for decoding other protocols such as SPI, one-wire, I2C, serial and so forth.  There are lots of options for buying a logic analyser and the price range varies widely depending on the capabilities of the analyser.  For me, I needed something that was fast enough to sample a full speed USB (that runs at 12MHz) and at least two channels, preferably more.

There are a lot of logic analysers based on the Cypress FX2 chips, these guys are just dumb samplers that feed the data back to a host PC over USB - all the triggering and data storage is done on the host.  There were cheaper options but I settled on a salea logic because it was housed in a decent case, had a nice set of probes, a carry case along with software that seemed decent - you can download and run the software without the analyser hardware to try it out which is good.  The salea gave me 8 inputs and just enough bandwidth to sample a full speed USB.

The next step was the software.  The salea came with software which is functional and has some decoding facilities but, unfortunately, those facilities don't extend to decoding USB transactions.  To decode the USB transactions I used the sigrok open source software to perform the decodes.  Sigrok has firmware for the salae logic device, the firmware is loaded when the device is opened (i.e. it is not stored in flash on the device, it is downloaded every time into RAM) so there is no harm in using their firmware.  I ended up not using this feature, because my NetBSD machine was the device under test I used another laptop running Windows and the salea software to perform the data captures.  The captures were exported as VCD (value change dump) files which I copied to my NetBSD machine and ran sigrok-cli  on the data.

To perform the captures I needed to break out the USB signals so I could attach the logic analyser.  To do this I bought a short USB extension cable, cut it in half and soldered the wires to a small piece of veroboard.  I soldered a bit of single line header to the veroboard so each wire in the cable was brought out to a header pin that I could connect the logic analyser probes to.  I wasn't going to bring out the +5V rail but when I was making up the board I forgot and did all four wires.. oh well.  The probe wires from the salae plug neatly onto the header which makes the arrangement safe as well as physically well connected.  Here is a snap of what the breakout board looks like with the probes attached:

Fortunately for me, the cable manufacturer followed the USB hardware specification and used the specified colours for the wires so it was easy to pick the correct wires for probing.  The spec says white is the D- signal, green is the D+ signal and black is ground.  So I connected probe 0 (black) to the white wire and probe 1 (brown) to the green wire.  As a side note - astute readers may have noticed that the probe wire colours correspond to the resistor colour code which makes it easy to know what probe number you are dealing with - ground is sleeved to and labelled to distinguish it from the probes.  Once the probes were connected, I plugged the extension cable into my NetBSD machine, hooked up the salea to the windows machine and started the sampling software.  I configured a trigger to start sampling on the falling edge of D+ as this transition will happen when a device is plugged in and to sample at 24MHz.  I then connected the arduino to the other end of the extension cable.  The analyser triggered and started saving samples.  I let the attach process for the arduino complete, then stopped the capture and exported the sample data as a VCD file.

Analysing the data was a challenge.  The sigrok software is capable of doing it but, to be blunt, their documentation is rather poor.  I ended up on their IRC channel to ask for help.  The guys there were friendly and guided me through what I needed to do.  Once you understand what they are doing it makes sense, the sigrok software uses layers of decoders to analyse the data stream.  For usb they start with a layer called usb_signalling that takes the raw data samples and converts them into USB data.  The next layer is called usb_protocol which takes the output from usb_signalling and interprets the data bytes as USB protocol information.  All the decoders are written in python, it is quite easy to modify existing decoders or write your own.  The command line I used for the decoding was:

sigrok-cli --rate 1000m -i netbsd.vcd -I vcd:compress -P usb_signalling:dm=0:dp=1,usb_protocol

Note that the --rate option is one of my own making, there is a bug currently in sigrok-cli in that it fails to determine the sample rate from a VCD file and will crash. I added --rate to allow me to set the sample rate, this is a hack only, the sigrok developers are looking at a more elegant solution. Also, the sample rate is 1GHz because that is the sample rate set in the VCD. The next argument is the input file (netbsd.vcd) and then the input format (vcd) with the compress flag set to speed up the file reading. Then the -P option specifies the protocol decoders - first usb_signalling which has two arguments that define which probe numbers were connected to what signals - dm is D- and, as above, this was probe 0, similarly dp is D+ and was probe 1. The next decoder layer is usb_protocol which performs the protocol decode. On the machines I have the decode was very slow, it took almost half an hour of churning before sigrok-cli started spitting out any decoded data and took many hours to decode about 30 seconds of sampled data. The end result was output that looks like this:

usb_protocol: "SOF 1781" 
usb_protocol: "SETUP DEV 0 EP 0" 
usb_protocol: "DATA0 00 05 02 00 00 00 00 00" 
usb_protocol: "ACK " 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "NAK " 
usb_protocol: "SOF 1782" 
usb_protocol: "IN DEV 0 EP 0" 
usb_protocol: "DATA1 ()" 
usb_protocol: "ACK " 
usb_protocol: "SOF 1783" 
usb_protocol: "SOF 1784" 
usb_protocol: "SOF 1785" 
usb_protocol: "SOF 1786" 

This actually shows the SetAddress USB request that NetBSD was sending to the device, not only that but shows the problem that we were having - the "IN" command there is the status request for the SetAddress.  The first try results in a "NAK" - negative acknowledgement, the device was too busy to answer, the usb controller retries the IN again and gets the status back - unfortunately by this time it seems that the atmel USB microcode is confused and things don't work right from then on.  What we observed with windows and linux is that the "IN" for the status was delayed until the next start-of-frame (SOF) which gave the device time to process the request before getting hit by a demand for status.  Modifying the NetBSD driver to do the same made things work.

The decoding was not always perfect, I think because the data rate of the USB was right at the limit of the sample rate for the logic analyser (remember Mr. Nyquist says you need a sample rate of at least twice your highest frequency to properly sample data) so sometimes I would see decode errors but, fortunately, the decode mostly worked right and provided us with valuable clues.

Friday, September 06, 2013

Arduino uno/Freetronics etherten attaches now

It took a lot of thrashing about and false trails but we now have the aduino uno (plus others) attaching and also the atmel processors will attach in DFU mode (used for flashing the boot loaders) whereas before they would fail with an address error.  I had to resort to sniffing the USB transactions using a salae logic and decoding with sigrok-cli.  What this showed was the uhci driver was was feeding the control request bus transactions in too fast.  By changing how the transactions were fed and also disallowing what is known as "short" transactions for the control requests.  After fixing these things the attach worked fine.  Many thanks to fellow NetBSD developer Nick who waded in, helped with suggestions and formulated the final working fix.  Hopefully a proper fix will hit NetBSD-current soon.