19 Apr 2013

Bit banging WS2811 LED strips with PWM

Why

Up until now, I have been a mild-mannered software developer, but recently I was bitten by the electronics bug. Now the spare room in our apartment is filled with resistors, capacitors and ICs with serial numbers looking like foreign license plates.

And LEDs. I didn't have a clear plan for what I was going to do, but I have a 1.5-year-old son, so I suspect making some blinkenlichts at home is going to get me at least umpteen million World's Best Daddy points. I bought different kinds: water-proof strips, plain strips, individual pixels, holiday decoration lights… all with some sort of digital control.
 
Which brings us to today's topic. Some of the LED strips use the WS2811 control chip. While its close relative, the WS2801, uses SPI, the WS2811 has its own serial protocol, which rules out most of the hardware features in microcontrollers. So, it's bit banging time.

I'd like to stress that I was doing this as a hobby project. There are very good solutions for the WS2811 control already, such as Alan Burlison's WS2811.h or the FastSPI_LED2 library. The inspiration for reinventing the wheel came from Alan's remark:
Hmm, that means that driving these with a simple C routine is unlikely to be sufficient. I spent a bit of time looking to see if there was any sort of hardware assist that could be brought to bear, but even SPI at 4MHz, close to the maximum that the MCU can support, wouldn't be fast enough as it would still be necessary to marshal each byte into a series of 5-bit patterns to get the timings right for the WS2811 protocol.
Challenge accepted.

How

The communications protocol is serial: the chip picks out the first 24 bits arriving after a 50 μs reset pause, and forwards the following bits to the next chip in the chain. Each bit starts with a low-high transition, and the bit value is determined by the timing of the high-low transition. So we're sending a pulse of varying width; if only we could modulate pulse width somehow...
After examining the datasheet for the Atmega328, I noticed this tidbit:
The OCR0x Registers are double buffered when using any of the Pulse Width Modulation (PWM) modes. For the normal and Clear Timer on Compare (CTC) modes of operation, the double buffering is disabled. The double buffering synchronizes the update of the OCR0x Compare Registers to either top or bottom of the counting sequence.
That's perfect for our needs. The plan is simple: the timer module will be set for a 20-tick cycle (OCR2A) and the PWM limit value (OCR2B) is adjusted depending on the bit value. Thanks to the double buffering, we have 20 cycles to prepare the next bit, so it becomes feasible to write this in C.
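To put numbers on the 20-tick plan, here is a small host-side sketch that converts timer ticks to time. It assumes a 16 MHz clock with no timer prescaler, which the post does not state; the helper names are made up for illustration. The compare values 0x04 and 0x10 are the ones that appear in the generated assembly, and the times are nominal (any off-by-one in the timer hardware is ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16 MHz CPU clock, timer running without a prescaler.
 * The post does not state the clock speed. */
#define ASSUMED_F_CPU_HZ 16000000UL

/* Convert a number of timer ticks to nanoseconds at the given clock. */
uint32_t ticks_to_ns(uint32_t ticks, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    /* 64-bit intermediate so that ticks * 1e9 cannot overflow. */
    return (uint32_t)(((uint64_t)ticks * 1000000000ULL) / f_cpu_hz);
}

/* Bit rate in Hz when every bit occupies `ticks_per_bit` timer ticks. */
uint32_t bit_rate_hz(uint32_t ticks_per_bit, uint32_t f_cpu_hz)
{
    assert(ticks_per_bit > 0);
    return f_cpu_hz / ticks_per_bit;
}
```

Under that 16 MHz assumption, `ticks_to_ns(20, ASSUMED_F_CPU_HZ)` is 1250 ns per bit, so `bit_rate_hz(20, ASSUMED_F_CPU_HZ)` is 800 kHz, and the two compare values come out at a nominal 250 ns and 1000 ns.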

Of course, the devil is in the details. If we update the PWM limit value too fast, we'll lose the bits, so we need to synchronize the updates to the timer overflow. That's about 4 cycles. Then we need to do the update, clear the timer overflow and so on. In total, we use about 13 cycles, so we need to load the next bit in 7 cycles, including the loop termination check.

To achieve that, I unrolled the update of a byte into 8 single-bit updates. This gets compiled rather nicely, as the AVR instruction set has the opcodes SBRS and SBRC: skip the next instruction if a register bit is set or cleared. The byte can be kept in a register unchanged, without any need for shifting or masking.
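The unrolling can be sketched on a host machine as follows. This is not the code from the repository: `pulse_for_bit` and `expand_byte` are hypothetical names, and the `out` array stands in for the successive writes to OCR2B. The two pulse values are the ones visible in the generated assembly, and the most-significant-bit-first order is inferred from the snippet in this post, where bit 4 is sent before bit 3.

```c
#include <assert.h>
#include <stdint.h>

/* Compare values as they appear in the generated assembly. */
#define PULSE_ZERO 0x04u /* short pulse: the bit is 0 */
#define PULSE_ONE  0x10u /* long pulse: the bit is 1 */

/* Pick the compare value for one bit of `b`. The byte itself is never
 * shifted or modified; with a constant bit_index the condition is a
 * plain single-bit test. */
uint8_t pulse_for_bit(uint8_t b, uint8_t bit_index)
{
    assert(bit_index < 8);
    return (b & (uint8_t)(1u << bit_index)) ? PULSE_ONE : PULSE_ZERO;
}

/* One byte unrolled into eight single-bit updates, highest bit first.
 * On the real hardware each line would wait for the timer overflow and
 * write to OCR2B; here the values are collected into `out` instead. */
void expand_byte(uint8_t b, uint8_t out[8])
{
    out[0] = pulse_for_bit(b, 7);
    out[1] = pulse_for_bit(b, 6);
    out[2] = pulse_for_bit(b, 5);
    out[3] = pulse_for_bit(b, 4);
    out[4] = pulse_for_bit(b, 3);
    out[5] = pulse_for_bit(b, 2);
    out[6] = pulse_for_bit(b, 1);
    out[7] = pulse_for_bit(b, 0);
}
```

Writing the eight calls out by hand, rather than looping over a bit index, is what keeps every bit index a compile-time constant.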

The loading of the next byte also takes time. If I used a regular for loop to iterate over an array, the compiler would happily put all the code at the end and mess up the timings between bytes. So I split the operations into smaller bits (advance pointer, dereference pointer, check for termination) and slotted them between bytes.

The C code, then, looks like this:
    sync();
    bang_bit(b, 4);
    data++;
    sync();
    bang_bit(b, 3);
    nextByte = *data;

and the generated assembly looks like this (sync and bang_bit are inlined):
    ;; Wait for overflow
    sbis    0x17, 0
    rjmp    .-4
    ;; Bit set or not?
    sbrc    r25, 4
    rjmp    .+4  
    ldi     r24, 0x04
    rjmp    .+2  
    ldi     r24, 0x10
    ;; Update OCR2B
    sts     0x00B4, r24
    ;; Reset overflow
    ldi     r24, 0x01
    out     0x17, r24
    ;; Advance pointer
    adiw    r28, 0x01
    ;; And the next bit 
    sbis    0x17, 0
    rjmp    .-4     
    sbrc    r25, 3
    rjmp    .+4  
    ldi     r24, 0x04
    rjmp    .+2
    ldi     r24, 0x10
    sts     0x00B4, r24
    ldi     r24, 0x01
    out     0x17, r24
    ;; Load next byte
    ld      r16, Y

After a few passes of checking the generated code and tweaking the C code in response, I ended up with the worst-case longest interval between bits being 19 cycles.

Testing on the hardware revealed some instability on the first LED; turns out I had mistyped one of the timing constants. Fix that, upload and hey presto! Blinkenlichts!

The code is up at GitHub; it involves some calls to the Arduino library for delays and such. I was too lazy to look those up from the datasheet.

"But how is this related to ZenRobotics?" I hear you ask. I'll let you know after the Housewarming Party.

Bonus

You'll note that the generated code is still less than optimal. In an effort to save registers, r24 gets reused. Keeping the magic constant 0x01 in a dedicated register would save a cycle per bit.
The "bit set" tests also waste cycles with all the jumping around.
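One way to cut down on the jumping, sketched here as an idea rather than a measured improvement: load the value for a 0 bit first and overwrite it only when the bit is set. In hand-written assembly that shape needs just a load, a skip-if-bit-clear and a second load, with no rjmp at all. Whether avr-gcc actually emits that sequence for the C below has not been checked, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* "Default, then override" selection of the compare value.
 * 0x04 and 0x10 are the constants from the generated assembly. */
uint8_t pulse_default_then_override(uint8_t b, uint8_t bit_index)
{
    assert(bit_index < 8);
    uint8_t value = 0x04u;              /* start with the value for a 0 bit */
    if (b & (uint8_t)(1u << bit_index)) /* single-bit test on the byte */
        value = 0x10u;                  /* overwrite when the bit is 1 */
    return value;
}
```

The result is the same as with the two-branch version; only the shape of the code changes, so both paths can take the same number of cycles.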

Update

Turns out the timings I had were slightly off. They worked fine when I was blasting at full power, but when there were just one or two bits set per byte, the whole thing would go dark. I had to change just one constant, and then things worked.

That's the good side of this approach. I can easily modify the pulse lengths and the length of the bit if needed. If I resort to writing assembly by hand, I can probably squeeze the minimum cycle length to 15 or so.

11 Apr 2013

Gerrit but not gerrit

We like code reviews. Well, at least officially we all like code reviews. Or anyway they are mandatory.

We evaluated a bunch of possible review tools at some point (rietveld, Review Board and gerrit). We would have been happy with rietveld from a user's point of view but we didn't want to ship our source code to AppEngine (and the gae2django project that would allow it to be run on our own server wasn't very mature then). Out of Review Board and gerrit we preferred gerrit.

We liked the gerrit UI and tools (ssh auth and commands over ssh with JSON output are fabulous) but we really didn't like the gerrit workflow.

Gerrit wants to run your main repository and handle the merges into it. The reviewable changes have to be single commits, not branches, so you basically need to rebase your branches for review and merge. (I can understand that: it's kind of nice to review commits as units of change.)

The way we have been working, intermediate results are regularly pulled between people, and developers are attached to the intermediate commits in their feature branches.

So we use gerrit without the workflow. This comes in two parts: synthetic review commits and merging outside gerrit.

Synthetic review commits

The commits we push to gerrit are created by doing something like:

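# Remember which feature branch we are on
branch=$(git rev-parse --abbrev-ref HEAD)

git fetch origin
# Start a fresh review branch from the current master
git checkout -b gerrit-review origin/master
# Stage the whole feature branch as a single change
git merge --squash "$branch"
# Opens the editor: edit the commit message there
git commit -a
git push gerrit "HEAD:refs/for/master/$USER-$branch"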

(with error checking, checking for an existing review request and re-using the Change-ID, some related magic in the git hooks etc.)

Merging outside gerrit

Instead of pushing Submit on the gerrit review request, our merge master pulls the original branch that was used to create the review request (actually, we inject a field into the review request with the commit hash), merges that into a working copy, runs merge tests and pushes to our regular shared repository. There's a post-receive hook in that repo that force-pushes onto the gerrit repository and all is well.