Why
Up until now, I have been a mild-mannered software developer, but recently I was bitten by the electronics bug. Now the spare room in our apartment is filled with resistors, capacitors and ICs with serial numbers looking like foreign license plates.
And LEDs. I didn't have a clear plan for what I was going to do, but I have a 1.5-year-old son, so I suspect making some blinkenlichts at home is going to get me at least umpteen million World's Best Daddy points. I bought different kinds: water-proof strips, plain strips, individual pixels, holiday decoration lights… all with some sort of digital control.
Which brings us to today's topic. Some of the LED strips use the WS2811 control chip. While its close relative, the WS2801, uses SPI, the WS2811 has its own serial protocol, which rules out most of the hardware features in microcontrollers. So, it's bit banging time.
I'd like to stress that I was doing this as a hobby project. There are very good solutions for WS2811 control already, such as Alan Burlison's WS2811.h or the FastSPI_LED2 library. The inspiration for reinventing the wheel came from Alan's remark:
"Hmm, that means that driving these with a simple C routine is unlikely to be sufficient. I spent a bit of time looking to see if there was any sort of hardware assist that could be brought to bear, but even SPI at 4MHz, close to the maximum that the MCU can support, wouldn't be fast enough as it would still be necessary to marshal each byte into a series of 5-bit patterns to get the timings right for the WS2811 protocol."
Challenge accepted.
How
The communications protocol is serial: the chip picks out the first 24 bits arriving after a 50 μs reset pause, and forwards the following bits to the next chip in the chain. Each bit starts with a low-high transition, and the bit value is determined by the timing of the high-low transition. So we're sending a pulse of varying width; if only we could modulate pulse width somehow…
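The chaining is easier to see in a small host-side model. This is only a sketch of the behaviour described above, not driver code: `chip_payload` and its parameters are hypothetical names made up for illustration, and the model says nothing about which byte drives which colour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { BYTES_PER_CHIP = 3 };  /* 24 bits per chip, as described above */

/* Hypothetical helper: copy the 3 bytes that chip number `chip_index`
 * (0 = first in the chain) would keep from one frame sent after a reset
 * pause. Returns false if the frame is too short to reach that chip. */
static bool chip_payload(const uint8_t *frame, size_t frame_len,
                         size_t chip_index, uint8_t out[BYTES_PER_CHIP])
{
    assert(frame != NULL && out != NULL);

    /* Only complete 24-bit groups count; a trailing partial group is ignored. */
    if (chip_index >= frame_len / BYTES_PER_CHIP) {
        return false;
    }

    /* Every chip before this one has consumed its own 3 bytes and
     * forwarded the rest, so this chip sees the frame from this offset. */
    size_t start = chip_index * BYTES_PER_CHIP;
    for (size_t i = 0; i < BYTES_PER_CHIP; i++) {
        out[i] = frame[start + i];
    }
    return true;
}
```

For a 7-byte frame, chip 0 would keep bytes 0 to 2, chip 1 bytes 3 to 5, and a third chip would see nothing usable.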
After examining the datasheet for the ATmega328, I noticed this tidbit:
"The OCR0x Registers are double buffered when using any of the Pulse Width Modulation (PWM) modes. For the normal and Clear Timer on Compare (CTC) modes of operation, the double buffering is disabled. The double buffering synchronizes the update of the OCR0x Compare Registers to either top or bottom of the counting sequence."
That's perfect for our needs. (The quote is about Timer0; Timer2's OCR2A and OCR2B behave the same way.) The plan is simple: the timer module will be set for a 20 tick cycle (OCR2A) and the PWM limit value (OCR2B) is adjusted depending on the bit value. Thanks to the double-buffering, we have 20 cycles to prepare the next bit, so it becomes feasible to write this in C.
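A sketch of what that timer setup might look like. The register and bit names are the standard avr-libc ones for the ATmega328, but the function name `ws2811_timer_init`, the use of the OC2B pin (PD3) as output, and the 16 MHz clock in the usage note are assumptions for illustration, not taken from the actual code. The AVR part only compiles on the target; the tick arithmetic runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

enum { BIT_PERIOD_TICKS = 20 };  /* the 20-tick cycle from the text */

/* Duration of `ticks` timer ticks in nanoseconds (rounded down), assuming
 * the timer runs unprescaled at the CPU clock `f_cpu_hz`. */
static uint32_t ticks_to_ns(uint32_t ticks, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz != 0);
    return (uint32_t)(((uint64_t)ticks * 1000000000u) / f_cpu_hz);
}

#ifdef __AVR__
#include <avr/io.h>

/* Hypothetical init routine: Timer2 in fast PWM mode with OCR2A as TOP. */
static void ws2811_timer_init(void)
{
    DDRD  |= _BV(DDD3);                              /* OC2B (PD3) as output      */
    OCR2A  = BIT_PERIOD_TICKS - 1;                   /* TOP: counter runs 0..19   */
    TCCR2A = _BV(COM2B1) | _BV(WGM21) | _BV(WGM20);  /* non-inverting OC2B output */
    TCCR2B = _BV(WGM22) | _BV(CS20);                 /* TOP = OCR2A, no prescaler */
}
#endif
```

With an assumed 16 MHz clock, `ticks_to_ns(BIT_PERIOD_TICKS, 16000000)` comes out at 1250 ns per bit, i.e. 800 kbit/s.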
Of course, the devil is in the details. If we write the PWM limit value twice within one timer period, the first value is overwritten before it is ever used and that bit is lost, so we need to synchronize the updates to the timer overflow. The wait loop costs about 4 cycles. Then we need to do the update, clear the timer overflow flag and so on. In total, we use about 13 cycles, so we need to load the next bit in 7 cycles, including the loop termination check.
To achieve that, I unrolled an update of a byte into 8 single-bit updates. This gets compiled rather nicely, as the AVR instruction set has the opcodes SBRS and SBRC: skip the next instruction if a register bit is set or cleared. The byte can be kept in the register unchanged without any need for shifting or masking.
Loading the next byte also takes time. If I used a regular for loop to iterate over an array, the compiler would happily put all the code at the end and mess up the timings between bytes. So I split the operations into smaller pieces (advance pointer, dereference pointer, check for termination) and slotted them between the bits.
The C code, then, looks like this:
    sync(); bang_bit(b, 4);
    data++;
    sync(); bang_bit(b, 3);
    nextByte = *data;
and the generated assembly looks like (sync and bang_bit are inlined):
    ;; Wait for overflow
    sbis 0x17, 0
    rjmp .-4
    ;; Bit set or not?
    sbrc r25, 4
    rjmp .+4
    ldi r24, 0x04
    rjmp .+2
    ldi r24, 0x10
    ;; Update OCR2B
    sts 0x00B4, r24
    ;; Reset overflow
    ldi r24, 0x01
    out 0x17, r24
    ;; Advance pointer
    adiw r28, 0x01
    ;; And the next bit
    sbis 0x17, 0
    rjmp .-4
    sbrc r25, 3
    rjmp .+4
    ldi r24, 0x04
    rjmp .+2
    ldi r24, 0x10
    sts 0x00B4, r24
    ldi r24, 0x01
    out 0x17, r24
    ;; Load next byte
    ld r16, Y
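Read back from that listing, the two inlined helpers plausibly look something like the sketch below. It is a reconstruction, not the code from the repository: the `ws_` prefix (which also avoids clashing with POSIX `sync()`), the constant names, and the host stand-ins for the registers are assumptions made so the example compiles off-target. The compare values 0x04 and 0x10 are the ones visible in the assembly.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#else
/* Host stand-ins so the sketch compiles off-target. On a real ATmega328
 * these names come from <avr/io.h>. The fake flag starts out set, and
 * "clearing" it by writing 1 leaves it set, so ws_sync() never blocks. */
static volatile uint8_t TIFR2 = 0x01;
static volatile uint8_t OCR2B;
#define TOV2 0
#endif

#define WS_COMPARE_ZERO 0x04  /* loaded when the data bit is 0 */
#define WS_COMPARE_ONE  0x10  /* loaded when the data bit is 1 */

/* Busy-wait until Timer2 has overflowed (the sbis/rjmp pair). */
static inline void ws_sync(void)
{
    while (!(TIFR2 & (uint8_t)(1u << TOV2))) {
        /* spin */
    }
}

/* Queue the pulse width for one bit, then clear the overflow flag.
 * With a constant `bit` the test can compile down to a single sbrc/sbrs. */
static inline void ws_bang_bit(uint8_t byte, uint8_t bit)
{
    assert(bit < 8);
    OCR2B = (byte & (uint8_t)(1u << bit)) ? WS_COMPARE_ONE : WS_COMPARE_ZERO;
    TIFR2 = (uint8_t)(1u << TOV2);  /* on the AVR, writing 1 clears the flag */
}
```

Because OCR2B is double buffered, the value written here only takes effect at the next timer wrap, which is why each write has to be paired with a wait on the overflow flag.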
After a few passes of checking the generated code and tweaking the C code in response, I ended up with a worst-case interval between bits of 19 cycles, just inside the 20-cycle budget.
Testing on the hardware revealed some instability on the first LED; turns out I had mistyped one of the timing constants. Fix that, upload and hey presto! Blinkenlichts!
The code is up at GitHub; it involves some calls to the Arduino library for delays and such. I was too lazy to look those up from the datasheet.
"But how is this related to ZenRobotics?" I hear you ask. I'll let you know after the Housewarming Party.
Bonus
You'll note that the generated code is still less than optimal. In an effort to save registers, r24 gets reused. Keeping the magic constant 0x01 in a dedicated register would save a cycle per bit.
The "bit set" tests also waste cycles with all the jumping around.
Update
Turns out the timings I had were slightly off. They worked fine when I was blasting at full power, but when there were just one or two bits set per byte, the whole thing would go dark. I had to change just one constant, and then things worked.
That's the good side of this approach. I can easily modify the pulse lengths and the length of the bit if needed. If I resort to writing assembly by hand, I can probably squeeze the minimum cycle length down to 15 or so.