
Mapping A Pixie Bitmap Onto the OLED

December 31, 2017

For my first try I used a starship bitmap that I had already transformed for the OLED, but I want to try directly using code that’s meant to display on the 1861 Pixie display. The issue is that the Pixie display expects to receive a scanline as 8 bytes, which it serializes into 64 bits across one scan line. The OLED expects each byte to represent 8 vertical bits. So for bytes A,B,C,D,E,F,G,H
the Pixie displays bits as
A0A1A2…A7B0B1…B7…..H7 on a single scanline
while the OLED lines them up as
A0B0C0D0...H0
A1.........H1
.
.
A7.........H7

to be the first 8 pixels of 8 separate lines.
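
As a readable illustration of that re-arrangement (not the routine used below), here is a minimal C sketch that turns 8 Pixie-style row bytes into 8 OLED-style column bytes. The name rows_to_columns and the bit order (bit 7 is the leftmost pixel of a Pixie byte, bit 0 is the topmost pixel of an OLED byte) are assumptions made for the example, so check them against your own display.

```c
#include <stdint.h>

/* Hypothetical helper, not from the original code.
   rows[r] is one Pixie byte (8 horizontal pixels) taken from scanline r.
   cols[c] receives one OLED byte (8 vertical pixels) for column c.
   Assumed bit order: bit 7 = leftmost pixel of a row byte,
                      bit 0 = topmost pixel of a column byte. */
void rows_to_columns(const uint8_t rows[8], uint8_t cols[8])
{
    for (int c = 0; c < 8; c++) {
        uint8_t out = 0;
        for (int r = 0; r < 8; r++) {
            if (rows[r] & (0x80 >> c)) {       /* pixel at row r, column c is lit */
                out |= (uint8_t)(1u << r);     /* it becomes bit r of column byte c */
            }
        }
        cols[c] = out;
    }
}
```

With these assumptions, a fully lit top scanline (rows[0] = 0xFF, the rest 0) comes out as 0x01 in every column byte. This version is easy to follow but does 64 bit tests per block, which is why faster tricks are attractive.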
The 1802 UNO runs code on the AVR that reads the emulated 1802’s memory and transposes it for the OLED. The code is in C-ish C++, so I just cribbed it to run directly on the 1802 as follows:

void oledpaint(uint8_t l1, const uint8_t l2)
{
  uint8_t line, block, i, Bx;
  uint32_t B[8], A[8];
  int32_t m=1;
  int32_t n=1;
  uint32_t x,y,t;

  // explanation of the following code: the Pixie display takes a memory byte as a horizontal line of 8 pixels.
  // Alas, the OLED wants a vertical column of 8 pixels.
  // So you need to transpose a block of 8 RAM bytes into 8 vertical bars of 8 pixels, then send these to the OLED.
  // This is calculation intensive, especially when using readable code. So there's some tough code taken from Hacker's Delight.

  // NOTE: we copy ram into an 8 byte A buffer. This copy is because we use line doubling (a pixel is a 2*2 pixel block on the OLED).
  // NOTE 2: it would be simple to set up different memory modes of 128*64, 32*32 etc next to the hard-wired 64*64 mode.

  for (line=l1;line<(l1+l2);line++)        // each pass covers 4 source rows, which line doubling turns into 8 OLED rows
  {
    for (block=0;block<8;block++)
    {
      for (i=0;i<4;i++)           // take just 4 bytes below current top byte, not 8 bytes: line doubling
      {
        A[i*2+1] = A[i*2] = starship[line*32 + block + (3-i)*(8)];  // line*32 - line doubling, would be 64 otherwise. (3-i): line doubling, would be (7-i) otherwise
      }

      // the code below is from Hacker's Delight, see
      // http://www.hackersdelight.org/hdcodetxt/transpose8.c.txt
      // note this trickery depends on 32 bit variables, otherwise
      // you get garbled results.
       x = (A[0]<<24)   | (A[m]<<16)   | (A[2*m]<<8) | A[3*m];
       y = (A[4*m]<<24) | (A[5*m]<<16) | (A[6*m]<<8) | A[7*m];

       t = (x ^ (x >> 7)) & 0x00AA00AA;  x = x ^ t ^ (t << 7);
       t = (y ^ (y >> 7)) & 0x00AA00AA;  y = y ^ t ^ (t << 7);

       t = (x ^ (x >>14)) & 0x0000CCCC;  x = x ^ t ^ (t <<14);
       t = (y ^ (y >>14)) & 0x0000CCCC;  y = y ^ t ^ (t <<14);

       t = (x & 0xF0F0F0F0) | ((y >> 4) & 0x0F0F0F0F);
       y = ((x << 4) & 0xF0F0F0F0) | (y & 0x0F0F0F0F);
       x = t;

       B[0]=x>>24;    B[n]=x>>16;    B[2*n]=x>>8;  B[3*n]=x;
       B[4*n]=y>>24;  B[5*n]=y>>16;  B[6*n]=y>>8;  B[7*n]=y;

      // ----- end of hacker's delight code ------------------
      // B now contains 8 bytes for the OLED

      for (i=0;i<8;i++)
      {
        spiSend(B[i]); spiSend(B[i]);   // each column byte is sent twice: horizontal pixel doubling
      }
    }
  }
}

That routine does a LOT of 32-bit math. It generates almost 8K of 1802 code, most of which is calls to 32-bit math routines. It takes almost 5 seconds to fill the screen which, at 4MHz (about 250,000 instructions per second), is 1.25 MILLION instructions! On the UNO 1802 the AVR is a lot more powerful, but Oscar still splits the updates into 4 to reduce jumpiness. I have visions of breaking up the update and running at 12MHz, but it still needs to be sped up considerably.
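
For anyone checking the arithmetic, here is a small sketch of how that instruction count falls out. It assumes 16 clocks per instruction (two 8-clock machine cycles), which holds for most 1802 instructions but not for long branches, and the function name is made up for the example.

```c
#include <stdint.h>

/* Rough count of 1802 instructions executed in a given time.
   Assumption: every instruction takes 16 clocks (two machine cycles of
   8 clocks each). Long branches take three cycles, so this is an estimate. */
uint32_t instructions_executed(uint32_t clock_hz, uint32_t seconds)
{
    const uint32_t clocks_per_instruction = 16;
    return (clock_hz / clocks_per_instruction) * seconds;
}
```

instructions_executed(4000000, 5) gives 1,250,000. At 12MHz the same instruction count would take about a third of the time, which is still well over a second.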

There are a couple of easy wins – the variables m and n are constant 1’s, which a good compiler would factor out but mine won’t. Likewise, changing B from a 32-bit array to 8-bit chars made a big difference. A couple of other changes got it down to around 1.2 seconds at 4MHz, but there it sits.
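
Those two easy wins might look something like the sketch below: the Hacker's Delight transpose with the m and n strides written out as constant indices and both buffers declared as 8-bit. The function name transpose8_bytes is invented for the example, and this exact version has not been timed on the 1802. It only shows the shape of the change.

```c
#include <stdint.h>

/* 8x8 bit-matrix transpose after Hacker's Delight (transpose8), with the
   stride parameters m and n replaced by literal indices and 8-bit buffers.
   Hypothetical name; illustrative only. */
void transpose8_bytes(const uint8_t A[8], uint8_t B[8])
{
    uint32_t x, y, t;

    /* Pack 8 bytes into two 32-bit words. The casts matter on compilers
       where int is only 16 bits wide. */
    x = ((uint32_t)A[0] << 24) | ((uint32_t)A[1] << 16) | ((uint32_t)A[2] << 8) | A[3];
    y = ((uint32_t)A[4] << 24) | ((uint32_t)A[5] << 16) | ((uint32_t)A[6] << 8) | A[7];

    /* Swap bits within 2x2 blocks, then 2x2 blocks within 4x4 blocks. */
    t = (x ^ (x >> 7)) & 0x00AA00AAUL;   x = x ^ t ^ (t << 7);
    t = (y ^ (y >> 7)) & 0x00AA00AAUL;   y = y ^ t ^ (t << 7);

    t = (x ^ (x >> 14)) & 0x0000CCCCUL;  x = x ^ t ^ (t << 14);
    t = (y ^ (y >> 14)) & 0x0000CCCCUL;  y = y ^ t ^ (t << 14);

    /* Swap the 4x4 blocks between the two words. */
    t = (x & 0xF0F0F0F0UL) | ((y >> 4) & 0x0F0F0F0FUL);
    y = ((x << 4) & 0xF0F0F0F0UL) | (y & 0x0F0F0F0FUL);
    x = t;

    /* Unpack into 8-bit results. */
    B[0] = (uint8_t)(x >> 24);  B[1] = (uint8_t)(x >> 16);
    B[2] = (uint8_t)(x >> 8);   B[3] = (uint8_t)x;
    B[4] = (uint8_t)(y >> 24);  B[5] = (uint8_t)(y >> 16);
    B[6] = (uint8_t)(y >> 8);   B[7] = (uint8_t)y;
}
```

The 32-bit shifts and XORs in the middle are untouched, so on an 8-bit CPU without a barrel shifter they remain the expensive part.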


5 Comments
  1. “That routine does a LOT of 32 bit math. ”

    Which is a problem. This algorithm may be fine on a 32 bit ARM with a barrel shifter but it is not a good match for the 1802. I think that a decent assembly language routine could do a byte in about 100 instruction cycles.

  2. The workhorse subroutine would do one byte something like:

    ;; r15 – pointer to data
    ;; r14.0 – bitmask
    ;; r14.1 – work
    ;; r13 – loop counter
    ;; D – returns repacked byte

    ldi 8
    plo r13
    ldi 0
    phi r14
    sex r15
    L1: glo r14
    and
    bz L2
    ldi 80h
    L2: shl
    ghi r14
    shlc
    phi r14

    inc r15
    dec r13
    glo r13      ; bnz tests D, so load the loop counter first
    bnz L1
    ghi r14

    • So this is taking the xth bit of each of 8 bytes and stacking them, right? The xth bit is selected by the bitmask in R14.0. So that’s about 80-90 instructions per byte, times the 64×8 bytes, assuming the doubling of pixels can be worked in without a big penalty: something like 51,200 instructions at a round 100 per byte. I was thinking of something that used shifting but with the loop unrolled.
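
For readers who don't speak 1802 assembly, the byte-at-a-time idea from the comments can be modelled in C roughly as follows. This is a sketch of my reading of the routine, with made-up function names: p stands in for r15, mask for r14.0 and work for r14.1.

```c
#include <stdint.h>

/* C model of the assembly sketch: test the same bit (mask) in 8 consecutive
   bytes and shift each result into a work byte, so the first byte ends up
   in bit 7 of the result and the last byte in bit 0. */
uint8_t gather_bit(const uint8_t *p, uint8_t mask)
{
    uint8_t work = 0;
    for (uint8_t count = 8; count != 0; count--) {
        /* mirrors the shl/shlc pair, which moves the tested bit in via DF */
        work = (uint8_t)((work << 1) | ((*p++ & mask) ? 1 : 0));
    }
    return work;
}

/* Rough instruction budget for a full screen, assuming 64*8 = 512 repacked
   bytes and a per-byte cost supplied by the caller. */
uint32_t screen_instructions(uint32_t per_byte)
{
    return 64UL * 8UL * per_byte;
}
```

Calling gather_bit once for each mask from 0x80 down to 0x01 produces the same 8 bytes as the Hacker's Delight transpose, using only 8-bit operations. screen_instructions(100) gives 51,200, and at 80 per byte it drops to 40,960.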
