More Advanced Inline Assembly – Nope!

LCC1802 has a simple form of inline assembly: if you code asm("foo"); you get "foo" emitted directly in the assembly output from the compiler. This is fine for the simplest cases (seq, req, etc.) but it would be nice to be able to access local variables. I found a more advanced version of the assembly patch I used, which seems to be meant to do exactly that: http://www.yqcomputer.com/234_714_1.htm
This is from more than a decade ago but LCC itself is much older than that so that’s ok. I worked it into my working copy but the results so far are not encouraging:
The fragment below shows the C code and the resulting assembly code. The asm calls are just there to show the substitutions for the variable names: a global, g; two parameters, p1 and p2; and two locals, L1 and L2. The global g generates _g, which is fine, but I knew that! The locals L1 and L2 generate -4 and -6 instead of something like 2 and 0 or register references. The parameters generate 0 and 2 instead of (I would have hoped) referring to the registers or, worst case, offsets of 6 and 8 from the stack pointer. I may poke at this again, and it may just be a matter of adding the frame size (which is actually 6), but in the end I question whether it’s worth it.

int g=7;
void turnqoff(int p1,int p2){
	int L1,L2;
	L1=42; L2=43;
	asm(";	L1:$L1)\n");
	asm(";	L2:$L2\n");
	asm(";	g:$g\n");
	asm(";	p1:$p1 p2:$p2\n");
}
*************compiled code follows**************
_g:
	dw 7
_turnqoff:		;framesize=6
	reserve 4
	st2 R12,'O',sp,(6+1)			
	inc memaddr	
	str2 R13,memaddr			
;void turnqoff(int p1,int p2){
;	L1=42; L2=43;
	st2I 42,'O',sp,(2+1); ASGNI2(addr,acon)
	st2I 43,'O',sp,(0+1); ASGNI2(addr,acon)
;	asm(";	L1:$L1)\n");
;	L1:-4)
;	asm(";	L2:$L2\n");
;	L2:-6
;	asm(";	g:$g\n");
;	g:_g
;	asm(";	p1:$p1 p2:$p2\n");
;	p1:0 p2:2
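
If the fix really is just a matter of adding the frame size, the arithmetic would look like the sketch below. The name sp_offset is hypothetical (it is not part of LCC or of the patch); it only checks that adding the frame size of 6 to the raw values in the listing gives the offsets I was hoping for.

```c
/* Hypothetical helper, not part of LCC or the asm patch: turn the raw
   frame-relative value substituted for $name into an offset from the
   stack pointer by adding the frame size. */
int sp_offset(int raw, int framesize) {
	return raw + framesize;
}
```

With a frame size of 6, the locals (-4 and -6) come out as 2 and 0 and the parameters (0 and 2) come out as 6 and 8.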

Making Music With the 1806 Timer Q Toggle

The 1804/5/6 have a built-in 8 bit timer/counter with a bunch of functions. In timer mode, it decrements automatically every 32 machine cycles (a machine cycle is 8 clocks, so that is every 64us at 4MHZ). When it gets to 0 it resets to whatever you first set it to and counts down again. It can cause an interrupt at 0 OR it can just toggle Q. This generates a constant square wave without any further program interference. The range on my 4MHZ olduino is about 30HZ to 7.8KHZ. If you can set the chip up with a ROM or some other way of feeding it instructions, it’s an easy way to make sure you have a genuine 1804/5/6. The 10 byte program below generates a 30HZ square wave on my 4MHZ 1806.

68 0D CID ; disable timer interrupts
F8 FF LDI 255; maximum value
68 06 LDC ;load the timer
68 09 ETQ ; enable the Q toggle
68 07 STM ; start the timer
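
The frequency range quoted above can be checked with a little arithmetic. This is only a sketch: it assumes 8 clock pulses per machine cycle, one timer tick per 32 machine cycles, and one Q toggle each time the counter runs out. The function name q_toggle_hz is mine, for illustration.

```c
/* Sketch only: expected Q square-wave frequency for a given timer load value.
   load must be 1..255. */
double q_toggle_hz(double clock_hz, unsigned load) {
	double ticks_per_sec = clock_hz / 8.0 / 32.0;  /* 15625 at 4MHZ */
	double toggles_per_sec = ticks_per_sec / load; /* one toggle per countdown */
	return toggles_per_sec / 2.0;                  /* two toggles per full wave */
}
```

At 4MHZ a load value of 1 gives 7812.5HZ and a load value of 255 gives a bit over 30HZ, which matches the range I see.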

For extra points, it occurred to me that you could use this to make music in the traditional square wave sense. I wrote a tone(frequency,duration) function that calculates the initial timer value needed and starts the timer toggling Q for the required duration. This is similar enough to the equivalent Arduino function that I was able to crib some code from github to play a simple tune. That’s what you see running in the video above. That page also has a worthwhile explanation for converting sheet music to frequency,duration pairs.

void tone(int freq, int dur){ //tone at a particular frequency for a period
	unsigned char t;
	if (0!=freq){//0 would mean quiet for the duration
		asm(" STPC ; stop the timer\n");
		if (freq>7800) t=1; //calculate the number of 64us ticks
		else if (freq<31) t=255; //7800/30 is 260 which would not fit in an unsigned char
		else t=7800/freq;
		LDC(t);
		asm(" ETQ;  enable the Q toggle\n");
		asm(" STM; start the timer\n");
		delay(dur);
		asm(" STPC\n");
	}else{
		delay(dur);
	}
}
#define playSpeed 2
#define numNotes 29
int line1[] = {
  NOTE_D4, 0, NOTE_F4, NOTE_D4, 0, NOTE_D4, NOTE_G4, NOTE_D4, NOTE_C4,
  NOTE_D4, 0, NOTE_A4, NOTE_D4, 0, NOTE_D4, NOTE_AS4, NOTE_A4, NOTE_F4,
  NOTE_D4, NOTE_A4, NOTE_D5, NOTE_D4, NOTE_C4, 0, NOTE_C4, NOTE_A3, NOTE_E4, NOTE_D4,
  0};

int line1_durations[] = {
  8, 8, 6, 16, 16, 16, 8, 8, 8,
  8, 8, 6, 16, 16, 16, 8, 8, 8,
  8, 8, 8, 16, 16, 16, 16, 8, 8, 2,
  2};

void main(){
	unsigned int thisNote=0,noteDuration,pauseBetweenNotes;
	printf("Hello Axel Fans\n");
	asm(" CID; disable timer interrupts\n");
	for(thisNote=0;thisNote<numNotes;thisNote++){
		noteDuration = 1000/line1_durations[thisNote];
		tone(line1[thisNote], noteDuration * playSpeed);
	}
	printf("\ndone\n");

}
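
Because the timer only counts whole 64us ticks, the pitch you get is not exactly the pitch you ask for. The sketch below is a portable model of the timer value chosen by tone(), with the result clamped to 1..255, plus the frequency that value would actually produce. Both function names are hypothetical, and the 7812.5HZ figure assumes a 4MHZ clock.

```c
/* Sketch only: portable model of the timer value picked by tone().
   7800 approximates the 7812.5 Hz produced by a timer value of 1 at 4MHZ. */
unsigned char tone_reload(int freq) {
	int t;
	if (freq <= 0) return 0;     /* 0 means silence; no timer value needed */
	t = 7800 / freq;
	if (t < 1) t = 1;            /* above the top of the range */
	if (t > 255) t = 255;        /* below the bottom of the range */
	return (unsigned char)t;
}

/* Frequency actually produced by that timer value, in tenths of a Hz. */
long actual_decihz(unsigned char t) {
	return t ? 78125L / t : 0;
}
```

For a requested 440HZ this picks 17, which actually produces about 459.5HZ. The integer division rounds the timer value down, so notes come out sharp rather than flat.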

 

Millis Accuracy and Interrupt Overhead On the 1806

I’m working on an Arduino-style millis() function for the 1806. The idea is that I set a timer to interrupt every ms and increment a variable in memory that I can query from my code. A complication is that at 4MHZ the timer counts down once every 64us, which doesn’t divide evenly into 1000; the closest you get is 16 counts for 1.024ms. So I set the counter to 16 and let it interrupt the CPU every time it counts down to 0. On each interrupt I increment the millis location in storage and also add 3 to a fraction variable (the leftover 0.024ms is 3/125 of a ms). If the fraction reaches 125 I subtract 125 from it and boost millis by one. The Arduino uses exactly the same factors, which is the only reason I figured it out!

unsigned int millis=0; unsigned char fractmillis=0;
void LDC(unsigned char c){
	asm(" glo r12 ; pick up the value\n"
		" LDC ;		set the timer\n");
}
unsigned char GEC(){
	asm(" GEC ;		get the value\n"
		" plo r15\n ldi 0\n phi r15 \n"
		" cretn ;	this is the actual return\n");
	return 42;//just to keep the compiler happy
}
void initmillis(){
	asm(" CID; disable timer interrupts\n");
	LDC(16);//load the timer
	asm(" ldaD R1,.handler\n");
	asm(" STM; start the timer\n");
	asm(" CIE; enable timer interrupts\n");
	return;
	asm(".done: ;millis interrupt cleanup\n"
		" INC 2	  ; X=2!\n"
		" RLXA r15\n"
		" RLXA r14\n"
		" LDA 2	  ; RESTORE DF\n"
		" SHR\n"
		" LDA 2	  ; NOW D\n"
		" RET	  ; now X&P\n");

	asm(".handler: ;actual interrupt handler prolog\n"
		" DEC 2	  ; prepare stack to\n"
		" SAV	  ; SAVE X AND P (from T)\n"
		" BCI .go ; clear timer int\n"
		".go: \n"
		" DEC 2\n"
		" STXD	  ; SAVE D\n"
		" SHLC	  \n"
		" STXD	  ; SAVE DF\n"
		" RSXD r14  ;save memaddr helper reg\n"
		" RSXD r15  ;save work reg\n");

	asm(" ld2 r15,'D',(_millis),0 ;load current millis value\n"
		" inc r15	;increase millis\n"
		" inc r14	;point to fractional part of millis\n"
		" ldn r14	;pick up fractional value immediately following\n"
		" adi 3\n str r14 ;add 3 to the fractional part and put it back\n"
		" smi 125	;test for extra count\n"
		" lbnf 		.noxtra ;borrow (fraction<125): no extra count\n"
		" str r14	;store the fraction\n"
		" inc r15	;add extra count to millis\n"
		".noxtra: 	;bypass extra count\n"
		" dec r14\n glo r15\n str r14\n"
		" dec r14\n ghi r15\n str r14\n"
		" lbr .done\n");
}
//timer test 1 - 1806 timer counter demo
#include <olduino.h>
#include <nstdlib.h>
#include <cpu1802spd4port7.h>
#include "timer1806.h"
void main()
{
	unsigned int t1,t2;
	int i,d=100;
	printf("Hello Timer Fans\n");
	asm(" seq\n");
	delay(d);
	asm(" req\n");
	initmillis();
	printf("Now we're clocking!\n");
	t1=millis;
	asm(" seq\n");
	delay(d);
	asm(" req\n");
	t2=millis;
	printf("t1=%d, t2=%d\n\n",t1,t2);
	printf("spin delay of %d ms\n",d);
	printf("covered %d timer ms\n",t2-t1);
	printf("done\n");
}
#include <nstdlib.c>
#include <olduino.c>
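
The bookkeeping in the interrupt handler is easier to follow as plain C. This is a model only, not the code that runs on the 1806: the struct and function names are made up for illustration, and the 16 bit millis is an assumption about the size of an LCC1802 unsigned int.

```c
#include <stdint.h>

/* Sketch only: C model of what the handler does on each timer interrupt
   (every 16 ticks of 64us = 1.024ms). Names are illustrative. */
struct msclock {
	uint16_t millis;  /* whole milliseconds (assumed 16 bits) */
	uint8_t fract;    /* leftover in units of 1/125 ms */
};

void msclock_tick(struct msclock *c) {
	c->millis += 1;            /* the whole millisecond */
	c->fract += 3;             /* plus 0.024ms = 3/125 ms */
	if (c->fract >= 125) {     /* a full extra millisecond has accumulated */
		c->fract -= 125;
		c->millis += 1;
	}
}

/* Run n interrupts from zero and return the resulting millis count. */
uint16_t msclock_after(unsigned n) {
	struct msclock c = {0, 0};
	while (n--) msclock_tick(&c);
	return c.millis;
}
```

After 125 interrupts the real elapsed time is 125 x 1.024 = 128ms, and the model reports exactly 128 with nothing left in the fraction.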

[Images: 17-03-17 timer1, 17-03-17 timer2]
So, the bad news is in the first image, which compares the results of my millis calculation with the spin delay: a spin delay of 100ms shows up as 137ms calculated with millis. The better news is the second image, which shows two things: the actual time is almost bang on the 137ms reported by millis, and the spin delay’s 100ms is actually 113ms. Still, that’s an overhead of more than 20% for tracking millis. I can improve that in a few ways:
I can reduce the resolution to, say, 10ms; I can reduce the accuracy and accept that a millisecond will be 1.024ms; I can keep the millis value in a register rather than a global memory variable. I think I’ll try reducing the resolution to 2ms and giving up the fractional ms accuracy.

Also: Ahah! I took out the timer stuff and recompiled for the 1802, then re-measured the spin delay with the logic analyzer. A 100ms nominal spin delay took 103ms with the 1802 instruction set. I think the difference is down to reserving registers 0 and 1 making them unavailable for variables.

Yup: Due to an error in the machine description file, reserving regs 0 and 1 left the compiler with only reg 7 for variables and it was going nuts spilling and reloading. I fixed the error (adding regs 4 and 5 to the pool at the same time) and the spin delay went down to almost the nominal value. Interestingly (to me) I had realized a while ago that R6 was always available for variables even though it’s the link register. Both SCRT and SCAL/SRET always save it before using it.

So, in the end, I may leave things as they are. I bought 20% in performance with the 1806 and if I give that up for better timing, so be it.

First 1806 Counter/Timer Tests

17-03-16 ttest1

So, my first crack at the 1806 timer functions worked fine. I set the 8 bit counter to a starting value with LDC, then put it into timer mode with STM, and it counts down by one every 32 machine cycles (TPAs). I get the value with two GECs separated by the normal olduino spin delay, which just counts instructions, and the results are pretty comparable. At 4MHZ the downcount happens every 64us and the range of the counter is a bit more than 16ms. That sounds too short to be useful, but I’m actually planning to run it so it interrupts about every millisecond, use an interrupt routine to track milliseconds in a long int, and use that for delay() and millis() à la Arduino. The code below is all done in C with inline assembler because I’m lazy. The loading and getting of the timer are in functions because that’s the easiest way to get values in and out of assembly routines.

 

//timer test 1 - 1806 timer counter demo
#include <olduino.h>
#include <nstdlib.h>
#include <cpu1802spd4port7.h>
void LDC(unsigned char c){
	asm(" glo r12 ; pick up the value\n"
		" LDC ;		set the timer\n");
}
unsigned char GEC(){
	asm(" GEC ;		get the value\n"
		" plo r15\n ldi 0\n phi r15 \n"
		" cretn ;	this is the actual return\n");
	return 42;//just to keep the compiler happy
}
void main()
{
	unsigned char t1,t2;
	int i,d=16;
	printf("Hello Timer Fans\n");
	asm(" CID; disable timer interrupts\n");
	LDC(255);//load the timer
	asm(" STM; start the timer\n");
	t1=GEC();
	delay(16);
	t2=GEC();
	printf("t1=%d, t2=%d\n\n",t1,t2);
	printf("spin delay of %d ms\n",d);
	printf("covered about %f timer ms\n",(float)(t1-t2)/(15.625));
	printf("done\n");
}
#include <nstdlib.c>
#include <olduino.c>
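
The conversion in the last printf can be pulled out into helpers that also cope with the counter wrapping between the two readings. This is a sketch under two assumptions: the counter reloads to the value given to LDC when it runs out, so readings repeat every `load` ticks, and fewer than `load` ticks pass between the readings. The function names are hypothetical.

```c
/* Sketch only: elapsed ticks between two GEC() readings of a down-counter.
   load is the value given to LDC (1..255); t1 is read first, t2 second. */
unsigned elapsed_ticks(unsigned char t1, unsigned char t2, unsigned char load) {
	return ((unsigned)t1 + load - t2) % load;
}

/* One tick is 64us at 4MHZ, i.e. 15.625 ticks per millisecond. */
double ticks_to_ms(unsigned ticks) {
	return ticks * 64.0 / 1000.0;
}
```

With a load value of 255, readings of 255 and then 5 are 250 ticks apart, which is 16ms.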

 

Hello From The 1806 Side

I’m now pretty happy with the 1806 proof of concept setup. I have a software loader that I load into low memory using an 1802 processor. The software loader looks like the 1802 in load mode but it writes to memory starting at 0x800 or, if there’s nothing to load, runs the code that’s already there. With the software loader installed I can swap out the 1802 chip for an 1806 and run code that uses the enhanced instruction set.

I’ve got an 1806 target for LCC1802 that compiles C using some of the 1806 enhanced instructions.  It gets a modest (20%) performance improvement and frees up a couple of registers.

When I get back to Ottawa I’m going to install an eprom to hold the loader and install the 1806 permanently.  I’ll then look at incorporating serial drivers directly in the 1802 and redoing the SPI circuit with a goal of eliminating the AVR completely.

This is the output from the "Hello" program loaded at location 0x800, compiled for the 1802 and then the 1806.
17-03-13 hellofromtheotherside

 //fakeloader simulates 1802 load mode in run for 1806
#define nofloats
void main(){
	asm(" 		b4 run\n"			//bypass bootloader if IN pressed
		" 		ldAD 14,0x800\n"	//starting address
		" 		sex 14\n"			//in X register
		"noEF4:	bn4 noEF4\n"		//loop til IN pressed
		"		inp 6\n"			//load memory
		"   	nop\n"
		"		out 7\n"			//and echo
		"yEF4:	b4 yEF4\n"			//wait til switch released
		"		br noEF4\n"			//back for more
		"run:	lbr 0x800\n");	//finally - off we go
}

#include <nstdlib.h>
#include <cpu1802spd4port7.h>
void fun(int i){}

void main()
{
	char* dummy=(void*) &fun;
	printf("Hello World!\n");
	printf("main() is at %x\n",&main);
	printf("Code for a null function starts with %cx\n",dummy[0]);
	if (dummy[0]==0xD5)
		printf("Compiled as an 1802\n");
	else if (dummy[0]==0x68)
		printf("Compiled as an 1806\n");
	else
		printf("I don't know what this is!\n");
}
#include <nstdlib.c>

1806 Dhrystone Results – Solid If Unspectacular

The 1806 port seems to be working reliably now. Almost all of the adaptation is done in the macros conditioned on setting CPU to 1805 at assembly time (the 1804/5/6 share the enhanced instruction set; 1805 is just what the assembler recognizes). The only changes to the machine description file are to use new epilog/prolog files and to add one to the stack offsets for variables and formal parameters. I could probably fold that back into the macros as well. If I make any backwards compatible improvements to the macros or other code I’ll probably do something like that. As it stands, code created by the new target XR18NW won’t work on an 1802.

Rerunning the Dhrystone benchmark, the 1806 clocks 81 Dhrystones/sec vs 68 on the 1802 (both at 4MHZ), an 18% improvement. These compare to over 300 Dhrystones/sec for a Z80 with a much better compiler. The 1806 does 2900 instructions per pass vs 3660 for the 1802, which sounds more impressive, but some of the 1806 instructions take longer than the 1802’s, which makes up the difference.
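
The "longer instructions" point can be put in numbers by working out the average machine cycles per instruction implied by those figures. This is a rough sketch, assuming 8 clock pulses per machine cycle and taking the Dhrystone rates and instruction counts at face value; the function name is mine.

```c
/* Sketch only: average machine cycles per instruction implied by the
   benchmark figures, assuming 8 clock pulses per machine cycle. */
double cycles_per_instruction(double clock_hz, double dhrystones_per_sec,
                              double instructions_per_pass) {
	double machine_cycles_per_sec = clock_hz / 8.0;
	double instructions_per_sec = dhrystones_per_sec * instructions_per_pass;
	return machine_cycles_per_sec / instructions_per_sec;
}
```

With the figures above this works out to roughly 2.0 cycles per instruction on the 1802 and roughly 2.1 on the 1806, so the 1806 runs fewer instructions but each one costs a little more on average.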

So: the changes for the 1806 are:

  • adding one to stack offsets for parameters and variables
  • using scal/sret for call return
  • using RLDI for loading 16 bit constants
  • using RLXA and RSXD for 16 bit stack access where practical
  • getting rid of places where I inc/dec the stack pointer to make a work area because the stack pointer now points below the last used byte.
  • fixing up the odd place where the stack discipline was getting me in trouble – usually to do with calls from one assembly routine to another.

The remaining obvious 1806 instruction is a decrement and branch if non-zero (DBNZ), which I will probably incorporate, but it’s surprisingly tough to do and probably wouldn’t affect the Dhrystone results at all. All of the 1806 instructions seem more advantageous to someone writing tight code in assembly than to a compiler.

Looking at the generated code reminds me of just how clunky some of it is. I’m tempted to go back and incorporate liveness analysis in my optimizer and have a better look at chaining primitives in the machine description file.

 

 

 

Once Again

I am debugging code across four platforms in the most desperate way imaginable.

I have a problem with something in my 1806 adaptation corrupting memory and I was trying to use avrdude to read out the memory contents, but I now note that somewhere in the evolution of the circuit I started loading the output shift register only on actual I/O from the 1802. It triggers on NAND(N1,TPB), so only when the 1802 has written to port 2, 4, 6, or 7. In the original circuit I think /LOAD would have been accepted as well as N1. So, back to staring at code.