
First 1806 Counter/Timer Tests

17-03-16 ttest1

So, my first crack at the 1806 timer functions worked fine. I set the 8-bit counter to a starting value with LDC, then put it into timer mode with STM, and it counts down by one every 32 machine cycles (TPAs). I read the value with two GECs separated by the normal olduino spin delay, which just counts instructions, and the results are pretty comparable. At 4 MHz the down-count happens every 64 µs and the range of the counter is a bit more than 16 ms. That sounds too short to be useful, but I’m actually planning to run it so it interrupts about every millisecond, with an interrupt routine that tracks milliseconds in a long int, and use that for delay() and millis() à la Arduino. The code below is all done in C with inline assembler because I’m lazy. The loading and getting of the timer are in functions because that’s the easiest way to get values in and out of assembly routines.

 

//timer test 1 - 1806 timer counter demo
#include <olduino.h>
#include <nstdlib.h>
#include <cpu1802spd4port7.h>
void LDC(unsigned char c){
	asm(" glo r12 ; pick up the value\n"
		" LDC ;		set the timer\n");
}
unsigned char GEC(){
	asm(" GEC ;		get the value\n"
		" plo r15\n ldi 0\n phi r15 \n"
		" cretn ;	this is the actual return\n");
	return 42;//just to keep the compiler happy
}
void main()
{
	unsigned char t1,t2;
	int i,d=16;
	printf("Hello Timer Fans\n");
	asm(" CID; disable timer interrupts\n");
	LDC(255);//load the timer
	asm(" STM; start the timer\n");
	t1=GEC();
	delay(d);//spin for the d ms reported below
	t2=GEC();
	printf("t1=%d, t2=%d\n\n",t1,t2);
	printf("spin delay of %d ms\n",d);
	printf("covered about %f timer ms\n",(float)(t1-t2)/(15.625));
	printf("done\n");
}
#include <nstdlib.c>
#include <olduino.c>
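As a sanity check on the numbers above, here is a minimal sketch in plain hosted C (it is not olduino code and has not been run on the board). It assumes 8 clock pulses per machine cycle and one timer tick per 32 machine cycles, and every name in it is made up for illustration. The last part shows one possible way to keep a millisecond count when the interrupt period is 16 ticks (1024 µs) rather than an exact millisecond.

```c
#include <stdint.h>

/* Assumed timing model: 8 clocks per machine cycle, 1 timer tick per 32 machine cycles. */
#define CLOCKS_PER_CYCLE 8UL
#define CYCLES_PER_TICK  32UL

/* Microseconds per timer tick for a clock given in Hz. */
unsigned long tick_us(unsigned long clock_hz)
{
    return CLOCKS_PER_CYCLE * CYCLES_PER_TICK * 1000000UL / clock_hz;
}

/* Time for the 8-bit counter to run through all 256 values, in microseconds. */
unsigned long range_us(unsigned long clock_hz)
{
    return 256UL * tick_us(clock_hz);
}

/* Ticks elapsed between two reads of a down counter; the cast copes with one wrap. */
uint8_t ticks_between(uint8_t earlier, uint8_t later)
{
    return (uint8_t)(earlier - later);
}

/* Hypothetical bookkeeping for an interrupt that fires every 16 ticks (1024 us). */
static unsigned long ms_count; /* whole milliseconds so far */
static unsigned int  us_left;  /* leftover microseconds, always below 1000 on exit */

void on_timer_interrupt(void)
{
    us_left += 1024;
    while (us_left >= 1000) {
        us_left -= 1000;
        ms_count++;
    }
}

unsigned long millis_sketch(void)
{
    return ms_count;
}
```

At 4 MHz, tick_us() gives 64 and range_us() gives 16384, which matches the 64 µs step and the bit-more-than-16 ms range. Carrying the leftover microseconds forward keeps the millisecond count from drifting even though each interrupt is 24 µs long.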

 

Hello From The 1806 Side

I’m now pretty happy with the 1806 proof of concept setup. I have a software loader that I load into low memory using an 1802 processor. The software loader looks like the 1802 in load mode but it writes to memory starting at 0x800 or, if there’s nothing to load, runs the code that’s already there. With the software loader installed I can swap out the 1802 chip for an 1806 and run code that uses the enhanced instruction set.

I’ve got an 1806 target for LCC1802 that compiles C using some of the 1806 enhanced instructions.  It gets a modest (20%) performance improvement and frees up a couple of registers.

When I get back to Ottawa I’m going to install an EPROM to hold the loader and install the 1806 permanently. I’ll then look at incorporating serial drivers directly in the 1802 and redoing the SPI circuit, with a goal of eliminating the AVR completely.

This is the output from the “Hello” program loaded at location 0x800, compiled for the 1802 and then the 1806.
17-03-13 hellofromtheotherside

 //fakeloader simulates 1802 load mode in run for 1806
#define nofloats
void main(){
	asm(" 		b4 run\n"			//bypass bootloader if IN pressed
		" 		ldAD 14,0x800\n"	//starting address
		" 		sex 14\n"			//in X register
		"noEF4:	bn4 noEF4\n"		//loop til IN pressed
		"		inp 6\n"			//load memory
		"   	nop\n"
		"		out 7\n"			//and echo
		"yEF4:	b4 yEF4\n"			//wait til switch released
		"		br noEF4\n"			//back for more
		"run:	lbr 0x800\n");	//finally - off we go
}

#include <nstdlib.h>
#include <cpu1802spd4port7.h>
void fun(int i){}

void main()
{
	char* dummy=(void*) &fun;
	printf("Hello World!\n");
	printf("main() is at %x\n",&main);
	printf("Code for a null function starts with %cx\n",dummy[0]);
	if (dummy[0]==0xD5)
		printf("Compiled as an 1802\n");
	else if (dummy[0]==0x68)
		printf("Compiled as an 1806\n");
	else
		printf("I don't know what this is!\n");
}
#include <nstdlib.c>

1806 Dhrystone Results – Solid If Unspectacular

The 1806 port seems to be working reliably now. Almost all of the adaptation is done in the macros conditioned on setting CPU to 1805 at assembly time (the 1804/5/6 share the enhanced instruction set; 1805 is just what the assembler recognizes). The only changes to the machine description file are to use new epilog/prolog files and to add one to the stack offsets for variables and formal parameters. I could probably fold that back into the macros as well. If I make any backwards compatible improvements to the macros or other code I’ll probably do something like that. As it stands, code created by the new target XR18NW won’t work on an 1802.

Rerunning the Dhrystone benchmark, the 1806 clocks 81 Dhrystones/sec vs 68 on the 1802 (both at 4 MHz), an 18% improvement. These compare to over 300 Dhrystones/sec for a Z80 with a much better compiler. The 1806 does 2900 instructions per pass vs 3660 for the 1802, which sounds more impressive, but some of the 1806 instructions take longer than the 1802’s, which makes up the difference.
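The instruction counts and the Dhrystone scores can be cross-checked against each other. The sketch below is hosted C with invented names; it assumes a nominal 250,000 standard (two machine cycle) instruction times per second at 4 MHz and works out how long the average instruction must be for the figures to agree.

```c
/* Instructions executed per second = passes per second * instructions per pass. */
double instr_per_sec(double dhrystones_per_sec, double instr_per_pass)
{
    return dhrystones_per_sec * instr_per_pass;
}

/* Average instruction length, in units of one standard instruction time. */
double avg_instr_time(double std_times_per_sec,
                      double dhrystones_per_sec, double instr_per_pass)
{
    return std_times_per_sec / instr_per_sec(dhrystones_per_sec, instr_per_pass);
}
```

With the figures above, 68 × 3660 is 248,880 instructions per second on the 1802, close to the nominal 250,000, while 81 × 2900 is 234,900 on the 1806. On those assumptions the average 1806 instruction comes out roughly 6% longer than a standard one.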

So: the changes for the 1806 are:

  • adding one to stack offsets for parameters and variables
  • using SCAL/SRET for call/return
  • using RLDI for loading 16 bit constants
  • using RLXA and RSXD for 16 bit stack access where practical
  • getting rid of places where I inc/dec the stack pointer to make a work area because the stack pointer now points below the last used byte.
  • fixing up the odd place where the stack discipline was getting me in trouble – usually to do with calls from one assembly routine to another.

The remaining obvious 1806 instruction is a decrement and branch non-zero (DBNZ), which I will probably incorporate, but it’s surprisingly tough to do and probably wouldn’t affect the Dhrystone results at all. All of the 1806 instructions seem more advantageous to someone writing tight code in assembly than for a compiler.

Looking at the generated code reminds me of just how clunky some of it is. I’m tempted to go back and incorporate liveness analysis in my optimizer and have a better look at chaining primitives in the machine description file.

 

 

 

Once Again

I am debugging code across four platforms in the most desperate way imaginable.

I have a problem with something in my 1806 adaptation corrupting memory and I was trying to use avrdude to read out the memory contents, but I now note that somewhere in the evolution of the circuit I started loading the output shift register only on actual I/O from the 1802. It triggers on NAND(N1,TPB), so only when the 1802 has written to port 2, 3, 6, or 7 (the ports whose N code has N1 set). In the original circuit I think /LOAD would have been accepted as well as N1. So, back to staring at code.

1806 Speed Can Impress

I fought my way past the stack discipline issue and cleaned up the use of SCAL/SRET, and it seems that some aspects of the 1806 do give noticeable speed improvements. I was running the test suite on the latest compiler version and I got a persistent mysterious failure in the float test. The test program just seemed to hang toward the end and stop printing. I assumed something was goobering the stack and going off into lala land, but it worked in the emulator and, when I put a blink loop in the compiler cleanup routine, it showed that main() was in fact completing.

It seemed to relate to printing a long string, and for a while I was imagining something to do with bumping some code over a page boundary. Before getting out the logic analyzer, though, I stuck some more probe points in and convinced myself that all the code was, in fact, executing but the OUT 7’s weren’t driving the AVR to pass them on.

I started to think about timing and did instruction counts relative to baud rate. The AVR is sending data to the host at 57600 baud, or 5,760 bytes/second. The 1802/1806 at 4 MHz is executing 250,000 instructions/sec, so I needed at least 44 instructions between OUT 7’s for the AVR to keep up. Counting instruction times, the printstr() routine was using about 58 instructions per character printed and the 1806 cut that down to 40. It wasn’t an issue for most programs, but the floating point test prints a single string of 300 bytes and that was enough to blow out the AVR’s buffer.

So: in an actual working program, the 1806 knocks about 30% off the time of an 1802 for the same clock speed!
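The timing budget works out as follows. This is a rough sketch in hosted C with invented names; it assumes 10 bit times per byte on the serial link and a flat 250,000 instructions per second, and it knows nothing about how the AVR actually buffers data.

```c
/* Bytes per second on the link, assuming start + 8 data + stop = 10 bit times per byte. */
unsigned long bytes_per_sec(unsigned long baud)
{
    return baud / 10;
}

/* Fewest instructions between output bytes that the link can keep up with (rounded up). */
unsigned long min_instr_per_byte(unsigned long instr_per_sec, unsigned long baud)
{
    unsigned long bps = bytes_per_sec(baud);
    return (instr_per_sec + bps - 1) / bps;
}

/* Bytes still waiting to go out when the sender has finished writing n bytes.
   Returns 0 if the link keeps up. The product must fit in an unsigned long. */
unsigned long backlog_bytes(unsigned long n, unsigned long instr_per_byte,
                            unsigned long instr_per_sec, unsigned long baud)
{
    unsigned long drained = n * instr_per_byte * bytes_per_sec(baud) / instr_per_sec;
    return drained >= n ? 0 : n - drained;
}
```

With 57600 baud and 250,000 instructions per second, min_instr_per_byte() gives 44. At 58 instructions per character the link keeps up, while at 40 a 300-byte string finishes with roughly 24 bytes still waiting. Whether that overflows depends on the size of the AVR's buffer, which isn't stated here.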

[Following 14 lines are the inner loop of printstr()]
L22:
;    while(*ptr){
;	putc(*ptr++);
	ldaD R12,7    **1806 RLDI is 2.5 inst vs 4 on 1802
	cpy2 R11,R7   **4
	incm R7,1     **1
	ldn1 R13,R11  **2
	zExt R13      **2
	Ccall _out    **1806 SCAL is 5 inst times vs 17 for 1802
;	}
L23:
	ldn1 R11,R7   **2
	jnzU1 R11,L22 **2.5
;}
[following lines are the body of the OUT routine]
_out:	;raw port output **16 instructions plus Cretn which is 4 on 1806, 10 on 1802
	;stores a small tailored program on the stack and executes it
	dec	sp	;work backwards
	ldi	0xD3	;return instruction
	stxd
	cpy2	rt1,sp	;rt1 will point to the OUT instruction
	glo	regarg1	;get the port number
	ani	0x07	;clean it
	ori	0x60	;make it an out instruction - 60 is harmless
	stxd		;store it for execution
	glo	regarg2	;get the byte to be written
	str	sp	;store it where sp points
	sep	rt1	;execute it
;we will come back to here with sp stepped up by one
+	inc	sp	;need to get rid of the 6x instruction
	inc	sp	;and the D3
	Cretn		;and we're done ** 1806 SRET is 4 inst times vs 10 on 1802

I always want the last word so that WordPress doesn’t eat my code!

The 1806 – So Far Meh

So I have the 1806 running reliably and I’ve done a couple of instruction adaptations. So far not so good. My implementation of SCAL/SRET is clumsy and as a result the code is much bulkier. My “Hello From The Other Side” program goes from 5,941 bytes to 6,321. My hope would be that if I get the clumsiness fixed this would come more nearly even. The only other new instruction that looked like an easy win is a 16-bit immediate register load, RLDI, and that brought me back to 6,265 bytes. There are just a couple of other instructions that have potential for performance improvement, like register store via X and decrement (RSXD) and its companion RLXA. There are a bunch of decimal instructions which are probably completely useless.

One good thing is that the way I packaged the compiler output as macros really pays off for instruction changes. RLDI, RSXD, and RLXA went into exactly one spot each in the prolog file. The combination of those brought me back to 6,161 bytes so, if I started even after SCAL/SRET, I’d be down a couple of percent in size and probably a bit better in execution time.

ldiReg:	macro	reg,value
 if MOMCPU=$1805
 	RLDI	reg,value
 else
	ldi	(value)&255
	plo	reg
	ldi	(value)>>8; was/256
	phi	reg
 endif
	endm
popr:	macro	reg
 if MOMCPU=$1805
 	RLXA	reg
 else
	lda	sp
	phi	reg
	lda	sp
	plo 	reg
 endif
	endm
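For a sense of what the two macros save, here is a small sketch in hosted C. The byte counts assume the usual instruction formats (RLDI as 68 Cn plus two immediate bytes, RLXA as 68 6n, LDI as two bytes, and PLO, PHI and LDA as one byte each) and are not measured from an assembler listing. The function names are invented.

```c
/* Assumed instruction sizes in bytes. */
enum {
    LDI_BYTES  = 2,
    PLO_BYTES  = 1,
    PHI_BYTES  = 1,
    LDA_BYTES  = 1,
    RLDI_BYTES = 4,
    RLXA_BYTES = 2
};

/* Size of one ldiReg expansion: RLDI on the 1805 target, ldi/plo/ldi/phi otherwise. */
int ldireg_bytes(int is_1805)
{
    return is_1805 ? RLDI_BYTES
                   : LDI_BYTES + PLO_BYTES + LDI_BYTES + PHI_BYTES;
}

/* Size of one popr expansion: RLXA on the 1805 target, lda/phi/lda/plo otherwise. */
int popr_bytes(int is_1805)
{
    return is_1805 ? RLXA_BYTES
                   : LDA_BYTES + PHI_BYTES + LDA_BYTES + PLO_BYTES;
}

/* Percent change from old_size to new_size; negative means smaller. */
double percent_change(long old_size, long new_size)
{
    return 100.0 * (double)(new_size - old_size) / (double)old_size;
}
```

On those assumptions each ldiReg drops from 6 bytes to 4 and each popr from 4 to 2. Using the program sizes quoted earlier, SCAL/SRET adds about 6.4% (5,941 to 6,321 bytes) and the three register instructions take back about 2.5% (6,321 to 6,161).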

In the image below the 1806 is displaying the four bytes of machine code for the body of a do-nothing function. For the 1802 it would be a single byte D5.
17-02-27-1806
The functional payoff for the 1806 may come from the counter/timer instructions.

An altogether better idea

It occurred to me that the whole xmodem thing was a waste of time. It’s trivial to have the 1806 fake a load mode like the 1802. All I need is a little program that runs at startup and accepts binary data whenever IN is pressed. Then I can use the existing avrdude on Windows and more-or-less the same program in the AVR bridge processor. Instead of talking to the 1802 load mode, the AVR talks to my fake loader program in the 1802/1806. A piece of cake.

//fakeloader simulates 1802 load mode in run for 1806
void main(){
	asm(" 	b4 run\n"		//bypass bootloader if IN pressed
	" 	ldAD 14,0x2000\n"	//starting address
	" 	sex 14\n"		//in X register
	"noEF4:	bn4 noEF4\n"		//loop til IN pressed
	"yEF4:	b4 yEF4\n"		//wait til switch released
	"	inp 6\n"		//load memory
	"   	nop\n"
	"	out 7\n"		//and echo
	"	br noEF4\n"		//back for more
	"run:	lbr 0x2000\n");	//finally - off we go
}

The 1802/6 loader code really is simple. On startup, it tests EF4 to see if the bootloader should run. Then it loads bytes starting at location 0x2000, one every time it sees EF4 go low and then high. To invoke it, the AVR resets the 1802/1806 and then puts it in run with /EF4 high. It feeds it the code from avrdude and, when that’s done, restarts the 1802/1806 with /EF4 held low to bypass the loader and execute the application at 0x2000.
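The loader loop can be modelled on a PC like this. It is only a sketch: fake_load() and its parameters are invented names, and each array element stands for one press and release of IN. It relies on INP storing the bus byte at M(R(X)) and on OUT sending M(R(X)) and then incrementing R(X), which is why the echo is what steps R14 along.

```c
#include <stddef.h>
#include <stdint.h>

#define LOAD_BASE 0x2000u /* load address used by the loader */

/* mem:         64K memory image to load into
   in_at_reset: nonzero if IN is held when the processor starts
   data, n:     bytes presented by the bridge, one per press and release of IN
   echo:        receives the bytes echoed back on port 7
   Returns the address one past the last byte stored (LOAD_BASE if nothing was loaded). */
uint16_t fake_load(uint8_t *mem, int in_at_reset,
                   const uint8_t *data, size_t n, uint8_t *echo)
{
    uint16_t r14;
    size_t i;

    if (in_at_reset)          /* b4 run: skip the loader entirely */
        return LOAD_BASE;

    r14 = LOAD_BASE;          /* ldAD 14,0x2000 then sex 14 */
    for (i = 0; i < n; i++) {
        mem[r14] = data[i];   /* inp 6: bus byte is stored at M(R(X)) */
        echo[i] = mem[r14];   /* out 7: M(R(X)) goes out to the port... */
        r14++;                /* ...and R(X) is incremented */
    }
    return r14;
}
```

Loading three bytes with IN released at reset leaves them at 0x2000 to 0x2002 and returns 0x2003. With IN held at reset, nothing is touched.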

The changes to the AVR loader were simple compared to what was done for xmodem. It looks at the first address avrdude sends for program loading. If it’s 0, it assumes this is an 1802 and puts it in load mode. If it’s not 0, it puts the assumed 1806 in pseudo-load mode, then programs it the same way.
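The decision itself is tiny. The sketch below is in C with made-up names; the real AVR code isn't shown here, so treat it as an illustration of the rule rather than the actual implementation.

```c
/* Hypothetical names for the two ways the bridge can program the target. */
enum load_mode {
    LOAD_MODE_1802,   /* real 1802 load mode */
    PSEUDO_LOAD_1806  /* fake loader running on the 1802/1806 */
};

/* Choose the mode from the first load address avrdude sends. */
enum load_mode choose_load_mode(unsigned int first_address)
{
    return first_address == 0 ? LOAD_MODE_1802 : PSEUDO_LOAD_1806;
}
```

So a program built to load at 0 gets real load mode, and one built for 0x2000 goes through the fake loader.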

Of course, getting this working involved four programs running on three platforms (avrdude on Windows, my loader application on the AVR, the fake loader and the loaded application on the 1802/1806). Much head-scratching and logic-analyzer poking was required. But now that it works it’s all much simpler, and once the fake loader is in EPROM it should be pretty reliable.

17-02-26-stamps