LCC1802 Code Quality

March 7, 2013

I’m at a stage now where I’m looking at the quality of the generated code.  After a little bit of work I’m actually pretty pleased.  I compiled one of my test cases and went through the assembler output carefully, looking for clunky stuff or easy improvements.  I was able to clean up the function prolog and make a few small changes to use 1802 instructions better:

  • using increment/decrement for small additions and subtractions
  • using direct loading instead of calculating an address for 0 offsets
  • coding special cases for zero tests
  • etc, et-grinding-cetera
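
To make the first bullet concrete, here is a rough sketch in C of the kind of choice involved. It is not the LCC1802 source: emit_add_const and SMALL_LIMIT are made-up names, the cutoff of 4 is a guess, and the output is bare 1802 mnemonics rather than the compiler's macros. The point is only that an 1802 register can be bumped with a one-byte inc or dec, while a general 16-bit add has to go through the accumulator a byte at a time.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical cutoff: the largest constant handled with inc/dec.
   This value is an assumption, not taken from LCC1802. */
#define SMALL_LIMIT 4

/* Write an instruction sequence for "R<reg> += k" into out and
   return the number of instructions written. */
int emit_add_const(char *out, size_t cap, int reg, int k)
{
	size_t used = 0;
	int count = 0;
	int n = abs(k);

	out[0] = '\0';
	if (n <= SMALL_LIMIT) {
		/* Small constant: repeat the one-byte inc or dec. */
		const char *op = (k < 0) ? "dec" : "inc";
		int i;
		for (i = 0; i < n; i++) {
			used += (size_t)snprintf(out + used, cap - used,
						 "\t%s R%d\n", op, reg);
			count++;
		}
		return count;
	}

	/* General case: 16-bit add through the accumulator, low byte
	   first, then high byte with carry. A negative k is added as
	   its 16-bit two's complement. */
	{
		unsigned v = (unsigned)k & 0xFFFFu;
		snprintf(out, cap,
			 "\tglo R%d\n\tadi 0x%02X\n\tplo R%d\n"
			 "\tghi R%d\n\tadci 0x%02X\n\tphi R%d\n",
			 reg, v & 0xFFu, reg, reg, (v >> 8) & 0xFFu, reg);
	}
	return 6;
}
```

Called with a 128-byte buffer, emit_add_const(buf, sizeof buf, 1, 1) writes a single inc, while a constant like 100 falls through to the six-instruction sequence.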

Really not hard work though, and the results seem pretty good to me.  My usual benchmark is an “8 queens” program that brute-force places 8 queens on a chess board with none under attack.  The inner loop looks like the following:

	int r;
	for (r = 0; r < 8; r++){
		if (rows[r] && up[r-c+7] && down[r+c]) {
			rows[r] = up[r-c+7] = down[r+c] = 0;
			x[c] = r;
			if (c == 7)
				print();	/* all 8 columns placed: report the solution */
			else
				queens(c + 1);
			rows[r] = up[r-c+7] = down[r+c] = 1;
		}
	}
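
For anyone who wants to try the benchmark, a self-contained sketch built around that loop might look like the following. The array sizes, the initialization to 1, the solutions counter and count_solutions are my assumptions here, and it counts solutions instead of printing each board, so treat it as an illustration rather than the exact test program.

```c
/* Board state, using the array names from the loop above.
   1 means the row or diagonal is still free. */
int rows[8], up[15], down[15], x[8];

/* Hypothetical counter: stands in for printing each board. */
int solutions;

/* Try every row in column c, recursing into the next column. */
void queens(int c)
{
	int r;
	for (r = 0; r < 8; r++) {
		if (rows[r] && up[r-c+7] && down[r+c]) {
			rows[r] = up[r-c+7] = down[r+c] = 0;
			x[c] = r;
			if (c == 7)
				solutions++;	/* all 8 columns placed */
			else
				queens(c + 1);
			rows[r] = up[r-c+7] = down[r+c] = 1;
		}
	}
}

/* Reset the board, run the search, return the number of solutions. */
int count_solutions(void)
{
	int i;
	for (i = 0; i < 8; i++)
		rows[i] = 1;
	for (i = 0; i < 15; i++)
		up[i] = down[i] = 1;
	solutions = 0;
	queens(0);
	return solutions;
}
```

Calling count_solutions() from a main should return 92, the number of distinct 8 queens placements.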

Taking the if statement from that loop as a case in point:

;if (rows[r] && up[r-c+7] && down[r+c]) {
	cpy2 R11,R1   <copy r>
	shl2I R11,1   <r*2>
	ld2 R11,(_rows)R11  <rows[r]>
	jzU2 r11,L18   <jump to false path if rows[r] is 0>
	alu2 R11,R1,R7,sm,smb  <r-c>
	shl2I R11,1    <(r-c)*2>
	ld2 R11,(_up+14)R11  <up[r-c+7]>
	jzU2 r11,L18;  <jump to false path if up[r-c+7] is 0>
	alu2 R11,R1,R7,add,adc  <r+c>
	shl2I R11,1     <(r+c)*2>
	ld2 R11,(_down)R11  <_down[r+c]>
	jzU2 r11,L18; <jump to false path if _down[r+c] is 0>

You can see that the compiler has figured out on its own that it should keep r and c in registers (R1 and R7 here). It pre-calculates the +7 part of the up subscript, folding it into the address as _up+14, and it uses a shift left to multiply by 2, the size of an int. For the loop control it uses inc’s to add 1 to r.
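
The address arithmetic behind ld2 R11,(_up+14)R11 can be checked with a few lines of C. This is only a sketch of the idea: up_slot is a made-up helper, and int16_t stands in for the 1802's 2-byte int. The constant 7 elements become 14 bytes added to the base of the array, and only the variable part r-c is scaled at run time.

```c
#include <stdint.h>

int16_t up[15];	/* 16-bit ints, as on the 1802 target */

/* Hypothetical helper: address of up[r-c+7] computed the way the
   generated code does it. The constant part (7 elements = 14 bytes)
   is folded into the base; the variable part is doubled, which the
   compiler does with a one-bit shift left. */
int16_t *up_slot(int r, int c)
{
	int offset = (r - c) * 2;	/* byte offset of the variable part */
	return (int16_t *)((char *)up + 14 + offset);
}
```

For every r and c from 0 to 7, up_slot(r, c) is the same pointer as &up[r-c+7], which is why the compiler is free to do the +7 once at compile time.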

This is not the optimal way to write the program for an 1802, but given a random clot of C code, the compiler has done a decent job. I have a few ideas to improve that snippet, like testing the storage for 0 instead of loading it and testing the register, or combining the shift left with the copy, but I don’t know that they would do that much. The net effect of the first-pass optimization was to bring the execution time of the 8 queens problem down by about 15%, and the code size by a bit more.

By the way, I’ve cleaned up the code for readability, and the <comments> in angle brackets are mine. In particular, a line like
ld2 R11,(_up+14)R11
which I hope is a fairly obvious 16-bit load from storage, is actually generated as:
ld2 R11,'O',R11,(_up+14)

