
Whoah, This Statement is Huge – More Compiler Output

March 4, 2018

Testing the SD card programs, I was struck by how big the resulting 1802 code was: over 18K. Perusing the assembler listing file, I noted that the tinyfat routines were particularly big. One module that reads the next line of a text file clocks in at almost 3K! Looking at that module, I found a particularly ugly C statement that generates over 700 bytes of code and makes multiple calls to 32-bit arithmetic routines. It would be much shorter in assembler, but I’m more interested in whether I can learn anything that helps me improve code generation generally.

If I look at the first 9 lines of the output, which I think correspond to ((uint32_t)BS.fatCopies*(uint32_t)BS.sectorsPerFAT), I notice a bunch of extraneous register copying that might easily yield to combination rules. I have a number of combinations in the ruleset already, so I’ll have to track down why they don’t cover this.

**The code being generated follows the rule for reg:CVUI2(INDIRU1(addr)) and then the rule for CVIU4(reg). I tried adding a rule for reg:CVIU4(CVUI2(INDIRU1(addr))), which blew up the compiler, and one for reg:CVIU4(INDIRU1(addr)), which just gets ignored. Like I say, printing the DAGs would be a good leg up.**

		sec=((uint32_t)BS.reservedSectors+
		((uint32_t)BS.fatCopies*(uint32_t)BS.sectorsPerFAT)+
		(((uint32_t)BS.rootDirectoryEntries*32)/512)+
		(((uint32_t)currFile.currentCluster-2)*(uint32_t)BS.sectorsPerCluster)+BS.hiddenSectors)+(((uint32_t)currFile.currentPos/512) % (uint32_t)BS.sectorsPerCluster);
;		sec=((uint32_t)BS.reservedSectors+
	ld1 R11,'D',(_BS+3),0
	zExt R11 ;CVUI2: widen unsigned char to signed int (zero extend)
	cpy2 RL8,R11
	sext4 RL8; CVIU4
	ld2 RL10,'D',(_BS+12),0
	zext4 RL10 ;CVUU4: widen unsigned int to unsigned long (zero extend)
	Ccall _mulu4
	cpy4 RL10,RL8; LOADU4(reg)
	st4 RL10,'O',sp,(40+1); ASGNU4
	ld1 R9,'D',(_BS),0
	zExt R9 ;CVUI2: widen unsigned char to signed int (zero extend)
	cpy2 RL8,R9
	sext4 RL8; CVIU4
	st4 RL8,'O',sp,(36+1); ASGNU4
	ld2 RL10,'D',(_currFile+13),0
	zext4 RL10 ;CVUU4: widen unsigned int to unsigned long (zero extend)
	ldI4 RL8,2 ;loading a long unsigned constant
	alu4 RL8,RL10,RL8,sm,smb
	ld4 RL10,'O',sp,(36+1);reg:  INDIRU4(addr)
	Ccall _mulu4
	cpy4 RL10,RL8; LOADU4(reg)
	st4 RL10,'O',sp,(32+1); ASGNU4
	ld4 RL8,'D',(_currFile+20),0;reg:  INDIRU4(addr)
	shrU4I RL8,9
	ld4 RL10,'O',sp,(36+1);reg:  INDIRU4(addr)
	Ccall _modu4
	cpy4 RL10,RL8; LOADU4(reg)
	st4 RL10,'O',sp,(28+1); ASGNU4
	ld2 RL8,'D',(_BS+1),0
	zext4 RL8 ;CVUU4: widen unsigned int to unsigned long (zero extend)
	ld4 RL10,'O',sp,(40+1);reg:  INDIRU4(addr)
	alu4 RL10,RL8,RL10,add,adc
	ld2 RL8,'D',(_BS+4),0
	zext4 RL8 ;CVUU4: widen unsigned int to unsigned long (zero extend)
	shl4I RL8,5; LSHU4(reg,con)
	shrU4I RL8,9
	alu4 RL10,RL10,RL8,add,adc
	ld4 RL8,'O',sp,(32+1);reg:  INDIRU4(addr)
	alu4 RL10,RL10,RL8,add,adc
	ld4 RL8,'D',(_BS+16),0;reg:  INDIRU4(addr)
	alu4 RL10,RL10,RL8,add,adc
	ld4 RL8,'O',sp,(28+1);reg:  INDIRU4(addr)
	alu4 RL10,RL10,RL8,add,adc
	st4 RL10,'O',sp,(44+1); ASGNU4
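
The `zExt` / `cpy2` / `sext4` runs in the listing above come from widening an unsigned char to a 16-bit int and then to a 32-bit long. Since the intermediate int can never be negative, the sign extension always ends up behaving like a zero extension, which is what would make a combined rule safe. Here is a minimal sketch of that argument in portable C, assuming the target's `int` is 16 bits (modelled with `int16_t`); the function names are made up for illustration and are not part of the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Two-step widening, as the compiler sees it on a 16-bit-int target:
   CVUI2 (unsigned char -> int) followed by CVIU4 (int -> unsigned long).
   int16_t and uint32_t stand in for the target's int and unsigned long. */
static uint32_t widen_two_step(uint8_t c)
{
    int16_t as_int = (int16_t)c;        /* CVUI2: zero extend, result is 0..255 */
    return (uint32_t)(int32_t)as_int;   /* CVIU4: sign extend to 32 bits */
}

/* One-step widening: what a combined rule could emit instead. */
static uint32_t widen_one_step(uint8_t c)
{
    return (uint32_t)c;                 /* plain zero extension */
}

/* Compare the two for every possible byte value. */
static void check_widenings(void)
{
    unsigned v;
    for (v = 0; v <= 0xFF; v++)
        assert(widen_two_step((uint8_t)v) == widen_one_step((uint8_t)v));
}
```

Calling `check_widenings()` from a test harness on a desktop compiler runs through all 256 byte values. It only shows the two conversions agree in C; it says nothing about how the rule has to be written to keep the code generator happy.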

One thing that would be a big leg up on this stuff would be something that dumps the DAG structures the compiler uses, so I could see what the darned thing is looking for rather than parsing backwards from the output. It would be the equivalent of using a debugger instead of print statements to debug a program. Googling around, I note that LCC has a symbolic target which, among other things, prints the DAGs!

Shown below is a simplified version of that big statement with the symbolic output. It sure looks like CVIU4(CVUI2(INDIRU1(addr))) to me.

	unsigned char fatCopies;
	unsigned int sectorsPerFAT;
	unsigned long dummy;
	dummy=(unsigned long)fatCopies * (unsigned long)sectorsPerFAT;
***************TARGET=symbolic OUTPUT FOLLOWS****************
;	dummy=(unsigned long)fatCopies * (unsigned long)sectorsPerFAT;
testsize.c:5.1:
 2. ADDRLP2 dummy
7. ADDRLP2 fatCopies
6. INDIRU1 #7
5. CVUI2 #6 1
4. CVIU4 #5 2
10. ADDRLP2 sectorsPerFAT
9. INDIRU2 #10
8. CVUU4 #9 2
3. MULU4 #4 #8
1' ASGNU4 #2 #3 4 1

Yep. Trying a more careful version of the combination rule does work. The rule

	reg: CVIU4(CVUI2(INDIRU1(addr))) "\tld1 R%c,%0\n\tzExt R%c\n\tzExt4 R%c; CVIU4(INDIRU1(addr)):*HOORAY*widen unsigned char to long\n" 1

gives what you see below, which is still not great, but I have my hands on the controls and I’m steering in the right direction! The real answer here is to see what actually has to be done as long arithmetic and break up that monster equation. It’s served its purpose, though, in pointing up some bad patterns and how to go after them. Also, yay for the symbolic target!

;	dummy=(unsigned long)fatCopies * (unsigned long)sectorsPerFAT;
	ld1 RL8,'O',sp,(13+1)
	zExt RL8
	zExt4 RL8; CVIU4(INDIRU1(addr)):*HOORAY*widen unsigned char to long
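
Breaking up the monster statement might look something like the sketch below. It is not tinyfat's code: the two structs are hypothetical stand-ins holding only the fields the statement touches, with widths guessed from the loads in the listing. It also assumes sectorsPerCluster is a power of two (which FAT requires) and that currentCluster is at least 2. I have not measured what the 1802 compiler makes of it; the point is only to show which parts really need 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tinyfat structures (layout not matched). */
struct boot_sector {
    uint8_t  sectorsPerCluster;
    uint16_t reservedSectors;
    uint8_t  fatCopies;
    uint16_t rootDirectoryEntries;
    uint16_t sectorsPerFAT;
    uint32_t hiddenSectors;
};
struct open_file {
    uint16_t currentCluster;
    uint32_t currentPos;
};

/* The original statement, kept as the reference. */
static uint32_t sector_original(const struct boot_sector *bs, const struct open_file *f)
{
    return ((uint32_t)bs->reservedSectors +
            ((uint32_t)bs->fatCopies * (uint32_t)bs->sectorsPerFAT) +
            (((uint32_t)bs->rootDirectoryEntries * 32) / 512) +
            (((uint32_t)f->currentCluster - 2) * (uint32_t)bs->sectorsPerCluster) +
            bs->hiddenSectors) +
           (((uint32_t)f->currentPos / 512) % (uint32_t)bs->sectorsPerCluster);
}

/* Part 1: depends only on the volume, so it can be computed once at mount. */
static uint32_t first_data_sector(const struct boot_sector *bs)
{
    uint8_t i;
    uint32_t base = bs->hiddenSectors + bs->reservedSectors;
    for (i = 0; i < bs->fatCopies; i++)
        base += bs->sectorsPerFAT;            /* 32-bit adds, no multiply call */
    base += bs->rootDirectoryEntries >> 4;   /* (n * 32) / 512 == n / 16 */
    return base;
}

/* Part 2: per-read work. spc must be a power of two. */
static uint32_t sector_split(uint32_t data_start, uint8_t spc, const struct open_file *f)
{
    uint8_t n;
    uint32_t cluster_offset = (uint16_t)(f->currentCluster - 2);
    /* only the low bits of currentPos/512 survive the modulo */
    uint8_t in_cluster = (uint8_t)((uint8_t)(f->currentPos >> 9) & (uint8_t)(spc - 1));
    for (n = spc; n > 1; n >>= 1)
        cluster_offset <<= 1;                 /* multiply by spc using shifts */
    return data_start + cluster_offset + in_cluster;
}

/* Spot check that the split version matches the original on one example. */
static void check_split_matches_original(void)
{
    struct boot_sector bs;
    struct open_file f;
    bs.sectorsPerCluster = 4;   bs.reservedSectors = 1;
    bs.fatCopies = 2;           bs.rootDirectoryEntries = 512;
    bs.sectorsPerFAT = 250;     bs.hiddenSectors = 63;
    f.currentCluster = 10;      f.currentPos = 5000;
    assert(sector_split(first_data_sector(&bs), bs.sectorsPerCluster, &f)
           == sector_original(&bs, &f));
}
```

With this split, the multiply and modulo calls go away and the per-read work is a few shifts, a mask and two 32-bit adds, provided the value of `first_data_sector` is cached somewhere when the card is mounted.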

 
