Skip to content

Optimization Strategy

October 25, 2013

A lot of the improvements in the Rhinestone Compiler came from recoding support routines but the bulk of my effort went into optimizing the emitted 1802 code.   There are two places to improve the code: the compiler, and the peephole optimizer; and i have slowly developed an approach for doing the work.

The front end of the compiler reads the C code and generates trees of intermediate code which are converted to 1802 machine code by the back end.  An example of an intermediate code sequence would be:
lv=lv+1; (where lv is a local variable) generating the intermediate code:
ASGNI2(
ADDRLP(lv) ,
ADDI2 (
INDIRI2(ADDRLP(lv )),
CONSTI2(1)
))
Which says: Assign to the 2 byte storage area addressed as lv the sum of the contents of the storage at lv and the constant 1.

The back end has rules for each of the constructs, for example to load a a two byte constant into a register the rule is(simplified somewhat):
reg: CNSTI2   “ld2i register,value” where register and value are substituted in by the compiler
and the rule for adding two registers is something like
reg:ADDI2(reg,reg)   “alu2 reg1,reg1,reg2,add,adc”

So overall, the earliest version of the compiler handles the assignment above with something like
ld2i R11,1
ld2  R10,lv(which would be at an offset from the stack register)
alu2 R10,R10,R11,add,adc
st2  R10,lv
(where R10 and R11 are compiler assigned temporary registers)

The good news is that it’s pretty easy to understand what’s going on.  The bad news is that it’s a horrendous number of instructions.  Even if the local variable were already in a register, the compiler would generate 4 instructions to load the ‘1’ into a temporary register then something like 14 more to add the two registers which involves storing one on the stack and using the add, adc sequence.

Fortunately, LCC lets you have multiple rules for emitting the same sequence and chooses between them based on costs you assign.  In this case, I was able to put in a special case rule:
ADDI2(reg,1) which generates “inc reg”.  So the worst case for the above becomes:
ld2  R10,lv(which would be an offset from the stack register)
inc R10
st2  R10,lv

Granted this is still a bit painful but if lv is used much it’s probably in a register and the code is still very clear.

To get the improvement, you need a new rule that is a special case of ADDI2, like:
reg:ADDI2(reg,1)  “inc reg”

So that’s a classic best case rule addition.  You need the equivalent for SUBI2 to generate dec’s but it’s pretty high payback. Then you need rules for ADDU2(unsigned), ADDP2(pointer) and the equivalent subtractions.  Still, that’s only 6 simple rule additions and the results are gratifying.

Where things get hairy is if you to go a step further and improve on the loads and stores surrounding the inc.

more on that later…

Advertisements

From → Uncategorized

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: