Some definite Improvement
The code looks awful to me now but it’s definitely noticeably quicker. I shoehorned in Charles’s idea of a master loop counter/local loop counter to cut down the number of times the inner loops iterate.
In my test harness, the time to convert 1000 4 digit numbers went from 5.6 seconds down to 4.2 seconds.
For smaller numbers the effect may be more noticeable. Yep, the time went down to 3.3 seconds. Factoring the optimized double dabble into the itoa wrapper, the time to prep a 3 digit number for printing went down from 8.9ms to 3.9ms.
That said, the code has become really yucky so unless I can simplify it dramatically i don’t think i’ll put it into production. Having to modify it in a year or so would be a nightmare. The interventions in the code are all tagged with ;mlc
It’s been fun though, so thanks to Charles for the idea.
_dubdab16fo: ;experimental binay-ascii conversion using the double-dabble algorithm for 16 bits ;thanks to Charles Richmond for the suggestion and code ;interger is passed in r12 ;buffer pointer is passed in r13 ;a pointer to the 1st non-zero byte in the buffer is passed back in r15 ;r8-11 are used as temps ;r8 is the working pointer ;r15.0 is bit count(32) and the return value register ;r9.0 is digit count ;15-03-06 first optimization prep - need to run both shift and pre-check backwards. cpy2 r8,r13 ;buffer address pushr r7 ;make r7 avail as master loop counter;mlc ldi 1 ;mlc plo 7 ;initialize master loop counter to 1;mlc ldi 6 ;digit count+1 for trailing 0 plo r9 $$clrlp: ;clear the passed buffer ldi 0 str r8 ;clear a byte inc r8 dec r9 glo r9 ;check the count bnz $$clrlp ;back for more dec r8 ;back off to terminating 0 cpy2 r14,r8 ;save end location ldi 16 ;bit count plo r15 ;now i'm going to spin off any leading 0's in the binary number $$cktop: ghi r12 ;get the top bit of the number shl ;check for a 1 bdf $$bitloop ;move on if we have one shl2 r12 ;shift the input number dec r15 ;reduce the number of times to shift glo r15 bnz $$cktop ; inc r15 ;our whole number was 0 but force at least one pass $$bitloop: glo r7 ;mlc shr ;mlc shr ;mlc plo r9 cpy2 r8,r14 ;point past the units digit $$dcklp: glo r9 ;mlc bz $$pastpc ;mlc dec r8 ldn r8 ;pick up a digit smi 5 ;see if it's greater than 4 bnf $$dnoadd ;if not, bypass add adi 0x08 ;add the 5 black and 3 more str r8 ;put it back $$dnoadd: dec r9 ;decrement digit count br $$dcklp ;and back for next digit $$pastpc: bnf $$noincmlc;mlc -- this is an attempt to increment the mlc only if we pre-correct the most significant digit inc r7;mlc $$noincmlc: ;mlc inc r7 ;mlc --count the shift but the shift is moved to just before the bcd shift loop glo r7 ;mlc adi 3 ;mlc shr ;mlc shr ;mlc plo r9 shl2 r12 ;shift the input number cpy2 r8,r14 ;point r8 just past the units location - ready to walk back $$dshlp: dec r8 ;walk back from 0's position ldn r8 ;get the digit back shlc ;continue the shift phi r15 ;save it for the carry test ani 0x0f ;clear the 10 bit str r8 ;put the digit back ghi r15 ;now test for carry smi 0x10 ; this will make df 1 if the 10 bit is set dec r9 ;decrement the digit count glo r9 bnz $$dshlp ;back for more if needed dec r15 glo r15 bnz $$bitloop cpy2 r15,r13 ;save the starting location of the number ldi 5 ;digit count again plo r9 cpy2 r8,r13 ;point to 1st digit $$upnxt: ldn r8 ;get digit ori 0x30 ;make ascii str r8 ;put it back inc r8 ;next digit dec r9 ;counter glo r9 bnz $$upnxt ;upgrade all spots ldi 4 ;max number of 0's to skip plo r9 ;number of leading 0's to skip $$cknext: ldn r15 ;check digit smi 0x30 ;for '0' bnz $$done inc r15 ;next digit dec r9 ;reduce count glo r9 bnz $$cknext $$done: popr r7;mlc cretn
***********3 digit numbers******** 17:01:30.198> begin itoa 17:01:39.030> 17:01:39.030> done itoa ******original itoa in C took 8.8 ms/number 17:01:39.030> begin itoadd 17:01:42.967> 17:01:42.967> done itoadd ****** itoa with optimized double dabble took 3.9 ms/number 17:01:43.015> begin dd16 17:01:47.705> 17:01:47.705> done dd16 baseline double dabble 4.7 ms per conversion 17:01:47.705> begin dd16f 17:01:52.701> 17:01:52.701> done dd16f synchronized loop double babble went to 5 ms/conversion 17:01:52.701> begin dd16fo 17:01:56.003> 17:01:56.003> done dd16fo final double dabble using master loop counter took 3.3ms/conversion