Over on IEEE Spectrum there’s an article about Lee Hart’s modern version of the Cosmac Elf – this is the CPU card that’s underneath the Olduino. Worth a read and I hope the author & photographer don’t mind my using their image below – the photos are very well done.
Photo by Randi Klett, Aricle by Stephen Cass
So I still have the problem with flaky response from the bagels server on this LAN. I downloaded the Fiddler web traffic tool and looked at the details of the session. Fiddler inserts itself as a proxy between your browser and the server so you can see and time the traffic. The most striking thing I see is that fiddler reports an extremely long TCP/IP connect time 3-5 seconds is common. I was able to match up one fiddler-tracked transaction with the debug log of the server and the times match exactly. It’s just taking an unreasonable length of time for the router to make the connection.
Fiddler also points to a flaw in my http where I don’t include “content length” in my http header (mostly because it’s variable and it would be difficult to track). I will try the simple server where the data length is fixed and see if it’s any different.
Frankly though, I think the problem is just this router/lan.
Update: I tried running the server direct connected to the pc as 169.254.180.2 and it’s flawless which supports my theory. I tried running the simplest server through the LAN and it failed much less frequently but still failed. I think I just have to wait til i’m home with a router I know better and trouble shoot some more.
I changed the two things that seemed most promising based on the arduino logic analyzer trace(gateway address and interrupt register checking) but I still have a bad problem. The game works pretty well with the safari browser on the mac but stalls badly with chrome and internet explorer on the windows machine.
I have a serial debug window open to the olduino and, sometimes, I can see a delay of several seconds between the time i click the browser button and when the olduino sees the data. More often, I see the data show up quickly at the olduino but there is a LONG delay between the olduino sending the response and the browser displaying it. Occasionally the browser says it can’t display the page. It feels like something wrong at the HTTP/HTML level where safari/os x has a looser rule than the windows browsers.
I also find the windows machine quite flaky on this lan, sometimes stalling or failing to find regular web sites. That doesn’t point anywhere useful though.
I did see an error in the section that sends the congratulations message: it was not sending the final html trailer which has the and newlines that define the end of the http transmission. Fixing that didn’t help anything though.
One observation that is probably not related to the problem is that sending an item is done as a sequence: get the buffer pointer, move the data, update the buffer pointer, issue the send command and check that it’s done. When I issue a response with a lot of data items in it, I can see the browser painting the screen top to bottom. I could probably defer issuing the send command until I have the whole page moved to the buffer and it would speed up the rendering if nothing else.
I spotted this on Hack-A-Day and I’m quite excited. It’s a bit the same idea as my Olduino but based on the Intel 8086 and with way more sophisticated hardware including flash memory, onboard ADC, and hardware SPI. The author, Matt Millan describes it as a non-arduino but I think the 80Duino is a worthy member of the Olduino tribe.
The web site, linked above has a good description of the hardware and software and some excellent development photos. I had to take my geek hat off to matt when i saw his logic analyzer:
It’s a bit big though. I think it could kick the 1802 Olduino’s butt on a benchmark but the 1802 can probably run rings around it on the carbot track. http://www.youtube.com/watch?v=X99z4_e0aCE
Matt is going to try an ethernet web server test this weekend and i’m hoping for more collaborative efforts down the line.
So, I re-captured an initialization and receive/send sequence and continued my horrible python hacking. The python code is hidden down at the bottom of this post along with a description of the data streams.
The wiznet initialization sequence is different from mine because it sets all the transmit and receive buffer sizes to 2K. This is the default so i don’t do it. I have also bodged it to deliberately set the gateway ip address to something on the wrong subnet in hopes this was the problem but i don’t think it was.
0 001E 0C ['0x02'] 0x00 the first 16 commands 1 001F 0C ['0x02'] 0x00 set the tx & rx buffer sizes of each socket to 2K 2 001E 2C ['0x02'] 0x00 3 001F 2C ['0x02'] 0x00 4 001E 4C ['0x02'] 0x00 5 001F 4C ['0x02'] 0x00 6 001E 6C ['0x02'] 0x00 7 001F 6C ['0x02'] 0x00 8 001E 8C ['0x02'] 0x00 9 001F 8C ['0x02'] 0x00 10 001E AC ['0x02'] 0x00 11 001F AC ['0x02'] 0x00 12 001E CC ['0x02'] 0x00 13 001F CC ['0x02'] 0x00 14 001E EC ['0x02'] 0x00 15 001F EC ['0x02'] 0x00 16 0009 04 ['0xDE', '0xAD', '0xBE', '0xEF', '0xFE', '0xED'] set MAC 17 000F 04 ['0xC0', '0xA8', '0x01', '0xB1'] 0x00 set my IP 18 0001 04 ['0x0A', '0x00', '0x00', '0x01'] 0x00 set gateway 19 0005 04 ['0xFF', '0xFF', '0xFF', '0x00'] 0x00 set subnet mask
The socket initialization sequence is different only in that it sets the interrupt register to FF. I don’t know if this matters but i’ll try it.
20 0003 08 ['0x00'] 0x00 status socket 0 is closed 21 0001 0C ['0x10'] 0x00 command close socket 0 22 0001 08 ['0x00'] 0x00 check it's complete 23 0002 0C ['0xFF'] 0x00 set socket 0 interrupt to FF 24 0000 0C ['0x01'] 0x00 set socket 0 mode to tcp/ip 25 0004 0C ['0x00'] 0x00 set socket 0 port to 0050 26 0005 0C ['0x50'] 0x00 27 0001 0C ['0x01'] 0x00 command open socket 0 28 0001 08 ['0x00'] 0x00 check it's complete 29 0003 08 ['0x00'] 0x13 socket is open in tcp/ip mode 30 0001 0C ['0x02'] 0x00 command listen on socket 0 31 0001 08 ['0x00'] 0x00 check it's complete
We then see a long period of polling the socket 0 status register waiting for it to change from 0x14 to 0x17 showing a session has been established.
Once it goes to 0x17, it initializes socket 1(!) and sets it to listening. I don’t know whether this is important but it would be a last resort to try coding it.
status listening on 0 1668 0003 08 ['0x00'] 0x17 reads the status as session established 3X 1669 0003 08 ['0x00'] 0x17 1670 0003 08 ['0x00'] 0x17 1671 0003 28 ['0x00'] 0x00 starts listening on socket 1 1672 0001 2C ['0x10'] 0x00 1673 0001 28 ['0x00'] 0x00 1674 0002 2C ['0xFF'] 0x00 1675 0000 2C ['0x01'] 0x00 1676 0004 2C ['0x00'] 0x00 1677 0005 2C ['0x50'] 0x00 1678 0001 2C ['0x01'] 0x00 1679 0001 28 ['0x00'] 0x00 1680 0003 28 ['0x00'] 0x13 1681 0001 2C ['0x02'] 0x00 1682 0001 28 ['0x00'] 0x00
Now the horror. We have received almost 350 bytes(0x015f). The code reads the length and the w5500 would happily slurp us the whole thing in one go but instead we painfully read one byte: check that we’re connected, read the buffer address, read one byte, update the buffer address, tell the chip we’ve updated it, check that the command is done. It then does this AGAIN for each of the received bytes! I don’t think this is any sort of deliberate belt-and-suspenders programming, it’s just the result of how the C++ code was written. Note that we don’t care about any of this, it’s just looking to see the blank line at the end of the data.
1683 0003 08 ['0x00'] 0x17 reads socket 0 status again 1684 0026 08 ['0x00'] 0x01 reads rx data length 2X 1685 0027 08 ['0x00'] 0x5F 1686 0026 08 ['0x00'] 0x01 1687 0027 08 ['0x00'] 0x5F 1688 0003 08 ['0x00'] 0x17 reads socket 0 status again 1689 0026 08 ['0x00'] 0x01 reads rx data length 4X 1690 0027 08 ['0x00'] 0x5F 1691 0026 08 ['0x00'] 0x01 1692 0027 08 ['0x00'] 0x5F 1693 0026 08 ['0x00'] 0x01 1694 0027 08 ['0x00'] 0x5F 1695 0026 08 ['0x00'] 0x01 1696 0027 08 ['0x00'] 0x5F 1697 0028 08 ['0x00'] 0x00 reads rx data pointer 1698 0029 08 ['0x00'] 0x00 RXD 0 47 reads one byte of rx data ascii 47=G 1700 0028 0C ['0x00'] 0x00 writes the rx data pointer 1701 0029 0C ['0x01'] 0x00 1702 0001 0C ['0x40'] 0x00 tells the chip it has written the pointer 1703 0001 08 ['0x00'] 0x00 checks command completion 1704 0003 08 ['0x00'] 0x17 reads socket 0 status again 1705 0026 08 ['0x00'] 0x01 reads rx data length 4X(now shows 1 byte less) 1706 0027 08 ['0x00'] 0x5E 1707 0026 08 ['0x00'] 0x01 1708 0027 08 ['0x00'] 0x5E 1709 0026 08 ['0x00'] 0x01 1710 0027 08 ['0x00'] 0x5E 1711 0026 08 ['0x00'] 0x01 1712 0027 08 ['0x00'] 0x5E 1713 0028 08 ['0x00'] 0x00 reads rx data pointer 1714 0029 08 ['0x00'] 0x01 RXD 0 45 reads one byte of rx data ascii 45=E 1716 0028 0C ['0x00'] 0x00 1717 0029 0C ['0x02'] 0x00 1718 0001 0C ['0x40'] 0x00 1719 0001 08 ['0x00'] 0x00 ad frigging nauseam! ...
Once it has read the whole incoming data stream, it sends a response back. The only difference i can see with my code is that it reads the socket 0 interrupt register looking for the 0x10 bit(send OK) to be set then rewrites the register clearing the rest of the bits. I might jump on this as being interesting but, in fact, it only ever has to read/write the register once i.e. the bit is set right away so it may be moot -easy to try though, at least back in florida.
7308 0003 08 ['0x00'] 0x17 still connected 7309 0024 08 ['0x00'] 0x7B reads tx buf write pointer 7310 0025 08 ['0x00'] 0x04 7b04 7311 7B04 14 ['0x48', '0x54', '0x54', '0x50', '0x2F', '0x31', '0x2E', '0x30', '0x20', '0x32', '0x30', '0x30', '0x20', '0x4F', '0x4B', '0x0D', '0x0A', '0x43', '0x6F', '0x6E', '0x74', '0x65', '0x6E', '0x74', '0x2D', '0x54', '0x79', '0x70', '0x65', '0x3A', '0x20', '0x74', '0x65', '0x78', '0x74', '0x2F', '0x68', '0x74', '0x6D', '0x6C', '0x0D', '0x0A', '0x0D', '0x0A', '0x3C', '0x68', '0x74', '0x6D', '0x6C', '0x3E', '0x3C', '0x73', '0x70', '0x61', '0x6E', '0x20', '0x73', '0x74', '0x79', '0x6C', '0x65', '0x3D', '0x22', '0x63', '0x6F', '0x6C', '0x6F', '0x72', '0x3A', '0x23', '0x30', '0x30', '0x30', '0x30', '0x41', '0x30', '0x22', '0x3E', '0x0D', '0x0A', '0x3C', '0x63', '0x65', '0x6E', '0x74', '0x65', '0x72', '0x3E', '0x3C', '0x68', '0x31', '0x3E', '0x53', '0x69', '0x6D', '0x70', '0x6C', '0x65', '0x73', '0x74', '0x20', '0x53', '0x65', '0x72', '0x76', '0x65', '0x72', '0x20', '0x6F', '0x6E', '0x20', '0x57', '0x69', '0x7A', '0x6E', '0x65', '0x74', '0x20', '0x57', '0x35', '0x35', '0x30', '0x30', '0x3C', '0x2F', '0x68', '0x31', '0x3E', '0x3C', '0x2F', '0x63', '0x65', '0x6E', '0x74', '0x65', '0x72'] 0x00 ..writes hdr1 text.. 7312 0024 0C ['0x7B'] 0x00 updates tx write pointer 7313 0025 0C ['0x8C'] 0x00 7314 0001 0C ['0x20'] 0x00 issuse 'send' command 7315 0001 08 ['0x00'] 0x00 checks command complete 7316 0002 08 ['0x00'] 0x15 reads interrupt register 7317 0002 0C ['0x10'] 0x00 sets interrupt register to 0x10
The other interesting difference is that once it has sent all the data it issues the disconnect command and has to read the status several times before it clears. And it then starts polling on socket 1.
7332 0001 0C ['0x08'] 0x00 command disconnect 7333 0001 08 ['0x00'] 0x00 check complete 7334 0003 08 ['0x00'] 0x18 status 0x18 means socket is closing 7335 0003 08 ['0x00'] 0x18 7336 0003 08 ['0x00'] 0x18 7337 0003 08 ['0x00'] 0x00 now it's closed 7338 0003 08 ['0x00'] 0x00 status listening on 1 status listening on 1
So, things I can do:
- set the gateway ip address properly for the comcast network
- explicitly set the buffer sizes to 2k
- set the interrupt register to ff before i open the socket
- check and reset the interrupt register after the send
- explicitly wait for the socket to show closed after i issue the disconnect
- start a second socket listening when i am processing the first one. (this seems like too much work but…)
I’m going to read about the interrupt register and have a good look at the olduino logic trace and see if i see any other differences.
count=0 oldid='zz' misolist= mosilist= def emit2(opcode,addr): if opcode=='08' and addr=='0003' and misolist[-1]=='0x14': print 'status listening on 0' elif opcode=='28' and addr=='0003' and misolist[-1]=='0x14': print 'status listening on 1' elif opcode=='18' : print 'RXD 0 ',misolist[-1][2:] else: print oldid,addr,opcode,mosilist[3:],misolist[-1] def emit(): global oldid,misolist,mosilist if oldid!='zz': opcode=mosilist[2:] addr=mosilist[2:]+mosilist[2:] emit2(opcode,addr) misolist= mosilist= def accumulate(id,miso,mosi): global misolist,mosilist misolist.append(miso) mosilist.append(mosi) def handleoneline(oneline): global oldid parts=line.strip().split(',') #Time [s],Packet ID,MOSI,MISO id=parts miso=parts mosi=parts #print id,mosi,miso if id!=oldid: emit() oldid=id accumulate(id,miso,mosi) with open('/Users/bill/Desktop/arduinowronggateway.csv') as fp: global count for line in fp: count=count+1 #print line if count>1: handleoneline(line)
The base data from the logic analyzer is shown below. Each line contains a time stamp, a packet id(changes every time /SS is cycled), the contents of MOSI as output by the master(olduino), and MISO as output by the slave W5500. The python code assembles the data from each packet as:
id addr op [OO] ii where id is the packet id, addr is the first two bytes of MOSI data, op is the third, [OO..] is any remaining MOSI bytes and ii is the last byte of any MISO data.
addr is the register or data address being written to or read in the w5500, op is the opcode, OO is the data being written if any and ii is the data being read back. for example, the first packet gets formatted as:
0 001E 0C [‘0x02′] 0x00
where 0C says we’re writing to a register for socket 0, 001E says it’s the transmit buffer size and the data we’re writing is the 0x02 meaning 2K bytes. The interpretation comes from the w5500 data sheet.
Time [s],Packet ID,MOSI,MISO 6.49870304166667,0,0x00,0x01 6.498705625,0,0x1E,0x02 6.49870816666667,0,0x0C,0x03 6.49871075,0,0x02,0x00
Faced with the thousands of lines of logic analyzer trace, i needed a quick and dirty program to consolidate each transaction into a couple of lines with the mosi/miso payload strung out horizontally instead of line to line. Python should be the answer for this. It’s ubiquitous, it’s flexible, it usually does what you want or gives you a readable error message. The trouble is, i can never remember the syntax. It took me hours of dorking around to dredge up the following pathetic fragment which reads the comma separated values and prints the guts. In principle, it’s just a matter of accumulating miso and mosi data until the packet id changes then spitting out one line for each. By now though, I’ve lost all python momentum and loaded the data into excel instead. As it turns out, all i have is the polling transactions, i missed the transaction altogether. I do notice that the arduino code seems to be using socket 1 instead of 0 (read code is 28) instead of 08. Also note that MISO and MOSI are reversed because, of course, i had the wires reversed.
count=0 with open('/Users/bill/Desktop/arduinow5500.csv') as fp: for line in fp: count=count+1 parts=line.split(',') #Time [s],Packet ID,MOSI,MISO id=parts mosi=parts miso=parts print id,mosi,miso
Packet ID MOSI MISO 0 0x01 0x00 0 0x02 0x03 0 0x03 0x28 0 0x14 0x00 1 0x01 0x00 1 0x02 0x03 1 0x03 0x28 1 0x14 0x00
I brought the hardware that i needed to reproduce the bagels server problem on my home network but sadly, not the software. The code is on the windows computer in Florida. It would have been trivial for me to upload it to the repository but nooooh.
It was easy enough to copy the simplest server code from the blog post but that doesn’t exhibit much of the problem. I guess the important thing is to port the simple server code from olduino to arduino and compare logic analyzer traces. If I don’t see a difference at the spi level i’ll have to try to re-create the bagels server here – it wasn’t that hard as i recall.