A discussion on protocols

One of the first things you need to do when emulating part of a client-server system is to determine the protocol the various applications use to talk to each other. In the modern world there are many ready-made choices: JSON, XML, Protocol Buffers, often served over HTTP to get past overactive corporate firewalls.

Back in the last century, there wasn't the depth of knowledge, or the variety of ready-made protocols, to draw on. Bandwidth was tiny, typically 14.4Kbps; that's less than one-thousandth of what's available on a 16Mbps DSL line. Even a large corporation might only have a 64Kbps fixed connection. It took an appreciable amount of time just to send a couple of hundred bytes: 200 bytes is 1,600 bits, a little over a tenth of a second at 14.4Kbps before any overhead.

So, of course, every byte counted when designing a protocol. You didn't go text-based: why take five bytes to send "32768" when you can send it in binary as two? And there was no need for labels, tags, etc. We're expecting a number, so send just the number.
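To put numbers on that, here's a minimal Python sketch comparing the two encodings of 32768. The big-endian byte order is an assumption for this snippet, and the variable names are just for illustration.

```python
import struct

value = 32768

# Text encoding: one byte per decimal digit.
as_text = str(value).encode("ascii")   # b"32768"

# Binary encoding: an unsigned 16-bit integer, big-endian (assumed order).
as_binary = struct.pack(">H", value)   # b"\x80\x00"

print(len(as_text), len(as_binary))    # prints: 5 2

# The receiver has to know a 16-bit number is coming; nothing in the
# bytes themselves says so.
assert struct.unpack(">H", as_binary)[0] == value
```

The saving comes at a price: with no labels on the wire, both ends must agree in advance on exactly which field comes where.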

Now, why not use compression? Well, remember that the computers of the time were still very slow compared to today: 66MHz for a typical 486 CPU, maybe 100MHz in a Pentium. Today you might see 1.4GHz in a typical laptop, with multiple cores, so dozens of times faster. Decompression is slow, compression much more so, and it introduces lag because you need to wait for enough data to appear to be worth compressing. So apart from some basic RLE (run-length encoding) we don't see it. (All WorldsAway managed was to flag long sequences of zero bytes and replace them with a count, and it only bothered doing that when changing locale, when there was a lot to send.) It was simple but effective.
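For a feel of how little work zero-run encoding takes, here's a rough Python sketch. To be clear, this is not the WorldsAway wire format, which isn't shown here. The scheme below (a 0x00 marker byte followed by a one-byte count) and the function names are made up purely for illustration.

```python
def zero_rle_encode(data: bytes) -> bytes:
    """Replace each run of zero bytes with 0x00 plus a count (hypothetical scheme)."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            # Measure the run, capped at 255 so the count fits in one byte.
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes((0, run))
            i += run
        else:
            out.append(data[i])   # non-zero bytes pass through untouched
            i += 1
    return bytes(out)


def zero_rle_decode(data: bytes) -> bytes:
    """Reverse zero_rle_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            if i + 1 >= len(data):
                raise ValueError("zero marker without a count")
            out += bytes(data[i + 1])   # bytes(n) yields n zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


sample = b"AB" + bytes(300) + b"C"
packed = zero_rle_encode(sample)
print(len(sample), len(packed))   # prints: 303 7
assert zero_rle_decode(packed) == sample
```

Note that in this toy scheme a lone zero byte grows to two bytes, which is one good reason to only switch it on for messages known to be mostly padding.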

So, WorldsAway. Logging the traffic between client and server fairly quickly revealed the format. Here's the very first packet the client sends.
2 127.0.0.1 -> 192.240.15.77  at 02/03/98 21:31:18
    000A000B 0000011C 31323334 35362E37    *........123456.7*
    38393040 636F6D70 75736572 76652E63    *890@compuserve.c*
    6F6D0000 00000000 00000000 00000000    *om..............*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 52445952 544A594D    *........RDYRTJYM*
    59504B57 47494500 34C815E6 00000002    *YPKWGIE.4È.æ....*
    01020000
As you can see, there are two 16-bit numbers, 000A and 000B, and a 32-bit number, 0000011C, which handily equates to the length of the remaining data. That data includes the login name and password! I have changed the actual values to protect the unwary.
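The length checks out against the dump: 18 full rows of 16 bytes plus a final 4 bytes is 292 bytes, and 292 minus the 8-byte header is 284, which is 0x11C. Here's a small Python sketch of pulling that header apart. The function name and tuple layout are my own for illustration, and the payload is a zero-filled stand-in rather than the real login data.

```python
import struct

# Two 16-bit numbers followed by a 32-bit length, all big-endian as in the dump.
HEADER = struct.Struct(">HHI")


def parse_header(packet: bytes):
    """Return (first, second, length, payload) for one client packet."""
    first, second, length = HEADER.unpack_from(packet, 0)
    payload = packet[HEADER.size:HEADER.size + length]
    return first, second, length, payload


# The 8 header bytes from the log, then a zero-filled stand-in payload.
packet = bytes.fromhex("000A000B0000011C") + bytes(0x11C)

first, second, length, payload = parse_header(packet)
print(hex(first), hex(second), length, len(payload))   # prints: 0xa 0xb 284 284
assert len(packet) == 292
```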

Here's the reply

3 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:18
    000A                                   *..              *
4 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:18
    000A                                   *..              *
5 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:18
    00000010 0000003C 000004B0 0000012C    *.......<...°...,*
    00000E10                               *....            *
It's broken up on the 0A bytes; 0A is a linefeed in ASCII, often used as the end of a line on *nix-based systems. But you can see the pattern still holds: 000A, 000A, length, data.
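Since the log can show one message split across several entries, anything reading this protocol has to buffer bytes until a whole message has arrived. A minimal Python sketch of that idea follows, fed with the three logged chunks above. The class name and return shape are made up for this example.

```python
import struct


class PacketReader:
    """Buffers raw chunks and returns complete (first, second, payload) tuples."""

    HEADER = struct.Struct(">HHI")   # 16-bit, 16-bit, 32-bit length

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk: bytes):
        self.buffer += chunk
        packets = []
        while len(self.buffer) >= self.HEADER.size:
            first, second, length = self.HEADER.unpack_from(self.buffer, 0)
            end = self.HEADER.size + length
            if len(self.buffer) < end:
                break   # payload not all here yet, wait for more
            packets.append((first, second, bytes(self.buffer[self.HEADER.size:end])))
            del self.buffer[:end]
        return packets


reader = PacketReader()
chunks = [
    bytes.fromhex("000A"),
    bytes.fromhex("000A"),
    bytes.fromhex("000000100000003C000004B00000012C00000E10"),
]
results = [reader.feed(chunk) for chunk in chunks]

# Nothing is complete until the third chunk lands.
assert results[0] == [] and results[1] == []
first, second, payload = results[2][0]
print(hex(first), hex(second), struct.unpack(">4I", payload))
# prints: 0xa 0xa (60, 1200, 300, 3600)
```

The same loop also copes with the opposite case, several messages arriving in one chunk, because it keeps peeling messages off the buffer until too few bytes are left.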

Moving on...
6 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:19
    000A                                   *..              *
7 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:19
    00100000 00080000 00000000 0000000A    *................*
8 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:19
    00120000 00010000 0A                   *.........       *
9 192.240.15.77 -> 127.0.0.1  at 02/03/98 21:31:19
    000C0000 01140000 00010000 00020000    *................*
    00020000 00000000 XXXXXXXX XXXXXXXX    *........AvatarNa*
    XXXX0000 00000000 00000000 00000000    *me..............*
    00000000 00000000 XXXXXXXX XXXXXXXX    *........AvatarNa*
    XXXX0000 00000000 00000000 00000000    *me..............*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 0001                 *..........      *
We've got three packets merged here, so reformatted you can see:

000A 0010 00000008 0000000000000000 
000A 0012 00000001 00
000A 000C 00000114 <lots of data> 
And the <lots of data> is the list of avatars for the account... Sorry, censored again. The client then offers:
10 127.0.0.1 -> 192.240.15.77  at 02/03/98 21:31:27
    000A000D 00000048 XXXXXXXX XXXXXXXX    *.......HAvatarNa*
    XXXX0000 00000000 00000000 00000000    *me..............*
    00000000 00000000 FFFFFFFF 00000000    *........ÿÿÿÿ....*
    00000000 00000000 00000000 00000000    *................*
    00000000 00000000 00000000 00000000    *................*
                                           *                *
That's the client picking the avatar to use.

It's pretty clear that the second number is a command code. The variable-length data is made up of fixed-length fields that we can pick information out of.
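As a sketch of picking fields out, here's some Python that rebuilds the 0x48-byte payload of the avatar selection packet using the censored "AvatarName" placeholder from the dump. The 32-byte width of the name field is my guess from where FFFFFFFF sits in the dump, and the helper name is made up; what the FFFFFFFF value means isn't known yet.

```python
import struct


def read_cstring(payload: bytes, offset: int, width: int) -> str:
    """Read a zero-padded string from a fixed-width field."""
    field = payload[offset:offset + width]
    # Everything after the first zero byte is padding.
    return field.split(b"\x00", 1)[0].decode("latin-1")


# Name padded to 32 bytes (assumed width), then FFFFFFFF, then 36 zero bytes.
payload = b"AvatarName".ljust(32, b"\x00") + bytes.fromhex("FFFFFFFF") + bytes(36)
assert len(payload) == 0x48   # matches the length in the packet header

name = read_cstring(payload, 0, 32)
(value,) = struct.unpack_from(">I", payload, 32)
print(name, hex(value))   # prints: AvatarName 0xffffffff
```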

Next time: writing a server that talks this protocol.
