Tree Adaptive Arithmetic Encoding

Arithmetic Encoding is widely applied in signal processing and compression. It begins with an ordered series of symbols and results in a compressed form that can recreate the origional. It can also be used to predict what symbol might appear next in an ongoing series.

In the example to the left, the series of symbols is represented by the colored blocks running along the left side of the screen. They can be anything, and I'll use letters instead for this explanation:

AABCDADCCABBACCA

From this it constructs a dictionary of unique symbols:

A,B,C,D

And assigns each a ratio based on how often that symbol occurs in the series.

A : 6/16 B : 3/16 C : 5/16 D : 2/16

Then the fun begins. A space is created by fractal subdivision - as you see on the left t.e bottom row is divided into segments with each segment as big as it's ratio. For the letters that might look like:

|A____________|B_____|C_________|D___|

This step is then repeated, but within each segment:

|A____B__C___D|A_BC_D|A__B_C_D|ABCD| |A_____________|B______|C_________|D___|

When read from bottom to top, this results in a space that covers every possible series of symbols, but with it's coverage biased towards the symbols most likely to appear (the ones with the biggest ratios.)

Now a single line running up the space (the white line in the picture) can encode the origional series. The process can be reversed to reconstruct the origional.

Live Demo with Sourcecode

Note: In the example on the left, you can see the precision run out on the last two symbols... Aww.