
There has been some talk about counters as lossy compression of histories, like in "On universal counters"...
However, there's only the same statement there. But anyway, the point is that some simple logic applies to counters:
  1. We want to derive a probability estimate from a context history.
  2. The best (most precise) way to do that is to keep the whole context history and run a model through it each time a probability estimate is needed.
  3. Context histories are redundant, so the next reasonable step would be to compress them.
  4. But even compressed, these histories can take a lot of memory, so for practical use lossy compression is necessary.
  5. Lossy compression is where we discard the least useful information.
For example, this is lossy coding too:
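  // (presumably s[0]/s[1] = next state if the next bit is 0/1, and x/y = the 0/1 counts)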
  cstate[p0].s[0] = Init_FSM_States( x+1, (y+3)/4 );
  cstate[p0].s[1] = Init_FSM_States( (x+3)/4, y+1 );
Here only the 0/1 counts are taken into account (so the order of bits in the history is lost), and some bits are cut off on each update. Still, we can visualize the update of such a counter by printing an "approximated history".
Like, let's take some string like 001101001 aka '9' (exact history on the left, then the approximated history and the next input bit):
<>          <>     +0
<0>         <0>    +0
<00>        <00>   +1
<001>       <01>   +1
<0011>      <011>  +0
<00110>     <001>  +1
<001101>    <011>  +0
<0011010>   <001>  +0
<00110100>  <0001> +1
<001101001> <011>
Like this, if there's no mistake
(these are the approximated histories for the ccm counter quoted above).
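For reference, here's a minimal sketch (plain C++, not the actual ccm code) that reproduces the table above by applying the quoted update rule to plain 0/1 counts; it assumes that x/y in Init_FSM_States are the 0/1 counts and that the approximated history is simply n0 zeros followed by n1 ones:

  #include <cstdio>
  #include <string>

  int main() {
    const std::string bits = "001101001";  // the example string
    int n0 = 0, n1 = 0;                    // lossy 0/1 counts kept by the counter
    std::string exact;                     // exact history, kept here only for printing
    for (size_t i = 0; i <= bits.size(); i++) {
      std::string approx(n0, '0');         // "approximated history":
      approx.append(n1, '1');              // n0 zeros followed by n1 ones
      if (i < bits.size())
        printf("<%s> <%s> +%c\n", exact.c_str(), approx.c_str(), bits[i]);
      else { printf("<%s> <%s>\n", exact.c_str(), approx.c_str()); break; }
      if (bits[i] == '0') { n0 = n0 + 1;       n1 = (n1 + 3) / 4; }  // the s[0] branch
      else                { n0 = (n0 + 3) / 4; n1 = n1 + 1;       }  // the s[1] branch
      exact += bits[i];
    }
  }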
Now, there are supposedly lots of ideas on how to improve this ;)
For example, by making the bit removal not purely ad hoc ;)

Well, here's some more explanation:
  1. A context history and a context model are independent things. We don't have to use the same model for (lossy) history updates and for probability estimation, which is well demonstrated by delayed counter implementations.
  2. Because of the memory limit, we need to lose some information in history updates.
    And the most reasonable approach seems to be removing the least useful bit.
    That is, estimate the compressed size of some data in the context of a given bit history and of its versions with various bits removed, and keep the version which gives the best compression (see the sketch after this list).
    Also, in this case, the entropy of the history itself can be taken into account: we have to limit it to <24 bits (or lower - otherwise the lookup tables would be too large).
  3. The slowness of these entropy-based decisions doesn't matter at all. We can safely make them as slow as we like and use the most advanced models to compress the history, because after effectively quantizing it to 24 bits with the lossy constraint, we'd be able to generate a lookup table for all counter states in memory, and then further cluster them until reaching a reasonable number of states (e.g. 8 or 16 bits).
  4. The model for counter updates is still completely unrelated to the modelling of actual data though - it's just a more precise approximation of a single history (compared to an ad hoc counter). But actual models would have SSE etc. anyway, so there's likely no need to take that into account.
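As an illustration of item 2, here's a rough sketch (an assumption, not actual ccm code) of entropy-driven bit removal: once a history costs more than a given number of bits to code, every single-bit deletion is tried and the candidate that codes shortest is kept. A simple adaptive order-0 estimator stands in for the "most advanced models", and the history's own code length stands in for the compressed size of data in its context:

  #include <cmath>
  #include <cstdio>
  #include <string>

  // Code length of a bit string under a KT-like adaptive order-0 estimator, in bits.
  double CodeLength(const std::string& h) {
    double bits = 0, n0 = 0, n1 = 0;
    for (char c : h) {
      double p1 = (n1 + 0.5) / (n0 + n1 + 1.0);
      bits += -std::log2(c == '1' ? p1 : 1.0 - p1);
      if (c == '1') n1++; else n0++;
    }
    return bits;
  }

  // Append a bit; while the history costs more than max_bits to code, drop the
  // single bit whose removal leaves the most compressible history.
  std::string Update(std::string h, char bit, double max_bits) {
    h += bit;
    while (!h.empty() && CodeLength(h) > max_bits) {
      std::string best;
      double best_len = 1e300;
      for (size_t i = 0; i < h.size(); i++) {
        std::string cand = h.substr(0, i) + h.substr(i + 1);
        double len = CodeLength(cand);
        if (len < best_len) { best_len = len; best = cand; }
      }
      h = best;
    }
    return h;
  }

  int main() {
    std::string h;
    for (char bit : std::string("001101001"))
      h = Update(h, bit, 4.0);  // 4-bit entropy limit just for the demo; the post suggests <24
    printf("lossy history: <%s>\n", h.c_str());
  }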
Of course, the state clustering stage is not really necessary, but plain lossy entropy coding of histories would then produce an imprecise number of states - something like 1639 unique states with a 12-bit entropy limit - which would be usable too, but not exactly efficient.
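And if the clustering stage is used, a toy version (an assumption - no specific clustering method is given here) could enumerate the count states reachable under the quoted update rule and keep merging the two closest predicted probabilities until a fixed number of representatives remains:

  #include <algorithm>
  #include <cstdio>
  #include <set>
  #include <utility>
  #include <vector>

  int main() {
    const int depth = 12, target = 16;
    // enumerate the (n0,n1) count states reachable under the quoted update rule
    std::set<std::pair<int, int>> seen = {{0, 0}};
    std::vector<std::pair<int, int>> frontier = {{0, 0}};
    for (int d = 0; d < depth; d++) {
      std::vector<std::pair<int, int>> next;
      for (auto [n0, n1] : frontier) {
        std::pair<int, int> s0 = {n0 + 1, (n1 + 3) / 4};  // next state on bit 0
        std::pair<int, int> s1 = {(n0 + 3) / 4, n1 + 1};  // next state on bit 1
        for (auto s : {s0, s1})
          if (seen.insert(s).second) next.push_back(s);
      }
      frontier = next;
    }
    // represent each state by its predicted p(bit=1), then repeatedly merge the
    // two closest representatives until only `target` of them remain
    std::vector<double> p;
    for (auto [n0, n1] : seen) p.push_back((n1 + 0.5) / (n0 + n1 + 1.0));
    std::sort(p.begin(), p.end());
    while ((int)p.size() > target) {
      size_t best = 1;  // right index of the closest adjacent pair
      for (size_t i = 2; i < p.size(); i++)
        if (p[i] - p[i - 1] < p[best] - p[best - 1]) best = i;
      p[best - 1] = (p[best - 1] + p[best]) / 2;  // merge to the midpoint
      p.erase(p.begin() + best);
    }
    printf("%zu reachable states, %zu representatives:\n", seen.size(), p.size());
    for (double x : p) printf("%.3f ", x);
    printf("\n");
  }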

Anyway, it's likely that some contexts deserve more states than can fit into a byte.
2013-08-13 11:46:49
