<< ctxmodel.net

[revision 2, 2008-06-16]

> compared to standard speed = 40:
> - after hit: prob = prob + (1 - prob) / 40 = 0.5125
> - after miss: prob = prob - prob / 40 = 0.4875
> speed controls how much of the error should be used for update, it isn't a
> direct update value itself.

Well, this calls for some comments...
I think that this prediction/error interpretation is bad for theory, because its inherently associated with exact prediction of next value in series, LMC metric, and such - while the only metric a compressor needs is a Shannon metric.
So, there're two interpretations that I generally use, depending on a case:

1. Combinatoric

For binary sequence with only added information that there're n0 zeroes and n1 ones, there're (n0+n1)!/n0!n1! equiprobable cases, and its easily provable that a counter like this would be optimal:
p1 = n1/(n0+n1); // static

if( y ) c1++; else c0++; p = c1/(c1+c0); // dynamic
Then, for nonstationary data with local density deviations, a weighted Shannon metric can be used, and so an interval range sequence for optimal encoding would be
if( y ) c1=c1*w+1,c0=c0*w; else c0=c0*w+1,c1=c1*w; 
p = c1/(c1+c0);
And as a speed optimization, it can be rewritten to avoid division:
p = c1/(c1+c0); 
if( y ) p'=(c1*w+1)/(c0*w+c1*w+1); else p'=(c1*w+1)/(c0*w+c1*w+1); 
if( y ) p'=(c1+1/w)/(c0+c1+1/w); else p'=c1/(c0+c1+1/w); 
if( y ) p'=p*(c0+c1)/(c0+c1+1/w)+(1/w)/(c0+c1+1/w); 
   else p'=p*(c0+c1)/(c0+c1+1/w); 
T = c0+c1 
T' = c0*w+c1*w+1 = (T+1/w)/(1/w) = T*w+1 
wr = 1-(c0+c1)/(c0+c1+1/w) = (1/w)/(c0+c1+1/w) = 1/T' 
if( y ) p'=p*(1-wr)+wr; else p'=p*(1-wr); 
p = p'; T=T' 
Then, to compensate for supposed completely random bits (not correlated with recent history), the prediction of counter can be further improved by static mixing with a fixed "order-(-1)" distribution:
P = p*(1-mw)+0.5*mw
which means that supposedly counter's prediction "p" would be applicable for current symbol/bit with probability (1-mw) and its a completely random bit otherwise. (Of course, any distribution can be used instead of {0.5,0.5}, but lets skip it for simplicity).

As experience shows that this suggestion is commonly useful, the mix can be merged with actual counter:
P = p*(1-mw)+0.5*mw 
T = T*w+1; wr = 1/T; 
if( y ) (p*=(1-wr))+=wr; else p*=(1-wr); 
P' = ((p*(1-wr)+y*wr)*(1-mw)+0.5*mw) = 
   = p*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) = 
   = p*((1-mw)-wr)+0.5*mw+y*wr*(1-mw) = 
   = ((P-0.5*mw)/(1-mw))*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) = 
   = (P-0.5*mw)*(1-wr)+0.5*mw+y*wr*(1-mw) = 
   = P*(1-wr) - 0.5*mw + 0.5*mw*wr + 0.5*mw + y*wr*(1-mw) = 
   = P*(1-wr) + 0.5*mw*wr + y*wr*(1-mw) = 
   = P*(1-wr) + (y*(1-mw)+0.5*mw)*wr 
// y*(1-mw)+0.5*mw == y?1-0.5*mw:0.5*mw 
T = T*w+1; wr = 1/T; omp = 0.5*mw*wr; 
P = P*(1-wr)+(y?wr-omp:omp); 

And that's exactly what I mostly use.

The point here is that such a counter is a generalization of combinatorical counter and so is able to precisely simulate the implementation with actual numbers of occurences. And combinatorical counter is good because it allows to efficiently encode even a completely non-redundant bit sequence. Its only redundancy is information contained in parameter values (which include initial values for P and T).

2. Bit sequence hash

When dynamic probability mapping is used (aka SSE etc), a hash interpretation has a lot of sense too. Well, its obvious that in fact a "probability" updated like explained above is a convolution of bit sequence and so it highly correlates with current suffix bits, especially when a fast "update speed" is used (and mostly it is fast enough).

Thus, because of indirect use, a counter-context like that has a completely different optimization metric comparing to combinatoric counter and so may (and should actually) be implemented with different properties.

For example, the combination of a normal counter delayed by a few bits (that is, updated with bit[x-k] when bit[x] is modelled) and a static linear mapping ( a[bit[x-k..x-1]]*p+a[bit[x-k..x-1]]*(1-p) ) does show good results.

371699 // compressed file size with a single SSE array over normal counter array
370757 // counters delayed by 3 bits and a static mapping before SSE added

And I consider this improvement fairily major as there's no model structure change, no contexts or dynamic elements added, and parameters for the first result were fully optimized by bruteforce search (well, for second as well).

But still, this option is highly dependent on implementation details of the used dynamic mapping (SSE) and is not understood as well as "normal" counters. That is, in most general case of this approach counter can be replaced by static lookup table mapping the bit sequence in context to some precalculated hash values. But, forgetting about the actual size of this lookup table (~2^40 would be a common value), there's no known way for now to optimize the table contents except by direct testing the compression result with different settings.

> speed should be lower for probs like 0.01 or 0.99 because with high speed
> occasional negative bit in a equal bits run would ruin the statistics.
> ie. if we have sequence:
> 0000000000000000000000100000000000000000000000000
> then on sequence:
> 0000000000000000000000
> we should consistently lower the speed because if we encounter bit 1
> it shouldn't change statistics very much because the following bits are
> 0000000000000000000000000

A simple combinatoric counter handles this situation pretty well actually. With completely precise implementation it is able to map strings like ('0' x N1 . '1' . '0' x N2) directly to values from 0 to N-1=N1+N2 (with pre-known N).

Also, of course, sometimes counter tweaks without any theoretical foundation improve compression anyway. But then, if an origin of improvement can be analyzed, it is usually possible to explicitly support some special cases and leave the counter alone. The quoted example is exactly the case like that.
2013-08-01 11:50:39                 >
2014-11-15 09:07:54                 >
2014-11-26 18:39:56                 >
2015-01-11 17:14:52                 >

Write a comment: