<< ctxmodel.net
[revision 2, 2008-06-16]
> compared to standard speed = 40:
> - after hit: prob = prob + (1 - prob) / 40 = 0.5125 > - after miss: prob = prob - prob / 40 = 0.4875 > speed controls how much of the error should be used for update, it isn't a > direct update value itself. Well, this calls for some comments... I think that this prediction/error interpretation is bad for theory, because its inherently associated with exact prediction of next value in series, LMC metric, and such - while the only metric a compressor needs is a Shannon metric. So, there're two interpretations that I generally use, depending on a case: ## 1. CombinatoricFor binary sequence with only added information that there're n0 zeroes and n1 ones,
there're (n0+n1)!/n0!n1! equiprobable cases, and its easily provable that a counter
like this would be optimal:
p1 = n1/(n0+n1); // static if( y ) c1++; else c0++; p = c1/(c1+c0); // dynamicThen, for nonstationary data with local density deviations, a weighted Shannon metric can be used, and so an interval range sequence for optimal encoding would be if( y ) c1=c1*w+1,c0=c0*w; else c0=c0*w+1,c1=c1*w; p = c1/(c1+c0);And as a speed optimization, it can be rewritten to avoid division: p = c1/(c1+c0); if( y ) p'=(c1*w+1)/(c0*w+c1*w+1); else p'=(c1*w+1)/(c0*w+c1*w+1); --- if( y ) p'=(c1+1/w)/(c0+c1+1/w); else p'=c1/(c0+c1+1/w); --- if( y ) p'=p*(c0+c1)/(c0+c1+1/w)+(1/w)/(c0+c1+1/w); else p'=p*(c0+c1)/(c0+c1+1/w); --- T = c0+c1 T' = c0*w+c1*w+1 = (T+1/w)/(1/w) = T*w+1 wr = 1-(c0+c1)/(c0+c1+1/w) = (1/w)/(c0+c1+1/w) = 1/T' if( y ) p'=p*(1-wr)+wr; else p'=p*(1-wr); p = p'; T=T'Then, to compensate for supposed completely random bits (not correlated with recent history), the prediction of counter can be further improved by static mixing with a fixed "order-(-1)" distribution: P = p*(1-mw)+0.5*mwwhich means that supposedly counter's prediction " p"
would be applicable for current symbol/bit with
probability (1-mw) and its a completely random bit
otherwise. (Of course, any distribution can be used
instead of {0.5,0.5}, but lets skip it for simplicity).
As experience shows that this suggestion is commonly useful, the mix can be merged with actual counter: P = p*(1-mw)+0.5*mw T = T*w+1; wr = 1/T; if( y ) (p*=(1-wr))+=wr; else p*=(1-wr); --- P' = ((p*(1-wr)+y*wr)*(1-mw)+0.5*mw) = = p*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) = = p*((1-mw)-wr)+0.5*mw+y*wr*(1-mw) = = ((P-0.5*mw)/(1-mw))*(1-wr)*(1-mw)+0.5*mw+y*wr*(1-mw) = = (P-0.5*mw)*(1-wr)+0.5*mw+y*wr*(1-mw) = = P*(1-wr) - 0.5*mw + 0.5*mw*wr + 0.5*mw + y*wr*(1-mw) = = P*(1-wr) + 0.5*mw*wr + y*wr*(1-mw) = = P*(1-wr) + (y*(1-mw)+0.5*mw)*wr // y*(1-mw)+0.5*mw == y?1-0.5*mw:0.5*mw --- T = T*w+1; wr = 1/T; omp = 0.5*mw*wr; P = P*(1-wr)+(y?wr-omp:omp);And that's exactly what I mostly use. The point here is that such a counter is a generalization of combinatorical counter and so is able to precisely simulate the implementation with actual numbers of occurences. And combinatorical counter is good because it allows to efficiently encode even a completely non-redundant bit sequence. Its only redundancy is information contained in parameter values (which include initial values for P and T).
## 2. Bit sequence hashWhen dynamic probability mapping is used (aka SSE etc), a hash interpretation has a lot of sense too. Well, its obvious that in fact a "probability" updated like explained above is a convolution of bit sequence and so it highly correlates with current suffix bits, especially when a fast "update speed" is used (and mostly it is fast enough). Thus, because of indirect use, a counter-context like that has a completely different optimization metric comparing to combinatoric counter and so may (and should actually) be implemented with different properties. For example, the combination of a normal counter delayed by a few bits (that is, updated with bit[x-k] when bit[x] is modelled)
and a static linear mapping ( a[bit[x-k..x-1]]*p+a[bit[x-k..x-1]]*(1-p) )
does show good results.
371699 // compressed file size with a single SSE array over normal counter array 370757 // counters delayed by 3 bits and a static mapping before SSE added And I consider this improvement fairily major as there's no model structure change, no contexts or dynamic elements added, and parameters for the first result were fully optimized by bruteforce search (well, for second as well). But still, this option is highly dependent on implementation details of the used dynamic mapping (SSE) and is not understood as well as "normal" counters. That is, in most general case of this approach counter can be replaced by static lookup table mapping the bit sequence in context to some precalculated hash values. But, forgetting about the actual size of this lookup table (~ 2^40 would be a common value), there's no known way for now to optimize
the table contents except by direct testing the compression result with
different settings.
> speed should be lower for probs like 0.01 or 0.99 because with high speed
> occasional negative bit in a equal bits run would ruin the statistics. > ie. if we have sequence: > 0000000000000000000000100000000000000000000000000 > then on sequence: > 0000000000000000000000 > we should consistently lower the speed because if we encounter bit 1 > it shouldn't change statistics very much because the following bits are > 0000000000000000000000000 A simple combinatoric counter handles this situation pretty well actually. With completely precise implementation it is able to map strings like ( '0' x N1 . '1' . '0' x N2) directly to values from 0 to N-1=N1+N2
(with pre-known N).
Also, of course, sometimes counter tweaks without any theoretical foundation improve compression anyway. But then, if an origin of improvement can be analyzed, it is usually possible to explicitly support some special cases and leave the counter alone. The quoted example is exactly the case like that. 2013-08-01 11:50:39 >2014-11-15 09:07:54 >2014-11-26 18:39:56 >2015-01-11 17:14:52 > |