> Any other points for discussion
Well, I'd like to discuss many things... e.g. what about entropy-coded contexts? That is, how contexts can have internal correlations and uneven distributions, and how to avoid that.
> especially about SSE?
SSE is magical stuff, so I can discuss it forever ;) E.g. it works due to an index duality, similar to the wave-particle one in physics. That is, the index (probability) you feed into SSE is a relative frequency of event occurrences, and a context (history hash) at the same time. SSE uses the first interpretation for merging the statistics of similar contexts and for general stabilization (initialization as a direct mapping guarantees that the secondary estimation is never significantly worse than the index probability), and still easily adapts to the cases where the context interpretation is preferred. Of course, that's concerning my own SSE, which I've been trying and failing to improve for the last 5 years, so believe me, I know what I'm talking about. There're ugly things like SSE coefficients getting out of the [0..1] interval, additional coefficients for dynamic update speed, and a complicated update formula in the integer version, but it only gets worse whenever I try to change anything - and my conclusion after all these years of analysis is quoted above.
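For illustration, here's a minimal sketch of such a mapping in C++ - a toy version of the idea, not the actual ash/ppmonstr code; the 12-bit fixed-point scale, bucket count and update rate are all assumptions:

  #include <cstdint>

  // Toy SSE: maps a quantized input probability to a secondary estimate.
  struct SSE {
      enum { BITS = 12, N = 32 };      // 12-bit probabilities, 32 buckets
      uint16_t map[N];

      SSE() {
          // Direct (identity) initialization: each bucket starts at its own
          // center probability, so the secondary estimate begins equal to
          // the input and can't start out much worse than it.
          for (int i = 0; i < N; i++)
              map[i] = uint16_t(((2 * i + 1) << BITS) / (2 * N));
      }
      uint16_t p(uint32_t p1) const {            // p1 in 0..4095
          return map[p1 * N >> BITS];
      }
      void update(uint32_t p1, int bit, int rate = 5) {
          uint16_t& m = map[p1 * N >> BITS];
          m += ((bit << BITS) - m) >> rate;      // drift toward observed bit
      }
  };

Real versions interpolate between adjacent buckets, hash some context history into the index, and use adaptive update rates - which is where the "ugly things" above come from.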
> I think the photon-wave dualism is a more extreme example.
And I think the similarity is extreme, considering the statistical nature of quantum mechanics. As to "how" - it's simple. These two interpretations do simultaneously apply to the same SSE index probability, right? A counter is supposed to contain a relative frequency by definition, and its value has a high correlation with the last bits in the context history, right? Then, the data in different context instances is supposed to have different properties (or there'd be no sense in using this context at all). So there're context instances without significant correlations in the history, and a simple linear mapping (even a static one) is good for them as a compensation for prediction imprecision. And there're some which do benefit from further context modelling, where the SSE coefficients are used just like another counter array. Now, the problem is that this subdivision is not at all stable - the same context can contain completely random sequences and runs of the same value. And that's why I still cannot replace SSE with something else - the damned thing works well enough in both extreme cases and allows for smooth transitions. But still, it just has to be replaced with an interval-coded context mix or something, as it's not perfect in any specific case - it's just that nothing available is that adaptable.
> I think SSE is nothing that complicated, like lots of other stuff.
1. It's naive to think that it's possible to design a simple-and-clean universal compression model with state-of-the-art performance. A good model just has to be complicated these days.
2. We're talking about different kinds of SSE here. E.g. paq8 mappings are closer to the original SSE in ppmonstr despite the interpolation. And I was talking about my SSE, like the one used in ash, which is significantly more efficient.
2011-04-08 22:58:54 Serge Osnach >
My recent experiments with ppmonstr-like SSE show that the best predictions have been achieved with aggressive update of the nearest SSE contexts. For each binary flag in my SSE maps, updating the "opposite" context (of course, with a smaller scale) gives a good enough (1-2%) and very stable benefit. Generally, all contexts in the whole SSE map should be updated with very small values, but for performance reasons this is impossible :-)
2011-04-09 14:55:50 Shelwien >
I'm still using that - http://encode.ru/threads/1158-SSE2(o0-o1)-demo-aka-2d-interpolated-mapping-of-linear-inputs?p=22841&viewfull=1#post22841 But as you can see there, its performance is not very good compared to logistic mixing etc.
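For reference, such a 2d interpolated mapping works roughly like the following bilinear sketch - my reconstruction of the idea, not the code from the linked demo (12-bit fixed-point inputs assumed):

  // Bilinear SSE2: two quantized probabilities select a cell in a 2d map;
  // the four corner buckets are blended by the fractional offsets.
  enum { BITS = 12, N = 32 };        // 32x32 cells over two 12-bit inputs
  static int map2d[N + 1][N + 1];    // corner buckets (identity-initialized elsewhere)

  int sse2_p(int p1, int p2) {       // p1, p2 in 0..4095
      int x = p1 * N, y = p2 * N;
      int i = x >> BITS, j = y >> BITS;            // cell coordinates
      int fx = x & ((1 << BITS) - 1);              // fractional offsets
      int fy = y & ((1 << BITS) - 1);
      int a = map2d[i][j],     b = map2d[i + 1][j];
      int c = map2d[i][j + 1], d = map2d[i + 1][j + 1];
      int top = a + (fx * (b - a) >> BITS);        // blend along p1...
      int bot = c + (fx * (d - c) >> BITS);
      return top + (fy * (bot - top) >> BITS);     // ...then along p2
  }

  void sse2_update(int p1, int p2, int bit, int rate = 6) {
      // the "4 buckets": push the cell's corners toward the observed bit
      int i = p1 * N >> BITS, j = p2 * N >> BITS, t = bit << BITS;
      map2d[i][j]         += (t - map2d[i][j])         >> rate;
      map2d[i + 1][j]     += (t - map2d[i + 1][j])     >> rate;
      map2d[i][j + 1]     += (t - map2d[i][j + 1])     >> rate;
      map2d[i + 1][j + 1] += (t - map2d[i + 1][j + 1]) >> rate;
  }

A more careful update would weight each corner by its interpolation coefficient; equal rates are used here for brevity.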
2011-04-10 18:18:55 Serge Osnach >
Try updating not only the 4 buckets, but the nearest 8 with small weights. If our models gave very precise predictions, such updates would be meaningless. But O0-O1 models are of course far from excellent, so adding such "noise" to the update (maybe to the prediction too) can give an advantage.
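A sketch of this NC-update idea over a 1d SSE map (my own naming and constants, not Epm's code; each ring of neighbors is updated twice as weakly as the previous one):

  // NC-update: besides the bucket the probability falls into, also nudge
  // its neighbors toward the observed bit, with halving weights per ring.
  enum { BITS = 12, N = 32, RINGS = 2 };
  static int sse_map[N];

  void nc_update(int p1, int bit, int rate = 5) {
      int c = p1 * N >> BITS;              // central bucket
      int t = bit << BITS;
      for (int r = 0; r <= RINGS; r++) {   // r = 0 is the bucket itself
          int slow = rate + r;             // +1 shift = half the weight
          if (c - r >= 0)     sse_map[c - r] += (t - sse_map[c - r]) >> slow;
          if (r && c + r < N) sse_map[c + r] += (t - sse_map[c + r]) >> slow;
      }
  }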
2011-04-12 10:31:40 Shelwien >
> nearest 8 with small weights.

Sure it would help. But interpolated SSE2 is already slow as it is - it's possible to mix in a few different contexts at the same speed and with better compression.

There's actually a more interesting case like that though: in compression of BWT output, it's helpful after a string ABC to update both o1[B].prob[C] and o1[A].prob[C] - with BWT the effect is considerably better than on plain data, because the probability of BWT inserting an extra word between two is much higher than the probability of a typo in normal text. SSE does have a lot of untested options though... like logistic domain, ternary statistics, delayed update etc. What are you up to, anyway? :)
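That BWT trick, as a sketch - one binary counter per (context, symbol) pair, with only the incrementing half of the update shown, and the rates picked arbitrarily:

  #include <cstdint>

  // After seeing ...A B C in BWT output, do the normal order-1 update on
  // o1[B][C], plus a weaker "skip" update on o1[A][C] - covering the case
  // where BWT inserted B as a stray symbol between A and C.
  enum { BITS = 12 };
  static uint16_t o1[256][256];            // P(sym | prev), 12-bit fixed point

  void bwt_o1_update(uint8_t a, uint8_t b, uint8_t c) {
      o1[b][c] += ((1 << BITS) - o1[b][c]) >> 4;   // normal update
      o1[a][c] += ((1 << BITS) - o1[a][c]) >> 6;   // weaker skip update
  }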
2011-04-13 19:36:41 Serge+Osnach >
About bwt-o1: Such updates in a post-BWT model (and in indirect contexts in ppm/cm too) work well if BWT actually inserts extra symbols. On texts this won't work well, and on .xls files compression will be hurt to a great degree. Sequences common in .xls files look like AxxxB, (A+1)xxx(B+1), (A+2)xxx(B+2), and they will be sorted perfectly by BWT.

Mixing vs updating: updates can be built on shifts alone, but mixing can be built on shifts alone only in the form Pw = P1 + ((P2 - P1) >> W1Shift), where W1Shift = floor(log2(1/W1)), when W1 is small enough. Looks good to me and definitely worth trying.
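Spelled out in code (exact when W1 = 2^-k):

  // Pw = (1 - W1)*P1 + W1*P2 rewritten as P1 + W1*(P2 - P1); when
  // W1 = 2^-k this needs no multiply, just one arithmetic shift.
  int mix_shift(int p1, int p2, int k) {   // p1, p2 in fixed point
      return p1 + ((p2 - p1) >> k);
  }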
2011-04-16 16:47:18 Serge Osnach >
Let "updating of nearest SSE contexts with smaller wieght" be NC-update :-) My recent attempts to mixing some SSE predictions with some other predictions was very confusing. Looks like SSE in unary coding and SSE in binary coding is completely different. In binary coding, when avg(p) is something around 1/2, mixing after SSE will give benefit, greater than NC-updates (of course, NC-updates still good). But in unary SSE such mixing is not so good idea. Fine-tuned unary SSE must handle very low probabilities very well, and give precise predictions. So, mixing probabilities must be done before SSE in this case. And there NC-updates have no replacement. It's very easy to make incorrect generalizations about SSE :-( For unreleased Epm r10 I made good model for deterministic contexts, where probabilities from 3 different SSE (main SSE, pure non-stationary SSE and middle SSE) was accurately mixed. But some time ago I wrote genetic optimizer for parameters of SSE, loaded first sub-model into optimizer and run it on famous Book1. And guess what? Got -100 bytes :-) This was nearly unbelievable, that one simple SSE with tuned NC-updates can beat three models with good mixer. But archive decompresses successfully :-) Adding second sub-model and mixer will save another 600 bytes on Book1, this time with optimizations on whole test set. BTW, what do you mean under terms "logistic domain" and "ternary statistics"?
2011-04-19 19:10:30 Shelwien >
> if BWT actually inserts extra symbols. On texts this will not work good,

It usually does; it's kind of a sort property - in BWT, we're applying a strict sort to somewhat fuzzy keys, so there're many cases where similar strings would be clustered together, but not necessarily in the same order.

> and on .xls files compression will be hurt

I guess it doesn't matter so much with adaptive mixing. Also, it's not the best idea to spend time on tuning "universal" models for data with known structure.

> updates can be built on shifts alone, but mixing can be built on
> shifts alone only in the form Pw = P1 + ((P2 - P1) >> W1Shift), where
> W1Shift = floor(log2(1/W1)), when W1 is small enough. Looks good to me
> and definitely worth trying.

Speed-wise, multiplications are not so relevant on modern PCs - compression algos are mostly memory-bound anyway. Still, shift-based mixing is actually common, but usually it's static mixing, and it's implemented in a more flexible way, like (p1*w1+p2*w2)>>log2(w1+w2). Adaptive shift-based mixers are possible too (for example, with BFA), but likely don't make any sense in practice. Also, LUT-based state machines are much more promising both for compression and speed. Anyway, the main point of linear components with multiplication (including hash functions) is that they're much easier to optimize. Afaik, even if you have a reason to replace multiplications with shifts, it's still better to do that after tuning the multiplication coefs first, by gradually replacing the components with shift-based ones and retuning.

> Looks like SSE in unary coding and SSE in binary coding is
> completely different.

Bits in unary coding are more likely to have a skewed probability.

> But in unary SSE such mixing is not so good idea.

Well, it's probably a matter of the mixing method and optimization.

> Got -100 bytes :-) This was nearly unbelievable, that one simple SSE
> with tuned NC-updates can beat three models with good mixer.

That's interesting, but for me more complex models usually give better results. And my main problem is overtuning - more complex models can be forced to compress the given samples better, but it's frequently hard to guarantee that there's enough symmetry to make the result universal...

> BTW, what do you mean under terms "logistic domain"

See wikipedia for "logistic function" and "logit function". The paq mixer is a linear mixer in the logistic domain.

> and "ternary statistics"?

Well, 1,2,3 -> primary, secondary, ternary. SSE is secondary, so another statistical mapping on top of SSE would be ternary. Linear interpolation in SSE is obviously very rough, and it's clearly possible to make a better interpolation function by gathering {p1,p2,w} statistics. The "TSE" proposed by Shkarin was different though :)

P.S. It would probably be more convenient to continue this by email (shelwien@gmail.com)
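To illustrate "linear mixing in logistic domain": stretch the inputs with logit, mix linearly with adaptive weights, squash back. A minimal floating-point sketch (initial weights and learning rate are assumptions, not paq's actual values):

  #include <cmath>

  double stretch(double p) { return std::log(p / (1.0 - p)); }    // logit
  double squash (double x) { return 1.0 / (1.0 + std::exp(-x)); } // logistic

  struct Mixer2 {
      double w[2] = { 0.3, 0.3 };
      double st[2];

      double p(double p1, double p2) {
          st[0] = stretch(p1);
          st[1] = stretch(p2);
          return squash(w[0] * st[0] + w[1] * st[1]);
      }
      void update(int bit, double p_mix, double rate = 0.02) {
          // gradient step on coding cost: d(cost)/dw[i] = (p_mix - bit) * st[i]
          for (int i = 0; i < 2; i++)
              w[i] += rate * (bit - p_mix) * st[i];
      }
  };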