<< ctxmodel.net
> Any other points for discussion?
Well, I'd like to discuss many things... e.g., what about entropy-coded contexts? That is, how contexts can have internal correlations and uneven distributions, and how to avoid that.
> especially about
SSE?
E.g. it works due to an index duality, similar to the wave-particle one in physics.
That is, the index (probability) you feed into
Of course, that's concerning my own
> I think the photon-wave dualism is a more extreme example.
And I think that
As to "how" - it's simple. These two interpretations
Then, the data in different context instances is supposed to have different
properties (or there'd be no sense in using this context at all). So there are
context instances without significant correlations in the history, and a simple
linear mapping (even a static one) is good for them as a compensation for
prediction imprecision. And there are some which do benefit from further
context modelling, where
Now, the problem is that this subdivision is not at all stable - the same
context can contain both completely random sequences and runs of the same value.
And that's why I still cannot replace
> I think
SSE is nothing that complicated, like lots of other stuff.
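For context, the core of SSE is just an adaptive table indexed by a quantized input probability, with interpolation between neighbouring buckets. A minimal one-dimensional sketch - all parameters (12-bit probabilities, 32 intervals, the update rate) are illustrative assumptions, not anyone's actual implementation:

```cpp
#include <cassert>

// Minimal SSE sketch: 12-bit probabilities, 33 bucket centers,
// linear interpolation between the two nearest buckets.
// Parameters are assumptions chosen for illustration.
struct SSE {
    int t[33];
    SSE() { for (int i = 0; i < 33; i++) t[i] = i * 128; }  // identity init

    // Map the model's probability pr (0..4095) through the table.
    int p(int pr) const {
        int i = pr >> 7, w = pr & 127;
        return (t[i] * (128 - w) + t[i + 1] * w) >> 7;
    }

    // Shift the two interpolation buckets toward the observed bit.
    void update(int pr, int bit, int rate = 5) {
        int i = pr >> 7, y = bit ? 4095 : 0;
        t[i]     += (y - t[i])     >> rate;
        t[i + 1] += (y - t[i + 1]) >> rate;
    }
};
```

With the identity initialization the mapping starts out as a pass-through, and the table then learns per-bucket corrections to the input model's predictions.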
1. It's naive to think that it's possible to design a simple-and-clean universal compression model with state-of-the-art performance. A good model just has to be complicated these days.
2. We're talking about different kinds of

2011-04-08 22:58:54 Serge Osnach:
My recent experiments with ppmonstr-like SSE show that the best predictions have been achieved with aggressive updates of the nearest SSE contexts. For each binary flag in my SSE maps, updating the "opposite" context (of course, with a smaller scale) gives a good (1-2%) and very stable benefit. Generally, all contexts in the whole SSE map should be updated with very small values, but for performance reasons this is impossible :-)

2011-04-09 14:55:50 Shelwien:
I'm still using that - http://encode.ru/threads/1158-SSE2(o0-o1)-demo-aka-2d-interpolated-mapping-of-linear-inputs?p=22841&viewfull=1#post22841
But as you can see there, its performance is not very good compared to logistic mixing etc.

2011-04-10 18:18:55 Serge Osnach:
Try updating not only the 4 buckets, but the nearest 8, with small weights. If our models gave very precise predictions, such updates would be meaningless. But o0-o1 models are of course far from excellent, so adding such "noise" to the update (maybe to the prediction too) can give an advantage.

2011-04-12 10:31:40 Shelwien:
> nearest 8 with small weights.
Sure it would help. But interpolated SSE2 is already slow as it is - it's possible to mix in a few different contexts at the same speed and with better compression. There's actually a more interesting case like that, though: in compression of BWT output it's helpful, after a string ABC, to update both o1[B].prob[C] and o1[A].prob[C] - with BWT the effect is considerably better than on plain data, because the probability of BWT inserting an extra word between two is much higher than the probability of a mistype in normal text. SSE does have a lot of untested options though... like the logistic domain, ternary statistics, delayed update etc. What are you up to, anyway? :)

2011-04-13 19:36:41 Serge Osnach:
About bwt-o1: such updates in a post-BWT model (and in indirect contexts in ppm/cm too) work well if BWT actually inserts extra symbols.
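The "aggressive update of nearest SSE contexts" discussed in this thread can be sketched in one dimension: besides the two interpolation buckets, the outer neighbours also get nudged, with a smaller weight (here, a larger shift). Table size and both rates are assumptions for illustration, not the actual epm or SSE2 code:

```cpp
#include <cassert>

// Sketch of neighbour-context ("NC") updates for a 1D SSE table:
// the two buckets used for interpolation get the normal update,
// their outer neighbours a weaker one (12-bit probabilities assumed).
void nc_update(int t[], int n, int pr, int bit, int rate = 5, int nc_rate = 8) {
    int i = pr >> 7;            // left interpolation bucket
    int y = bit ? 4095 : 0;     // update target
    t[i]     += (y - t[i])     >> rate;
    t[i + 1] += (y - t[i + 1]) >> rate;
    if (i > 0)     t[i - 1] += (y - t[i - 1]) >> nc_rate;  // weaker updates
    if (i + 2 < n) t[i + 2] += (y - t[i + 2]) >> nc_rate;
}
```

The weaker neighbour updates act as the "noise" mentioned above: they smooth the table where per-bucket statistics are sparse, at the cost of some precision when the input model is already accurate.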
On texts this will not work well, and on .xls files compression will be hurt to a great degree. Sequences common in .xls files look like AxxxB, (A+1)xxx(B+1), (A+2)xxx(B+2), and they will be sorted perfectly by BWT.
mixing vs updating: Updates can be built only on shifts, but mixing may be built on shifts too, like Pw = P1 + ((P2-P1) >> W1Shift), where W1Shift = floor(log2(W1)), when W1 is small enough. Looks good to me and is definitely worth a try.

2011-04-16 16:47:18 Serge Osnach:
Let "updating of the nearest SSE contexts with a smaller weight" be NC-update :-) My recent attempts at mixing some SSE predictions with some other predictions were very confusing. It looks like SSE in unary coding and SSE in binary coding are completely different things. In binary coding, when avg(p) is somewhere around 1/2, mixing after SSE will give a benefit greater than NC-updates (of course, NC-updates are still good). But in unary SSE such mixing is not such a good idea. A fine-tuned unary SSE must handle very low probabilities very well and give precise predictions, so mixing of probabilities must be done before SSE in this case - and there NC-updates have no replacement. It's very easy to make incorrect generalizations about SSE :-(
For the unreleased Epm r10 I made a good model for deterministic contexts, where probabilities from 3 different SSEs (main SSE, pure nonstationary SSE and middle SSE) were accurately mixed. But some time ago I wrote a genetic optimizer for SSE parameters, loaded the first sub-model into the optimizer and ran it on the famous Book1. And guess what? Got -100 bytes :-) It was nearly unbelievable that one simple SSE with tuned NC-updates could beat three models with a good mixer. But the archive decompresses successfully :-) Adding the second sub-model and the mixer saves another 600 bytes on Book1, this time with optimization on the whole test set.
BTW, what do you mean by the terms "logistic domain" and "ternary statistics"?

2011-04-19 19:10:30 Shelwien:
> if BWT actually inserts extra symbols.
> On texts this will not work well,
It usually does - it's kind of a sort property: in BWT we're applying a strict sort to somewhat fuzzy keys, so there are many cases where similar strings get clustered together, but not necessarily in the same order.

> and on .xls files compression will be hurt
I guess it doesn't matter so much with adaptive mixing. Also, it's not the best idea to spend time tuning "universal" models for data with a known structure.

> Updates can be built only on shifts, but mixing may be built on
> shifts too, like Pw = P1 + ((P2-P1) >> W1Shift), where W1Shift =
> floor(log2(W1)), when W1 is small enough.
Speed-wise, multiplications are not so relevant on modern PCs - compression algos are mostly memory-bound anyway. Still, shift-based mixing is actually common, but usually it's static mixing, implemented in a more flexible way, like (p1*w1+p2*w2)>>log2(w1+w2). Adaptive shift-based mixers are possible too (for example, with BFA), but likely don't make any sense in practice. Also, LUT-based state machines are much more promising, both for compression and for speed. Anyway, the main point of linear components with multiplication (including hash functions) is that they're much easier to optimize. Afaik, even if you have a reason to replace multiplications with shifts, it's still better to do that after tuning the multiplication coefs first, gradually replacing the components with shift-based ones and retuning.

> Looks like SSE in unary coding and SSE in binary coding are
> completely different.
Bits in unary coding are more likely to have a skewed probability.

> But in unary SSE such mixing is not such a good idea.
Well, it's probably a matter of mixing method and optimization.

> Got -100 bytes :-) It was nearly unbelievable that one simple SSE
> with tuned NC-updates could beat three models with a good mixer.
That's interesting, but for me more complex models usually give better results.
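The two mixing forms being compared can be sketched directly (12-bit probabilities assumed; `mix_static` requires the weights to sum to a power of two - these are illustrative helpers, not code from either poster):

```cpp
#include <cassert>

// Shift-only two-input mix: the weight of p2 is approximated as 2^-w1shift,
// i.e. Pw = P1 + ((P2 - P1) >> W1Shift). 12-bit probabilities assumed.
int mix_shift(int p1, int p2, int w1shift) {
    return p1 + ((p2 - p1) >> w1shift);
}

// More flexible static mix: (p1*w1 + p2*w2) >> log2(w1+w2),
// valid when w1 + w2 == 1 << log2sum.
int mix_static(int p1, int w1, int p2, int w2, int log2sum) {
    return (p1 * w1 + p2 * w2) >> log2sum;
}
```

The shift form is the special case of the static form where one weight is a power of two, which is why tuning multiplicative weights first and then rounding them to shifts tends to work better than designing with shifts from the start.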
And my main problem is overtuning - more complex models can be forced to compress the given samples better, but it's frequently hard to guarantee that there's enough symmetry to make them universal...

> BTW, what do you mean by the terms "logistic domain"
See wikipedia for "logistic function" and "logit function". The paq mixer is a linear mixer in the logistic domain.

> and "ternary statistics"?
Well, 1,2,3 -> primary, secondary, ternary. SSE is secondary, so another statistical mapping on top of SSE would be ternary. Linear interpolation in SSE is obviously very rough, and it's clearly possible to make a better interpolation function by gathering {p1,p2,w} statistics. The "TSE" proposed by Shkarin was different though :)

P.S. It would probably be more convenient to continue this by email (shelwien@gmail.com)
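As a sketch of the "logistic domain" mentioned above: a paq-style mixer combines predictions linearly after transforming probabilities with the logit ("stretch") function, then maps the result back with the logistic ("squash") function. The floating-point form below is illustrative, not the actual paq code, and the weights are arbitrary:

```cpp
#include <cassert>
#include <cmath>

// Logistic-domain mixing sketch: mix in the stretched (logit) domain,
// then squash back to a probability in (0,1).
double stretch(double p) { return std::log(p / (1.0 - p)); }    // logit
double squash(double x) { return 1.0 / (1.0 + std::exp(-x)); }  // logistic

double mix2(double p1, double p2, double w1, double w2) {
    return squash(w1 * stretch(p1) + w2 * stretch(p2));
}
```

One notable property: with both weights at 1, two confident inputs reinforce each other - mix2(0.9, 0.9, 1, 1) comes out near 0.99, more extreme than either input - which is one way logistic mixing differs from linear mixing, where the result would stay at 0.9.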