<< ctxmodel.net
> is time consuming (cmm2 dropped from ~700kb/s to 420 on my pc).

Not exactly sure what you're talking about, but in my implementation the mixer uses the same update as a counter, so with mixing of 2 models you need to update 3 counters - though there are trade-offs, like PPM with partial updates.
But the question still remains whether these kinds of bruteforce methods,
like combining the estimations of multiple models etc, are worth their speed impact.
> Eg. is mix( p0, mix( p1, p2 ) ),
> which gives two averages instead of three. Implicitly this gives higher
> weights to lower orders (the orders in the "outer mix", here p0),
It doesn't, because it gives higher weight to whichever model is more efficient entropy-wise. Also the mixers are initialized with a weight that doesn't include the first probability at all.
I did it like that 5 years ago, thanks.
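For illustration, here's a small sketch (my own, not from the posts above) of how a nested two-input mix distributes effective weights. Nothing in the structure forces lower orders to dominate: with adaptive weights, the outer weight can push the p0 share anywhere in 0..1.

```cpp
#include <cassert>
#include <cmath>

// Nested mix: mix( p0, mix( p1, p2 ) ), with per-stage weights in 0..1.
// Hypothetical helper names; a sketch, not code from the thread.
double mix2(double a, double b, double w) { return a + (b - a) * w; }

// Effective flat weights of p0,p1,p2 in the nested form:
//   result = (1-w_out)*p0 + w_out*( (1-w_in)*p1 + w_in*p2 )
void effective(double w_out, double w_in, double e[3]) {
  e[0] = 1 - w_out;          // share of p0
  e[1] = w_out * (1 - w_in); // share of p1
  e[2] = w_out * w_in;       // share of p2
}
```

With w_out near 1 the outer stage almost ignores p0, which is the point of the reply: the adaptation decides the shares, not the nesting order.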
> Maybe this can be made efficient?
Well, I did try to prove it another way, by stripping lpaq, but it seems it has too many "unsportsmanlike" tricks in it. I did manage to disable all models except order1 and order2 (despite the #defines), and the text/exe preprocessing, but there's still too much stuff left, as even the hash function and the counters are somewhat tuned to "real data".
> i noticed that this is the most important mixing context
> (which should be obvious, after a second thought).

Well, that's only if you use the same mixer array for mixing different orders. Which isn't obvious. Instead, it's more obvious imho that you'd need different contexts for mixing different orders - at least different quantization.
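A minimal sketch of what "different quantization per order" could look like: each mixing stage keeps its own weight table, selected by a quantization of whatever context suits that stage. The bucketing below is made up for illustration, and weight adaptation is omitted.

```cpp
#include <cassert>

// Hypothetical sketch: a per-stage mixer whose weight is selected by a
// quantized context (here, e.g., the match count of the higher order).
struct CtxMixer {
  double w[4];  // one weight per quantized context bucket (no adaptation here)
  CtxMixer() { for (int i = 0; i < 4; i++) w[i] = 0.5; }
  // coarse quantization into 4 buckets - thresholds are arbitrary
  static int Quantize(int match_count) {
    return match_count == 0 ? 0 : match_count < 4 ? 1
         : match_count < 16 ? 2 : 3;
  }
  double Mix(int match_count, double p_low, double p_high) {
    return p_low + (p_high - p_low) * w[Quantize(match_count)];
  }
};
```

A second stage mixing different orders would use its own table and its own quantization, rather than sharing this one.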
> What is a "counter" in your designations?
> Do you refer to a probability with adjustment by the error,
> or a counter like counting the occurrences of 0/1 bits?

Well, by counter I mean a statistic counter like these "states". Actually my counter is derived from the same origin as the states - an integer number of occurrences, that is. So it evolved like this:

1. P0 = bitcount[0]/(bitcount[0]+bitcount[1]);
   bitcount[bit]++;

2. P0 = bitcount[0]/(bitcount[0]+bitcount[1]);
   (bitcount[bit]*=(1-1/InvSpeed))++;

3. P0 = bitcount[0];
   (bitcount[0]*=(1-1/bitcount[1]))+=(bit?0:1/bitcount[1]);
   (bitcount[1]*=(1-1/InvSpeed))++;

4. P0 = bitcount[0];
   wr = Tbl[InvSpeed][bitcount[1]];
   (bitcount[0]*=(1-wr))+=(bit?0:wr);
   bitcount[1]++;

So [2] here is a nonstationarity countermeasure, and the other changes are mostly speed optimizations. And Matt Mahoney made the next step here, by quantizing such a counter into a single byte - which takes away the freedom of controlling the adaptation speed and even adds some redundancy. So of course his approach has its place in industrial compression schemes (like h264), but I think that it really complicates the development of new models.
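For illustration, stages [1] and [4] above can be written out in runnable form. This is a sketch under assumptions: a 1/1 initialization for the frequency counter, and a fixed rate wr = 1/InvSpeed in place of the Tbl[][] lookup.

```cpp
#include <cassert>
#include <cmath>

// Stage [1]: plain frequency counter, P0 = n0/(n0+n1).
struct FreqCounter {
  double n0 = 1, n1 = 1;  // 1/1 init to avoid 0/0 - an assumption
  double P0() const { return n0 / (n0 + n1); }
  void Update(int bit) { (bit ? n1 : n0) += 1; }
};

// Stage [4], simplified: P0 is stored directly and moved toward the
// observed bit by a rate wr; here wr = 1/InvSpeed instead of Tbl[][].
struct ShiftCounter {
  double P0 = 0.5;
  void Update(int bit, double InvSpeed) {
    double wr = 1.0 / InvSpeed;
    P0 += ((bit ? 0.0 : 1.0) - P0) * wr;  // P0 is the probability of a 0 bit
  }
};
```

The shift form forgets old statistics geometrically, which is the nonstationarity countermeasure that [2] introduced, kept through [3] and [4].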
> though. For mixing i use an optimization approach (minimizing the entropy),

There was this trick which I called "bruteforce adaptation", BFA - true entropy minimization, which accumulates all the possible codelengths for a given set of weight values and uses the weight with minimal length for actual encoding:

for( w=0,i=0; w<=1; w+=1/63,i++ ) {
  p = p1*(1-w) + p2*w;
  (len[i]*=(1-1/InvSpeed)) += -log2(bit?1-p:p);
}

Here's a working example from that time: http://compression.ru/sh/o1bfa.rar

And the new mixing idea is really simple: we update the mixed probability as if it came from a normal counter, and then "back-propagate" the change into the reachable parameters... the weight, here. Well, it looks like this (here's a "translated" version from my sh_mixer.inc):

int w;
iMixer::Process( bit, p1, p2, InvSpeed, Limit ) {
  Counter tmp;
  tmp.P = p1+(p2-p1)*w;        // mixed probability of 0
  tmp.Encode( bit );           // encode the bit with mixing result
  tmp.Update( bit, InvSpeed ); // update tmp counter like usual
  // speed opt - no sense to change weight when p1 and p2 are too close
  if( ABS(p2-p1)>Limit ) {
    // change weight so that we'll get an "updated" value from repeated mixing
    w = (tmp.P-p1)/(p2-p1);
    w = max(0,min(1,w));
  }
}
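A self-contained sketch of that back-propagated update, assuming a plain linear counter for tmp (names are mine; the original uses the Counter class from sh_mixer.inc, and the coding step is omitted):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch of the "update mixed probability like a counter, then solve
// for the weight" mixer. Probabilities are probabilities of a 0 bit.
struct Mixer {
  double w = 0.5;  // mixing weight for p2, in 0..1
  // returns the mixed probability that would be used for (de)coding
  double Process(int bit, double p1, double p2,
                 double InvSpeed, double Limit) {
    double p = p1 + (p2 - p1) * w;  // mixed probability of 0
    double coded = p;               // value the coder would use
    // update the mix like a linear counter: move toward 1 on a 0 bit
    p += ((bit ? 0.0 : 1.0) - p) / InvSpeed;
    // back-propagate: pick w so that repeated mixing yields the updated p
    if (std::fabs(p2 - p1) > Limit) {
      w = (p - p1) / (p2 - p1);
      w = std::max(0.0, std::min(1.0, w));
    }
    return coded;
  }
};
```

Feeding it bits that consistently favor one input drives w toward that input, with no gradient step size to tune beyond the counter's own InvSpeed.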
> weighted averaging (except the stretch/squash transformation preceding and
> succeeding mixing) with adjustment based on the gradient. Matt modified
> the gradient descent to minimize ld( pbit ), rather than standard back
> propagation (minimizing RMSE). One problem you always face is getting
> stuck in a local minimum.

I think that the neural network origin is what makes Matt's mixer so inefficient in my example. That is, maybe he did manage somehow to make it output something useful (that's always possible when you build a model from a given set of functions, compared to trying to insert a function into a working model), but it surely doesn't adapt well to some abstract probabilities out of the paq context. Well, my order 2-1-0 mix looks like this:

// (de)coding a bit from the "flag" variable.
// TTable_NN,MMM are counter update mode settings (kinda random here)
// also "100" in the mixer calls is a limit value = 0.003 in the 0..1 interval
p1 = order0[z].P;
p2 = order1[p][z].P;
p3 = order2[pp][p][z].P;
p4 = p1 + (p2-p1)*w1[p][z].w;
w2[p][z].Process( flag, TTable_5,1,100, p4,p3 );
order0[z].Update( flag, TTable_20, 700 );
order1[p][z].Update( flag, TTable_10, 600 );
order2[pp][p][z].Update( flag, TTable_8, 400 );
w1[p][z].Process( flag, TTable_5,1,100, p1,p2 );
> speed, by only using 3 or 4 active models out of 6 to 8 in mixing, e.g.
> orders 654, when coding at order 6 and completely ignoring lower ones -
> which improved speed AND compression.
> Maybe this is exactly what you meant!?

I didn't, though I'm pretty sure that you'd get a valid result from any set of estimations you'd run through my mixer.
It is quite significant with a wide enough mixer context.
> of extra time isn't worth the effort.

Well, I'm not sure what you are agreeing with. In fact, I think that it's always useful to have the best result you're able to reach within common-sense restrictions (eg. no supercomputers etc). Only when you have values to compare are you able to make a correct decision on where to stop. And the current problem is that paq8 isn't quite that. It's obviously more useful than some widespread archivers for data redundancy analysis, but it's too easy to make a very simple model with better results for custom data - and no wonder, as paq8 uses lots of tricks for common-file compression, but its compression classes, which aren't particularly good even in theory, are made even less efficient by all these speed and memory trade-offs.
> but special models will definitely do better - like a bi- or trigram word for text,
> surrounding pixels for images and so on. I think a specialized model along with maybe
> order210 will give nice results; and higher orders are definitely not worth
> the effort here.
Well, it depends on the target.