My optimizer just randomly modifies parameters until it finds an improvement in some metric (btw, that metric isn't necessarily just code length; in some cases it includes other components, like a function of the model's memory size). Parameter subsets may be optimized separately or together, depending on the requirements, but either way it continues iteratively until there's been no improvement for a long enough time.
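In outline, that loop is just random hill climbing. A minimal sketch, assuming integer parameters and a ±1 perturbation (the function names and the specific move are my own choices, not the actual optimizer):

```python
import random

def optimize(params, evaluate, patience=1000):
    """Randomly perturb parameters; keep only changes that improve the metric.

    `evaluate` returns the metric to minimize (e.g. compressed size,
    possibly plus a penalty for model memory size).  Hypothetical names.
    """
    best = list(params)
    best_score = evaluate(best)
    stale = 0                                # iterations since last improvement
    while stale < patience:
        trial = list(best)
        i = random.randrange(len(trial))
        trial[i] += random.choice((-1, 1))   # small random modification
        score = evaluate(trial)
        if score < best_score:               # only accept improvements,
            best, best_score = trial, score  # so the result never gets worse
            stale = 0
        else:
            stale += 1
    return best, best_score
```

Because a trial is only accepted when it beats the current best, the process can stall in a local minimum but cannot oscillate.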
> larger amount) of contexts within a model?
Why, of course it's supposed to be optimized for every context separately. In the model for a codec which I'm currently supporting, there are 48 counter tables, 24 SSE1 tables and 12 SSE2 tables, each with its own optimized set of parameters (including the context).
Also this is a specialized model for a known data type, so I'm sure that a proper universal model would have 10x of that (or more).
There's no real reason to use the same counter parameters for different contexts.
However, I don't use delayed counters widely yet; I think that should wait until a successful implementation of a polynomial (or some other) fast optimizer.
The problem there is that I can't just replace all the counters with delayed ones, though that would surely be best for compression. Each delayed counter requires two additional multiplications and two table lookups, so it's too slow... and searching for a new balanced model structure with my current methods would take a lot of time.
Here's a context declaration and code generated for it:
Index y
  ych: ch,  1!1
  yfb: fb,  1!00100010
  yp0: p01, 1!0000000000010000
  y_s: As, -7!000000000000000
  y_p: p,  -7!010000000000001
  y_q: q,  -7!001000000000111
  yr1: r1, -7!000000000000000
  y_r: r,  -7!100000000000000
  yr2: r2, -7!000000000000000
  y_col: col, 1!0000000000000000000

y = 0;
y = y*2 + ((ch>0+(1-1)));
y = y*3 + ((fb>2+(1-1))+(fb>6+(1-1)));
y = y*2 + ((p01>11+(1-1)));
y = y*3 + ((p>1+(-7-1))+(p>14+(-7-1)));
y = y*5 + ((q>2+(-7-1))+(q>12+(-7-1))+(q>13+(-7-1))+(q>14+(-7-1)));
y = y*2 + ((r>0+(-7-1)));

p,q,r are some adjacent values (it's a table),
ch is a channel, col is a table column,
and fb,p01 are scaled probabilities from other counters.
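Reading the generated code, my interpretation (just from the pattern above, not from any spec) is that each field is given as `offset!bitmap`, where a '1' at bitmap position i produces the comparison `value > i + offset - 1`, each field contributes the count of true comparisons, and the mixing multiplier is the threshold count plus one. A sketch of that decoding, with made-up function names:

```python
def parse_field(spec):
    """Split 'offset!bitmap' into a list of comparison thresholds."""
    off_s, bitmap = spec.split('!')
    offset = int(off_s)
    # a '1' at position i selects the comparison:  value > i + offset - 1
    return [i + offset - 1 for i, b in enumerate(bitmap) if b == '1']

def context_index(fields, values):
    """Combine quantized field values into a single context index y."""
    y = 0
    for name, spec in fields:
        t = parse_field(spec)
        # multiplier = number of thresholds + 1 (possible count values)
        y = y * (len(t) + 1) + sum(values[name] > th for th in t)
    return y
```

For example, `fb, 1!00100010` yields thresholds 2 and 6, matching the generated `(fb>2+(1-1))+(fb>6+(1-1))` term.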
I haven't made any "universal" models since ash, so no "general data" for you. Thing is, I really want to create a proper model for text, but it gets too complex.
> > counter under SSE with it.
> My optimization target is the estimated entropy H(source)
Seems like a misunderstanding again...
You see, if you have a prediction p from some model, already optimized by the entropy of the p sequence, that doesn't mean the SSE(p) sequence would be optimal right away.
In fact, my counters, which were optimized under SSE, are no good without SSE. And SSE is not a continuous function despite its interpolation, so you can't really calculate a gradient for it.
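For reference, here's a minimal generic SSE stage sketch (the textbook version with linear interpolation, not the actual code discussed here; all names and the update rate are my own assumptions). The input probability selects two adjacent table cells, the output interpolates between them, and both cells adapt toward the observed bit, so the mapping's shape depends on the whole update history:

```python
class SSE:
    """Secondary estimation: remap a probability through an adaptive table."""
    def __init__(self, buckets=32, rate=0.02):
        self.n = buckets
        self.rate = rate
        # identity mapping initially: cell i holds probability i/buckets
        self.table = [i / buckets for i in range(buckets + 1)]

    def predict(self, p):
        """Interpolate between the two cells bracketing p (0 <= p < 1)."""
        x = p * self.n
        i = int(x)
        f = x - i
        self.last = (i, f)                  # remembered for update()
        return self.table[i] * (1 - f) + self.table[i + 1] * f

    def update(self, bit):
        """Move both bracketing cells toward the observed bit (0 or 1)."""
        i, f = self.last
        self.table[i]     += self.rate * (1 - f) * (bit - self.table[i])
        self.table[i + 1] += self.rate * f       * (bit - self.table[i + 1])
```

Since every update moves the table cells, the output for a given p is a function of adaptive state, which is why optimizing the upstream counter's entropy alone doesn't make SSE(p) optimal.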
I'm optimizing all parameters at once.
That is, all 44 parameters in the case of o2rcA. Of course it's not always possible, or reasonable, but these coders are still toys compared to the one for which I made my optimizer.
Here's my optimization scheme:
1. A special class is used for parameter initialization, which calculates the actual parameter values from "bitmaps" like "01011000". This class also adds some unique signatures to these bitstrings, so they can be easily found in the executable.
2. The optimizer itself is a perl script, which toggles these bits in the executable, runs it, and keeps the best result. I think it's more or less what's called "genetic optimization".
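A toy version of that scheme, simulated on a byte buffer instead of a real executable (the signature string, function names, and greedy single-bit passes are my own assumptions): find the signature, toggle one '0'/'1' character of the embedded bitstring, evaluate, and keep the patch only if the result improved.

```python
SIG = b"PARAM01_"   # hypothetical unique signature prepended to the bitstring

def optimize_bits(image, nbits, evaluate):
    """Greedy bit-toggling over a bitstring embedded in `image` (a bytearray).

    `evaluate(image)` stands in for running the patched executable and
    reading back the result (e.g. compressed size) to minimize.
    """
    base = image.find(SIG) + len(SIG)   # bitstring stored as '0'/'1' bytes
    best = evaluate(image)
    improved = True
    while improved:                     # repeat full passes until
        improved = False                # one finds no improvement
        for i in range(nbits):
            pos = base + i
            image[pos] ^= ord('0') ^ ord('1')       # toggle '0' <-> '1'
            score = evaluate(image)
            if score < best:
                best = score
                improved = True
            else:
                image[pos] ^= ord('0') ^ ord('1')   # revert the toggle
    return best
```

Each accepted toggle strictly improves the score, so this converges; with 40 bits and ~3 passes you get on the order of 120 evaluations, matching the run counts mentioned below.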
> at once (in parallel, like you do) can sometimes lead to
> oscillations (if the parameters from different classes
> influence each other).
My optimization process can get stuck in a local minimum, but it can't oscillate. The optimizer attempts to modify the current best state to get an even better result, so it can only improve the overall result.
> take until your optimization target was reached when tuning
> the fpaq0pv4b order0 model (on the whole SFC)?
With a single counter it's really fast, though the original fpaq0pv4b uses a shift counter and I didn't optimize it. But the later version in ST2rc_v3 has 5+12+8+15=40 bits of counter parameters, and needed about 3 passes through them, so that's about 120 executable runs.
Well, SFC is compressed in ~10s by ST2rc_v3, so I guess it takes 120*10/60 = 20 minutes on my machine.
Anyway, I don't care at this scale - as long as it gets me the best parameters while I sleep.
However, I had to use 6 machines for 4 whole days to optimize a "real" model, and started to think about better approaches after that.