Context Modelling

o5mix	v0 coders for comparison, just tuned to the new target
o6mix	v0 coders for comparison, just tuned to the new target
o5mix-a	same coders with new i/o and E8
o6mix-a	same coders with new i/o and E8
o6mix1a	same coders with new i/o and E8
o6mix1b	o6mix1a with alternate parameter set

But, mysteriously, the overall results in my benchmark are worse: speed only increased by 20% in o6mix1 (where 3 submodels were removed), and compression was better in v0.

Guess I'll have to retune these coders back to book1+wcc386 (I don't have the hardware to run the optimization on full datasets) and maybe restore the dual counters, at least at o0 and o1.

Durilca'light results were added to the comparison; its compression is comparable to CCM's, but with 30% faster decoding. As for compression speed, there are just a lot of filters processing the data separately, so I think it could be optimized significantly. But still, durilca's filters are not open-source either.

So I guess it got successfully tuned to SFC, but that tuning is not optimal for other data.

...Also, I just realized that there are 6 unnecessary hash function calls in the o6mix model %)

o6mix1a optimization progress: the curve starts from the o6mix v1 result, then 3 submodels were cut and other things changed as well. The vertical axis shows the compressed size of my SFC-based optimization target, and the horizontal axis shows the time in hours.

Codec	Compressed size (bytes)	Ctime (s)	Dtime (s)	Metric	Notes
Test machine: Intel Core2 Q9450 @ 3.68GHz (460x8), DDR2 5-5-5-18 @ 920MHz (460*2)
ccm-5	10821246	15.857	16.141	1868.1	CCM 1.30c -5 http://christian.martelock.googlepages.com/dl_ccm130c.zip

o2mix-1	14779689	28.063	29.440	2631.8	http://ctxmodel.net/files/MIX/mix_v0.rar
o3mix-1	12933426	41.531	42.295	2485.3
o4mix-1	12163352	51.093	51.861	2470.2
o5mix-1	11911688	62.282	62.422	2547.7
o6mix-1	11864504	76.498	77.751	2707.8

v1_o5mix-1	11799401	59.392	60.248	2505.5	http://ctxmodel.net/files/MIX/mix_v1.rar
v1_o5mix-a-1	11795060	58.922	59.220	2494.1
v1_o6mix-1	11705854	72.906	73.422	2636.2
v1_o6mix-a-1	11703247	73.140	72.781	2629.6
v1_o6mix1a-1	11815431	58.313	56.641	2470.9
v1_o6mix1b-1	11804441	57.875	56.610	2468.4

2008-07-09 07:00:47 Shelwien >
> Damn! 20 hours for parameter optimisation :)

It's not complete yet, and it actually took more than 30 hours.
o6mix1a1 has 1114 bits in its parameter config, so no wonder.
And that was on an Intel Q9450 at 3.68GHz, not exactly a cheap machine.

> Are you going to apply the speed optimizations i suggested?
> I've got some more, which i haven't tried.

Not exactly, as most of the speed was lost to a complex
context calculation repeated twice per bit. Well, initially
there were just simple ANDs, so I didn't pay attention to it
even after adding hash functions etc.

Of course, I've read your suggestions, but my byte
contexts have been clustered together since v0, and I don't
like the idea of nibble subblocks, nor of interleaving
weights and counters, as they have different contexts.
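To picture what "byte contexts clustered together" can mean, here is a minimal sketch under an assumed layout that is not taken from the mix_v1 sources: each byte context owns one contiguous block of 256 12-bit counters indexed by the partially coded byte (1..255), so all 8 bit predictions of a byte hit the same block. `ClusteredCounters`, `ByteSlot` and the 1/16 update rate are hypothetical; mixer weights, which use different contexts, would live in a separate array and are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: every byte context owns one contiguous block of
// 256 counters, indexed by the partially coded byte (1..255; index 0 unused).
// Coding one byte touches only this block, which keeps memory access local.
struct ByteSlot {
    uint16_t p[256];  // 12-bit probability that the next bit is 1
};

class ClusteredCounters {
public:
    explicit ClusteredCounters(std::size_t nSlots) : slots_(nSlots) {
        for (auto& s : slots_)
            for (auto& q : s.p) q = 2048;  // p = 0.5
    }
    // Called once per byte: pick the block for the current byte context.
    void setByteContext(std::size_t ctx) { cur_ = &slots_[ctx % slots_.size()]; }

    // Probability for the next bit, given the partial byte (1..255).
    int p(int partial) const {
        assert(cur_ && partial >= 1 && partial <= 255);
        return cur_->p[partial];
    }
    // Shift-based update with an assumed fixed rate of 1/16.
    void update(int partial, int bit) {
        assert(cur_ && partial >= 1 && partial <= 255);
        uint16_t& q = cur_->p[partial];
        if (bit) q = static_cast<uint16_t>(q + ((4096 - q) >> 4));
        else     q = static_cast<uint16_t>(q - (q >> 4));
    }
    // Feed one whole byte (MSB first) through the current block.
    void trainByte(int c) {
        int partial = 1;
        for (int i = 7; i >= 0; --i) {
            int bit = (c >> i) & 1;
            update(partial, bit);
            partial = partial * 2 + bit;
        }
    }

private:
    std::vector<ByteSlot> slots_;
    ByteSlot* cur_ = nullptr;
};
```

The byte context is resolved once via `setByteContext`, and the per-bit work is just an array index inside that block.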

Instead, I've been optimizing 3 new versions since yesterday:
1. o6mix with dual counters at o0,o1, single ones at o2-o6,
and the extra context calculation removed (though I suddenly
remembered that I did it that way hoping IntelC would factor
it out; if it did, this version might become even slower %)
2. same, with hashed context indexes calculated once per byte.
3. [2] with single counters at o0,o1.
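Version [2] computes hashed context indexes once per byte. A minimal sketch of that idea, under assumptions that are not from the mix sources: orders 2..6 are hashed from a 64-bit history of recent bytes with a multiplicative (Fibonacci-style) hash, each hash selects a 256-entry block, and the per-bit work shrinks to OR-ing in the partial byte. `ContextIndexes`, `kTableBits` and the hash constant are hypothetical choices.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: compute hashed slot indexes for orders 2..6 once per
// byte, instead of recomputing the context hash for every bit.
constexpr int kMinOrder = 2, kMaxOrder = 6;
constexpr int kTableBits = 22;  // assumed table size: 2^22 counters per order

struct ContextIndexes {
    uint32_t base[kMaxOrder + 1] = {};  // per-order block base; low 8 bits are 0

    // hist holds recent bytes, most recent in the low byte.
    void updatePerByte(uint64_t hist) {
        for (int n = kMinOrder; n <= kMaxOrder; ++n) {
            uint64_t ctx = hist & ((uint64_t(1) << (8 * n)) - 1);  // last n bytes
            // Multiplicative hash; n is mixed in so orders hash differently.
            uint64_t h = (ctx + uint64_t(n)) * 0x9E3779B97F4A7C15ull;
            // Top (kTableBits - 8) bits select a block of 256 bit contexts.
            base[n] = uint32_t(h >> (64 - (kTableBits - 8))) << 8;
        }
    }
    // Per bit: only an OR with the partial byte (1..255) is left.
    uint32_t index(int order, int partial) const {
        assert(order >= kMinOrder && order <= kMaxOrder);
        assert(partial >= 1 && partial <= 255);
        return base[order] | uint32_t(partial);
    }
};
```

Since each hash lands on a 256-aligned block, this also keeps the byte-level clustering of counters intact.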

[1] and [2] are already close to the compression level of v0,
but [3] is worse; dunno if I made some mistake or the
optimization is just slow.

> Do you want to implement per-model context merging?

I'm not sure I understand what you mean by that.
If it's about advanced modelling of context histories with
more than simple counters, then my dynamically mixed
dual counters are something like that... maybe.
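One way to read "dynamically mixed dual counters" is as a fast and a slow probability estimate whose outputs are blended by an online-adapted weight. The sketch below assumes exactly that: a linear mix with gradient descent on coding cost. It is not necessarily how Node2i works; `DualCounter`, the rates and the learning rate are hypothetical, and doubles are used for clarity instead of fixed point.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical dual counter: a fast and a slow probability estimate,
// linearly mixed with a weight that adapts online to minimize coding cost.
class DualCounter {
public:
    DualCounter(double fastRate = 1.0 / 4, double slowRate = 1.0 / 64,
                double mixRate = 0.02)
        : rf_(fastRate), rs_(slowRate), lr_(mixRate) {}

    // Mixed estimate of P(bit == 1).
    double p() const { return w_ * pf_ + (1.0 - w_) * ps_; }

    void update(int bit) {
        assert(bit == 0 || bit == 1);
        double pm = std::clamp(p(), 1e-6, 1.0 - 1e-6);
        // Gradient of -ln P(bit) with respect to the weight w.
        double d = pf_ - ps_;
        double grad = bit ? -d / pm : d / (1.0 - pm);
        w_ = std::clamp(w_ - lr_ * grad, 0.0, 1.0);
        // Move both counters toward the observed bit at their own rates.
        pf_ += rf_ * (bit - pf_);
        ps_ += rs_ * (bit - ps_);
    }
    double weight() const { return w_; }

private:
    double rf_, rs_, lr_;
    double pf_ = 0.5, ps_ = 0.5, w_ = 0.5;
};
```

On nonstationary data the weight drifts toward the fast counter; on stable statistics it drifts toward the slow, more precise one. `std::clamp` requires C++17.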

Well, the current plan is:
0. Complete the optimization of the 3 models described
above, and release v2.
1. Make a match model. It seems like further development
is impossible without it, as my o6 model loses too much on
files with long matches. This match model would probably
allow using SSE etc.
2. Switch to 16bit counters with a fixed update rate
(basically, removing the T field from Node2i).
3. Switch to 12+3 delayed counters (12 bits of probability
+ 3-bit history + history terminator). Dunno about this
though, as optimizing it would be really time-consuming.
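Items 2 and 3 can be sketched as bit layouts. Assumptions: the "fixed update" counter is a 16-bit probability moved by a constant shift, with no per-counter state like a T field; and the delayed counter packs a 12-bit probability in bits 15..4 plus up to 3 pending history bits with a terminator in bits 3..0. The rule that the oldest pending bit is applied to the probability once the history is full is a guess at what "delayed" means here. `Counter16`, `DelayedCounter` and the shift values are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Item 2 (sketch): 16-bit probability with a fixed update rate.
struct Counter16 {
    uint16_t p = 32768;  // P(bit == 1) scaled to 1 << 16
    void update(int bit, int shift = 5) {
        if (bit) p = static_cast<uint16_t>(p + ((65536 - p) >> shift));
        else     p = static_cast<uint16_t>(p - (p >> shift));
    }
};

// Item 3 (sketch, assumed semantics): 12-bit probability in bits 15..4,
// history in bits 3..0 stored as terminator 1 followed by up to 3 bits
// (0b0001 = empty history, 0b1xyz = three pending bits x, y, z).
class DelayedCounter {
public:
    int prob12() const { return v_ >> 4; }
    int historyLength() const {
        int h = v_ & 15, n = 0;
        while (h > 1) { h >>= 1; ++n; }
        return n;
    }
    void push(int bit) {
        assert(bit == 0 || bit == 1);
        int p = v_ >> 4;
        int h = v_ & 15;
        if (h >= 8) {                    // history full: apply the oldest bit
            int oldest = (h >> 2) & 1;
            if (oldest) p += (4096 - p) >> 4;
            else        p -= p >> 4;
            h = 4 | (h & 3);             // keep the two newest bits
        }
        h = (h << 1) | bit;              // append the new bit
        v_ = static_cast<uint16_t>((p << 4) | h);
    }

private:
    uint16_t v_ = (2048 << 4) | 1;       // p = 0.5, empty history
};
```

The 4-bit history field doubles as a small context (15 states) that a model could use alongside the probability.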

2008-07-09 09:28:33 toffer >
I don't like the idea of your hashing, since you underestimate the speed gain (or don't care atm?).

Ok, I see that you don't use the same contexts for mixers and weights. This was my first impression (I didn't look at model_h.inc).

2 & 3 are really good. By "per model context merging" I meant something like this: c_i is the state of a counter (order-i model) under some context. For each model, keep a static SSE mapping from a quantised value of c_i to a probability SSE_i(c_i). Instead of using c_i as the model output, use an optimised static linear mix c_i*a + SSE_i(c_i)*(1-a). This will be especially good at higher orders.
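A fixed-point sketch of the mix c_i*a + SSE_i(c_i)*(1-a). It assumes 12-bit probabilities, quantisation into 16 buckets via c >> 8, and one static weight a per model scaled to 4096. `StaticSseMerge` and the bucket count are hypothetical, and the table contents would come from offline optimisation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of "per model context merging": the counter output c
// (12-bit probability) is looked up in a static SSE table by its quantised
// value, and the result is blended with c itself: out = a*c + (1-a)*SSE(q(c)).
class StaticSseMerge {
public:
    static constexpr int kBuckets = 16;  // q(c) = c >> 8 for 12-bit c

    StaticSseMerge(const std::array<uint16_t, kBuckets>& table, int a12)
        : sse_(table), a_(a12) {
        assert(a12 >= 0 && a12 <= 4096);
    }
    int quantise(int c) const { return c >> 8; }

    // c and the result are probabilities scaled to 0..4095.
    int refine(int c) const {
        assert(c >= 0 && c < 4096);
        int s = sse_[quantise(c)];
        return (a_ * c + (4096 - a_) * s) >> 12;
    }

private:
    std::array<uint16_t, kBuckets> sse_;  // static table, e.g. tuned offline
    int a_;                               // mixing weight a, scaled to 4096
};
```

With a = 1 (4096) the counter passes through unchanged, and with a = 0 only the SSE bucket is used; the useful settings lie in between and would be tuned per model order.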