Mix v1: sh_samples_1 benchmark results
  • new improved (and stream-oriented) E8 filter
  • buffered i/o from fpaq0pv4B instead of caching the whole input file
  • new optimization target, based on SFC (concatenated samples of each SFC file)
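An E8 filter is the usual x86 preprocessing step. It replaces the 32-bit relative operand of each CALL (opcode 0xE8) with an absolute target, so repeated calls to the same function become identical byte strings. The following is a minimal generic sketch, not the actual Mix v1 filter. The names `e8_transform` and `e8_stream` and the chunk-carry scheme are assumptions. The sketch shows one way to make such a filter stream-oriented: if an 0xE8's 4 operand bytes haven't arrived yet, it is carried over to the next block, so the output does not depend on where the block boundaries fall.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Little-endian 32-bit load/store (x86 operand byte order).
static uint32_t load_le32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}
static void store_le32(uint8_t* p, uint32_t v) {
    p[0] = uint8_t(v); p[1] = uint8_t(v >> 8); p[2] = uint8_t(v >> 16); p[3] = uint8_t(v >> 24);
}

// Converts CALL operands in buf[0..n) in place: relative -> absolute when
// encoding, absolute -> relative when decoding. `base` is the stream offset
// of buf[0]. Returns the number of bytes that are final; a trailing 0xE8
// with an incomplete operand (at most 4 bytes) must be carried over.
static size_t e8_transform(uint8_t* buf, size_t n, uint32_t base, bool encode) {
    size_t i = 0;
    while (i < n) {
        if (buf[i] != 0xE8) { ++i; continue; }
        if (n - i < 5) return i;                  // operand not complete yet
        uint32_t next = base + uint32_t(i) + 5;   // address after the CALL
        uint32_t v = load_le32(buf + i + 1);
        store_le32(buf + i + 1, encode ? v + next : v - next);  // wraps mod 2^32
        i += 5;                                   // operand bytes are never rescanned
    }
    return n;
}

// Feeds `in` through the filter in chunks of `chunk` bytes, carrying the
// unfinished tail between chunks. At end of stream the tail is flushed as is.
static std::vector<uint8_t> e8_stream(const std::vector<uint8_t>& in, size_t chunk, bool encode) {
    assert(chunk > 0);
    std::vector<uint8_t> out, pending;
    uint32_t base = 0;                            // stream offset of pending[0]
    for (size_t off = 0; off < in.size(); off += chunk) {
        size_t len = std::min(chunk, in.size() - off);
        pending.insert(pending.end(), in.begin() + off, in.begin() + off + len);
        size_t done = e8_transform(pending.data(), pending.size(), base, encode);
        out.insert(out.end(), pending.begin(), pending.begin() + done);
        pending.erase(pending.begin(), pending.begin() + done);
        base += uint32_t(done);
    }
    out.insert(out.end(), pending.begin(), pending.end());
    return out;
}
```

The decoder makes exactly the same scan decisions as the encoder. This works because opcode bytes are never modified and operand bytes are skipped, so data encoded in one chunk size can be decoded in another. Many real filters also convert only operands within some range, to limit damage on non-x86 data. This sketch converts every 0xE8, which is still exactly reversible.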

o5mix   - v0 coders for comparison, just tuned to the new target
o5mix-a - same coders with new i/o and E8
o6mix1a - same coders with new i/o and E8
o6mix1b - o6mix1a with an alternate parameter set

But, mysteriously, the overall results in my benchmark are worse: speed increased by only 20% in o6mix1 (where 3 submodels were removed), and compression was better in v0.

I guess I'll have to retune these coders back to book1+wcc386 (I don't have the hardware to run the optimization on full datasets) and maybe restore the dual counters, at least at o0 and o1.

Durilca'light results were added to the comparison. Its compression is comparable to CCM, but with 30% faster decoding. As for compression speed, there are just a lot of filters processing the data separately, so I think it could be optimized significantly. Still, durilca's filters are not open-source either.

And also this is the SFC benchmark.

So I guess it was successfully tuned to SFC, but that tuning is not optimal for other data.

...Also, I just realized that there are 6 unnecessary hash function calls in the o6mix model %)

o6mix1a optimization progress: it starts with the o6mix v1 result, then 3 submodels were cut off and other things were changed too. The vertical axis shows the compressed size of my SFC-based optimization target, and the horizontal axis shows the time in hours.

Intel Core2 Q9450 @ 3.68GHz (460x8), DDR2 5-5-5-18 @ 920 (460x2)

ccm-5        10821246  15.857  16.141  1868.1  CCM 1.30c -5  http://christian.martelock.googlepages.com/dl_ccm130c.zip
o2mix-1      14779689  28.063  29.440  2631.8  http://ctxmodel.net/files/MIX/mix_v0.rar
v1_o5mix-1   11799401  59.392  60.248  2505.5  http://ctxmodel.net/files/MIX/mix_v1.rar

2008-07-09 00:27:29 Anonymous       >
Damn! 20 hours for parameter optimisation :)

Are you going to apply the speed optimizations I suggested? I've got some more, which I haven't tried.

Do you want to implement per-model context merging?
2008-07-09 00:27:45 toffer          > Forgot to put my name in again...
2008-07-09 07:00:47 Shelwien        >
> Damn! 20 hours for parameter optimisation :)

It's not complete; it actually took more than 30 hours.
o6mix1a1 has 1114 bits in its parameter config, so no wonder.
And that was on an Intel Q9450 at 3.68GHz, which is not
exactly a low-end machine.

> Are you going to apply the speed optimizations i suggested?
> I've got some more, which i haven't tried.

Not exactly, as most of the speed was lost to a complex
context calculation repeated twice per bit. Well, initially
there were just simple ANDs, so I didn't pay attention to it
even after adding hash functions etc.

Of course, I've read your suggestions, but my byte
contexts have been clustered together since v0, and I don't
like the idea of nibble subblocks, or of interleaving
weights and counters either, since they have different contexts.

Instead, I've been optimizing 3 new versions since yesterday:
1. o6mix with dual counters at o0,o1 and single ones at o2-o6,
and with the extra context calculation removed (though I
suddenly remembered that I did it like that hoping that
IntelC would factor it out; if it did, the new version
might even become slower %)
2. same with hashed context indexes calculated once per byte.
3. [2] with single counters at o0,o1.
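Item [2] can be illustrated with a minimal sketch. It assumes a separate hash table per order, with slots grouped into 256-counter blocks: one block per hashed byte context, indexed by the partial byte. All hashing happens once per byte, in `byte_done()`, and the per-bit lookup is a single add. `HashedContexts`, `hash32` and the hash multiplier are hypothetical, and the sketch does no collision detection.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 64->32 bit multiplicative hash.
static uint32_t hash32(uint64_t x) {
    x *= 0x9E3779B97F4A7C15ull;
    return uint32_t(x >> 32);
}

struct HashedContexts {
    static const int kOrders = 7;          // orders 0..6
    unsigned bits;                         // log2 of 256-counter blocks per order
    std::array<uint32_t, kOrders> base{};  // block offset per order
    uint64_t hist = 0;                     // last 8 bytes, newest in the low byte

    explicit HashedContexts(unsigned tableBits) : bits(tableBits) {
        assert(tableBits >= 1 && tableBits <= 24);   // keeps offsets within 32 bits
        update_bases();
    }

    // Once per byte: all context hashing happens here.
    void byte_done(uint8_t c) {
        hist = (hist << 8) | c;
        update_bases();
    }

    // Per bit: `partial` is the partial byte (1..255: a leading 1 followed by
    // the bits coded so far), so the lookup is one add into the current block.
    uint32_t slot(int order, unsigned partial) const { return base[order] + partial; }

private:
    void update_bases() {
        for (int k = 0; k < kOrders; ++k) {
            uint64_t ctx = k == 0 ? 0 : hist & (~0ull >> (64 - 8 * k));  // last k bytes
            uint32_t h = hash32(ctx ^ (uint64_t(k) << 56));               // tag with order
            base[k] = (h >> (32 - bits)) << 8;                            // 256 slots per block
        }
    }
};
```

While coding, the partial byte starts at 1 and becomes `partial * 2 + bit` after each bit. As a result, all 8 lookups of a byte for a given order stay inside one 256-slot block.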

[1] and [2] are already close to the compression level of v0,
but [3] is worse; dunno if I made some mistake or the
optimization is just slow.

> Do you want to implement per-model context merging?

I'm not sure I understand what you mean by that.
If it's about advanced modelling of context histories with
more than simple counters, then my dynamically mixed
dual counters are something like that... maybe.
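As an illustration only, here is one possible reading of "dynamically mixed dual counters". Two probability estimates of the same bit use different adaptation rates, and a per-context weight combines them linearly. That weight is itself updated online to reduce the coding cost. This is not the actual Node2i code: the struct name, both rates and the learning rate are assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical dual counter: two estimates of P(bit==1) with different
// adaptation rates, linearly mixed by a weight that is itself adapted.
struct DualCounter {
    double fast = 0.5, slow = 0.5;   // P(1) estimates
    double w = 0.5;                  // weight of `fast`, kept in [0,1]

    double p() const { return w * fast + (1 - w) * slow; }

    void update(int bit) {
        assert(bit == 0 || bit == 1);
        const double rFast = 1.0 / 4, rSlow = 1.0 / 64, lr = 0.02;  // assumed constants
        // Probability the mix gave to the actual bit, and its derivative in w.
        double pb = bit ? p() : 1 - p();
        double dpb = bit ? fast - slow : slow - fast;
        // Gradient step on log(pb), i.e. against the coding cost of this bit.
        w = std::min(1.0, std::max(0.0, w + lr * dpb / pb));
        fast += (bit - fast) * rFast;
        slow += (bit - slow) * rSlow;
        // Keep both estimates away from 0 and 1 so pb never reaches 0.
        const double lo = 1.0 / 4096, hi = 1 - lo;
        fast = std::min(hi, std::max(lo, fast));
        slow = std::min(hi, std::max(lo, slow));
    }
};
```

The weight drifts toward whichever estimate has recently predicted better. A context can therefore behave like a fast or a slow counter, depending on its data.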

Well, the current plan is:
0. Completing the optimization of the 3 models described
above, and releasing v2.
1. Making a match model. It seems like further development
is impossible without it, as my o6 model loses too much on
files with long matches. A match model would probably also
allow using SSE etc.
2. Switching to 16-bit counters with fixed update (basically,
removing the T field from Node2i).
3. Switching to 12+3 (12 bits of probability + 3-bit
history + history terminator) delayed counters. Dunno
about this though, as optimizing it would be really
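Items 2 and 3 of the plan above can be sketched with P(1) counters updated by a fixed right shift. `Counter16` carries no adaptation state: its rate is a compile-time constant, the hypothetical `kRate`. `Delayed16` shows only the 12+3 bit layout. The 3 history bits sit behind a leading 1 that marks how many of them are valid, so everything fits in 16 bits. The post does not describe how the delayed update would use the history, so the probability update in this sketch is an ordinary immediate one.

```cpp
#include <cassert>
#include <cstdint>

// Plan item 2: 16-bit P(1) counter with a fixed update rate and no
// per-counter adaptation state. kRate is a hypothetical choice.
struct Counter16 {
    static const int kRate = 4;
    uint16_t p = 1 << 15;                        // P(1) * 65536, starts at 0.5
    void update(int bit) {
        if (bit) p = uint16_t(p + ((65536 - p) >> kRate));   // stays <= 65535
        else     p = uint16_t(p - (p >> kRate));
    }
};

// Plan item 3, layout only: 12-bit P(1) in bits 4..15, and bits 0..3 hold
// up to 3 history bits behind a leading 1 (terminator) marking how many are
// valid: 0001 = empty, 001a, 01ab, 1abc = 3 bits.
struct Delayed16 {
    uint16_t v = (2048 << 4) | 1;                // P(1) = 0.5, empty history
    unsigned prob12() const { return v >> 4; }
    unsigned hist() const { return v & 15u; }
    void update(int bit) {
        assert(bit == 0 || bit == 1);
        unsigned p = prob12(), h = hist();
        p = bit ? p + ((4096 - p) >> 4) : p - (p >> 4);   // immediate, not delayed
        h = (h << 1) | unsigned(bit);
        if (h >= 16) h = (h & 7u) | 8u;          // keep the newest 3 bits + terminator
        v = uint16_t((p << 4) | h);
    }
};
```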
2008-07-09 09:28:33 toffer          >
I don't like the idea of your hashing, since you underestimate the speed gain (or don't you care atm?).

Ok, I see that you don't use the same contexts for mixers and weights. That was my first impression (I didn't look at model_h.inc).

2 & 3 are really good. By "per model context merging" I meant something like this: c_i is the state of a counter (order i model) under some context. For each model, keep a static SSE mapping from a quantised value of c_i to a probability SSE_i(c_i). Instead of using c_i as the model output, use an optimised static linear mix c_i*a + SSE_i(c_i)*(1-a). This will be especially good on higher orders.
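Here is a minimal sketch of that scheme. It assumes the counter state c_i is a 12-bit probability quantised to 32 buckets; both choices are hypothetical. A training pass fills per-bucket statistics, and `freeze()` turns them into the static map SSE_i. Then `mix()` computes a*c_i + (1-a)*SSE_i(c_i), with a as an 8-bit fixed-point constant that the parameter optimizer would choose per model. The names `StaticSSE`, `observe`, `freeze` and `mix` are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Per-model static SSE: quantised counter state -> fixed probability,
// then a static linear mix with the raw counter value.
struct StaticSSE {
    static const int kBuckets = 32;              // hypothetical quantisation
    std::array<uint32_t, kBuckets> n1{}, n{};    // training counts (ones, total)
    std::array<uint16_t, kBuckets> map{};        // frozen SSE_i, 12-bit probabilities

    static int bucket(unsigned c12) { return int(c12 >> 7); }   // 4096 / 32 = 128 per bucket

    // Offline training pass: record which bit actually followed each state.
    void observe(unsigned c12, int bit) {
        assert(c12 < 4096 && (bit == 0 || bit == 1));
        ++n[bucket(c12)];
        n1[bucket(c12)] += uint32_t(bit);
    }

    // Freeze the table; empty buckets fall back to the bucket centre.
    void freeze() {
        for (int b = 0; b < kBuckets; ++b)
            map[b] = n[b] ? uint16_t((uint64_t(n1[b]) * 4095 + n[b] / 2) / n[b])
                          : uint16_t(b * 128 + 64);
    }

    // a = a256 / 256 is a per-model constant: a*c + (1-a)*SSE(c).
    unsigned mix(unsigned c12, unsigned a256) const {
        assert(c12 < 4096 && a256 <= 256);
        return (c12 * a256 + map[bucket(c12)] * (256 - a256)) >> 8;
    }
};
```

Because the map is frozen after training, it adds only one table lookup per model per bit at coding time.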