
Mix v6:

sh_samples_1 benchmark results

SFC benchmark results

Version     Size    Notes
o6mix6      608671  optimal single-context coder (mask 405FFF)
o6mix6-t    620111  real o2 made for toffer (tuned to full SFC)
o6mix6a     547088  something closer to o3 (mask 401FFFDF) + SSE
o6mix6a1    546543  SSE speed optimizations
o6mix6a2    546543  more optimizations
o6mix6a3    542932  yet more + limiters in SSE probability estimation
o6mix6a4    541054  tuned version (with expanded SSE context)
o6mix6a5    541037  alternate profile
o6mix6a6    542040  back to o1 context
o6mix6b     547611  hashed SSE array
o6mix6b1    547585  6b with different parameters
o6mix7a2    597595  o6mix6 with 15:3 delayed counter
o6mix7      541878  o6mix6a3 with 15:3 delayed counter
o6mix7b     541140  o6mix7 fully tuned
o6mix7c1    554163  delayed counter w/o static mappings, D=3
o6mix7c2    553375  D=2
o6mix7c3    550097  D=1

Quite a lot of experiments were performed, and at least now we have a (relatively) speed-optimized implementation of interpolated SSE and a rough but working example of a delayed counter.

The goal this time was to try out a delayed counter and to find a better model structure for a fast bitwise coder, and it seems that task was completed successfully. For example, o6mix6a3 has better compression than v0/o2mix, which had 6 mixed submodels, and is also 3x faster.

However, the delayed counter results were a bit disappointing. Well, for the target file the compression was successfully improved, but that gain appears really unstable, especially in the SSE version. That can be explained by the low context order and the target file's specifics - there were no long runs of the same value (in context), and it's hard to think of any other dependency common to all the bits in a byte (as the same counter is used for all the bits) which could be handled by a delayed counter.
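To make the "15:3 delayed counter" above concrete, here is a minimal sketch of one plausible reading: 15 bits of probability plus the last 3 history bits that are not yet folded into that probability, with a small static per-history correction on output. The update rate, correction table and names below are illustrative assumptions only - the actual counter is the patched Node2i in sh_node2i.inc and may well differ.

  // sketch of a 15:3 delayed counter (assumptions: shift update, placeholder
  // correction table); not taken from the Mix v6 sources
  #include <stdint.h>

  struct DelayedCounter {
    enum { D=3, SCALE=1<<15 };
    uint16_t p;   // P(bit==1), 15-bit scale, lagging D bits behind
    uint8_t  h;   // last D bits, newest in bit 0, not yet folded into p

    DelayedCounter() : p(SCALE/2), h(0) {}

    // prediction = delayed probability + static per-history correction
    int P() const {
      static const int corr[1<<D] = { -256,-64,-32,0, 0,32,64,256 }; // placeholder values
      int q = p + corr[h];
      if( q<1 ) q=1; if( q>SCALE-1 ) q=SCALE-1;
      return q;
    }

    void Update( int y ) {
      int y_old = (h>>(D-1)) & 1;      // the bit observed D steps ago
      p += ((y_old<<15) - p) >> 5;     // plain shift update, applied with delay
      h  = ((h<<1) | y) & ((1<<D)-1);  // push the fresh bit into the history
    }
  };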

Well, at least this is finished now; the next plans are:

  • blocking the redundancy on incompressible data (by separating the probability estimation and coding passes; see the sketch after this list)
  • adding another counter submodel (counter+SSE+mixer or just SSE)
  • further experiments with counters and SSE contexts
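For the first item, a hedged sketch of what separating the passes could look like: run the model over a block once just to measure the estimated code length, then either code the block or store it raw. The Model interface (predict()/update()) is a hypothetical stand-in, not the Mix API.

  // two-pass handling of incompressible data: pass 1 only estimates the code
  // length, the real coding pass runs only if it beats 8 bits/byte
  #include <math.h>
  #include <stddef.h>
  #include <stdint.h>

  template<class Model>
  double EstimatedBits( Model m, const uint8_t* blk, size_t n ) {
    // m is passed by value, so this probe pass leaves the real model untouched
    double bits = 0;
    for( size_t i=0; i<n; i++ )
      for( int b=7; b>=0; b-- ) {
        int y = (blk[i]>>b) & 1;
        double p1 = m.predict();           // assumed to return P(bit==1) in (0,1)
        bits += -log2( y ? p1 : 1.0-p1 );
        m.update( y );
      }
    return bits;
  }

  template<class Model>
  bool BlockIsCompressible( const Model& m, const uint8_t* blk, size_t n ) {
    // if the model loses to raw storage, store the block as-is (plus a flag)
    return EstimatedBits( m, blk, n ) < 8.0*n;
  }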


Intel Core2 Q9450 3.68GHz=460x8, DDR2 5-5-5-18 @ 920=460*2
Codec            Datasize   Ctime    Dtime    Metric   Notes
balz113-e        137842934  263.688  58.249   22384.1  http://encode.ru/balz/balz113.zip
balz113-ex       135284357  436.734  58.015   22155.1  -ex
cmm4-m-70        135276307  243.454  229.173  23672.1  http://freenet-homepage.de/toffer_86/cmm4_02a_080712_nomm.7z
cmm4_2a-70       130327852  236.968  244.390  23044.6  http://freenet-homepage.de/toffer_86/cmm4_02a_080712.7z
cmm4_2a-75       120225318  275.107  283.936  21899.7
ppmd-8           137974856  112.140  120.812  22878.8  PPMd Jr1 -m1980 -o8 http://compression.ru/ds/ppmdj1.rar
ppmd-6           140889604  101.126  109.297  23208.1  PPMd Jr1 -m1980 -o6

v3_o6mix3d1      138634222  331.436  327.655  25269.6  bugfixed ver; http://ctxmodel.net/files/MIX/mix_v3.rar
v3_o6mix3e       139141975  259.485  259.533  24595.7  16bit counters
v3_o6mix3e2      138243249  258.875  261.031  24469.7  2x expanded hash tables
v3_o6mix3e21     138426074  243.579  244.625  24318.9  strict 64byte alignment, like in 3d
v3_o6mix3e22     138426074  254.076  254.140  24424.6  tables aligned (by 4096)
v3_o6mix3e23     139228039  256.110  257.186  24582.4  hash indexes aligned (by 64 bytes)

o6mix3e21_el32   132825613  248.781  251.953  23522.3  using Shkarin's LZP preprocessor with -l32 (and E8 before LZP)

v4_o6mix56b1     142027465  493.826  492.156  27607.2  nibble hash with collision detection
v4_o6mix58       139235023  261.780  264.672  24664.0  order2-3 match model test: http://ctxmodel.net/files/MIX/mix_v4.rar

v5_o6mix59       138256704  257.016  257.890  24438.5  3e2 with coder from 58: http://ctxmodel.net/files/MIX/mix_v5.rar
v5_o6mix59a      135926958  286.483  287.546  24400.5  59 + interpolated SSE<6> over o6

v6_o6mix6        172505479  36.594   39.422   27384.8  best single order (according to optimizer) = masked o2
v6_o6mix6a       155751496  74.453   77.093   25181.6  masked o3 (or 2.5 - 21bits) with o1 SSE
v6_o6mix6b       156356730  77.609   80.438   25312.7  hashed SSE (still o1 + extra bit)
v6_o6mix6b1      156333373  77.516   81.016   25314.8  alternate 6b parameter profile
v6_o6mix6a1      155577882  75.735   75.421   25139.0  SSE class optimization
v6_o6mix6a2      155577882  73.422   74.984   25132.3  further optimizations
v6_o6mix6a3      154542028  68.219   70.312   24918.5  yet more + write instead of increment on update + tricky prediction bounds check
v6_o6mix6a4      155268706  70.640   73.155   25062.9  tuned 6a3
v6_o6mix6a5      155148941  70.750   73.342   25046.2  alternate profile
v6_o6mix6a6      155181369  68.079   70.204   25017.2  back to o1 context for SSE
v6_o6mix6-t      172260627  37.530   41.562   27368.9  SFC-optimized o2 for toffer
v6_o6mix7        154806732  100.408  102.203  25311.0  o6mix6a3 + 15:3 delayed counter + only counter tuned
v6_o6mix7a2      170185844  61.172   64.203   27294.7  o6mix6 + 15:3 delayed counter
v6_o6mix7b       155653071  99.749   102.219  25442.7  o6mix7 with all parameters tuned
v6_o6mix7c1      154156485  96.172   92.141   25104.5  delayed bits used as SSE context, D=3
v6_o6mix7c2      154191227  92.499   93.999   25124.9  D=2
v6_o6mix7c3      154099415  103.859  105.327  25235.2  D=1

2008-07-27 15:01:13 toffer          >
My results for a single delayed mapping are worse than expected (gain is less than .005%). Different parameters for y=0 and y=1 (the next bit) are very close and it's faster to implement. Guess i'll have to use two separate mappings :)

I'm planning to do some transformations, e.g. don't use a delayed bit as a parameter context, but something like (p>0.5) XOR y, or y(k)==y(k-1), etc...

Have you tried anything like this? I don't like to reinvent the wheel all the time.

BTW i tried to optimize the mixing distribution as well (you set it to p_m=0.5), but i was hardly able to improve the performance due to limited precision (for gradient calculations).
2008-07-28 03:30:46 Shelwien        >
> My results for a single delayed mapping are worse than
> expected (gain is less than .005%).

Well, dunno. Like I posted, the directly used 15:3 delayed
counter had 1.81% better ratio at the optimization target,
and 1.34% better in sh_samples_1 benchmark. (check o6mix6
vs o6mix7a2)

It didn't work out for delayed counter + SSE though, which
means that SSE's context quantization has more effect than
linear probability correction, and it's harder to track the
correlations with a delayed counter.

So now there's an idea to try skipping the update of
DC's probability part, while using the mappings emulating
the simple counter. Thus primary estimation would always
have the same distribution, and such counter still would
be more adaptive than simple one.

> Different parameters for y=0 and y=1 (the next bit) are
> very close and it's faster to implement.

Yes, but there won't be much gain from that, aside from
tuning instability similar to my DC's.
Alas, there are no such constantly skewed "symbols" in bitwise
coding, like in unary - that's where DCs and the like are
really useful.

> I'm planning to do some transformations, e.g. don't use a
> delayed bit as a parameter context, but something like
> (p>0.5) XOR y, or y(k)==y(k-1), etc...

Well, that could work in SSE context, so why not, but I'd not
expect much from that.
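Just to make the quoted transforms concrete, a tiny illustration of how such derived bits could be packed into an SSE context (bit order and names are arbitrary):

  // derived history bits instead of the raw delayed bit:
  // (p>0.5) XOR y ("prediction missed") and y(k)==y(k-1) ("bit repeated")
  inline unsigned DerivedCtx( int p15, int y_k, int y_k1 ) {
    unsigned miss = ((p15 > (1<<14)) ? 1u : 0u) ^ (unsigned)y_k;
    unsigned same = (y_k == y_k1) ? 1u : 0u;
    return (miss<<1) | same;   // 2-bit SSE context component
  }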

> Have you tried anything like this?
> I don't like to reinvent the wheele all the time.

Thing is that any history quantization basically can be
emulated with the proper set of DC's parameters... so it
might be better to make some tuning passes with different
files and analyze the tuned parameter values.

> BTW i tried to optimize the mixing distribution as well
> (you set it to p_m=0.5),

Don't really understand that... You mean your mix of
primary estimation with SSE? Wonder if I should try
something like that too...
2008-07-29 15:57:38 toffer          >
Two things - could you *please* make your i/o functions return some file_t (defined as whatever you like...)? i don't like to change the returned uints all the time. And could you please post the layout of o6mix* v7? If i want to quickly compare some stuff with older versions, i have to read a lot of source code; the short comments in the table above aren't that helpful.

> So now there's an idea to try skipping the update of
> DC's probability part, while using the mappings emulating
> the simple counter. Thus primary estimation would always
> have the same distribution, and such counter still would
> be more adaptive than simple one.

Sorry, i don't really understand what you mean here. Could you clarify this?

> Don't really understand that... You mean your mix of
> primary estimation with SSE? Wonder if I should try
> something like that too...

ATM i'm trying to create a new model with tuned parameters. It shouldn't be too slow and should be suited for higher-order contexts. As i said, the first experiment with a single delayed mapping failed. I think (just an intuitive guess) that something like this will do a good job:

1. delayed counter with two mappings (unfortunately two... :))
2. mix the counter output with a context merging table (i described it some time ago) - static SSE

But actually that's not what i meant in the previous post - i wanted to say, why not tune pm within a counter:

p(k+1) = w0*p(k) + w1*y(k) + w2*pm

Or even simpler:

p(k+1) = w0*p(k) + w1(y(k))*pm(y(k))

I'll add initial value tuning, too. But this will take some time.
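(A worked reading of the two update rules above, just to pin down the notation; the weight values are placeholders standing in for tuned parameters, not anything measured.)

  // p(k+1) = w0*p(k) + w1*y(k) + w2*pm, plus the simpler bit-dependent variant
  struct TunedCounter {
    double p, w0, w1, w2, pm;
    TunedCounter() : p(0.5), w0(0.95), w1(0.04), w2(0.01), pm(0.5) {}

    // with w0+w1+w2==1 the state stays a valid probability;
    // tuning pm moves the counter's resting point away from 0.5
    void Update( int y ) { p = w0*p + w1*y + w2*pm; }

    // p(k+1) = w0*p(k) + w1(y(k))*pm(y(k)): weight and target depend on the bit
    void Update2( int y, const double w1y[2], const double pmy[2] ) {
      p = w0*p + w1y[y]*pmy[y];
    }
  };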
2008-07-29 17:21:48 Shelwien        >
> Two thing - could you *please* make your i/o functions
> return some file_t (defined as whatever you like...)

I made this: http://ctxmodel.net/files/MIX/file_std.inc
Is it enough?

> Could you please post the layout of o6mix* v7?
> the short comments in the table above aren't that helpful.

Guess you meant o6mix7* here. But it's hard to explain any
more than these short comments because only Node2i is
patched there (sh_node2i.inc) - turned into a rough
delayed counter implementation.
All I can add is that revisions 7-7b just have the counter
replaced and in 7c the scheme is simplified - there's no
static "estimation" mapping and simple update is used
instead of another mapping, but delayed history bits are
included into SSE context instead (1 to 3).
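A rough sketch of the 7c* index construction implied here (quantization, bit counts and names are assumptions): the SSE table is addressed by the usual context and the quantized primary probability, extended with the 1-3 delayed history bits.

  // hypothetical SSE indexing for 7c*: context + D delayed bits select a row,
  // the quantized primary estimate selects the bucket inside it (kept in the
  // low position so interpolation can use adjacent cells)
  #include <stdint.h>

  enum { D=3, PBUCKETS=32 };

  inline uint32_t SSEIndex( uint32_t ctx, uint32_t p15, uint32_t delayed ) {
    uint32_t pq  = p15 >> 10;                     // 15-bit p -> 32 buckets (assumed)
    uint32_t row = (ctx<<D) | (delayed & ((1<<D)-1));
    return row*PBUCKETS + pq;
  }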

> > So now there's an idea to try skipping the update of
> > DC's probability part, while using the mappings emulating
> > the simple counter. Thus primary estimation would always
> > have the same distribution, and such counter still would
> > be more adaptive than simple one.
>
> Sorry, i don't really understand what you mean here.
> Could you clarify this?

Well, delayed counter is always better when used directly,
but there's no simple correlation between bit history
sequences and probability values, like with a simple counter.
The result is unstable performance under SSE.

So, a proposed tradeoff is using the simple update
(still delayed) and skipping updates on some bits,
deciding by delayed bits.
That would keep the correlations of probability values,
but would still allow improving the counter's performance somewhat;
also, skipping updates is good for speed and can be seen
as a kind of "smart parsing".

> But actually that's not what i meant in the previous post
> - i wanted to say, why not tune pm within a counter:
> p(k+1) = w0*p(k) + w1*y(k) + w2*pm

Ah, that. Of course I always did that - look for *mw constants.

Btw, even if it was a misunderstanding, I still tried mixing
the primary and secondary estimations (just the secondary one
was used before) and I like the results:

http://ctxmodel.net/files/MIX/mix_v7.htm
http://ctxmodel.net/files/MIX/mix_v7_SFC.htm

Guess it would be even better if we combined it with a simplified
delayed counter like in the 7c* revisions.
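For reference, the simplest form of such a primary/secondary mix (the fixed weight is a placeholder for a tuned or adaptive parameter; mix_v7 may well do this differently):

  // blend the counter's own estimate with the SSE output instead of using
  // the SSE output alone; sse_w is a tunable placeholder in 0..32
  inline int MixPrimarySecondary( int p_counter, int p_sse, int sse_w ) {
    return ( p_sse*sse_w + p_counter*(32-sse_w) ) >> 5;   // both on a 15-bit scale
  }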
2008-08-21 11:54:56 toffer          >
Do you have any more progress? What about your audio modeling? I'm still investigating several optimization algorithms.
2008-08-21 13:25:00 Shelwien        >
My jobs keep me busy for now.
And an audio compressor is the next thing I'd make when I get some free time,
as it seems easier than a new bytewise "universal" compressor
(while continuing with Mix is boring somehow).
Also thanks to Christian I remembered about noise in audio data,
which is a great source of redundancy in my lamix_v0 coder, as
it processes all the bits with the same counters and doesn't
limit the redundancy for low bits in any way.
2009-01-05 21:05:45 Fatlip1         >
Eugine, is "mix" legally dead now? It's really sad since it did pretty well in my tests.
2009-01-06 20:38:30 Shelwien        >
Well, it wasn't quite alive from the beginning ;)
Meaning that it was never intended for practical use.
But I still do some tests with it; even now there are 3 instances
of the optimizer running with versions of mix7d2 - a byte mapping
experiment.
So in theory it's still possible that I'd post more versions,
if something interesting happens with it.
But in practice I'm more interested in bytewise compressors
(did you notice tree_v1, I wonder?), and maybe lossless audio
compressors.