
Mix v6:

sh_samples_1 benchmark results

SFC benchmark results

Version     Size    Notes
o6mix6      608671  optimal single-context coder (mask 405FFF)
o6mix6-t    620111  real o2 made for toffer (tuned to full SFC)
o6mix6a     547088  something closer to o3 (mask 401FFFDF) + SSE
o6mix6a1    546543  SSE speed optimizations
o6mix6a2    546543  more optimizations
o6mix6a3    542932  yet more + limiters in SSE probability estimation
o6mix6a4    541054  tuned version (with expanded SSE context)
o6mix6a5    541037  alternate profile
o6mix6a6    542040  back to o1 context
o6mix6b     547611  hashed SSE array
o6mix6b1    547585  6b with different parameters
o6mix7a2    597595  o6mix6 with 15:3 delayed counter
o6mix7      541878  o6mix6a3 with 15:3 delayed counter
o6mix7b     541140  o6mix7 fully tuned
o6mix7c1    554163  delayed counter w/o static mappings, D=3
o6mix7c2    553375  D=2
o6mix7c3    550097  D=1

Quite a lot of experiments were performed, and at least now we have a (relatively) speed-optimized implementation of interpolated SSE and a rough but working example of a delayed counter.

The goal this time was to try out a delayed counter and to find a better model structure for a fast bitwise coder, and it seems that task was completed successfully. For example, o6mix6a3 has better compression than v0/o2mix, which had 6 mixed submodels, and is also 3x faster.

However, the delayed counter results were a bit disappointing. Well, for the target file the compression was successfully improved, but that gain appears really unstable, especially in the SSE version. That can be explained by the low context order and the target file's specifics - there were no long runs of the same value (in context), and it's hard to think of any other dependency common to all the bits in a byte (as the same counter is used for all the bits) which could be handled by a delayed counter.
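To make the "15:3 delayed counter" above concrete, here is a minimal sketch of one plausible reading: 15 bits of probability plus the last 3 history bits that are not yet folded into that probability, with a small static per-history correction on output. The update rate, correction table and names below are illustrative assumptions only - the actual counter is the patched Node2i in sh_node2i.inc and may well differ.

  // sketch of a 15:3 delayed counter (assumptions: shift update, placeholder
  // correction table); not taken from the Mix v6 sources
  #include <stdint.h>

  struct DelayedCounter {
    enum { D=3, SCALE=1<<15 };
    uint16_t p;   // P(bit==1), 15-bit scale, lagging D bits behind
    uint8_t  h;   // last D bits, newest in bit 0, not yet folded into p

    DelayedCounter() : p(SCALE/2), h(0) {}

    // prediction = delayed probability + static per-history correction
    int P() const {
      static const int corr[1<<D] = { -256,-64,-32,0, 0,32,64,256 }; // placeholder values
      int q = p + corr[h];
      if( q<1 ) q=1; if( q>SCALE-1 ) q=SCALE-1;
      return q;
    }

    void Update( int y ) {
      int y_old = (h>>(D-1)) & 1;      // the bit observed D steps ago
      p += ((y_old<<15) - p) >> 5;     // plain shift update, applied with delay
      h  = ((h<<1) | y) & ((1<<D)-1);  // push the fresh bit into the history
    }
  };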

Well, at least this is finished now; the next plans are:

  • blocking the redundancy on incompressible data (by separating the probability estimation and coding passes; see the sketch after this list)
  • adding another counter submodel (counter+SSE+mixer or just SSE)
  • further experiments with counters and SSE contexts
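For the first item, a hedged sketch of what separating the passes could look like: run the model over a block once just to measure the estimated code length, then either code the block or store it raw. The Model interface (predict()/update()) is a hypothetical stand-in, not the Mix API.

  // two-pass handling of incompressible data: pass 1 only estimates the code
  // length, the real coding pass runs only if it beats 8 bits/byte
  #include <math.h>
  #include <stddef.h>
  #include <stdint.h>

  template<class Model>
  double EstimatedBits( Model m, const uint8_t* blk, size_t n ) {
    // m is passed by value, so this probe pass leaves the real model untouched
    double bits = 0;
    for( size_t i=0; i<n; i++ )
      for( int b=7; b>=0; b-- ) {
        int y = (blk[i]>>b) & 1;
        double p1 = m.predict();           // assumed to return P(bit==1) in (0,1)
        bits += -log2( y ? p1 : 1.0-p1 );
        m.update( y );
      }
    return bits;
  }

  template<class Model>
  bool BlockIsCompressible( const Model& m, const uint8_t* blk, size_t n ) {
    // if the model loses to raw storage, store the block as-is (plus a flag)
    return EstimatedBits( m, blk, n ) < 8.0*n;
  }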


Intel Core2 Q9450 3.68GHz=460x8, DDR2 5-5-5-18 @ 920=460*2
Codec            Datasize   Ctime    Dtime    Metric   Notes
balz113-e        137842934  263.688  58.249   22384.1  http://encode.ru/balz/balz113.zip
balz113-ex       135284357  436.734  58.015   22155.1  -ex
cmm4-m-70        135276307  243.454  229.173  23672.1  http://freenet-homepage.de/toffer_86/cmm4_02a_080712_nomm.7z
cmm4_2a-70       130327852  236.968  244.390  23044.6  http://freenet-homepage.de/toffer_86/cmm4_02a_080712.7z
cmm4_2a-75       120225318  275.107  283.936  21899.7
ppmd-8           137974856  112.140  120.812  22878.8  PPMd Jr1 -m1980 -o8 http://compression.ru/ds/ppmdj1.rar
ppmd-6           140889604  101.126  109.297  23208.1  PPMd Jr1 -m1980 -o6

v3_o6mix3d1      138634222  331.436  327.655  25269.6  bugfixed ver; http://ctxmodel.net/files/MIX/mix_v3.rar
v3_o6mix3e       139141975  259.485  259.533  24595.7  16bit counters
v3_o6mix3e2      138243249  258.875  261.031  24469.7  2x expanded hash tables
v3_o6mix3e21     138426074  243.579  244.625  24318.9  strict 64byte alignment, like in 3d
v3_o6mix3e22     138426074  254.076  254.140  24424.6  tables aligned (by 4096)
v3_o6mix3e23     139228039  256.110  257.186  24582.4  hash indexes aligned (by 64 bytes)

o6mix3e21_el32   132825613  248.781  251.953  23522.3  using Shkarin's LZP preprocessor with -l32 (and E8 before LZP)

v4_o6mix56b1     142027465  493.826  492.156  27607.2  nibble hash with collision detection
v4_o6mix58       139235023  261.780  264.672  24664.0  order2-3 match model test: http://ctxmodel.net/files/MIX/mix_v4.rar

v5_o6mix59       138256704  257.016  257.890  24438.5  3e2 with coder from 58: http://ctxmodel.net/files/MIX/mix_v5.rar
v5_o6mix59a      135926958  286.483  287.546  24400.5  59 + interpolated SSE<6> over o6

v6_o6mix6        172505479  36.594   39.422   27384.8  best single order (according to optimizer) = masked o2
v6_o6mix6a       155751496  74.453   77.093   25181.6  masked o3 (or 2.5 - 21bits) with o1 SSE
v6_o6mix6b       156356730  77.609   80.438   25312.7  hashed SSE (still o1 + extra bit)
v6_o6mix6b1      156333373  77.516   81.016   25314.8  alternate 6b parameter profile
v6_o6mix6a1      155577882  75.735   75.421   25139.0  SSE class optimization
v6_o6mix6a2      155577882  73.422   74.984   25132.3  further optimizations
v6_o6mix6a3      154542028  68.219   70.312   24918.5  yet more + write instead of increment on update + tricky prediction bounds check
v6_o6mix6a4      155268706  70.640   73.155   25062.9  tuned 6a3
v6_o6mix6a5      155148941  70.750   73.342   25046.2  alternate profile
v6_o6mix6a6      155181369  68.079   70.204   25017.2  back to o1 context for SSE
v6_o6mix6-t      172260627  37.530   41.562   27368.9  SFC-optimized o2 for toffer
v6_o6mix7        154806732  100.408  102.203  25311.0  o6mix6a3 + 15:3 delayed counter + only counter tuned
v6_o6mix7a2      170185844  61.172   64.203   27294.7  o6mix6 + 15:3 delayed counter
v6_o6mix7b       155653071  99.749   102.219  25442.7  o6mix7 with all parameters tuned
v6_o6mix7c1      154156485  96.172   92.141   25104.5  delayed bits used as SSE context, D=3
v6_o6mix7c2      154191227  92.499   93.999   25124.9  D=2
v6_o6mix7c3      154099415  103.859  105.327  25235.2  D=1

2008-07-27 15:01:13 toffer          >
My results for a single delayed mapping are worse than expected (gain is less than .005%). Different parameters for y=0 and y=1 (the next bit) are very close and it's faster to implement. Guess i'll have to use two separate mappings :)

I'm planning to do some transformations, e.g. don't use a delayed bit as a parameter context, but something like (p>0.5) XOR y, or y(k)==y(k-1), etc...

Have you tried anything like this? I don't like to reinvent the wheel all the time.

BTW i tried to optimize the mixing distribution as well (you set it to p_m=0.5), but i was hardly able to improve the performance due to limited precision (for gradient calculations).
2008-07-28 03:30:46 Shelwien        >
> My results for a single delayed mapping are worse than
> expected (gain is less than .005%).

Well, dunno. Like I posted, the directly used 15:3 delayed
counter had 1.81% better ratio at the optimization target,
and 1.34% better in sh_samples_1 benchmark. (check o6mix6
vs o6mix7a2)

It didn't work out for delayed counter + SSE though, which
means that SSE's context quantization has more effect than
linear probability correction, and it's harder to track the
correlations with a delayed counter.

So now there's an idea to try skipping the update of
DC's probability part, while using the mappings emulating
the simple counter. Thus primary estimation would always
have the same distribution, and such counter still would
be more adaptive than simple one.

> Different parameters for y=0 and y=1 (the next bit) are
> very close and it's faster to implement.

Yes, but there won't be much gain from that, aside from
tuning instability similar to my DC's.
Alas, there are no such constantly skewed "symbols" in bitwise
coding, like in unary - that's where DCs and the like are
really useful.

> I'm planning to do some transformations, e.g. don't use a
> delayed bit as a parameter context, but something like
> (p>0.5) XOR y, or y(k)==y(k-1), etc...

Well, that could work in SSE context, so why not, but I'd not
expect much from that.
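Just to make the quoted transforms concrete, a tiny illustration of how such derived bits could be packed into an SSE context (bit order and names are arbitrary):

  // derived history bits instead of the raw delayed bit:
  // (p>0.5) XOR y ("prediction missed") and y(k)==y(k-1) ("bit repeated")
  inline unsigned DerivedCtx( int p15, int y_k, int y_k1 ) {
    unsigned miss = ((p15 > (1<<14)) ? 1u : 0u) ^ (unsigned)y_k;
    unsigned same = (y_k == y_k1) ? 1u : 0u;
    return (miss<<1) | same;   // 2-bit SSE context component
  }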

> Have you tried anything like this?
> I don't like to reinvent the wheele all the time.

Thing is that any history quantization basically can be
emulated with the proper set of DC's parameters... so it
might be better to make some tuning passes with different
files and analyze the tuned parameter values.

> BTW i tried to optimize the mixing distribution as well
> (you set it to p_m=0.5),

Don't really understand that... You mean your mix of
primary estimation with SSE? Wonder if I should try
something like that too...
2008-07-29 15:57:38 toffer          >
Two things - could you *please* make your i/o functions return some file_t (defined as whatever you like...)? i don't like to change the returned uints all the time. And could you please post the layout of o6mix* v7? If i want to quickly compare some stuff with older versions, i have to read a lot of source code; the short comments in the table above aren't that helpful.

> So now there's an idea to try skipping the update of
> DC's probability part, while using the mappings emulating
> the simple counter. Thus primary estimation would always
> have the same distribution, and such counter still would
> be more adaptive than simple one.

Sorry, i don't really understand what you mean here. Could you clarify this?

> Don't really understand that... You mean your mix of
> primary estimation with SSE? Wonder if I should try
> something like that too...

ATM i'm trying to create a new model with tuned parameters. It shouldn't be too slow and should be suited for higher-order contexts. As i said, the first experiment with a single delayed mapping failed. I think (just an intuitive guess) that something like this will do a good job:

1. delayed counter with two mappings (unfortunately two... :))
2. mix the counter output with a context merging table (i described it some time ago) - static SSE

But actually that's not what i meant in the previous post - i wanted to say, why not tune pm within a counter:

p(k+1) = w0*p(k) + w1*y(k) + w2*pm

Or even simpler:

p(k+1) = w0*p(k) + w1(y(k))*pm(y(k))

I'll add initial value tuning, too. But this will take some time.
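(A worked reading of the two update rules above, just to pin down the notation; the weight values are placeholders standing in for tuned parameters, not anything measured.)

  // p(k+1) = w0*p(k) + w1*y(k) + w2*pm, plus the simpler bit-dependent variant
  struct TunedCounter {
    double p, w0, w1, w2, pm;
    TunedCounter() : p(0.5), w0(0.95), w1(0.04), w2(0.01), pm(0.5) {}

    // with w0+w1+w2==1 the state stays a valid probability;
    // tuning pm moves the counter's resting point away from 0.5
    void Update( int y ) { p = w0*p + w1*y + w2*pm; }

    // p(k+1) = w0*p(k) + w1(y(k))*pm(y(k)): weight and target depend on the bit
    void Update2( int y, const double w1y[2], const double pmy[2] ) {
      p = w0*p + w1y[y]*pmy[y];
    }
  };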
2008-07-29 17:21:48 Shelwien        >
> Two thing - could you *please* make your i/o functions
> return some file_t (defined as whatever you like...)

I made this: http://ctxmodel.net/files/MIX/file_std.inc
Is it enough?

> Could you please post the layout of o6mix* v7?
> the short comments in the table above aren't that helpful.

Guess you meant o6mix7* here. But it's hard to explain any
more than these short comments because only Node2i is
patched there (sh_node2i.inc) - turned into a rough
delayed counter implementation.
All I can add is that revisions 7-7b just have the counter
replaced and in 7c the scheme is simplified - there's no
static "estimation" mapping and simple update is used
instead of another mapping, but delayed history bits are
included into SSE context instead (1 to 3).
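A rough sketch of the 7c* index construction implied here (quantization, bit counts and names are assumptions): the SSE table is addressed by the usual context and the quantized primary probability, extended with the 1-3 delayed history bits.

  // hypothetical SSE indexing for 7c*: context + D delayed bits select a row,
  // the quantized primary estimate selects the bucket inside it (kept in the
  // low position so interpolation can use adjacent cells)
  #include <stdint.h>

  enum { D=3, PBUCKETS=32 };

  inline uint32_t SSEIndex( uint32_t ctx, uint32_t p15, uint32_t delayed ) {
    uint32_t pq  = p15 >> 10;                     // 15-bit p -> 32 buckets (assumed)
    uint32_t row = (ctx<<D) | (delayed & ((1<<D)-1));
    return row*PBUCKETS + pq;
  }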

> > So now there's an idea to try skipping the update of
> > DC's probability part, while using the mappings emulating
> > the simple counter. Thus primary estimation would always
> > have the same distribution, and such counter still would
> > be more adaptive than simple one.
>
> Sorry, i don't really understand what you mean here.
> Could you clarify this?

Well, delayed counter is always better when used directly,
but there's no simple correlation between bit history
sequences and probability values, like with a simple counter.
The result is unstable performance under SSE.

So, a proposed tradeoff is using the simple update
(still delayed) and skipping updates on some bits,
deciding by delayed bits.
That would keep the correlations of probability values,
but would still allow improving the counter's performance somewhat;
also, skipping updates is good for speed and can be seen
as a kind of "smart parsing".

> But actually that's not what i meant in the previous post
> - i wanted to say, why not tune pm within a counter:
> p(k+1) = w0*p(k) + w1*y(k) + w2*pm

Ah, that. Of course I always did that - look for *mw constants.

Btw, even if it was a misunderstanding, I still tried mixing
the primary and secondary estimations (just the secondary one
was used before) and I like the results:

http://ctxmodel.net/files/MIX/mix_v7.htm
http://ctxmodel.net/files/MIX/mix_v7_SFC.htm

Guess it would be even better if we combined it with a simplified
delayed counter like in the 7c* revisions.
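For reference, the simplest form of such a primary/secondary mix (the fixed weight is a placeholder for a tuned or adaptive parameter; mix_v7 may well do this differently):

  // blend the counter's own estimate with the SSE output instead of using
  // the SSE output alone; sse_w is a tunable placeholder in 0..32
  inline int MixPrimarySecondary( int p_counter, int p_sse, int sse_w ) {
    return ( p_sse*sse_w + p_counter*(32-sse_w) ) >> 5;   // both on a 15-bit scale
  }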
2008-08-21 11:54:56 toffer          >
Do you have any more progress? What about your audio modeling? I'm still investigating several optimization algorithms.
2008-08-21 13:25:00 Shelwien        >
My jobs keep me busy for now.
And an audio compressor is the next thing I'd make when I get some free time,
as it seems easier than a new bytewise "universal" compressor
(while continuing with Mix is boring somehow).
Also thanks to Christian I remembered about noise in audio data,
which is a great source of redundancy in my lamix_v0 coder, as
it processes all the bits with the same counters and doesn't
limit the redundancy for low bits in any way.
2009-01-05 21:05:45 Fatlip1         >
Eugine, is "mix" legally dead now? It's really sad since it did pretty well in my tests.
2009-01-06 20:38:30 Shelwien        >
Well, it wasn't quite alive from the beginning ;)
Meaning that it was never intended for practical use.
But I still do some tests with it; even now there are 3 instances
of the optimizer running with versions of mix7d2 - a byte mapping
experiment.
So in theory it's still possible that I'd post more versions,
if something interesting happens with it.
But in practice I'm more interested in bytewise compressors
(did you notice tree_v1, I wonder?), and maybe lossless audio
compressors.