<< ctxmodel.net

> Especially considering that it requires like a day of coding to make a alpha version ;)

Well, so here's the result: Mix v0 and the detailed benchmark is there.

Its basically just another mixer demo, starting with

mix(mix(mix(o0,o0'),mix(o1,o1')),mix(o2,o2'))
and lamely hashed higher orders (24M per order) are mixed over that. These are not exactly the traditional CM orders though, because the optimizer masked out half of the bits.

> [...] I can patch together an order6 coder [...]
> But even before doing it I know that both its speed and quality
> would be nothing special.

...And now we've got a coder comparable to rar and ppmd in compression, but at 10x slower speed (that can be somewhat fixed though): as expected.

In conclusion, here's what has to be done to reach the compression level of CCM/LPAQ/PAQ9.

  • Match model is a must, it seems;
  • Word model would be helpful too;
  • a match model probably would provide 2x the current speed, but even then only o2mix would be able to compete with CCM - that means 6 counters and 5 mixers at most. So delayed counters have to be used (or state machines, but I won't bother with that);
  • encoding from file instead of reading the file into memory, and also using the stream-oriented version of E8 would somewhat improve speed too;
  • the hashes probably could be fixed for better compression;
But further development of this coder is not really reasonable. Well, except for streamed E8 implementation which could be used elsewhere. Because eg. creating a new bytewise unary coder with tree-based statistics seems much more beneficial.

---

Intel Core2 Q9450 3.68Ghz=460x8, DDR2 5-5-5-18 @ 920=460*2
CodecDatasizeCtimeDtimeMetricNotes
rar371-m117436629817.64216.48627427.2 rar 3.71 -m1 -mdg - http://rarlab.com/rar/wrar371.exe 
rar371-m214845437549.87316.12423407.1 rar 3.71 -m2 -mdg 
rar371-m314451516665.15815.81322803.8 rar 3.71 -m3 -mdg 
rar371-m413981503172.65638.59322304.7 rar 3.71 -m4 -mdg 
rar371-m4t14389279376.14115.71722716.6 rar 3.71 -m4 -mdg -mct- 
rar371-m513987543581.11047.50022411.6 rar 3.71 -m5 -mdg 
rar371-m5t14362934988.43717.15722702.1 rar 3.71 -m5 -mdg -mct- 
 
balz112-e138914945263.48550.79822476.9 http://encode.ru/balz/balz112.zip  
balz112-ex136358563428.93850.50022240.0 -ex  
balz113-e137842934263.68858.24922384.1 http://encode.ru/balz/balz113.zip  
balz113-ex135284357436.73458.01522155.1 -ex  
balz114-e14378159881.40535.87622906.0 http://encode.ru/balz/balz114.zip  
balz114-ex141820766125.79635.40622639.4 -ex  
 
ppms-415123703077.92183.06324539.3 PPMs Jr1 -o4 http://compression.ru/ds/ppmsj.rar 
ppmd-8137974856112.140120.81222878.8 PPMd Jr1 -m1980 -o8 http://compression.ru/ds/ppmdj1.rar 
ppmj-10117045960749.500337.96822417.6 PPMonstr Jr1 -m1980 -o10 http://compression.ru/ds/ppmdj1.rar  
paq9a-7117303349555.610558.84524472.7 paq9a -7 http://cs.fit.edu/~mmahoney/compression/paq9a.zip 
cmm4-75120002417287.188289.26621930.2 http://freenet-homepage.de/toffer_86/cmm4_01f_080527-gcc4.3.7z 
 
st2rc3-216579481013.90531.84326237.8 http://shelwien.googlepages.com/st2rc_v3.rar 
st2rc3e89-316449996013.76732.21926039.1  
st2rc6e89mm-215913353152.76374.02925657.7  
 
ccm-0127465606115.874115.51621187.5 CCM 1.30c -0 http://christian.martelock.googlepages.com/dl_ccm130c.zip 
ccm-1125991688120.141119.62421002.6  
ccm-2124869661121.641121.09420843.5  
ccm-3124017539121.140120.78220706.7  
ccm-4123346433120.859120.50020598.7  
ccm-5123152614121.610121.37520578.0  
ccm-6 123098025124.672124.79620606.7  
ccm-7123066391127.749128.25020639.4  
 
o2mix-3161322206198.733215.49927560.3 http://ctxmodel.net/files/MIX/mix_v0.rar  
o3mix-1148964049326.469336.54726967.6  
o4mix-1142328712413.516421.46826867.1  
o5mix-2139873527521.796524.61027623.1  
o6mix-1139534658655.110665.84429115.8  


2008-07-05 11:54:26 osmanturan      >
> Match model is a must, it seems
I don't think so for PAQish match model. toffer has good example of about with match model vs without match model. Look at this:

http://www.encode.ru/forum/showpost.php?p=1433&postcount=21

Surely, a better model can be made. Maybe, we can ask toffer for it's details. He really likes to share his experiences with any other people. Not like the others :-)
2008-07-05 12:08:26 toffer          >
By accident i found that you've also posted things, which didn't appear at encode.ru, maybe private messaging, etc... Would you mind to post this stuff in the forum and maybe create a thread? It would be better if things don't spread around.
2008-07-05 12:44:32 inikep          >
> at 10x slower speed (that can be somewhat fixed though)

How much improvement do you think is possible?

> only o2mix would be able to compete with CCM

So we still don't know from CCM speed comes.

> But further development of this coder is not really reasonable.

I think that now is a good time to start discussion about a CCM/LPAQ.

Thanks for accepting my "challenge".
2008-07-05 21:29:31 Shelwien        >
@inikep:
> > at 10x slower speed (that can be somewhat fixed though)
>
> How much improvement do you think is possible?

1. A match model would allow to avoid all this mixing
for 50-70% of data. (2x)
2. Dual o0,o1,o2 may be not that useful with higher orders
(1.5x)
3. A single o2 instance takes 16M*(4+2)=96M. It should be
hashed instead (1.2-1.5x)
4. All counter arrays can be reduced in half by removing
the access counters (along with dynamic update speed which
has effect only at coding start) (1.2-1.5x)
5. File i/o and streamed E8 (1.1x)
6. Formal speed optimizations (eg. using the template
scripts which I made for fpaq0pv4B). (1.1x)
7. Division in the mixer can be replaced with a table lookup.
8. "Precise" rc with 64bit multiplication is slow too.
9. A compressor can store counter predictions for the
block instead of mixing them right away, so an asymmetric
implementation is possible.
 
> > only o2mix would be able to compete with CCM
>
> So we still don't know from CCM speed comes.

I'm still not really impressed, in fact.
PPMd has the same speed, and no filters.
Guess I should benchmark durilca'light.

> > But further development of this coder
> > is not really reasonable.
>
> I think that now is a good time to start discussion
> about a CCM/LPAQ.

Maybe. I'm especially interested in the paq9 results with
my finnish dictionary -
http://ctxmodel.net/files/MIX/mix_v0.htm ... and overall
results distribution for that file.
 
> Thanks for accepting my "challenge".

The actual programming took like half an hour at most,
its just a (automatic) parameter optimization which
needed a day.
Btw, I made a new optimization target - concatenated
parts of each SFC file, and started o5 and o6 tuning
again. It might be interesting, because the previous
optimization was incremental (eg. o3mix used optimized
parameters from o2mix and only o3 parameters was
actually tuned in it... and so on), and this time _all_
the parameters are optimized by the same metric.

@toffer:
> By accident i found that you've also posted things, which
> didn't appear at encode.ru, maybe private messaging,
> etc... Would you mind to post this stuff in the forum and
> maybe create a thread? It would be better if things don't
> spread around.

Well, I want to experience maintaining my own site and
also I'm trying to make a blog engine to my liking,
and its reasonable to take advantage of the content I'm writing.

But of course you can cross-post anything to the forum
if you like to discuss it there for some reason, but
anyway there're only a few people who actually discuss
such topics.

Btw, mainly I don't like how its troublesome to make
my posts in the forums to look as I intend them to,
and also Ilia keeps his unstable hosting where DB server
is down 10% of the time.

@osmanturan:
> > Match model is a must, it seems
> I don't think so for PAQish match model. toffer has good
> example of about with match model vs without match model.

Well, I'm talking about a bytewise match model which
would encode a flag for rank0 byte values, and only
proceed with mixing and updating counters when its
not a rank0 symbol. Its a first step of unary coding
actually, and taking into account that rank0 hits
for more than 50% of bytes...
So its not a model specific for very long matches.
but it would probably work good enough for long matches
as well.
2008-07-07 11:26:05 inikep          >
> A match model would allow to avoid all this mixing
> for 50-70% of data. (2x)

Nice, so altogether it's possible to get 6-10x.


> I'm still not really impressed, in fact.
> PPMd has the same speed, and no filters.

I don't believe that filters do matter for CCM's speed. MFC contains over 400 files in 30+ formats.


> Guess I should benchmark durilca'light.

Accoring to maximum compression durilca'light have almost the same decompression speed (173 s) as PPMd, but much slower compression (366 s).
2008-07-07 11:36:05 inikep          >
>I'm especially interested in the paq9 results with
my finnish dictionary

It seems that this file suits PAQ. The results with lpaq8 are similar?
2013-08-13 12:27:18                 >
2013-08-14 20:30:24                 >
2013-10-14 06:39:21                 >
2014-01-06 21:26:13                 >
2014-05-11 19:01:39                 >
2014-07-15 13:49:58                 >

Write a comment:

Name: