Context Modelling

<< ctxmodel.net

> cm engines usually encode one bit at once.

There's nothing "usual" in that actually.

Or, more like, it mostly applies to paq-way compressors, while eg. 5 years ago bitwise compressors were relatively rare.

Also there're no benefits in using binary decomposition, as it requires more complex model code, its slow due to entropy coder calls on every bit, and its bad for compression as most of target files have bytewise structure.

So the only valid reason I can see for writing bitwise compressors is that it allows for more straightforward statistics retrieval, when used in conjunction with a simple hash, in which case a basic compressor might be really simpler at least in design, if not by program code.

But it seems that more frequent reason recently is imitating Matt without any thinking.

> ppmd encodes full byte at once so suffix tree is compact and efficent in that case.

The actual advantage of context trees is precise statistics, while for hash its more natural to have index collisions. But its also a disadvantage at the same time, as merging of similar statistics has to be explicitly implemented for a tree, and hash "does it by default", so when an inexperienced person writes a compressor, the results would be automagically better with hashing.

Also trees have their other drawbacks, like a complicated structure (hard to design), troubles with controlling their size, unpredictable access speed etc.

But still, there's a lot more to data structures than a simple hash. Though even with a simple hash, it could be interesting if only someone tried to design a hash function with context analysis instead of copy-pasting it from some available source.

2011-03-10 12:54:24 Cyan            > 
Interesting idea  
 
"design a hash function with context analysis" 
 
Could you provide some hints on what you're having in mind, using some very simple example ? 
 
Regards

2011-03-11 01:05:55 Shelwien        > 
The best hash function is a data compression function. 
It provides an unique number for each string, with smaller 
numbers corresponding to more probable strings. 
 
Sure, using a full-strength CM for mapping a string to 
hashtable would be impractical. But something like order1 
huffman code might be worth consideration.

2013-08-01 15:38:16                 > 

2014-11-27 03:24:43                 > 

2015-01-12 03:34:26                 >