
ST2rc story

ST2rc is a demo compressor based on an order2 Schindler Transform + Move-to-Front + rangecoding. Later, order0/order1 mixing was added, MTF turned out to be inefficient and was removed, and an E8/E9 x86 filter was added.

The initial motivation was to write a tool for context history separation, which would allow demonstrating to some people that a counter is basically a standalone model, so it's not necessary to limit yourself to some well-known implementation, especially in the cases (contexts) where a high-quality model generates much less entropy than a simple counter. Anyway, first it's necessary to estimate the entropy of a context with a good model, and only after that try to balance estimation precision and speed by expanding the context or using more complex approximations like state machines and delayed counters.

But then writing the symbol sequence from each order2 context to a separate file seemed inconvenient, and a 256x256 context index in a single file wasn't really needed either, as it turned out possible to derive it from the transformed data... so in the end ST2 was the result. And then it seemed only logical to add MTF and fpaq0pv4B aka "fastest order0 coder" and compare the result to other order2 coders; thus ST2rc was created.
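
For reference, here is a minimal sketch (in C++) of what a forward order2 context sort can look like - a generic counting-sort formulation, not the actual ST2rc code; passing the first two bytes through unchanged is an assumption made only for this example.

    #include <cstdint>
    #include <vector>

    // Hypothetical forward order-2 context sort: group every symbol by its two
    // preceding bytes using a counting sort. Symbols sharing a context keep
    // their original order, so the 256x256 bucket layout can be recovered from
    // the transformed data itself instead of being stored explicitly.
    std::vector<uint8_t> st2_forward(const std::vector<uint8_t>& in) {
        const size_t n = in.size();
        if (n < 3) return in;                           // too short to sort
        std::vector<uint8_t> out(n);

        // 1) count the symbols following each (prev2, prev1) context
        std::vector<uint32_t> start(65537, 0);
        for (size_t i = 2; i < n; i++)
            start[((uint32_t)in[i-2] << 8 | in[i-1]) + 1]++;

        // 2) prefix sums -> starting offset of each context bucket
        for (size_t c = 1; c <= 65536; c++) start[c] += start[c-1];

        // 3) stable placement; the first two bytes are emitted as-is here
        out[0] = in[0];
        out[1] = in[1];
        for (size_t i = 2; i < n; i++) {
            uint32_t ctx = (uint32_t)in[i-2] << 8 | in[i-1];
            out[2 + start[ctx]++] = in[i];
        }
        return out;
    }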

ST2rc v0. Benchmarks

The alpha version included the standalone ST2/reverse-ST2 transformation utility, and then unmodified fpaq0pv4B was attached to it (with some i/o wrappers), so it was even possible to unpack the ST2-transformed data with a standalone fpaq0pv4B.

Also, some hours were wasted on an attempt to make it a bijective transformation, but in the end it turned out easier to add 2 extra bytes and be sure that it would be reversible for any data.

ST2rc v1. Benchmarks

The order0 coder was simplified (block length encoding removed), and MTF/reverse-MTF functions were added (as is traditional for blocksorting compression).

Btw, this MTF implementation deserves some attention, because MTF is a relatively resource-intensive algorithm - e.g. swap-based designs require 255 operations per symbol in the worst case - but here it was written in a vectorizable way and successfully vectorized by IntelC.
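
To illustrate the point, here is a hedged sketch (in C++) of a rank-array MTF that avoids the swap/shift loop; the 256-element inner update is branch-free, which is what lets a vectorizing compiler turn it into SIMD compares and adds. It is a generic reconstruction of the idea, not the actual ST2rc code.

    #include <cstdint>
    #include <cstddef>

    // rank[c] = current MTF position of byte value c (0 = front of the list)
    void mtf_encode(const uint8_t* in, uint8_t* out, size_t n) {
        uint8_t rank[256];
        for (int c = 0; c < 256; c++) rank[c] = (uint8_t)c;   // identity order
        for (size_t i = 0; i < n; i++) {
            uint8_t s = in[i];
            uint8_t r = rank[s];
            out[i] = r;                                        // emit the rank
            // every symbol currently ranked in front of s moves one step back
            for (int c = 0; c < 256; c++) rank[c] += (rank[c] < r);
            rank[s] = 0;                                       // s moves to front
        }
    }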

ST2rc v2. Benchmarks

  • Compression improved by switching to an order1 model of the MTF output (a counter sketch follows after this list).
  • Tested the effect of disabling the fpaq0pv4B "macrofication" preprocessor (processing became slightly slower, as expected, but not by much).
  • Tested rc_mm (the original rangecoder from fpaq0p) vs rc_sh1m (a carryless rangecoder like rc_mm has a little redundancy by definition).
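
For context, here is a hedged sketch of the kind of shift-based bit counter used by fpaq0p-style coders, extended with an order1 context (the previous byte). The names, the 12-bit precision and the update rate are illustrative assumptions, not taken from fpaq0pv4B.

    #include <cstdint>

    // Illustrative fpaq0p-style bit counter: 12-bit probability, shift update.
    struct Counter {
        uint16_t p = 2048;                 // P(bit==1), scaled to 0..4095
        void update(int bit) {
            if (bit) p += (4096 - p) >> 5; // move toward 1
            else     p -= p >> 5;          // move toward 0
        }
    };

    // Order-1 bitwise model: one counter per (previous byte, bit-tree node),
    // where 'node' is the usual 1..255 index built from the bits already
    // coded within the current byte.
    struct Order1Model {
        Counter t[256][256];
        Counter& at(uint8_t prevByte, int node) { return t[prevByte][node]; }
    };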

ST2rc v3. Benchmarks

  • Shift5 counter replaced with a Node2i (with some compression improvement and decreased speed, as expected)
  • Adaptive mixing of order0 & order1 added (via iMixer)
  • MTF disabled, as it hurts compression %-) (no wonder, because there are different forms of the same word and the like, so the same patterns can repeat in different places)
  • E8/E9 filter added (when an E8 or E9 opcode is found, followed by a -16M..16M offset, the offset is relocated; the offset limit also effectively disables the filter on non-binary data) - see the sketch after this list
  • Temporary switch to rc_mm, as rc_sh2d produced 200k worse results (due to simplified renormalization)
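
Here is a hedged sketch (in C++) of a typical E8/E9 filter of the kind described above: the 32-bit relative offset after a CALL/JMP opcode is replaced with an absolute target, so repeated references to the same address become identical byte strings. The rel-to-abs convention and the in-place processing are assumptions of this example; only the opcodes and the -16M..16M limit come from the description above.

    #include <cstdint>
    #include <cstddef>

    // In-place forward E8/E9 filter (the inverse just subtracts the position
    // back; both sides skip 5 bytes after each E8/E9, so they stay in sync).
    void e8e9_forward(uint8_t* buf, size_t n) {
        const int32_t LIMIT = 16 << 20;                     // 16M
        for (size_t i = 0; i + 5 <= n; ) {
            if (buf[i] == 0xE8 || buf[i] == 0xE9) {
                int32_t off = (int32_t)((uint32_t)buf[i+1]       |
                                        (uint32_t)buf[i+2] << 8  |
                                        (uint32_t)buf[i+3] << 16 |
                                        (uint32_t)buf[i+4] << 24);
                if (off > -LIMIT && off < LIMIT) {
                    int32_t addr = off + (int32_t)i;        // relative -> absolute
                    buf[i+1] = (uint8_t)(addr      );
                    buf[i+2] = (uint8_t)(addr >>  8);
                    buf[i+3] = (uint8_t)(addr >> 16);
                    buf[i+4] = (uint8_t)(addr >> 24);
                }
                i += 5;
            } else i++;
        }
    }
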
Troubles with optimization started to appear, to the point where it became more convenient to tune the parameters using rc_mm. The main problem is that a rangecoder with full carry support is able to generate virtually any amount of code at once, thanks to the FF counter. So buffer overflows started to happen during parameter optimization, because really ineffective parameter combinations sometimes get tested too.

And of course rc_mm doesn't have this problem - it just produces much more redundancy in such cases. But then there seemed to be another trouble with rc_sh2d - its compression became worse, apparently due to the fast 16-bit renormalizations. It seems that some patterns in the data tend to "synchronize" with the renormalization cycle, so that bits with skewed probabilities constantly have to be encoded exactly at low range values.
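
To make the FF-counter behaviour concrete, here is a hedged sketch of the classic byte-wise renormalization with full carry support (a cached byte plus a count of pending 0xFF bytes, as in LZMA-style coders). It is not the rc_sh*/rc_mm code, just an illustration of why a single renormalization call can suddenly flush a long run of buffered bytes.

    #include <cstdint>
    #include <vector>

    // Output bytes are held back as 'cache' plus a run of pending 0xFF bytes;
    // when a carry arrives from 'low', the cache and the whole FF run get
    // bumped by 1 and flushed. With bad statistics the run can grow without
    // bound, so one call may emit many bytes at once - the overflow hazard
    // mentioned above.
    struct CarryRC {
        uint64_t low = 0;
        uint8_t  cache = 0;
        uint64_t pendingFF = 0;     // 0xFF bytes waiting for a possible carry
        bool     first = true;
        std::vector<uint8_t> out;

        void shiftLow() {
            uint32_t carry = (uint32_t)(low >> 32);          // 0 or 1
            if (carry || (uint32_t)low < 0xFF000000u) {
                if (!first) out.push_back((uint8_t)(cache + carry));
                for (; pendingFF; pendingFF--)               // flush the FF run
                    out.push_back((uint8_t)(0xFF + carry));
                cache = (uint8_t)(low >> 24);
                first = false;
            } else {
                pendingFF++;                                 // hold another 0xFF
            }
            low = (uint32_t)(low << 8);   // keep low 32 bits; the byte that left is dropped
        }
    };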

ST2rc v4. Benchmarks

  • Mixing with an alternate order0 version added.
  • Mixing with an alternate order1 version added (a generic mixer sketch follows below).
  • Problems with rc_sh2d still not solved :(.
  • With this amount of mixing, the coder became considerably slower.
Adding another submodel does improve the compression, but it comes at a price.
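
For reference, a hedged sketch of the kind of 2-input adaptive linear mix that can combine two submodel estimations; this is a generic construction, not the actual iMixer, and the fixed-point precision and adaptation rate are arbitrary choices for the example.

    #include <cstdint>

    // Interpolation mixer for two probability estimates p0, p1 in 0..4095
    // (probability of bit==1). The weight w decides how far the mix leans
    // toward p1; after each coded bit, w drifts toward the submodel whose
    // prediction was closer to the actual outcome.
    struct Mixer2 {
        int w = 2048;                                     // weight of p1, 0..4095

        int mix(int p0, int p1) const {
            return p0 + (((p1 - p0) * w) >> 12);          // linear interpolation
        }

        void update(int p0, int p1, int bit) {
            int target = bit << 12;                       // 0 or 4096
            int e0 = target - p0, e1 = target - p1;
            if (e1 * e1 < e0 * e0) w += (4096 - w) >> 6;  // p1 was better
            else                   w -= w >> 6;           // p0 was better (or tie)
        }
    };

Stacking more submodels then means chaining such mixers (mixing the result of one mix with the next estimation), which is exactly why every added submodel costs another mix and another update per coded bit.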

Conclusion (for now)

Of course, the compression could be further improved by the same method - it surely would be useful, e.g., to mix the same submodels in a different order with other mixer instances (and mix the result of that with the current estimation); double or triple mixer updates would help too, and there's the whole of order2 still untouched. Also, the speed can be somewhat improved by restoring some of the discarded (for readability) speed optimizations (like "macrofication"), and a whole lot of new optimizations are applicable. But still, that doesn't seem like a good way forward, because there's a more convenient scheme for "heavy" modelling techniques - unary coding, that is. Also, bitwise coding has some known redundancy when coding files with an incomplete charset (like texts, where only a part of the byte values can be encountered), and blocksorting creates very good examples of such data.

Thus, it seems necessary to first switch to unary coding, and then think about adding more advanced techniques like various probability mappings. But alas, that can't exactly be called "switching", because the usual binary counters are completely inconvenient for a unary scheme (dynamic ranking requires probability remapping, which is very redundant with limited-precision integer counters).
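
As an illustration of the scheme under consideration, here is a hedged sketch of unary coding over a dynamically ranked alphabet. The bit-coder interface (an encodeBit that adapts the probability it is given) and the move-one-step ranking rule are assumptions made only for the example, not a description of the planned design.

    #include <cstdint>

    // A symbol of rank r is coded as r "miss" flags followed by one "hit" flag,
    // each flag with its own adaptive probability. Coder is any bit coder with
    // encodeBit(bit, p) that codes 'bit' with probability p and updates p.
    template<class Coder>
    struct UnaryModel {
        uint8_t  rank[256];          // symbol -> current rank
        uint8_t  sym [256];          // rank   -> symbol
        uint16_t p   [256];          // P(hit) at each rank, 0..4095

        UnaryModel() {
            for (int i = 0; i < 256; i++) {
                rank[i] = sym[i] = (uint8_t)i;
                p[i] = 2048;
            }
        }

        void encode(Coder& rc, uint8_t s) {
            int r = rank[s];
            for (int k = 0; k < r; k++) rc.encodeBit(0, p[k]);   // misses
            if (r < 255) rc.encodeBit(1, p[r]);                  // hit; last rank is implicit
            if (r > 0) {                                         // simple dynamic ranking
                uint8_t other = sym[r - 1];
                sym[r - 1] = s;      sym[r]      = other;
                rank[s]    = r - 1;  rank[other] = r;
            }
        }
    };

Note that when two symbols swap ranks here, their per-rank probabilities p[] stay where they are; carrying the statistics along with the rank changes is the kind of remapping that becomes redundant with limited-precision integer counters.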

Well, of course there are some "historical" compressors which already contain all the necessary data structures, but some theoretical points have changed significantly since that time, so now it's necessary to write something completely new.

