ST2rc is a demo compressor based on a order2 Schindler Transform + Move-to-Front + Rangecoding. Later order0/order1 mixing was added, MTF appeared inefficient and was removed, and E8/E9 x86 filter was added.
Initial motivation was to write a tool for context history separation, which would allow to demonstrate to some people that a counter is basically a standalone model, so its not necessary to limit yourself with some well-known implementation, especially in the cases (contexts) where a high-quality model generates much less entropy than a simple counter. Well, anyway, first its necessary to estimate the entropy of a context with a good model, and only after that try to balance the estimation precision and speed by expanding the context or using more complex approximations like state machines and delayed counters.
But then writing the symbol sequence from each order2 context to a separate file seemed inconvenient, and a 256x256 index for each context in a single file wasn't really needed as it appeared possible to derive from the transformed data... so in the end ST2 was the result. And then it seemed just logical to add MTF and fpaq0pv4B aka "fastest order0 coder" and compare the result to other order2 coders, thus ST2rc was created.
ST2rc v0. Benchmarks
The alpha version included the standalone ST2/reverse-ST2 trasformation utility, and then unmodified fpaq0pv4B was attached to it (with some i/o wrappers), so it was even possible to unpack the ST2-transformed data with a standalone fpaq0pv4B.
Also some hours was wasted at an attempt to make it a bijective transformation, but in the end it appeared easier to add 2 extra bytes, and be sure that it would be reversible for any data.
ST2rc v1. Benchmarks
The order0 coder got simplified (block length encoding removed), and MTF/reverse-MTF functions added (as traditional for blocksorting compression).
Btw, this MTF implementation deserves some attention because MTF is a relatively resource-intensive algorithm - eg. swap-based designs require 255 operations per each symbol in the worse case, but here it was written in a vectorizable way and successfully vectorized by IntelC.
ST2rc v2. Benchmarks
ST2rc v3. Benchmarks
And of course rc_mm doesn't have this problem - it just produces much more redundancy in such cases. But then there seemed to be another trouble with rc_sh2d - its compression became worse, apparently due to fast 16-bit renormalizations. It seems that some patterns in the data tend to "synchronize" with renormalization cycle, so that bits with skewed probabilities constantly have to be encoded exactly at the low range values.
ST2rc v4. Benchmarks
Conclusion (for now)
Of course, the compression could be further improved by the same method - it surely would be useful eg. to mix the same submodels in other order with other mixer instances (and mix the result of that with current estimation), double or triple mixer updates would help too, and there's whole order2 still untouched. Also the speed can be somewhat improved by restoring some of the discared (for readability) speed optimizations (like "macrofication"), and a whole lot of new optimizations are applicable. But still that doesn't seem like a good way, because there's a more convenient scheme for "heavy" modelling techniques - unary coding, that is. And also bitwise coding has some known redundancy in coding the files with incomplete charset (like texts, where only a part of byte values can be encountered), and blocksorting creates very good examples of such data.
Thus, it seems necessary to first switch to unary coding, and then think about adding more advanced techniques like various probability mappings. But alas, that can't be exactly called "switching" because usual binary counters are completely inconvenient for unary scheme (dynamic ranking requires probability remapping which is very redundant with limited precision integer counters).
Well, of course there're some "historical" compressors which already contain all necessary data structures, but some theoretical points had significantly changed since that time, so now its necessary to write something completely new.
2013-08-09 22:13:09 >
2014-03-24 09:35:07 vZkfNEf >
aha, I like this topic, bookmark this page, huangjintang.
2014-11-27 06:24:58 >
2015-01-12 05:39:40 >