PPM? Into practice!

Dmitry Shkarin
Institute for Dynamics of Geospheres, Moscow, Russia
E-mail: dmitry.shkarin@mtu-net.ru

PPM is one of the most promising lossless data compression algorithms using a Markov source model of order D. Its main feature is that a symbol which is new in the given context is coded in one of the inner nodes of the context tree; special escape symbols are used to describe this node. In reality, the majority of symbols are encoded in inner nodes, and the Markov model becomes rather conventional. Although the PPM algorithm achieves the best results in comparison with others, it is rarely used in practical applications owing to its high computational complexity. This paper is devoted to a PPM implementation whose complexity is comparable with that of widespread practical compression schemes based on the LZ77, LZ78 and BWT algorithms. This scheme is named PPM with Information Inheritance (PPMII).

§1. PPM algorithm: basic definitions.

Let A be a discrete alphabet consisting of M >= 2 symbols; x^n = x[1],...,x[n], x[i] ∈ A, be the first n symbols of the message; |L| be the length of sequence L or the cardinality of set L. The probability of symbol x[n+1] = a ∈ A for a source with memory depends on the current context s[d] = x[n],...,x[n-d+1] ∈ A^d, d <= D; in the general case this context has variable length. The set of all possible contexts can be represented as the nodes of an M-ary tree of depth D. A context (sequence) s describes a path from the tree root to the current node, which is also denoted s. Usually the true conditional probabilities are unknown, and the coding conditional probabilities q(a|s) depend on characteristics of one or several subsequences x^n(s). These characteristics are the frequency f(a|s) = f(a|x^n(s)) of symbol a in x^n(s), the alphabet A(s) = A(s,x^n) = { a ∈ A : f(a|s) > 0 }, its cardinality m(s) = |A(s)|, etc.

Already at small D the number of model states is large, the subsequences x^n(s) are short on average, and their statistics are insufficient for effective compression. In particular, the task of multialphabet coding for unknown A(s) is hindered. The multialphabet properties of the PPM algorithm [1] are based on the (implicit) assumption that the longer the common initial part of two contexts, the more similar (on average) their conditional probability distributions. The high efficiency of PPM means that this assumption holds for the majority of coded sources.

Let Z[n] be the set of nodes s[d], d <= D, on the current branch x[n], x[n-1], ... of the context tree; s(a) be the context of maximal length for which f(a|s) > 0, and d(a) = |s(a)| (if such a context is absent, then d(a) = -1); q*(a|s) and q*(esc|s) be the conditional probabilities used in s for coding the symbols a ∈ A(s) and the escape symbol. The escape symbol signals the appearance of a symbol that is new in A(s). The key feature of all PPM modifications is the representation of the coding conditional probability of any symbol x[n+1] ∈ A as

    q(x[n+1] | x^n) = Prod( q*(esc|s[i]), i = d+1..D ) * q*(x[n+1] | s[d]),  d = d(x[n+1])     (1)

The product of escape conditional probabilities describes to the decoder a sequential descent from s[D] to s(a) (if d(a) = D, this product equals 1). Usually

    q*(a|s)   = t(a|s) / T(s),   for all a ∈ A#(s),
    q*(esc|s) = t(esc|s) / T(s),                                                              (2)

where A#(s[j]) = A(s[j]) \ A(s[j+1]), s[j+1] is a child context of s[j] on the current tree branch Z[n] (and, vice versa, s[j] is a parent context of s[j+1]), t(.|s) are generalized symbol frequencies for x^n(s), and T(s) is the sum of all generalized frequencies. As a rule, t(.|s) is written as t(.|s) = f(.|s) + c(.|s). For example, PPMC [2] corresponds to c(a|s) = c(esc|s) = 0, and PPMD [3] to c(a|s) = -1/2, c(esc|s) = -m/2.
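To make (1)-(2) concrete, the following sketch (illustrative C++, not the author's implementation; the Context class, the demo data and the fallback to a uniform byte distribution at order -1 are assumptions of this example) computes the coding probability of a symbol with the PPMD estimator c(a|s) = -1/2, c(esc|s) = -m/2; exclusion of masked symbols, discussed below, is omitted for brevity.

    // Illustrative sketch: coding probability (1) with the PPMD estimator (2),
    // i.e. t(a|s) = f(a|s) - 1/2 and t(esc|s) = m(s)/2, with T(s) = sum of f(.|s).
    #include <cstdio>
    #include <map>
    #include <vector>

    struct Context {                              // statistics gathered in x^n(s)
        std::map<char, int> f;                    // f(a|s); only symbols with f > 0 are stored
        int m() const { return (int)f.size(); }   // m(s) = |A(s)|
        int T() const {                           // T(s) = sum of frequencies
            int sum = 0;
            for (const auto& p : f) sum += p.second;
            return sum;
        }
    };

    // branch holds s[D], s[D-1], ..., s[0] along Z[n], longest context first
    double codingProbability(const std::vector<Context>& branch, char a) {
        double q = 1.0;
        for (const Context& s : branch) {
            if (s.m() == 0) continue;                  // empty context, nothing to code
            double T = s.T();
            auto it = s.f.find(a);
            if (it != s.f.end())                       // s = s(a): code the symbol here
                return q * (it->second - 0.5) / T;     // q*(a|s) = t(a|s)/T(s)
            q *= 0.5 * s.m() / T;                      // q*(esc|s) = (m(s)/2)/T(s)
        }
        return q / 256.0;                              // order -1: uniform over bytes
    }

    int main() {
        std::vector<Context> branch(3);
        branch[0].f = {{'a', 3}, {'b', 1}};                        // s[2]
        branch[1].f = {{'a', 5}, {'b', 2}, {'c', 1}};              // s[1]
        branch[2].f = {{'a', 9}, {'b', 4}, {'c', 2}, {'d', 1}};    // s[0]
        std::printf("q(a)=%.4f  q(c)=%.4f  q(z)=%.6f\n",
                    codingProbability(branch, 'a'),
                    codingProbability(branch, 'c'),
                    codingProbability(branch, 'z'));
        return 0;
    }

Each escape multiplies in (m(s)/2)/T(s), and the symbol itself contributes (f(a|s) - 1/2)/T(s) in the longest context that contains it, exactly as in (1).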
After encoding the next symbol x[n+1], it is necessary to update (to increase by one) its frequency in the tree nodes s ∈ Z[n]. The first PPM variants [1] used full updates, in which the frequencies are changed for all s ∈ Z[n]. Later PPM variants [2] use update exclusions, in which the frequencies are changed only in the nodes with length |s| >= |s(a)|. In some sense, full updates and update exclusions correspond to two extreme cases. During a descent along Z[n] with the help of escape symbols, one must eliminate the symbols already checked in higher-order contexts (exclusions). Let us call such symbols masked symbols, and contexts containing only such symbols masked contexts. Let us call s a binary context if m(s) = 1; in it only two probabilities are used, i.e. q(a|s) = 1 - q(esc|s).

§2. Evaluation of generalized symbol frequencies.

1. Information inheritance. The main difficulty of all explicit context-modeling schemes is the insufficiency of statistics in higher-order contexts. Many ways to overcome this problem have been proposed: weighting the predictions of lower- and higher-order contexts (CTW, interpolated Markov model), or coding a symbol only in contexts that have enough (by some criterion) statistics (LOE, state selection). All these methods require large computational resources and are unacceptable for our purposes. We can take advantage of the similarity of the distribution functions in parent and child contexts and set the initial value of the generalized symbol frequency in a child context using the information about this symbol gathered in the parent context. Such an approach has two merits: firstly, the reference to parent statistics occurs only when a new symbol is added to the child context, i.e. rarely enough, which allows solutions with linear time complexity (not depending on the tree depth); secondly, because the parent statistics are used only rarely afterwards, the model can quickly adapt to variations in the character of the input data.

The following notation is used below: s[i] is the new context (T(s[i]) = 0) or the context to which the new symbol a is to be added (t(a|s[i]) = 0); s[k] is the longest context that already contains the current symbol a (s[k] = s(a)). Addition of a new symbol to the old context will be our initial concern. Locally, at the given point of the coded text, it would be optimal to immediately use a PPM model of order k, i.e. to reduce the tree depth to k; in this way we eliminate the errors associated with inaccurate estimation of the escape probabilities. On the other hand, we need statistics similar only to the statistics in s[i]; for this reason, we must perform the reduction of tree depth only along the context branch s[j], k < j <= i. [...]

§3. Estimation of the escape probability.

[...] At each step, the new value d[j] is added to the running sum S and the mean value <d> (= S/N0) is subtracted from it, i.e. the change of S is dS = d[j] - S/N0; N0 is chosen as a power of two to eliminate the division operation.
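This shift-based averaging admits a very small implementation; the sketch below (illustrative C++; the class name, the SHIFT parameter and the initial value are choices made for this example, not taken from the paper) keeps S approximately equal to N0 times the recent mean, so both the update and the estimate need only shifts and additions.

    // Illustrative sketch of the adaptive running average: each step the new value d
    // is added and the mean <d> = S/N0 is subtracted, dS = d - S/N0, with N0 = 2^SHIFT.
    #include <cstdio>

    template <unsigned SHIFT>                 // N0 = 1 << SHIFT, a power of two
    class RunningMean {
        int S;                                // scaled sum, approximately N0 * <d>
    public:
        explicit RunningMean(int init) : S(init << SHIFT) {}
        void update(int d) { S += d - (S >> SHIFT); }   // dS = d - S/N0, no division
        int mean() const { return S >> SHIFT; }         // current estimate of <d>
    };

    int main() {
        RunningMean<4> avg(10);               // N0 = 16, initial mean 10
        const int data[] = {12, 14, 13, 40, 11, 12, 13, 12};
        for (int d : data) {
            avg.update(d);
            std::printf("d=%2d  mean=%d\n", d, avg.mean());
        }
        return 0;
    }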
The number h of components of the vector w(s), and their influence on q(esc|w), can only be found experimentally. They are enumerated below in decreasing order of influence.

1) Naturally, the escape probability depends to a great degree on the generalized symbol frequency q(a|s). This variable is quantized to 128 values.

2) PPM exploits the similarity of parent and child contexts; therefore the alphabet size m(s[i-1]) of the parent has a strong effect on the escape probability in the child. This w(s) component is quantized to 4 values.

3) Experiments show that in real sources highly predictable data blocks are interleaved with poorly predictable ones. The sizes of these blocks are small, approximately 3-5 symbols, which corresponds to the natural segmentation of language text into words and parts of words. The probability of the previously encoded symbol in the previous context is included into w(s) to track the switching between such blocks. This variable is quantized to 2 values.

4) The current coded symbol correlates mostly with the previous symbol. However, we cannot include the whole previous symbol into w(s), because the number of SEE-contexts would become too big and the frequency of each separate SEE-context would be too small. Only a one-bit flag is included into w(s); the flag is set to 0 if the two higher bits of the previous symbol are zero, and to 1 otherwise.

5) A long block of symbols of length L is a sequence of input symbols for which there were no escapes to lower orders and the coding probability of each of the L symbols of this sequence was larger than 1/2. It is quite probable that a PPM model with D [...]

§4. Implementation details.

[...] for already existing contexts (T(s) > 0); the position in the coded string is only remembered for new contexts (T(s) = 0).

    CONTEXT structure, m(s) > 1:               CONTEXT structure, m(s) = 1:
      m(s)                      - 2 bytes        m(s)                  - 1 byte
      T(s)                      - 2 bytes        TRANSITION structure  - 4 bytes
      Link to TRANSITIONs array - 4 bytes        Link to suffix(es)    - 4 bytes
      Link to suffix(es)        - 4 bytes

    TRANSITION structure:
      a                         - 1 byte
      t(a|s)                    - 1 byte
      Link to successor(s,a)    - 4 bytes

    Fig. 1. Data structures of the PPMII algorithm.

A graphical representation of the data structures used is given in Fig. 1. It can be seen from this figure that the proposed algorithm spends 12 bytes for each repeated (binary or nonbinary) context and 6 bytes for each transition structure in a nonbinary context. For comparison, one of the most memory-economical PPM/PPM* implementations [6] requires 8 four-byte machine words for the nonbinary context structure, 6 words for the binary context structure, and 6 words for the transition structure in a nonbinary context.

The precision of the frequency representation is 1 for binary contexts and 1/4 for nonbinary ones. The statistics in a nonbinary context are scaled (all frequencies are halved) when one of the frequencies exceeds the threshold 30. A simplified variant of the range coder [7] is used as the entropy coder. The division operation in equation (5) is approximated by a series of comparisons; other approximations used can be found in [5].
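For orientation, the Fig. 1 layout can be written down as packed structures; the sketch below is an illustration only (field and type names are invented here, and the real PPMd sources [5] differ in detail), showing how the 12-byte context and 6-byte transition records come about.

    // Illustrative sketch of the Fig. 1 record sizes: 12 bytes per (nonbinary)
    // context and 6 bytes per transition.  Not the author's actual declarations.
    #include <cstdint>
    #include <cstdio>

    #pragma pack(push, 1)
    struct Transition {            // one (symbol, frequency, successor) record
        uint8_t  symbol;           // a                          - 1 byte
        uint8_t  freq;             // t(a|s)                     - 1 byte
        uint32_t successor;        // link to successor(s,a)     - 4 bytes
    };                             // total: 6 bytes

    struct Context {               // nonbinary context, m(s) > 1
        uint16_t numSymbols;       // m(s)                       - 2 bytes
        uint16_t totalFreq;        // T(s)                       - 2 bytes
        uint32_t transitions;      // link to TRANSITIONs array  - 4 bytes
        uint32_t suffix;           // link to suffix context     - 4 bytes
    };                             // total: 12 bytes
    #pragma pack(pop)

    int main() {
        std::printf("sizeof(Transition)=%zu, sizeof(Context)=%zu\n",
                    sizeof(Transition), sizeof(Context));
        return 0;
    }

Keeping the links as 32-bit offsets into a preallocated memory pool is one way to hold these sizes independently of the machine's native pointer width.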
§5. Complicated variant.

The PPMII algorithm demonstrates very nice results, so it is interesting to look at the maximal compression efficiency that a similar approach can provide. In this section we do not limit ourselves by the requirement of low computational complexity. This modification of the initial scheme is named complicated PPMII (cPPMII).

Some improvements can be obtained by merely removing the simplifications introduced earlier. The approximation of (5) described at the end of §4 is cancelled. The delayed addition of a new symbol to the context (§2.1) is performed for all contexts, not just for binary ones. For the update exclusions modification (§2.2), the statistics are updated at any context length, including the case |s(a)| = D; moreover, the statistics are updated for three contexts in the parent-child chain, with increments 1/2, 1/4 and 1/8. The precision of the frequency representation is enlarged to 1/8. The other improvements require individual consideration.

1. Improving the probability estimation for more probable symbols and for less probable ones. Statistics in the parents are accumulated faster than in "young" child contexts (those with small T(s)), so there is good reason to use the parent statistics repeatedly. The generalized frequency of each of the more probable symbols (a symbol a1 falls into this group when q(a1|s) >= q(aMPS|s)/2, where aMPS denotes the most probable symbol (MPS) of the context) is corrected while the child s[i] is still very young (T(s[i]) [...]

2. [...] Then, at small N(i), we can update the statistics not only for the value i but also for (i-1) and (i+1), with some small weight that decreases as N(i) increases. For simplicity of calculation, this weight is chosen equal to 2^(-ceil(log2 N(i))), and it is set to zero after N(i) exceeds some threshold N0. In cPPMII, this technique is applied to every variable that is quantized to more than two values.

3. Adaptive probability estimation for the MPS. The encoded message length is markedly affected by a precise probability estimate for the MPS; therefore, after the frequency correction (see §5.1), an adaptive probability estimation for the MPS is performed. The SEE-model for the MPS is built similarly to the model for escape symbols: the behavior of the generalized frequency t(aMPS|s) is modeled for contexts without masked symbols, and the behavior of the conditional probability q(aMPS|s) is modeled for contexts with masked symbols.

For contexts without masked symbols, the vector w(s) consists of:
1) the generalized symbol frequency t(aMPS|s), quantized to 68 values;
2) a one-bit flag indicating whether statistics rescaling has been performed;
3) the result of comparing the current context length |s| with the average used context length (the averaging is performed over the last 128 symbols);
4) the same as items 4) and 6) of the §3.1 enumeration.

For contexts with masked symbols, the vector w(s) consists of:
1) the probability estimate q(aMPS|s), quantized to 40 values;
2) the result of comparing the average non-masked symbol frequency with the average masked symbol frequency;
3) a one-bit flag indicating whether only one symbol is masked;
4) the same as items 2) and 4) of the previous enumeration.

4. Additional SEE-context components. The additional SEE-context fields for binary contexts are:
1) the result of comparing m(s[i-1]) with the number of symbols in the previous context;
2) the number of binary parent contexts, quantized to 2 values;
3) a flag built from the two higher bits of the symbol preceding the already encoded symbol;
4) the same as item 3) of the first §5.3 enumeration.

The additional SEE-context fields for contexts with masked symbols are:
1) the same as item 1) of the first §5.3 enumeration;
2) the same as item 2) of the second §5.3 enumeration.

5. Adaptive escape frequency estimation for nonbinary contexts (without masked symbols). The escape frequency estimation model for these contexts is built similarly to the one for contexts with masked symbols, i.e. the escape frequency is modeled rather than the escape probability. The vector w(s) consists of the following components:
1) the alphabet size m(s), quantized to 25 values;
2) the result of the comparison 4*m(s[i]) > 3*m(s[i-1]);
3) the same as items 4)-6) of the §3.1 enumeration;
4) the same as items 2) and 3) of the first §5.3 enumeration;
5) the same as item 1) of the first §5.4 enumeration.
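Each SEE-model above amounts to a small table of adaptive cells indexed by the quantized components of w(s). The sketch below shows one way such an index can be assembled (illustrative C++; the particular components, quantization steps and table dimensions are assumptions of this example, not the exact PPMII/cPPMII model).

    // Illustrative sketch: selecting a SEE cell by quantized components of w(s).
    // The component set and quantizers are examples in the spirit of §3.1 only.
    #include <cstdio>

    static int quantizeParentSize(int m) {     // parent alphabet size -> 4 values
        if (m <= 1) return 0;
        if (m <= 3) return 1;
        if (m <= 7) return 2;
        return 3;
    }

    static int quantizeProb(int t, int T) {    // frequency ratio t/T -> 8 values
        int q = (8 * t) / (T > 0 ? T : 1);
        return q > 7 ? 7 : q;
    }

    struct SeeCell { int summ; };              // adaptive cell, e.g. maintained with
                                               // the shift-based average shown earlier

    static SeeCell seeTable[8][4][2][2];       // 8*4*2*2 = 128 SEE-contexts (toy size)

    static SeeCell& lookupSee(int t, int T, int parentM,
                              bool prevSymbolHighBits, bool prevProbHigh) {
        return seeTable[quantizeProb(t, T)]
                       [quantizeParentSize(parentM)]
                       [prevSymbolHighBits ? 1 : 0]
                       [prevProbHigh ? 1 : 0];
    }

    int main() {
        SeeCell& cell = lookupSee(3, 10, 5, true, false);
        cell.summ += 1;                        // toy update of the selected cell
        std::printf("selected cell summ=%d\n", cell.summ);
        return 0;
    }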
§6. Experimental results.

The PPMII algorithm and its complicated modification were implemented in the C++ programming language, and this implementation is publicly available at [5]. The executable file PPMd.exe corresponds to the basic algorithm, and PPMonstr.exe corresponds to cPPMII. All experiments were carried out on the standard Calgary corpus [8].

    Order   PPMD    + II    + UEM   + SEE1  + EE1   + SEE2
      2     2.790   2.766   2.766   2.767   2.767   2.759
      3     2.427   2.387   2.387   2.382   2.379   2.366
      4     2.310   2.254   2.254   2.235   2.230   2.212
      5     2.290   2.215   2.211   2.185   2.178   2.158
      6     2.297   2.204   2.197   2.166   2.158   2.137
      8     2.319   2.196   2.186   2.150   2.142   2.118
     10     2.339   2.195   2.184   2.143   2.136   2.111
     16     2.369   2.194   2.182   2.137   2.130   2.104

    Table 1. PPMII: step by step.

1. Evaluation of the contribution of each part of the algorithm. The unweighted average bits per byte (bpb) are presented in Table 1 for each modification of the PPM algorithm described in §§2-3. The columns are:

[PPMD]   - original PPMD by P.G. Howard (the present author's implementation);
[+ II]   - the previous scheme plus information inheritance (§2.1, w = 1/4 in (5));
[+ UEM]  - the previous scheme plus the update exclusions modification (§2.2);
[+ SEE1] - the previous scheme plus the SEE-model for binary contexts (§3.1, w = 1 for binary contexts);
[+ EE1]  - the previous scheme plus escape estimation for nonbinary contexts (§3.2);
[+ SEE2] - the previous scheme plus the SEE-model for contexts with masked symbols (§3.3).

                  PPMII                        cPPMII
    Order   Average  Time,  Memory,      Average  Time,  Memory,
            bpb      sec    MB           bpb      sec    MB
      2     2.759    3.18    0.6         2.716     8.51    1.1
      3     2.366    3.79    1.0         2.321    10.60    1.5
      4     2.212    4.51    1.9         2.170    12.46    2.4
      5     2.158    5.21    3.5         2.114    14.00    4.0
      6     2.137    5.88    5.6         2.090    15.21    6.1
      8     2.118    6.76   10.1         2.067    16.85   10.6
     10     2.111    7.25   13.3         2.057    17.57   13.8
     16     2.104    7.74   16.2         2.047    18.56   16.7

    For comparison:
      ZIP -9     2.764    5.93    0.5
      BZIP2 -8   2.368    5.87    6.0
      PPMZ2      2.139     n/a   >100

    Table 2. Integral characteristics of various compressors.

2. Time and memory requirements. In the second experiment, the time and memory requirements of the PPMII and cPPMII implementations were compared with those of widespread practical implementations of the LZ77 (ZIP [9]) and BWT (BZIP2 [10]) algorithms and also with the most powerful implementation of the PPM* algorithm (PPMZ2 [11]). The measured compression efficiency (average bpb), compression time (seconds) and maximal memory requirements (megabytes) are presented in Table 2. It can be seen from the table that the basic PPMII algorithm provides a wide range of options. At small D (2-3), it gives compression efficiency comparable to that of ZIP or BZIP2, with faster compression and smaller memory requirements than BZIP2.
At medium D (4-6), PPMII efficiency is noticeably better than that of ZIP and BZIP2, while the time and memory requirements remain moderate. Lastly, at high D (8-16), PPMII outperforms the best of the described programs, PPMZ2, in all characteristics. The complicated cPPMII gives even better efficiency, but its low execution speed makes it less appealing.

    File      CTW     ACB     FSMX    PPMZ2   PPMII   cPPMII  cPPMII  cPPMII
                                              -8      -8      -16     -64
    BIB       1.782   1.935   1.786   1.717   1.732   1.694   1.679   1.676
    BOOK1     2.158   2.317   2.184   2.195   2.192   2.136   2.135   2.135
    BOOK2     1.869   1.936   1.862   1.843   1.838   1.795   1.782   1.782
    GEO       4.608   4.555   4.458   4.576   4.349   4.163   4.159   4.158
    NEWS      2.322   2.317   2.285   2.205   2.205   2.160   2.142   2.137
    OBJ1      3.814   3.498   3.678   3.661   3.536   3.507   3.497   3.498
    OBJ2      2.473   2.201   2.283   2.241   2.206   2.154   2.118   2.110
    PAPER1    2.247   2.343   2.250   2.214   2.194   2.152   2.144   2.142
    PAPER2    2.190   2.337   2.213   2.184   2.181   2.130   2.124   2.124
    PIC       0.800   0.745   0.781   0.751   0.757   0.721   0.715   0.704
    PROGC     2.330   2.332   2.291   2.257   2.215   2.178   2.161   2.161
    PROGL     1.595   1.505   1.545   1.445   1.470   1.433   1.398   1.390
    PROGP     1.636   1.502   1.531   1.448   1.522   1.489   1.414   1.391
    TRANS     1.394   1.293   1.325   1.214   1.257   1.228   1.186   1.172
    Average
    bpb       2.230   2.201   2.177   2.139   2.118   2.067   2.047   2.041

    Table 3. Compression efficiency for various compression schemes.

3. Compression efficiency. In the last experiment, a more detailed comparison is performed between the proposed algorithms and algorithms described in the literature. The following schemes are presented in Table 3: the implementation [12] of the CTW method [13] with binary decomposition (the results were taken from [14]); the associative coder by G. Buyanovsky (ACB) [15]; the best of the FSMX coders described by S. Bunton [6]; and PPMZ2 by C. Bloom [11]. The next column contains the PPMII results for D = 8, and the last three columns contain the cPPMII results for D = 8, 16, 64. It is necessary to emphasize that the implementation [12] of CTW uses a symbol decomposition specially optimized for English texts. Let us remark that, in contrast to other PPM schemes, PPMII works well enough for nontextual data too. Now it really is universal data compression ;-).

References.

1. Cleary, J.G. and Witten, I.H. (1984) Data compression using adaptive coding and partial string matching. IEEE Trans. on Comm., 32(4):396-402.
2. Moffat, A. (1990) Implementing the PPM data compression scheme. IEEE Trans. on Comm., 38(11):1917-1921.
3. Howard, P.G. (1993) The Design and Analysis of Efficient Lossless Data Compression Systems. PhD thesis, Brown University.
4. Bloom, C. (1996) Solving the Problems of Context Modeling. www.cbloom.com/papers/.
5. Shkarin, D. (2001) PPMd - fast PPM compressor for textual data. ftp.elf.stuba.sk/pub/pc/pack/ppmdh.rar.
6. Bunton, S. (1996) On-Line Stochastic Processes in Data Compression. PhD thesis, University of Washington.
7. Martin, G.N.N. (1979) Range encoding: an algorithm for removing redundancy from a digitised message. Presented at the Video & Data Recording Conference, Southampton.
8. The Calgary Compression Corpus. ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus/.
9. Info-ZIP Group (1999) Zip v.2.3 - compression and file packaging utility. www.cdrom.com/pub/infozip/.
10. Seward, J. (2000) BZip2 v.1.0 - block-sorting file compressor. www.muraroa.demon.co.uk/.
11. Bloom, C. (1999) PPMZ2 - High Compression Markov Predictive Coder. www.cbloom.com/src/.
12. Volf, P.A.J. (1996) Text compression methods based on context weighting. Technical report, Stan Ackermans Institute, Eindhoven University of Technology.
13. Willems, F., Shtarkov, Y. and Tjalkens, T. (1995) The context-tree weighting method: basic properties. IEEE Trans. on Inf. Theory, 41(3):653-664.
14. Volf, P.A.J. and Willems, F.M.J. (1998) Switching between two universal source coding algorithms. Proc. IEEE Data Compression Conf., pp. 491-500.
15. Buyanovsky, G. (1994) Associative coding. The Monitor, 8:10-22 (in Russian).