
> > Then again, nobody said that unrolling and vectorization
> > and prefetching couldn't be used for LZ decoding.
> Actually, I did. I have tested several unrolled versions for
> copying matches, but I didn't succeed.

Well, I wasn't talking about copy routines (that's obvious), but about things like:

  • aligned reading, and calculating the hashes for multiple positions at once
  • keeping some useful substrings in registers
  • branch removal by vector instructions (like min/max and select)
  • rangecoder i/o optimizations - like what IntelC does for fpaq0pv4B
  • rangecoder multiplication removal by range splitting
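
To make a couple of these concrete, here is a rough C sketch (not taken from any real codec) of the first three items: one wide read that stays in a register, hashes for four adjacent positions computed from it, and a select done with a mask instead of a branch. All names (`hash4`, `hash4_x4`, `select_u32`, `min_u32`), the 16-bit table size and the multiplier are assumptions made up for illustration, and the shifts assume a little-endian CPU. The select is shown in scalar form; a vector version would do the same thing per lane.

```c
#include <stdint.h>
#include <string.h>

#define HASH_BITS 16  /* hypothetical hash table size: 2^16 slots */

/* Multiplicative hash of one 4-byte substring (the constant is an arbitrary odd multiplier). */
static inline uint32_t hash4(uint32_t x) {
    return (x * 2654435761u) >> (32 - HASH_BITS);
}

/* Read 8 bytes once, then hash the 4-byte substrings starting at offsets 0..3.
   The 64-bit word stays in a register, so the 4 hashes need no further memory reads.
   Little-endian assumption: byte p[i] ends up in bits 8*i .. 8*i+7 of w. */
static inline void hash4_x4(const uint8_t *p, uint32_t out[4]) {
    uint64_t w;
    memcpy(&w, p, 8);  /* usually compiled to a single load */
    for (int i = 0; i < 4; i++)
        out[i] = hash4((uint32_t)(w >> (8 * i)));
}

/* Branchless select: returns a if cond is nonzero, b otherwise. */
static inline uint32_t select_u32(uint32_t cond, uint32_t a, uint32_t b) {
    uint32_t mask = 0u - (uint32_t)(cond != 0);  /* all ones or all zeros */
    return (a & mask) | (b & ~mask);
}

/* Branchless min built on the select above, e.g. for clamping a match length. */
static inline uint32_t min_u32(uint32_t a, uint32_t b) {
    return select_u32(a < b, a, b);
}
```

With this, something like `hash4_x4(in + pos, h)` would yield the table indices for positions `pos` to `pos + 3` from one read, and `min_u32(len, max_len)` would clamp a length without a jump. Whether the compiler actually keeps it that way is another matter, so it's worth checking the generated code.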

> Although, in the old days I had some really good results with my ancient Pentium II 400 MHz.
> New CPUs are really different (like mine: Core 2 2.2 GHz).

Not that much is different actually... compilers still do weird things instead of optimizing.

> With my knowledge, I cannot make a huge difference with assembly
> when compared to pure C/C++ code.

I think it's a matter of tools, not knowledge.

Only compilers are able to use global optimization techniques and automated vectorization, and to adjust inlined code to its environment. But only their developers know how to force them to do something specific (if anybody knows at all). So even if you see how to improve some code block (and compiler-generated code still looks ugly most of the time), it's mostly a waste of time, as that particular code block would probably disappear after any source modification.

So it's still possible to manually write faster code (if only because compilers don't know how to use the stack and flags), but then you basically have to write it all manually. The approach of using asm inlines only for the bottlenecks doesn't work anymore, as compiler optimization around these asm inlines deteriorates significantly.

> So, currently I'm working on implementations. The optimization comes at the end

Language syntax has a major effect on the choice of algorithms and data structures. So it might make sense to write a (simple and slow) assembly implementation first, and then port the algorithm to C++ or whatever (for automated optimizations). It sounds backwards, but after all a compiler is a code management tool which lets you automate some routine optimizations - just write your own if you can't deal with the existing ones.

