> > and prefetching couldn't be used for LZ decoding.
> Actually, I did. I have tested several unrolled versions for
> copying matches, but I didn't succeed.
Well, I wasn't talking about copy routines (that's obvious), but about things like:
> New CPUs are really different (like mine: core2 2.2ghz).
Not that much is different actually... compilers still do weird things instead of optimizing.
> when we compare pure c/c++ code.
I think it's a matter of tools, not knowledge.
Only compilers can apply global optimization techniques and automated vectorization, and adjust inlined code to its environment. But only their developers know how to force them to do something specific (if anybody knows at all). So even if you see how to improve some code block (and compiler-generated code still looks ugly most of the time) - it's mostly a waste of time, as that particular code block would probably disappear after any source modification.
So it's still possible to write faster code by hand (if only because compilers don't know how to use the stack and flags), but then you basically have to write all of it by hand - the approach of using asm inlines only for the bottlenecks doesn't work anymore, as compiler optimization around those asm inlines deteriorates significantly.
Language syntax has a major effect on the choice of algorithms and data structures. So it might make sense to write a (simple and slow) assembly implementation first, and then port the algorithm to C++ or whatever (for automated optimizations). Sounds backwards, but after all a compiler is a code management tool that lets you automate some routine optimizations - just write your own if you can't deal with the existing ones.
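To illustrate the point about flags, here is a minimal sketch of multi-word addition in plain C++. The name add_limbs and the little-endian limb layout are my own assumptions for the example, not something from the code discussed in this thread. The language has no way to say "add with carry", so the carry has to be rebuilt with comparisons, and whether the compiler then folds that back into a single add-with-carry instruction depends on the compiler and its version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: dst = a + b over n 64-bit limbs (least significant
// limb first). Returns the carry out of the top limb (0 or 1).
inline std::uint64_t add_limbs(std::uint64_t* dst, const std::uint64_t* a,
                               const std::uint64_t* b, std::size_t n) {
    assert(n == 0 || (dst && a && b));       // precondition: valid buffers
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t s = a[i] + carry;      // may wrap around to 0
        std::uint64_t c1 = s < carry;        // 1 if the first add wrapped
        s += b[i];                           // may wrap again
        std::uint64_t c2 = s < b[i];         // 1 if the second add wrapped
        dst[i] = s;
        carry = c1 | c2;                     // both cannot be set at once
    }
    return carry;
}
```

In hand-written assembly the loop body would be roughly one add-with-carry per limb, with the carry living in the flags register between iterations. Here it has to live in an ordinary variable, and it is up to the optimizer to notice the pattern.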