Fix PR23384.
The patch adds instructions number generated by a solution to LSR cost under "-lsr-insns-cost" option.
Performance improvement on x86:
spec2000
177.mesa on -O2 +3%
256.bzip2 on -Ofast -flto +1.5%
Differential D28307
Add Instruction number to LSR cost model (PR23384) part 1 of 3 evstupac on Jan 4 2017, 11:46 AM. Authored by
Details Fix PR23384. 177.mesa on -O2 +3%
Diff Detail
Event Timeline
Comment Actions Updated test according to the latest comments. movl (%rsi,%rcx,4), %eax ... incq %rax cmpq %rcx, %r8 or movl (%rsi,%rcx), %eax ... addq $4, %rax cmpq %rcx, %r8 "cmpq %rcx, %r8" could also be replaced with "decq %r8". Comment Actions LGTM. I assume that you're going to send a message to llvm-dev asking people to test this? Comment Actions Thanks! Comment Actions Hi, I'd like to see opt tests as well and one piece of the code seems dead to me unless I am missing something. See my inline comment. Also a related question, what is your plan to move with that heuristic? Generally speaking, I think the number of instructions is a good heuristic only for code size and it is not even a given. Indeed, for performance we care about critical path, IPL, this kind of thing. What am I saying is having more instructions is not necessarily a bad thing. The other thing is I believe we can end up with cases with less instructions, but more registers and potentially more spill. This is possible to happen because the heuristic that account for "spill" only trigger when the NumRegs gets high enough. However, NumRegs may be low enough and spill may already happen. See my comment below as well. Cheers,
Comment Actions Hi, Thanks for taking a look.
As we deal in D27695, there will be next part moving Solution cost comparison to a target part. I understand that the change is very sensitive - so an option gives testing opportunity.
Currently LSR take in account only Number of Registers. It is very important for 32 bit mode, but less important for 64 bit mode and float/vector loops. Thanks,
Comment Actions ...
I agree, for performance we should look at critical-path length, ILP, etc. However, here, we're only ranking one formula at a time, and these generally correspond to dependent chains of computations. Making each formula shorter (i.e. using fewer instructions), as a result, will tend to make each path-length shorter. This is obviously heuristic, because we don't have an interface here to query expected latencies, but most of these things are simple integer operations, so I suspect it is not a bad approximation. Nevertheless, if any of these formula are on the critical path, then hopefully it too will get shorter. For ILP, we may want a longer sequence, but I think it would be hard to reason about that here. It seems better to do that in the MachineCombiner, or something at that level, and for that purpose, using fewer instructions is also probably better: they'll be easier to pattern-match at the MI level.
Comment Actions Please merge the opt and llc test cases into one file.
That may help catch the difference between those two modes. E.g., have a common prefix BOTH and one WITH and one WITHOUT. If the difference between WITH and WITHOUT is too big (i.e., not that much BOTH prefix), don't bother.
Comment Actions LGTM, nitpicks on the check pattern for the llc test case.
|