Resolved PR33954.
This patch contains two more constraint that aim to reduce the noise cases where we convert CMOV into branch for small gain, and spending more cycles due to overhead.
Details
Diff Detail
Event Timeline
Added couple of tests that represents the bug in PR33954 and cover the change in the patch.
Notice that I needed to modify one of the previous tests.
I measured the performance manually on a small manually written main function, and noticed improvement with the new decision - as explained in the LIT test pr33954.ll.
lib/Target/X86/X86CmovConversion.cpp | ||
---|---|---|
412 | Would this be a bit more readable? if (diff[i] < GainCycleThreashold) return false; bool WorthOptLoop = false; if (Diff[1] == Diff[0]) WorthOptLoop = Diff[0] * 8 >= LoopDepth[0].Depth; else if (Diff[1] > Diff[0]) WorthOptLoop = (Diff[1] - Diff[0]) * 2 >= (LoopDepth[1].Depth - LoopDepth[0].Depth) && (Diff[1] * 8 >= LoopDepth[1].Depth); | |
test/CodeGen/X86/pr33954.ll | ||
2 | Since the -x86-cmov-converter is invoked explicitly, should the -x86-cmov-converter-threshold=4 flag be added too? | |
test/CodeGen/X86/x86-cmov-converter.ll | ||
29 | Maybe emphasize that in this case we bail out due to a low absolute gain in *cycles*, despite the *relative* gain being acceptable. | |
100 | if (t *a> b) |
Thanks Zvi for the comments.
Please see answers below.
test/CodeGen/X86/pr33954.ll | ||
---|---|---|
2 | Actually, I want to guard the threshold, whoever change it, should revisit this test. | |
test/CodeGen/X86/x86-cmov-converter.ll | ||
100 | Why is this important? |
Would this be a bit more readable?