Injecting equivalent IR in place of x86 intrinsic calls via InstCombineCalls.cpp.
Event Timeline
test/CodeGen/X86/combine-shl.ll:609
> Where is the SSE codegen?
I fear that LICM is going to separate this pattern if it's used in a loop, and then isel won't be able to see the whole pattern.
Can you provide an example of such behavior? I tried various (rather simple) tests like this one:
```c
#include <emmintrin.h>

__m128i foo(__m128i a, __m128i b, int x, int y) {
  __m128i tmp;
  for (int i = 0; i < y; i++) {
    x *= 20;
    tmp = _mm_sll_epi64(a, b);
  }
  return _mm_slli_epi64(tmp, x);
}
```
and for all of them the whole pattern gets hoisted out of the loop and folds without a problem.
Never mind, I got it wrong: after compiling something similar to https://bugs.llvm.org/show_bug.cgi?id=37417, it does split the pattern. Moreover, it splits it in a different manner for different intrinsics and hoists more than one instruction, so it might not be as simple as moving back the one subtraction from the bug reproducer. I don't think there is any other way, though: replicating simplifyX86immShift and simplifyX86varShift directly in IR would be a nightmare.
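For reference, the reason the injected pattern spans several instructions (and so gives LICM something to pull apart) is the implicit-zeroing semantics of the variable shifts. A rough scalar model of what the IR for `_mm_sll_epi64` has to express (illustrative only; the function name is made up and this is not the actual transform):

```c
#include <stdint.h>

/* Illustrative model of PSLLQ semantics: every lane is shifted by the low
   64 bits of the count operand, and a count >= 64 zeroes the lane. In IR
   this becomes a splat, a shift, a compare, and a select -- separate
   instructions that LICM is free to hoist independently. */
static void psllq_model(uint64_t dst[2], const uint64_t a[2],
                        const uint64_t cnt[2]) {
  uint64_t c = cnt[0]; /* only the low element is the count */
  for (int i = 0; i < 2; i++)
    dst[i] = (c < 64) ? (a[i] << c) : 0;
}
```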
We already do a lot in InstCombineCalls to convert x86 vector shift calls to generic shifts (mainly for constants; value tracking might allow us to do a bit more?), and as others have mentioned, splitting IR like this is likely to cause us a lot of problems.
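For example, a sketch of the constant case that simplifyX86immShift already covers (the IR in the comment is rough/illustrative):

```c
#include <emmintrin.h>

/* With an in-range constant count, the intrinsic call carries no extra
   semantics over a plain vector shift, so InstCombine can turn it into a
   generic IR shl, roughly: shl <2 x i64> %a, <i64 3, i64 3>. */
__m128i shl_by_3(__m128i a) {
  return _mm_slli_epi64(a, 3);
}
```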
It could be useful to allow the backend to remove out-of-bounds shift masks where the instruction supports implicit zeroing (e.g. to avoid sanitizer warnings: https://gcc.godbolt.org/z/Abcq8n)
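A hedged sketch of the source pattern this would help with (the function name is illustrative, not from the patch): because the instruction zeroes on an out-of-range count, the explicit guard below could fold into a single vector shift, and writing the guard also keeps UBSan from flagging the shift.

```c
#include <stdint.h>

/* Guarded scalar shift matching PSLLQ's per-lane behavior: defined for any
   count, so no sanitizer warning, and a backend that knows the instruction
   zeroes on count >= 64 could drop the guard/mask entirely. */
static inline uint64_t shl_or_zero(uint64_t a, uint64_t c) {
  return c < 64 ? (a << c) : 0;
}
```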