This patch lowers the SAD intrinsics to native LLVM IR. It comes with a companion clang patch (D45722).
lib/IR/AutoUpgrade.cpp | ||
---|---|---|
2389 ↗ | (On Diff #142771) | Can't N be calculated from CI.getType()? |
Don't remove the -fast-isel tests - they should be updated to test the generic codegen instead
How much value are we getting out of this change? Does this expose a lot of optimization potential to the middle end? This is a pretty complex sequence. How easy/likely is it for the middle end to mess this up and make it hard for the backend to recognize?
This will be the 3rd code path we'll have for PSADBW recognition...
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
38075 | Can we reuse/tweak matchBinOpReduction to do this for us? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
38075 | This is not a scalar reduction. The pattern calls for a sum of specifically formed vectors (hence all the checks below) to form the PSADBW instruction where it is an exact semantic fit, rather than where it can merely be used as a reduction tool. This is also why a third path to recognize it is being added - the other paths use it for reductions and so don't actually need the input pattern to match it in terms of which qword each specific byte corresponds to. |
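
For readers following the "which qword the specific byte corresponds to" point: below is a minimal scalar sketch of what the 128-bit PSADBW computes, as I understand the instruction. It is not code from this patch; the function name `psadbw128_ref` and the `std::array` representation are assumptions made purely for illustration. Each 64-bit result lane holds the sum of absolute differences of the 8 bytes in the same lane of the inputs, so a pattern that is an exact match has to preserve that byte-to-lane mapping, while a reduction only cares about the grand total.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical reference model (not part of the patch): scalar emulation
// of the 128-bit PSADBW. Bytes 0..7 contribute only to lane 0 and bytes
// 8..15 only to lane 1.
std::array<std::uint64_t, 2>
psadbw128_ref(const std::array<std::uint8_t, 16> &a,
              const std::array<std::uint8_t, 16> &b) {
  std::array<std::uint64_t, 2> out{}; // both lanes start at zero
  for (std::size_t i = 0; i < 16; ++i) {
    // Widen to int before subtracting so the difference cannot wrap.
    int diff = static_cast<int>(a[i]) - static_cast<int>(b[i]);
    out[i / 8] += static_cast<std::uint64_t>(std::abs(diff));
  }
  return out;
}
```

Each lane sum is at most 8 * 255 = 2040, so it fits in the low 16 bits of the lane and the upper bits are always zero.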