Previously, the conversion cost table had no entries for integer-to-integer conversions on SSE2, which results in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2. The cost numbers were counted from the llc output for the new test case in this patch.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
734 | These values don't appear to be correct for SSE41, which has PMOVSX/PMOVZX ops. Maybe split off the 128-bit extension ops from AVXConversionTbl into SSE41ConversionTbl? |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
734 | SSSE3 also provides pshufb, from which several operations here can benefit. So are you suggesting adding more tables for SSSE3/SSE4.1? |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
734 | You may not need SSSE3 (often PSHUFB is as costly as fixed shuffles on older hardware - it just reduces register use) but splitting the extensions from AVX into SSE41 needs to be done. | |
755 | I haven't checked this very thoroughly but you might need to improve non-simple type handling here, especially for extensions? |
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
734 | I agree. SSE41 really provides several instructions that can greatly reduce those costs. I have added a table for SSE41 and also updated the test case. PTAL. | |
755 | This patch is actually adding cost entries for non-simple types. In addition to this query, I added another query on the table below for non-simple types. The float/int conversion entries also need to be updated for non-simple types, and I will do that later. To prevent a combinatorial explosion of vector sizes and types, I think we probably need to redesign the cost table. |
Updated the patch by adding a cost table for SSE4.1. The test case has been updated accordingly.
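For readers following the thread, here is a minimal, self-contained sketch of the tiered lookup being discussed: query the SSE4.1 table first, then fall back to the SSE2 table, then to a generic cost. It does not use the LLVM API. The names `Op`, `VT`, `ConvCostEntry`, `findEntry`, and `getConversionCost` are hypothetical stand-ins, and the cost values are placeholders rather than the numbers from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for LLVM's ISD opcodes and simple value types.
enum class Op { SignExtend, ZeroExtend, Truncate };
enum class VT { v4i16, v4i32, v8i8, v8i16 };

// One row of a conversion cost table: opcode, destination type,
// source type, and an estimated cost.
struct ConvCostEntry {
  Op Opcode;
  VT Dst;
  VT Src;
  unsigned Cost;
};

// Placeholder costs for illustration only (NOT taken from the patch).
static const std::vector<ConvCostEntry> SSE2Tbl = {
    {Op::SignExtend, VT::v4i32, VT::v4i16, 2},
    {Op::ZeroExtend, VT::v4i32, VT::v4i16, 1},
    {Op::Truncate, VT::v4i16, VT::v4i32, 3},
};

// With SSE4.1, a single extension instruction can cover these cases,
// so the placeholder costs here are lower.
static const std::vector<ConvCostEntry> SSE41Tbl = {
    {Op::SignExtend, VT::v4i32, VT::v4i16, 1},
    {Op::ZeroExtend, VT::v4i32, VT::v4i16, 1},
};

// Linear search for a matching row; returns nullptr when there is none.
static const ConvCostEntry *findEntry(const std::vector<ConvCostEntry> &Tbl,
                                      Op Opcode, VT Dst, VT Src) {
  for (const ConvCostEntry &E : Tbl)
    if (E.Opcode == Opcode && E.Dst == Dst && E.Src == Src)
      return &E;
  return nullptr;
}

// Query the most capable table first, then fall back. SSE2 is assumed
// to be always available in this sketch.
unsigned getConversionCost(bool HasSSE41, Op Opcode, VT Dst, VT Src,
                           unsigned GenericCost) {
  if (HasSSE41)
    if (const ConvCostEntry *E = findEntry(SSE41Tbl, Opcode, Dst, Src))
      return E->Cost;
  if (const ConvCostEntry *E = findEntry(SSE2Tbl, Opcode, Dst, Src))
    return E->Cost;
  return GenericCost;
}
```

Ordering the queries from the newest feature level down means an SSE4.1 target still picks up SSE2 entries for conversions that SSE4.1 does not improve, such as the truncation row above.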
LGTM - if possible please add a FIXME comment describing the future work necessary to improve the handling of simple + non-simple types.