PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greather than) comparison.
The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT).
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29045 | instrcutions -> instructions | |
29226 | This looks a bit suspicious. I suggest you split the patch in two - one to extend to arbitrary integer types, the other for this. | |
test/CodeGen/X86/sad_variations.ll | ||
7 | Any reason the test cases have to be so large? Can you match a single psadbw instead of several? |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29226 | Unfortunately, the larger integer types change will have no effect if i don't skip the sign extend instruction, so i prefer to keep it together. I attached a ppt that explains the mathematical prove behind removing skipping the sign extend.Basically the logic i try to explain is:
The last function in the *.ll file demonstrates the pattern. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29226 | I haven't really looking at the slides yet, but regardless - I don't understand why you say the change will have no effect. |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29226 | You are correct. I can just as well replace it with a zext (I will update the code). |
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29226 | I think you'll need an extra check for the zext as well. In fact, you probably just want to match ZEXT/SEXT/ANYEXT here (assuming it's correct). %0 = bitcast i8* %cur to <8 x i8>* %1 = load <8 x i8>, <8 x i8>* %0, align 1 %2 = zext <8 x i8> %1 to <8 x i64> %3 = bitcast i8* %ref to <8 x i8>* %4 = load <8 x i8>, <8 x i8>* %3, align 1 %5 = zext <8 x i8> %4 to <8 x i64> %6 = sub nsw <8 x i64> %2, %5 %7 = icmp slt <8 x i64> %6, zeroinitializer %8 = sub nsw <8 x i64> zeroinitializer, %6 %9 = select <8 x i1> %7, <8 x i64> %8, <8 x i64> %6 %rdx.shuf = shufflevector <8 x i64> %9, <8 x i64> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef> Right? |
Added support for zext (instead of just sext) and added a test case (Thank you Michael)
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
29226 | I will add any_extend as you suggested. I have to agree with you that the pattern that you specified is the more expected pattern. Anyway, the pattern you suggested will be matched and I added the test case to the test file. This is the example of the C code which is the base to all my lit tests: #define ABS_GT(X) (((X)>0)?(X):-(X)) unsigned int sad8_c(const unsigned char * const cur, const unsigned char * const ref, const unsigned int stride) { unsigned int sad = 0; unsigned int j; unsigned char const *ptr_cur = cur; unsigned char const *ptr_ref = ref; for (j = 0; j < 8; j++) { sad += ABS_GT(ptr_cur[0] - ptr_ref[0]); sad += ABS_GT(ptr_cur[1] - ptr_ref[1]); sad += ABS_GT(ptr_cur[2] - ptr_ref[2]); sad += ABS_GT(ptr_cur[3] - ptr_ref[3]); sad += ABS_GT(ptr_cur[4] - ptr_ref[4]); sad += ABS_GT(ptr_cur[5] - ptr_ref[5]); sad += ABS_GT(ptr_cur[6] - ptr_ref[6]); sad += ABS_GT(ptr_cur[7] - ptr_ref[7]); ptr_cur += stride; ptr_ref += stride; } return sad; } |
Added new function to test file and added ANY_EXTEND option as well (Thanks again Michael)
instrcutions -> instructions