Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)). The optimisation is also
enabled by default now.
Repository: rG LLVM Github Monorepo

Event Timeline
Looks good.
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
|---|---|
| 7963–7964 | I know you didn't write this part, but it's generally not a great idea to lower via intrinsics if it can be helped. Can you add an ISel node, a lot like `ARMISD::VSHLIMM`, and plumb that through tablegen instead? |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
|---|---|
| 7963–7964 | Sure! I will start working on this right away. |
Thanks. LGTM.
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
|---|---|
| 3153–3157 | You could possibly use `uint64_t ShiftAmount = Op.getConstantOperandVal(3)` to simplify it a little. It won't handle the `report_fatal_error`, but this should have already been checked at a higher level, somewhere in clang, to make sure the shift amount is a constant. The same goes for the `ElemSizeInBits` check below. You could change them to asserts. Up to you. |
This seems to cause misoptimizations for me:
```
$ cat test.c
void rgb16to24(const unsigned char *src, unsigned char *dst, int src_size)
{
    unsigned char *d = dst;
    const unsigned short *s = (const unsigned short *)src;
    const unsigned short *end = s + src_size / 2;
    while (s < end) {
        unsigned short bgr = *s++;
        *d++ = ((bgr&0xF800)>>8) | ((bgr&0xF800)>>13);
        *d++ = ((bgr&0x07E0)>>3) | ((bgr&0x07E0)>> 9);
        *d++ = ((bgr&0x001F)<<3) | ((bgr&0x001F)>> 2);
    }
}
$ clang -target aarch64-linux-gnu -S -O3 test.c -o test.s
```
This patch causes quite a bit of difference in the compiler's output for this function. I haven't yet traced it and tracked down exactly which part is wrong, but the testsuite that contains this code reports unexpected values.
| llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll | |
|---|---|
| 5 | Please generate simple tests like this with `update_llc_test_checks.py`. This transform is simply wrong, as written: `sli` is not the vector version of `bfi`. |
It appears that there were probably more things wrong with this optimization than just the condition.
I have reverted the patch for now so that I can take more time to make sure that I get this fix right.