This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix conditions for lowering to S[LR]I
ClosedPublic

Authored by PetreTudor on Apr 3 2020, 6:21 AM.

Details

Summary

Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)). The optimisation is also
enabled by default now.

Diff Detail

Event Timeline

PetreTudor created this revision.Apr 3 2020, 6:21 AM

Looks good.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7963–7964

I know you didn't write this part, but it's generally not a great idea to lower via intrinsics if it can be helped.

Can you add a ISel node, a lot like ARMISD::VSHLIMM, and plumb that through tablegen instead?

PetreTudor added inline comments.Apr 3 2020, 7:30 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
7963–7964

Sure! I will start working on this right away.

Now lowering to S[LR]I via TableGen + ISel nodes.

PetreTudor marked 2 inline comments as done.Apr 14 2020, 7:03 AM

Fix styling problem caused by linter (inconsistent with local style).

dmgreen accepted this revision.Apr 14 2020, 10:47 AM

Thanks. LGTM.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3153–3157

You could possibly use uint64_t ShiftAmount = Op.getConstantOperandVal(3), simplify it a little. It won't handle the report_fatal_error, but this should have already been checked at a higher level, somewhere in clang to make sure the shift amount is a constant.

Same for the ElemSizeInBits check below. You could change them to asserts. Up to you.

This revision is now accepted and ready to land.Apr 14 2020, 10:47 AM

Simplified logic for converting intrinsics to AArch64ISD::VS[LR]I.

PetreTudor marked an inline comment as done.Apr 15 2020, 5:49 AM
This revision was automatically updated to reflect the committed changes.

This seems to cause misoptimizations for me:

$ cat test.c
void rgb16to24(const unsigned char *src, unsigned char *dst, int src_size) {
    unsigned char *d = dst;
    const unsigned short *s = (const unsigned short *)src;
    const unsigned short *end = s + src_size / 2;

    while (s < end) {
        unsigned short bgr = *s++;
        *d++ = ((bgr&0xF800)>>8) | ((bgr&0xF800)>>13);
        *d++ = ((bgr&0x07E0)>>3) | ((bgr&0x07E0)>> 9);
        *d++ = ((bgr&0x001F)<<3) | ((bgr&0x001F)>> 2);
    }
}
$ clang -target aarch64-linux-gnu -S -O3 test.c -o test.s

There's quite a bit of difference in the output of the compiler for this function caused by this patch - I haven't yet traced it and tracked down exactly which bit of it is wrong, but the testsuite that contains this code reports unexpected values.

efriedma added inline comments.
llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll
5

Please generate simple tests like this with update_llc_test_checks.py.

This transform is simply wrong, as written: sli is not the vector version of bfi.

It appears that there were probably more things wrong with this optimization than just the condition.
I have reverted the patch for now so that I can take more time to make sure that I get this fix right.