This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Custom lower <4 x i8> loads
ClosedPublic

Authored by SjoerdMeijer on Jun 23 2021, 6:33 AM.

Details

Summary

This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2 SSHLL instructions to extend it to a <4 x i32> vector. Before, it was really inefficient and expensive to construct a <4 x i32> for this as 4 byte loads and 4 moves were used. With this improvement SLP vectorisation might for example become profitable, see D103629.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jun 23 2021, 6:33 AM
SjoerdMeijer requested review of this revision.Jun 23 2021, 6:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2021, 6:33 AM
SjoerdMeijer added inline comments.Jun 23 2021, 6:35 AM
llvm/test/CodeGen/AArch64/neon-extload.ll
38

I am trying to remember how big-endian works in LLVM, but since I noticed these reverse here, this looked okay'ish to me, but I haven't tested BE. Any opinions on this welcome (while I look a bit more at this)....

dmgreen added inline comments.Jun 23 2021, 7:50 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1129

What about v4i16 as well? And EXTLOAD (which is probably fine to treat as a ZEXTLOAD).

4485

It may be worth checking or asserting that the type is v4i32/v4i16.

Also DL is more common.

4506

SIGN_EXTEND/ZERO_EXTEND do not need a second VT argument, I don't believe.

4510

ISD::SIGN_EXTEND > ExtType?

Missing testcases for load+ext to <4 x i16>.

llvm/test/CodeGen/AArch64/neon-extload.ll
38

Looks fine to me. The rev32 comes out of lowering the ISD::BITCAST.

SjoerdMeijer added inline comments.Jun 24 2021, 1:00 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1129

I thought the v4i16 were already handled, but will double check and precommit some tests for this (the new test in this patch, extended with v4i16 cases ) if we don't have them already.

Yeah, I thought about EXTLOAD, but wasn't sure how to trigger this, but will look into this.

llvm/test/CodeGen/AArch64/neon-extload.ll
38

Thanks for confirming!

Matt added a subscriber: Matt.Jun 24 2021, 10:56 AM

Address comments.

SjoerdMeijer added inline comments.Jun 24 2021, 11:41 AM
llvm/test/CodeGen/AArch64/neon-extload.ll
0

Ahh, just spotted that this is a regression.
Will look into this.

efriedma accepted this revision.Jun 24 2021, 11:56 AM

LGTM

llvm/test/CodeGen/AArch64/neon-extload.ll
0

I'm not really concerned; IR-level optimizations should catch this.

This revision is now accepted and ready to land.Jun 24 2021, 11:56 AM

Fixed that regression by looking if there is one use that is an vector_extract_elt. But I can remove it if you think this is not necessary.

I don't really care either way...

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4497

Op.hasOneUse() is very different from Op->hasOneUse().

Fixed that regression by looking if there is one use that is an vector_extract_elt. But I can remove it if you think this is not necessary.

Yeah I wouldn't worry. There may be other ways to fix it if we need, to do with demanded elements of a load. But this code will likely not come up in practice.

llvm/test/CodeGen/AArch64/neon-extload.ll
0

I would perhaps remove this dot, as dots in function names are a little unusual.

Okay, will remove this before committing (and change that function name in that test).

Thanks for the suggestions and help with this work @efriedma and @dmgreen !

This revision was landed with ongoing or failed builds.Jun 25 2021, 1:54 AM
This revision was automatically updated to reflect the committed changes.

@SjoerdMeijer Since this commit, test-suite::GCC-C-execute-pr60960.test has been failing on our AArch64 bots:
https://lab.llvm.org/buildbot/#/builders/185/builds/40

(we moved them around recently so I think we missed building this commit when it first landed)

https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Regression/C/gcc-c-torture/execute/pr60960.c

@SjoerdMeijer Since this commit, test-suite::GCC-C-execute-pr60960.test has been failing on our AArch64 bots:
https://lab.llvm.org/buildbot/#/builders/185/builds/40

(we moved them around recently so I think we missed building this commit when it first landed)

https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Regression/C/gcc-c-torture/execute/pr60960.c

Owwww....... thanks for reporting. I am looking into this. I will do a first finger on the pulse before I revert this since I haven't heard any other complaints, but let me know if you prefer to first revert it.

There is something going on that I will have to look at tomorrow, so will revert this.