This custom lowers <4 x i8> vector loads using a 32-bit load, followed by 2 SSHLL instructions to extend it to a <4 x i32> vector. Before, it was really inefficient and expensive to construct a <4 x i32> for this as 4 byte loads and 4 moves were used. With this improvement SLP vectorisation might for example become profitable, see D103629.
Details
Diff Detail
Event Timeline
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
---|---|---|
38 | I am trying to remember how big-endian works in LLVM, but since I noticed these reverse here, this looked okay'ish to me, but I haven't tested BE. Any opinions on this welcome (while I look a bit more at this).... |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1129 | What about v4i16 as well? And EXTLOAD (which is probably fine to treat as a ZEXTLOAD). | |
4481 | It may be worth checking or asserting that the type is v4i32/v4i16. Also DL is more common. | |
4502 | SIGN_EXTEND/ZERO_EXTEND do not need a second VT argument, I don't believe. | |
4506 | ISD::SIGN_EXTEND > ExtType? |
Missing testcases for load+ext to <4 x i16>.
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
---|---|---|
38 | Looks fine to me. The rev32 comes out of lowering the ISD::BITCAST. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1129 | I thought the v4i16 were already handled, but will double check and precommit some tests for this (the new test in this patch, extended with v4i16 cases ) if we don't have them already. Yeah, I thought about EXTLOAD, but wasn't sure how to trigger this, but will look into this. | |
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
38 | Thanks for confirming! |
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
---|---|---|
47 | Ahh, just spotted that this is a regression. |
LGTM
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
---|---|---|
47 | I'm not really concerned; IR-level optimizations should catch this. |
Fixed that regression by looking if there is one use that is an vector_extract_elt. But I can remove it if you think this is not necessary.
I don't really care either way...
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
4493 | Op.hasOneUse() is very different from Op->hasOneUse(). |
Yeah I wouldn't worry. There may be other ways to fix it if we need, to do with demanded elements of a load. But this code will likely not come up in practice.
llvm/test/CodeGen/AArch64/neon-extload.ll | ||
---|---|---|
47 | I would perhaps remove this dot, as dots in function names are a little unusual. |
@SjoerdMeijer Since this commit, test-suite::GCC-C-execute-pr60960.test has been failing on our AArch64 bots:
https://lab.llvm.org/buildbot/#/builders/185/builds/40
(we moved them around recently so I think we missed building this commit when it first landed)
Owwww....... thanks for reporting. I am looking into this. I will do a first finger on the pulse before I revert this since I haven't heard any other complaints, but let me know if you prefer to first revert it.
There is something going on that I will have to look at tomorrow, so will revert this.
What about v4i16 as well? And EXTLOAD (which is probably fine to treat as a ZEXTLOAD).