This is an archive of the discontinued LLVM Phabricator instance.

ARM64: improve non-zero memset isel by ~2x
ClosedPublic

Authored by jfb on Sep 5 2018, 3:50 PM.

Details

Summary

I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated
where the generated code was bad. This patch fixes the majority of the issues by
requesting that a 2xi64 vector be used for memset of 32 bytes and above.

The patch leaves the former request for f128 unchanged, despite f128
materialization being suboptimal: doing otherwise runs into other asserts in
isel and makes this patch too broad.

This patch hides the issue that was present in bzero_40_stack and bzero_72_stack
because the code now generates in a better order which doesn't have the store
offset issue. I'm not aware of that issue appearing elsewhere at the moment.

rdar://problem/44157755

Diff Detail

Repository
rL LLVM

Event Timeline

jfb created this revision.Sep 5 2018, 3:50 PM
MatzeB accepted this revision.Sep 5 2018, 4:06 PM

Nice change, LGTM!
Maybe wait a couple days before committing in case someone better versed with SelectionDAG wants to chime in.

lib/Target/AArch64/AArch64ISelLowering.cpp
5198–5220 ↗(On Diff #164119)

unrelated, so maybe split this into a separate commit when this review is approved (very nice change though)

This revision is now accepted and ready to land.Sep 5 2018, 4:06 PM
jfb updated this revision to Diff 164125.Sep 5 2018, 4:40 PM
jfb marked an inline comment as done.
  • Rebase
lib/Target/AArch64/AArch64ISelLowering.cpp
5198–5220 ↗(On Diff #164119)

I committed it in r341504.

If I'm understanding correctly, the reason the f128 operations aren't efficient is that float immediate lowering doesn't know how to use movi? That seems like it would be more straightforward to solve by fixing the float immediate lowering, rather than messing with memset lowering.

jfb added a comment.EditedSep 5 2018, 4:47 PM

If I'm understanding correctly, the reason the f128 operations aren't efficient is that float immediate lowering doesn't know how to use movi? That seems like it would be more straightforward to solve by fixing the float immediate lowering, rather than messing with memset lowering.

I initially looked at this but it wasn't more straightforward. It's also odd to me because we know memset is i8, so v2xi64 clearly works, but we don't know that other values will work as f128 or anything else (and we don't have the value to look at here). We also know that ARM64 likes pairs of i64, whereas f128 is inherently something you'd need to load from a constant pool unless you're really lucky.

jfb added inline comments.Sep 5 2018, 4:52 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
8350 ↗(On Diff #164125)

I'd like to point out how this line doesn't match the above comment:

Don't use AdvSIMD to implement 16-byte memset.

efriedma accepted this revision.Sep 5 2018, 5:25 PM

LGTM

I initially looked at this but it wasn't more straightforward

Well, even if it isn't simple, I would guess it's useful for f32 and f64 anyway. Maybe not so much for f1128, though.

efriedma added inline comments.Sep 5 2018, 5:44 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
8347 ↗(On Diff #164125)

Err, just realized one minor thing; technically, I guess you're supposed to check hasNEON() for v2i64, not hasFPARMv8(). Not really a big deal, though; they're essentially the same in practice.

jfb updated this revision to Diff 164143.Sep 5 2018, 8:16 PM
  • can use NEON
jfb marked an inline comment as done.Sep 5 2018, 8:16 PM
jfb added inline comments.
lib/Target/AArch64/AArch64ISelLowering.cpp
8347 ↗(On Diff #164125)

Interesting! I split it up in 2, I assume that f128 requires the previous flag.

This revision was automatically updated to reflect the committed changes.
jfb marked an inline comment as done.