This is an archive of the discontinued LLVM Phabricator instance.

AArch64: support arm64_32, an ILP32 slice for watchOS.
ClosedPublic

Authored by t.p.northover on Apr 29 2019, 6:36 AM.

Details

Summary

This is the main CodeGen patch to support the arm64_32 watchOS ABI on the LLVM side.

FastISel is mostly disabled for now since it would generate incorrect code for ILP32. Disabling it also allows for a more incremental approach to implementing that side since we have sensible fallbacks in place.

Diff Detail

Event Timeline

t.p.northover created this revision.Apr 29 2019, 6:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2019, 6:36 AM
Herald added subscribers: dang, jfb, arphaman and 8 others. · View Herald Transcript

Looks like there's no reviewers set here. @t.p.northover who should review this?

Looks like there's no reviewers set here. @t.p.northover who should review this?

Anyone reasonably active in the AArch64 backend. The primary review location is still the llvm-commits mailing list, so I don't tend to bother adding specific reviewers unless there's a completely obvious candidate.

Gerolf added a subscriber: Gerolf.

+ reviewers to move this forward hopefully

Rebased patch.

efriedma added inline comments.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
9739

Does this have any practical effect for other targets?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3737–3738

What's changing here? Does it make sense to add any comments?

llvm/test/CodeGen/AArch64/tail-call.ll
4–5

What are you trying to do here?

llvm/test/CodeGen/AArch64/win64_vararg.ll
272

There isn't any obvious reason for this test to change?

Thanks for upstreaming this, Tim.

At a high level, my main care about here is that what is upstreamed here doesn't conflict with also adding support at some point in the future for the AArch64 ILP32 ABI - for which a beta quality specification from Arm exists.

I browsed through the code changes. Without being a deep expert in a lot of the areas touched, I just shared a few thoughts in the few places where I saw some change that I couldn't easily come up with a plausible explanation for.

I'm assuming that there is nothing in here that would prevent adding support for non-Darwin AArch64 ILP32 later?

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
1131–1132

I assume this patch introduces support for the Darwin-specific arm64_32 ILP32 ABI.
That makes me wonder why there is a need to define/create the "getTheAArch64_32Target()". Doesn't "getTheARM64_32Target()" suffice to enable support for the Darwin-specific arm64_32 ILP32 ABI?

llvm/lib/Target/AArch64/AArch64ISelLowering.h
258–260

I found it surprising that this method doesn't loop for whether ILP32 is targeted or not to decide whether the pointer type is a 32 or 64 bit integer.
Would it be worthwhile to add a small comment here why the integer pointer type is always 64bits?
Maybe I'm missing something completely trivial though...

t.p.northover marked 6 inline comments as done.Jul 8 2019, 3:09 AM

At a high level, my main care about here is that what is upstreamed here doesn't conflict with also adding support at some point in the future for the AArch64 ILP32 ABI - for which a beta quality specification from Arm exists.

I think most of the changes should be reusable.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
9739

I don't think so. It's a correctness issue so if other targets were affected then they'd start seeing vregs without definitions and things.

I believe the driving arm64_32 feature is the fact that pointers are extended beyond 32-bits by the caller, which leads to some weird (but valid) DAG nodes trying to communicate that fact. Other targets only assertzext from invalid types.

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
1131–1132

I think it would be enough for production purposes, but the extra lines allow someone to write triple strings starting with aarch64_32 in tests and so on if they want to.

I still have a long-term goal to switch Clang over to using aarch64 as the canonical name in LLVM (aarch64-apple-ios14.0 etc). I was prevented when I tried years ago because ld64 wasn't ready, but I think I fixed that and I should try again some time.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3737–3738

We use the ability to lookup whether a register has already been added (use at line 3801 in this diff). If it has then we're trying to combine two parts of (say) a [2 x i32] into a single register for compatibility with armv7k IR and need to do the bitwise arithmetic to make that happen.

I don't know that a comment here would work (it would either be a historic note, or pre-empting what comes later). I'll try to do something to call it out unobtrusively at the use-point.

llvm/lib/Target/AArch64/AArch64ISelLowering.h
258–260

A comment would definitely be worthwhile. This is the function that enables the large change I upstreamed earlier so that pointer types in the DAG can remain 64-bits (to exploit addressing-modes available) and get truncated to 32-bits in memory.

llvm/test/CodeGen/AArch64/tail-call.ll
4–5

These functions are using the arrays purely to consume register space during a call, but after this change [8 x i32] only uses x0-x3 (in full). It's only an IR-level change (i.e. it doesn't affect C or C++ ABI), but I'll limit it to the arm64_32 target when I update the diff.

llvm/test/CodeGen/AArch64/win64_vararg.ll
272

I think it's because of the std::map change you called out above. It implicitly sorts the list of registers that get copied, perturbing the DAG and scheduling.

I think I'll switch it back to SmallVector and use std::find_if to handle the (rare) ARM compatibility instead. It ought to be faster in the common case and won't have this side-effect.

  1. Switched back to SmallVector to track registers used, reducing code perturbation. I decided to add a SmallSet too for queries instead of using std::find_if directly because searches actually happen regardless.
  2. Disabled special handling for [N x i32] except on arm64_32 MachO.
  3. Noticed a bug in 2 above, where we allocated 2N registers anyway and fixed it.
  4. Added getPointerType comment.
t.p.northover marked an inline comment as done.Jul 8 2019, 5:29 AM
t.p.northover added inline comments.
llvm/test/CodeGen/AArch64/win64_vararg.ll
272

Well, as you can see that accounted for a lot of the differences, but the fmovs still get reordered w.r.t. the store. I have no idea why this is: the DAG is identical and I'm reasonably sure it's harmless so I blame gremlins.

AlexDenisov added inline comments.
llvm/lib/Target/AArch64/AArch64Subtarget.h
430

When I compile LLVM with this patch applied I'm getting the error:

AArch64Subtarget.h:430:34: error: only virtual member functions can be marked 'override'
  bool addrSinkUsingGEPs() const override {
                                 ^~~~~~~~~

Removing the override keyword fixes it, but I'm curious where it comes from? I cannot see any usage of this method across the code base.

Ah sorry, it was part of an NFC change in a different commit. I've rolled it into this diff; it's splitting off GEP sinking from the useAA callback since they're not really related.

aemerson added inline comments.Aug 22 2019, 12:48 AM
llvm/lib/Target/AArch64/AArch64CallingConvention.cpp
114–117

Comment here explaining why we need this? I'm guessing for passing arrays of i32s & compatibility with armv7k?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3398

Why does this happen at all?

3737–3738

Looks like we still need some form of documentation here at RegUsed use?

llvm/test/CodeGen/AArch64/arm64-collect-loh.ll
163

Can you add a comment here explaining that inbounds is needed for arm64_32 to produce the same code.

llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll
22–28

It's odd that this test changed?

llvm/test/CodeGen/AArch64/arm64_32-neon.ll
7

Is anything really expected to change with NEON & arm64_32?

t.p.northover marked 7 inline comments as done.Sep 9 2019, 4:13 AM
t.p.northover added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3398

When lowering a return of (say) [2 x i32], the two components will get mapped to (X0, AExtUpper) and (X0, ZExt), a duplication of X0.

llvm/test/CodeGen/AArch64/arm64-collect-loh.ll
163

I'd kind of prefer not to. It's incidental to this test but the main complication of arm64_32 CodeGen, so I don't think it's really what a random reader of this test is going to be looking for.

llvm/test/CodeGen/AArch64/arm64-zero-cycle-zeroing.ll
22–28

Yes, it's not needed now (I have no idea why it was) so I've reverted these changes.

llvm/test/CodeGen/AArch64/arm64_32-neon.ll
7

Not to change particularly, but this is in some sense the parts of NEON that could be affected by arm64_32: ABI boundaries and load/store addressing-modes.

Testing it separately avoids duplicating the AArch64 tests that really aren't different (e.g. arithmetic) but still gives us the coverage.

t.p.northover marked an inline comment as done.

Added comments mostly, and undid some changes to tests.

aemerson accepted this revision.Sep 11 2019, 4:20 PM

Everything else seems reasonable to me.

This revision is now accepted and ready to land.Sep 11 2019, 4:20 PM
t.p.northover closed this revision.Sep 12 2019, 3:22 AM

Thanks Amara, it's r371722.

loladiro added inline comments.
llvm/lib/Target/AArch64/AArch64FastISel.cpp
531

@t.p.northover Out of curiosity, what is this assertion guarding against? In our frontend we use non-zero address spaces to indicate GC-tracked pointers, which we expect to be ignored by the backend (stripping them is possible of course, but the performance impact is surprisingly high). As far as I can tell this is the only place in the Aarch64 backend that looks at address spaces and we encountered it when trying to port to Apple Silicon (not quite sure why nobody complained on linux, but maybe people didn't run with assertions).

t.p.northover added inline comments.Feb 25 2021, 2:16 AM
llvm/lib/Target/AArch64/AArch64FastISel.cpp
531

I'm afraid I don't remember, but it looks overcautious to me now as well. I've just removed the check from main.