This is an archive of the discontinued LLVM Phabricator instance.

[ARM] GlobalISel: Load i1, i8 and i16 args from stack
ClosedPublic

Authored by rovka on Dec 15 2016, 6:26 AM.

Details

Summary

Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.

When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel (it's not entirely clear to
me which calling conventions and in what circumstances may be lacking the ext
flags, feel free to point it out).

Diff Detail

Repository
rL LLVM

Event Timeline

rovka updated this revision to Diff 81569.Dec 15 2016, 6:26 AM
rovka retitled this revision from to [ARM] GlobalISel: Load i1, i8 and i16 args from stack.
rovka updated this object.
rovka added reviewers: ab, qcolombet, t.p.northover.
rovka added subscribers: llvm-commits, rengolin, dsanders.
t.p.northover added inline comments.Dec 16 2016, 11:55 AM
lib/Target/ARM/ARMCallLowering.cpp
134–142 ↗(On Diff #81569)

Is this necessary? Just because the caller stored 4 bytes doesn't mean we have to load 4. You might make some kind of efficiency argument if you expect most arithmetic to be done on i32 (as in C code), but I don't think this really helps there anyway because the G_SEXT/G_ZEXT will still exist in the MIR.

Also, if this really is necessary the MMO is misrecording the size of the memory operation.

rovka added a comment.Dec 19 2016, 7:43 AM
lib/Target/ARM/ARMCallLowering.cpp
134–142 ↗(On Diff #81569)

Well, actually we're not generating a G_ZEXT or G_SEXT here... Should we?

It looks like the simplest thing to do, but OTOH it's the caller's job to generate the extension, so if we did anything here we'd just be duplicating the caller's work (and introducing a potential source of bugs).

I'm not sure what the best course of action is:

  • Keep the size and generate the G_LOAD + G_ZEXT/G_SEXT instructions
  • Generate G_LOAD with size 4
  • Generate LDRi12 directly as in this patch

Are we breaking any assumptions if we choose one of the last 2? Would that be premature optimization?

rovka updated this revision to Diff 85007.Jan 19 2017, 12:27 PM
rovka edited the summary of this revision. (Show Details)

I've been thinking about this a bit more and I think it's better to generate a G_LOAD, but for a size of 4 so that we preserve the extension performed by the caller.
This is achieved by changing the type of the virtual register that we're supposed to load into to a 32-bit scalar (which is sufficient for now, since we only support scalars at the moment). I've also fixed the size of the memory operand, which was wrong in the previous versions of the patch.
Let me know what you think.
Thanks.

rengolin added inline comments.Jan 23 2017, 9:02 AM
lib/Target/ARM/ARMCallLowering.cpp
134–142 ↗(On Diff #81569)

I believe we should start with generating the right load (B/H/W) and Z/S-extend.

It may duplicate the caller's work (since it was their job), but it's also safer if the caller is not following the ABI.

Why would that matter, well, because we support multiple ABIs, and I don't claim to know all of them by heart.

So, we implement the safe option now, and later on optimise for specific ABIs if we can guarantee it'll be safe there, too.

rovka added inline comments.Jan 25 2017, 2:18 AM
lib/Target/ARM/ARMCallLowering.cpp
134–142 ↗(On Diff #81569)

I'm not sure I understand why you think it's safer to duplicate the caller's work. Wherever there's duplication, there's the possibility of going out-of-sync between the two sides, and I don't think guarding against the caller not following the ABI is worth that risk. Also, by doing the extension to 32 bits we're already assuming *something* about the ABI. Are you thinking about something in particular about an ABI that would break if we just load those 32 bits? FWIW, the current instruction selection just loads 32 bits for every calling convention supported by ARM (except for ghccc, which just asserts).

rengolin accepted this revision.Jan 25 2017, 3:21 AM
rengolin added inline comments.
lib/Target/ARM/ARMCallLowering.cpp
134–142 ↗(On Diff #81569)

I was under the impression that Tim had knowledge of some ABI (Apple) that didn't follow that rule. Re-reading his comment now it seems it was just a point to optimisation, which should not be our worry for now.

If the current selector does this, than not changing behaviour is the way to go. Bonus points for following the ABI. :)

This revision is now accepted and ready to land.Jan 25 2017, 3:21 AM
This revision was automatically updated to reflect the committed changes.