This is an archive of the discontinued LLVM Phabricator instance.

[ARM] push LR before __gnu_mcount_nc
ClosedPublic

Authored by jcai19 on Jul 19 2019, 2:35 PM.

Details

Summary

__gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32.
Clang currently does not push the value before the call.

Fixes PR33845.

Diff Detail

Repository
rL LLVM

Event Timeline

jcai19 created this revision.Jul 19 2019, 2:35 PM
jcai19 updated this revision to Diff 210904.Jul 19 2019, 2:41 PM

Update the commit message.

jcai19 retitled this revision from push LR before mcount on ARM to [ARM] push LR before __gnu_mcount_nc on ARM.Jul 19 2019, 2:45 PM
jcai19 edited the summary of this revision. (Show Details)
jcai19 edited the summary of this revision. (Show Details)Jul 19 2019, 2:56 PM
jcai19 added reviewers: kristof.beyls, efriedma.

Thanks for this!

I realize I'm not a reviewer, but have a few nits in passing anyway. :)

I don't plan to stamp this because I've zero familiarity here.

llvm/lib/Target/ARM/CustomCallLoweringPass.cpp
21 ↗(On Diff #210904)

nit: when overriding a virtual function,

bool runOnMachineFunction(MachineFunction &MF) override {

is preferred over

virtual bool runOnMachineFunction(MachineFunction &MF) {

25 ↗(On Diff #210904)

nit: can this be iterated over via a range-for loop?

for (MachineBasicBlock &MBB : MF) {
  for (MachineInstruction &MI : MBB) {
    // ...
  }
}
29 ↗(On Diff #210904)

nit: similar range-for question over something like BBI->operands()?

35 ↗(On Diff #210904)

please clang-format all patches

I don't like introducing magic calling convention rules for a non-intrinsic function with a specific name; we should model this more explicitly somehow. Probably the simplest thing to do would be to introduce an ARM intrinsic, lower it to a pseudo-instruction, and expand it late (in ExpandPostRAPseudos or something like that).

jcai19 updated this revision to Diff 210921.Jul 19 2019, 3:55 PM

Add additional test on GV and update format.

jcai19 marked an inline comment as done.Jul 19 2019, 3:57 PM
jcai19 marked 3 inline comments as done.

I don't like introducing magic calling convention rules for a non-intrinsic function with a specific name; we should model this more explicitly somehow. Probably the simplest thing to do would be to introduce an ARM intrinsic, lower it to a pseudo-instruction, and expand it late (in ExpandPostRAPseudos or something like that).

Thanks for the suggestions, will start work on it.. I also realized this solution may have other issues that @t.p.northover kindly pointed out in another code review that I accidentally creates (https://reviews.llvm.org/D65037) for the same change.

jcai19 updated this revision to Diff 213154.Aug 2 2019, 5:53 PM

Introduce new ARMISD node for __gnu_mcount_nc.

jcai19 updated this revision to Diff 213159.Aug 2 2019, 6:01 PM

Remove an unnecssary blank line.

jcai19 added a comment.Aug 2 2019, 6:09 PM

I don't like introducing magic calling convention rules for a non-intrinsic function with a specific name; we should model this more explicitly somehow. Probably the simplest thing to do would be to introduce an ARM intrinsic, lower it to a pseudo-instruction, and expand it late (in ExpandPostRAPseudos or something like that).

So I have implemented the codegen part as suggested. To add a new intrinsic for this call seems would introduce quite some code to the target-independent part of SelectionDAG, specifically SelectionDAGBuilder and SelectionDAG classes, just for ARM. Is that wanted? Thanks.

introduce quite some code to the target-independent part of SelectionDAG, specifically SelectionDAGBuilder and SelectionDAG classes

Not sure why you think this would be necessary. Every target has target-specific intrinsics, and we have infrastructure to ensure they don't require any changes to target-independent code.

llvm/lib/Target/ARM/ARMFastISel.cpp
2417 ↗(On Diff #213159)

80 cols

jcai19 updated this revision to Diff 213728.Aug 6 2019, 2:48 PM

Introduce a new ARM intrinsic for __gnu_mcount_nc.

Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2019, 2:48 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jcai19 added a comment.Aug 6 2019, 2:54 PM

introduce quite some code to the target-independent part of SelectionDAG, specifically SelectionDAGBuilder and SelectionDAG classes

Not sure why you think this would be necessary. Every target has target-specific intrinsics, and we have infrastructure to ensure they don't require any changes to target-independent code.

I guess what I was trying to say was that I could not find a way to handle the new ARM intrinsic in SelectionDAGBuilder::visitTargetIntrinsic easily with the existing code so I had to introduce a case just for the ARM intrinsic in SelectionDAGBuilder::visitIntrinsic, and I am not sure if that is acceptable.

Thanks so much for this patch! I look forward to support for __gnu_mcount for the arm32 Linux kernel.

What a horrible function. AAPCS? Who cares about that?

haha

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1944 ↗(On Diff #213159)

Here down seems to match the previous case? Maybe you could "Dont Repeat Yourself (DRY)" up the code by creating a shared function?

1950 ↗(On Diff #213159)

No need for {} for single statement blocks.

llvm/lib/Target/ARM/ARMFastISel.cpp
211 ↗(On Diff #213159)

How many other call sites would have to be updated if this was not a default parameter?

2417 ↗(On Diff #213159)

also, what's up with \01?

llvm/lib/Target/ARM/ARMISelLowering.cpp
2298 ↗(On Diff #213159)

Ditto on single statement bodies. {}

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
5 ↗(On Diff #213159)

Don't we want to check that these occur in a parent function before a call to a child function?

6 ↗(On Diff #213159)

why does the -NOT case duplicate the non-NOT case?

jcai19 updated this revision to Diff 213748.Aug 6 2019, 4:36 PM
jcai19 marked 6 inline comments as done.

Update based on comments.

jcai19 marked an inline comment as done.Aug 6 2019, 4:37 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMFastISel.cpp
211 ↗(On Diff #213159)

There should be only 2 references to this function. But this argument is likely to be false in most cases.

2417 ↗(On Diff #213159)

I am not sure why the name is prefixed with \01 either but it was there in the original code.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
6 ↗(On Diff #213159)

They are unnecessary, I have removed them.

jcai19 marked 2 inline comments as done.Aug 6 2019, 4:38 PM
nickdesaulniers added inline comments.Aug 6 2019, 5:05 PM
llvm/lib/Target/ARM/ARMFastISel.cpp
211 ↗(On Diff #213159)

Then I'd just make it an explicit arg and update the 2 call sites. If there were many call sites, then the default param would cut down on code churn, but I don't think 2 call sites is unreasonable to just be explicit about such arguments.

2418 ↗(On Diff #213748)

memcmp is a code smell in a C++ codebase, but I see that IntrinsicName is a C style string. Is there a reason why strncmp isn't used?

2587 ↗(On Diff #213748)

{} are not needed here since you're not introducing a new scope for variables.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
11 ↗(On Diff #213748)

This test case can probably be simplified to just a call and ret void.

jcai19 updated this revision to Diff 213762.Aug 6 2019, 5:34 PM

Update based on comments.

jcai19 marked 4 inline comments as done.Aug 6 2019, 5:38 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMFastISel.cpp
2418 ↗(On Diff #213748)

Good catch! I was thinking about one of str* functions and somehow ended up using memcpy. Guess I haven't written C code for a while :). Anyway, maybe strcmp is better here as the size of the two strings should match too?

jcai19 retitled this revision from [ARM] push LR before __gnu_mcount_nc on ARM to [ARM] push LR before __gnu_mcount_nc.Aug 6 2019, 5:41 PM

This seems better.

I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.)

I'd just skip changing fast-isel; with an intrinsic, if fast-isel misses, we just fall back to the SelectionDAG code that does the right thing.

nickdesaulniers marked an inline comment as done.
nickdesaulniers removed a subscriber: nickdesaulniers.
nickdesaulniers added inline comments.
clang/lib/Basic/Targets/ARM.cpp
325 ↗(On Diff #213762)

Doesn't require changes, but for anyone curious about the \01, see the comment in MangleContext::mangleName.

jcai19 marked an inline comment as done.EditedAug 7 2019, 6:21 PM

This seems better.

I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.)

Thanks for the suggestion, and I agree the code would look cleaner this way. But I have some questions on implementation details, and please bear with me if they seem naive since I am new to backend. So I have been trying to reuse the code of ARMTargetLowering::LowerCall to build a SelectionDAG call node for the new intrinsic, which essentially is a function call with a push instruction before. This is also how some intrinsics like memcpy or memet implemented, which get lowered to target-specific calls at SelectionDAGBuilder::visitIntrinsicCall. If we wait until ARMTargetLowering::LowerOperation when legalizing DAGs to lower the intrinsic, then I am not sure if we can still reuse the code, as some of the information needed is gone by this stage. I did see code handling ARM intrinsic on ARMTargetLowering::LowerOperation, but they didn't seem to need to generate function calls later.

Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.

Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.

Thanks for the clarification. I will look into it.

jcai19 updated this revision to Diff 214263.Aug 8 2019, 4:47 PM

Lower the new intrinsic when legalizing DAGs.

jcai19 marked an inline comment as done.Aug 8 2019, 4:58 PM
jcai19 added inline comments.
clang/lib/Basic/Targets/ARM.cpp
325 ↗(On Diff #213762)

Thanks for the reference!

I've just added a few fly-by nits; I'm afraid I didn't do an in-depth review.

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1156 ↗(On Diff #214263)

I wonder whether this is a good debug printing line to commit?
IIUC, this will print every MI instruction that gets looked at by ArmExpandPseudo.
I would imagine that that could produce too much noise. It'd be more interesting if only the MIs that actually got transformed would be printed.
But maybe best to just not add this debug printing line in this patch?

1931–1932 ↗(On Diff #214263)

Did you clang-format the patch?

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
1–2 ↗(On Diff #214263)

Given that the push-lr transform only gets implemented for DAGISel (IIUC), maybe it'd be useful to also have test run lines that check the correct thing happens when using fastisel and globalisel (presumably by falling back to DAGISel)?

jcai19 updated this revision to Diff 214491.Aug 9 2019, 6:49 PM

Lower the intrinsic to pseudo instructions directly, instead of SelectDAG nodes first.

jcai19 marked 4 inline comments as done.Aug 9 2019, 6:54 PM

@efriedma I have changed my implementation to lower llvm.gnu.eabi.mcount intrinsic into pseudo instructions directly, instead of first lowering them into SelectionDAG call nodes. Thanks.

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1156 ↗(On Diff #214263)

Sorry, I forgot to remove it. I was using it to debug my change locally.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
1–2 ↗(On Diff #214263)

That's a good point. Checks added for fast-isel and global-isel.

jcai19 updated this revision to Diff 214496.Aug 9 2019, 7:04 PM
jcai19 marked 2 inline comments as done.

clang-format the patch.

jcai19 marked an inline comment as done.Aug 9 2019, 7:05 PM
kristof.beyls added inline comments.Aug 12 2019, 12:04 AM
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
1–6 ↗(On Diff #214496)

It seems the -fast-isel/-global-isel command line options are missing in the RUN lines aiming to test fast and global isel do the right thing?

jcai19 marked an inline comment as done.Aug 12 2019, 12:01 PM
jcai19 added inline comments.
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
1–6 ↗(On Diff #214496)

Sorry must have forgotten to add the instruction selection options while copying the RUN commands. Just tested locally and -fast-isel option worked, although -global-isel failed due to "LLVM ERROR: unable to map instruction: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.arm.gnu.eabi.mcount)". It seems global-isel does not fall back to DAGISel? Will have to investigate further.

It seems global-isel does not fall back to DAGISel?

It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?

jcai19 updated this revision to Diff 214696.Aug 12 2019, 1:07 PM

Add proper instruction selection options to unit test.

It seems global-isel does not fall back to DAGISel?

It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?

Yes -global-isel-abort=2 fixed the issue. Thanks!

jcai19 updated this revision to Diff 214698.Aug 12 2019, 1:19 PM

Added back an accidently-deleted blank line.

llvm/lib/Target/ARM/ARMISelLowering.cpp
3485 ↗(On Diff #214698)

Op.getOperand(0).getValueType() == MVT::Other ? 1 : 0 could be replaced with Op.getOperand(0).getValueType() == MVT::Other

3487 ↗(On Diff #214698)

Why construct dl if we don't use it in the default case, or under certain conditions below? Maybe move the definition closer to its use below. Though I see temporary SDLoc(Op) below, which should be sufficient (so you can remove dl).

efriedma added inline comments.Aug 12 2019, 1:32 PM
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1927 ↗(On Diff #214696)

I think you need to ensure that lr actually contains the correct value, somehow. Normally the call will come before anything that would clobber lr, but you're not actually enforcing that anywhere: LR isn't listed as an input to BL_PUSHLR.

To make this work correctly, I think the return address actually needs to be an argument to the BL_PUSHLR instruction. See ARMTargetLowering::LowerRETURNADDR for how to make an appropriate copy.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
6 ↗(On Diff #214696)

Please add -verify-machineinstrs to all these invocations.

jcai19 updated this revision to Diff 214884.Aug 13 2019, 11:04 AM

Mark LR as live-in at (t)BL_PUSHLR instruction.

jcai19 marked 4 inline comments as done.Aug 13 2019, 11:13 AM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1927 ↗(On Diff #214696)

My takeaway from your comment is to mark LR explicitly alive to make sure compiler will restore LR if it clobbers the register before the BL_PUSHLR instruction. Did I understand correctly? Thanks. Anyway, this change seems to be needed once I turned on -verify-machineinstrs in the unit test, which complained the push/stmdb instruction trying to use a dead LR register.

jcai19 updated this revision to Diff 214889.Aug 13 2019, 11:27 AM

clang-format

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1931 ↗(On Diff #214889)

should there be a space in this comment (and the one on line 1941) between bl and __gnu_mcount_nc?

jcai19 updated this revision to Diff 214908.Aug 13 2019, 1:40 PM

Fix a typo.

jcai19 marked 2 inline comments as done.Aug 13 2019, 1:41 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1931 ↗(On Diff #214889)

Yes you are right. Thanks!

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1922 ↗(On Diff #214908)

This could be Register ReturnReg = MI.getOperand(0).getReg(); then the below cleaned up. DRY

(and a few more opportunities in the return values of ARMTargetLowering::LowerINTRINSIC_VOID)

With that change, LGTM, and thank you for the patch!

jcai19 updated this revision to Diff 214945.Aug 13 2019, 3:18 PM
jcai19 marked an inline comment as done.

Updates based on comments.

jcai19 marked an inline comment as done.Aug 13 2019, 3:19 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1922 ↗(On Diff #214908)

Thank you for all the comments! I have made changes accordingly.

nickdesaulniers accepted this revision.Aug 14 2019, 2:49 PM
nickdesaulniers added a subscriber: eli.friedman.

Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging.

This revision is now accepted and ready to land.Aug 14 2019, 2:49 PM

Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging.

Sounds good! Thanks for all the comments.

This revision was automatically updated to reflect the committed changes.
This comment was removed by jcai19.
jcai19 reopened this revision.Aug 16 2019, 4:22 PM
This revision is now accepted and ready to land.Aug 16 2019, 4:22 PM
jcai19 updated this revision to Diff 215710.Aug 16 2019, 4:25 PM

Fix frontend mcount unit tests.

jcai19 closed this revision.Aug 16 2019, 4:34 PM

Upsteamed to r369173.