Page MenuHomePhabricator

[ARM] push LR before __gnu_mcount_nc
ClosedPublic

Authored by jcai19 on Jul 19 2019, 2:35 PM.

Details

Summary

__gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32.
Clang currently does not push the value before the call.

Fixes PR33845.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jcai19 updated this revision to Diff 213748.Aug 6 2019, 4:36 PM
jcai19 marked 6 inline comments as done.

Update based on comments.

jcai19 marked an inline comment as done.Aug 6 2019, 4:37 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMFastISel.cpp
211 ↗(On Diff #213159)

There should be only 2 references to this function. But this argument is likely to be false in most cases.

2417 ↗(On Diff #213159)

I am not sure why the name is prefixed with \01 either but it was there in the original code.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
7

They are unnecessary, I have removed them.

jcai19 marked 2 inline comments as done.Aug 6 2019, 4:38 PM
nickdesaulniers added inline comments.Aug 6 2019, 5:05 PM
llvm/lib/Target/ARM/ARMFastISel.cpp
2418 ↗(On Diff #213748)

memcmp is a code smell in a C++ codebase, but I see that IntrinsicName is a C style string. Is there a reason why strncmp isn't used?

2587 ↗(On Diff #213748)

{} are not needed here since you're not introducing a new scope for variables.

211 ↗(On Diff #213159)

Then I'd just make it an explicit arg and update the 2 call sites. If there were many call sites, then the default param would cut down on code churn, but I don't think 2 call sites is unreasonable to just be explicit about such arguments.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
12

This test case can probably be simplified to just a call and ret void.

jcai19 updated this revision to Diff 213762.Aug 6 2019, 5:34 PM

Update based on comments.

jcai19 marked 4 inline comments as done.Aug 6 2019, 5:38 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMFastISel.cpp
2418 ↗(On Diff #213748)

Good catch! I was thinking about one of str* functions and somehow ended up using memcpy. Guess I haven't written C code for a while :). Anyway, maybe strcmp is better here as the size of the two strings should match too?

jcai19 retitled this revision from [ARM] push LR before __gnu_mcount_nc on ARM to [ARM] push LR before __gnu_mcount_nc.Aug 6 2019, 5:41 PM

This seems better.

I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.)

I'd just skip changing fast-isel; with an intrinsic, if fast-isel misses, we just fall back to the SelectionDAG code that does the right thing.

nickdesaulniers marked an inline comment as done.
nickdesaulniers removed a subscriber: nickdesaulniers.
nickdesaulniers added inline comments.
clang/lib/Basic/Targets/ARM.cpp
326

Doesn't require changes, but for anyone curious about the \01, see the comment in MangleContext::mangleName.

jcai19 marked an inline comment as done.EditedAug 7 2019, 6:21 PM

This seems better.

I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.)

Thanks for the suggestion, and I agree the code would look cleaner this way. But I have some questions on implementation details, and please bear with me if they seem naive since I am new to backend. So I have been trying to reuse the code of ARMTargetLowering::LowerCall to build a SelectionDAG call node for the new intrinsic, which essentially is a function call with a push instruction before. This is also how some intrinsics like memcpy or memet implemented, which get lowered to target-specific calls at SelectionDAGBuilder::visitIntrinsicCall. If we wait until ARMTargetLowering::LowerOperation when legalizing DAGs to lower the intrinsic, then I am not sure if we can still reuse the code, as some of the information needed is gone by this stage. I did see code handling ARM intrinsic on ARMTargetLowering::LowerOperation, but they didn't seem to need to generate function calls later.

Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.

Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.

Thanks for the clarification. I will look into it.

jcai19 updated this revision to Diff 214263.Aug 8 2019, 4:47 PM

Lower the new intrinsic when legalizing DAGs.

jcai19 marked an inline comment as done.Aug 8 2019, 4:58 PM
jcai19 added inline comments.
clang/lib/Basic/Targets/ARM.cpp
326

Thanks for the reference!

I've just added a few fly-by nits; I'm afraid I didn't do an in-depth review.

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1157

I wonder whether this is a good debug printing line to commit?
IIUC, this will print every MI instruction that gets looked at by ArmExpandPseudo.
I would imagine that that could produce too much noise. It'd be more interesting if only the MIs that actually got transformed would be printed.
But maybe best to just not add this debug printing line in this patch?

1931–1932

Did you clang-format the patch?

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
2–3

Given that the push-lr transform only gets implemented for DAGISel (IIUC), maybe it'd be useful to also have test run lines that check the correct thing happens when using fastisel and globalisel (presumably by falling back to DAGISel)?

jcai19 updated this revision to Diff 214491.Aug 9 2019, 6:49 PM

Lower the intrinsic to pseudo instructions directly, instead of SelectDAG nodes first.

jcai19 marked 4 inline comments as done.Aug 9 2019, 6:54 PM

@efriedma I have changed my implementation to lower llvm.gnu.eabi.mcount intrinsic into pseudo instructions directly, instead of first lowering them into SelectionDAG call nodes. Thanks.

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1157

Sorry, I forgot to remove it. I was using it to debug my change locally.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
2–3

That's a good point. Checks added for fast-isel and global-isel.

jcai19 updated this revision to Diff 214496.Aug 9 2019, 7:04 PM
jcai19 marked 2 inline comments as done.

clang-format the patch.

jcai19 marked an inline comment as done.Aug 9 2019, 7:05 PM
kristof.beyls added inline comments.Aug 12 2019, 12:04 AM
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
2–7

It seems the -fast-isel/-global-isel command line options are missing in the RUN lines aiming to test fast and global isel do the right thing?

jcai19 marked an inline comment as done.Aug 12 2019, 12:01 PM
jcai19 added inline comments.
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
2–7

Sorry must have forgotten to add the instruction selection options while copying the RUN commands. Just tested locally and -fast-isel option worked, although -global-isel failed due to "LLVM ERROR: unable to map instruction: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.arm.gnu.eabi.mcount)". It seems global-isel does not fall back to DAGISel? Will have to investigate further.

It seems global-isel does not fall back to DAGISel?

It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?

jcai19 updated this revision to Diff 214696.Aug 12 2019, 1:07 PM

Add proper instruction selection options to unit test.

It seems global-isel does not fall back to DAGISel?

It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?

Yes -global-isel-abort=2 fixed the issue. Thanks!

jcai19 updated this revision to Diff 214698.Aug 12 2019, 1:19 PM

Added back an accidently-deleted blank line.

llvm/lib/Target/ARM/ARMISelLowering.cpp
3552

Op.getOperand(0).getValueType() == MVT::Other ? 1 : 0 could be replaced with Op.getOperand(0).getValueType() == MVT::Other

3554

Why construct dl if we don't use it in the default case, or under certain conditions below? Maybe move the definition closer to its use below. Though I see temporary SDLoc(Op) below, which should be sufficient (so you can remove dl).

efriedma added inline comments.Aug 12 2019, 1:32 PM
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1928

I think you need to ensure that lr actually contains the correct value, somehow. Normally the call will come before anything that would clobber lr, but you're not actually enforcing that anywhere: LR isn't listed as an input to BL_PUSHLR.

To make this work correctly, I think the return address actually needs to be an argument to the BL_PUSHLR instruction. See ARMTargetLowering::LowerRETURNADDR for how to make an appropriate copy.

llvm/test/CodeGen/ARM/gnu_mcount_nc.ll
7

Please add -verify-machineinstrs to all these invocations.

jcai19 updated this revision to Diff 214884.Aug 13 2019, 11:04 AM

Mark LR as live-in at (t)BL_PUSHLR instruction.

jcai19 marked 4 inline comments as done.Aug 13 2019, 11:13 AM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1928

My takeaway from your comment is to mark LR explicitly alive to make sure compiler will restore LR if it clobbers the register before the BL_PUSHLR instruction. Did I understand correctly? Thanks. Anyway, this change seems to be needed once I turned on -verify-machineinstrs in the unit test, which complained the push/stmdb instruction trying to use a dead LR register.

jcai19 updated this revision to Diff 214889.Aug 13 2019, 11:27 AM

clang-format

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1932

should there be a space in this comment (and the one on line 1941) between bl and __gnu_mcount_nc?

jcai19 updated this revision to Diff 214908.Aug 13 2019, 1:40 PM

Fix a typo.

jcai19 marked 2 inline comments as done.Aug 13 2019, 1:41 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1932

Yes you are right. Thanks!

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1923

This could be Register ReturnReg = MI.getOperand(0).getReg(); then the below cleaned up. DRY

(and a few more opportunities in the return values of ARMTargetLowering::LowerINTRINSIC_VOID)

With that change, LGTM, and thank you for the patch!

jcai19 updated this revision to Diff 214945.Aug 13 2019, 3:18 PM
jcai19 marked an inline comment as done.

Updates based on comments.

jcai19 marked an inline comment as done.Aug 13 2019, 3:19 PM
jcai19 added inline comments.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp
1923

Thank you for all the comments! I have made changes accordingly.

nickdesaulniers accepted this revision.Aug 14 2019, 2:49 PM
nickdesaulniers added a subscriber: eli.friedman.

Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging.

This revision is now accepted and ready to land.Aug 14 2019, 2:49 PM

Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging.

Sounds good! Thanks for all the comments.

This revision was automatically updated to reflect the committed changes.
This comment was removed by jcai19.
jcai19 reopened this revision.Aug 16 2019, 4:22 PM
This revision is now accepted and ready to land.Aug 16 2019, 4:22 PM
jcai19 updated this revision to Diff 215710.Aug 16 2019, 4:25 PM

Fix frontend mcount unit tests.

jcai19 closed this revision.Aug 16 2019, 4:34 PM

Upsteamed to r369173.