__gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32.
Clang currently does not push the value before the call.
Fixes PR33845.
__gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32.
Clang currently does not push the value before the call.
Fixes PR33845.
llvm/lib/Target/ARM/ARMFastISel.cpp | ||
---|---|---|
211 ↗ | (On Diff #213159) | There should be only 2 references to this function. But this argument is likely to be false in most cases. |
2417 ↗ | (On Diff #213159) | I am not sure why the name is prefixed with \01 either but it was there in the original code. |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
6 ↗ | (On Diff #213159) | They are unnecessary, I have removed them. |
llvm/lib/Target/ARM/ARMFastISel.cpp | ||
---|---|---|
211 ↗ | (On Diff #213159) | Then I'd just make it an explicit arg and update the 2 call sites. If there were many call sites, then the default param would cut down on code churn, but I don't think 2 call sites is unreasonable to just be explicit about such arguments. |
2418 ↗ | (On Diff #213748) | memcmp is a code smell in a C++ codebase, but I see that IntrinsicName is a C style string. Is there a reason why strncmp isn't used? |
2587 ↗ | (On Diff #213748) | {} are not needed here since you're not introducing a new scope for variables. |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
11 ↗ | (On Diff #213748) | This test case can probably be simplified to just a call and ret void. |
llvm/lib/Target/ARM/ARMFastISel.cpp | ||
---|---|---|
2418 ↗ | (On Diff #213748) | Good catch! I was thinking about one of str* functions and somehow ended up using memcpy. Guess I haven't written C code for a while :). Anyway, maybe strcmp is better here as the size of the two strings should match too? |
This seems better.
I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.)
I'd just skip changing fast-isel; with an intrinsic, if fast-isel misses, we just fall back to the SelectionDAG code that does the right thing.
clang/lib/Basic/Targets/ARM.cpp | ||
---|---|---|
325 ↗ | (On Diff #213762) | Doesn't require changes, but for anyone curious about the \01, see the comment in MangleContext::mangleName. |
Thanks for the suggestion, and I agree the code would look cleaner this way. But I have some questions on implementation details, and please bear with me if they seem naive since I am new to backend. So I have been trying to reuse the code of ARMTargetLowering::LowerCall to build a SelectionDAG call node for the new intrinsic, which essentially is a function call with a push instruction before. This is also how some intrinsics like memcpy or memet implemented, which get lowered to target-specific calls at SelectionDAGBuilder::visitIntrinsicCall. If we wait until ARMTargetLowering::LowerOperation when legalizing DAGs to lower the intrinsic, then I am not sure if we can still reuse the code, as some of the information needed is gone by this stage. I did see code handling ARM intrinsic on ARMTargetLowering::LowerOperation, but they didn't seem to need to generate function calls later.
Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.
clang/lib/Basic/Targets/ARM.cpp | ||
---|---|---|
325 ↗ | (On Diff #213762) | Thanks for the reference! |
I've just added a few fly-by nits; I'm afraid I didn't do an in-depth review.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1156 ↗ | (On Diff #214263) | I wonder whether this is a good debug printing line to commit? |
1931–1932 ↗ | (On Diff #214263) | Did you clang-format the patch? |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
1–2 ↗ | (On Diff #214263) | Given that the push-lr transform only gets implemented for DAGISel (IIUC), maybe it'd be useful to also have test run lines that check the correct thing happens when using fastisel and globalisel (presumably by falling back to DAGISel)? |
Lower the intrinsic to pseudo instructions directly, instead of SelectDAG nodes first.
@efriedma I have changed my implementation to lower llvm.gnu.eabi.mcount intrinsic into pseudo instructions directly, instead of first lowering them into SelectionDAG call nodes. Thanks.
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1156 ↗ | (On Diff #214263) | Sorry, I forgot to remove it. I was using it to debug my change locally. |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
1–2 ↗ | (On Diff #214263) | That's a good point. Checks added for fast-isel and global-isel. |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
---|---|---|
1–6 ↗ | (On Diff #214496) | It seems the -fast-isel/-global-isel command line options are missing in the RUN lines aiming to test fast and global isel do the right thing? |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
---|---|---|
1–6 ↗ | (On Diff #214496) | Sorry must have forgotten to add the instruction selection options while copying the RUN commands. Just tested locally and -fast-isel option worked, although -global-isel failed due to "LLVM ERROR: unable to map instruction: G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.arm.gnu.eabi.mcount)". It seems global-isel does not fall back to DAGISel? Will have to investigate further. |
It seems global-isel does not fall back to DAGISel?
It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
3485 ↗ | (On Diff #214698) | Op.getOperand(0).getValueType() == MVT::Other ? 1 : 0 could be replaced with Op.getOperand(0).getValueType() == MVT::Other |
3487 ↗ | (On Diff #214698) | Why construct dl if we don't use it in the default case, or under certain conditions below? Maybe move the definition closer to its use below. Though I see temporary SDLoc(Op) below, which should be sufficient (so you can remove dl). |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1927 ↗ | (On Diff #214696) | I think you need to ensure that lr actually contains the correct value, somehow. Normally the call will come before anything that would clobber lr, but you're not actually enforcing that anywhere: LR isn't listed as an input to BL_PUSHLR. To make this work correctly, I think the return address actually needs to be an argument to the BL_PUSHLR instruction. See ARMTargetLowering::LowerRETURNADDR for how to make an appropriate copy. |
llvm/test/CodeGen/ARM/gnu_mcount_nc.ll | ||
6 ↗ | (On Diff #214696) | Please add -verify-machineinstrs to all these invocations. |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1927 ↗ | (On Diff #214696) | My takeaway from your comment is to mark LR explicitly alive to make sure compiler will restore LR if it clobbers the register before the BL_PUSHLR instruction. Did I understand correctly? Thanks. Anyway, this change seems to be needed once I turned on -verify-machineinstrs in the unit test, which complained the push/stmdb instruction trying to use a dead LR register. |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1931 ↗ | (On Diff #214889) | should there be a space in this comment (and the one on line 1941) between bl and __gnu_mcount_nc? |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1931 ↗ | (On Diff #214889) | Yes you are right. Thanks! |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1922 ↗ | (On Diff #214908) | This could be Register ReturnReg = MI.getOperand(0).getReg(); then the below cleaned up. DRY (and a few more opportunities in the return values of ARMTargetLowering::LowerINTRINSIC_VOID) With that change, LGTM, and thank you for the patch! |
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp | ||
---|---|---|
1922 ↗ | (On Diff #214908) | Thank you for all the comments! I have made changes accordingly. |
Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging.