__gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32.
Clang currently does not push the value before the call.
Fixes PR33845.
Differential D65019
[ARM] push LR before __gnu_mcount_nc jcai19 on Jul 19 2019, 2:35 PM. Authored by
Details __gnu_mcount_nc expects the value of LR register to be the top value of the stack on ARM32. Fixes PR33845.
Diff Detail
Event TimelineComment Actions Thanks for this! I realize I'm not a reviewer, but have a few nits in passing anyway. :) I don't plan to stamp this because I've zero familiarity here.
Comment Actions I don't like introducing magic calling convention rules for a non-intrinsic function with a specific name; we should model this more explicitly somehow. Probably the simplest thing to do would be to introduce an ARM intrinsic, lower it to a pseudo-instruction, and expand it late (in ExpandPostRAPseudos or something like that). Comment Actions Thanks for the suggestions, will start work on it.. I also realized this solution may have other issues that @t.p.northover kindly pointed out in another code review that I accidentally creates (https://reviews.llvm.org/D65037) for the same change. Comment Actions So I have implemented the codegen part as suggested. To add a new intrinsic for this call seems would introduce quite some code to the target-independent part of SelectionDAG, specifically SelectionDAGBuilder and SelectionDAG classes, just for ARM. Is that wanted? Thanks. Comment Actions
Not sure why you think this would be necessary. Every target has target-specific intrinsics, and we have infrastructure to ensure they don't require any changes to target-independent code.
Comment Actions I guess what I was trying to say was that I could not find a way to handle the new ARM intrinsic in SelectionDAGBuilder::visitTargetIntrinsic easily with the existing code so I had to introduce a case just for the ARM intrinsic in SelectionDAGBuilder::visitIntrinsic, and I am not sure if that is acceptable. Comment Actions Thanks so much for this patch! I look forward to support for __gnu_mcount for the arm32 Linux kernel.
haha
Comment Actions This seems better. I'm not sure I follow why this needs special handling in SelectionDAGBuilder::visitIntrinsicCall, as opposed to just using ISD::INTRINSIC_VOID like other similar target-specific intrinsics. (You can custom-lower INTRINSIC_VOID in ARMTargetLowering::LowerOperation, if that makes it easier.) I'd just skip changing fast-isel; with an intrinsic, if fast-isel misses, we just fall back to the SelectionDAG code that does the right thing.
Comment Actions Thanks for the suggestion, and I agree the code would look cleaner this way. But I have some questions on implementation details, and please bear with me if they seem naive since I am new to backend. So I have been trying to reuse the code of ARMTargetLowering::LowerCall to build a SelectionDAG call node for the new intrinsic, which essentially is a function call with a push instruction before. This is also how some intrinsics like memcpy or memet implemented, which get lowered to target-specific calls at SelectionDAGBuilder::visitIntrinsicCall. If we wait until ARMTargetLowering::LowerOperation when legalizing DAGs to lower the intrinsic, then I am not sure if we can still reuse the code, as some of the information needed is gone by this stage. I did see code handling ARM intrinsic on ARMTargetLowering::LowerOperation, but they didn't seem to need to generate function calls later. Comment Actions Yes, it's technically a "call", but you don't need the call lowering code. That's dedicated to stuff like passing arguments, returning values, checking whether the function can be tail-called, etc. None of that applies here; the intrinsic always corresponds to exactly one pseudo-instruction, a BL_PUSHLR.
Comment Actions I've just added a few fly-by nits; I'm afraid I didn't do an in-depth review.
Comment Actions Lower the intrinsic to pseudo instructions directly, instead of SelectDAG nodes first. Comment Actions @efriedma I have changed my implementation to lower llvm.gnu.eabi.mcount intrinsic into pseudo instructions directly, instead of first lowering them into SelectionDAG call nodes. Thanks.
Comment Actions
It does, for targets where it's enabled by default, or if you use the right flags. I think you want -global-isel -global-isel-abort=2?
Comment Actions Great! LGTM and thank you for this patch. Please give 24hrs for @eli.friedman or @kristof.beyls to leave comments before merging. This comment was removed by jcai19. |
I wonder whether this is a good debug printing line to commit?
IIUC, this will print every MI instruction that gets looked at by ArmExpandPseudo.
I would imagine that that could produce too much noise. It'd be more interesting if only the MIs that actually got transformed would be printed.
But maybe best to just not add this debug printing line in this patch?