This is an archive of the discontinued LLVM Phabricator instance.

[MachineFunction] Base support for call site info tracking
ClosedPublic

Authored by djtodoro on Apr 24 2019, 5:14 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

djtodoro created this revision.Apr 24 2019, 5:14 AM

Do we ever see ourselves supporting anything except registers here?

include/llvm/CodeGen/MIRYamlMapping.h
346 ↗(On Diff #196426)

We only have an unsigned : 16 for this in DILocalVariable, and realistically there won't be more than 256 arguments passed in *registers* on any architecture, so this could be an unsigned char.

StringValue Reg;
unsigned char / uint16_t ArgNo;

oh wait.. this is only for YAML? I don't care then :-)

include/llvm/CodeGen/MachineFunction.h
380 ↗(On Diff #196426)

... but here I do!

struct ArgRegPair {
  unsigned Reg;
  unsigned ArgNo : 16;
}
bjope added inline comments.Apr 25 2019, 12:21 AM
include/llvm/CodeGen/MachineFunction.h
379 ↗(On Diff #196426)

My experience is that argument numbers are tricky. The source function on C level could have X arguments, which becomes Y arguments in LLVM IR, and then after lowering we get Z arguments. Since this is in MachineFunction I would assume that this counts the lowered formal arguments or something like that (even though I have no idea how that will be mapped back to source level arguments later in debug info)?

Maybe we want to explain what ArgNo is (how is it counted). Or is that explained somewhere else (then perhaps some kind of reference could be helpful)?

aprantl added inline comments.Apr 25 2019, 8:23 AM
include/llvm/CodeGen/MachineFunction.h
379 ↗(On Diff #196426)

That is a very good point. How does this behave for

struct pair { int a; int b; };

struct pair foo(struct pair p) {}

on x86_64, where a and b are passed in argument registers 0 and 1, respectively?

NikolaPrica added inline comments.Apr 25 2019, 9:09 AM
include/llvm/CodeGen/MachineFunction.h
379 ↗(On Diff #196426)

@bjope You are right, ArgNo represents the "ID" of lower argument. For now we cover cases when argument is passed through one register. But in the future ArgNo should be used to cover cases when argument transferring is split to two registers and then we would have:

ArgNo: 1 : Reg0
ArgNo: 1 : Reg1

But if we want to preserve mappings between source-IR-MachineIR I guess that we would just need to adjust those ArgNos to match them.
We will make a comment about this.

djtodoro updated this revision to Diff 199216.May 13 2019, 1:39 AM

-Update ArgRegPair struct
-Rebase

bjope added inline comments.May 13 2019, 10:20 AM
lib/CodeGen/MIRParser/MIRParser.cpp
440 ↗(On Diff #199216)

I guess it is OK to silently ignore the CallSitesInfo if the support isn't enabled (although I do not know if we have solved similar compatibility problems differently before). Another alternative is ofcourse to give an error if we find CallSiteInfo in the MIR and it isn't enabled. Maybe the latter is easier for the user to understand (no risk for a surpise that it just is silently ignored).

Right now we do not even parse and validate the named reigster references in the CallSiteInfo when EnableDebugEntryValues is false. So we also have the option of checking the EnableDebugEntryValues late (guarding the addCallArgsForwardingRegs call) to not ignore, but simply don't use the info when EnableDebugEntryValues is false.

Is the "silent ignore" option carefully chosen here? Either way, I think some code comments could elaborate a little bit about the choice.

441 ↗(On Diff #199216)

I suggest creating a helper function for this (similar to setupRegisterInfo etc. above) instead of inlining it here in initializeMachineFunction. Maybe it should include the check for EnableDebugEntryValues (might depend on the above questions about the behavior when there is call site info and EnableDebugEntryValues is false).

aprantl added inline comments.
include/llvm/CodeGen/MIRYamlMapping.h
354 ↗(On Diff #199216)

///

thegameg added a subscriber: qcolombet.

I think this is the first type of entry in MIR that references instructions directly, and it seems quite easy to forget to update the offset in the block when adding/removing instructions, but I guess checking for ranges and if the instruction is a call might help.

@qcolombet, did anything like this come up in the past?

lib/CodeGen/MIRParser/MIRParser.cpp
440 ↗(On Diff #199216)

I agree with @bjope. We should at least validate all parts of the MIR, and I would prefer having an error if it's not enabled than just silently ignore it (and have a test for both cases). Either way, also agree with adding comments regardless of the direction this takes.

442 ↗(On Diff #199216)

Should this be somehow validated too? We should at least check if the block is in range and if the instruction is in range, and at least a test for each case.

445 ↗(On Diff #199216)

Should this be an error?

did anything like this come up in the past?

Not that I know of and I agree this seems very fragile.

Side question, I thought the right way to fix that was to provide better APIs to query ABI lowering information for debugger and other tools. Adding this information here seems a step in a wrong direction.

djtodoro updated this revision to Diff 199790.May 16 2019, 5:02 AM

-Add a helper function for initialization of call site info
-Add tests for error checking of call site initialization

Maybe it would be better to use a call instruction number? For example, use {id: 1} for first call instruction in MIR stream instead of {bb: 0 , offset: 3}? Or use {bb:0 , ID: 1} for the first instruction in basic block 0?

Side question, I thought the right way to fix that was to provide better APIs to query ABI lowering information for debugger and other tools. Adding this information here seems a step in a wrong direction.

@qcolombet Thanks for participating in this! Call ABI lowering happens at ISEL stage so we thought that it is the right place to collect information about it and preserve it inside MF. We are open for suggestions to try out. Please provide us with more details about your idea.

Please provide us with more details about your idea.

When we lower a call we invoke some method to do the lowering to register, frame index etc. The idea would be to extract this code (easier said than done) into its own library and use this API to produce the debug info instead of carrying this information in the MachineFunction.

Please provide us with more details about your idea.

When we lower a call we invoke some method to do the lowering to register, frame index etc. The idea would be to extract this code (easier said than done) into its own library and use this API to produce the debug info instead of carrying this information in the MachineFunction.

Thanks for the suggestion. Sorry for the late response. This would indeed be very useful.

We think that there are too many difficulties for bringing such API. In order to provide such API, we would need to know what function is called with the MIR call instruction in order to invoke APIs lowering process for that function. This should be easy for direct calls, (i.e. we have pointer to global function and we could somehow invoke lowering process for it, yet there is the question: how we should handle variadic function calls). However, for indirect calls we cannot think of any good way. Some kind of mapping between an IR call instruction and the MIR call instruction is needed for that (but this again requires carrying additional information and it is not suitable for other tools I suppose). Different mapping could possibly be achieved by debug/source location of the MIR call instruction. Basically, that is extracting information from the source code. That does not sound good at all! If we ignore the fact that we significantly raise the complexity of this feature with such approach, calling fronted procedures in order to reconstruct the calling context smells like significant overhead at least.

Currently we are inserting additional code in call lowering methods to collect call arguments forwarding information. If someone comes up with good solution, the only thing that needs to be changed is the place where we store collected information and to remove all code that updates MachineFunction’s call site info through the backend.

aprantl accepted this revision.May 28 2019, 9:46 AM

minor nitpick inside.

include/llvm/CodeGen/MachineFunction.h
387 ↗(On Diff #201509)

even better:

unsigned Reg;
uint16_t ArgNo;

(in that order)

This revision is now accepted and ready to land.May 28 2019, 9:46 AM
djtodoro updated this revision to Diff 201882.May 29 2019, 5:49 AM

-Addressing suggestions

djtodoro updated this revision to Diff 203798.Jun 10 2019, 4:30 AM

-Remove an unwanted code refactor change

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 12:50 AM