This is an archive of the discontinued LLVM Phabricator instance.

Add a heap alloc site marker field to the ExtraInfo in MachineInstrs.
ClosedPublic

Authored by akhuang on Oct 17 2019, 1:52 PM.

Details

Summary

Add instruction marker to MachineInstr ExtraInfo. This does almost the
same thing as Pre/PostInstrSymbols, except that it doesn't create a label until
printing instructions. This allows for labels to be put around instructions that
are deleted/duplicated somewhere.
Use this marker to track heap alloc site call instructions.

Also undo the workaround in r375137.

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=43479#add_comment

Diff Detail

Event Timeline

akhuang created this revision.Oct 17 2019, 1:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2019, 1:52 PM
rnk added inline comments.Oct 17 2019, 3:29 PM
llvm/include/llvm/CodeGen/MachineFunction.h
324–325

Now that CodeViewDebug is going to be in charge of filling in this data structure, I think it should be moved to CodeViewDebug::FunctionInfo. Actually, I see we already have them there, so we can probably remove this vector completely.

325

I don't think this map keyed on MachineInstr pointers will do the right thing in the presence of dead code elimination and code duplication.

For dead code elimination, it could have an ABA problem like the SelectionDAG nodes had before, where our solution was to clear the map after codegen. I think MachineInstr addresses can be recycled if a call is deleted and another instruction allocated at the same address.

For code duplication, are you sure this looks up the correct type? I would expect one of the instructions to not be present in here, and return a null DIType. For the taildup-heapallocsite.ll test case that I added, can you make sure that both S_HEAPALLOCSITE records use the same type indices?

I think you can store the DIType directly in the ExtraInfo struct and that will mean we don't need this anymore.

llvm/include/llvm/CodeGen/MachineInstr.h
248

This label doesn't seem like it's ever emitted. It's really just a special label with the name "heapallocsite". I think you can store the DIType* here, and use that as a heap allocation call site indicator. Another option would be to store metadata references here, which would make it extensible. We could, for example, have this point to a vector of pairs of fixed metadata kinds (MD_dbg, MD_tbaa, etc) and Metadata*. It would provide a natural pathway from moving any kind of metadata annotation from an LLVM IR instruction to a MachineInstr.

Whichever attachment you choose, it should avoid the need for the extra DenseMap, and it should get copied when tail duplication clones the instruction.

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1072–1078

I think it would be best to, as much as possible, avoid having codeview specific codepaths here.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
1106–1109

It'll be nice to avoid this. :)

1413–1414

I think you can use this code pattern here to find all the heap allocation sites, and save them on FunctionInfo for later. This beginFunctionImpl method is sort of the place where the debug info handlers get a chance to look at the entire function before each instruction is emitted. We already do big up front passes over the whole function to create the lexical scope tree, which is what places labels around block scopes.

You should be able to reuse the requestLabelBeforeInsn / requestLabelAfterInsn APIs. The label names won't be as easy to FileCheck, but there will be fewer labels in the output, which is cleaner in some ways.

llvm/lib/CodeGen/MachineInstr.cpp
508–509

This doesn't track the new instrmarker, so now postinstr symbols and instrmarkers can't coexist on the same instruction.

539–540

I don't think this does the right thing. If I understand correctly, Info is a single pointer. The .set call overwrites the pointer, and then the second call to getPostInstrSymbol will always return null, and then the .set call will reset the pointer to null again.

This extra info code is subtle. Now that there are five possibilities, we might want to come up with some shared logic for this.

Actually, I'm surprised we didn't write unit tests for this in the first place. Oops. It's highly testable data manipulation. You can see examples in llvm/unittests/CodeGen/MachineInstrTest.cpp.

rnk added a comment.Oct 17 2019, 3:38 PM

Thanks for putting up with these micro-optimized data structures, sorry they're opaque.

llvm/lib/CodeGen/MachineInstr.cpp
539–540

I meant to elaborate on what I meant by "shared logic": I was thinking that all of the "extra info" accessors should call a routine that takes:

  • an MMO ArrayRef
  • a pre instr symbol
  • a post instr symbol
  • the new thing, DIType*, Metatadata*, or an ArrayRef<pair<unsigned, Metadata*>>

It then checks each case in turn:

  • only one MMO and nothing else: use the EIIK_MMO case
  • only one pre instr symbol: use the EIIK_PreInstr case

...

  • if no cases apply, make the extra info struct
akhuang updated this revision to Diff 225908.Oct 21 2019, 10:15 AM
akhuang marked 7 inline comments as done.

-Moved the logic for editing extra info for machine instrs into a separate function.
-Moved label creation code into CodeViewDebug.
-Remove now unused heapallocsite vector from MachineFunction.

I still need to write unit tests for the MachineInstr stuff. I've just been figuring
how to generate MCSymbols from the faked MachineFunction.

akhuang updated this revision to Diff 225910.Oct 21 2019, 10:18 AM

Clang format

rnk accepted this revision.Oct 21 2019, 3:38 PM

Nice, this looks good. :)

I think if we're going to do full due diligence for this system which we are starting to depend on, next we should serialize this data via MIR. In the MIParser we only handle MMOs, but we should handle these labels and heap alloc site markers here:
https://github.com/llvm/llvm-project/blob/18f805a/llvm/lib/CodeGen/MIRParser/MIParser.cpp#L953
That can be a separate change.

llvm/test/CodeGen/X86/taildup-heapallocsite.ll
45–57

Let's CHECK for two S_HEAPALLOCSITE records as well.

This revision is now accepted and ready to land.Oct 21 2019, 3:38 PM
akhuang updated this revision to Diff 226350.Oct 24 2019, 4:08 PM
akhuang marked an inline comment as done.
  • Add unit tests for MachineInstr extra info
  • Fix logic for the assert in setExtraInfo

Sounds good. I also realized that the logic for the "Should never have only a single symbol allocated out-of-line!" is wrong now. I added a parameter to the function to keep track of whether the assert should happen. Or do you think it can just be removed?

akhuang retitled this revision from Add an instruction marker field to the ExtraInfo in MachineInstrs. to Add a heap alloc site marker field to the ExtraInfo in MachineInstrs..Oct 24 2019, 4:12 PM
akhuang edited the summary of this revision. (Show Details)
rnk accepted this revision.Oct 24 2019, 4:49 PM
rnk added inline comments.
llvm/lib/CodeGen/MachineInstr.cpp
331–335

I think you could get rid of all this and the parameter. It seems trivial with the new code structure.

akhuang updated this revision to Diff 226436.Oct 25 2019, 8:41 AM

-remove unnecessary assert, rebase

This revision was automatically updated to reflect the committed changes.