This is an archive of the discontinued LLVM Phabricator instance.

MIR Statepoint refactoring. Part 4: ISEL changes.
ClosedPublic

Authored by dantrushin on Jun 11 2020, 5:12 AM.

Details

Summary

Implement variadic-def statepoint lowering.
Assign N base/derived non-constant pointers to VRegs, pass others
on stack slots, as in old implementation.
So new statepoint looks like this:

    
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...

N is limited by the maximal number of tied registers machine
instruction can have (15 at the moment).
If any of gc.relocates are not in the same basic block (as the case for
invoke statepoint), use old spilling scheme too.

Full change set is available at D81603.

Diff Detail

Event Timeline

dantrushin created this revision.Jun 11 2020, 5:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2020, 5:12 AM
reames requested changes to this revision.Jun 19 2020, 2:50 PM

Denis, I'm having a really hard time wrapping my head around this code, even knowing what it is supposed to do. We need to clean this up a bit; it's not in a state where I can approve it.

A couple of suggestions for you:

  1. Since some of the complexity here seems specific to invokes, let's restrict the first patch to call instructions. If we see an invoke, we can simply use the stack lowering.
  2. The handling around needing the first N GC values in a sorted order seems overly complicated. I was also unsure about the correctness of derived pointer handling when base != derived. What would you think of explicitly keeping track of which operands we've created that are tied def/use pairs? We could visit operands in the current order and then use the stack lowering when the size of the defs list is larger than our threshold.
  3. Testing wise, I think it make sense to start with a new test file, build out a set of tests which exercise this, and only update all of the tests once we have something which is expected to work end to end. We're not there yet.
llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp
70

For consistency with the above, please call this "-use-registers-for-gc-values".

This revision now requires changes to proceed.Jun 19 2020, 2:50 PM
  1. Since some of the complexity here seems specific to invokes, let's restrict the first patch to call instructions. If we see an invoke, we can simply use the stack lowering.

Are you talking about that cross-block usage (CopyToRegs/CopyFromRegs) stuff?
All this complexity is already there for gc.result.
For gc.relocate I do exactly the same, only I have to use separate map, because ValueMap is occupied by gc.result
Some of the complexities was required to handle undef/duplicate gc.relocate cases. Maybe now, when we have undef handling upstream, I'll be able to simplify it a bit

  1. The handling around needing the first N GC values in a sorted order seems overly complicated. I was also unsure about the correctness of derived pointer handling when base != derived. What would you think of explicitly keeping track of which operands we've created that are tied def/use pairs? We could visit operands in the current order and then use the stack lowering when the size of the defs list is larger than our threshold.

I can track them all the way through lowering, yes (in FunctionLoweringInfo), but the problem is that I need to tie registers in InstrEmitter. Since STATEPOINT does not have meaningful MachineDescription, I need to get this information from somewhere else.
I cannot store that mapping in SDNode, and InstrEmitter does not have access to SelectionDAG or FunctionLoweringInfo, so I have to figure it out from the statepoint SDNode itself.
"First N" approach is the only one I can think of.
I can map gc values to registers in order until I encounter something 'non-register' and move that sorting stuff to say, CodeGenPrepare. Does it sounds better?
Could you clarify why you're worrying about base!=derived case? Base pointers are not interesting at all - they all can be assigned to registers (no need to tie them). It is base==derived case which is interesting/complicated. I've been worrying about LLVM's
ability to handle to identical uses only one of which is tied to def, but apparently it works.

  1. Testing wise, I think it make sense to start with a new test file, build out a set of tests which exercise this, and only update all of the tests once we have something which is expected to work end to end. We're not there yet.

Could you explain? This change is last one in series and completes the whole sequence. What's missing here?

Version without GC pointers sorting.

dantrushin updated this revision to Diff 275085.EditedJul 2 2020, 6:16 AM

Do not handle statepoints with cross-block relocates, spill their args as before.

Correctly handle vectors of pointers.

reames added a comment.Jul 7 2020, 2:14 PM

Initial comments, continue to look for other suggestions.

llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp
227

Rebase on commit b172cd781.

532

Please replace your modifications to this function with the following:

DenseSet<SDValue> LowerAsVReg;
if (UseRegistersForGCPointers && isa<CallInst>(SI.StatepointInstr)) {

for (unsigned i = 0; i < SI.Bases.size(); ++i) {
  if (willDirectlyLower(getValue(SI.Bases[i])) || <is vector type>
    continue;
  LowerAsVReg.insert(getValue(SI.Bases[i]));
  if (LowerAsVReg.size() == N) break;

}
}

auto requireSpillSlot = [&](const Value *V) {

if (isGCValue(V))
   return LowerAsVReg.contains(getValue(V)));
return !(LiveInDeopt || UseRegistersForDeoptValues);

};

reames added a comment.Jul 7 2020, 3:15 PM

Required Change - Please introduce a runtime flag which controls how many values are handled via vregs. Default this value to zero. This will remove all existing test diffs; if it doesn't you have a bug. Then introduce a new test file, optional copied from an existing one, called statepoint-gc-regs.ll which enumerates sufficient coverage for the new feature.

This is not a suggestion, it is a requirement. I specifically need to see a zero diff in the existing tests to have confidence in the changes.

reames requested changes to this revision.Jul 7 2020, 4:40 PM

I've spent several hours today wrapping my head around this patch. I think I've found a material simplification which should greatly simplify the code.

You track the def offset for each pointer of interest. You have plumbed through the merge value code with the purpose of being able to get to the SDNode corresponding to the actual statepoint. In the gc.relocate code, you combine these two pieces into enough information to extract the relevant definition from the statepoint. Under the assumption that we're only talking about relocates within a single basic block, this is correct.

My suggested simplification is the following. Rather than maintaining a map to offsets in the STATEPOINT node, simply maintain a map from pointer to SDValue representing the particular output of the node. Doing this means that you do not need any of the merge value code, all of the result propagation remains the same, and the gc.relocate changes are isolated. You would need to export these values in the cross block case, but this is fairly easily handled by copying each output to a vreg during lowering of the statepoint and having the vreg be the value tracked in the FuncInfo.

If my written description is not clear, or you disagree with my conclusions, let's find a time to talk offline.

In addition to this larger change, I have added a couple of smaller FIXMEs throughout the day. I've also been landing NFCs where doing so made sense in the process of understanding your changes. You should expect a non-trivial, but not particularly difficult either, rebase.

You should consider this comment a blocking comment. I do not intend to spend further time on this review until either a) the changes noted today have been made, or b) we've talked offline and we've agreed to an alternate approach.

llvm/include/llvm/CodeGen/FunctionLoweringInfo.h
106

Marker (see overall comment)

This revision now requires changes to proceed.Jul 7 2020, 4:40 PM

Changed DerivedPtrMap and moved it to StatepointLowering.h as it only
needed for local gc.relocate processing;
Rebased on tip, rewrote lowerStatepointMetaArgs as been told;
Deleted test mods, will add separate test later;

dantrushin marked 3 inline comments as done.Jul 10 2020, 8:00 AM
dantrushin edited the summary of this revision. (Show Details)Jul 10 2020, 9:12 AM

On first skim, looks much much better. I'm going to do a detailed pass through, but thank you for making the major design change requested.

Detailed comments on the new implementation. These are on the whole minor. Remember to add your new test file.

I need to look more closely at the code outside StatepointLowering, will do that in a separate comment shortly.

llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp
538

I don't think the isNonIntegralPointer check is needed here. Can you either remove or explain?

557

This map population isn't related to stack reservation. Please either separate it into it's own loop, or merge it with the population of the LowerAsVReg set above.

(If I understood this, it's simply the index into a flattened list of the values to be spilled in registers.)

558

I'd suggest simply not adding anything for the map for the spill case. Using contains checks are more idiomatic of the purpose.

600–601

This block of code is functionally broken when base != derived. You have only added the vreg information for the derived, and would need to spill the base so that the GC can find it. The fix is trivial, pass "true" for the base case when Base != Derived.

(Also, this is the probably the profitable lowering, so don't be tempted to add the base to a reg.)

644–647

See comment above about count checks. (Contains)

694

Isn't this simply Ptr2ResNo.size()? (after the change suggest above)

810

Again, contains check, not default value.

1089

Add a comment describing what this does.

reames requested changes to this revision.Jul 10 2020, 11:38 AM

(Comments on code outside StatepointLowering. Minor +1 question)

Overall, if you address the style comments from the this and the last comment, I think we're close to an LGTM.

llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
225

This line looks unnecessary provided the defs list is marked variadic. If you want to leave it for the moment, that's fine, but I think we can clean this up in a follow up change.

(To be specific, OutOperandList in Target.td for STATEPOINT can be variable_ops, and this line disappears.)

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
128

I don't understand the implication of this change, can you explain? On the surface, this looks likely okay, but not understanding the reason for the change is bothering me.

llvm/lib/CodeGen/TargetLoweringBase.cpp
1051

Please add a comment which explains you're relying on all the definitions coming before any FI and thus the indices not changing.

This revision now requires changes to proceed.Jul 10 2020, 11:38 AM
dantrushin marked 11 inline comments as done.Jul 10 2020, 12:43 PM
dantrushin added inline comments.
llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
225

II here is MCInstrDesc.
Statepoint has 0 defs in it MCIntrDesc. (It is MachineInstr::getNumDefs which work correctly for variadic outs).
So I really need it here

llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
128

Again, MCInstr.getNumDefs() for statepoint always returns 0.
This code assumes that all 'dynamic' defs are implicit, so it accesses II.ImplicitDefs without any checks.
But statepoint has no implicit defs, so for it II.ImplicitDefs is NULL.
That functions does the the same thing, but with proper checks.

llvm/lib/CodeGen/SelectionDAG/StatepointLowering.cpp
600–601

vreg information is only needed for derived pointers to relocate them.
(base pointers are not relocated). So I could just assign all bases to VRegs (if at all possible).
I just did not want to have same value both in stack slot and vreg.

So IMHO the better approach is to check if base is found in LowerAsVReg (or that is what you mean?)

As for profitability, our goal is to shift as much spilling as possible to register allocator , doesn't it?

dantrushin marked 2 inline comments as done.

Addressed comments.

Now it appears that Ptr2ResNo and LowerAsVReg somewhat duplicate
each other. Perhaps I can rename Ptr2ResNo to LowerAsVReg and get
rid of old LowerAsVReg? Or it will hurt readability?

reames requested changes to this revision.Jul 11 2020, 3:26 PM

I took some time today to apply this patch with the attention of addressing some of the unaddressed comments and landing it to unblock progress here. However, I hit two problems.

First, a trivial test case crashes with this flag enabled.

define void @in_register(i8 addrspace(1)* %a) #1 gc "statepoint-example" {

%sp = call token (i64, i32, void ()*, i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()* undef, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i8 addrspace(1)* %a)]
%a1 = call i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token %sp, i32 0, i32 0)
call void @use(i8 addrspace(1)* %a1)
ret void

}

Second, there appears to be a semantic problem around the handling of base vs derived slots unless we *always* spill the base. We can't tie both uses to a single def. This may warrant some offline discussion.

Denis, please address all of the previous applied comments, including writing targetting tests.

llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
225

Please read the line above your change with the explicit handling for variadic defs.

(Not a blocking item. I'll fix after submit if needed)

This revision now requires changes to proceed.Jul 11 2020, 3:26 PM
dantrushin marked 3 inline comments as done.

Another attempt to submit update.

Additionally in this version:

  • Added tests (used multiple checks per function to better demonstrate how it works in practice - after ISel, after RA and resulting asm).
  • Changed boolean flag to counter to simplify possible landing ahead of D81646.
  • Use single map LowerAsVreg instead of pair of set LowerAsVReg and map Ptr2ResNo.

NOTES:

  • This can be landed only after D81645 (Part 1: Basic MI level changes).
  • If landed before D81646 (Part 2: Operand Folding), vreg limit cannot be raised too high or regalloc will run out of registers.
  • If landed after D81647 (Part 3; Spill GC Ptr regs), last run line in test must be updated to allow passing pointers in CSRs.
dantrushin marked an inline comment as done.Jul 14 2020, 5:35 AM

Second, there appears to be a semantic problem around the handling of base vs derived slots unless we *always* spill the base. We can't tie both uses to a single def. This may warrant some offline discussion.

Yes, we can't. But apparently, we do not need to.
Basically, that tied-ness information is only needed to glue two live ranges into one during SSA flattening. And it does not matter how many uses end live range at the same instruction.
Luckily, LLVM appears to handle it properly.

But if included tests (and checks) do not make you believe it works, we should throw all this stuff away and try some different approach.

llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
225

First, please notice !MF->getTarget().usesPhysRegsForValues() there.

/// True if the target uses physical regs (as nearly all targets do). False
/// for stack machines such as WebAssembly and other virtual-register
/// machines. If true, all vregs must be allocated before PEI. If false, then
/// callee-save register spilling and scavenging are not needed or used. If
/// false, implicitly defined registers will still be assumed to be physical
/// registers, except that variadic defs will be allocated vregs.
virtual bool usesPhysRegsForValues() const { return true; }

This function returns true for all targets except WebAssembly, so HasVRegVariadicDefs is
always false on e.g., X86

Second, looks like variadicOpsAreDefs as it was implemented for ARM does not means exactly what I wanted.
Looking at MCInstrDesc::hasDefOfPhysReg I see that variadic defs are supposed to go after static uses.
Even though this would work for current definition of STATEPOINT - because it have no static uses
(and this is somewhat questionable from my POW) - I want STATEPOINT defs to come first, just like in all
ordinary instructions.
WebAssembly abused this attribute for its own purposes, but I did not want to.
In fact, for our purposes we can compute isVariadicDefs property same way as isVariadic is computed,
without any need for additional attributes. I'm not sure if isVariadicDefs would work for WebAssembly/ARM
so I decided not to dig into this until we have first variadic-defs instruction in tree.

Clear DerivedPtrMap at new statepoint.

reames accepted this revision.Jul 25 2020, 2:25 PM

LGTM

I'm approving this patch not because it's perfect, but because it doesn't break anything in the existing code and iterating here is no longer productive. Once this lands, we can continue the discussion about base/derived handling with specific concrete examples.

Since Denis is out on vacation, I plan to land this patch and then iterate on top with a couple of minor improvements.

This revision is now accepted and ready to land.Jul 25 2020, 2:25 PM
This revision was automatically updated to reflect the committed changes.

Second, there appears to be a semantic problem around the handling of base vs derived slots unless we *always* spill the base. We can't tie both uses to a single def. This may warrant some offline discussion.

Yes, we can't. But apparently, we do not need to.
Basically, that tied-ness information is only needed to glue two live ranges into one during SSA flattening. And it does not matter how many uses end live range at the same instruction.
Luckily, LLVM appears to handle it properly.

But if included tests (and checks) do not make you believe it works, we should throw all this stuff away and try some different approach.

I filed a bug with a detailed description of the problem and one suggestion as to how to approach. See https://bugs.llvm.org/show_bug.cgi?id=46917, let's take discussion there (or offline).