I need a way to constrain a register def to use the same input
register throughout a loop. For a VGPR, we don't model the values
that are live in other lanes. In cases where we explicitly scalarize
operations and write different lanes sequentially, we need to model
this as a tied input operand on the def instruction inside the loop.
The verifier doesn't allow adding a tied implicit use to an explicit
def, so this moves the constraint into a copy that we can use to
accumulate the real result. The copy should be folded out after
allocation like a normal copy when possible.
Event Timeline
Do you have a use case for this, or can you explain a bit more what this solves, perhaps with an example?
Normal waterfall loops ought to be handled in the same way as any other loop with a divergent exit: a value written inside the loop is observed from outside the loop, so there should be a COPY to a VGPR / vector register bank if necessary, and regular register allocation does the rest. We don't need a TIED_COPY there today, or rather, I don't think we should need one.