This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Add DetectDeadLanes pass.
ClosedPublic

Authored by MatzeB on Mar 23 2016, 8:50 PM.

Details

Summary

The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.

These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.

For now the pass only adds dead and undef flags to machine operands. It
should be easy to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 51508.Mar 23 2016, 8:50 PM
MatzeB retitled this revision from to CodeGen: Add DetectDeadLanes pass..
MatzeB updated this object.
MatzeB added a reviewer: qcolombet.
MatzeB set the repository for this revision to rL LLVM.

Note that this patch depends on two small TableGen changes (adding of reverseComposeSubRegIndexLaneMask()) which are simple enough for post-commit review but which I have not pushed yet.

qcolombet edited edge metadata.Mar 29 2016, 11:47 AM

Hi Matthias,

This mostly looks good to me with a couple of nitpicks, remarks, and questions.

Cheers,
-Quentin

lib/CodeGen/DetectDeadLanes.cpp
25 ↗(On Diff #51508)

How do we end up generating this kind of code?

Although I agree we need to support it, we should try not to generate that.

54 ↗(On Diff #51508)

Do not indent namespace:
http://llvm.org/docs/CodingStandards.html#namespace-indentation

I am surprise clang-format do not do the right thing.

57 ↗(On Diff #51508)

Shouldn't this be public?

69 ↗(On Diff #51508)

Add doxygen comments.

74 ↗(On Diff #51508)

Detail what does it mean to transfer the used lanes (BTW, you should use \p for UsedLanes).

90 ↗(On Diff #51508)

containing

92 ↗(On Diff #51508)

Should we merge those two with like a SmallVectorSet?
Indeed, if I am readying the code correctly, both need to be in sync.

97 ↗(On Diff #51508)

Override the getPassName method as well to ease debugging.

108 ↗(On Diff #51508)

A remark: since the goal right now is just to fix up the representation for the register coalescer, there is no value in supporting those.
However, if we go toward teaching this pass dead code elimination, then yes, it would make sense to do that.

135 ↗(On Diff #51508)

I would have expected the check with InWorkList to be hidden into addToWorklist.

141 ↗(On Diff #51508)

Use a switch on the Opcode.

148 ↗(On Diff #51508)

What is the benefit of going backward?
I found it confusing to have a "for(;;)" for this traversal.

I would have expected something like:
for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2)

174 ↗(On Diff #51508)

Add a message in the assert. Like unexpected copy-like instruction.

188 ↗(On Diff #51508)

Add a comment why we limit ourselves to instruction with only one def.
For future reference, we need to know if this is a limitation or just that it does not make sense to support more than one definition.

188 ↗(On Diff #51508)

What happens with inlineasm?
I.e., we do not get the actual number of definitions via the static description.
Explain why it is okay.

211 ↗(On Diff #51508)

Ditto: Check hidden inside addToWorklist.

219 ↗(On Diff #51508)

switch for on the opcode

235 ↗(On Diff #51508)

Message in assert.

237 ↗(On Diff #51508)

The formatting looks weird to me.
I.e., I would have expected clang-format to put the assignment operator on the previous line.

252 ↗(On Diff #51508)

The motivating example in the comment at the beginning of the file does not need sub register liveness to produce dead definitions after coalescing, does it?

I.e., is this true we do not need that pass when enableSubRegLiveness is false?

261 ↗(On Diff #51508)

For the arrays, wouldn't using plain arrays be more effective?
In particular, we do not iterate on those and having easily access to there length is useless.

305 ↗(On Diff #51508)

What does this else block covers?
A register is either virtual or physical, right?

What am I missing?

346 ↗(On Diff #51508)

Message in assert.

357 ↗(On Diff #51508)

No need for the curly braces.

MatzeB updated this revision to Diff 51990.Mar 29 2016, 3:15 PM
MatzeB edited edge metadata.
MatzeB marked 23 inline comments as done.

Thanks for the review. Addressed the comments.

lib/CodeGen/DetectDeadLanes.cpp
25 ↗(On Diff #51508)

The example here is artificially designed to require a minimum amount of instructions to demonstrate the issue.

It's a good question however where the real issues come from, I'll take a look around.

108 ↗(On Diff #51508)

There is still some value as it improves the precision of detecting unused/dead lanes. However from what I can see the ARM target is the only one implementing these instructions and the pass doesn't even run for this target, so I leave the support for this as a future enhancement.

135 ↗(On Diff #51508)

Micro optimization because in the initial pass below I can insert without even checking if the value is already in the list. Anyway with the switch to SetVector this point is moot.

148 ↗(On Diff #51508)

If the subregister in the REG_SEQUENCE overlap each other (I couldn't find any evidence that this is not allowed), then later operands may override parts of the earlier operands. That's why this loop moves backwards and reduces the UsedLanes bitmask with each operand.

188 ↗(On Diff #51508)

At this point we only care about lowersToCopy() instructions. They are the only ones that need subsequent transfer/updating all other operands have been handled in the first pass already.
I added a comment explaining this as the check for the copy-likeness comes later in the form of the !DefinedByCopy[X].

252 ↗(On Diff #51508)

This pass does not use any sub-register liveness. My idea here was that it is probably not worth spending the compiletime on this pass for a target that does not even care to enable subregister liveness (though admittedly the pass is quite fast and more powerful than DeadMachineInstrElim because it finds the dataflow fixpoint). You think it is a good idea to enable for all targets, I measured it usually taking around 0.5% compiletime.

305 ↗(On Diff #51508)

The operand can also be 0 (aka. "NoRegister"). Added a comment.

qcolombet added inline comments.Mar 29 2016, 3:52 PM
lib/CodeGen/DetectDeadLanes.cpp
149 ↗(On Diff #51990)

I wouldn’t have expected overlaps, but you’re right this is not forbidden.
That being said, this means that we make an assumption on how the register sequence will be broken down and scheduled.

I would rather treat each operand independently and ignore the overlapping but print something in the debug output if the case happens.

253 ↗(On Diff #51990)

I know it does not use any sub-register liveness.

What I am saying is without that pass, the coalescer might expose dead defs it was not expecting and cause later crashes.
I.e., if I understood correctly, this pass should always run to avoid potential crashes, right?

306 ↗(On Diff #51990)

We usually put the “noreg” case earlier.
In that case, I would have expected this check to be right after getReg().

That being said, I am fine with the current form, it is just unusual :).

MatzeB updated this revision to Diff 52006.Mar 29 2016, 4:39 PM
MatzeB marked an inline comment as done.

Change REG_SEQUENCE transfer function to make no assumptions about the order in which subregisters are processed, this is more conservative in the case that the subreg indexes overlap and allows a nicer forward loop.

lib/CodeGen/DetectDeadLanes.cpp
253 ↗(On Diff #51990)

Without subregister liveness the coalescer is not able to recognize these dead definitions, so from a correctness perspective we only need the pass for subregister liveness enabled targets.

qcolombet accepted this revision.Mar 29 2016, 4:45 PM
qcolombet edited edge metadata.

LGTM.

lib/CodeGen/DetectDeadLanes.cpp
254 ↗(On Diff #52006)

Great, that is the information I was looking for :).

Add a comment in runOnMachineFunction when we skip the pass to know why it is not run.

This revision is now accepted and ready to land.Mar 29 2016, 4:45 PM

As the SetVector/BitSet reviews seem to be stalled I will commit a version which directly uses a BitVector+std::deque now.
More testing also revealed that I have to check for cross-class copies which render lanemask information invalid.

This revision was automatically updated to reflect the committed changes.