This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Add DetectDeadLanes pass.
ClosedPublic

Authored by MatzeB on Mar 23 2016, 8:50 PM.

Details

Summary

The DetectDeadLanes pass performs a dataflow analysis of used/defined
subregister lanes across COPY instructions and instructions that will
get lowered to copies. It detects dead definitions and uses reading
undefined values which are obscured by COPY and subregister usage.

These dead definitions cause trouble in the register coalescer which
cannot deal with definitions suddenly becoming dead after coalescing
COPY instructions.

For now the pass only adds dead and undef flags to machine operands. It
should be easy to extend it in the future to remove the dead
instructions and redo the analysis for the affected virtual
registers.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 51508.Mar 23 2016, 8:50 PM
MatzeB retitled this revision from to CodeGen: Add DetectDeadLanes pass..
MatzeB updated this object.
MatzeB added a reviewer: qcolombet.
MatzeB set the repository for this revision to rL LLVM.

Note that this patch depends on two small TableGen changes (adding of reverseComposeSubRegIndexLaneMask()) which are simple enough for post-commit review but which I have not pushed yet.

qcolombet edited edge metadata.Mar 29 2016, 11:47 AM

Hi Matthias,

This mostly looks good to me with a couple of nitpicks, remarks, and questions.

Cheers,
-Quentin

lib/CodeGen/DetectDeadLanes.cpp
25

How do we end up generating this kind of code?

Although I agree we need to support it, we should try not to generate that.

54

Do not indent namespace:
http://llvm.org/docs/CodingStandards.html#namespace-indentation

I am surprise clang-format do not do the right thing.

57

Shouldn't this be public?

69

Add doxygen comments.

74

Detail what does it mean to transfer the used lanes (BTW, you should use \p for UsedLanes).

90

containing

92

Should we merge those two with like a SmallVectorSet?
Indeed, if I am readying the code correctly, both need to be in sync.

97

Override the getPassName method as well to ease debugging.

108

A remark: since the goal right now is just to fix up the representation for the register coalescer, there is no value in supporting those.
However, if we go toward teaching this pass dead code elimination, then yes, it would make sense to do that.

135

I would have expected the check with InWorkList to be hidden into addToWorklist.

141

Use a switch on the Opcode.

148

What is the benefit of going backward?
I found it confusing to have a "for(;;)" for this traversal.

I would have expected something like:
for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2)

174

Add a message in the assert. Like unexpected copy-like instruction.

188

Add a comment why we limit ourselves to instruction with only one def.
For future reference, we need to know if this is a limitation or just that it does not make sense to support more than one definition.

188

What happens with inlineasm?
I.e., we do not get the actual number of definitions via the static description.
Explain why it is okay.

211

Ditto: Check hidden inside addToWorklist.

219

switch for on the opcode

235

Message in assert.

237

The formatting looks weird to me.
I.e., I would have expected clang-format to put the assignment operator on the previous line.

252

The motivating example in the comment at the beginning of the file does not need sub register liveness to produce dead definitions after coalescing, does it?

I.e., is this true we do not need that pass when enableSubRegLiveness is false?

261

For the arrays, wouldn't using plain arrays be more effective?
In particular, we do not iterate on those and having easily access to there length is useless.

305

What does this else block covers?
A register is either virtual or physical, right?

What am I missing?

346

Message in assert.

357

No need for the curly braces.

MatzeB updated this revision to Diff 51990.Mar 29 2016, 3:15 PM
MatzeB edited edge metadata.
MatzeB marked 23 inline comments as done.

Thanks for the review. Addressed the comments.

lib/CodeGen/DetectDeadLanes.cpp
25

The example here is artificially designed to require a minimum amount of instructions to demonstrate the issue.

It's a good question however where the real issues come from, I'll take a look around.

108

There is still some value as it improves the precision of detecting unused/dead lanes. However from what I can see the ARM target is the only one implementing these instructions and the pass doesn't even run for this target, so I leave the support for this as a future enhancement.

135

Micro optimization because in the initial pass below I can insert without even checking if the value is already in the list. Anyway with the switch to SetVector this point is moot.

148

If the subregister in the REG_SEQUENCE overlap each other (I couldn't find any evidence that this is not allowed), then later operands may override parts of the earlier operands. That's why this loop moves backwards and reduces the UsedLanes bitmask with each operand.

188

At this point we only care about lowersToCopy() instructions. They are the only ones that need subsequent transfer/updating all other operands have been handled in the first pass already.
I added a comment explaining this as the check for the copy-likeness comes later in the form of the !DefinedByCopy[X].

252

This pass does not use any sub-register liveness. My idea here was that it is probably not worth spending the compiletime on this pass for a target that does not even care to enable subregister liveness (though admittedly the pass is quite fast and more powerful than DeadMachineInstrElim because it finds the dataflow fixpoint). You think it is a good idea to enable for all targets, I measured it usually taking around 0.5% compiletime.

305

The operand can also be 0 (aka. "NoRegister"). Added a comment.

qcolombet added inline comments.Mar 29 2016, 3:52 PM
lib/CodeGen/DetectDeadLanes.cpp
150

I wouldn’t have expected overlaps, but you’re right this is not forbidden.
That being said, this means that we make an assumption on how the register sequence will be broken down and scheduled.

I would rather treat each operand independently and ignore the overlapping but print something in the debug output if the case happens.

254

I know it does not use any sub-register liveness.

What I am saying is without that pass, the coalescer might expose dead defs it was not expecting and cause later crashes.
I.e., if I understood correctly, this pass should always run to avoid potential crashes, right?

307

We usually put the “noreg” case earlier.
In that case, I would have expected this check to be right after getReg().

That being said, I am fine with the current form, it is just unusual :).

MatzeB updated this revision to Diff 52006.Mar 29 2016, 4:39 PM
MatzeB marked an inline comment as done.

Change REG_SEQUENCE transfer function to make no assumptions about the order in which subregisters are processed, this is more conservative in the case that the subreg indexes overlap and allows a nicer forward loop.

lib/CodeGen/DetectDeadLanes.cpp
254

Without subregister liveness the coalescer is not able to recognize these dead definitions, so from a correctness perspective we only need the pass for subregister liveness enabled targets.

qcolombet accepted this revision.Mar 29 2016, 4:45 PM
qcolombet edited edge metadata.

LGTM.

lib/CodeGen/DetectDeadLanes.cpp
255

Great, that is the information I was looking for :).

Add a comment in runOnMachineFunction when we skip the pass to know why it is not run.

This revision is now accepted and ready to land.Mar 29 2016, 4:45 PM

As the SetVector/BitSet reviews seem to be stalled I will commit a version which directly uses a BitVector+std::deque now.
More testing also revealed that I have to check for cross-class copies which render lanemask information invalid.

This revision was automatically updated to reflect the committed changes.