This is an archive of the discontinued LLVM Phabricator instance.

Differential D63489

[InstSimplify] LCSSA PHIs should not be simplified away
Abandoned · Public

Authored by alex-t on Jun 18 2019, 6:11 AM.

Details

Reviewers

rampitec
nhaehnle
hakzsam
lebedev.ri
spatel

Summary

We rely on LCSSA when a uniform value defined inside a loop with a divergent exit is used outside the loop.
In some cases EarlyCSE eliminates LCSSA PHIs because of the algorithm used in the SimplifyPHINode function.
It always assumes more than one incoming value and mistakenly removes LCSSA PHIs.
This causes generation of incorrect code in AMDGPU backend.
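
To make the failure mode concrete, here is a minimal standalone C++ sketch of the "all incoming values are the same" rule. It is a toy model, not LLVM code: `ToyPhi`, `simplifyPhi`, and the `keepSingleInput` flag are hypothetical names, values are plain strings, and the undef handling of the real function is left out. It only illustrates why a one-input LCSSA PHI is always folded by that rule, and how a guard like the one proposed in this revision would stop that.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a PHI node: a name plus the names of its incoming values.
struct ToyPhi {
  std::string name;
  std::vector<std::string> incoming;
};

// Returns the common incoming value if the PHI can be folded, else nullopt.
// keepSingleInput models the early-return guard proposed in this revision.
std::optional<std::string> simplifyPhi(const ToyPhi &phi, bool keepSingleInput) {
  if (keepSingleInput && phi.incoming.size() < 2)
    return std::nullopt;

  std::optional<std::string> common;
  for (const std::string &in : phi.incoming) {
    if (in == phi.name)
      continue; // a self-reference never changes the result
    if (common && *common != in)
      return std::nullopt; // two different inputs: not foldable
    common = in;
  }
  return common;
}

// Small usage example: the LCSSA PHI from the test folds unless guarded.
void demoSimplifyPhi() {
  ToyPhi lcssa{"%counter.lcssa", {"%counter"}};
  assert(simplifyPhi(lcssa, /*keepSingleInput=*/false).value() == "%counter");
  assert(!simplifyPhi(lcssa, /*keepSingleInput=*/true).has_value());
}
```

With a single input the loop sees exactly one value, so the "common value" check succeeds trivially; that is the behavior the summary describes.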

Event Timeline

alex-t created this revision. Jun 18 2019, 6:11 AM

Herald added a project: Restricted Project. Jun 18 2019, 6:11 AM

Herald added subscribers: hiraditya, tpr.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

(Also missing tests)

MIR test added

Herald added a subscriber: jvesely. Jun 18 2019, 7:41 AM

In D63489#1548288, @lebedev.ri wrote:

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

Manually written IR without LCSSA PHIs will result in syntactically correct asm and no crashes. At runtime we'll get an incorrect result, though :)
Adding the LCSSA pass again later on is difficult because of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

In D63489#1548451, @alex-t wrote:

It will result in syntactically correct asm and no crashes. At runtime we'll get an incorrect result, though :)
Adding the LCSSA pass again later on is difficult because of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
How does this affect other targets (backends)? Does this need some TLI hook?

In D63489#1548470, @lebedev.ri wrote:

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)

How does this affect other targets (backends)? Does this need some TLI hook?

The other condition should probably be TTI.hasBranchDivergence().

In D63489#1548720, @rampitec wrote:

The other condition should probably be TTI.hasBranchDivergence().

Sure! Good point. Thanks.
Although adding target-specific information to such a general component requires a lot of changes to pass TTI from the pass level down to all the calls.
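
As a rough illustration of the condition being discussed, the sketch below gates the single-input check on a target query. It is a standalone toy, not LLVM code: `ToyTTI` and `shouldKeepPhi` are hypothetical names, and only `hasBranchDivergence()` mirrors the TTI query named above. The extra pointer parameter stands for the target information that would have to be threaded through every caller.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in exposing only the one target query named above.
struct ToyTTI {
  bool branchDivergence;
  bool hasBranchDivergence() const { return branchDivergence; }
};

// True if a PHI with numIncoming inputs should be left alone.
// tti may be null when the caller has no target information available.
bool shouldKeepPhi(std::size_t numIncoming, const ToyTTI *tti) {
  return tti != nullptr && tti->hasBranchDivergence() && numIncoming < 2;
}

// Small usage example: only a divergent target keeps the one-input PHI.
void demoShouldKeepPhi() {
  ToyTTI divergentTarget{true};
  ToyTTI uniformTarget{false};
  assert(shouldKeepPhi(1, &divergentTarget));
  assert(!shouldKeepPhi(1, &uniformTarget));
}
```

Every caller that cannot supply a `ToyTTI` falls back to the existing behavior, which is the cost alex-t points out: the parameter has to reach each call site.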

In D63489#1548470, @lebedev.ri wrote:

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)

How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in the -instsimplify pass.
The algorithm is written in such a way that it always expects more than one input in a PHI node.
That means that the person who wrote it had no intention of actually removing LCSSA PHIs but was simply unaware of their existence!
It is not correct to remove LCSSA PHIs in the same manner as PHI nodes with equal inputs are removed.
So, no other backend should suffer from this change. It is not about convenience for the AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

In D63489#1548837, @alex-t wrote:

In fact I still insist that this is a bug in the -instsimplify pass.
The algorithm is written in such a way that it always expects more than one input in a PHI node.
That means that the person who wrote it had no intention of actually removing LCSSA PHIs but was simply unaware of their existence!
It is not correct to remove LCSSA PHIs in the same manner as PHI nodes with equal inputs are removed.
So, no other backend should suffer from this change. It is not about convenience for the AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from the LLVM IR point of view? https://godbolt.org/z/D8gKNc
In the endloop BB, which has a single predecessor BB (loop), the value of %counter.lcssa can only be %counter.

In D63489#1548844, @lebedev.ri wrote:

Can you please specify why this transform is invalid from the LLVM IR point of view? https://godbolt.org/z/D8gKNc
In the endloop BB, which has a single predecessor BB (loop), the value of %counter.lcssa can only be %counter.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis
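
One way to picture the point about the use site, as a standalone C++ toy rather than actual SelectionDAG or divergence-analysis code. The table, the value names, and `isUseDivergent` are hypothetical; the assumption, taken from the summary of this revision, is that `%counter` is uniform inside the loop while the value observed after the divergent exit is not.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy divergence table keyed by IR value name (hypothetical data).
using DivergenceInfo = std::map<std::string, bool>;

// The consumer can only ask about the value the use actually refers to.
// Unknown values are treated as uniform in this toy.
bool isUseDivergent(const DivergenceInfo &info, const std::string &operand) {
  auto it = info.find(operand);
  return it != info.end() && it->second;
}

// Assumed classification for the loop in the lcssa-phi.mir test.
DivergenceInfo exampleInfo() {
  return {
      {"%counter", false},      // uniform while inside the loop
      {"%counter.lcssa", true}, // differs per lane after the divergent exit
  };
}

// Small usage example: the store's operand decides which answer is seen.
void demoUseSiteQuery() {
  DivergenceInfo info = exampleInfo();
  assert(isUseDivergent(info, "%counter.lcssa")); // LCSSA PHI kept
  assert(!isUseDivergent(info, "%counter"));      // LCSSA PHI folded away
}
```

Once the PHI is folded, the store outside the loop refers to `%counter` directly, so the only answer available at the use is the in-loop one.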

In D63489#1548844, @lebedev.ri wrote:

Can you please specify why this transform is invalid from the LLVM IR point of view? https://godbolt.org/z/D8gKNc
In the endloop BB, which has a single predecessor BB (loop), the value of %counter.lcssa can only be %counter.

Okay. You are right for this concrete example. I also agree that LCSSA PHIs need to be simplified in trivial cases.
The problem is that this algorithm eliminates them in all cases. It looks like it was designed to detect PHIs with equal inputs and eliminates LCSSA PHIs unintentionally.
Anyway, I understand what your objection is about.

In D63489#1548851, @arsenm wrote:

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

That means I have to change all the function declarations along the way to be able to pass TTI and check whether it is a "divergent" target :(

In D63489#1548938, @alex-t wrote:

That means I have to change all the function declarations along the way to be able to pass TTI and check whether it is a "divergent" target :(

I don't think InstSimplify should depend on target information

In D63489#1548945, @arsenm wrote:

I don't think InstSimplify should depend on target information

Can't you just schedule an LCSSAPass somewhere late in pipeline, before SelectionDAG and after early-cse?

In D63489#1548945, @arsenm wrote:

I don't think InstSimplify should depend on target information

That's exactly why I don't like this idea as well.

In D63489#1548954, @lebedev.ri wrote:

Can't you just schedule an LCSSAPass somewhere late in pipeline, before SelectionDAG and after early-cse?

We tried, but ran into a bunch of trouble: https://reviews.llvm.org/D60834

lebedev.ri added reviewers: lebedev.ri, spatel. Jun 19 2019, 8:52 AM

Since this problem is being encountered, i'm guessing that even without this workaround,
the same problem can still be encountered if the original IR is not in LCSSA form to begin with,
i.e. i think the D60834 sounds like the properer fix (ignoring the roadblocks on the way there.)

In D63489#1550529, @lebedev.ri wrote:

Since this problem is being encountered, i'm guessing that even without this workaround,
the same problem can still be encountered if the original IR is not in LCSSA form to begin with,
i.e. i think the D60834 sounds like the properer fix (ignoring the roadblocks on the way there.)

Agree - InstSimplify is a very early (but frequently used) target-independent (no TTI/TLI) canonicalization pass/analysis. I don't see why it should be crippled as proposed in this patch.
The problem is that target-independent IR canonicalization passes like EarlyCSE are running late in the IR codegen pipeline?

spatel mentioned this in rL363842: [InstSimplify] add a phi test with 1 incoming value; NFC. Jun 19 2019, 10:20 AM

Also, this code change causes lots of existing middle-end test failures, so it doesn't appear to be some harmless/degenerate pattern:

Failing Tests (43):

LLVM :: Transforms/CodeGenPrepare/X86/computedgoto.ll
LLVM :: Transforms/Coroutines/ArgAddr.ll
LLVM :: Transforms/CorrelatedValuePropagation/select.ll
LLVM :: Transforms/GlobalDCE/complex-constantexpr.ll
LLVM :: Transforms/IndVarSimplify/rewrite-loop-exit-value.ll
LLVM :: Transforms/Inline/AArch64/phi.ll
LLVM :: Transforms/Inline/bfi-update.ll
LLVM :: Transforms/Inline/inline-fast-math-flags.ll
LLVM :: Transforms/Inline/inline_constprop.ll
LLVM :: Transforms/InstCombine/gep-combine-loop-invariant.ll
LLVM :: Transforms/InstCombine/gepphigep.ll
LLVM :: Transforms/InstMerge/st_sink_bugfix_22613.ll
LLVM :: Transforms/InstSimplify/phi.ll
LLVM :: Transforms/LICM/2003-05-02-LoadHoist.ll
LLVM :: Transforms/LoopInstSimplify/basic.ll
LLVM :: Transforms/LoopSimplify/ashr-crash.ll
LLVM :: Transforms/LoopSimplify/merge-exits.ll
LLVM :: Transforms/LoopUnroll/runtime-loop-multiple-exits.ll
LLVM :: Transforms/LoopUnroll/runtime-loop4.ll
LLVM :: Transforms/LoopUnroll/runtime-multiexit-heuristic.ll
LLVM :: Transforms/LoopUnroll/runtime-unroll-remainder.ll
LLVM :: Transforms/LoopUnroll/unroll-cleanup.ll
LLVM :: Transforms/LoopUnswitch/2007-07-12-ExitDomInfo.ll
LLVM :: Transforms/LoopUnswitch/unswitch-equality-undef.ll
LLVM :: Transforms/LoopVectorize/AMDGPU/packed-math.ll
LLVM :: Transforms/LoopVectorize/X86/already-vectorized.ll
LLVM :: Transforms/LoopVectorize/X86/float-induction-x86.ll
LLVM :: Transforms/LoopVectorize/X86/reg-usage.ll
LLVM :: Transforms/LoopVectorize/X86/small-size.ll
LLVM :: Transforms/LoopVectorize/X86/unroll-pm.ll
LLVM :: Transforms/LoopVectorize/first-order-recurrence.ll
LLVM :: Transforms/LoopVectorize/gcc-examples.ll
LLVM :: Transforms/LoopVectorize/if-conversion.ll
LLVM :: Transforms/LoopVectorize/opt.ll
LLVM :: Transforms/LoopVectorize/read-only.ll
LLVM :: Transforms/LoopVectorize/reduction.ll
LLVM :: Transforms/LoopVectorize/value-ptr-bug.ll
LLVM :: Transforms/LoopVectorize/vectorize-once.ll
LLVM :: Transforms/PhaseOrdering/reassociate-after-unroll.ll
LLVM :: Transforms/PhaseOrdering/simplifycfg-options.ll
LLVM :: Transforms/SimpleLoopUnswitch/2007-07-12-ExitDomInfo.ll
LLVM :: Transforms/SimpleLoopUnswitch/update-scev.ll
LLVM :: Transforms/ThinLTOBitcodeWriter/no-type-md.ll

spatel mentioned this in rG3e03bf692134: [InstSimplify] add a phi test with 1 incoming value; NFC. Jun 19 2019, 10:21 AM

alex-t abandoned this revision. Jun 20 2019, 6:35 AM

Revision Contents

Path                                         Size
llvm/lib/Analysis/InstructionSimplify.cpp    5 lines
llvm/test/CodeGen/AMDGPU/lcssa-phi.mir       31 lines

Diff 205345

llvm/lib/Analysis/InstructionSimplify.cpp

 Value *llvm::SimplifyExtractElementInst(Value *Vec, Value *Idx,
                                         const SimplifyQuery &Q) {
   return ::SimplifyExtractElementInst(Vec, Idx, Q, RecursionLimit);
 }

 /// See if we can fold the given phi. If not, returns null.
 static Value *SimplifyPHINode(PHINode *PN, const SimplifyQuery &Q) {
+  // LCSSA PHIs only have one input value
+  // return nullptr here to avoid simplifying them away
+  // since they are necessary to hold out of the loop dependecies.
+  if (PN->getNumIncomingValues() < 2)
+    return nullptr;
   // If all of the PHI's incoming values are the same then replace the PHI node
   // with the common value.
   Value *CommonValue = nullptr;
   bool HasUndefInput = false;
   for (Value *Incoming : PN->incoming_values()) {
     // If the incoming value is the phi node itself, it can safely be skipped.
     if (Incoming == PN) continue;
     if (isa<UndefValue>(Incoming)) {
llvm/test/CodeGen/AMDGPU/lcssa-phi.mir

This file was added.

# RUN: llc -march=amdgcn -run-pass early-cse -o - %s | FileCheck %s
# Check that LCSSA PHI was not eliminated
# CHECK: %counter.lcssa = phi i32 [ %counter, %loop ]

--- |
  define amdgpu_kernel void @lcssa-phi(i32 addrspace(1)* %out, i32 addrspace(1)* %in) {
  entry:
    %tid = call i32 @llvm.amdgcn.workitem.id.x()
    br label %loop

  loop:                                             ; preds = %flow, %entry
    %counter = phi i32 [ 0, %entry ], [ %inc, %flow ]
    %cond = icmp ugt i32 %counter, %tid
    br i1 %cond, label %endloop, label %flow

  flow:                                             ; preds = %loop
    %inc = add i32 %counter, 1
    br label %loop

  endloop:                                          ; preds = %loop
    %counter.lcssa = phi i32 [ %counter, %loop ]
    store i32 %counter.lcssa, i32 addrspace(1)* %out
    ret void
  }

  ; Function Attrs: nounwind readnone speculatable
  declare i32 @llvm.amdgcn.workitem.id.x() #0

  attributes #0 = { nounwind readnone speculatable }

...