This is an archive of the discontinued LLVM Phabricator instance.

Differential D93065

[InstCombine] Disable optimizations of select instructions that causes propagation of poison values
Abandoned · Public

Authored by congzhe on Dec 10 2020, 1:24 PM.

Details

Reviewers

lebedev.ri
aqjune
nlopes
spatel
gkistanova
nikic

Summary

This is a work-in-progress.

Relevant discussions on Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=48353

In brief (cited from Roman Lebedev),

define i1 @src(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = select i1 %cmp.i, i1 1, i1 %cmp41
  ret i1 %cmp4
}

define i1 @tgt(i1 %cmp.i, i1 %cmp41) {
%entry:
  %cmp4 = or i1 %cmp.i, %cmp41
  ret i1 %cmp4
}

Transformation doesn't verify!
ERROR: Target is more poisonous than source

Example:
i1 %cmp.i = #x1 (1)
i1 %cmp41 = poison

Source:
i1 %cmp4 = #x1 (1)

Target:
i1 %cmp4 = poison
Source value: #x1 (1)

Target value: poison

Therefore, this patch disables all optimizations of select instructions that result in the propagation of poison values.

In addition, minor changes are also made to ensure there is no functional problem after disabling the abovementioned optimizations.

Performance measurement should be taken to measure potential degradations. Currently in progress. Many regression tests should be updated as well.
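
The counterexample in the summary can be replayed with a small executable model. The sketch below is an illustration only: it assumes a toy three-valued i1 (True, False, or a hypothetical POISON sentinel) and is not LLVM or Alive2 code. The select model looks only at the chosen arm, whereas the or model is poisoned by either operand.

```python
# Toy model of LLVM i1 values: True, False, or POISON.
# POISON is a hypothetical sentinel for this sketch, not an LLVM API.
POISON = "poison"


def select(cond, true_val, false_val):
    """Model of select: a poison condition gives poison; otherwise only the chosen arm is used."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def bitwise_or(a, b):
    """Model of or: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a or b


# Inputs from the counterexample: %cmp.i = 1, %cmp41 = poison.
src = select(True, True, POISON)   # @src: select i1 %cmp.i, i1 1, i1 %cmp41
tgt = bitwise_or(True, POISON)     # @tgt: or i1 %cmp.i, %cmp41
print("source:", src)
print("target:", tgt)
```

With these inputs the source stays well defined while the target becomes poison, which is the "Target is more poisonous than source" failure quoted above.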

Event Timeline

congzhe created this revision. Dec 10 2020, 1:24 PM

Herald added a reviewer: gkistanova. Dec 10 2020, 1:24 PM

Herald added a subscriber: hiraditya.

congzhe requested review of this revision. Dec 10 2020, 1:24 PM

Herald added a subscriber: llvm-commits. Dec 10 2020, 1:24 PM

congzhe edited the summary of this revision. Dec 10 2020, 1:28 PM

congzhe edited the summary of this revision.

aqjune added inline comments. Dec 11 2020, 12:30 AM

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Line 2643: Is this code still reachable when TrueVal/FalseVal are vectors of i1?

Updated to address comments.

congzhe added inline comments. Dec 11 2020, 8:58 PM

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Line 2643: Thanks for the comment! Now updated to handle vectors of i1 as well.

Performance data? No regressions for benchmarks?

This will require other parts of the compiler (especially anything dealing with and/or'd branch conditions, like SCEV, LVI, ValueTracking etc) to understand this new form of and/or first. I don't think most optimizations care about whether and/or is poison-blocking or not, but they need to recognize the select form if we don't canonicalize.

This revision now requires changes to proceed. Dec 12 2020, 1:17 AM

You will also need to update several regression tests (make check or ninja check) to show the new canonical form.
I'm still not sure which path (this or inserting freezes) is easier/better. If we have perf data, maybe it will tell us how to go?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Line 2640: This seems more complicated than necessary. Can't we just transfer the check from the block above to here:

  if (SelType->isIntOrIntVectorTy() && !SelType->isIntOrIntVectorTy(1) && ...)

We can do that as an NFC preliminary patch regardless of whether we decide to proceed with this or not.

In D93065#2450181, @xbolva00 wrote:

Performance data? No regressions for benchmarks?

Thanks for the comment! As mentioned in the summary, the performance testing is currently in progress.

In D93065#2450198, @nikic wrote:

This will require other parts of the compiler (especially anything dealing with and/or'd branch conditions, like SCEV, LVI, ValueTracking etc) to understand this new form of and/or first. I don't think most optimizations care about whether and/or is poison-blocking or not, but they need to recognize the select form if we don't canonicalize.

Thank you, I agree that some infrastructure such as SCEV might be impacted. Could you clarify a bit? I am working on unit tests and perf degradations. Should any of them be caused by that infrastructure, I will post fixes. Would that address your concern, or is something more needed?

In D93065#2450391, @spatel wrote:

You will also need to update several regression tests (make check or ninja check) to show the new canonical form.
I'm still not sure which path (this or inserting freezes) is easier/better. If we have perf data, maybe it will tell us how to go?

Thanks Sanjay. By "perf data", I'm wondering which benchmarks the community cares about most. LLVM test-suite?

As you said, there are two ways to address this bug: one is this patch, the other is inserting freezes. With this patch I do see degradations on some internal benchmarks, and I'm working on a fix. Do you think this is the right track? I would just like to make sure before going too far.

The work I do is on AArch64. It may or may not impact other architectures (X86, PowerPC, etc). I'm wondering what the typical way in the community to address performance degradations is, since addressing degradations for all possible architectures seems like a significant amount of work and hence unlikely?

Current failing regression tests:

Failed Tests (10):
  Clang :: CodeGenOpenCL/amdgpu-nullptr.cl
  LLVM :: Transforms/InstCombine/icmp.ll
  LLVM :: Transforms/InstCombine/minmax-fold.ll
  LLVM :: Transforms/InstCombine/select-bitext.ll
  LLVM :: Transforms/InstCombine/select-cmp-br.ll
  LLVM :: Transforms/InstCombine/select.ll
  LLVM :: Transforms/InstCombine/select_meta.ll
  LLVM :: Transforms/PGOProfile/chr.ll
  LLVM :: Transforms/PhaseOrdering/X86/vector-reductions.ll
  LLVM :: Transforms/PhaseOrdering/unsigned-multiply-overflow-check.ll

congzhe updated this revision to Diff 311691. Dec 14 2020, 1:36 PM

congzhe added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
Line 2640: Thanks for the comment, now updated accordingly. I will post this piece of code as a separate patch.

In D93065#2452949, @congzhe wrote:

Thanks Sanjay. By "perf data", I'm wondering which benchmarks the community cares about most. LLVM test-suite?

We would get a different answer for each person/company, but test-suite is solid common ground. I think SPEC is also widely used by a large part of the community.

As you said, there are two ways to address this bug: one is this patch, the other is inserting freezes. With this patch I do see degradations on some internal benchmarks, and I'm working on a fix. Do you think this is the right track? I would just like to make sure before going too far.

The work I do is on AArch64. It may or may not impact other architectures (X86, PowerPC, etc). I'm wondering what the typical way in the community to address performance degradations is, since addressing degradations for all possible architectures seems like a significant amount of work and hence unlikely?

In general, we try to address known regressions in advance. If you see a problem on AArch64, it will probably not be too different for the other CPU targets.
It is acknowledged that it is impossible to know how a change like this will affect everything, so as long as we have a plan to deal with problems, it's fine to proceed.
My suggestion is to compare the regressions between this patch vs. adding freeze on test-suite. Is there a clear winner (number and average size of regressions)?
Thank you for pushing this forward!

In D93065#2453190, @spatel wrote:

My suggestion is to compare the regressions between this patch vs. adding freeze on test-suite. Is there a clear winner (number and average size of regressions)?
Thank you for pushing this forward!

On test-suite, there does not seem to be a clear winner.

Out of ~3600 microbenchmarks I chose the ones with a running time greater than 4us, which gives me ~250 benchmarks. In terms of the overall geometric mean relative to LLVM trunk, both the freeze solution and this patch differ from trunk by less than 0.7%. For the freeze solution, 9 microbenchmarks degrade by over 2%, and the geomean of those 9 degradations is 3.7%; for this patch, 12 microbenchmarks degrade by over 2%, and the geomean of those 12 degradations is 4.3%.

If I choose the microbenchmarks with running time greater than 0.5us, that gives me ~400 microbenchmarks where I have similar observations.
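
For readers who want to repeat this kind of comparison, the procedure described above (filter by minimum runtime, take per-benchmark ratios, then report an overall geomean and the geomean of the benchmarks that regress by more than 2%) can be sketched as follows. This is only a sketch: the function names are hypothetical, and the runtimes are made-up placeholder values, not the measurements from this review.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(baseline_us, patched_us, min_runtime_us=4.0, threshold=1.02):
    """Compare two runs given as {benchmark name: runtime in microseconds}."""
    # A ratio above 1.0 means the patched build is slower than the baseline.
    ratios = [patched_us[name] / base
              for name, base in baseline_us.items()
              if base > min_runtime_us]
    regressed = [r for r in ratios if r > threshold]
    return {
        "kept": len(ratios),
        "overall_geomean": geomean(ratios),
        "regressed": len(regressed),
        "regressed_geomean": geomean(regressed) if regressed else 1.0,
    }


# Made-up runtimes, for illustration only.
baseline = {"a": 10.0, "b": 8.0, "c": 1.0}
patched = {"a": 10.0, "b": 8.8, "c": 5.0}
summary = summarize(baseline, patched)
print(summary)
```

Benchmark "c" is dropped by the runtime filter even though it slowed down the most, which is why the choice of cutoff (4us or 0.5us above) can matter for the conclusion.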

However, on internal benchmarks I see much less degradation from the freeze solution compared to this patch.

Does this sound like a clue that we should go ahead with the freeze solution? If so I'll post the freeze solution and see how we can go from there.

In D93065#2460747, @congzhe wrote:

However, on internal benchmarks I see much less degradation from the freeze solution compared to this patch.

Does this sound like a clue that we should go ahead with the freeze solution? If so I'll post the freeze solution and see how we can go from there.

Ok, the freeze solution might be easier to implement. It seems more straightforward to teach passes to peek through a freeze vs. learn new patterns? I defer to @aqjune and @nikic and other reviewers for opinions/experience. I haven't done that work myself.

Using freeze loses information (if some of the inputs were poison). Plus, it requires an extra op.
If we canonicalize around select there's no loss of information and it's just 1 instruction.

The disadvantage is that then we have 2 ways of doing boolean ANDs/ORs. Though most analyses can be patched easily, as most LLVM analyses' results are of the form "x has property foo unless it's poison". So for those analyses using and/or or select is the same (as the only difference between these is propagation of poison).
Other analyses/optimization can learn about select as needed.
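
The trade-off described above can be made concrete with a toy model. This is a sketch under assumptions: POISON is a hypothetical sentinel, and freeze is modelled as returning a caller-chosen arbitrary value when its operand is poison. It compares select c, true, x against or c, freeze(x) for c = false and x = poison.

```python
# Toy model: i1 values are True, False, or the hypothetical sentinel POISON.
POISON = "poison"


def select(cond, true_val, false_val):
    """Model of select: poison condition gives poison; otherwise the chosen arm."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def bitwise_or(a, b):
    """Model of or: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a or b


def freeze(value, arbitrary):
    """Model of freeze: poison becomes some arbitrary but fixed value."""
    return arbitrary if value is POISON else value


# c = false, x = poison
select_form = select(False, True, POISON)
freeze_forms = {bitwise_or(False, freeze(POISON, pick)) for pick in (False, True)}
print("select form:", select_form)
print("freeze form can be any of:", freeze_forms)
```

The select form still reports poison, so later passes can exploit it; the freeze form yields an ordinary boolean whose value depends on an arbitrary choice, and the fact that x was poison is no longer visible.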

In D93065#2461196, @nlopes wrote:

Using freeze loses information (if some of the inputs were poison). Plus, it requires an extra op.
If we canonicalize around select there's no loss of information and it's just 1 instruction.

The disadvantage is that then we have 2 ways of doing boolean ANDs/ORs. Though most analyses can be patched easily, as most LLVM analyses' results are of the form "x has property foo unless it's poison". So for those analyses using and/or or select is the same (as the only difference between these is propagation of poison).
Other analyses/optimization can learn about select as needed.

Thank you for raising this good point! I understand that we lose information by using freeze to prevent poison values from propagating. But I'm unclear what the side effect or problem with that would be. I'd appreciate it if you could clarify a bit, thanks!

In D93065#2461517, @congzhe wrote:

Thank you for raising this good point! I understand that we lose information by using freeze to prevent poison values from propagating. But I'm unclear what the side effect or problem with that would be. I'd appreciate it if you could clarify a bit, thanks!

In practice, probably not a lot. But it may have implications for loop optimization, like:

for (i=0; some_bool && i < limit; ++i) {
...
}

If you remove the poison from the i+1 < limit bit it may make the work of SCEV harder (or impossible; didn't think the example through carefully).
Another example is hoisting of sext(i32)->i64 out of loops. This is important for perf and it relies on poison to prove no wrapping and we can't destroy that. Probably most loop conditions don't have ANDs/ORs, hence why I say the impact in practice is small.

for (i=0; some_bool && i < limit; ++i) {
...
}

Assuming that all relevant optimizations are fully implemented in both scenarios (and/or with freeze vs. select), I think their performance diff should be zero because C/C++ constrains variables to have well-defined values.
Currently, this isn't enforced by clang. We already have !noundef, and simply attaching it to loads will do the trick.
Should we move in that direction and suggest it to llvm-dev? The diff in the clang source won't be big, but people might worry about their programs becoming more undefined (programs that are written that way for practical reasons).

In D93065#2463069, @nlopes wrote:

In practice, probably not a lot. But it may have implications for loop optimization, like:
for (i=0; some_bool && i < limit; ++i) {
...
}
If you remove the poison from the i+1 < limit bit it may make the work of SCEV harder (or impossible; didn't think the example through carefully).

Can I just make sure my understanding is correct? When we compute the SCEV of some_bool && i < limit, we recurse backwards on this select instruction (after this patch) or on an AND instruction (before this patch). If we choose the freeze approach, we'll recurse on the AND instruction and eventually hit a freeze instruction, which SCEV does not know how to handle, so SCEV will just return CouldNotCompute?

In D93065#2465056, @congzhe wrote:
In D93065#2463069, @nlopes wrote:
In practice, probably not a lot. But it may have implications for loop optimization, like:
for (i=0; some_bool && i < limit; ++i) {
...
}
If you remove the poison from the i+1 < limit bit it may make the work of SCEV harder (or impossible; didn't think the example through carefully).
Can I just make sure my understanding is correct? When we compute the SCEV of some_bool && i < limit, we recurse backwards on this select instruction (after this patch) or on an AND instruction (before this patch). If we choose the freeze approach, we'll recurse on the AND instruction and eventually hit a freeze instruction, which SCEV does not know how to handle, so SCEV will just return CouldNotCompute?

It's more than just teaching SCEV how to handle freeze. A freeze instruction destroys UB and therefore not even the most precise SCEV would be able to recover the information that is lost. select retains more information/UB.
So if SCEV requires poison from the add nsw to e.g. determine the number of iterations, we would miss out with the freeze implementation. It's a hypothetical loss that I can't quantify without testing. My argument for using select is just that it makes the IR smaller and therefore preferable. Since it has the nice extra benefit of keeping more UB, then great, but not the main argument.

In D93065#2464486, @aqjune wrote:

Assuming that all relevant optimizations are fully implemented in both scenarios (and/or with freeze vs. select), I think their performance diff should be zero because C/C++ constrains variables to have well-defined values.
Currently, this isn't enforced by clang. We already have !noundef, and simply attaching it to loads will do the trick.

Thinking about this again, this isn't true. Since !noundef can be stripped when the load is hoisted, there still is a possible UB loss.
So, in terms of preserving UB, using select will be the better approach.

But there are so many transformations to consider if select is used: from InstCombine optimizations that involve And/Or to analyses, SimplifyCFG, etc. Fixing all of them seems like a hard job.

In D93065#2466356, @aqjune wrote:

Thinking about this again, this isn't true. Since !noundef can be stripped when the load is hoisted, there still is a possible UB loss.
So, in terms of preserving UB, using select will be the better approach.

But there are so many transformations to consider if select is used: from InstCombine optimizations that involve And/Or to analyses, SimplifyCFG, etc. Fixing all of them seems like a hard job.

I think we'll just have to bite the bullet and do that. In a systematic way, before this lands. I think that the number of places that need to be changed is actually rather limited as we're only interested in and/or on booleans. We don't have that many analyses dealing with conditions.

Here's a sample patch for LVI: https://gist.github.com/nikic/fac8124a95901b7ff9ac44513f554578 With this PatternMatch boilerplate in place, the actual LVI change is trivial. The main work is adding relevant tests...

Worth noting is that some of the code deleted in this patch should be retained as a canonicalization. E.g. a ? a : b should become a ? true : b, a ? b : true should become !a ? true : b, etc. That way the remaining code only has to deal with canonical select patterns.
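
The two canonicalizations named above can be sanity-checked exhaustively in a toy model. The sketch below assumes a three-valued i1 (True, False, or a hypothetical POISON sentinel) and helper names invented for illustration; it is not InstCombine code. It checks that each rewrite gives the same result for every combination of a and b, poison included.

```python
from itertools import product

# Toy model: i1 values are True, False, or the hypothetical sentinel POISON.
POISON = "poison"
VALUES = (False, True, POISON)


def select(cond, true_val, false_val):
    """Model of select: poison condition gives poison; otherwise the chosen arm."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def logical_not(a):
    """Model of xor a, true: poison stays poison."""
    return POISON if a is POISON else (not a)


def check_canonicalizations():
    for a, b in product(VALUES, repeat=2):
        # a ? a : b  ==>  a ? true : b
        if select(a, a, b) != select(a, True, b):
            return False
        # a ? b : true  ==>  !a ? true : b
        if select(a, b, True) != select(logical_not(a), True, b):
            return False
    return True


print("canonicalizations preserve results:", check_canonicalizations())
```

Both rewrites keep the select form, so they do not introduce the poison propagation that the select -> and/or fold does.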

It may also be worthwhile to convert the special cases of icmp (a, X) ? (icmp a, Y) : false to icmp (a, X) & (icmp a, Y), though I'm not sure if that is even useful if we handle the select pattern reasonably everywhere else.

I think we'll just have to bite the bullet and do that. In a systematic way, before this lands. I think that the number of places that need to be changed is actually rather limited as we're only interested in and/or on booleans. We don't have that many analyses dealing with conditions.

Worth noting is that some of the code deleted in this patch should be retained as a canonicalization. E.g. a ? a : b should become a ? true : b, a ? b : true should become !a ? true : b, etc. That way the remaining code only has to deal with canonical select patterns.

Yes, this makes sense to me.

It may also be worthwhile to convert the special cases of icmp (a, X) ? (icmp a, Y) : false to icmp (a, X) & (icmp a, Y), though I'm not sure if that is even useful if we handle the select pattern reasonably everywhere else.

I think this is helpful because it helps the expression syntactically show more information. Poison propagation is more aggressive in the latter case.
If we want to support this transformation, I think having something like impliesPoison in ValueTracking and calling it when doing this conversion will be helpful. I locally have an old patch that does this (maybe somewhere in Phabricator too).
What do you think about this workflow:
(1) Add the LogicalOr/And pattern match, and make a few important analyses (LVI, parts of ValueTracking) as well as transformations like GVN support the select pattern.
(2) If necessary, add an analysis function that can be used for conditionally allowing select->and/or.
(3) See how the performance goes..!
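
The condition under which select -> and stays sound (the role suggested above for something like impliesPoison) can be illustrated with a toy model. This is a sketch with hypothetical helper names, not the ValueTracking implementation: it enumerates all inputs of select c, x, false and reports those for which and c, x is more poisonous.

```python
from itertools import product

# Toy model: i1 values are True, False, or the hypothetical sentinel POISON.
POISON = "poison"
VALUES = (False, True, POISON)


def select(cond, true_val, false_val):
    """Model of select: poison condition gives poison; otherwise the chosen arm."""
    if cond is POISON:
        return POISON
    return true_val if cond else false_val


def bitwise_and(a, b):
    """Model of and: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return a and b


def refines(src, tgt):
    """tgt may replace src if src is poison or both produce the same value."""
    return src is POISON or src == tgt


def unsound_inputs():
    """Inputs (c, x) where 'and c, x' is not a refinement of 'select c, x, false'."""
    return [(c, x) for c, x in product(VALUES, repeat=2)
            if not refines(select(c, x, False), bitwise_and(c, x))]


print(unsound_inputs())
```

The only failing input is c = false with x = poison. If x being poison implies that c is poison too, as when both are icmps on the same value a, that input cannot occur and the fold is safe.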

nikic mentioned this in rG5bc5c016c4bf: [CVP] Add tests for select form of and/or (NFC). Dec 26 2020, 12:48 PM

nikic mentioned this in D93827: [PatternMatch][LVI] Handle select-form and/or in LVI. Dec 26 2020, 12:58 PM

nikic mentioned this in rG0af42d3dc73e: [PatternMatch][LVI] Handle select-form and/or in LVI. Dec 27 2020, 8:41 AM

nikic mentioned this in D93840: [InstCombine] Disable unsafe select transform behind a flag. Dec 27 2020, 9:58 AM

aqjune mentioned this in D93841: [GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions. Dec 27 2020, 10:49 AM

aqjune mentioned this in D93842: [EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions. Dec 27 2020, 11:28 AM

aqjune mentioned this in rGf1d648b973d3: [GVN] Use m_LogicalAnd/Or to propagate equality from branch conditions. Dec 27 2020, 12:29 PM

aqjune mentioned this in rGd3f1f7b6bca5: [EarlyCSE] Use m_LogicalAnd/Or matchers to handle branch conditions. Dec 27 2020, 12:36 PM

aqjune mentioned this in D93845: [ValueTracking] Use m_LogicalAnd/Or to look into conditions. Dec 27 2020, 1:42 PM

Relevant patches in the past that partially allow select -> and/or: https://reviews.llvm.org/D77868, https://reviews.llvm.org/D78152

aqjune mentioned this in rG860199dfbe60: [ValueTracking] Use m_LogicalAnd/Or to look into conditions. Dec 27 2020, 3:33 PM

aqjune mentioned this in D93853: [CodeGen] recognize select form of and/ors when splitting branch conditions. Dec 28 2020, 1:00 AM

I wanted to make ScalarEvolution recognize the select pattern, and became a bit uncertain about its validity.

Take this example: https://alive2.llvm.org/ce/z/NsP9ue
SCEV's computeExitLimit can return min(n, m) as the ExactNotTaken value, so I added an llvm.assume to check its validity.
But it fails because the exit limit becomes poison if n is zero and m is poison. This would make, e.g., replacing the last value of i with min(n, m) invalid.
If and is used instead, this becomes okay: https://alive2.llvm.org/ce/z/K9rbJk

If there is a guard on n and m at the loop header, it would be safe, I think. Is this what SCEV assumes? If so, what should the input look like?

In D93065#2472631, @aqjune wrote:

I wanted to make ScalarEvolution recognize the select pattern, and became a bit uncertain about its validity.

Take this example: https://alive2.llvm.org/ce/z/NsP9ue
SCEV's computeExitLimit can return min(n, m) as the ExactNotTaken value, so I added an llvm.assume to check its validity.
But it fails because the exit limit becomes poison if n is zero and m is poison. This would make, e.g., replacing the last value of i with min(n, m) invalid.
If and is used instead, this becomes okay: https://alive2.llvm.org/ce/z/K9rbJk

If there is a guard on n and m at the loop header, it would be safe, I think. Is this what SCEV assumes? If so, what should the input look like?

I think you are right about this. SCEV doesn't really have the facilities to represent this. I think the only thing we can do at the SCEV level is to at least determine that the first condition implies an upper bound, even if we can't use the second one. Guess this is more motivation to try fairly hard to convert select -> and/or for the cases where we can do so.
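
The SCEV concern discussed above can be modelled roughly as follows. This is an assumed reconstruction for illustration, not the linked Alive2 examples: it takes a loop of the shape for (i = 0; i < n && i < m; ++i) whose condition is in select form, treats n as well defined, and lets m be a hypothetical POISON sentinel. Returning POISON for "branch on poison" is a simplification of undefined behaviour.

```python
# Toy model: m is either a non-negative integer or the hypothetical sentinel POISON.
POISON = "poison"


def umin(a, b):
    """Model of an arithmetic min: poison in either operand poisons the result."""
    if a is POISON or b is POISON:
        return POISON
    return min(a, b)


def trip_count_select_form(n, m):
    """Iterations of: for (i = 0; select(i < n, i < m, false); ++i), with n well defined."""
    i = 0
    while True:
        if not (i < n):
            # First condition is false: the select never looks at i < m.
            return i
        if m is POISON:
            # The loop would branch on a poison condition (modelled as POISON).
            return POISON
        if not (i < m):
            return i
        i += 1


print("actual exit count for n=0, m=poison:", trip_count_select_form(0, POISON))
print("min(n, m) for n=0, m=poison:", umin(0, POISON))
```

For n = 0 and m = poison the loop exits immediately with a well-defined count of 0, while min(n, m) is poison, so substituting min(n, m) for the exit count would make the program more poisonous. When both bounds are well defined, the two agree.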

nikic mentioned this in rG4a16c507cb68: [InstCombine] Disable unsafe select transform behind a flag. Dec 28 2020, 1:44 PM

In D93065#2472644, @nikic wrote:

I think you are right about this. SCEV doesn't really have the facilities to represent this. I think the only thing we can do at the SCEV level is to at least determine that the first condition implies an upper bound, even if we can't use the second one. Guess this is more motivation to try fairly hard to convert select -> and/or for the cases where we can do so.

Okay.. let me make a patch for this.

aqjune mentioned this in D93882: [SCEV] recognize logical and/or pattern. Dec 28 2020, 6:55 PM

aqjune mentioned this in D93935: [ConstraintElimination] Add support for select form of and/or. Dec 29 2020, 11:21 PM

aqjune mentioned this in rGbfedd5d2b650: [ConstraintElimination] Add support for select form of and/or. Dec 30 2020, 4:37 AM

aqjune mentioned this in D93943: [SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of and/or. Dec 30 2020, 5:20 AM

Okay, I think the remaining optimizations (other than opened patches) are:

LoopUnswitch.cpp: I can address this
PredicateInfo.cpp: I'm not sure whether a local change will be enough, because the relevant classes in PredicateInfo.h are exposed to other optimizations.
SimpleLoopUnswitch.cpp, LoopPredication.cpp, InductiveRangeCheckElimination.cpp: have no idea

After LoopUnswitch and PredicateInfo are updated and the select -> and/or fold is conditionally allowed, we can try the performance evaluation again and see whether any regressions remain. What do you think? @nikic @congzhe

In D93065#2475022, @aqjune wrote:

Okay, I think the remaining optimizations (other than opened patches) are:

LoopUnswitch.cpp: I can address this

PredicateInfo.cpp: I'm not sure whether a local change will be enough, because the relevant classes in PredicateInfo.h are exposed to other optimizations.

SimpleLoopUnswitch.cpp, LoopPredication.cpp, InductiveRangeCheckElimination.cpp: have no idea

After LoopUnswitch and PredicateInfo are updated and the select -> and/or fold is conditionally allowed, we can try the performance evaluation again and see whether any regressions remain. What do you think? @nikic @congzhe

Thank you all for the effort put into addressing the canonicalization of the select form!
Sure, I will apply all these patches, check the benchmark performance, and let you know the result. Thanks again!

aqjune mentioned this in rG509fa8e02e25: [SCEV] recognize logical and/or pattern. Dec 31 2020, 11:38 AM

aqjune mentioned this in rG5cdf6ed74489: [CodeGen] recognize select form of and/ors when splitting branch conditions. Dec 31 2020, 11:55 AM

For loop unswitch, I'll work on it after D93764 is done

@aqjune Do you have any automated way of duplicating the InstCombine tests with logical and/or, similar to what you did for insertelement? Or does this have to be done by hand?

In D93065#2486134, @nikic wrote:

@aqjune Do you have any automated way of duplicating the InstCombine tests with logical and/or, similar to what you did for insertelement? Or does this have to be done by hand?

Technically, it wouldn't be hard.
But I'm somewhat concerned that the situation differs from the poison vector placeholder patches, because this is about disabling optimizations.
A mail should be sent to llvm-dev to announce this issue, and people should be aware that performance degradation may happen.
Well, the root problem is that I don't have enough time for this until the end of Jan. :( I don't want someone else to take over the responsibility either; this would be a lot of work.

I think the modest solution is to simply revert the poison constant folding patch (D92270), except for a few cases like division. select -> and/or won't interact with poison from div undef because executing div undef already raises UB in src.
After the select -> and/or is removed, the patch should definitely be applied again.

aqjune mentioned this in rG395c737d9fce: [SimplifyCFG] Update SimplifyBranchOnICmpChain to recognize select form of…. Jan 18 2021, 3:54 PM

Current diff for flipping the flag: https://gist.github.com/nikic/f65b36adb70c93f5da9bfe3422fd8904 There's still a number of cases that can be handled.

I have a patch for supporting logical and/or in PredicateInfo, but waiting on D94447 to land first.

I also noticed that SimplifyCFG is creating and/or when merging branch conditions in https://github.com/llvm/llvm-project/blob/ecf696641e6ce4b22e8c8ea3c7476b9c1f0f200b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L2959. I think it would make sense to switch that code over to use the select form first. That will make the transform correct in SimplifyCFG, but still let it be folded by InstCombine. As other passes have already been adjusted to recognize logical and/or, this should be low impact.

In D93065#2508119, @nikic wrote:

Current diff for flipping the flag: https://gist.github.com/nikic/f65b36adb70c93f5da9bfe3422fd8904 There's still a number of cases that can be handled.

I think I can handle or_andn_cmp_1_logical and its family, llvm.umul.with.overflow.i64 related transformations, and PR41069_logical.

I also noticed that SimplifyCFG is creating and/or when merging branch conditions in https://github.com/llvm/llvm-project/blob/ecf696641e6ce4b22e8c8ea3c7476b9c1f0f200b/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L2959. I think it would make sense to switch that code over to use the select form first. That will make the transform correct in SimplifyCFG, but still let it be folded by InstCombine. As other passes have already been adjusted to recognize logical and/or, this should be low impact.

+1, this makes a lot of sense to me; InstCombine happens immediately after SimplifyCFG in many places.
I believe non-InstCombine passes coming after SimplifyCFG are already familiar with unoptimized select instructions as well, because SimplifyCFG is already generating such ops (see test6f).
I'll make a patch for this.

aqjune mentioned this in D95026: [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe.Jan 20 2021, 12:16 AM

I don't think I'll be able to do something for a while; I'm currently occupied by something else.
Should we do something to address the bugs? I think we should revert relevant poison-optimizing patches. Then, the select bug must be fixed and then relevant optimizations can be relanded.

To address only the reported bugs, I think it is enough to revert the folds in shifts (shl, ashr, lshr). What do you think? @nikic

@aqjune Is https://bugs.llvm.org/show_bug.cgi?id=48353 the only known (real-life) issue, or are there others? I'm okay with making shift optimization temporarily more conservative again. Not sure to what degree this addresses the issue though, even as a workaround. Shouldn't it also be easy to trigger something based on add nsw arithmetic in C? Or is there a reason why that is less likely?

A thought I had is to do something like https://gist.github.com/nikic/e855476bcd87124ff8550ad9b5432f26, basically assume that clang has annotated everything possible as noundef, which should give us a pretty good heuristic to distinguish "poison might practically occur here" and "poison can theoretically occur here". The main casualty of that approach is the "one hot merge" optimization, which just doesn't seem valid to do with selects.

Hi,

In D93065#2525779, @nikic wrote:

@aqjune Is https://bugs.llvm.org/show_bug.cgi?id=48353 the only known (real-life) issue, or are there others? I'm okay with making shift optimization temporarily more conservative again. Not sure to what degree this addresses the issue though, even as a workaround. Shouldn't it also be easy to trigger something based on add nsw arithmetic in C? Or is there a reason why that is less likely?

There is a reason why other ops are less likely to be problematic - existing transformations don't simplify non-shifts with well-defined operands into undef or poison.
For example, add nuw -1, -1 isn't being folded to poison. Similarly, gep inbounds isn't folded to poison either, even if the offset is proven to be out of bounds (unless it is undef).
Vector operations like extractelement may produce poison from well-defined operands, but I believe the introduction of such instruction itself rarely happens.
The existence of such folding matters because a well-defined program that is emitted from the frontend is unlikely to have undef or poison as operands of arithmetic operations; it must be created from somewhere else, such as from 1u << 33.
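
To make the `1u << 33` case concrete, here is a minimal sketch of how a shift folded to poison interacts with the select -> or fold. It reuses the idea of modeling poison as `std::nullopt`; the helper names (`shl`, `icmpNe`, `selectTrueOr`, `bitwiseOr`) are invented for this example. The assumed semantics are that `shl` yields poison when the shift amount is at least the bit width, and that `icmp` and `or` propagate poison.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model (assumption): std::nullopt stands for poison.
using I32 = std::optional<uint32_t>;
using I1 = std::optional<bool>;

// shl i32 x, amt: poison when amt is not smaller than the bit width.
I32 shl(uint32_t x, uint32_t amt) {
  if (amt >= 32) return std::nullopt;
  return x << amt;
}

// icmp ne a, b with poison propagation from a.
I1 icmpNe(I32 a, uint32_t b) {
  if (!a.has_value()) return std::nullopt;
  return *a != b;
}

// select i1 c, i1 true, i1 x: x is ignored when c is true.
I1 selectTrueOr(I1 c, I1 x) {
  if (!c.has_value()) return std::nullopt;
  return *c ? I1(true) : x;
}

// or i1 c, x: poison if either operand is poison.
I1 bitwiseOr(I1 c, I1 x) {
  if (!c.has_value() || !x.has_value()) return std::nullopt;
  return *c || *x;
}
```

For a source expression shaped like `c || ((1u << 33) != 0)` with `c` true, the select form evaluates to true in this model, while the `or` form evaluates to poison once the shift has been folded. This matches the observation that the problem needs both a fold that produces poison from well-defined operands and the select -> or transformation.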

But, since there can exist an unexpected situation that I'm not aware of, I reverted D92270. I think the InstCombine shift patch (D93998) should be reverted as well.

A thought I had is to do something like https://gist.github.com/nikic/e855476bcd87124ff8550ad9b5432f26, basically assume that clang has annotated everything possible as noundef, which should give us a pretty good heuristic to distinguish "poison might practically occur here" and "poison can theoretically occur here". The main casualty of that approach is the "one hot merge" optimization, which just doesn't seem valid to do with selects.

I agree this approach will be helpful.
Seeing arguments as noundef is practically fine. This simulates the alternative semantics in which passing undef or poison as a function argument is UB. (Strictly speaking, this makes dead arg elim/function outlining/hoisting fncall invalid, but for analysis purposes I think this won't be a source of miscompilation.)
For loads, seeing them as noundef might be allowing too much I think; they are a source of undefs from reading uninitialized values. Maybe we can start with inferring and attaching !noundef to loads first.
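
One way to picture the heuristic being discussed is the toy classifier below. It is purely illustrative and is not the content of the linked gist: the `Origin` enum, the `Heuristic` struct, and both functions are hypothetical names. It only encodes the distinctions made in this thread: arguments may be assumed noundef, plain loads may not, and loads carrying `!noundef` could be trusted.

```cpp
#include <cassert>

// Hypothetical classification of where a select operand comes from.
enum class Origin {
  Constant,        // literal true/false
  NoundefArgument, // function argument carrying the noundef attribute
  PlainArgument,   // function argument without noundef
  NoundefLoad,     // load annotated with !noundef metadata
  PlainLoad        // load without !noundef (may read uninitialized memory)
};

// Hypothetical knob: assume the frontend annotated every argument it could.
struct Heuristic {
  bool treatAllArgumentsAsNoundef;
};

// "Poison might practically occur here" in the sense used above.
bool mayBePoisonInPractice(Origin o, Heuristic h) {
  switch (o) {
  case Origin::Constant:
  case Origin::NoundefArgument:
  case Origin::NoundefLoad:
    return false;
  case Origin::PlainArgument:
    return !h.treatAllArgumentsAsNoundef;
  case Origin::PlainLoad:
    return true;
  }
  return true; // conservative default for anything unclassified
}

// select c, true, x -> or c, x adds poison only through x, so under this
// heuristic the fold is kept when x is not expected to be poison.
bool mayFoldSelectToOr(Origin x, Heuristic h) {
  return !mayBePoisonInPractice(x, h);
}
```

Under this sketch a plain argument blocks the fold unless `treatAllArgumentsAsNoundef` is set, and a plain load always blocks it, reflecting the concern that loads are a real source of undef.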

I'll be able to restart working on things after a week or earlier.

aqjune mentioned this in D96945: [InstCombine] Add simplification of two logical and/ors.Feb 18 2021, 1:48 AM

aqjune mentioned this in D97360: [TTI] Consider select form of and/or i1 as having arithmetic cost.Feb 23 2021, 11:11 PM

dmgreen added a subscriber: dmgreen.Mar 1 2021, 7:50 AM

aqjune mentioned this in rGc89d9d8a48c0: [TTI] Consider select form of and/or i1 as having arithmetic cost.Mar 1 2021, 9:18 AM

spatel mentioned this in D97730: [SDAG] allow vector types for select->logic folds.Mar 1 2021, 1:45 PM

aqjune mentioned this in D97756: [LoopUnswitch] unswitch if cond is in select form of and/or as well.Mar 2 2021, 1:22 AM

spatel mentioned this in rG7fce3322a283: [SDAG] allow vector types for select->logic folds.Mar 2 2021, 6:30 AM

Simple loop unswitch fix: D97756
Update SimplifyCFG to use select instead of and/or when merging conditional branches: D95026

aqjune mentioned this in D97537: [Codegenprepare] Use IV increment instead of IV if we can prove it is not a poisoned value.Mar 4 2021, 7:55 AM

aqjune mentioned this in rG5bb38e84d3d0: [LoopUnswitch] unswitch if cond is in select form of and/or as well.Mar 7 2021, 8:20 AM

aqjune mentioned this in rG99108c791de0: [SimplifyCFG] Update FoldBranchToCommonDest to be poison-safe.Mar 7 2021, 8:38 AM

Next goal: make LoopSimplify turn on PoisonSafe flag, add remaining instcombine transformation support for select

aqjune mentioned this in rG07c3b97e184d: [InstCombine] Add simplification of two logical and/ors.Mar 7 2021, 9:39 AM

@aqjune Found a leftover in SCEV: https://github.com/llvm/llvm-project/blob/ece1403acadadf0b101bc68a8c69c613ca4f816f/llvm/lib/Analysis/ScalarEvolution.cpp#L10155

Should be fixed via https://reviews.llvm.org/rGb00209ed100c

The optimization has been removed in D101191, so this patch can be abandoned now.

In D93065#2756486, @nikic wrote:

The optimization has been removed in D101191, so this patch can be abandoned now.

Sure, patch abandoned. Thanks again for the work!

congzhe abandoned this revision.May 15 2021, 1:28 PM

Revision Contents

Path: llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

Size: 41 lines

Diff 311691

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

[2,620 lines not shown]	Instruction *InstCombinerImpl::visitSelectInst(SelectInst &SI) {
   if (Instruction *I = canonicalizeSelectToShuffle(SI))
     return I;

   if (Instruction *I = canonicalizeScalarSelectOfVecs(SI, *this))
     return I;

   CmpInst::Predicate Pred;

-  if (SelType->isIntOrIntVectorTy(1) &&
-      TrueVal->getType() == CondVal->getType()) {
-    if (match(TrueVal, m_One())) {
-      // Change: A = select B, true, C --> A = or B, C
-      return BinaryOperator::CreateOr(CondVal, FalseVal);
-    }
-    if (match(TrueVal, m_Zero())) {
-      // Change: A = select B, false, C --> A = and !B, C
-      Value *NotCond = Builder.CreateNot(CondVal, "not." + CondVal->getName());
-      return BinaryOperator::CreateAnd(NotCond, FalseVal);
-    }
-    if (match(FalseVal, m_Zero())) {
-      // Change: A = select B, C, false --> A = and B, C
-      return BinaryOperator::CreateAnd(CondVal, TrueVal);
-    }
-    if (match(FalseVal, m_One())) {
-      // Change: A = select B, C, true --> A = or !B, C
-      Value *NotCond = Builder.CreateNot(CondVal, "not." + CondVal->getName());
-      return BinaryOperator::CreateOr(NotCond, TrueVal);
-    }
-
-    // select a, a, b -> a | b
-    // select a, b, a -> a & b
-    if (CondVal == TrueVal)
-      return BinaryOperator::CreateOr(CondVal, FalseVal);
-    if (CondVal == FalseVal)
-      return BinaryOperator::CreateAnd(CondVal, TrueVal);
-
-    // select a, ~a, b -> (~a) & b
-    // select a, b, ~a -> (~a) | b
-    if (match(TrueVal, m_Not(m_Specific(CondVal))))
-      return BinaryOperator::CreateAnd(TrueVal, FalseVal);
-    if (match(FalseVal, m_Not(m_Specific(CondVal))))
-      return BinaryOperator::CreateOr(TrueVal, FalseVal);
-  }
-
   // Selecting between two integer or vector splat integer constants?
   //
   // Note that we don't handle a scalar select of vectors:
   // select i1 %c, <2 x i8> <1, 1>, <2 x i8> <0, 0>
   // because that may need 3 instructions to splat the condition value:
   // extend, insertelement, shufflevector.
-  if (SelType->isIntOrIntVectorTy() &&
+  //
+  // Do not handle i1 TrueVal and FalseVal otherwise would result in
+  // zext i1 to i1.
+  if (SelType->isIntOrIntVectorTy() && !SelType->isIntOrIntVectorTy(1) &&
       CondVal->getType()->isVectorTy() == SelType->isVectorTy()) {
     // select C, 1, 0 -> zext C to int

Inline comment (spatel, Not Done): This seems more complicated than necessary. Can't we just transfer the check from the block above to here: if (SelType->isIntOrIntVectorTy() && !SelType->isIntOrIntVectorTy(1) && ...) We can do that as an NFC preliminary patch regardless of whether we decide to proceed with this or not.

Inline comment (congzhe, author, Done): Thanks for the comment, now updated accordingly and will post this piece of code as another separate patch.

     if (match(TrueVal, m_One()) && match(FalseVal, m_Zero()))
       return new ZExtInst(CondVal, SelType);

Inline comment (aqjune, Not Done): Is this code still reachable when TrueVal/FalseVal are vectors of i1?

Inline comment (congzhe, author, Done): Thanks for the comment! Now updated to handle vectors of i1 as well.

     // select C, -1, 0 -> sext C to int
     if (match(TrueVal, m_AllOnes()) && match(FalseVal, m_Zero()))
       return new SExtInst(CondVal, SelType);

     // select C, 0, 1 -> zext !C to int
     if (match(TrueVal, m_Zero()) && match(FalseVal, m_One())) {
       Value *NotCond = Builder.CreateNot(CondVal, "not." + CondVal->getName());
       return new ZExtInst(NotCond, SelType);
[366 lines not shown]
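
For readers who want to check the four constant-arm folds removed in this hunk without running Alive2, the following sketch enumerates every combination of true, false, and poison for two i1 inputs and tests whether the target refines the source. It is a toy model under stated assumptions, not a substitute for a real verifier: `I1`, `sel`, `bitAnd`, `bitOr`, `bitNot`, `refines`, and `foldIsSound` are names invented for this example, and undef is not modeled at all.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Toy model (assumption): std::nullopt stands for poison; undef is ignored.
using I1 = std::optional<bool>;

// select c, t, f: poison condition gives poison, otherwise the chosen arm.
I1 sel(I1 c, I1 t, I1 f) {
  if (!c.has_value()) return std::nullopt;
  return *c ? t : f;
}
// and/or/not propagate poison from any operand.
I1 bitAnd(I1 a, I1 b) {
  if (!a.has_value() || !b.has_value()) return std::nullopt;
  return *a && *b;
}
I1 bitOr(I1 a, I1 b) {
  if (!a.has_value() || !b.has_value()) return std::nullopt;
  return *a || *b;
}
I1 bitNot(I1 a) {
  if (!a.has_value()) return std::nullopt;
  return !*a;
}

// The target refines the source if the source is poison (anything goes)
// or both produce the same defined value.
bool refines(I1 src, I1 tgt) { return !src.has_value() || src == tgt; }

using Fn = I1 (*)(I1, I1);

// Exhaustively compare src and tgt over {false, true, poison}^2.
bool foldIsSound(Fn src, Fn tgt) {
  const std::array<I1, 3> vals = {I1(false), I1(true), I1(std::nullopt)};
  for (I1 b : vals)
    for (I1 c : vals)
      if (!refines(src(b, c), tgt(b, c))) return false;
  return true;
}

// Sources: the four select shapes matched by the removed code.
I1 selTrueC(I1 b, I1 c) { return sel(b, true, c); }   // select B, true, C
I1 selFalseC(I1 b, I1 c) { return sel(b, false, c); } // select B, false, C
I1 selCFalse(I1 b, I1 c) { return sel(b, c, false); } // select B, C, false
I1 selCTrue(I1 b, I1 c) { return sel(b, c, true); }   // select B, C, true

// Targets: the and/or forms the removed code produced.
I1 orBC(I1 b, I1 c) { return bitOr(b, c); }
I1 andNotBC(I1 b, I1 c) { return bitAnd(bitNot(b), c); }
I1 andBC(I1 b, I1 c) { return bitAnd(b, c); }
I1 orNotBC(I1 b, I1 c) { return bitOr(bitNot(b), c); }
```

In this model `foldIsSound(selTrueC, orBC)` is false, and the same holds for the other three pairs, while the reverse direction `foldIsSound(orBC, selTrueC)` is true. That asymmetry is what allows other passes to emit the select form and still be correct.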