This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold two-value clamp patterns
AbandonedPublic

Authored by qiucf on Jul 30 2021, 3:27 AM.

Details

Summary

This patch is for PR43053, the following pattern:

%cmp1 = icmp slt i32 %num, C1
%s1 = select i1 %cmp1, i32 %num, i32 C1
%cmp2 = icmp sgt i32 %s1, C2
%r = select i1 %cmp2, i32 %s1, i32 C2
; C2 = C1 - 1

can be folded into

%cmp3 = icmp sgt %num, C2
%r = select i1 %cmp3, C1, C2

Diff Detail

Unit TestsFailed

Event Timeline

qiucf created this revision.Jul 30 2021, 3:27 AM
qiucf requested review of this revision.Jul 30 2021, 3:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2021, 3:27 AM
RKSimon added inline comments.Jul 30 2021, 3:45 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3110

(style) auto *

llvm/test/Transforms/InstCombine/minmax-fold.ll
1511

pre-commit these tests and rebase to show the diffs in the patch

The initial code

// s1 = (n < C1) ? n : C1
// s2 = (s1 > C1 - 1) ? s1 : C1 - 1

had a clear semantics of

s1 = min(n, C1)
s2 = max(s1, C1 - 1)

These pattern could be analyzed by other passes that understand semantics of min and max, such as SCEV using passes. If it participates in other min or max expression, there is a lot of ways how we can simplify it.

This new form, despite it has fewer instructions, is completely arcane. Just looking at it, it's not so easy to understand what it actually means. And no pass will.

I'd rather suggest to do changes like this as close to codegen as possible, for example in CodeGenPrepare.

The initial code

// s1 = (n < C1) ? n : C1
// s2 = (s1 > C1 - 1) ? s1 : C1 - 1

had a clear semantics of

s1 = min(n, C1)
s2 = max(s1, C1 - 1)

These pattern could be analyzed by other passes that understand semantics of min and max, such as SCEV using passes. If it participates in other min or max expression, there is a lot of ways how we can simplify it.

This new form, despite it has fewer instructions, is completely arcane. Just looking at it, it's not so easy to understand what it actually means. And no pass will.

I'd rather suggest to do changes like this as close to codegen as possible, for example in CodeGenPrepare.

Sounds like SCEV should be aware of this way to parse such a select-of-constants as min-max.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3102

I think you should just produce select of constants here.

The initial code

// s1 = (n < C1) ? n : C1
// s2 = (s1 > C1 - 1) ? s1 : C1 - 1

had a clear semantics of

s1 = min(n, C1)
s2 = max(s1, C1 - 1)

These pattern could be analyzed by other passes that understand semantics of min and max, such as SCEV using passes. If it participates in other min or max expression, there is a lot of ways how we can simplify it.

This new form, despite it has fewer instructions, is completely arcane. Just looking at it, it's not so easy to understand what it actually means. And no pass will.

I'd rather suggest to do changes like this as close to codegen as possible, for example in CodeGenPrepare.

Sounds like SCEV should be aware of this way to parse such a select-of-constants as min-max.

I know it's a different case, but I would consider the saturation case to be canonical as:

%m = call i32 @llvm.smin.i32(i32 %num, i32 127)
%r = call i32 @llvm.smax.i32(i32 %m, i32 -128)

And I believe we have code that relies on that.

But if this does indeed simplify to

%1 = icmp sgt i32 %0, 126
%r = select i1 %1, i32 127, i32 126

that doesn't sound worse than the min and max in this case.

The code might be better if it made use of matchSelectPattern. There are "SPF" matchers for matching min/max patterns, like in factorizeMinMaxTree and moveNotAfterMinMax.

qiucf updated this revision to Diff 366228.Aug 13 2021, 3:21 AM
qiucf marked 3 inline comments as done.
qiucf edited the summary of this revision. (Show Details)
RKSimon added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3100

It'd be better if we could use m_APInt here to support vectors (including handling undef element cases).

3110

Can we use m_SMin etc. ? Ideally we'd have something that matches min/max intrinsics as well as select patterns.

Sorry for missing this review earlier.
I implemented the corresponding folds for intrinsics with:
https://reviews.llvm.org/rG025bb5290379

Can you confirm that the pattern matching and tests here correspond to those (replace the intrinsics with cmp+sel or vice-versa)?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3110

m_MaxOrMin() will match either pattern (intrinsic or cmp+sel). We're starting from a select here, so I'm not sure if it's worth including tests with mismatched patterns (one of each).
Also see getInverseMinMaxFlavor() or getInverseMinMaxPred() in ValueTracking.h.

Sorry for missing this review earlier.
I implemented the corresponding folds for intrinsics with:
https://reviews.llvm.org/rG025bb5290379

Can you confirm that the pattern matching and tests here correspond to those (replace the intrinsics with cmp+sel or vice-versa)?

Yes, it does similar thing to this. So will it be better to fold such cmp-select pattern to the intrinsic?

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3100

Hmm.. Does match(xxx, m_APInt(&x)) support vector type?

Sorry for missing this review earlier.
I implemented the corresponding folds for intrinsics with:
https://reviews.llvm.org/rG025bb5290379

Can you confirm that the pattern matching and tests here correspond to those (replace the intrinsics with cmp+sel or vice-versa)?

Yes, it does similar thing to this. So will it be better to fold such cmp-select pattern to the intrinsic?

We intend to canonicalize cmp-select to min/max intrinsics, but we need to address some more regressions that are visible in:
D98152

So it's a question of timing/duplication - when we get D98152 done, then this patch probably becomes obsolete because we won't see the cmp-select pattern any more.

Ah - I just realized the tests here are pre-committed, so we can see what effect converting to intrinsics will have on them in D98152.
It looks like we get some of the examples, but miss some others. So we will need to improve the analysis of the intrinsics for 2-way clamps either way. If you want to investigate that, that would be great.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3100

Partly - m_APInt will automatically match a splat constant <42, 42, 42,...>, but not an arbitrary vector constant.
For this fold, matching a splat is probably sufficient.

m_APInt is basically free in terms of coding compared to m_ConstantInt, so we prefer to use m_APInt over m_ConstantInt for more general matching.

Matt added a subscriber: Matt.Oct 2 2021, 6:42 AM

Sounds like SCEV should be aware of this way to parse such a select-of-constants as min-max.

We can, theoretically, teach SCEV to recognize min and max in all kinds of arcane expressions including arithmetics, zexts, xors, shifts and whatsoever. It's not going to happen in reality, off couese.

The initial code

// s1 = (n < C1) ? n : C1
// s2 = (s1 > C1 - 1) ? s1 : C1 - 1

had a clear semantics of

s1 = min(n, C1)
s2 = max(s1, C1 - 1)

These pattern could be analyzed by other passes that understand semantics of min and max, such as SCEV using passes. If it participates in other min or max expression, there is a lot of ways how we can simplify it.

This new form, despite it has fewer instructions, is completely arcane. Just looking at it, it's not so easy to understand what it actually means. And no pass will.

I'd rather suggest to do changes like this as close to codegen as possible, for example in CodeGenPrepare.

Sounds like SCEV should be aware of this way to parse such a select-of-constants as min-max.

@qiucf could you please add a test with -analyze -scalar-evolution to see if SCEV is able to recognize this as max/min? If not, please add a FIXME in this test, this is a good improvement to do.

mkazantsev resigned from this revision.Oct 7 2021, 10:48 PM

My only concern here is the recognition of this pattern by SCEV. I don't see much other problems but don't feel comfortable to approve this in InstCombine so leaving this to other reviewers' judgement.

spatel added a comment.EditedOct 22 2021, 11:49 AM

@spatel Any thoughts?

Not with the direction - as noted earlier, we're already trying this with the intrinsics, so doing it with cmp+select just makes things consistent. There are a few implementation/test questions:

  1. What logic diffs are there between this and 025bb5290379 ?
  2. Add/adjust tests based on those diffs.
  3. Use m_APInt so we get splat vectors.

@qiucf - will you continue this patch soon?

@qiucf - will you continue this patch soon?

Sure. Sorry for leaving this patch for some time. I'll take these issue these days.

qiucf updated this revision to Diff 383754.Nov 1 2021, 2:57 AM

Use m_APInt to match vector splats.

qiucf added a comment.Nov 1 2021, 3:00 AM

Not with the direction - as noted earlier, we're already trying this with the intrinsics, so doing it with cmp+select just makes things consistent. There are a few implementation/test questions:

  1. What logic diffs are there between this and 025bb5290379 ?
  2. Add/adjust tests based on those diffs.
  3. Use m_APInt so we get splat vectors.

@qiucf - will you continue this patch soon?

I applied D98152 and try. The two currently affected cases (minmax-fold.ll, icmp-dom.ll) here are still passed with it but without my patch. But for below vector case:

define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %cmp1 = icmp sgt <4 x i32> %num, <i32 13767, i32 13767, i32 13767, i32 13767>
  %s1 = select <4 x i1> %cmp1, <4 x i32> %num, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>
  %cmp2 = icmp slt <4 x i32> %s1, <i32 13768, i32 13768, i32 13768, i32 13768>
  %r = select <4 x i1> %cmp2, <4 x i32> %s1, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>
  ret <4 x i32> %r
}

; Got
define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %0 = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %num, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>)
  %1 = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %0, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>)
  ret <4 x i32> %1
}

; Not
define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %0 = icmp slt <4 x i32> %num, <i32 13768, i32 13768, i32 13768, i32 13768>
  %r = select <4 x i1> %0, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>
  ret <4 x i32> %r
}

Seems not covered by that?

Not with the direction - as noted earlier, we're already trying this with the intrinsics, so doing it with cmp+select just makes things consistent. There are a few implementation/test questions:

  1. What logic diffs are there between this and 025bb5290379 ?
  2. Add/adjust tests based on those diffs.
  3. Use m_APInt so we get splat vectors.

@qiucf - will you continue this patch soon?

I applied D98152 and try. The two currently affected cases (minmax-fold.ll, icmp-dom.ll) here are still passed with it but without my patch. But for below vector case:

define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %cmp1 = icmp sgt <4 x i32> %num, <i32 13767, i32 13767, i32 13767, i32 13767>
  %s1 = select <4 x i1> %cmp1, <4 x i32> %num, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>
  %cmp2 = icmp slt <4 x i32> %s1, <i32 13768, i32 13768, i32 13768, i32 13768>
  %r = select <4 x i1> %cmp2, <4 x i32> %s1, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>
  ret <4 x i32> %r
}

; Got
define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %0 = call <4 x i32> @llvm.smax.v4i32(<4 x i32> %num, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>)
  %1 = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %0, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>)
  ret <4 x i32> %1
}

; Not
define <4 x i32> @twoway_clamp_gt(<4 x i32> %num) {
entry:
  %0 = icmp slt <4 x i32> %num, <i32 13768, i32 13768, i32 13768, i32 13768>
  %r = select <4 x i1> %0, <4 x i32> <i32 13767, i32 13767, i32 13767, i32 13767>, <4 x i32> <i32 13768, i32 13768, i32 13768, i32 13768>
  ret <4 x i32> %r
}

Seems not covered by that?

We may need some generalization for mismatched signedness of min/max like this:
https://alive2.llvm.org/ce/z/NV_bKr
(That should translate for all 4 combinations for min/max?)
After that, we will recognize the special case for clamp of 2 values.

But that doesn't need to hold this patch up unless I'm misunderstanding the comment.

RKSimon added inline comments.Nov 1 2021, 10:34 AM
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
3184

You've reused C1 and not referred to C2?

qiucf updated this revision to Diff 384358.Nov 3 2021, 1:29 AM
qiucf marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2023, 12:00 AM
Herald added a subscriber: StephenFan. · View Herald Transcript