This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
InstCombineCasts.cpp
-
test/Transforms/InstCombine/
-
Transforms/
-
InstCombine/
-
zext.ll

Differential D72396

[InstCombine] fold zext of masked bit set/clear
ClosedPublic

Authored by spatel on Jan 8 2020, 6:19 AM.

Download Raw Diff

Details

Reviewers

lebedev.ri
xbolva00
nikic
kadircet

Commits

rGc7da0c383a1b: [InstCombine] fold zext of masked bit set/clear

Summary

This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1

Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1

Note: this is a re-post of a patch that I committed at:
rGa041c4ec6f7a

I thought the change was small and obviously good, so I did not post it for pre-commit review per policy:
http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

The commit was reverted though:
rGb212eb7159b40c98b3c40619b82b996fb903282b

So either I am not seeing the problem with this patch, or it is uncovering a problem in another part of LLVM.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision.Jan 8 2020, 6:19 AM

Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2020, 6:19 AM

Herald added subscribers: hiraditya, mcrosier. · View Herald Transcript

spatel edited the summary of this revision. (Show Details)Jan 8 2020, 6:20 AM

Internally, I can see that either (1) this causes the following code to miscompile, or (2) the code is UB:
https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chttp2/transport/frame_settings.cc#L48
Note that in that code the two checks on lines 57 and 64 are related if that helps.

Working on providing a small reproducible test case.

In D72396#1810034, @krasimir wrote:

Internally, I can see that either (1) this causes the following code to miscompile, or (2) the code is UB:
https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chttp2/transport/frame_settings.cc#L48
Note that in that code the two checks on lines 57 and 64 are related if that helps.

What is the maximal value of count there? Is it count u< 32? (Is that code UBSan-clean?)

Working on providing a small reproducible test case.

The original code seems ubsan-clean.

Reproducer:

Running under x86_64.

steps

$ clang -lsdc++ -O3 main.cpp
$ ./a.out < input.txt

main.cpp

// main.cpp
#include <cstdint>
#include <cstdio>

int f(uint32_t* olds, const uint32_t* news, uint32_t mask, size_t count) {
  size_t i;
  uint32_t n = 0;
  for (i = 0; i < count; i++) {
    n += (news[i] != olds[i] || (mask & (1u << i)) != 0);
  }
  return n;
}

int main() {
  size_t count;
  uint32_t mask;
  scanf("%zu %u", &count, &mask);
  printf("count: %zu\n", count);
  printf("mask: %u\n", mask);
  uint32_t* olds = new uint32_t[count];
  uint32_t* news = new uint32_t[count];
  for (int i = 0; i < count; ++i) {
    scanf("%u", olds + i);
  }
  for (int i = 0; i < count; ++i) {
    scanf("%u", news + i);
  }
  printf("olds: ");
  for (int i = 0; i < count; ++i) {
    printf("%u ", olds[i]);
  }
  printf("\n");
  printf("news: ");
  for (int i = 0; i < count; ++i) {
    printf("%u ", news[i]);
  }
  printf("\n");
  printf("\n\nn:%d\n", f(olds, news, mask, count));
}

input.txt

7 8
4096 1 4294967295 65535 16384 16777216 0
4096 0 0 4194304 4194304 8192 1

Expected output (without this patch):

count: 7
mask: 8
olds: 4096 1 4294967295 65535 16384 16777216 0 
news: 4096 0 0 4194304 4194304 8192 1 


n:6

Output with this patch:

(with clang built at revision 903e5c3028d)

count: 7
mask: 8
olds: 4096 1 4294967295 65535 16384 16777216 0 
news: 4096 0 0 4194304 4194304 8192 1 


n:1

I agree that this seem to be either causing, or triggering, a miscompile.
Most standalone testcase: https://godbolt.org/z/EMWGof
On rG19ace449a3da4058428495283b3b15826f8d7d34, this compiles to:

; ModuleID = '/tmp/test.cpp'
source_filename = "/tmp/test.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) local_unnamed_addr #0 {
entry:
  %0 = and i32 %mask, 1
  ret i32 %0
}

attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (git@github.com:LebedevRI/llvm-project.git debb93d97f264d542b0d9ca6f2f50f0fd72c7356)"}

but with this patch reversed, it compiles to

; ModuleID = '/tmp/test.cpp'
source_filename = "/tmp/test.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) local_unnamed_addr #0 {
entry:
  %cmp1 = icmp ne i32 %news, %olds
  %and = and i32 %mask, 1
  %cmp110 = zext i1 %cmp1 to i32
  %conv12 = or i32 %and, %cmp110
  ret i32 %conv12
}

attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (git@github.com:LebedevRI/llvm-project.git debb93d97f264d542b0d9ca6f2f50f0fd72c7356)"}

----------------------------------------
define i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) {
%entry:
  %cmp1 = icmp ne i32 %news, %olds
  %and = and i32 %mask, 1
  %cmp110 = zext i1 %cmp1 to i32
  %conv12 = or i32 %and, %cmp110
  ret i32 %conv12
}
=>
define i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) {
%entry:
  %0 = and i32 %mask, 1
  ret i32 %0
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %olds = #x00080000 (524288)
i32 %news = #x00000000 (0)
i32 %mask = undef

Source:
i1 %cmp1 = #x1 (1)
i32 %and = #x00000000 (0)       [based on undef value]
i32 %cmp110 = #x00000001 (1)
i32 %conv12 = #x00000001 (1)

Target:
i32 %0 = #x00000000 (0)
Source value: #x00000001 (1)
Target value: #x00000000 (0)

Summary:
  0 correct transformations
  1 incorrect transformations
  0 errors

This revision now requires changes to proceed.Jan 8 2020, 9:16 AM

Thanks all for the reductions!

From what I can see so far, this patch is exposing an existing bug. We do this in instcombine independently of this patch:

$ cat sel.ll 
define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %s = select i1 %cmp1, i1 %t, i1 true
  ret i1 %s
}
$ opt -instcombine sel.ll -S
define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %not.cmp1 = xor i1 %cmp1, true
  %s = or i1 %not.cmp1, %t
  ret i1 %s
}

https://rise4fun.com/Alive/0uez
ERROR: Target is more poisonous than Source for i1 %s

I'm very surprised we didn't hit this sooner, but we have this set of 'select' folds in InstCombiner::visitSelectInst() that have existed for at least 10 years, and they are all poison-unsafe:

Name: fval is true
  %s = select i1 %cmp1, i1 %x, i1 true
  =>
  %not = xor i1 %cmp1, true
  %s = or i1 %not, %x

Name: fval is false
  %s = select i1 %cmp1, i1 %x, i1 false
  =>
  %s = and i1 %cmp1, %x

Name: tval is true
  %s = select i1 %cmp1, i1 true, i1 %x
  =>
  %s = or i1 %cmp1, %x

Name: tval is false
  %s = select i1 %cmp1, i1 false, i1 %x
  =>
  %not = xor i1 %cmp1, true
  %s = and i1 %not, %x

spatel mentioned this in D72412: [InstSimplify] select Cond, true, false --> Cond.Jan 8 2020, 1:37 PM

Very nice! People have been talking about those select folds being unsound for a long time, but I think this is the first time they cause a real-life miscompile...

spatel mentioned this in rGf53b38d12a7b: [InstSimplify] select Cond, true, false --> Cond.Jan 9 2020, 6:09 AM

spatel mentioned this in D75961: [InstCombine] reduce demand-limited bool math to logic .Mar 10 2020, 2:29 PM

spatel mentioned this in rGfae900921b12: [InstCombine] reduce demand-limited bool math to logic.Mar 11 2020, 1:02 PM

What is status of this patch?

In D72396#2213671, @xbolva00 wrote:

What is status of this patch?

We need to remove the unsafe select transforms and see if anything regresses. Then, add transforms to avoid those regressions. When there are no more known regressions after removing the select transforms, this patch should be good to go.

It should be ready to go?

In D72396#2696806, @xbolva00 wrote:

It should be ready to go?

I don't think we're there yet. cc @aqjune to see what is remaining.

The most basic test is still converted to bitwise logic:

define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %s = select i1 %cmp1, i1 %t, i1 true
  ret i1 %s
}

The C++ reproducer from https://reviews.llvm.org/D72396#1810203 is also still miscompiled with this patch applied.

There were many plumbings done to existing transformations to recognize select instead of and/or, and then D99674 did conditional disabling of the select folding.
There hasn't been a performance regression reported, except instruction cost related issue which is immediately patched later.

This is a very hopeful news because the patch is virtually disabling the folding in most cases.
The patch blocks select a, b, false -> and a, b (and similarly for or) if b contains op that creates poison, e.g. icmp (add nsw), x.

In the patch I promised to fully remove the select folding if there is no performanace regression for 2 weeks, and now the duration has passed, so maybe it is time to remove it.
In a day or two I'll have a time to make a patch for its full removal.

In D72396#2697589, @aqjune wrote:

In a day or two I'll have a time to make a patch for its full removal.

Great - thank you for all of the patches!

aqjune mentioned this in D101191: [InstCombine] Fully disable select to and/or i1 folding.Apr 23 2021, 12:00 PM

aqjune mentioned this in rG8a156d1c2795: [InstCombine] Fully disable select to and/or i1 folding.May 5 2021, 5:30 PM

The select fix landed last week (and hasn't been reverted yet -- yay), so it's probably possible to pick this back up now.

In D72396#2748629, @nikic wrote:

The select fix landed last week (and hasn't been reverted yet -- yay), so it's probably possible to pick this back up now.

Thanks for the reminder...right on cue, it looks like we just got a regression report. :)
So I'll hold off a bit to see if we can fix that up first.

I think it is safe to try this again. It's been about 2 weeks since D101191 landed, and the C++ repro program produces the expected output now.
The patch still applies cleanly, so no updates needed.
Can someone re-review / mark this as approved? Thanks!

Ping.

I implicitly stopped blocking this the moment the fixes landed, but sure, i can say that out loud: let's try again :)
Thanks.

This revision is now accepted and ready to land.May 28 2021, 9:10 AM

Closed by commit rGc7da0c383a1b: [InstCombine] fold zext of masked bit set/clear (authored by spatel). · Explain WhyMay 29 2021, 6:12 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGc7da0c383a1b: [InstCombine] fold zext of masked bit set/clear.

Carrot mentioned this in D104567: [InstCombine] Don't transform code if DoTransform is false.Jun 18 2021, 12:53 PM

Carrot mentioned this in rG575ba6f42560: [InstCombine] Don't transform code if DoTransform is false.Jun 18 2021, 6:03 PM

Revision Contents

Path

Size

llvm/

lib/

Transforms/

InstCombine/

InstCombineCasts.cpp

20 lines

test/