This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold zext of masked bit set/clear
ClosedPublic

Authored by spatel on Jan 8 2020, 6:19 AM.

Details

Summary

This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8

We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.

Proofs:
https://rise4fun.com/Alive/uVB

Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1

Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1

Note: this is a re-post of a patch that I committed at:
rGa041c4ec6f7a

I thought the change was small and obviously good, so I did not post it for pre-commit review per policy:
http://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

The commit was reverted though:
rGb212eb7159b40c98b3c40619b82b996fb903282b

So either I am not seeing the problem with this patch, or it is uncovering a problem in another part of LLVM.

Diff Detail

Event Timeline

spatel created this revision.Jan 8 2020, 6:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 8 2020, 6:19 AM
spatel edited the summary of this revision. (Show Details)Jan 8 2020, 6:20 AM

Internally, I can see that either (1) this causes the following code to miscompile, or (2) the code is UB:
https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chttp2/transport/frame_settings.cc#L48
Note that in that code the two checks on lines 57 and 64 are related if that helps.

Working on providing a small reproducible test case.

Internally, I can see that either (1) this causes the following code to miscompile, or (2) the code is UB:
https://github.com/grpc/grpc/blob/master/src/core/ext/transport/chttp2/transport/frame_settings.cc#L48
Note that in that code the two checks on lines 57 and 64 are related if that helps.

What is the maximal value of count there? Is it count u< 32? (Is that code UBSan-clean?)

Working on providing a small reproducible test case.

krasimir added a comment.EditedJan 8 2020, 8:30 AM

The original code seems ubsan-clean.

Reproducer:

Running under x86_64.

steps

$ clang -lsdc++ -O3 main.cpp
$ ./a.out < input.txt

main.cpp

// main.cpp
#include <cstdint>
#include <cstdio>

int f(uint32_t* olds, const uint32_t* news, uint32_t mask, size_t count) {
  size_t i;
  uint32_t n = 0;
  for (i = 0; i < count; i++) {
    n += (news[i] != olds[i] || (mask & (1u << i)) != 0);
  }
  return n;
}

int main() {
  size_t count;
  uint32_t mask;
  scanf("%zu %u", &count, &mask);
  printf("count: %zu\n", count);
  printf("mask: %u\n", mask);
  uint32_t* olds = new uint32_t[count];
  uint32_t* news = new uint32_t[count];
  for (int i = 0; i < count; ++i) {
    scanf("%u", olds + i);
  }
  for (int i = 0; i < count; ++i) {
    scanf("%u", news + i);
  }
  printf("olds: ");
  for (int i = 0; i < count; ++i) {
    printf("%u ", olds[i]);
  }
  printf("\n");
  printf("news: ");
  for (int i = 0; i < count; ++i) {
    printf("%u ", news[i]);
  }
  printf("\n");
  printf("\n\nn:%d\n", f(olds, news, mask, count));
}

input.txt

7 8
4096 1 4294967295 65535 16384 16777216 0
4096 0 0 4194304 4194304 8192 1

Expected output (without this patch):

count: 7
mask: 8
olds: 4096 1 4294967295 65535 16384 16777216 0 
news: 4096 0 0 4194304 4194304 8192 1 


n:6

Output with this patch:

(with clang built at revision 903e5c3028d)

count: 7
mask: 8
olds: 4096 1 4294967295 65535 16384 16777216 0 
news: 4096 0 0 4194304 4194304 8192 1 


n:1
lebedev.ri requested changes to this revision.Jan 8 2020, 9:16 AM

I agree that this seem to be either causing, or triggering, a miscompile.
Most standalone testcase: https://godbolt.org/z/EMWGof
On rG19ace449a3da4058428495283b3b15826f8d7d34, this compiles to:

; ModuleID = '/tmp/test.cpp'
source_filename = "/tmp/test.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) local_unnamed_addr #0 {
entry:
  %0 = and i32 %mask, 1
  ret i32 %0
}

attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (git@github.com:LebedevRI/llvm-project.git debb93d97f264d542b0d9ca6f2f50f0fd72c7356)"}

but with this patch reversed, it compiles to

; ModuleID = '/tmp/test.cpp'
source_filename = "/tmp/test.cpp"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) local_unnamed_addr #0 {
entry:
  %cmp1 = icmp ne i32 %news, %olds
  %and = and i32 %mask, 1
  %cmp110 = zext i1 %cmp1 to i32
  %conv12 = or i32 %and, %cmp110
  ret i32 %conv12
}

attributes #0 = { norecurse nounwind readnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 10.0.0 (git@github.com:LebedevRI/llvm-project.git debb93d97f264d542b0d9ca6f2f50f0fd72c7356)"}
----------------------------------------
define i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) {
%entry:
  %cmp1 = icmp ne i32 %news, %olds
  %and = and i32 %mask, 1
  %cmp110 = zext i1 %cmp1 to i32
  %conv12 = or i32 %and, %cmp110
  ret i32 %conv12
}
=>
define i32 @_Z1fjjj(i32 %olds, i32 %news, i32 %mask) {
%entry:
  %0 = and i32 %mask, 1
  ret i32 %0
}
Transformation doesn't verify!
ERROR: Value mismatch

Example:
i32 %olds = #x00080000 (524288)
i32 %news = #x00000000 (0)
i32 %mask = undef

Source:
i1 %cmp1 = #x1 (1)
i32 %and = #x00000000 (0)       [based on undef value]
i32 %cmp110 = #x00000001 (1)
i32 %conv12 = #x00000001 (1)

Target:
i32 %0 = #x00000000 (0)
Source value: #x00000001 (1)
Target value: #x00000000 (0)

Summary:
  0 correct transformations
  1 incorrect transformations
  0 errors
This revision now requires changes to proceed.Jan 8 2020, 9:16 AM

Thanks all for the reductions!

From what I can see so far, this patch is exposing an existing bug. We do this in instcombine independently of this patch:

$ cat sel.ll 
define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %s = select i1 %cmp1, i1 %t, i1 true
  ret i1 %s
}
$ opt -instcombine sel.ll -S
define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %not.cmp1 = xor i1 %cmp1, true
  %s = or i1 %not.cmp1, %t
  ret i1 %s
}

https://rise4fun.com/Alive/0uez
ERROR: Target is more poisonous than Source for i1 %s

I'm very surprised we didn't hit this sooner, but we have this set of 'select' folds in InstCombiner::visitSelectInst() that have existed for at least 10 years, and they are all poison-unsafe:

Name: fval is true
  %s = select i1 %cmp1, i1 %x, i1 true
  =>
  %not = xor i1 %cmp1, true
  %s = or i1 %not, %x

Name: fval is false
  %s = select i1 %cmp1, i1 %x, i1 false
  =>
  %s = and i1 %cmp1, %x

Name: tval is true
  %s = select i1 %cmp1, i1 true, i1 %x
  =>
  %s = or i1 %cmp1, %x

Name: tval is false
  %s = select i1 %cmp1, i1 false, i1 %x
  =>
  %not = xor i1 %cmp1, true
  %s = and i1 %not, %x
nikic added a comment.Jan 8 2020, 3:14 PM

Very nice! People have been talking about those select folds being unsound for a long time, but I think this is the first time they cause a real-life miscompile...

What is status of this patch?

What is status of this patch?

We need to remove the unsafe select transforms and see if anything regresses. Then, add transforms to avoid those regressions. When there are no more known regressions after removing the select transforms, this patch should be good to go.

It should be ready to go?

spatel added a subscriber: aqjune.Apr 18 2021, 6:22 AM

It should be ready to go?

I don't think we're there yet. cc @aqjune to see what is remaining.

The most basic test is still converted to bitwise logic:

define i1 @poison_sel(i1 %cmp1, i1 %t) {
  %s = select i1 %cmp1, i1 %t, i1 true
  ret i1 %s
}

The C++ reproducer from https://reviews.llvm.org/D72396#1810203 is also still miscompiled with this patch applied.

There were many plumbings done to existing transformations to recognize select instead of and/or, and then D99674 did conditional disabling of the select folding.
There hasn't been a performance regression reported, except instruction cost related issue which is immediately patched later.

This is a very hopeful news because the patch is virtually disabling the folding in most cases.
The patch blocks select a, b, false -> and a, b (and similarly for or) if b contains op that creates poison, e.g. icmp (add nsw), x.

In the patch I promised to fully remove the select folding if there is no performanace regression for 2 weeks, and now the duration has passed, so maybe it is time to remove it.
In a day or two I'll have a time to make a patch for its full removal.

In a day or two I'll have a time to make a patch for its full removal.

Great - thank you for all of the patches!

The select fix landed last week (and hasn't been reverted yet -- yay), so it's probably possible to pick this back up now.

The select fix landed last week (and hasn't been reverted yet -- yay), so it's probably possible to pick this back up now.

Thanks for the reminder...right on cue, it looks like we just got a regression report. :)
So I'll hold off a bit to see if we can fix that up first.

I think it is safe to try this again. It's been about 2 weeks since D101191 landed, and the C++ repro program produces the expected output now.
The patch still applies cleanly, so no updates needed.
Can someone re-review / mark this as approved? Thanks!

lebedev.ri accepted this revision.May 28 2021, 9:10 AM

I implicitly stopped blocking this the moment the fixes landed, but sure, i can say that out loud: let's try again :)
Thanks.

This revision is now accepted and ready to land.May 28 2021, 9:10 AM
This revision was automatically updated to reflect the committed changes.