Page MenuHomePhabricator

[Intrinsic] Give "is.constant" the "convergent" attribute
ClosedPublic

Authored by void on Mar 7 2020, 12:34 AM.

Details

Summary

Code frequently relies upon the results of "is.constant" intrinsics to
DCE invalid code paths. We don't want the intrinsic to be made control-
dependent on any additional values. For instance, we can't split a PHI
into a "constant" and "non-constant" part via jump threading in order
to "optimize" the constant part, because the "is.constant" intrinsic is
meant to return "false".

Diff Detail

Event Timeline

void created this revision.Mar 7 2020, 12:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2020, 12:34 AM
xbolva00 added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
2012 ↗(On Diff #248906)

You can use m_Intrinsic matcher from llvm pattern match

void updated this revision to Diff 248920.Mar 7 2020, 3:03 AM

Use pattern matching for intrinsic instruction matching.

void marked an inline comment as done.Mar 7 2020, 3:03 AM
void updated this revision to Diff 248921.Mar 7 2020, 3:17 AM

Should be *not* matching.

lebedev.ri retitled this revision from Don't create PHI nodes with "is.constant" values to [JumpThreading] Don't create PHI nodes with "is.constant" values.Mar 7 2020, 3:36 AM
nikic added a subscriber: nikic.Mar 7 2020, 4:15 AM

This looks somewhat dubious to me. This seems like an InstCombine-style fold that is performed in JumpThreading -- but why? Won't it just get folded either earlier or later? As the test case is over-reduced, it's hard to tell where this would make a difference in practice.

void added a comment.Mar 7 2020, 4:31 PM

This looks somewhat dubious to me. This seems like an InstCombine-style fold that is performed in JumpThreading -- but why? Won't it just get folded either earlier or later? As the test case is over-reduced, it's hard to tell where this would make a difference in practice.

No. In the original testcase (from net/ipv4/tcp.c in Linux with ASAN enabled), there are two calls to copy_to_user and copy_from_user near where the testcase is located. They each call check_copy_size:

static inline __attribute__((unused)) __attribute__((no_instrument_function))
__attribute__((always_inline)) bool
check_copy_size(const void *addr, size_t bytes, bool is_source)
{
        int sz = __builtin_object_size(addr, 0);
        if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
                if (!__builtin_constant_p(bytes))
                        copy_overflow(sz, bytes);
                else if (is_source)
                        __bad_copy_from();
                else
                        __bad_copy_to();
                return false;
        }
        check_object_size(addr, bytes, is_source);
        return true;
}

static inline __attribute__((unused)) __attribute__((no_instrument_function))
__attribute__((always_inline)) unsigned long __attribute__((warn_unused_result))
copy_from_user(void *to, const void *from, unsigned long n)
{
        if (__builtin_expect(!!(check_copy_size(to, n, false)), 1))
                n = _copy_from_user(to, from, n);
        return n;
}

static inline __attribute__((unused)) __attribute__((no_instrument_function))
__attribute__((always_inline)) unsigned long __attribute__((warn_unused_result))
copy_to_user(void *to, const void *from, unsigned long n)
{
        if (__builtin_expect(!!(check_copy_size(from, n, true)), 1))
                n = _copy_to_user(to, from, n);
        return n;
}

The __bad_copy_* functions don't actually exist, so we get a linking error if those code paths aren't removed. The GVN patch combines the llvm.is.constant() check that's used by both of the copy_*_user functions. It *should* resolve to false, which would cause it to execute copy_overflow and DCE the paths with the __bad_* calls. All of that seems okay and while I'm not particularly sure that GVN should care about is.constant calls, it's probably innocuous.

So once we get to jump threading, there's a call to llvm.is.constant that's the conditional of two br instructions (one for each copy_*_user call). While it's threading the blocks in this area. See %5 below:

define i32 @do_tcp_getsockopt(%struct.sock* %sk, i32 %level, i32 %optname, i8* %optval, i32* %optlen) {
...
if.end18:                                         ; preds = %if.then11, %if.end7
  %len.0 = phi i32 [ 24, %if.then11 ], [ %conv, %if.end7 ]
  %conv19 = sext i32 %len.0 to i64
  %cmp3.i.i70 = icmp ugt i32 %len.0, 24
  %5 = call i1 @llvm.is.constant.i64(i64 %conv19) #4
  br i1 %cmp3.i.i70, label %if.then.i.i71, label %if.end12.i.i74

if.then.i.i71:                                    ; preds = %if.end18
  br i1 %5, label %if.else.i.i73, label %if.then7.i.i72

if.then7.i.i72:                                   ; preds = %if.then.i.i71
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_from_user.exit

if.else.i.i73:                                    ; preds = %if.then.i.i71
  call void @__bad_copy_to() #9
  br label %copy_from_user.exit
...
if.then.i.i:                                      ; preds = %zerocopy_rcv_out
  br i1 %5, label %if.else.i.i, label %if.then7.i.i

if.then7.i.i:                                     ; preds = %if.then.i.i
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_to_user.exit

if.else.i.i:                                      ; preds = %if.then.i.i
  call void @__bad_copy_from() #9
  br label %copy_to_user.exit

This is converted to this code after jump threading. See %5 and %8:

define i32 @do_tcp_getsockopt(%struct.sock* %sk, i32 %level, i32 %optname, i8* %optval, i32* %optlen) {
...
if.end18:                                         ; preds = %if.then11, %if.end7
  %len.0 = phi i32 [ 24, %if.then11 ], [ %conv, %if.end7 ]
  %conv19 = sext i32 %len.0 to i64
  %cmp3.i.i70 = icmp ugt i32 %len.0, 24
  %5 = call i1 @llvm.is.constant.i64(i64 %conv19) #4
  br i1 %cmp3.i.i70, label %if.then.i.i71, label %if.end12.i.i74

if.then.i.i71:                                    ; preds = %if.end18
  br i1 %5, label %if.else.i.i73, label %if.then7.i.i72

if.then7.i.i72:                                   ; preds = %if.then.i.i71
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_from_user.exit

if.end12.i.i74:                                   ; preds = %if.end18
  %6 = phi i1 [ %5, %if.end18 ]
  %cmp3.i.i7011 = phi i1 [ %cmp3.i.i70, %if.end18 ]
  %conv196 = phi i64 [ %conv19, %if.end18 ]
  %len.05 = phi i32 [ %len.0, %if.end18 ]
  br i1 %6, label %if.then.i77, label %if.then.i.i.i75

if.then.i.i.i75:                                  ; preds = %if.end12.i.i74
  call void @__check_object_size(i8* nonnull %0, i64 %conv196, i1 zeroext false)
  br label %if.then.i77

if.then.i77:                                      ; preds = %if.then11, %if.then.i.i.i75, %if.end12.i.i74
  %len.0516 = phi i32 [ %len.05, %if.then.i.i.i75 ], [ %len.05, %if.end12.i.i74 ], [ 24, %if.then11 ]
  %cmp3.i.i701115 = phi i1 [ %cmp3.i.i7011, %if.then.i.i.i75 ], [ %cmp3.i.i7011, %if.end12.i.i74 ], [ false, %if.then11 ]
  %7 = phi i1 [ %6, %if.then.i.i.i75 ], [ %6, %if.end12.i.i74 ], [ true, %if.then11 ]
  %conv197 = phi i64 [ %conv196, %if.then.i.i.i75 ], [ %conv196, %if.end12.i.i74 ], [ 24, %if.then11 ]
  %call2.i76 = call i64 @_copy_from_user(i8* nonnull %0, i8* %optval, i64 %conv197)
  br label %copy_from_user.exit

copy_from_user.exit:                              ; preds = %if.then.i77, %if.else.i.i73, %if.then7.i.i72
  %8 = phi i1 [ %7, %if.then.i77 ], [ %5, %if.then7.i.i72 ], [ %5, %if.else.i.i73 ]
...
if.then.i.i:                                      ; preds = %zerocopy_rcv_out
  br i1 %8, label %if.else.i.i, label %if.then7.i.i

if.then7.i.i:                                     ; preds = %if.then.i.i
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv198)
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #2
  br label %copy_to_user.exit

if.else.i.i:                                      ; preds = %if.then.i.i
  call void @__bad_copy_from()
  br label %copy_to_user.exit

So it's unnecessarily complicated and the __bad_copy_from isn't being DCE'd as it should be.

Doesn't fixing that instead sounds like more general fix rather than this?

void added a comment.Mar 8 2020, 12:21 AM

Doesn't fixing that instead sounds like more general fix rather than this?

What do you mean by "that"? This *does* fix the issue. Unless you can see another issue that I can't.

Doesn't fixing that instead sounds like more general fix rather than this?

What do you mean by "that"? This *does* fix the issue. Unless you can see another issue that I can't.

Previous comment implies there are DCE failures being exposed by this:

and the __bad_copy_from isn't being DCE'd as it should be.

I'm wondering why we should be papering over those DCE deficiencies
by trying to avoid performing folds that end up exposing them
instead of fixing DCE deficiencies.

nikic added a comment.Mar 8 2020, 1:37 AM

@void From your example, I still don't really get what the problem is / why this has to be done in JumpThreading in particular. The %6 phi that gets inserted here looks harmless to me -- the phi is trivial and will get eliminated by any simplification pass later in the pipeline (though I'm not entirely sure why a single-arg phi gets created in the first place).

What you seem to be really doing here is to move the is.constant lowering from the LowerConstantIntrinsics pass into the JumpThreading pass. It sounds to me like some kind of phase ordering issue where LowerConstantIntrinsics runs too late for the necessary DCE to be performed. Is that right? What does the IR look like at the point where the lowering is (currently) performed?

void added a comment.EditedMar 8 2020, 6:14 PM

@void From your example, I still don't really get what the problem is / why this has to be done in JumpThreading in particular. The %6 phi that gets inserted here looks harmless to me -- the phi is trivial and will get eliminated by any simplification pass later in the pipeline (though I'm not entirely sure why a single-arg phi gets created in the first place).

What you seem to be really doing here is to move the is.constant lowering from the LowerConstantIntrinsics pass into the JumpThreading pass. It sounds to me like some kind of phase ordering issue where LowerConstantIntrinsics runs too late for the necessary DCE to be performed. Is that right? What does the IR look like at the point where the lowering is (currently) performed?

[@lebedev.ri this is in response to your question as well.]

Here's the relevant IR right before lower constant intrinsics (in the code snippets, look for calls to __bad_copy_*, those are the ones we want to get rid of):

if.then607:                                       ; preds = %if.end603
  %176 = call i32 asm sideeffect "call __put_user_4", "={ax},0,{cx},~{ebx},~{dirflag},~{fpsr},~{flags}"(i32 24, i32* %optlen) #9, !dbg !24792, !srcloc !24793
  %tobool613 = icmp eq i32 %176, 0, !dbg !24794
  br i1 %tobool613, label %if.then.i188, label %cleanup644, !dbg !24795, !prof !15173, !misexpect !14134

if.end616:                                        ; preds = %if.end603
  %sext220 = shl i64 %asmresult590, 32, !dbg !24796                                                                                    
  %conv617 = ashr exact i64 %sext220, 32, !dbg !24796                                                                                  
  %cmp3.i.i181 = icmp ugt i32 %conv592, 24, !dbg !24802
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617) #9, !dbg !24800                                                                   
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185, !dbg !24803, !prof !14133, !misexpect !14134                      

if.then.i.i182:                                   ; preds = %if.end616
  br i1 %177, label %if.else.i.i184, label %if.then7.i.i183, !dbg !24804, !prof !15547                                                 
  
if.then7.i.i183:                                  ; preds = %if.then.i.i182
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617) #11, !dbg !24808
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #9, !dbg !24809, !srcloc !24342
  br label %copy_from_user.exit, !dbg !24810
  
if.else.i.i184:                                   ; preds = %if.then.i.i182
  call void @__bad_copy_to() #11, !dbg !24811
  br label %copy_from_user.exit
  
if.end12.i.i185:                                  ; preds = %if.end616                                                                 
  br i1 %177, label %if.then.i188, label %if.then.i.i.i186, !dbg !24814
  
if.then.i.i.i186:                                 ; preds = %if.end12.i.i185                                                           
  call void @__check_object_size(i8* nonnull %172, i64 %conv617, i1 zeroext false) #11, !dbg !24815                                    
  br label %if.then.i188, !dbg !24815
    
if.then.i188:                                     ; preds = %if.then607, %if.then.i.i.i186, %if.end12.i.i185                           
  %len.0208219 = phi i32 [ %conv592, %if.then.i.i.i186 ], [ %conv592, %if.end12.i.i185 ], [ 24, %if.then607 ]                          
  %conv617210218 = phi i64 [ %conv617, %if.then.i.i.i186 ], [ %conv617, %if.end12.i.i185 ], [ 24, %if.then607 ]                        
  %178 = phi i1 [ false, %if.then.i.i.i186 ], [ true, %if.end12.i.i185 ], [ true, %if.then607 ]                                        
  %call2.i187 = call i64 @_copy_from_user(i8* nonnull %172, i8* %optval, i64 %conv617210218) #11, !dbg !24816                          
  br label %copy_from_user.exit, !dbg !24817                                                                                           

copy_from_user.exit:                              ; preds = %if.then7.i.i183, %if.else.i.i184, %if.then.i188
  %179 = phi i1 [ %178, %if.then.i188 ], [ false, %if.then7.i.i183 ], [ true, %if.else.i.i184 ]
  %cmp3.i.i181212 = phi i1 [ false, %if.then.i188 ], [ true, %if.then7.i.i183 ], [ true, %if.else.i.i184 ]                             
  %conv617209 = phi i64 [ %conv617210218, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]               
  %len.0207 = phi i32 [ %len.0208219, %if.then.i188 ], [ %conv592, %if.then7.i.i183 ], [ %conv592, %if.else.i.i184 ]                   
  %n.addr.0.i189 = phi i64 [ %call2.i187, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]               
  %tobool619 = icmp eq i64 %n.addr.0.i189, 0, !dbg !24818
  br i1 %tobool619, label %if.end621, label %cleanup644, !dbg !24819

if.end621:                                        ; preds = %copy_from_user.exit
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #11, !dbg !24822                                                               
  %call622 = call fastcc i32 @tcp_zerocopy_receive(%struct.sock* %sk, %struct.tcp_zerocopy_receive* nonnull %zc) #10, !dbg !24823
  %sk_err.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 51, !dbg !24833
  %180 = load i32, i32* %sk_err.i, align 8, !dbg !24833
  %tobool.i190 = icmp eq i32 %180, 0, !dbg !24833
  br i1 %tobool.i190, label %sock_error.exit, label %if.end.i, !dbg !24835, !prof !15173, !misexpect !15389

if.end.i:                                         ; preds = %if.then632
  %181 = call i32 asm sideeffect "xchgl $0, $1\0A", "=r,=*m,0,*m,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(i32* %sk_err.i, i32 0, i32* %sk_err.i) #9, !dbg !24837, !srcloc !15261
  %sub.i = sub i32 0, %181, !dbg !24838
  br label %sock_error.exit, !dbg !24839

sock_error.exit:                                  ; preds = %if.then632, %if.end.i
  %retval.0.i = phi i32 [ %sub.i, %if.end.i ], [ 0, %if.then632 ], !dbg !24831
  %err634 = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 4, !dbg !24840
  store i32 %retval.0.i, i32* %err634, align 4, !dbg !24841
  br label %zerocopy_rcv_inq, !dbg !24842

zerocopy_rcv_inq:                                 ; preds = %if.end621, %zerocopy_rcv_sk_err, %sock_error.exit
  %182 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 3, !dbg !24846
  %183 = load volatile i32, i32* %182, align 4, !dbg !24849
  %184 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 2, !dbg !24851
  %185 = load volatile i32, i32* %184, align 4, !dbg !24854
  %sub.i191 = sub i32 %185, %183, !dbg !24856
  %cmp.i192 = icmp slt i32 %sub.i191, 0, !dbg !24857
  br i1 %cmp.i192, label %if.then.i193, label %lor.rhs.i, !dbg !24857, !prof !14133
  
lor.rhs.i:                                        ; preds = %zerocopy_rcv_inq
  %186 = load volatile i32, i32* %182, align 4, !dbg !24860
  %cmp20.i = icmp eq i32 %183, %186, !dbg !24857
  br i1 %cmp20.i, label %if.end.i194, label %if.then.i193, !dbg !24862, !prof !15173, !misexpect !14134

if.then.i193:                                     ; preds = %lor.rhs.i, %zerocopy_rcv_inq
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #11, !dbg !24865
  %187 = load i32, i32* %184, align 8, !dbg !24866
  %188 = load i32, i32* %182, align 4, !dbg !24867
  %sub24.i = sub i32 %187, %188, !dbg !24868
  call void @release_sock(%struct.sock* %sk) #11, !dbg !24869
  br label %if.end.i194, !dbg !24870

if.end.i194:                                      ; preds = %if.then.i193, %lor.rhs.i
  %inq.0.i = phi i32 [ %sub24.i, %if.then.i193 ], [ %sub.i191, %lor.rhs.i ], !dbg !24844
  %cmp25.i = icmp eq i32 %inq.0.i, 0, !dbg !24871
  br i1 %cmp25.i, label %land.lhs.true.i, label %tcp_inq_hint.exit, !dbg !24872

land.lhs.true.i:                                  ; preds = %if.end.i194
  %skc_flags.i.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 0, i32 15, i32 0, !dbg !24875
  %189 = load volatile i64, i64* %skc_flags.i.i, align 8, !dbg !24878
  %and1.i.i.i = lshr i64 %189, 1, !dbg !24879
  %190 = trunc i64 %and1.i.i.i to i32, !dbg !24879
  %191 = and i32 %190, 1, !dbg !24879
  br label %tcp_inq_hint.exit, !dbg !24879

tcp_inq_hint.exit:                                ; preds = %if.end.i194, %land.lhs.true.i
  %call636195 = phi i32 [ %191, %land.lhs.true.i ], [ %inq.0.i, %if.end.i194 ]
  %inq = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 3, !dbg !24880
  store i32 %call636195, i32* %inq, align 8, !dbg !24881
  br label %zerocopy_rcv_out, !dbg !24882

zerocopy_rcv_out:                                 ; preds = %if.end621, %tcp_inq_hint.exit
  call void @llvm.dbg.label(metadata !24079), !dbg !24883
  %tobool637 = icmp eq i32 %call622, 0, !dbg !24884
  br i1 %tobool637, label %land.lhs.true638, label %cleanup644, !dbg !24886

land.lhs.true638:                                 ; preds = %zerocopy_rcv_out
  br i1 %cmp3.i.i181212, label %if.then.i.i39, label %if.end12.i.i42, !dbg !24891, !prof !14133, !misexpect !14134

if.then.i.i39:                                    ; preds = %land.lhs.true638
  br i1 %179, label %if.else.i.i41, label %if.then7.i.i40, !dbg !24892, !prof !15547

if.then7.i.i40:                                   ; preds = %if.then.i.i39
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617209) #11, !dbg !24896
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #9, !dbg !24897, !srcloc !24342
  br label %copy_to_user.exit47, !dbg !24898

if.else.i.i41:                                    ; preds = %if.then.i.i39
  call void @__bad_copy_from() #11, !dbg !24899
  br label %copy_to_user.exit47, !dbg !24899

The code paths in question are:

- %177 (llvm.is.constant) determines if __bad_copy_to() is executed. It should resolve to "false", so __bad_copy_to() is DCE'd.

- %179 (PHI node) determines if __bad_copy_from() is executed.
  - %179 is a PHI of %178 (from %if.then.i188), false (from %if.then7.i.i183), and true (from %if.else.i.i184)
    - %178 is also a PHI of false (from %if.then.i.i.i186), true (from %if.end12.i.i185), and true (from %if.then607)

Because %179 doesn't resolve to false, it keeps it around. So the question is "why are some of these values true?"

This is after jump threading:

if.end616:                                        ; preds = %if.end603                                                                 
  %len.0 = phi i32 [ %conv592, %if.end603 ], !dbg !25087
  %conv617 = sext i32 %len.0 to i64, !dbg !25088 
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24, !dbg !25094 
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617) #14, !dbg !25092                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185, !dbg !25095, !prof !14149, !misexpect !14150                      
  
if.then.i.i182:                                   ; preds = %if.end616                                                                 
  br i1 %177, label %if.else.i.i184, label %if.then7.i.i183, !dbg !25096, !prof !15638                                                 
  
if.then7.i.i183:                                  ; preds = %if.then.i.i182                                                            
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617) #16, !dbg !25100
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #14, !dbg !25101, !srcloc !24631
  br label %copy_from_user.exit, !dbg !25102

if.else.i.i184:                                   ; preds = %if.then.i.i182
  call void @__bad_copy_to() #16, !dbg !25103
  br label %copy_from_user.exit
                                                                                                                                       
if.end12.i.i185:                                  ; preds = %if.end616
  %178 = phi i1 [ %177, %if.end616 ]
  %cmp3.i.i181213 = phi i1 [ %cmp3.i.i181, %if.end616 ]                                                                                
  %conv617210 = phi i64 [ %conv617, %if.end616 ]                                                                                       
  %len.0208 = phi i32 [ %len.0, %if.end616 ]      
  br i1 %178, label %if.then.i188, label %if.then.i.i.i186, !dbg !25106                                                                

if.then.i.i.i186:                                 ; preds = %if.end12.i.i185 
  call void @__check_object_size(i8* nonnull %172, i64 %conv617210, i1 zeroext false) #16, !dbg !25107
  br label %if.then.i188, !dbg !25107
                                                                                                                                       
if.then.i188:                                     ; preds = %if.then607, %if.then.i.i.i186, %if.end12.i.i185                           
  %len.0208219 = phi i32 [ %len.0208, %if.then.i.i.i186 ], [ %len.0208, %if.end12.i.i185 ], [ 24, %if.then607 ]                        
  %conv617210218 = phi i64 [ %conv617210, %if.then.i.i.i186 ], [ %conv617210, %if.end12.i.i185 ], [ 24, %if.then607 ]                  
  %cmp3.i.i181213217 = phi i1 [ %cmp3.i.i181213, %if.then.i.i.i186 ], [ %cmp3.i.i181213, %if.end12.i.i185 ], [ false, %if.then607 ]    
  %179 = phi i1 [ %178, %if.then.i.i.i186 ], [ %178, %if.end12.i.i185 ], [ true, %if.then607 ]
  %call2.i187 = call i64 @_copy_from_user(i8* nonnull %172, i8* %optval, i64 %conv617210218) #16, !dbg !25108
  call void @llvm.dbg.value(metadata i64 %call2.i187, metadata !22165, metadata !DIExpression()) #14, !dbg !25090
  br label %copy_from_user.exit, !dbg !25109

copy_from_user.exit:                              ; preds = %if.then7.i.i183, %if.else.i.i184, %if.then.i188
  %180 = phi i1 [ %179, %if.then.i188 ], [ %177, %if.then7.i.i183 ], [ %177, %if.else.i.i184 ]
  %cmp3.i.i181212 = phi i1 [ %cmp3.i.i181213217, %if.then.i188 ], [ %cmp3.i.i181, %if.then7.i.i183 ], [ %cmp3.i.i181, %if.else.i.i184 ]
  %conv617209 = phi i64 [ %conv617210218, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  %len.0207 = phi i32 [ %len.0208219, %if.then.i188 ], [ %len.0, %if.then7.i.i183 ], [ %len.0, %if.else.i.i184 ]
  %n.addr.0.i189 = phi i64 [ %call2.i187, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  call void @llvm.dbg.value(metadata i64 %n.addr.0.i189, metadata !22165, metadata !DIExpression()) #14, !dbg !25090
  %tobool619 = icmp eq i64 %n.addr.0.i189, 0, !dbg !25110
  br i1 %tobool619, label %if.end621, label %cleanup644, !dbg !25111

if.end621:                                        ; preds = %copy_from_user.exit
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #16, !dbg !25114
  %call622 = call fastcc i32 @tcp_zerocopy_receive(%struct.sock* %sk, %struct.tcp_zerocopy_receive* nonnull %zc) #15, !dbg !25115
  call void @release_sock(%struct.sock* %sk) #16, !dbg !25116
  switch i32 %len.0207, label %zerocopy_rcv_out [
    i32 24, label %zerocopy_rcv_sk_err
    i32 20, label %zerocopy_rcv_inq
  ], !dbg !25117

zerocopy_rcv_sk_err:                              ; preds = %if.end621
  call void @llvm.dbg.label(metadata !24366), !dbg !25118
  %tobool631 = icmp eq i32 %call622, 0, !dbg !25119
  br i1 %tobool631, label %if.then632, label %zerocopy_rcv_inq, !dbg !25121

if.then632:                                       ; preds = %zerocopy_rcv_sk_err
  %sk_err.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 51, !dbg !25124
  %181 = load i32, i32* %sk_err.i, align 8, !dbg !25124
  %tobool.i190 = icmp eq i32 %181, 0, !dbg !25124
  br i1 %tobool.i190, label %sock_error.exit, label %if.end.i, !dbg !25126, !prof !15210, !misexpect !15480

if.end.i:                                         ; preds = %if.then632
  %182 = call i32 asm sideeffect "xchgl $0, $1\0A", "=r,=*m,0,*m,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(i32* %sk_err.i, i32 0, i32* %sk_err.i) #14, !dbg !25128, !srcloc !15298
  %sub.i = sub i32 0, %182, !dbg !25129
  br label %sock_error.exit, !dbg !25130

sock_error.exit:                                  ; preds = %if.then632, %if.end.i
  %retval.0.i = phi i32 [ %sub.i, %if.end.i ], [ 0, %if.then632 ], !dbg !25122
  %err634 = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 4, !dbg !25131
  store i32 %retval.0.i, i32* %err634, align 4, !dbg !25132
  br label %zerocopy_rcv_inq, !dbg !25133

zerocopy_rcv_inq:                                 ; preds = %if.end621, %zerocopy_rcv_sk_err, %sock_error.exit
  %183 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 3, !dbg !25137
  %184 = load volatile i32, i32* %183, align 4, !dbg !25140
  %185 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 2, !dbg !25142
  %186 = load volatile i32, i32* %185, align 4, !dbg !25145
  %sub.i191 = sub i32 %186, %184, !dbg !25147
  %cmp.i192 = icmp slt i32 %sub.i191, 0, !dbg !25148
  br i1 %cmp.i192, label %if.then.i193, label %lor.rhs.i, !dbg !25148, !prof !14149

lor.rhs.i:                                        ; preds = %zerocopy_rcv_inq
  %187 = load volatile i32, i32* %183, align 4, !dbg !25151
  %cmp20.i = icmp eq i32 %184, %187, !dbg !25148
  br i1 %cmp20.i, label %if.end.i194, label %if.then.i193, !dbg !25153, !prof !15210, !misexpect !14150

if.then.i193:                                     ; preds = %lor.rhs.i, %zerocopy_rcv_inq
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #16, !dbg !25156
  %188 = load i32, i32* %185, align 8, !dbg !25157
  %189 = load i32, i32* %183, align 4, !dbg !25158
  %sub24.i = sub i32 %188, %189, !dbg !25159
  call void @release_sock(%struct.sock* %sk) #16, !dbg !25160
  br label %if.end.i194, !dbg !25161

if.end.i194:                                      ; preds = %if.then.i193, %lor.rhs.i
  %inq.0.i = phi i32 [ %sub24.i, %if.then.i193 ], [ %sub.i191, %lor.rhs.i ], !dbg !25135
  %cmp25.i = icmp eq i32 %inq.0.i, 0, !dbg !25162
  br i1 %cmp25.i, label %land.lhs.true.i, label %tcp_inq_hint.exit, !dbg !25163

land.lhs.true.i:                                  ; preds = %if.end.i194
  %skc_flags.i.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 0, i32 15, i32 0, !dbg !25166
  %190 = load volatile i64, i64* %skc_flags.i.i, align 8, !dbg !25169
  %and1.i.i.i = lshr i64 %190, 1, !dbg !25170
  %191 = trunc i64 %and1.i.i.i to i32, !dbg !25170
  %192 = and i32 %191, 1, !dbg !25170
  br label %tcp_inq_hint.exit, !dbg !25170

tcp_inq_hint.exit:                                ; preds = %if.end.i194, %land.lhs.true.i
  %call636195 = phi i32 [ %192, %land.lhs.true.i ], [ %inq.0.i, %if.end.i194 ]
  %inq = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 3, !dbg !25171
  store i32 %call636195, i32* %inq, align 8, !dbg !25172
  br label %zerocopy_rcv_out, !dbg !25173

zerocopy_rcv_out:                                 ; preds = %if.end621, %tcp_inq_hint.exit
  %tobool637 = icmp eq i32 %call622, 0, !dbg !25175
  br i1 %tobool637, label %land.lhs.true638, label %cleanup644, !dbg !25177

land.lhs.true638:                                 ; preds = %zerocopy_rcv_out
  br i1 %cmp3.i.i181212, label %if.then.i.i39, label %if.end12.i.i42, !dbg !25182, !prof !14149, !misexpect !14150

if.then.i.i39:                                    ; preds = %land.lhs.true638
  br i1 %180, label %if.else.i.i41, label %if.then7.i.i40, !dbg !25183, !prof !15638

if.then7.i.i40:                                   ; preds = %if.then.i.i39
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617209) #16, !dbg !25187
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #14, !dbg !25188, !srcloc !24631
  br label %copy_to_user.exit47, !dbg !25189

if.else.i.i41:                                    ; preds = %if.then.i.i39
  call void @__bad_copy_from() #16, !dbg !25190
  br label %copy_to_user.exit47, !dbg !25190

Here, %177 feeds into %178, which is a PHI node. That feeds into %179, which includes it twice and another value that's true from another block. So this is the start of the badness.

It turns out that during jump threading, the if.end616 block has a constant and non-constant value coming into it and are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:

if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24 
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:                                                                 
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24 
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

In reality, we want the original llvm.is.constant to return false. But %177 is true while %178 is false, leading to the PHI nodes that have %177 and %178 as values to not resolve to false and therefore we don't get the DCE effect.

The patch here solves this by resolving the llvm.is.constant before we get to that stage. Perhaps it should be done earlier or the logic that allows a block to be threaded should check for a situation where a PHI's value feeds into an llvm.is.constant intrinsic?

void updated this revision to Diff 249042.Mar 9 2020, 1:36 AM

Alternative patch. This prevents threading blocks where a PHI node has a constant and is used by an is.constant intrinsic.

@void From your example, I still don't really get what the problem is / why this has to be done in JumpThreading in particular. The %6 phi that gets inserted here looks harmless to me -- the phi is trivial and will get eliminated by any simplification pass later in the pipeline (though I'm not entirely sure why a single-arg phi gets created in the first place).

What you seem to be really doing here is to move the is.constant lowering from the LowerConstantIntrinsics pass into the JumpThreading pass. It sounds to me like some kind of phase ordering issue where LowerConstantIntrinsics runs too late for the necessary DCE to be performed. Is that right? What does the IR look like at the point where the lowering is (currently) performed?

[@lebedev.ri this is in response to your question as well.]

<...>

This is after jump threading:

Are you sure that is the correct snippet? Because if it is, i'm lost completely.

It turns out that during jump threading, the if.end616 block has a constant and non-constant value coming into it and are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:

if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24 
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:                                                                 
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24 
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

I'm not seeing @if.end603 and @if.end616.thread, in This is after jump threading: snippet.

void added a comment.Mar 9 2020, 12:04 PM

This is after jump threading:

Are you sure that is the correct snippet? Because if it is, i'm lost completely.

Yes. I just elided some parts that aren't super relevant. I can supply a preprocessed file though.

It turns out that during jump threading, the if.end616 block has a constant and non-constant value coming into it and are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:

if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24 
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:                                                                 
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64 
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24 
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)                                                                  
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

I'm not seeing @if.end603 and @if.end616.thread, in This is after jump threading: snippet.

The @if.end616.thread that's relevant is in the last code block above. I mistakenly omitted it from the previous example.

void updated this revision to Diff 249185.Mar 9 2020, 12:06 PM

Reformat and test fix.

joerg added a subscriber: joerg.Mar 9 2020, 12:30 PM

I'm confused by that example. If jumpthreading makes it possible to answer true, it is no differerent from inlining and other use cases. The reverse is the problematic case, turning constant cases into non-constant cases?

void added a comment.EditedMar 9 2020, 1:02 PM

I'm confused by that example. If jumpthreading makes it possible to answer true, it is no differerent from inlining and other use cases. The reverse is the problematic case, turning constant cases into non-constant cases?

No, inlining is fine, because it'll create a PHI node with the two values only if they can be combined into one. The value here shouldn't be considered constant. In a nutshell, jump threading is taking a PHI node and splitting it apart if one of the incoming values is a constant. This is supposedly fine in the general case, but doesn't work with the sematics of llvm.is.constant.

joerg added a comment.Mar 9 2020, 2:00 PM

It still seems to be a perfectly reasonable optimisation under the semantics of is.constant. The code here seems to be an abuse of it though and makes assumptions that are not sensible.

void updated this revision to Diff 249219.Mar 9 2020, 2:09 PM

Make test more representative of what this patch is doing.

void added a comment.Mar 9 2020, 2:36 PM

It still seems to be a perfectly reasonable optimisation under the semantics of is.constant. The code here seems to be an abuse of it though and makes assumptions that are not sensible.

The code is abusi^Wusing how __builtin_constant_p() works in gcc. It's honored even after inlining. But while that's icky, what's going on here is not quite that.

        int val;
        if (get_user(len, optlen))
                return -EFAULT;

        len = min_t(unsigned int, len, sizeof(int));

        if (len < 0)
                return -EINVAL;

        switch (optname) {
             ...
        case TCP_ZEROCOPY_RECEIVE: {
                struct tcp_zerocopy_receive zc;
                int err;

                if (get_user(len, optlen))
                        return -EFAULT;
                if (len < offsetofend(struct tcp_zerocopy_receive, length))
                        return -EINVAL;
                if (len > sizeof(zc)) {
                        len = sizeof(zc);
                        if (put_user(len, optlen))
                                return -EFAULT;
                }
                if (copy_from_user(&zc, optval, len))
                        return -EFAULT;
                lock_sock(sk);
                err = tcp_zerocopy_receive(sk, &zc);
                release_sock(sk);
                if (len == sizeof(zc))
                        goto zerocopy_rcv_sk_err;
                switch (len) {
                case offsetofend(struct tcp_zerocopy_receive, err):
                        goto zerocopy_rcv_sk_err;
                case offsetofend(struct tcp_zerocopy_receive, inq):
                        goto zerocopy_rcv_inq;
                case offsetofend(struct tcp_zerocopy_receive, length):
                default:
                        goto zerocopy_rcv_out;
                }
zerocopy_rcv_sk_err:
                if (!err)
                        zc.err = sock_error(sk);
zerocopy_rcv_inq:
                zc.inq = tcp_inq_hint(sk);
zerocopy_rcv_out:
                if (!err && copy_to_user(optval, &zc, len))
                        err = -EFAULT;
                return err;
        }

The len variable is assigned in several places here. In only one place does it get a constant value: the len = sizeof(zc) line. The copy_from_user and copy_to_user are inlined. They call a function (also inlined) to check the object size for overwrites:

_Bool check_copy_size(const void *addr, unsigned long bytes, _Bool is_source)
{
        int sz = __builtin_object_size(addr, 0);
        if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
                if (!__builtin_constant_p(bytes))
                        copy_overwrite(sz, bytes);
                else if (is_source)
                        __bad_copy_from();
                else
                        __bad_copy_to();
                return false;
        }
        check_object_size(addr, bytes, is_source);
        return true;
}

The __builtin_constant_p(bytes) needs to return false, otherwise we'll get a link error because the __bad_copy_* functions don't exist. (Note: bytes is len here.)

Now, because len doesn't change between the copy_from_user and copy_to_user calls, the check if it's constant is combined. Now jump threading comes along and essentially converts that code into one where the if (len == sizeof(zc)) statement is split into two paths: one with len constant and the other with it non-constant. Those two combine into a PHI node (actually multiple PHI nodes) and that's what's used within the copy_to_user call. The copy_from_user call seems content to use the non-constant path (probably because of interaction between inlining and jump threading, because removing some of the calls between the two copy_*_user calls gets rid of the problem).

Threading through the if (len == sizeof(zc)) statement would be fine if it weren't for the fact that programmers rely upon is.constant to be false when not all paths to it have constant values for the operand.

void updated this revision to Diff 250048.Mar 12 2020, 2:00 PM

s/IsUsedBy/isUsedBy/g

void added a comment.Mar 12 2020, 2:00 PM

Friendly ping. :-)

Can you post standalone example on godbolt that shows this link failure without this patch please?

void added a comment.Mar 12 2020, 6:52 PM

Can you post standalone example on godbolt that shows this link failure without this patch please?

I'm not sure what that will achieve. It would include all of the Linux kernel as part of the linking process, and it's only one file (the one I reduced the testcase from) that has the __bad_copy_from() reference still in it. Perhaps I'm missing something?

Can you post standalone example on godbolt that shows this link failure without this patch please?

I'm not sure what that will achieve. It would include all of the Linux kernel as part of the linking process, and it's only one file (the one I reduced the testcase from) that has the __bad_copy_from() reference still in it. Perhaps I'm missing something?

I personally still don't understand *why* after this transform we fail to DCE. Is this pass ordering issue? Heuristics issue?

void added a comment.Mar 13 2020, 4:26 AM

Can you post standalone example on godbolt that shows this link failure without this patch please?

I'm not sure what that will achieve. It would include all of the Linux kernel as part of the linking process, and it's only one file (the one I reduced the testcase from) that has the __bad_copy_from() reference still in it. Perhaps I'm missing something?

I personally still don't understand *why* after this transform we fail to DCE. Is this pass ordering issue? Heuristics issue?

I gave a fairly lengthy explanation above. The test case is also instructive, and would be much easier to see what's happening, to tell the truth.

It's not about pass ordering. It's about how jump threading splits a PHI node into two parts: one with a constant value and one with a non-constant value. This is fine in normal situations, but the is.constant intrinsic is not normal, at least not in the way one thinks instructions are. It's meant to determine if the variable *at the point it's called* is constant or non-constant. If there are two values combined by a PHI node with one constant and the other non-constant, it's not appropriate to split the PHI apart into constant and non-constant paths, because then what would normally be DCE'd (like the call to __bad_copy_from) won't be, because there's one path where the is.constant is true, thus keeping the call to __bad_copy_from() around.

void added a comment.Mar 15 2020, 3:15 PM

Friendly ping. :-)

void added a comment.Mar 18 2020, 3:02 PM

Another friendly ping. :-)

I'm not really happy about imposing optimization "rule", which we have to follow for correctness, without some explanation in LangRef. Hacking a specific pass without a general rule is just going to make things more messy in the future; some other pass will eventually trip over this, and we won't have any reference for how it should be fixed.

Really, to preserve the semantics the kernel wants in general, we need some notion of a "region": when we have an if statement with a builtin_constant_p condition, the body of that if statement is a region with special semantics. If we resolve a builtin_constant_p, every use of a variable used as an input to the __builtin_constant_p must be constant-folded inside that region. We can optimize within the region, we can clone the whole region, but we can't clone the entry point of the region without cloning the whole thing, and we can't merge parts of the region without merging the whole thing.

I'm not sure how to actually represent that in LLVM IR in a way that can be uniformly queried by optimizations in a reasonable way. I guess marking llvm.is.constant "convergent" partially solves the issue, but we still might have problems with optimizations that merge code.

joerg added a comment.Mar 18 2020, 6:20 PM

I was discussing this with Bill off-list to get a better idea of the original test case. Basically, with the new constant intrinsic pass, we can provide a stronger semantic: the default LLVM pass chain always constant folds expressions involving is.constant and performs DCE trivially dead branches resulting from the former folding. This means that if the original expression is adjusted from:

if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
    if (!__builtin_constant_p(bytes))
    ...
}

to the functionally equivalent:

if (!(__builtin_constant_p(sz) && __builtin_constant_p(bytes) && sz >= 0 && sz >= bytes) && __builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
    if (!__builtin_constant_p(bytes))
    ...
}

then we can actually ensure proper DCE even with -O0 in the default pass chain. That depends over all on two properties:

  • If __builtin_constant_p is constant folded by clang, it must DCE the appropiate places.
  • the constant intrinsic pass is run

With -O1 or higher, this should work in general.

void added a comment.Mar 18 2020, 6:45 PM

I'm not really happy about imposing optimization "rule", which we have to follow for correctness, without some explanation in LangRef. Hacking a specific pass without a general rule is just going to make things more messy in the future; some other pass will eventually trip over this, and we won't have any reference for how it should be fixed.

Really, to preserve the semantics the kernel wants in general, we need some notion of a "region": when we have an if statement with a builtin_constant_p condition, the body of that if statement is a region with special semantics. If we resolve a builtin_constant_p, every use of a variable used as an input to the __builtin_constant_p must be constant-folded inside that region. We can optimize within the region, we can clone the whole region, but we can't clone the entry point of the region without cloning the whole thing, and we can't merge parts of the region without merging the whole thing.

I'm not sure how to actually represent that in LLVM IR in a way that can be uniformly queried by optimizations in a reasonable way. I guess marking llvm.is.constant "convergent" partially solves the issue, but we still might have problems with optimizations that merge code.

This isn't about imposing a new "rule" on optimizations. It's preventing an optimization from changing the semantics of the program. At the point where jump threading decides to clone the region and make one edge constant and the other non-constant, it's violating llvm.is.constant's semantics. I'll make this more explicit in the language reference. But the upshot is that this isn't a hack, because of how llvm.is.constant is meant to work (i.e. to mimic builtin_constant_p). I don't think we should try to optimize around llvm.is.constant beyond DCE, which is really the main purpose of builtin_constant_p.

void added a comment.Mar 18 2020, 6:47 PM

I was discussing this with Bill off-list to get a better idea of the original test case. Basically, with the new constant intrinsic pass, we can provide a stronger semantic: the default LLVM pass chain always constant folds expressions involving is.constant and performs DCE trivially dead branches resulting from the former folding. This means that if the original expression is adjusted from:

if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
    if (!__builtin_constant_p(bytes))
    ...
}

to the functionally equivalent:

if (!(__builtin_constant_p(sz) && __builtin_constant_p(bytes) && sz >= 0 && sz >= bytes) && __builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
    if (!__builtin_constant_p(bytes))
    ...
}

then we can actually ensure proper DCE even with -O0 in the default pass chain. That depends over all on two properties:

  • If __builtin_constant_p is constant folded by clang, it must DCE the appropiate places.
  • the constant intrinsic pass is run

With -O1 or higher, this should work in general.

Where should we perform this transformation? and can it be generalized?

joerg added a comment.Mar 19 2020, 5:22 AM

IMO the root of the problem here is that the branches are mixing predicated variables (bytes) with non-predicated variables (sz). If users want deterministic behavior here, that shouldn't be done.

void added a comment.Mar 19 2020, 12:30 PM

IMO the root of the problem here is that the branches are mixing predicated variables (bytes) with non-predicated variables (sz). If users want deterministic behavior here, that shouldn't be done.

I don't understand. The builtin_expect() call is only a suggestion to the compiler that a branch may or may not be taken. I don't see why this isn't allowed:

if (sz >= 0 && sz < bytes) {
    if (!__builtin_constant_p(bytes))
    ...
}
joerg added a comment.Mar 19 2020, 6:35 PM

Yes, builtin_expect is just noise here. The point I wanted to make is that both sz and bytes should be used in builtin_constant_p in the other condition.

joerg added a comment.Mar 19 2020, 6:36 PM

...in the outer condition, I meant.

void added a comment.Mar 19 2020, 9:28 PM

Yes, builtin_expect is just noise here. The point I wanted to make is that both sz and bytes should be used in builtin_constant_p in the other condition.

I'm sorry, but I still don't see why. And I don't see how we can enforce that rule.

joerg added a comment.Mar 20 2020, 8:12 AM

The original code has a functional dependency between sz and bytes and whether they can be constant evaluated. But the code doesn't express that. I don't think we can enforce that in any sensible way. There are valid use cases after all where partial inlining would result in entirely sensible decisions, just think about the more typical case of __builtin_constant_p selecting between inline asm taking immediate operands and one taking register/memory operands. That's why I am saying that I consider it a lot more useful to provide reliable building blocks to express the dependency and make sure they work.

void added a comment.EditedMar 20 2020, 12:47 PM

The original code has a functional dependency between sz and bytes and whether they can be constant evaluated. But the code doesn't express that. I don't think we can enforce that in any sensible way. There are valid use cases after all where partial inlining would result in entirely sensible decisions, just think about the more typical case of __builtin_constant_p selecting between inline asm taking immediate operands and one taking register/memory operands. That's why I am saying that I consider it a lot more useful to provide reliable building blocks to express the dependency and make sure they work.

[Adding the code snippets below for reference.]

Okay I think I get what you're saying now. If both sz and bytes are constant, then we can evaluate the if conditional and DCE properly. If sz or bytes isn't constant then the if conditional can't be DCE'd and that's when we require bytes to be non-constant. (Note: We don't care about compiling at -O0. Linux as a whole can't compile without optimizations.) In your code, you check that explicitly in the if conditional. While Linux is being clever and letting the compiler do it for us.

I don't argue that your suggested replacement is more readable and probably more amenable to the compiler, I still think that what Linux is doing is okay.

[Edit]

Re supplying reliable building blocks. That's not something we can do here, and I'm not sure how that would manifest. The situation as I see it is this: what's been written is valid code. It could be written in a different way, but in the end it's something the compiler should handle just as if they wrote it in the way you suggested. Therefore, the optimization passes must not violate these semantics. If you're suggesting that this is something the compiler should supply, I still don't see how that would manifest unless it's something akin to Eli's region suggestion. Even then, it's a fundamental change to the compiler that doesn't help me at all. This is a blocking bug for us, and it's pernicious as it leads to unexpected code generation. Adding a fundamental change to the compiler for one instruction seems a bit heavy handed to me. Would you prefer adding the convergent attribute to the intrinsic?

Original Code:

_Bool check_copy_size(const void *addr, unsigned long bytes, _Bool is_source)
{
        int sz = __builtin_object_size(addr, 0);
        if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
                if (!__builtin_constant_p(bytes))
                        copy_overwrite(sz, bytes);
                else if (is_source)
                        __bad_copy_from();
                else
                        __bad_copy_to();
                return false;
        }
        check_object_size(addr, bytes, is_source);
        return true;
}

Your Suggested Replacement:

if (!(__builtin_constant_p(sz) && __builtin_constant_p(bytes) && sz >= 0 && sz >= bytes) && __builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
    if (!__builtin_constant_p(bytes))
    ...
}
MaskRay added inline comments.Mar 25 2020, 11:25 PM
llvm/test/Transforms/JumpThreading/is_constant.ll
270

Haven't read the code yet, but my feeling is that this test case really should be reduced. These metadata are not useful.

void marked an inline comment as done.Mar 26 2020, 1:17 AM
void added inline comments.
llvm/test/Transforms/JumpThreading/is_constant.ll
270

I can try, but it's pretty reduced as is. I think it triggers because of the amount of code around the issue.

kazu added a comment.Mar 29 2020, 1:14 AM

It looks like the testcase can be reduced to the following. You can check for no jump threading occurring:

define void @foo(i32 %a, i32 %var) {
entry:
  %cond1 = icmp ugt i32 %a, 24
  br i1 %cond1, label %bb_constant, label %bb_cond

bb_constant:
  br label %bb_cond

bb_cond:
  %phi = phi i32 [ 24, %bb_constant ], [ %var, %entry ]
  %sext = sext i32 %phi to i64
  %cond2 = icmp ugt i64 %sext, 24
  %is_constant = call i1 @llvm.is.constant.i64(i64 %sext)
  br i1 %cond2, label %bb_then, label %bb_else

bb_then:
  unreachable

bb_else:
  unreachable
}

; Function Attrs: nounwind readnone willreturn
declare i1 @llvm.is.constant.i64(i64) #0

attributes #0 = { nounwind readnone willreturn }

That said, if I remove the sign extension like so, the edge entry->bb_constant gets threaded through bb_cond to bb_else even with your patch. IIUC, your patch does not catch a case where a PHI node is directly used by llvm.is_constant:

define void @bar(i32 %a, i64 %var) {
entry:
  %cond1 = icmp ugt i32 %a, 24
  br i1 %cond1, label %bb_constant, label %bb_cond

bb_constant:
  br label %bb_cond

bb_cond:
  %sext = phi i64 [ 24, %bb_constant ], [ %var, %entry ]
  %cond2 = icmp ugt i64 %sext, 24
  %is_constant = call i1 @llvm.is.constant.i64(i64 %sext)
  br i1 %cond2, label %bb_then, label %bb_else

bb_then:
  unreachable

bb_else:
  unreachable
}

; Function Attrs: nounwind readnone willreturn
declare i1 @llvm.is.constant.i64(i64) #0

attributes #0 = { nounwind readnone willreturn }

By the way, MaybeThreadThroughTwoBasicBlocks in JumpThreading.cpp can thread an edge through two successive basic blocks theses days. You need to teach MaybeThreadThroughTwoBasicBlocks to look out for problematic llvm.is_constant also.

FWIW, we catch convergent calls in a somewhat sneaky interface. Specifically, getJumpThreadDuplicationCost returns ~0U when it encounters an instruction that we cannot safely duplicate. You could teach getJumpThreadDuplicationCost to return ~0U on a problematic llvm.is_constant. This way, you don't have to perform the same check in both MaybeThreadThroughTwoBasicBlocks and TryThreadEdge.

Now, the convergent attribute seems to me like a perfect fit for llvm.is_constant as far as I can tell from its description, but I am not sure what side effects it would have on the rest of the compiler.

void updated this revision to Diff 253484.Mar 29 2020, 9:07 PM
void edited the summary of this revision. (Show Details)

Using a convergent attribute accomplishes the same thing and doesn't run into the issue Eli pointed out, which is that we're creating special rules for passes.

void added a comment.Mar 29 2020, 9:08 PM

@kazu and @MaskRay I modified this to just add the "convergent" attribute to is.constant. It seems to be just fine for it. PTAL.

kazu accepted this revision.Mar 30 2020, 9:53 AM

LGTM. Thank you for revising the patch!

This revision is now accepted and ready to land.Mar 30 2020, 9:53 AM

Please revisit patch title/description before commiting

void retitled this revision from [JumpThreading] Don't create PHI nodes with "is.constant" values to [Intrinsic] Give "is.constant" the "convergent" attribute.Mar 30 2020, 11:42 AM
void edited the summary of this revision. (Show Details)

Please revisit patch title/description before commiting

Done.

Thanks for the reviews!

This revision was automatically updated to reflect the committed changes.