Code frequently relies upon the results of "is.constant" intrinsics to
DCE invalid code paths. We don't want the intrinsic to be made control-
dependent on any additional values. For instance, we can't split a PHI
into a "constant" and "non-constant" part via jump threading in order
to "optimize" the constant part, because in that case the "is.constant"
intrinsic is meant to return "false".
Diff Detail
- Repository: rG LLVM Github Monorepo
Inline comment on llvm/lib/Transforms/Scalar/JumpThreading.cpp, line 2012 (On Diff #248906):
You can use the m_Intrinsic matcher from LLVM's PatternMatch.
This looks somewhat dubious to me. This seems like an InstCombine-style fold that is performed in JumpThreading -- but why? Won't it just get folded either earlier or later? As the test case is over-reduced, it's hard to tell where this would make a difference in practice.
No. In the original testcase (from net/ipv4/tcp.c in Linux with ASAN enabled), there are two calls to copy_to_user and copy_from_user near where the testcase is located. They each call check_copy_size:
```c
static inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((always_inline))
bool check_copy_size(const void *addr, size_t bytes, bool is_source)
{
	int sz = __builtin_object_size(addr, 0);
	if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
		if (!__builtin_constant_p(bytes))
			copy_overflow(sz, bytes);
		else if (is_source)
			__bad_copy_from();
		else
			__bad_copy_to();
		return false;
	}
	check_object_size(addr, bytes, is_source);
	return true;
}

static inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((always_inline))
unsigned long __attribute__((warn_unused_result)) copy_from_user(void *to, const void *from, unsigned long n)
{
	if (__builtin_expect(!!(check_copy_size(to, n, false)), 1))
		n = _copy_from_user(to, from, n);
	return n;
}

static inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((always_inline))
unsigned long __attribute__((warn_unused_result)) copy_to_user(void *to, const void *from, unsigned long n)
{
	if (__builtin_expect(!!(check_copy_size(from, n, true)), 1))
		n = _copy_to_user(to, from, n);
	return n;
}
```
The __bad_copy_* functions don't actually exist, so we get a linking error if those code paths aren't removed. The GVN patch combines the llvm.is.constant() check that's used by both of the copy_*_user functions. It *should* resolve to false, which would cause it to execute copy_overflow and DCE the paths with the __bad_* calls. All of that seems okay; while I'm not particularly sure that GVN should care about is.constant calls, it's probably innocuous.
So once we get to jump threading, there's a call to llvm.is.constant that's the conditional of two br instructions (one for each copy_*_user call) while it's threading the blocks in this area. See %5 below:
```llvm
define i32 @do_tcp_getsockopt(%struct.sock* %sk, i32 %level, i32 %optname, i8* %optval, i32* %optlen) {
...
if.end18:                                         ; preds = %if.then11, %if.end7
  %len.0 = phi i32 [ 24, %if.then11 ], [ %conv, %if.end7 ]
  %conv19 = sext i32 %len.0 to i64
  %cmp3.i.i70 = icmp ugt i32 %len.0, 24
  %5 = call i1 @llvm.is.constant.i64(i64 %conv19) #4
  br i1 %cmp3.i.i70, label %if.then.i.i71, label %if.end12.i.i74

if.then.i.i71:                                    ; preds = %if.end18
  br i1 %5, label %if.else.i.i73, label %if.then7.i.i72

if.then7.i.i72:                                   ; preds = %if.then.i.i71
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_from_user.exit

if.else.i.i73:                                    ; preds = %if.then.i.i71
  call void @__bad_copy_to() #9
  br label %copy_from_user.exit

...
if.then.i.i:                                      ; preds = %zerocopy_rcv_out
  br i1 %5, label %if.else.i.i, label %if.then7.i.i

if.then7.i.i:                                     ; preds = %if.then.i.i
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_to_user.exit

if.else.i.i:                                      ; preds = %if.then.i.i
  call void @__bad_copy_from() #9
  br label %copy_to_user.exit
```
After jump threading, this is converted to the following code. See %5 and %8:
```llvm
define i32 @do_tcp_getsockopt(%struct.sock* %sk, i32 %level, i32 %optname, i8* %optval, i32* %optlen) {
...
if.end18:                                         ; preds = %if.then11, %if.end7
  %len.0 = phi i32 [ 24, %if.then11 ], [ %conv, %if.end7 ]
  %conv19 = sext i32 %len.0 to i64
  %cmp3.i.i70 = icmp ugt i32 %len.0, 24
  %5 = call i1 @llvm.is.constant.i64(i64 %conv19) #4
  br i1 %cmp3.i.i70, label %if.then.i.i71, label %if.end12.i.i74

if.then.i.i71:                                    ; preds = %if.end18
  br i1 %5, label %if.else.i.i73, label %if.then7.i.i72

if.then7.i.i72:                                   ; preds = %if.then.i.i71
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv19) #9
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #4
  br label %copy_from_user.exit

if.end12.i.i74:                                   ; preds = %if.end18
  %6 = phi i1 [ %5, %if.end18 ]
  %cmp3.i.i7011 = phi i1 [ %cmp3.i.i70, %if.end18 ]
  %conv196 = phi i64 [ %conv19, %if.end18 ]
  %len.05 = phi i32 [ %len.0, %if.end18 ]
  br i1 %6, label %if.then.i77, label %if.then.i.i.i75

if.then.i.i.i75:                                  ; preds = %if.end12.i.i74
  call void @__check_object_size(i8* nonnull %0, i64 %conv196, i1 zeroext false)
  br label %if.then.i77

if.then.i77:                                      ; preds = %if.then11, %if.then.i.i.i75, %if.end12.i.i74
  %len.0516 = phi i32 [ %len.05, %if.then.i.i.i75 ], [ %len.05, %if.end12.i.i74 ], [ 24, %if.then11 ]
  %cmp3.i.i701115 = phi i1 [ %cmp3.i.i7011, %if.then.i.i.i75 ], [ %cmp3.i.i7011, %if.end12.i.i74 ], [ false, %if.then11 ]
  %7 = phi i1 [ %6, %if.then.i.i.i75 ], [ %6, %if.end12.i.i74 ], [ true, %if.then11 ]
  %conv197 = phi i64 [ %conv196, %if.then.i.i.i75 ], [ %conv196, %if.end12.i.i74 ], [ 24, %if.then11 ]
  %call2.i76 = call i64 @_copy_from_user(i8* nonnull %0, i8* %optval, i64 %conv197)
  br label %copy_from_user.exit

copy_from_user.exit:                              ; preds = %if.then.i77, %if.else.i.i73, %if.then7.i.i72
  %8 = phi i1 [ %7, %if.then.i77 ], [ %5, %if.then7.i.i72 ], [ %5, %if.else.i.i73 ]

...
if.then.i.i:                                      ; preds = %zerocopy_rcv_out
  br i1 %8, label %if.else.i.i, label %if.then7.i.i

if.then7.i.i:                                     ; preds = %if.then.i.i
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str, i64 0, i64 0), i32 24, i64 %conv198)
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.1, i64 0, i64 0), i32 127, i32 2305, i64 12) #2
  br label %copy_to_user.exit

if.else.i.i:                                      ; preds = %if.then.i.i
  call void @__bad_copy_from()
  br label %copy_to_user.exit
```
So it's unnecessarily complicated and the __bad_copy_from isn't being DCE'd as it should be.
What do you mean by "that"? This *does* fix the issue. Unless you can see another issue that I can't.
Previous comment implies there are DCE failures being exposed by this:
I'm wondering why we should be papering over those DCE deficiencies
by trying to avoid performing folds that end up exposing them
instead of fixing DCE deficiencies.
@void From your example, I still don't really get what the problem is / why this has to be done in JumpThreading in particular. The %6 phi that gets inserted here looks harmless to me -- the phi is trivial and will get eliminated by any simplification pass later in the pipeline (though I'm not entirely sure why a single-arg phi gets created in the first place).
What you seem to be really doing here is to move the is.constant lowering from the LowerConstantIntrinsics pass into the JumpThreading pass. It sounds to me like some kind of phase ordering issue where LowerConstantIntrinsics runs too late for the necessary DCE to be performed. Is that right? What does the IR look like at the point where the lowering is (currently) performed?
[@lebedev.ri this is in response to your question as well.]
Here's the relevant IR right before lower constant intrinsics (in the code snippets, look for calls to __bad_copy_*, those are the ones we want to get rid of):
```llvm
if.then607:                                       ; preds = %if.end603
  %176 = call i32 asm sideeffect "call __put_user_4", "={ax},0,{cx},~{ebx},~{dirflag},~{fpsr},~{flags}"(i32 24, i32* %optlen) #9, !dbg !24792, !srcloc !24793
  %tobool613 = icmp eq i32 %176, 0, !dbg !24794
  br i1 %tobool613, label %if.then.i188, label %cleanup644, !dbg !24795, !prof !15173, !misexpect !14134

if.end616:                                        ; preds = %if.end603
  %sext220 = shl i64 %asmresult590, 32, !dbg !24796
  %conv617 = ashr exact i64 %sext220, 32, !dbg !24796
  %cmp3.i.i181 = icmp ugt i32 %conv592, 24, !dbg !24802
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617) #9, !dbg !24800
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185, !dbg !24803, !prof !14133, !misexpect !14134

if.then.i.i182:                                   ; preds = %if.end616
  br i1 %177, label %if.else.i.i184, label %if.then7.i.i183, !dbg !24804, !prof !15547

if.then7.i.i183:                                  ; preds = %if.then.i.i182
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617) #11, !dbg !24808
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #9, !dbg !24809, !srcloc !24342
  br label %copy_from_user.exit, !dbg !24810

if.else.i.i184:                                   ; preds = %if.then.i.i182
  call void @__bad_copy_to() #11, !dbg !24811
  br label %copy_from_user.exit

if.end12.i.i185:                                  ; preds = %if.end616
  br i1 %177, label %if.then.i188, label %if.then.i.i.i186, !dbg !24814

if.then.i.i.i186:                                 ; preds = %if.end12.i.i185
  call void @__check_object_size(i8* nonnull %172, i64 %conv617, i1 zeroext false) #11, !dbg !24815
  br label %if.then.i188, !dbg !24815

if.then.i188:                                     ; preds = %if.then607, %if.then.i.i.i186, %if.end12.i.i185
  %len.0208219 = phi i32 [ %conv592, %if.then.i.i.i186 ], [ %conv592, %if.end12.i.i185 ], [ 24, %if.then607 ]
  %conv617210218 = phi i64 [ %conv617, %if.then.i.i.i186 ], [ %conv617, %if.end12.i.i185 ], [ 24, %if.then607 ]
  %178 = phi i1 [ false, %if.then.i.i.i186 ], [ true, %if.end12.i.i185 ], [ true, %if.then607 ]
  %call2.i187 = call i64 @_copy_from_user(i8* nonnull %172, i8* %optval, i64 %conv617210218) #11, !dbg !24816
  br label %copy_from_user.exit, !dbg !24817

copy_from_user.exit:                              ; preds = %if.then7.i.i183, %if.else.i.i184, %if.then.i188
  %179 = phi i1 [ %178, %if.then.i188 ], [ false, %if.then7.i.i183 ], [ true, %if.else.i.i184 ]
  %cmp3.i.i181212 = phi i1 [ false, %if.then.i188 ], [ true, %if.then7.i.i183 ], [ true, %if.else.i.i184 ]
  %conv617209 = phi i64 [ %conv617210218, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  %len.0207 = phi i32 [ %len.0208219, %if.then.i188 ], [ %conv592, %if.then7.i.i183 ], [ %conv592, %if.else.i.i184 ]
  %n.addr.0.i189 = phi i64 [ %call2.i187, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  %tobool619 = icmp eq i64 %n.addr.0.i189, 0, !dbg !24818
  br i1 %tobool619, label %if.end621, label %cleanup644, !dbg !24819

if.end621:                                        ; preds = %copy_from_user.exit
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #11, !dbg !24822
  %call622 = call fastcc i32 @tcp_zerocopy_receive(%struct.sock* %sk, %struct.tcp_zerocopy_receive* nonnull %zc) #10, !dbg !24823
  %sk_err.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 51, !dbg !24833
  %180 = load i32, i32* %sk_err.i, align 8, !dbg !24833
  %tobool.i190 = icmp eq i32 %180, 0, !dbg !24833
  br i1 %tobool.i190, label %sock_error.exit, label %if.end.i, !dbg !24835, !prof !15173, !misexpect !15389

if.end.i:                                         ; preds = %if.then632
  %181 = call i32 asm sideeffect "xchgl $0, $1\0A", "=r,=*m,0,*m,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(i32* %sk_err.i, i32 0, i32* %sk_err.i) #9, !dbg !24837, !srcloc !15261
  %sub.i = sub i32 0, %181, !dbg !24838
  br label %sock_error.exit, !dbg !24839

sock_error.exit:                                  ; preds = %if.then632, %if.end.i
  %retval.0.i = phi i32 [ %sub.i, %if.end.i ], [ 0, %if.then632 ], !dbg !24831
  %err634 = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 4, !dbg !24840
  store i32 %retval.0.i, i32* %err634, align 4, !dbg !24841
  br label %zerocopy_rcv_inq, !dbg !24842

zerocopy_rcv_inq:                                 ; preds = %if.end621, %zerocopy_rcv_sk_err, %sock_error.exit
  %182 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 3, !dbg !24846
  %183 = load volatile i32, i32* %182, align 4, !dbg !24849
  %184 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 2, !dbg !24851
  %185 = load volatile i32, i32* %184, align 4, !dbg !24854
  %sub.i191 = sub i32 %185, %183, !dbg !24856
  %cmp.i192 = icmp slt i32 %sub.i191, 0, !dbg !24857
  br i1 %cmp.i192, label %if.then.i193, label %lor.rhs.i, !dbg !24857, !prof !14133

lor.rhs.i:                                        ; preds = %zerocopy_rcv_inq
  %186 = load volatile i32, i32* %182, align 4, !dbg !24860
  %cmp20.i = icmp eq i32 %183, %186, !dbg !24857
  br i1 %cmp20.i, label %if.end.i194, label %if.then.i193, !dbg !24862, !prof !15173, !misexpect !14134

if.then.i193:                                     ; preds = %lor.rhs.i, %zerocopy_rcv_inq
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #11, !dbg !24865
  %187 = load i32, i32* %184, align 8, !dbg !24866
  %188 = load i32, i32* %182, align 4, !dbg !24867
  %sub24.i = sub i32 %187, %188, !dbg !24868
  call void @release_sock(%struct.sock* %sk) #11, !dbg !24869
  br label %if.end.i194, !dbg !24870

if.end.i194:                                      ; preds = %if.then.i193, %lor.rhs.i
  %inq.0.i = phi i32 [ %sub24.i, %if.then.i193 ], [ %sub.i191, %lor.rhs.i ], !dbg !24844
  %cmp25.i = icmp eq i32 %inq.0.i, 0, !dbg !24871
  br i1 %cmp25.i, label %land.lhs.true.i, label %tcp_inq_hint.exit, !dbg !24872

land.lhs.true.i:                                  ; preds = %if.end.i194
  %skc_flags.i.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 0, i32 15, i32 0, !dbg !24875
  %189 = load volatile i64, i64* %skc_flags.i.i, align 8, !dbg !24878
  %and1.i.i.i = lshr i64 %189, 1, !dbg !24879
  %190 = trunc i64 %and1.i.i.i to i32, !dbg !24879
  %191 = and i32 %190, 1, !dbg !24879
  br label %tcp_inq_hint.exit, !dbg !24879

tcp_inq_hint.exit:                                ; preds = %if.end.i194, %land.lhs.true.i
  %call636195 = phi i32 [ %191, %land.lhs.true.i ], [ %inq.0.i, %if.end.i194 ]
  %inq = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 3, !dbg !24880
  store i32 %call636195, i32* %inq, align 8, !dbg !24881
  br label %zerocopy_rcv_out, !dbg !24882

zerocopy_rcv_out:                                 ; preds = %if.end621, %tcp_inq_hint.exit
  call void @llvm.dbg.label(metadata !24079), !dbg !24883
  %tobool637 = icmp eq i32 %call622, 0, !dbg !24884
  br i1 %tobool637, label %land.lhs.true638, label %cleanup644, !dbg !24886

land.lhs.true638:                                 ; preds = %zerocopy_rcv_out
  br i1 %cmp3.i.i181212, label %if.then.i.i39, label %if.end12.i.i42, !dbg !24891, !prof !14133, !misexpect !14134

if.then.i.i39:                                    ; preds = %land.lhs.true638
  br i1 %179, label %if.else.i.i41, label %if.then7.i.i40, !dbg !24892, !prof !15547

if.then7.i.i40:                                   ; preds = %if.then.i.i39
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617209) #11, !dbg !24896
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #9, !dbg !24897, !srcloc !24342
  br label %copy_to_user.exit47, !dbg !24898

if.else.i.i41:                                    ; preds = %if.then.i.i39
  call void @__bad_copy_from() #11, !dbg !24899
  br label %copy_to_user.exit47, !dbg !24899
```
The code paths in question are:
- %177 (llvm.is.constant) determines if __bad_copy_to() is executed. It should resolve to "false", so __bad_copy_to() is DCE'd.
- %179 (PHI node) determines if __bad_copy_from() is executed.
  - %179 is a PHI of %178 (from %if.then.i188), false (from %if.then7.i.i183), and true (from %if.else.i.i184)
  - %178 is also a PHI of false (from %if.then.i.i.i186), true (from %if.end12.i.i185), and true (from %if.then607)
Because %179 doesn't resolve to false, the __bad_copy_from() call is kept around. So the question is "why are some of these values true?"
This is after jump threading:
```llvm
if.end616:                                        ; preds = %if.end603
  %len.0 = phi i32 [ %conv592, %if.end603 ], !dbg !25087
  %conv617 = sext i32 %len.0 to i64, !dbg !25088
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24, !dbg !25094
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617) #14, !dbg !25092
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185, !dbg !25095, !prof !14149, !misexpect !14150

if.then.i.i182:                                   ; preds = %if.end616
  br i1 %177, label %if.else.i.i184, label %if.then7.i.i183, !dbg !25096, !prof !15638

if.then7.i.i183:                                  ; preds = %if.then.i.i182
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617) #16, !dbg !25100
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #14, !dbg !25101, !srcloc !24631
  br label %copy_from_user.exit, !dbg !25102

if.else.i.i184:                                   ; preds = %if.then.i.i182
  call void @__bad_copy_to() #16, !dbg !25103
  br label %copy_from_user.exit

if.end12.i.i185:                                  ; preds = %if.end616
  %178 = phi i1 [ %177, %if.end616 ]
  %cmp3.i.i181213 = phi i1 [ %cmp3.i.i181, %if.end616 ]
  %conv617210 = phi i64 [ %conv617, %if.end616 ]
  %len.0208 = phi i32 [ %len.0, %if.end616 ]
  br i1 %178, label %if.then.i188, label %if.then.i.i.i186, !dbg !25106

if.then.i.i.i186:                                 ; preds = %if.end12.i.i185
  call void @__check_object_size(i8* nonnull %172, i64 %conv617210, i1 zeroext false) #16, !dbg !25107
  br label %if.then.i188, !dbg !25107

if.then.i188:                                     ; preds = %if.then607, %if.then.i.i.i186, %if.end12.i.i185
  %len.0208219 = phi i32 [ %len.0208, %if.then.i.i.i186 ], [ %len.0208, %if.end12.i.i185 ], [ 24, %if.then607 ]
  %conv617210218 = phi i64 [ %conv617210, %if.then.i.i.i186 ], [ %conv617210, %if.end12.i.i185 ], [ 24, %if.then607 ]
  %cmp3.i.i181213217 = phi i1 [ %cmp3.i.i181213, %if.then.i.i.i186 ], [ %cmp3.i.i181213, %if.end12.i.i185 ], [ false, %if.then607 ]
  %179 = phi i1 [ %178, %if.then.i.i.i186 ], [ %178, %if.end12.i.i185 ], [ true, %if.then607 ]
  %call2.i187 = call i64 @_copy_from_user(i8* nonnull %172, i8* %optval, i64 %conv617210218) #16, !dbg !25108
  call void @llvm.dbg.value(metadata i64 %call2.i187, metadata !22165, metadata !DIExpression()) #14, !dbg !25090
  br label %copy_from_user.exit, !dbg !25109

copy_from_user.exit:                              ; preds = %if.then7.i.i183, %if.else.i.i184, %if.then.i188
  %180 = phi i1 [ %179, %if.then.i188 ], [ %177, %if.then7.i.i183 ], [ %177, %if.else.i.i184 ]
  %cmp3.i.i181212 = phi i1 [ %cmp3.i.i181213217, %if.then.i188 ], [ %cmp3.i.i181, %if.then7.i.i183 ], [ %cmp3.i.i181, %if.else.i.i184 ]
  %conv617209 = phi i64 [ %conv617210218, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  %len.0207 = phi i32 [ %len.0208219, %if.then.i188 ], [ %len.0, %if.then7.i.i183 ], [ %len.0, %if.else.i.i184 ]
  %n.addr.0.i189 = phi i64 [ %call2.i187, %if.then.i188 ], [ %conv617, %if.then7.i.i183 ], [ %conv617, %if.else.i.i184 ]
  call void @llvm.dbg.value(metadata i64 %n.addr.0.i189, metadata !22165, metadata !DIExpression()) #14, !dbg !25090
  %tobool619 = icmp eq i64 %n.addr.0.i189, 0, !dbg !25110
  br i1 %tobool619, label %if.end621, label %cleanup644, !dbg !25111

if.end621:                                        ; preds = %copy_from_user.exit
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #16, !dbg !25114
  %call622 = call fastcc i32 @tcp_zerocopy_receive(%struct.sock* %sk, %struct.tcp_zerocopy_receive* nonnull %zc) #15, !dbg !25115
  call void @release_sock(%struct.sock* %sk) #16, !dbg !25116
  switch i32 %len.0207, label %zerocopy_rcv_out [
    i32 24, label %zerocopy_rcv_sk_err
    i32 20, label %zerocopy_rcv_inq
  ], !dbg !25117

zerocopy_rcv_sk_err:                              ; preds = %if.end621
  call void @llvm.dbg.label(metadata !24366), !dbg !25118
  %tobool631 = icmp eq i32 %call622, 0, !dbg !25119
  br i1 %tobool631, label %if.then632, label %zerocopy_rcv_inq, !dbg !25121

if.then632:                                       ; preds = %zerocopy_rcv_sk_err
  %sk_err.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 51, !dbg !25124
  %181 = load i32, i32* %sk_err.i, align 8, !dbg !25124
  %tobool.i190 = icmp eq i32 %181, 0, !dbg !25124
  br i1 %tobool.i190, label %sock_error.exit, label %if.end.i, !dbg !25126, !prof !15210, !misexpect !15480

if.end.i:                                         ; preds = %if.then632
  %182 = call i32 asm sideeffect "xchgl $0, $1\0A", "=r,=*m,0,*m,~{memory},~{cc},~{dirflag},~{fpsr},~{flags}"(i32* %sk_err.i, i32 0, i32* %sk_err.i) #14, !dbg !25128, !srcloc !15298
  %sub.i = sub i32 0, %182, !dbg !25129
  br label %sock_error.exit, !dbg !25130

sock_error.exit:                                  ; preds = %if.then632, %if.end.i
  %retval.0.i = phi i32 [ %sub.i, %if.end.i ], [ 0, %if.then632 ], !dbg !25122
  %err634 = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 4, !dbg !25131
  store i32 %retval.0.i, i32* %err634, align 4, !dbg !25132
  br label %zerocopy_rcv_inq, !dbg !25133

zerocopy_rcv_inq:                                 ; preds = %if.end621, %zerocopy_rcv_sk_err, %sock_error.exit
  %183 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 3, !dbg !25137
  %184 = load volatile i32, i32* %183, align 4, !dbg !25140
  %185 = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 2, i32 0, i32 12, i32 0, i32 0, i64 2, !dbg !25142
  %186 = load volatile i32, i32* %185, align 4, !dbg !25145
  %sub.i191 = sub i32 %186, %184, !dbg !25147
  %cmp.i192 = icmp slt i32 %sub.i191, 0, !dbg !25148
  br i1 %cmp.i192, label %if.then.i193, label %lor.rhs.i, !dbg !25148, !prof !14149

lor.rhs.i:                                        ; preds = %zerocopy_rcv_inq
  %187 = load volatile i32, i32* %183, align 4, !dbg !25151
  %cmp20.i = icmp eq i32 %184, %187, !dbg !25148
  br i1 %cmp20.i, label %if.end.i194, label %if.then.i193, !dbg !25153, !prof !15210, !misexpect !14150

if.then.i193:                                     ; preds = %lor.rhs.i, %zerocopy_rcv_inq
  call void @lock_sock_nested(%struct.sock* %sk, i32 0) #16, !dbg !25156
  %188 = load i32, i32* %185, align 8, !dbg !25157
  %189 = load i32, i32* %183, align 4, !dbg !25158
  %sub24.i = sub i32 %188, %189, !dbg !25159
  call void @release_sock(%struct.sock* %sk) #16, !dbg !25160
  br label %if.end.i194, !dbg !25161

if.end.i194:                                      ; preds = %if.then.i193, %lor.rhs.i
  %inq.0.i = phi i32 [ %sub24.i, %if.then.i193 ], [ %sub.i191, %lor.rhs.i ], !dbg !25135
  %cmp25.i = icmp eq i32 %inq.0.i, 0, !dbg !25162
  br i1 %cmp25.i, label %land.lhs.true.i, label %tcp_inq_hint.exit, !dbg !25163

land.lhs.true.i:                                  ; preds = %if.end.i194
  %skc_flags.i.i = getelementptr inbounds %struct.sock, %struct.sock* %sk, i64 0, i32 0, i32 15, i32 0, !dbg !25166
  %190 = load volatile i64, i64* %skc_flags.i.i, align 8, !dbg !25169
  %and1.i.i.i = lshr i64 %190, 1, !dbg !25170
  %191 = trunc i64 %and1.i.i.i to i32, !dbg !25170
  %192 = and i32 %191, 1, !dbg !25170
  br label %tcp_inq_hint.exit, !dbg !25170

tcp_inq_hint.exit:                                ; preds = %if.end.i194, %land.lhs.true.i
  %call636195 = phi i32 [ %192, %land.lhs.true.i ], [ %inq.0.i, %if.end.i194 ]
  %inq = getelementptr inbounds %struct.tcp_zerocopy_receive, %struct.tcp_zerocopy_receive* %zc, i64 0, i32 3, !dbg !25171
  store i32 %call636195, i32* %inq, align 8, !dbg !25172
  br label %zerocopy_rcv_out, !dbg !25173

zerocopy_rcv_out:                                 ; preds = %if.end621, %tcp_inq_hint.exit
  %tobool637 = icmp eq i32 %call622, 0, !dbg !25175
  br i1 %tobool637, label %land.lhs.true638, label %cleanup644, !dbg !25177

land.lhs.true638:                                 ; preds = %zerocopy_rcv_out
  br i1 %cmp3.i.i181212, label %if.then.i.i39, label %if.end12.i.i42, !dbg !25182, !prof !14149, !misexpect !14150

if.then.i.i39:                                    ; preds = %land.lhs.true638
  br i1 %180, label %if.else.i.i41, label %if.then7.i.i40, !dbg !25183, !prof !15638

if.then7.i.i40:                                   ; preds = %if.then.i.i39
  call void (i8*, ...) @__warn_printk(i8* getelementptr inbounds ([38 x i8], [38 x i8]* @.str.16, i64 0, i64 0), i32 24, i64 %conv617209) #16, !dbg !25187
  call void asm sideeffect "...", "i,i,i,i,~{dirflag},~{fpsr},~{flags}"(i8* getelementptr inbounds ([30 x i8], [30 x i8]* @.str.17, i64 0, i64 0), i32 127, i32 2305, i64 12) #14, !dbg !25188, !srcloc !24631
  br label %copy_to_user.exit47, !dbg !25189

if.else.i.i41:                                    ; preds = %if.then.i.i39
  call void @__bad_copy_from() #16, !dbg !25190
  br label %copy_to_user.exit47, !dbg !25190
```
Here, %177 feeds into %178, which is a PHI node. That feeds into %179, which includes it twice and another value that's true from another block. So this is the start of the badness.
It turns out that during jump threading, the if.end616 block has a constant and a non-constant value coming into it, which are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and a non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:
```llvm
if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185
```
In reality, we want the original llvm.is.constant to return false. But %177 is true while %178 is false, so the PHI nodes that have %177 and %178 as incoming values don't resolve to false, and therefore we don't get the DCE effect.
The patch here solves this by resolving the llvm.is.constant before we get to that stage. Perhaps it should be done earlier or the logic that allows a block to be threaded should check for a situation where a PHI's value feeds into an llvm.is.constant intrinsic?
Alternative patch. This prevents threading blocks where a PHI node has a constant and is used by an is.constant intrinsic.
Are you sure that is the correct snippet? Because if it is, I'm completely lost.
It turns out that during jump threading, the if.end616 block has a constant and non-constant value coming into it and are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:
```llvm
if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185
```
I'm not seeing @if.end603 and @if.end616.thread in the "This is after jump threading:" snippet.
Yes. I just elided some parts that aren't super relevant. I can supply a preprocessed file though.
It turns out that during jump threading, the if.end616 block has a constant and non-constant value coming into it and are combined in a PHI node. This is why the block is duplicated (a constant block if.end616.thread and non-constant if.end616). But this messes with the llvm.is.constant intrinsic, which looks at that value:
```llvm
if.end616.thread:
  %len.0204 = phi i32 [ 24, %if.end603 ]
  %conv617205 = sext i32 %len.0204 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0204, 24
  %177 = call i1 @llvm.is.constant.i64(i64 %conv617205)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185

if.end616:
  %len.0 = phi i32 [ %conv592, %if.end603 ]
  %conv617 = sext i32 %len.0 to i64
  %cmp3.i.i181 = icmp ugt i32 %len.0, 24
  %178 = call i1 @llvm.is.constant.i64(i64 %conv617)
  br i1 %cmp3.i.i181, label %if.then.i.i182, label %if.end12.i.i185
```

I'm not seeing @if.end603 and @if.end616.thread, in This is after jump threading: snippet.
The @if.end616.thread that's relevant is in the last code block above. I mistakenly omitted it from the previous example.
I'm confused by that example. If jump threading makes it possible to answer true, it is no different from inlining and other use cases. The reverse is the problematic case, turning constant cases into non-constant cases?
No, inlining is fine, because it'll create a PHI node with the two values only if they can be combined into one. The value here shouldn't be considered constant. In a nutshell, jump threading is taking a PHI node and splitting it apart if one of the incoming values is a constant. This is supposedly fine in the general case, but doesn't work with the semantics of llvm.is.constant.
It still seems to be a perfectly reasonable optimisation under the semantics of is.constant. The code here seems to be an abuse of it though and makes assumptions that are not sensible.
The code is abusi^Wusing how __builtin_constant_p() works in gcc. It's honored even after inlining. But while that's icky, what's going on here is not quite that.
```c
int val;

if (get_user(len, optlen))
	return -EFAULT;

len = min_t(unsigned int, len, sizeof(int));

if (len < 0)
	return -EINVAL;

switch (optname) {
...
case TCP_ZEROCOPY_RECEIVE: {
	struct tcp_zerocopy_receive zc;
	int err;

	if (get_user(len, optlen))
		return -EFAULT;
	if (len < offsetofend(struct tcp_zerocopy_receive, length))
		return -EINVAL;
	if (len > sizeof(zc)) {
		len = sizeof(zc);
		if (put_user(len, optlen))
			return -EFAULT;
	}
	if (copy_from_user(&zc, optval, len))
		return -EFAULT;
	lock_sock(sk);
	err = tcp_zerocopy_receive(sk, &zc);
	release_sock(sk);
	if (len == sizeof(zc))
		goto zerocopy_rcv_sk_err;
	switch (len) {
	case offsetofend(struct tcp_zerocopy_receive, err):
		goto zerocopy_rcv_sk_err;
	case offsetofend(struct tcp_zerocopy_receive, inq):
		goto zerocopy_rcv_inq;
	case offsetofend(struct tcp_zerocopy_receive, length):
	default:
		goto zerocopy_rcv_out;
	}
zerocopy_rcv_sk_err:
	if (!err)
		zc.err = sock_error(sk);
zerocopy_rcv_inq:
	zc.inq = tcp_inq_hint(sk);
zerocopy_rcv_out:
	if (!err && copy_to_user(optval, &zc, len))
		err = -EFAULT;
	return err;
}
```
The len variable is assigned in several places here. In only one place does it get a constant value: the len = sizeof(zc) line. The copy_from_user and copy_to_user are inlined. They call a function (also inlined) to check the object size for overwrites:
```c
_Bool check_copy_size(const void *addr, unsigned long bytes, _Bool is_source)
{
	int sz = __builtin_object_size(addr, 0);
	if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
		if (!__builtin_constant_p(bytes))
			copy_overflow(sz, bytes);
		else if (is_source)
			__bad_copy_from();
		else
			__bad_copy_to();
		return false;
	}
	check_object_size(addr, bytes, is_source);
	return true;
}
```
The __builtin_constant_p(bytes) needs to return false, otherwise we'll get a link error because the __bad_copy_* functions don't exist. (Note: bytes is len here.)
Now, because len doesn't change between the copy_from_user and copy_to_user calls, the check of whether it's constant is combined. Then jump threading comes along and essentially converts that code into one where the if (len == sizeof(zc)) statement is split into two paths: one with len constant and the other with it non-constant. Those two combine into a PHI node (actually multiple PHI nodes), and that's what's used within the copy_to_user call. The copy_from_user call seems content to use the non-constant path (probably because of interaction between inlining and jump threading, because removing some of the calls between the two copy_*_user calls gets rid of the problem).
Threading through the if (len == sizeof(zc)) statement would be fine if it weren't for the fact that programmers rely upon is.constant to be false when not all paths to it have constant values for the operand.
Can you post standalone example on godbolt that shows this link failure without this patch please?
I'm not sure what that will achieve. It would include all of the Linux kernel as part of the linking process, and it's only one file (the one I reduced the testcase from) that has the __bad_copy_from() reference still in it. Perhaps I'm missing something?
I personally still don't understand *why* after this transform we fail to DCE. Is this a pass ordering issue? A heuristics issue?
I gave a fairly lengthy explanation above. The test case is also instructive and, to tell the truth, makes it much easier to see what's happening.
It's not about pass ordering. It's about how jump threading splits a PHI node into two parts: one with a constant value and one with a non-constant value. That's fine in normal situations, but the is.constant intrinsic is not normal, at least not in the way most instructions are. It's meant to determine whether the variable, *at the point the intrinsic is called*, is constant or non-constant. If a PHI node combines two values, one constant and one non-constant, it's not appropriate to split the PHI apart into constant and non-constant paths: code that would normally be DCE'd (like the call to __bad_copy_from) no longer is, because there's now a path where is.constant is true, which keeps the call to __bad_copy_from() around.
I'm not really happy about imposing optimization "rule", which we have to follow for correctness, without some explanation in LangRef. Hacking a specific pass without a general rule is just going to make things more messy in the future; some other pass will eventually trip over this, and we won't have any reference for how it should be fixed.
Really, to preserve the semantics the kernel wants in general, we need some notion of a "region": when we have an if statement with a builtin_constant_p condition, the body of that if statement is a region with special semantics. If we resolve a builtin_constant_p, every use of a variable used as an input to the __builtin_constant_p must be constant-folded inside that region. We can optimize within the region, we can clone the whole region, but we can't clone the entry point of the region without cloning the whole thing, and we can't merge parts of the region without merging the whole thing.
I'm not sure how to actually represent that in LLVM IR in a way that can be uniformly queried by optimizations in a reasonable way. I guess marking llvm.is.constant "convergent" partially solves the issue, but we still might have problems with optimizations that merge code.
I was discussing this with Bill off-list to get a better idea of the original test case. Basically, with the new constant intrinsic pass, we can provide a stronger semantic: the default LLVM pass chain always constant-folds expressions involving is.constant and DCEs the trivially dead branches resulting from that folding. This means that if the original expression is adjusted from:
```c
if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
	if (!__builtin_constant_p(bytes))
		...
}
```
to the functionally equivalent:
```c
if (!(__builtin_constant_p(sz) && __builtin_constant_p(bytes) &&
      sz >= 0 && sz >= bytes) &&
    __builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
	if (!__builtin_constant_p(bytes))
		...
}
```
then we can actually ensure proper DCE even with -O0 in the default pass chain. Overall, that depends on two properties:
- If __builtin_constant_p is constant-folded by clang, it must DCE the appropriate places.
- The constant intrinsic pass is run.
With -O1 or higher, this should work in general.
This isn't about imposing a new "rule" on optimizations. It's preventing an optimization from changing the semantics of the program. At the point where jump threading decides to clone the region and make one edge constant and the other non-constant, it's violating llvm.is.constant's semantics. I'll make this more explicit in the language reference. But the upshot is that this isn't a hack, because of how llvm.is.constant is meant to work (i.e. to mimic builtin_constant_p). I don't think we should try to optimize around llvm.is.constant beyond DCE, which is really the main purpose of builtin_constant_p.
IMO the root of the problem here is that the branches are mixing predicated variables (bytes) with non-predicated variables (sz). If users want deterministic behavior here, that shouldn't be done.
I don't understand. The builtin_expect() call is only a suggestion to the compiler that a branch may or may not be taken. I don't see why this isn't allowed:
```c
if (sz >= 0 && sz < bytes) {
	if (!__builtin_constant_p(bytes))
		...
}
```
Yes, builtin_expect is just noise here. The point I wanted to make is that both sz and bytes should be used in builtin_constant_p in the other condition.
The original code has a functional dependency between sz and bytes and whether they can be constant evaluated. But the code doesn't express that. I don't think we can enforce that in any sensible way. There are valid use cases after all where partial inlining would result in entirely sensible decisions, just think about the more typical case of __builtin_constant_p selecting between inline asm taking immediate operands and one taking register/memory operands. That's why I am saying that I consider it a lot more useful to provide reliable building blocks to express the dependency and make sure they work.
[Adding the code snippets below for reference.]
Okay I think I get what you're saying now. If both sz and bytes are constant, then we can evaluate the if conditional and DCE properly. If sz or bytes isn't constant then the if conditional can't be DCE'd and that's when we require bytes to be non-constant. (Note: We don't care about compiling at -O0. Linux as a whole can't compile without optimizations.) In your code, you check that explicitly in the if conditional. While Linux is being clever and letting the compiler do it for us.
I don't dispute that your suggested replacement is more readable and probably more amenable to the compiler, but I still think that what Linux is doing is okay.
[Edit]
Re supplying reliable building blocks. That's not something we can do here, and I'm not sure how that would manifest. The situation as I see it is this: what's been written is valid code. It could be written in a different way, but in the end it's something the compiler should handle just as if they wrote it in the way you suggested. Therefore, the optimization passes must not violate these semantics. If you're suggesting that this is something the compiler should supply, I still don't see how that would manifest unless it's something akin to Eli's region suggestion. Even then, it's a fundamental change to the compiler that doesn't help me at all. This is a blocking bug for us, and it's pernicious as it leads to unexpected code generation. Adding a fundamental change to the compiler for one instruction seems a bit heavy handed to me. Would you prefer adding the convergent attribute to the intrinsic?
Original code:

```c
_Bool check_copy_size(const void *addr, unsigned long bytes, _Bool is_source)
{
	int sz = __builtin_object_size(addr, 0);

	if (__builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
		if (!__builtin_constant_p(bytes))
			copy_overflow(sz, bytes);
		else if (is_source)
			__bad_copy_from();
		else
			__bad_copy_to();
		return false;
	}
	check_object_size(addr, bytes, is_source);
	return true;
}
```

Your suggested replacement:

```c
if (!(__builtin_constant_p(sz) && __builtin_constant_p(bytes) &&
      sz >= 0 && sz >= bytes) &&
    __builtin_expect(!!(sz >= 0 && sz < bytes), 0)) {
	if (!__builtin_constant_p(bytes))
		...
}
```
llvm/test/Transforms/JumpThreading/is_constant.ll | ||
---|---|---|
270 | Haven't read the code yet, but my feeling is that this test case really should be reduced. These metadata are not useful. |
llvm/test/Transforms/JumpThreading/is_constant.ll | ||
---|---|---|
270 | I can try, but it's pretty reduced as is. I think it triggers because of the amount of code around the issue. |
It looks like the testcase can be reduced to the following. You can check for no jump threading occurring:
```llvm
define void @foo(i32 %a, i32 %var) {
entry:
  %cond1 = icmp ugt i32 %a, 24
  br i1 %cond1, label %bb_constant, label %bb_cond

bb_constant:
  br label %bb_cond

bb_cond:
  %phi = phi i32 [ 24, %bb_constant ], [ %var, %entry ]
  %sext = sext i32 %phi to i64
  %cond2 = icmp ugt i64 %sext, 24
  %is_constant = call i1 @llvm.is.constant.i64(i64 %sext)
  br i1 %cond2, label %bb_then, label %bb_else

bb_then:
  unreachable

bb_else:
  unreachable
}

; Function Attrs: nounwind readnone willreturn
declare i1 @llvm.is.constant.i64(i64) #0

attributes #0 = { nounwind readnone willreturn }
```
That said, if I remove the sign extension like so, the edge entry->bb_constant gets threaded through bb_cond to bb_else even with your patch. IIUC, your patch does not catch a case where a PHI node is directly used by llvm.is_constant:
```llvm
define void @bar(i32 %a, i64 %var) {
entry:
  %cond1 = icmp ugt i32 %a, 24
  br i1 %cond1, label %bb_constant, label %bb_cond

bb_constant:
  br label %bb_cond

bb_cond:
  %sext = phi i64 [ 24, %bb_constant ], [ %var, %entry ]
  %cond2 = icmp ugt i64 %sext, 24
  %is_constant = call i1 @llvm.is.constant.i64(i64 %sext)
  br i1 %cond2, label %bb_then, label %bb_else

bb_then:
  unreachable

bb_else:
  unreachable
}

; Function Attrs: nounwind readnone willreturn
declare i1 @llvm.is.constant.i64(i64) #0

attributes #0 = { nounwind readnone willreturn }
```
By the way, MaybeThreadThroughTwoBasicBlocks in JumpThreading.cpp can thread an edge through two successive basic blocks these days. You need to teach MaybeThreadThroughTwoBasicBlocks to look out for a problematic llvm.is_constant as well.
FWIW, we catch convergent calls in a somewhat sneaky interface. Specifically, getJumpThreadDuplicationCost returns ~0U when it encounters an instruction that we cannot safely duplicate. You could teach getJumpThreadDuplicationCost to return ~0U on a problematic llvm.is_constant. This way, you don't have to perform the same check in both MaybeThreadThroughTwoBasicBlocks and TryThreadEdge.
Now, the convergent attribute seems to me like a perfect fit for llvm.is_constant as far as I can tell from its description, but I am not sure what side effects it would have on the rest of the compiler.
Using a convergent attribute accomplishes the same thing and doesn't run into the issue Eli pointed out, which is that we're creating special rules for passes.