Try to fix issue https://github.com/llvm/llvm-project/issues/62163
as (1 srem X) is not equivalent with zext (X != 1), pattern sdiv a (1 srem b) miss the optimization.
Differential D149001
[InstSimplify] sdiv a (1 srem b) --> a floatshadow on Apr 22 2023, 11:14 AM. Authored by
Details Try to fix issue https://github.com/llvm/llvm-project/issues/62163 as (1 srem X) is not equivalent with zext (X != 1), pattern sdiv a (1 srem b) miss the optimization.
Diff Detail Event TimelineComment Actions This is missing a lot of conjugate patterns. You can do the same for udiv and there are related patterns for urem and srem as well. You want to extend this code: https://github.com/llvm/llvm-project/blob/1eb74f7e83ffb3f9d00e5987cead3b12e00bbe82/llvm/lib/Analysis/InstructionSimplify.cpp#L1053-L1061 Comment Actions now this patch will figure out other conjugate patterns like [u]sdiv X (1 [u]srem Y) --> X and [u]srem X (1 [u]srem Y) --> 0
Comment Actions Thanks! This will still a miss other patterns that can only be zero or one. For example a mask with 1: https://alive2.llvm.org/ce/z/A_ffYe You can handle these by calling computeKnownBits() and checking the number of leading zeros. Comment Actions use computeKnownBits() to check if the divisor can only be zero or one. another question is can this method replace the original pattern match like match(Op1, m_One())? I worry about the performance of computeKnownBits(). Comment Actions @nikic I checked PR51762 test in Transforms/InstCombine/zext-or-icmp.ll %lor.ext = zext i1 %spec.select57 to i32 %t2 = load i32, ptr %d, align 4 %conv15 = sext i16 %t1 to i32 %cmp16 = icmp sge i32 %t2, %conv15 %conv17 = zext i1 %cmp16 to i32 %t3 = load i32, ptr %f, align 4 %add = add nsw i32 %t3, %conv17 store i32 %add, ptr %f, align 4 %rem18 = srem i32 %lor.ext, %add %conv19 = zext i32 %rem18 to i64 %div = udiv i64 %insert.insert41, %conv19 %trunc33 = trunc i64 %div to i32 store i32 %trunc33, ptr %d, align 8 %r = icmp ult i64 %insert.insert41, %conv19 call void @llvm.assume(i1 %r) as %cond19 = zext (srem (zext i1) %add), %cond can only be zero or one as for current patch, it seems udiv will first enter simplifyDivRem which do the fold, replace %udiv with %insert.insert41 (done by computeKnownBits). ADD DEFERRED: %insert.insert41 = or i64 %insert.shift52, %insert.ext39 IC: Mod = %trunc33 = trunc i64 %insert.insert41 to i32 New = %trunc33 = trunc i64 %insert.ext39 to i32 thus store i32 %trunc33, ptr %d, align 8 becomes store i32 %sroa38, ptr %d, align 8 I am not sure what I should do now, alive2 tells me that the transforms seems correct https://alive2.llvm.org/ce/z/NfVsmT Comment Actions It's okay to replace in this case. Division instructions are very rare, so this is not particularly performance sensitive.
You can just regenerate the test. It's a fuzzer generated test case, so we don't really care as long as it doesn't infinite loop / miscompile. Comment Actions address comments
Comment Actions This looks good to me, but it looks like you need to regenerate some clang OpenMP tests as well. Not sure why/where those run InstSimplify, but the failures look legitimate.
Comment Actions LGTM. If you don't have commit access, can you please share the Name <email> to use for attribution?
|
I'd feel better about a == here.