Try to fix issue https://github.com/llvm/llvm-project/issues/62163
as (1 srem X) is not equivalent with zext (X != 1), pattern sdiv a (1 srem b) miss the optimization.
Paths
| Differential D149001
[InstSimplify] sdiv a (1 srem b) --> a ClosedPublic Authored by floatshadow on Apr 22 2023, 11:14 AM.
Details Summary Try to fix issue https://github.com/llvm/llvm-project/issues/62163 as (1 srem X) is not equivalent with zext (X != 1), pattern sdiv a (1 srem b) miss the optimization.
Diff Detail
Event TimelineComment Actions This is missing a lot of conjugate patterns. You can do the same for udiv and there are related patterns for urem and srem as well. You want to extend this code: https://github.com/llvm/llvm-project/blob/1eb74f7e83ffb3f9d00e5987cead3b12e00bbe82/llvm/lib/Analysis/InstructionSimplify.cpp#L1053-L1061 This revision now requires changes to proceed.Apr 22 2023, 11:40 AM Comment Actions now this patch will figure out other conjugate patterns like [u]sdiv X (1 [u]srem Y) --> X and [u]srem X (1 [u]srem Y) --> 0 floatshadow edited parent revisions, added: D149012: [InstSimplify] Test case for D149001; removed: D149008: [InstSimplify] Test for D149001.Apr 23 2023, 12:35 AM Comment Actions Thanks! This will still a miss other patterns that can only be zero or one. For example a mask with 1: https://alive2.llvm.org/ce/z/A_ffYe You can handle these by calling computeKnownBits() and checking the number of leading zeros. This revision now requires changes to proceed.Apr 23 2023, 12:52 AM Comment Actions use computeKnownBits() to check if the divisor can only be zero or one. another question is can this method replace the original pattern match like match(Op1, m_One())? I worry about the performance of computeKnownBits(). Comment Actions @nikic I checked PR51762 test in Transforms/InstCombine/zext-or-icmp.ll %lor.ext = zext i1 %spec.select57 to i32 %t2 = load i32, ptr %d, align 4 %conv15 = sext i16 %t1 to i32 %cmp16 = icmp sge i32 %t2, %conv15 %conv17 = zext i1 %cmp16 to i32 %t3 = load i32, ptr %f, align 4 %add = add nsw i32 %t3, %conv17 store i32 %add, ptr %f, align 4 %rem18 = srem i32 %lor.ext, %add %conv19 = zext i32 %rem18 to i64 %div = udiv i64 %insert.insert41, %conv19 %trunc33 = trunc i64 %div to i32 store i32 %trunc33, ptr %d, align 8 %r = icmp ult i64 %insert.insert41, %conv19 call void @llvm.assume(i1 %r) as %cond19 = zext (srem (zext i1) %add), %cond can only be zero or one as for current patch, it seems udiv will first enter simplifyDivRem which do the fold, replace %udiv with %insert.insert41 (done by computeKnownBits). ADD DEFERRED: %insert.insert41 = or i64 %insert.shift52, %insert.ext39 IC: Mod = %trunc33 = trunc i64 %insert.insert41 to i32 New = %trunc33 = trunc i64 %insert.ext39 to i32 thus store i32 %trunc33, ptr %d, align 8 becomes store i32 %sroa38, ptr %d, align 8 I am not sure what I should do now, alive2 tells me that the transforms seems correct https://alive2.llvm.org/ce/z/NfVsmT Comment Actions
It's okay to replace in this case. Division instructions are very rare, so this is not particularly performance sensitive.
You can just regenerate the test. It's a fuzzer generated test case, so we don't really care as long as it doesn't infinite loop / miscompile. Comment Actions address comments
Comment Actions This looks good to me, but it looks like you need to regenerate some clang OpenMP tests as well. Not sure why/where those run InstSimplify, but the failures look legitimate.
This revision now requires changes to proceed.Apr 24 2023, 2:20 AM Comment Actions LGTM. If you don't have commit access, can you please share the Name <email> to use for attribution? This revision is now accepted and ready to land.Apr 24 2023, 5:16 AM
Comment Actions
This my first revision LOL. This revision was landed with ongoing or failed builds.Apr 24 2023, 5:37 AM Closed by commit rG946b32680311: [InstSimplify] sdiv a (1 srem b) --> a (authored by floatshadow, committed by nikic). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 516368 clang/test/OpenMP/distribute_parallel_for_reduction_task_codegen.cpp
clang/test/OpenMP/for_reduction_task_codegen.cpp
clang/test/OpenMP/parallel_for_reduction_task_codegen.cpp
clang/test/OpenMP/parallel_master_reduction_task_codegen.cpp
clang/test/OpenMP/parallel_reduction_task_codegen.cpp
clang/test/OpenMP/parallel_sections_reduction_task_codegen.cpp
clang/test/OpenMP/sections_reduction_task_codegen.cpp
clang/test/OpenMP/target_parallel_for_reduction_task_codegen.cpp
clang/test/OpenMP/target_parallel_reduction_task_codegen.cpp
clang/test/OpenMP/target_teams_distribute_parallel_for_reduction_task_codegen.cpp
clang/test/OpenMP/teams_distribute_parallel_for_reduction_task_codegen.cpp
llvm/lib/Analysis/InstructionSimplify.cpp
llvm/test/Transforms/InstCombine/zext-or-icmp.ll
llvm/test/Transforms/InstSimplify/div.ll
|
I'd feel better about a == here.