This is an archive of the discontinued LLVM Phabricator instance.

[llvm/CodeGen] Add ExpandLargeDivRem pass
ClosedPublic

Authored by mgehre-amd on May 30 2022, 5:45 AM.

Details

Summary

Adds a pass ExpandLargeDivRem to expand div/rem instructions
with more than 128 bits into auto-generated loops.

For example, urem i129 is expanded into a loop that is
automatically generated to implement
a simple shift-subtract algorithm similar to

loop:                                             ; preds = %if.end, %entry
  %i = phi i32 [ 128, %entry ], [ %new_i, %if.end ]
  %r = phi i129 [ 0, %entry ], [ %r3, %if.end ]
  %iext = zext i32 %i to i129
  %2 = lshr i129 %0, %iext
  %3 = trunc i129 %2 to i1
  %new_r = shl i129 %r, 1
  %4 = zext i1 %3 to i129
  %new_r1 = or i129 %new_r, %4
  %loop_exit_cond = icmp eq i32 %i, 0
  %new_i = add i32 %i, -1
  %5 = icmp uge i129 %new_r1, %1
  br i1 %5, label %then, label %if.end

then:                                             ; preds = %loop
  %new_r2 = sub i129 %new_r1, %1
  br label %if.end

if.end:                                           ; preds = %then, %loop
  %r3 = phi i129 [ %new_r2, %then ], [ %new_r1, %loop ]
  br i1 %loop_exit_cond, label %exit, label %loop

  ; Result is in %r3
}

As discussed on https://reviews.llvm.org/D120327, this approach has the advantage
that it is independent of the runtime library. This also helps the clang driver,
which otherwise would need to understand enough about the runtime library
to know whether to allow _BitInts with more than 128 bits.

Targets are still free to disable this pass and instead provide a faster
implementation in a runtime library.

Fixes https://github.com/llvm/llvm-project/issues/44994

Diff Detail

Event Timeline

mgehre-amd created this revision.May 30 2022, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 5:45 AM
mgehre-amd requested review of this revision.May 30 2022, 5:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2022, 5:45 AM
mgehre-amd edited the summary of this revision. (Show Details)Jun 8 2022, 8:15 AM

My primary concern here is that we're unconditionally iterating over every instruction in the module, even if the pass has nothing to do. In the grand scheme of things, it isn't *that* expensive, but it's still not cheap, and we seem to have grown a number of passes doing similar walks. Can we merge this with some other walk over the module?

My primary concern here is that we're unconditionally iterating over every instruction in the module, even if the pass has nothing to do. In the grand scheme of things, it isn't *that* expensive, but it's still not cheap, and we seem to have grown a number of passes doing similar walks. Can we merge this with some other walk over the module?

Hey, thanks for taking the time to look into this PR.
I looked around the pipeline to find passes that look relevant. I saw a few that didn't work because they are only enabled above -O0.
Then I found others that in principle would work, but all of those are so focused (createGCLoweringPass, createLowerConstantIntrinsicsPass, createExpandMemCmpPass, createPreISelIntrinsicLoweringPass, ...) that it felt wrong
to dilute their focus by adding the div/rem transformation.
Under -O2, we currently run 454 passes in the middle end and the backend combined.

Do you have a concrete proposal how I could address your concern?

arsenm added a subscriber: arsenm.Jun 14 2022, 5:52 AM

It's very unusual for a lowering strategy to introduce a new function. I would expect to expand the instruction inline and not emit a separate function, which avoids the need for this to be a module pass.

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
54–60

Can you set these all at once with AttributeList

219

You shouldn't need to pre-accumulate the instructions, just use make_early_inc_range on the instruction iterator?

227–228

dyn_cast instead of isa + implicit cast to IntegerType?

228

Since this is ultimately a workaround for SelectionDAG, ideally this would be a TargetLowering control

Actually, how is this really different from lib/Transforms/Utils/IntegerDivision.cpp? This just goes to 128 but the existing code goes up to 64. Can you merge them, or at least move the implementation bodies to the same place?

Actually, how is this really different from lib/Transforms/Utils/IntegerDivision.cpp? This just goes to 128 but the existing code goes up to 64. Can you merge them, or at least move the implementation bodies to the same place?

Thanks, I really didn't know about IntegerDivision.cpp! I'll try to merge both implementations. If that works, I can use it here and avoid emitting a new function.

Another problem is this will miss divisions embedded in constant expressions

Another problem is this will miss divisions embedded in constant expressions

Good point. I wasn't able to create a reproducer from C to a sdiv constant expression, but there is a C++ reproducer: https://godbolt.org/z/v5Tsh8n9x.
My hypothesis is that there is no C/C++ code that creates an sdiv/udiv/srem/urem constant expression in a global variable initializers.
So it would be sufficient to walk the ConstantExpr operands of all instructions, looking for sdiv etc that needs to be converted, and
then uses something like replaceConstantUsesInFunction() to turn them into instructions. Does that sound sensible?

mgehre-amd marked 4 inline comments as done.

Use IntegerDivison.cpp
Do not emit separate functions
Turn into FunctionPass

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
219

I tried, but it crashed. I think that's because we not only remove the current instruction but also insert new basic blocks, which confuses make_early_inc_range (instructions(F)).

228

I'm not sure how I would wire TargetLowering into the pass - I didn't find other passes using it.
Would TargetTransformInfo also work for you?

With the work around constant expressions (e.g. https://reviews.llvm.org/rG941c8e0ea50b), I might not need to handle div/mod constants anymore. For that reason, I would at least land this PR without support for transforming constant expressions.

nikic added a subscriber: nikic.Jul 7 2022, 12:43 AM

With the work around constant expressions (e.g. https://reviews.llvm.org/rG941c8e0ea50b), I might not need to handle div/mod constants anymore. For that reason, I would at least land this PR without support for transforming constant expressions.

div/rem constant expressions have already been removed. So yes, no need to handle them here anymore.

Ping, how can I get review/approval for this PR? Should I add somebody else as reviewer?

nikic added a comment.Jul 12 2022, 6:15 AM

I'm a bit confused, where in the code does this actually create functions? It doesn't look like expandDivision() does that, and I can't spot anything else creating __llvm_udivXXX either.

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
41

You should either do the replacement directly in this loop, or not use make_early_inc_range. Doesn't make sense to use it if you're not modifying anything.

42

Unnecessary newline

69

Could use a vector of BinaryOperator to avoid the casts here.

85

At least for the new pass manager, I don't think it's legal to add new functions in a function pass. I'm not sure about the legacy pass manager.

llvm/lib/CodeGen/TargetPassConfig.cpp
859

Spurious change.

llvm/lib/Transforms/Utils/IntegerDivision.cpp
35

Precommit as NFC changes?

llvm/test/CodeGen/X86/urem-seteq.ll
380

What happened here?

arsenm added inline comments.Jul 12 2022, 7:26 AM
llvm/lib/CodeGen/ExpandLargeDivRem.cpp
48

This won't handle vectors

llvm/test/Transforms/ExpandLargeDivRem/urem129.ll
14

This looks out of date?

llvm/test/Transforms/ExpandLargeDivRem/values129.ll
6 ↗(On Diff #437840)

Shouldn't use -O2 in a test like this

21 ↗(On Diff #437840)

I'm not sure cases that constant fold are the most helpful tests

mgehre-amd edited the summary of this revision. (Show Details)Jul 15 2022, 12:54 AM

I'm a bit confused, where in the code does this actually create functions? It doesn't look like expandDivision() does that, and I can't spot anything else creating __llvm_udivXXX either.

Yes, I changed that approach to directly generate a loop in place of the original instruction as it was requested by one of the initial review comments here. I now also updated the PR description to reflect that.

mgehre-amd marked 11 inline comments as done.

Implement review comments

Thanks for the review! I updated the PR to reflect your comments.

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
48

True, I'm not concerned about vectors right now. I can add a TODO comment here.

85

Yes, this was one of the early comments I got on this PR, so I changed it to not insert new functions but instead insert the loop in-place.

llvm/test/CodeGen/X86/urem-seteq.ll
380

This IR used to crash LLVM. I don't think that the actual assembly is relevant here, and after my changes it's much longer due to the loops that were inserted.

llvm/test/Transforms/ExpandLargeDivRem/urem129.ll
14

Yes, stupid me. I'll update.

llvm/test/Transforms/ExpandLargeDivRem/values129.ll
21 ↗(On Diff #437840)

I wrote this test when the implementation of the shift-subtract loop was still mine. It is verifying that
the value that is computed by that loop is the same as the value of the urem.
This is done by comparing whether constant-folding the urem directly gives the same value
as first replacing the urem by expandlargedivrem and then constant folding the result.

Then, I learned that there was existing implementation of the shift-subtract loop in llvm/lib/Transforms/Utils/IntegerDivision.cpp and switched to that.
Because Utils/IntegerDivision.cpp is already tested, I will get rid of this test here.

arsenm added inline comments.Jul 18 2022, 3:40 PM
llvm/lib/CodeGen/TargetPassConfig.cpp
1115

This doesn't belong in this patch. Currently this is lacking in legality checks for the expansion. I would prefer to see a separate patch which adds the pass and checks the legality per target, rather than relying on the arbitrary command line threshold

llvm/test/Transforms/ExpandLargeDivRem/sdiv129.ll
11

The positioning of the checks looks weird. Usually update_test_checks put these comments inside the function?

mgehre-amd marked 2 inline comments as done.Jul 19 2022, 2:57 AM
mgehre-amd added inline comments.
llvm/lib/CodeGen/TargetPassConfig.cpp
1115

Ok, I'll make a follow-up PR to add the pass and configure a per-target threshold.
I'm thinking about adding an int MaxLegalDivRemBitSize member to TargetTransformInfo.

llvm/test/Transforms/ExpandLargeDivRem/sdiv129.ll
11

I guess the position is still from the time where this was an extra function. I regenerated the check-lines, but update_test_checks.py doesn't move them around. I will re-generate them from scratch to get the typical position.

mgehre-amd marked 2 inline comments as done.

Remove pass from pass pipeline (will be added in next PR)

craig.topper added inline comments.
llvm/lib/CodeGen/ExpandLargeDivRem.cpp
13

I think but haven't checked that 32-bit x86, ARM, RISCV32, and other 32-bit targets can't lower division with more than 64 bits.

34

This says "<N> or more bits", but the code exits for <= ExpandDivRemBits

llvm/lib/Transforms/Utils/IntegerDivision.cpp
145–148

You already have the DivTy, why not go through ConstantInt::get for all of these?

llvm/test/Transforms/ExpandLargeDivRem/urem129.ll
15

Is this code handling the possible poison result from the ctlz's incorrectly? If [A] is zero then [TMP3] is poison, [TMP4] is poison, [TMP5] is poison, [TMP6] being an or won't block the poison. Assuming I understand poison correctly.

I don't think that was caused by this patch. It's an issue in the existing code.

craig.topper added inline comments.Jul 27 2022, 9:50 PM
llvm/test/Transforms/ExpandLargeDivRem/urem129.ll
15

Patch for the poison issue https://reviews.llvm.org/D130680

mgehre-amd marked 3 inline comments as done.

Fix description of expand-div-rem-bits argument
Use ConstantInt::get

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
13

I checked x86_32 and it lowers 128 bit divisions to __divti3: https://godbolt.org/z/Y6hT9n3x5

34

done

llvm/lib/Transforms/Utils/IntegerDivision.cpp
145–148

done

craig.topper added inline comments.Jul 28 2022, 10:20 AM
llvm/lib/CodeGen/ExpandLargeDivRem.cpp
13

But it fails to link https://godbolt.org/z/5x38Gxqa9 so I think that's a bug. __divti3 doesn't exist in 32-bit libgcc.

mgehre-amd marked an inline comment as done.Jul 28 2022, 3:02 PM
mgehre-amd added inline comments.
llvm/lib/CodeGen/ExpandLargeDivRem.cpp
13

That's a good observation! I will update the comment.
I also need to modify the follow-up PR, https://reviews.llvm.org/D130076,
to set the correct limits for those targets.

mgehre-amd marked an inline comment as done.

Clarify that x86_32 cannot lower division with more than 64 bits.

craig.topper added inline comments.
llvm/lib/CodeGen/ExpandLargeDivRem.cpp
13

I thought about disabling that libcall in 32-bit mode, but the it is supported by compiler-rt. So I guess a linker error when using libgcc is better than a compiler error when using compiler-rt or libgcc until we get this pass in.

We usually provide a runtime library for specialized functionalities like this. Are such division operations common enough to benefit from compiler generated code?

We usually provide a runtime library for specialized functionalities like this. Are such division operations common enough to benefit from compiler generated code?

We explored and prototyped the approach of adding a functions to compiler-rt in a different PR. We discovered multiple issues.

One issues is that we don't have a fixed type here. There are existing library functions to do 64 bit or 128 bit division, but here we want to implement divisions with any bitsize above 129. This makes defining an ABI harder.
In addition, the main runtime library on Linux is libgcc, and it lacks such functions. It won't be easy to get them added to libgcc due to the fact that gcc doesn't have _BitInt support yet.
Providing an additional runtime on top of libgcc with just those new functions is also new territory.

So in the end, a lot of things become easier by just doing the transformation on the IR directly.

mgehre-amd marked 3 inline comments as done.Aug 15 2022, 12:25 AM

Friendly ping :-)
I think I addressed all comments. Is somebody able to approve this?

One issues is that we don't have a fixed type here. There are existing library functions to do 64 bit or 128 bit division, but here we want to implement divisions with any bitsize above 129. This makes defining an ABI harder.

Yeah, X86 psABI has new defination for fixed bitsize. But it doesn't help here.

In addition, the main runtime library on Linux is libgcc, and it lacks such functions. It won't be easy to get them added to libgcc due to the fact that gcc doesn't have _BitInt support yet.

Applaud. Implementing runtime library per to ABI is problematic. It is easy resulting in backward compatibility issues. So I incline to this approach. But I didn't look into the details and long discussions, I'd like to other reviewers to sign off.

llvm/include/llvm/CodeGen/ExpandLargeDivRem.h
2

Update the name.

llvm/include/llvm/CodeGen/MachinePassRegistry.def
46

Nit: Maybe better to use expand-large-div-rem.

llvm/lib/CodeGen/ExpandLargeDivRem.cpp
2

Ditto.

Rename expandlargedivrem to expand-large-div-rem
Fix filenames in comments

pengfei accepted this revision.Aug 26 2022, 12:42 AM

I don't find any problems. Maybe we can let it go :)

This revision is now accepted and ready to land.Aug 26 2022, 12:42 AM
This revision was landed with ongoing or failed builds.Aug 26 2022, 3:55 AM
This revision was automatically updated to reflect the committed changes.