Page MenuHomePhabricator

[LegalizeTypes] Improve splitting for urem/udiv by constant for some constants.
ClosedPublic

Authored by craig.topper on Jul 31 2022, 4:36 PM.

Details

Summary

For remainder:
If (1 << (Bitwidth / 2)) % Divisor == 1, we can add the high and low halves
together and use a (Bitwidth / 2) urem. If (BitWidth /2) is a legal integer
type, this urem will be expand by DAGCombiner using multiply by magic
constant. We do have to take into account that adding high and low
together can produce a carry, making it a (BitWidth / 2)+1 bit number.
So we need to also add back in the carry from the first addition.

For division:
We can use the above trick to compute the remainder, subtract that
remainder from the dividend, then multiply by the multiplicative
inverse of the Divisor modulo (1 << BitWidth).

This is based on the section "Remainder by Summing Digits" in
Hacker's delight.

The remainder trick is similar to a trick you may have learned for
determining if a decimal number is divisible by 3. You can add all the
digits together and see if the sum is divisible by 3. If you're not sure
if the sum is divisible by 3, you can add its digits together. This
can be repeated until you have a single decimal digit. If that digit
is 3, 6, or 9, then the original number is divisible by 3. This works
because 10 % 3 == 1.

gcc already does this same trick. There are additional tricks gcc
does urem as well as srem, udiv, and sdiv that I plan to add in
future patches.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 31 2022, 4:36 PM
craig.topper requested review of this revision.Jul 31 2022, 4:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2022, 4:36 PM

precommit tests?

llvm/include/llvm/CodeGen/TargetLowering.h
4729

expandUREMByConstant ?

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4474

auto *CN

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7162

assert N->getOpcode == ISD::UREM?

7195

Are you we causing any oneuse issues if we split here and then can't expand below?

7213

Pull out repeated getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), HiLoVT) ?

craig.topper added inline comments.Aug 1 2022, 9:23 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7195

Good point. Today it's not an issue because that code only executes on the X86 win64 path and we have ADDCARRY available. But I'll fix it.

Do you have an overall plan written up somewhere?

There are a few different ways you could extend this:

  • This could be extended to handle factors of two: for example, a % 10 -> ((a / 2) % 5) * 2 + (a % 2).
  • This could be extended to handle more factors by slicing up numbers differently. e.g. for a % 7, slice the number into three 30-bit pieces, since 2^30 mod 7 = 1.
  • This could possibly be used to implement division: e.g. a / 5 -> (a - (a % 5)) * inverse(5). Not sure if this is more efficient than other approaches; might depend on the target.

On the general topic of division by constants, see also discussion on https://github.com/llvm/llvm-project/issues/56153

Do you have an overall plan written up somewhere?

There are a few different ways you could extend this:

  • This could be extended to handle factors of two: for example, a % 10 -> ((a / 2) % 5) * 2 + (a % 2).

Yeah. I was also going to look at that. gcc does something, but what they do can be improved.

  • This could be extended to handle more factors by slicing up numbers differently. e.g. for a % 7, slice the number into three 30-bit pieces, since 2^30 mod 7 = 1.

That is what gcc does and what I was going to look at doing soon.

  • This could possibly be used to implement division: e.g. a / 5 -> (a - (a % 5)) * inverse(5). Not sure if this is more efficient than other approaches; might depend on the target.

That's also what gcc does. This interacts poorly with DivRemPairs. If we have a both a division and remainder DivRemPairs rewrites the remainder in terms of division. If we're going to use remainder to do the division, then what DivRemPairs is doing is the wrong direction.

My immediate next step was looking at extending this patch to UDIV using this method for the same constant divisors.

On the general topic of division by constants, see also discussion on https://github.com/llvm/llvm-project/issues/56153

I hadn't seen that bug, but I had talked to @ndesaulniers which is what motivated this.

hiraditya added a comment.EditedAug 1 2022, 11:01 AM

a / 5, and a%10 etc.

Is it possible that the processor already does these tricks internally to improve the performance of these operations?

Can we add an additional test to llvm/test/CodeGen/ARM/div.ll that urem with a small constant does not produce a libcall to __aeabi_uldivmod? The change to llvm/lib/Target/X86/X86ISelLowering.cpp makes me wonder how many backend-specific changes are necessary here. Also, not sure if this is something we only want to do for -O2 (vs -Os)?

Can we add an additional test to llvm/test/CodeGen/ARM/div.ll that urem with a small constant does not produce a libcall to __aeabi_uldivmod?

Sure. I can do that.

The change to llvm/lib/Target/X86/X86ISelLowering.cpp makes me wonder how many backend-specific changes are necessary here.

Also, not sure if this is something we only want to do for -O2 (vs -Os)?

I put in the isIntDivCheap check to catch the targets that restricted div by constant expansion for -Oz. Though maybe I need to check that for HiLoVT too since we're kind of assuming HiLoVT UREM will be expanded by DAGCombiner.

a / 5, and a%10 etc.

Is it possible that the processor already does these tricks internally to improve the performance of these operations?

We're concerned here about cases where the numerator doesn't fit into one word. x86 has an instruction for division where the numerator is two words, but no other commonly used target does. And even on x86, it's slow.

  • This could possibly be used to implement division: e.g. a / 5 -> (a - (a % 5)) * inverse(5). Not sure if this is more efficient than other approaches; might depend on the target.

That's also what gcc does. This interacts poorly with DivRemPairs. If we have a both a division and remainder DivRemPairs rewrites the remainder in terms of division. If we're going to use remainder to do the division, then what DivRemPairs is doing is the wrong direction.

My immediate next step was looking at extending this patch to UDIV using this method for the same constant divisors.

It might be worth to compare this to algorithm 4 from https://gmplib.org/~tege/division-paper.pdf .

RKSimon added inline comments.Aug 2 2022, 5:27 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4474

Is it possible to support constant splat vectors as well?

craig.topper added inline comments.Aug 2 2022, 8:16 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4474

This function is only called for scalar integers.

Partial update. Still need to add ARM tests and really commit the tests. Patch has been rebased as if they had been commited.

craig.topper added inline comments.Aug 3 2022, 12:15 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7195

Rather than trying to create the nodes only when the below succeeds, I added an ultimate fall back for targets that don't use ZeroOrOneBooleanContent.

Add ARM test and add to ARM's LowerREM function.

Add back the RISCV test I lost.

craig.topper retitled this revision from [LegalizeTypes] Improve splitting for urem by constant for some constants. to [LegalizeTypes][WIP] Improve splitting for urem by constant for some constants..Aug 7 2022, 9:54 PM

Refactor to support UDIV and UDIVREM. The latter needed to support ARM.
Support constants that have trailing zeros such as 10 and 12 using the formula suggested by Eli.
Does not support breaking into 30-bit pieces.

Need to add more tests and comment cleanup. I did some local testing checking of a few divisors.

craig.topper retitled this revision from [LegalizeTypes][WIP] Improve splitting for urem by constant for some constants. to [LegalizeTypes][WIP] Improve splitting for urem/udiv by constant for some constants..Aug 7 2022, 11:49 PM
RKSimon added inline comments.Aug 8 2022, 6:23 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7169

assert((Opcode == ISD::UREM || Opcode == ISD::UDIV || Opcode == ISD::UDIVREM) && "unsigned division expected");

Do we want to maintain codegen'ing a libcall when optimizing for code size? If so, is there an existing test that demonstrates that, or do we need a new one?

Does 32b MIPS need custom target lowering? 32b ppc?

Support constants that have trailing zeros such as 10 and 12 using the formula suggested by Eli.

Mind adding a citation via comment in source or commit message?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7278

Should this be LL or DL? (I dunno, just asking; curious why DL exists).

7292

Every caller is BUILD_PAIR'ing the results. Would it be a better interface to simply return an SDValue for the remainder and one for the quotient, rather than a pair of SDValues that need to be BUILD_PAIRed again in the caller (even though we've done so already in the callee)?

llvm/lib/Target/ARM/ARMISelLowering.cpp
20427

I find the Result param a little confusing. Sometimes it starts with the quotient, sometimes it starts with the remainder. Should these be two distinct parameters to expandDIVREMByConstant?

ie. one SmallVector<SDValue> for Div, one for Rem?

Especially since we're handling 3 different opcodes, possibly 6 in the future for signed types.

Do we want to maintain codegen'ing a libcall when optimizing for code size? If so, is there an existing test that demonstrates that, or do we need a new one?

Does 32b MIPS need custom target lowering? 32b ppc?

I have not checked yet.

Support constants that have trailing zeros such as 10 and 12 using the formula suggested by Eli.

Mind adding a citation via comment in source or commit message?

Yes. I need to update comments. Not sure if we want the trailing zeros case in the base patch or in its own commit. I squashed everything together after I changed the base patch interface so much to support DIV and DIVREM.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7278

LL is correct. DL is the shifted version. We need the bits that were shifted off here so we need the original.

I need to improve variable naming.

7292

The caller in LegalizeIntegerTypes does not BUILD_PAIR the results.

llvm/lib/Target/ARM/ARMISelLowering.cpp
20427

Returning two vectors complicates the code in X86TargetLowering::LowerWin64_i128OP which would need to check the opcode to know where to get the result since it handles both DIV and REM.

20489

grr why does clang-format keep over indenting this return?

Rebase on new tests

craig.topper edited the summary of this revision. (Show Details)Aug 14 2022, 7:21 PM
craig.topper edited the summary of this revision. (Show Details)Aug 14 2022, 7:24 PM

Remove the even divisor support to simplify the patch and the explanation of the concept.
Will move to a separate patch.

craig.topper retitled this revision from [LegalizeTypes][WIP] Improve splitting for urem/udiv by constant for some constants. to [LegalizeTypes] Improve splitting for urem/udiv by constant for some constants..Aug 14 2022, 9:32 PM
RKSimon added inline comments.Aug 15 2022, 2:17 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7183

const APInt &Divisor ?

llvm/test/CodeGen/X86/divmod128.ll
918

pre-commit this? what about optsize - should that expand do you think?

-Pre-commit tests.
-Make APInt const. It will need to be changed back to non-const when we support even divisors.
-Disable for optsize instead of minsize. I think that matches gcc.

LGTM with a couple of minors (although I'm not very familiar with the algorithm) - any other comments?

llvm/include/llvm/CodeGen/TargetLowering.h
4730

InL / InH -> LL / LH ?

llvm/test/CodeGen/X86/divide-by-constant.ll
939

why did you remove this?

craig.topper added inline comments.Fri, Sep 9, 9:15 AM
llvm/test/CodeGen/X86/divide-by-constant.ll
939

I didn't mean to. Not sure what happened there.

Address review comments

RKSimon accepted this revision.Mon, Sep 12, 5:14 AM

LGTM cheers

This revision is now accepted and ready to land.Mon, Sep 12, 5:14 AM

a / 5, and a%10 etc.

Is it possible that the processor already does these tricks internally to improve the performance of these operations?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7237–7242

Can't you just unconditionally use getZExtOrTrunc? the booleancontents would just apply the optimization fold later

craig.topper added inline comments.Mon, Sep 12, 9:22 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7237–7242

I wrote it like this because that's what was done for ISD::SUB at the end of ExpandIntRes_ADDSUB. ISD::ADD in that function seemed more complicated that it needs to be.

This revision was landed with ongoing or failed builds.Mon, Sep 12, 10:42 AM
This revision was automatically updated to reflect the committed changes.