This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][IR][LIT] support of 'no-overflow' flag for sdiv\udiv instructions
Needs ReviewPublic

Authored by magabari on Jan 11 2018, 3:29 AM.

Details

Summary

Following to the discussion made in RFC: https://groups.google.com/forum/#!msg/llvm-dev/eFtnCwpMMhs/eAHQj8rJCAAJ;context-place=searchin/llvm-dev/magabari%7Csort:date

This is an implementation of 'nof' flag for sdiv\udiv llvm instructions.
Please start reading the proposed RFC & changes in the LangRef.rst.

Adding this flag to integer div flag will allow us (later) to speculate div operation without a need to worry of divide by zero or overflow.

This patch contains llvm frontend changes only.

Diff Detail

Event Timeline

magabari created this revision.Jan 11 2018, 3:29 AM
  • Without any change in the backend (not even an abort) this will simply miscompile the no nof version on most targets.
  • With the way you are modeling the new flag, means that existing bitcode/.ll files will change semantics when read with newer compilers. I'm not sure that is a good idea for this, in any way at the very least you have to provide AutoUpgrade logic for that.

This looks like something that will be better served by an intrinsic (llvm.safe_(s|u)div or something like that), at least to begin with. Experimental intrinsics are a low-cost preferred way of trying out ideas like this without changing fundamental IR semantics.

  • Without any change in the backend (not even an abort) this will simply miscompile the no nof version on most targets.
  • With the way you are modeling the new flag, means that existing bitcode/.ll files will change semantics when read with newer compilers. I'm not sure that is a good idea for this, in any way at the very least you have to provide AutoUpgrade logic for that.
  • I agree with the first point, but currently there is no such optimization that will generate the no nof version, so I assume that won't happen in the short term, Currently i am working on the backend side but at any case we can put some sanity check in the backend that sdiv\udiv come with the nof attribute now.
  • As I know LLVM doesn't promise backward comparability so we tried following the correct solution (notice that another way is to suggest 'mayoverflow' attribute but it won't match other attributes in LLVM). Regarding the AutoUpgrade logic I will be glad if you can clarify more on what has to be done?

This looks like something that will be better served by an intrinsic (llvm.safe_(s|u)div or something like that), at least to begin with. Experimental intrinsics are a low-cost preferred way of trying out ideas like this without changing fundamental IR semantics.

@sanjoy we tried that in our first RFC (https://groups.google.com/forum/#!msg/llvm-dev/ooMhG28jZG0/UP5GadYvCQAJ;context-place=msg/llvm-dev/eFtnCwpMMhs/eAHQj8rJCAAJ )
and it has been rejected with this recommendation.

This looks like something that will be better served by an intrinsic (llvm.safe_(s|u)div or something like that), at least to begin with. Experimental intrinsics are a low-cost preferred way of trying out ideas like this without changing fundamental IR semantics.

I was the one who originally suggested the nof flag. Using the same IR representation for the same computation is nice for many reasons, which is why we have, for example, nsw, or the boolean flag to llvm.ctlz. And I think it's pretty clearly the design we would have used if the IR weren't originally designed around the limitations of x86.

If we're not confident the non-nof div is actually useful, we could call it "llvm.experimental.safe_(s|u)div" instead for now, I guess. The lowering/SelectionDAG work required for that is almost identical, so it wouldn't be hard to switch later.


The initial patch should probably be "complete", in the sense of being able to lower the safe divide. All you need for that is an IR lowering pass to convert a potentially-overflowing divide into a control flow around a conventional udiv/sdiv. (We probably want that anyway, for approximately the same reasons we have the ScalarizeMaskedMemIntrin pass.)

magabari updated this revision to Diff 129925.Jan 16 2018, 1:44 AM

added scalarization logic on codegen prepare (fixMayOverflowIntegerDiv) in case there is no support of may overflow div in the target.

zvi added a comment.Jan 16 2018, 1:50 AM
  • With the way you are modeling the new flag, means that existing bitcode/.ll files will change semantics when read with newer compilers. I'm not sure that is a good idea for this, in any way at the very least you have to provide AutoUpgrade logic for that.

This seems like a real issue. With no version info in the module, how can AutoUpgrade tell if a divide with no 'nof' attribute is of the old form or new form? This is really a performance issue, because AutoUpgrade can always pessimistically not add 'nof' if the version of the incoming module is unknown. Possible solutions:

  1. Introduce versioning to LLVM IR modules - will still require to add the version info to the legacy modules, which may be unacceptable to some users.
  2. Similar to above, but instead of version add a keyword or metadata key that will flag divides as upgradable to 'nof'. Same con as #1, but we can deprecate this feature sometime in the future.
  3. Pass a flag that is not embedded in the module to AutoUpgrade (e.g. opt/llc will add flags that will be propagated to AutoUpgrade)
  • As I know LLVM doesn't promise backward comparability so we tried following the correct solution (notice that another way is to suggest 'mayoverflow' attribute but it won't match other attributes in LLVM). Regarding the AutoUpgrade logic I will be glad if you can clarify more on what has to be done?

That is not true, we do provide backward compatibility: https://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility

In D41944#976834, @zvi wrote:
  • With the way you are modeling the new flag, means that existing bitcode/.ll files will change semantics when read with newer compilers. I'm not sure that is a good idea for this, in any way at the very least you have to provide AutoUpgrade logic for that.

This seems like a real issue. With no version info in the module, how can AutoUpgrade tell if a divide with no 'nof' attribute is of the old form or new form? This is really a performance issue, because AutoUpgrade can always pessimistically not add 'nof' if the version of the incoming module is unknown. Possible solutions:

  1. Introduce versioning to LLVM IR modules - will still require to add the version info to the legacy modules, which may be unacceptable to some users.
  2. Similar to above, but instead of version add a keyword or metadata key that will flag divides as upgradable to 'nof'. Same con as #1, but we can deprecate this feature sometime in the future.
  3. Pass a flag that is not embedded in the module to AutoUpgrade (e.g. opt/llc will add flags that will be propagated to AutoUpgrade)

Or change the logic so the new poison generating form requires a flag instead of the other way round.

Or change the logic so the new poison generating form requires a flag instead of the other way round.

From the perspective of transformation passes, we want the flag be consistent with other flags like nsw. So we want the representation in memory and textual IR to work like it does in this patch.

That said, we don't have to use the same representation in bitcode; the bitcode reader/writer can invert the bit so divides in old bitcode files get deserialized to "sdiv nof".

lib/CodeGen/CodeGenPrepare.cpp
574 ↗(On Diff #129925)

CodeGenPrepare doesn't run at -O0; you'll have to put this code somewhere else.

craig.topper added inline comments.Jan 16 2018, 2:00 PM
include/llvm/Analysis/TargetTransformInfo.h
490

Remove slash from end of line. Doxygen comment don't need the new line escaped.

include/llvm/IR/IRBuilder.h
1013

Can we just call BinaryOperator::Create(Instruction::UDiv, LHS, RHS) and then just call setIsNoOverflow and setIsExact on the result when needed? This would be similar to the CreateInsertNUWNSWBinOp private helper we have for NSW/NUW. You could make a new helper to be shared by sdiv/udiv.

lib/AsmParser/LLParser.cpp
3196

This only supports one order for the two keywords. I think you need to support both orders. See the nsw/nuw handling above.

5286

Need to support keywords being in the other order here too.

lib/CodeGen/CodeGenPrepare.cpp
573 ↗(On Diff #129925)

upsupported->unsupported

595 ↗(On Diff #129925)

Why is this initialized to Undef? Can't you just declare this where it's assigned below?

601 ↗(On Diff #129925)

Can you write comments for what's happening in the signed case.

612 ↗(On Diff #129925)

Should this be ConstantInt::getSigned? If the scalar type is larger than 64-bits that -1 will be padded with zeros by default.

617 ↗(On Diff #129925)

Can you load the APInt into a temporary variable to shorten this line? It's pretty awful to read right now.

645 ↗(On Diff #129925)

This could use some comments showing what the resulting IR should look like.

647 ↗(On Diff #129925)

Is the starting value of PrevPhi ever expected to be used? Does it need to be undef?

662 ↗(On Diff #129925)

If I'm reading this right we're create smallvectors contraining the same element repeatedly? Can we just use ConstantVector::getSplat which will take care of repeating an element?

682 ↗(On Diff #129925)

IRBuilder has a CreateExtractElement that takes a uint64_t. You don't need to call getInt32. It will call getInt64 internally.

lib/IR/AsmWriter.cpp
1141

You can use "const auto *PO = dyn_cast..." the type is spelled out in the dyn_cast so we don't need to repeat it

lib/Transforms/InstCombine/InstructionCombining.cpp
940–941

Can you rename this variable to not say "FP"? I think the FP part was always speculative. It wasn' know FP until the isa<FPMathOperator> was called. But now it looks really confusing to have a variable named FPInst and we're checking a property that could only be set on an integer division.

I agree with Eli last comment, In fact we can solve compatibility in both text and bitcode.
Eli already suggested a way how to solve that in bitcode by inverting the bit (set will mean "mayOverflow" and unset means "NoOverflow")

We can use another approach in text, by adding "mayoveflow" or "nooverflow" to every sdiv instruction (always).

sdiv without any attribute will be the same like "sdiv nof" (backward compatibility)
sdiv that may overflow will be represented as "sdiv mof".

Later, we can add restriction that sdiv node should come with nof or mof keyword.

Before accepting this patch, we really need to see benchmark results. I'm not going to change clang to start emitting non-UB divs if the perf is going to be horrible. We need data.
Otherwise I don't see the need for this poison version of division. Could you elaborate if your plan is to expose this somehow to the application developer?

I'm sorry if this questions have been properly answered in the past. If so, could you please link them here?

Before accepting this patch, we really need to see benchmark results. I'm not going to change clang to start emitting non-UB divs if the perf is going to be horrible. We need data.
Otherwise I don't see the need for this poison version of division. Could you elaborate if your plan is to expose this somehow to the application developer?

I'm sorry if this questions have been properly answered in the past. If so, could you please link them here?

In general the proposed feature allows compiler to start speculating div without worrying too much of div-by-zero etc. so for example you can do instruction hoisting or vectorizing predicated sdiv.
We are currently focused on vectorizing predicated div instruction and our implementation shows around 20-30% improvements on several tests of coremark-pro and denbench.

Before accepting this patch, we really need to see benchmark results. I'm not going to change clang to start emitting non-UB divs if the perf is going to be horrible. We need data.
Otherwise I don't see the need for this poison version of division. Could you elaborate if your plan is to expose this somehow to the application developer?

I'm sorry if this questions have been properly answered in the past. If so, could you please link them here?

In general the proposed feature allows compiler to start speculating div without worrying too much of div-by-zero etc. so for example you can do instruction hoisting or vectorizing predicated sdiv.
We are currently focused on vectorizing predicated div instruction and our implementation shows around 20-30% improvements on several tests of coremark-pro and denbench.

I believe that in micro benchmarks that can be vectorized you can get nice speedups. The question is what happens end-to-end to regular applications? Do I have a slowdown? Code size increase because now all my divisions are guarded?
Also, you could also guard those vectorizations around checks to ensure sdiv doesn't trap. This increases code size.

Before accepting this patch, we really need to see benchmark results. I'm not going to change clang to start emitting non-UB divs if the perf is going to be horrible. We need data.
Otherwise I don't see the need for this poison version of division. Could you elaborate if your plan is to expose this somehow to the application developer?

I'm sorry if this questions have been properly answered in the past. If so, could you please link them here?

In general the proposed feature allows compiler to start speculating div without worrying too much of div-by-zero etc. so for example you can do instruction hoisting or vectorizing predicated sdiv.
We are currently focused on vectorizing predicated div instruction and our implementation shows around 20-30% improvements on several tests of coremark-pro and denbench.

I believe that in micro benchmarks that can be vectorized you can get nice speedups. The question is what happens end-to-end to regular applications? Do I have a slowdown? Code size increase because now all my divisions are guarded?
Also, you could also guard those vectorizations around checks to ensure sdiv doesn't trap. This increases code size.

Not all divisions should be guarded, In fact all divisions which comes from C\C++ should be with "nof" which means it can be lowered *without* guards. From now on Clang will emit "nof" attribute for each div which comes from the user. And this matches the C\C++ specification on the case of divide-by-zero.
In case that we want to do some optimization that will do some speculation of div calculation we should remove this attribute (which means that overflow may be introduced by the compiler) and in this case we need to guard the the div calculation just in case that the specific target don't have support for lowering this kind of div (I assume that when you decide to do some optimization you should be sure that it's good for your target and not to end up with increase of code size and guards). In fact guarding div calculation is just the default implementation for targets that don't have a support for div that may overflow. In X86 we choose to simulate that div calculation using FP div which seems to be more efficient in some cases.

magabari updated this revision to Diff 130583.Jan 19 2018, 3:50 AM

@MatzeB @efriedma @sanjoy

Following to the raised comments about compatibility issues.
This update follows my prev. comment on how to solve it.

Please take a look on the tests i have added: (div_attrs.ll and div_not_allowed.ll)
nof, mof are exclusive attributes.

if you don't declare the kind of the div it will be assumed as "nof" (this will give us backward compatibility)
the assembler will emit nof or mof always.

I also inverted the meaning of the bit in the bitcode so 1 means may overflow and 0 means no overflow which will keep bitcode backward compatible.

lowering patch of 'nof' flag (for X86) is up:
https://reviews.llvm.org/D42353

spatel added a subscriber: spatel.Jan 24 2018, 9:26 AM
magabari marked 15 inline comments as done.Jan 25 2018, 7:37 AM

fixed

lib/CodeGen/CodeGenPrepare.cpp
647 ↗(On Diff #129925)

PrevPhi will be used at the first merge "undef" i think you meant VResult which can be uninitialized.

magabari updated this revision to Diff 131451.Jan 25 2018, 7:39 AM
magabari marked an inline comment as done.

fixed craig notes
I also agree with Eli friedman note i will upload fix soon

craig.topper added inline comments.Jan 25 2018, 10:35 AM
lib/CodeGen/CodeGenPrepare.cpp
647 ↗(On Diff #129925)

Isn't VResult consumed by the CreateInsertElement on the first iteration of the loop on line 731?

625 ↗(On Diff #131451)

Why are all the arguments to ConstantInt::get on separate lines? It looks short enough for one line.

690 ↗(On Diff #131451)

Remove commented out code.

703 ↗(On Diff #131451)

I should of caught this early, but if you pass a vector type to ConstantInt::get it will automatically create a splat. So you don't even need to call ConstantVector::getSplat you just need to pass I.getType() to ConstantInt::get

718 ↗(On Diff #131451)

What test file covers this code? I tried to look for one, but there are a lot of test updates and all I found was adding 'nof' to existing tests.

magabari marked 7 inline comments as done.Jan 29 2018, 4:08 AM

fixed craig and eli notes

lib/CodeGen/CodeGenPrepare.cpp
574 ↗(On Diff #129925)

added new pass "ScalarizeMayOverflowDiv"

647 ↗(On Diff #129925)

i think both can't be uninitialized.
VResult used in CreateInsertElement
PrevPhi used in Phi->addIncoming

625 ↗(On Diff #131451)

created a new pass "scalarizeMayOverflowDiv" and passed clang-format

690 ↗(On Diff #131451)

Sorry my fault

718 ↗(On Diff #131451)

added "scalarize-may-overflow-div.ll"

magabari updated this revision to Diff 131768.Jan 29 2018, 4:40 AM
magabari marked 5 inline comments as done.

added new pass 'ScalarizeMayOverflowDiv'
updated div attributes in LangRef
and fixed notes given by craig

craig.topper added inline comments.Jan 29 2018, 10:31 AM
lib/AsmParser/LLParser.cpp
3196

What happens if 'nof' and 'mof' are both present?

lib/CodeGen/ScalarizeMayOverflowDiv.cpp
221

Drop the "false".

222

Use "getSigned" and drop the true.

225

Weird formatting here.

266

If the vector only has 1 element, PrevPhi is undef. But that's not correct is it?

test/Transforms/ScalarizeMayOverflowDiv/scalarize-may-overflow-div.ll
2

Add a test case for a vector with only 1 element. i.e. <1 x i32>. That's the case where PrevPHI would be undef after the loop right?

So do we still need all the .ll file changes with the syntax nof/mof syntax changes?

So do we still need all the .ll file changes with the syntax nof/mof syntax changes?

Technically No, any test which doesn't have 'mof' attribute (like: sdiv i32 %a, %b) will be treated as it have 'nof' attribute (backward compatibility).
I kept the changes to the lit tests only because I think it's more "correct" to declare explicitly that this divide has no overflow rather than relying on backward compatibility feature doing that.

magabari marked 5 inline comments as done.Jan 29 2018, 11:46 PM
magabari added inline comments.
lib/AsmParser/LLParser.cpp
3196

it will fail in the parsing phase.
as you see 'mof' and 'nof' are exculsive, look at test div_not_allowed.ll.

lib/CodeGen/ScalarizeMayOverflowDiv.cpp
266

No, it may happen.
also in the if (Idx > 0) you may notice that i use the PrevPhi before assigning new one so it should be defined.

magabari updated this revision to Diff 131929.Jan 29 2018, 11:47 PM
magabari marked an inline comment as done.

fixed craig notes

magabari updated this revision to Diff 133859.Feb 12 2018, 7:19 AM

minimized the patch by removing changes to some lit tests (it will be committed later as NFC patch)

Currently i am adding only the tests which have CHECK line for sdiv\udiv because the "new" AsmWriter always emit nof\mof attribute with sdiv\udiv instruction

Guys, what do you think about this?

@magabari - what is the status of this patch?

We have another potential motivating example for ignoring overflow (specifically of sdiv by -1) in PR38239:
https://bugs.llvm.org/show_bug.cgi?id=38239

...because sdiv has that extra UB potential from a -1 divisor that doesn't exist for udiv, we can't do the sibling optimization that we did for udiv.

reames resigned from this revision.Mar 25 2020, 11:16 AM