This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Expansion of llvm.umul.overflow with i64 type operands.
ClosedPublic

Authored by jbhateja on Jun 15 2017, 9:16 AM.

Details

Summary

In order to test overflow condition multiplication result is first
stored into a temporary whose size is double than the size of operands.

Since types of operands and result are legal as per target hence
lowering of intrinsic to RTLIB call happens during node legalization and not
type legalization.

Expansion of intrinsic to RTLIB call happens only if temporary result
type is not supportable by target. Thus during call lowering expansion
of result is done and a MERGE_VALUES node comprising of
stack locations holding result are returned back.

Event Timeline

jbhateja created this revision.Jun 15 2017, 9:16 AM

I'm definitely not an expert on this part of LLVM, but to me this change seems very fragile, as it assumes things about the types of certain DAG nodes, which may be true now, but may change in the future.

I don't know what the right fix here is, but my gut feel is that it should be more along the lines of adding code in SelectionDAG::getNode() that would rewrite EXTRACT_ELEMENT from a result of LOAD into a half-sized LOAD, similarly to how EXTRACT_ELEMENT(BUILD_PAIR(Op1, Op2), N) gets simplified.

include/llvm/Target/TargetLowering.h
2802

Indentation.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

So this assumes that the type of node node returned by ExpandLibCall is known and not going to change? Seems a bit dangerous to me...

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8125

Again, this seems to make assumptions about the type of OrigRetVT node. Maybe it's more warranted in this function, though.

I'm definitely not an expert on this part of LLVM, but to me this change seems very fragile, as it assumes things about the types of certain DAG nodes, which may be true now, but may change in the future.

I don't know what the right fix here is, but my gut feel is that it should be more along the lines of adding code in SelectionDAG::getNode() that would rewrite EXTRACT_ELEMENT from a result of LOAD into a half-sized LOAD, similarly to how EXTRACT_ELEMENT(BUILD_PAIR(Op1, Op2), N) gets simplified.

Please consider the fact that node being returned from LowerToCall is of type MERGE_VALUES which is a collection of nodes, EXTRACT_ELEMENT is used to extract part value constructed using BUILD_PAIR. BUILD_PAIR makes a larger value type. In this case since call lowering is happening during node legalization (called post type legalization) so returning a collection of stack locations which cumulatively stores the larger result looks fine. We just need to access the operands of MERGE_VALUES to get the Bottom and Top halves.

efriedma edited reviewers, added: efriedma; removed: eli.friedman.Jun 15 2017, 9:43 PM
efriedma added subscribers: llvm-commits, efriedma.

Please make sure to put llvm-commits in the subscriber list when you post a patch.

The changes to call lowering are kind of nasty; I'll sleep on it to see if I come up with a better idea.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

Yes, definitely wrong: consider the case where the double-width type is legal.

Please make sure to put llvm-commits in the subscriber list when you post a patch.

The changes to call lowering are kind of nasty; I'll sleep on it to see if I come up with a better idea.

Hi Eli, This case is under condition when CanLowerReturn is false, i.e. type is illegal in TargetLowring::LowerCallTo. Also in SelectionDAGLegalize::ExpandNode RTLLIB call is considered for UMULO operation only when return type is illegal.
Change has been run through CodeGen regressions which is all clear.

include/llvm/Target/TargetLowering.h
2802

Will be addressed.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

Please consider the fact that node being returned from LowerToCall is of type MERGE_VALUES which is a collection of nodes, EXTRACT_ELEMENT is used to extract part value constructed using BUILD_PAIR. BUILD_PAIR makes a larger value type. In this case since call lowering is happening during node legalization (called post type legalization) so returning a collection of stack locations which cumulatively stores the larger result looks fine. We just need to access the operands of MERGE_VALUES to get the Bottom and Top halves.

3564

Hi Eli, This case is under condition when CanLowerReturn is false, i.e. type is illegal in TargetLowring::LowerCallTo. Also in SelectionDAGLegalize::ExpandNode RTLLIB call is considered for UMULO operation only when return type is illegal.
Change has been run through CodeGen regressions which is all clear.

vadimcn added inline comments.Jun 16 2017, 11:03 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

Jatin, my issue with this code is that it seems to just assume that Ret.getOpcode() == ISD::MERGE_VALUES. Which may well be true now, but this code is pretty far away from where the Ret node gets constructed. What if tomorrow somebody modifies ExpandLibCall and it starts returning something else?

I think that you should assert the node type here before using its operands. (Unless, of course, MERGE_VALUES is the only node type that makes sense here. But even then, this fact merits at least being stated in a comment)

efriedma edited edge metadata.Jun 16 2017, 12:27 PM

I apologize, my review last night wasn't very insightful. Hopefully better comments today.


test/CodeGen/ARM/umulo-32.ll and test/CodeGen/SPARC/64cond.ll are tests for other targets which use this codepath. Please verify that you aren't changing the generated code for those testcases. (I guess wasm in particular breaks it can't return an i128 in registers?)

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

Oh, right, sorry, missed the isTypeLegal check. I guess if UMULO result type is legal, and a type twice that width isn't, it should always get split into precisely two nodes.

Please add an assertion that Ret is an ISD::MERGE_VALUES with two operands (assuming that's what it's supposed to be?).

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8121

I'm a little suspicious about the way you're messing with RetTys here; the contents shouldn't depend on CanLowerReturn.

jbhateja updated this revision to Diff 102938.Jun 17 2017, 7:25 AM
jbhateja marked 6 inline comments as done.Jun 17 2017, 7:34 AM

I apologize, my review last night wasn't very insightful. Hopefully better comments today.


test/CodeGen/ARM/umulo-32.ll and test/CodeGen/SPARC/64cond.ll are tests for other targets which use this codepath. Please verify that you aren't changing the generated code for those testcases. (I guess wasm in particular breaks it can't return an i128 in registers?)

Review comments incorpocated. Verified the additional tests mentioned no changes in generated code with changes.

I just tried your updated patch, and I'm seeing crashes in make check. (Assertion Ret.getOpcode() == ISD::MERGE_VALUES && "Ret is merged value node comprising constituents stack" "locations holding result"' failed.`)

jbhateja updated this revision to Diff 103532.Jun 21 2017, 11:39 PM
  • Minor fixes.

I just tried your updated patch, and I'm seeing crashes in make check. (Assertion Ret.getOpcode() == ISD::MERGE_VALUES && "Ret is merged value node comprising constituents stack" "locations holding result"' failed.`)

Updated the assertion. We can have either BUILD_PAIR or MERGE_VALUE type nodes.
Instruction folding happening during SelectionDAG::getNode takes care of following :-

1/  For EXTRACT_ELEMENT directly returns the needed operand of BUILD_PAIR.
2/  IF only one operand for MERGE_VALUE then simply retruns the operand without forming a MERGE_VALUE node.

Thanks

vadimcn added inline comments.Jun 22 2017, 12:14 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

I am wondering if this handling of EXTRACT_ELEMENT(MERGE_VALUES) could be pushed to SelectionDAG::getNode(), where other const folding already takes place?

efriedma added inline comments.Jun 22 2017, 12:53 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8121

Okay, so this is doing what I thought it's doing.

Making the type of the node returned by TargetLowering::LowerCallTo different based on the target's calling convention is not acceptable. Both sides of the "if (!CanLowerReturn)" need to do essentially the same thing. (Basically, you need to skip calling getCopyFromParts in the CanLowerReturn case.)

The end result still isn't really quite right; we aren't passing the right argument types to the calling convention code. But hopefully that problem won't bite anyone.

jbhateja marked an inline comment as done.Jun 25 2017, 7:42 PM

ping [eli or others]@reviewers

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
3564

Hi Vadimcn,

Please look at following exceprts from code ( SelectionDAG::getNode() : SelectionDAG.cpp)

1/ EXTRACT_ELEMENT BUILD_PAIR. instruction folding from SelectionDAG::getNode() ->

EXTRACT_ELEMENT of BUILD_PAIR is often formed while legalize is expanding
64-bit integers into 32-bit parts. Instead of building the extract of
// the BUILD_PAIR, only to have legalize rip it apart, just do it now.
if (N1.getOpcode() == ISD::BUILD_PAIR)

return N1.getOperand(N2C->getZExtValue());

2/ Handling in SelectionDAG::getNode() for single argument MERGE_VALUE

switch (Opcode) {
case ISD::TokenFactor:
case ISD::MERGE_VALUES:
case ISD::CONCAT_VECTORS:

return Operand;         // Factor, merge or concat of one node?  No need.

So an ExpandLibCall result if comprise of build pair it will be naked i.e. not contained in MERGE_VALUE node and its operands can be accessed directly which is what insturction folding in getNode doing.

Fix done for this bug is to break the stack locations into multiple results and sending it back after packing them in a MERGE_VALUE node.

This is a special case because larger unsupproted type (i128) is not the actual return value type of llvm.umulo otherwise DAG type legalization would have broken it down to legal
types earlier itself.

Thanks

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8121

Hi Eli, Types of node being returned could be different in IF and ELSE part of CanLoweReturn before this fix also. We can have BUILD_PAIR, or MERGE_VALUE both are possible based on number parts being returned.

Fix just splits larger integer result present at a stack locations to multiple smaller stack location such that the parts value types are legal as per target.

Also IsPostLegalization flag in Calling convention is specifying whether call lowering was done in Type Legalization of Selection DAG or during Node Legalization.

During Type Legalization if a larger illegal result is returned from LowerCallTo function then it is further broken down to legal type as type legalization is an iterative loop which messages the type till they are legal.

Thanks

efriedma added inline comments.Jun 26 2017, 1:16 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8121

After type legalization, we are not supposed to produce nodes with illegal types. The particular code you're looking at managed to slip past at some point; we didn't fix it properly, and instead depend on an ugly hack to hide the illegal node from LegalizeDAG. As you discovered, the trick only works if the node is a BUILD_PAIR; we don't have code here to split loads.

If we're going to fix this properly, we should not be creating nodes with illegal types in call lowering after type legalization, whether or not CanLowerReturn is true.

The part I care about is that the VT of the returned node should be consistent. If we're generating a call after legalization, and the return type of that call is an 128-bit integer, we should always return a node whose VT is the pair "(i64, i64)". If we get it right, the details of call lowering are transparent; otherwise, we just have to hope we get lucky with the lowering of

jbhateja marked 2 inline comments as done.Jun 26 2017, 11:25 PM

Hi Eli,

Regarding your broadly two concerns raised above here are my responses :-

1/ We are not producing any illegal type post type legalization here.

Kindly have a look at following code snippet from function SelectionDAGLegalize::ExpandNode()

{

.......
.......
case ISD::UMULO:
case ISD::SMULO: {

EVT VT = Node->getValueType(0);
EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits() * 2);

........
........
}

For UMULO and SMULO operantion since we need to test overflow hence during legalization (not type legalization) result type of the operation requested is WideVT which is twice the size of the result VT in this case it mean i128.

2/ On returning (i64, i64) in both the cases ie. when CallLoweringReturn is true or false. This is exactly what is being done NOW, earlier we were returning (i64,i64 BUILD_PAIR) when CallLoweringReturn was TRUE and i128 into a stack location when CallLoweringReturn was FALSE. NOW, instead of returing i128 into a stack location we return a pair of stack locations having VTs as (i64, i64).

Are you suggesting that breaking down of i128 held in the result stack location should be done outside the LowerCallTo function ?

That would mean defering the split decision from place where we exactly know whether the result is a stack location and type is illegal to a point where we have to put additional checks to test if type is legal or not and if its in a stack location. Also we are already past type legalization.

Inside LowerCallTo when CallLoweringReturn is FALSE (i.e. result is returned over
stack) we split ONLY IFF the result at the stack is not accomodable in the largest register, so this is safe. Let me know if you still have any concern.

Thanks.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8121

Hi Eli,

Regarding your broadly two concerns raised above here are my responses :-

1/ We are not producing any illegal type post type legalization here.

Kindly have a look at following code snippet from function SelectionDAGLegalize::ExpandNode()

{

 .......
 .......
case ISD::UMULO:
case ISD::SMULO: {
  EVT VT = Node->getValueType(0);
  EVT WideVT = EVT::getIntegerVT(*DAG.getContext(), VT.getSizeInBits() * 2);  
........
........

}

For UMULO and SMULO operantion since we need to test overflow hence during legalization (not type legalization) result type of the operation requested is WideVT which is twice the size of the result VT in this case it mean i128.

2/ On returning (i64, i64) in both the cases ie. when CallLoweringReturn is true or false. This is exactly what is being done NOW, earlier we were returning (i64,i64 BUILD_PAIR) when CallLoweringReturn was TRUE and i128 into a stack location when CallLoweringReturn was FALSE. NOW, instead of returing i128 into a stack location we return a pair of stack locations having VTs as (i64, i64).

Are you suggesting that breaking down of i128 held in the result stack location should be done outside the LowerCallTo function ?

That would mean defering the split decision from place where we exactly know whether the result is a stack location and type is illegal to a point where we have to put additional checks to test if type is legal or not and if its in a stack location. Also we are already past type legalization.

Inside LowerCallTo when CallLoweringReturn is FALSE (i.e. result is returned over
stack) we split ONLY IFF the result at the stack is not accomodable in the largest register, so this is safe. Let me know if you still have any concern.

Thanks.

earlier we were returning (i64,i64 BUILD_PAIR) when CallLoweringReturn

Despite the name, BUILD_PAIR does not return a pair. It returns an i128. It's essentially just a shortcut for ((zext Op1 to i128) << 64) | (zext Op0 to i128)) (see SelectionDAGLegalize::ExpandNode). This is a node with an illegal result type, and we should not be creating it.


The changes you're making to LowerCallTo in the case where CanLowerReturn is false look good.

The part I'm not happy with is that LowerCallTo behaves inconsistently with your patch. In the case where CanLowerReturn is true, we should split the returned value in the same way you're splitting it when CanLowerReturn is false.

jbhateja marked an inline comment as done.Jun 28 2017, 5:20 AM

earlier we were returning (i64,i64 BUILD_PAIR) when CallLoweringReturn

Despite the name, BUILD_PAIR does not return a pair. It returns an i128. It's essentially just a shortcut for ((zext Op1 to i128) << 64) | (zext Op0 to i128)) (see SelectionDAGLegalize::ExpandNode). This is a node with an illegal result type, and we should not be creating it.


The changes you're making to LowerCallTo in the case where CanLowerReturn is false look good.

The part I'm not happy with is that LowerCallTo behaves inconsistently with your patch. In the case where CanLowerReturn is true, we should split the returned value in the same way you're splitting it when CanLowerReturn is false.

As per your above comment -> "The particular code you're looking at managed to slip past at some point; we didn't fix it properly, and instead depend on an ugly hack to hide the illegal node from LegalizeDAG. As you discovered, the trick only works if the node is a BUILD_PAIR; we don't have code here to split loads."

LowerCallTo does not behave inconsitantly with my patch, thats the way it was working always. I just split the illegal type stack location into multiple legal type stack locations..

Just to bring it to your notice again this is a special case as WideVT (duble to type of original result) is being defined post type legalization which is why
we need to handle this illegal return type spilliting in LowerCallTo so that it returns a legal result values to it caller.

However, I definately agree that in long term this entire piece of code should be re-written so that LowerCallTo returns only one consistant type of node, but considering that argument to call being
lowered can also be non-integral types like a structure pointer , hence not sure/convinced that returing a result of node type BUILD_PAIR in all cases would be feasible, also LowerCallTo is being called from several places during type legalization also where SplitInteger is being explicitely called on return value to extract constituent elements. So making radical changes in LowerCallTo by returing only
one kind of node which eventually should be MERGE_VALUE might need good deal to work and testing.

As of now this is not happening but the patch is not breaking anything and it covering a case which was missed about spliting the illlegal type result.

Thanks

efriedma commandeered this revision.Jun 28 2017, 3:00 PM
efriedma edited reviewers, added: jbhateja; removed: efriedma.
efriedma updated this revision to Diff 104518.Jun 28 2017, 3:03 PM

Since I'm having trouble explaining the issue, I decided to just change it myself. Does this look reasonable?

jbhateja edited edge metadata.Jun 28 2017, 7:50 PM

Hi Eli,
As per following abstract from LLVM developer's policy since I do not have subversion access yet so I cannot be an approver of my own patch.

"Note that anyone is welcome to review and give feedback on a patch, but only people with Subversion write access can approve it."

Patch has been discussed and regressed and is fixing a missing a case of breaking illegal return types over stack during node legalization , which is not usual as type legalization takes care of illegal types and massages them iteratively till they are legal.

vadimcn accepted this revision.Jun 30 2017, 12:02 AM

Tested Eli's version and was able to build all of rustc's standard library for the wasm32-... target.

This revision is now accepted and ready to land.Jun 30 2017, 12:02 AM
jbhateja removed reviewers: jbhateja, vadimcn.EditedJun 30 2017, 12:43 AM

Thanks Eli and Vadim. For accepting revision.
Kindly update the author of fix before it gets landed.

:-)

This revision now requires review to proceed.Jun 30 2017, 12:43 AM
jbhateja accepted this revision.Jun 30 2017, 12:49 AM
jbhateja added a reviewer: vadimcn.
This revision is now accepted and ready to land.Jun 30 2017, 12:49 AM
jbhateja commandeered this revision.Jun 30 2017, 1:20 PM
jbhateja edited reviewers, added: efriedma; removed: jbhateja.
This revision now requires review to proceed.Jun 30 2017, 1:20 PM
jbhateja accepted this revision.Jun 30 2017, 1:24 PM
jbhateja removed a reviewer: efriedma.
This revision is now accepted and ready to land.Jun 30 2017, 1:24 PM
vadimcn accepted this revision.Jul 1 2017, 11:00 AM
vadimcn added a reviewer: efriedma.

Eli, I can commit this if you are busy. Just give a nod.

efriedma edited edge metadata.Jul 3 2017, 6:02 PM

Sure, you can merge it.

jbhateja updated this revision to Diff 105212.Jul 4 2017, 9:00 PM

[X86] Improvement in CodeGen instruction selection for LEAs

1/  Operand folding during complex pattern matching for LEAs has been
    extended to such that it promotes Scale to accommodate similar operand
    appearing in the DAG.
    e.g.
          T1 = A + B
          T2 = T1 + 10
          T3 = T2 + A
    For above DAG rooted at T3, X86AddressMode will no look like
          Base = B , Index = A , Scale = 2 , Disp = 10

2/  During OptimizeLEAPass down the pipeline CSE will now be performed over LEAs
    so that if there is an opportunity then complex LEAs (having 3 operands)
    could be factored out.
    e.g.
         leal 1(%rax,%rcx,1), %rdx
         leal 1(%rax,%rcx,2), %rcx
    will be factored as following
         leal 1(%rax,%rcx,1), %rdx
         leal (%rdx,%rcx)   , %edx

Reviewers.

My bad , I updated this patch with some more files intended for another fix (not yet reviewed) and a different revision.

Will it be possible to cherry pick the changes before last updation ?

Thanks,
Jatin

This revision was automatically updated to reflect the committed changes.