This is an archive of the discontinued LLVM Phabricator instance.

Optimize SETCC + VSEL of incompatible or illegal types
ClosedPublic

Authored by jonpa on Feb 3 2017, 5:13 AM.

Details

Summary

I noticed that SETCC+VSELECT resulted in unnecessary sign-extensions and truncations on SystemZ. On SystemZ, TypeWidenVec is returned for any legal vector type, and the boolean contents is zero or negative one (all one's or zero's).

The SETCC vector mask type is originally with element type i1. The type legalizer then promotes this vector to a legal vector type. For two elements, this always becomes v2i64, four elements always becomes v4i32, etc (same number of elements promoted to a legal vector type). The type legalizer then in turn fixes VSELECT to work with this promoted mask

This works well when the widths of the elements compared by SETCC matches those that are selected by VSELECT. However, in all the cases where those widths do not match, for example SETCC(v2i32) + VSELECT(v2i64), the promotion of the SETCC result remains unoptimized, as well as the truncation / extension of it to suit VSELECT.

For example:

define <16 x i8> @fun(<16 x i8> %val1, <16 x i8> %val2,
                                      <16 x i8> %val3, <16 x i8> %val4) {
  %cmp = icmp eq <16 x i8> %val1, %val2
  %ret = select <16 x i1> %cmp, <16 x i8> %val3, <16 x i8> %val4
  ret <16 x i8> %ret
}

->

# BB#0:
    vceqb    %v0, %v24, %v26
    vsel    %v24, %v28, %v30, %v0
    br    %r14

while having just two elements:

define <2 x i8> @fun(<2 x i8> %val1, <2 x i8> %val2,
                                    <2 x i8> %val3, <2 x i8> %val4) {
  %cmp = icmp eq <2 x i8> %val1, %val2
  %ret = select <2 x i1> %cmp, <2 x i8> %val3, <2 x i8> %val4
  ret <2 x i8> %ret
}

->

# BB#0:
    vceqb    %v0, %v24, %v26
    vuphb    %v0, %v0
    vuphh    %v0, %v0
    vuphf    %v0, %v0
    vrepih    %v1, 1807
    vperm    %v0, %v0, %v0, %v1
    veslb    %v0, %v0, 7
    vesrab    %v0, %v0, 7
    vsel    %v24, %v28, %v30, %v0
    br    %r14

This should really have been the same code as in the case of full vectors. In the longer version, there is first the vector compare element, then three unpacks to make the SETCC result legal, then a truncation (which gets the original vector back). Then, the second problem I have found is also highlighted here: after the truncation (vperm) there is an inreg sign extension of i1 (vector element shift left + artim. shift right). This is also completely unnecessary, due to the defined boolean contents of the target.

I have tried to tackle these two problems, and while not being sure of exactly how this should be done, the first attempt here at least achieves the right optimizations in simple tests for all these cases.

For the extensions / truncations back and forth produced by the type legalizer, I saw three alternatives.

  1. Try to improve the legalizer to not do this. I thought this seemed bad because we probably don't want any optimizations happening in a legalize phase, right? That would however be simpler than trying to detect the cases during DAGCombiner.
  2. Add code to handle this in DAGCombiner.
  3. Do a custom lowering for the SystemZ target. It seems other targets do this, and perhaps this is what is expected?

I tried first to experiment with the DAGCombiner::visitVSELECT() method. It was not as simple as one would have hoped to detect these cases. I am not sure at all if this code is acceptable. For instance, can it be assumed that a BUILD_VECTOR used by VSELECT is always using the "normal" elements order from the vector(s) produced by SETCC(s)? Or should this (and other things) be verified?

For the second problem, of the unneeded sign extension, I added a handling for this in DAGCombiner::visitSIGN_EXTEND_INREG(). I am with this code saying that it is always true that a vector of i1's is always correct if the boolean contents is ZeroOrNegativeOneBooleanContent. So there is never a need to fix this up with a sign extension from i1. I am however neither sure here if i1 really can be treated like this. It makes sense in the examples I have worked with.

This is a first attempt, and I am not sure what the best way to improve on this is. Would it for instance be worth trying SELECT_CC instead for this target? I even tried to return TypeWidenVector in getPreferredVectorAction() for vectors of i1, but that didn't work at all in this case.

Feedback on this much appreciated.

Diff Detail

Event Timeline

jonpa created this revision.Feb 3 2017, 5:13 AM
efriedma added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
6485

It seems like it would be simpler to produce the correct code in the first place, rather than vectorize the result which was scalarized by legalization.

7754

This doesn't make any sense; you can't assume an arbitrary vector is all zeros or all ones.

jonpa updated this revision to Diff 87208.Feb 6 2017, 3:47 AM

It seems like it would be simpler to produce the correct code in the first place, rather than vectorize the result which was scalarized by legalization.

I agree, so this time I have tried to do it during type legalization.

This doesn't make any sense; you can't assume an arbitrary vector is all zeros or all ones.

I was hoping maybe some check might make this legal. Anyway, this isn't needed anymore because the extend is not anymore generated in the first place.

I made a new method that handles VSELECT early and get the same results as with the previous version of the patch, in a simpler way. This is basically an extension of the handling of VSELECT during type legalization. I am not sure of what the best place to insert this function would be, but currently I need to call it from both PromoteIntOp_SELECT() and WidenVecRes_SELECT(), both called on the VSELECT.

Again, this patch is at a stage where I get the needed results, but where there might be missing legality checks and so on. This might even have to go into a custom lowering for the SystemZ backend if this disrupts other targets somehow.

Regression tests are now failing, but I would appreciate more feedback at this point.

I am not sure of what the best place to insert this function would be, but currently I need to call it from both PromoteIntOp_SELECT() and WidenVecRes_SELECT(), both called on the VSELECT.

You land in WidenVecRes_SELECT if the VSELECT result type is illegal, and PromoteIntOp_SELECT if the VSELECT result type is legal. I'm surprised PromoteIntOp_SELECT doesn't do the right thing without your special-case code, though; could you trace through and figure out what's happening? What goes wrong after PromoteTargetBoolean promotes the SETCC to an appropriate type?

jonpa added a comment.Feb 6 2017, 11:17 PM

You land in WidenVecRes_SELECT if the VSELECT result type is illegal, and PromoteIntOp_SELECT if the VSELECT result type is legal. I'm surprised PromoteIntOp_SELECT doesn't do the right thing without your special-case code, though; could you trace through and figure out what's happening? What goes wrong after PromoteTargetBoolean promotes the SETCC to an appropriate type?

In PromoteIntOp_SELECT() (trunk), PromoteTargetBoolean() promotes the boolean result vector by creating a new SIGN_EXTEND node for it.

This new SIGN_EXTEND is then legalized immediately after, and the SETCC result vector again produces a call to PromoteIntegerOperand(). The SETCC has however been handled before, and GetPromotedInteger() returns here a SIGN_EXTEND_VECTOR_INREG of the previously handled SETCC. The SETCC here has a legal (widened) integer vector type. PromoteIntOp_SIGN_EXTEND then adds two more extensions: ANY_EXTEND and SIGN_EXTEND_INREG of this. This results in:

Type-legalized selection DAG: BB#0 'fun:'
SelectionDAG has 18 nodes:
  t0: ch = EntryToken
            t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %vreg0
            t4: v8i16,ch = CopyFromReg t0, Register:v8i16 %vreg1
          t19: v8i16 = setcc t2, t4, seteq:ch
        t22: v4i32 = sign_extend_vector_inreg t19
      t25: v4i32 = sign_extend_inreg t22, ValueType:ch:v4i1
      t6: v4i32,ch = CopyFromReg t0, Register:v4i32 %vreg2
      t8: v4i32,ch = CopyFromReg t0, Register:v4i32 %vreg3
    t14: v4i32 = vselect t25, t6, t8
  t16: ch,glue = CopyToReg t0, Register:v4i32 %V24, t14
  t17: ch = SystemZISD::RET_FLAG t16, Register:v4i32 %V24, t16:1

Here then is the extra SIGN_EXTEND_INREG (t25) that I tried to explicitly eliminate in my first version of the patch, which results in two unnecessary shifts on SystemZ.

As far as I can see now this is the only reason for handling in PromoteIntOp_SELECT(). Could perhaps this extra SIGN_EXTEND_INREG be avoided in the first place instead somehow?

I think we'll eliminate the redundant sign-extend if you extend SelectionDAG::ComputeNumSignBits to handle SIGN_EXTEND_VECTOR_INREG the same way it handles SIGN_EXTEND.

jonpa updated this revision to Diff 87602.Feb 8 2017, 12:54 AM

Fixing ComputeNumSignBits() per your suggestion eliminates the unnecessary SIGN_EXTEND_VECTOR_INREG in several cases, but not in all.

In cases where the SETCC gets split, a VECTOR_SHUFFLE or BUILD_VECTOR is generated. I experimented with handling these cases by extending ComputeNumSignBits() for several more nodes, which seems to work. It resembles a bit the situation from before, where we also handled scalarized code after type legalization.

I also recall that the new method kind of depends on the fact that VSELECTs get split in DAGCombiner (line 6218). If that is not done, the mask adjustments in the end (extension / truncate) would not make sense. Should perhaps a check be added for this in the new method that this is indeed correct (or alternatively, update the comment in DAGCombiner)?

Feel free to keep this together for now if it's easier, but eventually the changes to ComputeNumSignBits should be split into separate commits, with their own testcases.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2873

Have you tried simplifying the handling of Cond using WidenTargetBoolean?

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3026 ↗(On Diff #87602)

It doesn't really make sense to handle VECTOR_SHUFFLE and BUILD_VECTOR together. Maybe for VECTOR_SHUFFLE you can reuse the code from CONCAT_VECTORS?

3036 ↗(On Diff #87602)

You aren't handling the case where ExtrOpBits > VTBits correctly (you have to subtract "ExtrOpBits - VTBits" bits from the result).

Can we avoid making BUILD_VECTORS in the cases where this would be necessary?

3049 ↗(On Diff #87602)

Bitcasts aren't transparent in general: the integer width changes, or the operand might not even be an integer.

Eli, are you saying you think it is the right thing to handle these cases "after the fact" in ComputeNumSignBits(), as opposed to avoiding this per the previous version of the patch? I first wanted to check your opinion on this, since you earlier wanted things done early when you said we shouldn't handle scalarized code and then vectorize it again.

There is a bit more work in ComputeNumSignBits() to handle these nodes. Would the patch(es) of this function be useful regardless of how we handle VSELECT..?

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2873

Calling WidenTargetBoolean() instead of making a new SETCC results in a CONCAT_VECTORS and SIGN_EXTEND nodes of the original SETCC. This is not good, because the SIGN_EXTEND never disappears.

WidenTargetBoolean() and PromoteTargetBoolean() seem to have as the only behavior to extend the boolean vectors, which doesn't get folded properly by the SystemZ backend. It works very well for SystemZ to assume here that the VSELECT will be lowered to an instruction which produces a bitmask of the same type as its operands, but I guess this may be SystemZ specific, so would it be better to match the SIGN_EXTEND->SETCC in the SystemZ backend rather than trying to avoid it during type legalization?

Type legalization for SIGN_EXTEND(SETCC) should do the right thing, as opposed to special-casing VSELECT(SETCC), because it's more general. For example, consider the IR sext(icmp <4 x i32>...).

Legalization should avoid scalarizing the inputs where possible; scalarization is the ultimate fallback when we can't figure out some other reasonable way to lower an operation, but it's never a good thing (both for compile-time and runtime performance).

Where it makes sense, we should improve our analysis of DAG nodes, rather than adding special cases; if an existing transform does the right thing given a higher-quality ComputeNumSignBits, it makes sense to fix ComputeNumSignBits rather than add more transforms. We have a lot of existing transforms using ComputeNumSignBits, so it's better to make them all more powerful.

Does that make sense?

jonpa updated this revision to Diff 88511.Feb 15 2017, 4:57 AM

Patch updated according to suggestions.

Type legalization for SIGN_EXTEND(SETCC) should do the right thing, as opposed to special-casing VSELECT(SETCC), because it's more general. For example, consider the IR sext(icmp <4 x i32>...).

With the improvements in ComputeNumSignBits(), the call to the special-case VSELECT(SETCC) handling is not as needed as before (called in one place).

I tried again to get rid of it, and did another attempt in using the DAGCombiner to handle SIGN_EXTEND_VECTOR_INREG. It removes BUILD_VECTOR -> EXTRACT_VECTOR_ELTs -> SIGN_EXTEND_VECTOR_INREG when the types of BUILD_VECTOR and the operand of SIGN_EXTEND_VECTOR_INREG are the same. Unfortunately, there were a few cases of VSELECT result widening that this didn't handle, so I could not remove that call still. Therefore, this new attempt in the DAGCombiner was fruitless and not part of the patch.

Bitcasts aren't transparent in general: the integer width changes, or the operand might not even be an integer.

These have to be handled here. Do you think the update patch is making sense? If we can't handle the bistcasts like this, I think I would have to go back to calling the new method also for these cases. I suppose then these improvements in ComputeNumSignBits() are not needed here anymore, but could still perhaps be part of a separate patch. (btw I think I see an assert in that method that this is indeed an integer type).

jonpa marked 3 inline comments as done.Feb 15 2017, 5:01 AM
jonpa added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3036 ↗(On Diff #87602)

Well, we don't have to handle them if we handle the VSELECT early...

3049 ↗(On Diff #87602)

Does this look ok now?

RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3042 ↗(On Diff #88511)

Shuffle can't safely reuse the concat implementation - if any mask index is set to -1 (UNDEF) then we know nothing about the sign bits. Typically it will be a random element from one of the sources but it could just as easily be something else depending on shuffle lowering.

jonpa updated this revision to Diff 89088.Feb 20 2017, 1:29 AM
jonpa marked 2 inline comments as done.

In ComputeNumSignBits():
Handle VECTOR_SHUFFLE separately.
Check that input type is an integer vector for BITCAST.

I suppose that this handling of BITCAST could possibly work, although it looks like it probably shouldn't be there. I would therefore suggest that that I go back to Eli's original suggestion of handling this early, by calling WidenVSELECT_SETCC() also in PromoteIntOp_SELECT(). That way, this never gets scalarized, and we then don't need any of the new handlings in ComputeNumSignBits(), even though they may be useful in other contexts.

I first show now what the patch looks like with the latest updates per your review, before I change it back. What do you think?

RKSimon added inline comments.Feb 20 2017, 5:15 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3049 ↗(On Diff #89088)

This isn't going to work - you need to check through the shuffle mask elements looking for -ve indices (which represent undefs) - if it occurs then NumSignBits = 1.

You also need to determine which of the 2 input operands are actually referenced:
Op0/LHS => 0 <= index < NumElts
Op1/RHS => NumElts <= index < (2*NumElts)

And then determine the minimum NumSignBits of those operand(s).

Look at the SelectionDAG::computeKnownBits ISD::VECTOR_SHUFFLE implementation for reference.

jonpa updated this revision to Diff 89113.Feb 20 2017, 6:56 AM
jonpa marked an inline comment as done.

Ooops! Fixed now, hopefully.

However, it seems that the handling for VECTOR_SHUFFLE is actually not needed anymore. My only guess is that this is due to some other recent change. This would actually be removed at least from my first commit(s), then.

Again, are you fine with handling all these cases early per my previous comment?

RKSimon edited edge metadata.Feb 20 2017, 8:02 AM

Ooops! Fixed now, hopefully.

However, it seems that the handling for VECTOR_SHUFFLE is actually not needed anymore. My only guess is that this is due to some other recent change. This would actually be removed at least from my first commit(s), then.

Not needed possibly due to D29454 - please drop it then as its almost certainly not being tested properly.

Again, are you fine with handling all these cases early per my previous comment?

Yes if promotion can stop all of these cases then its probably a better place to handle it.

jonpa updated this revision to Diff 89171.Feb 21 2017, 1:55 AM
jonpa marked an inline comment as done.

Yes if promotion can stop all of these cases then its probably a better place to handle it.

OK, I changed back so this is handled early, and the ComputeNumSignBits() changes are removed.

I got regressions on several targets when applying this. I could eliminate a few by adding more checks before transformation. Left are target tests that I can't really judge if things have improved for that target and if so the test should be updated, or if the patch is flawed. These are failing with the patch as it is:

LLVM :: CodeGen/ARM/vuzp.ll
LLVM :: CodeGen/X86/2011-10-19-widen_vselect.ll
LLVM :: CodeGen/X86/2011-10-21-widen-cmp.ll
LLVM :: CodeGen/X86/psubus.ll

To be honest, at the moment this method is working well basically under the assumptions of the SystemZ target, which for one thing only has vector registers of 128 bits. It also has its way of using a vector register as a bitmask to perform vector selects, and I am not sure if other targets do this the same way. So I would actually suggest that I move this to a custom lowering for SystemZ of VSELECT instead for now. Unless of course this is also a known issue on some other target, and somebody wants to jump in and help me finish the patch right away.

It would have been nice to just improve common code by improving DAGCombines of SEXT_INREG etc, but after trying all of that, it turns out that in order to handle all the cases, it is really preferred to do it early before it gets scalarized.

What do you think?

RKSimon added a subscriber: spatel.Feb 21 2017, 5:52 AM

I got regressions on several targets when applying this. I could eliminate a few by adding more checks before transformation. Left are target tests that I can't really judge if things have improved for that target and if so the test should be updated, or if the patch is flawed. These are failing with the patch as it is:

LLVM :: CodeGen/ARM/vuzp.ll
LLVM :: CodeGen/X86/2011-10-19-widen_vselect.ll
LLVM :: CodeGen/X86/2011-10-21-widen-cmp.ll
LLVM :: CodeGen/X86/psubus.ll

From a local build I've just ran all those regressions (remove unnecessary sign manipulations) look like improvements to me! Please can you include them to show the deltas, all of them can be regenerated by running llvm\utils\update_llc_test_checks.py

To be honest, at the moment this method is working well basically under the assumptions of the SystemZ target, which for one thing only has vector registers of 128 bits. It also has its way of using a vector register as a bitmask to perform vector selects, and I am not sure if other targets do this the same way. So I would actually suggest that I move this to a custom lowering for SystemZ of VSELECT instead for now. Unless of course this is also a known issue on some other target, and somebody wants to jump in and help me finish the patch right away.

You at least need to add some SystemZ specific tests before making that decision but nothing I've seen so far makes me think it has to be target specific.

It would have been nice to just improve common code by improving DAGCombines of SEXT_INREG etc, but after trying all of that, it turns out that in order to handle all the cases, it is really preferred to do it early before it gets scalarized.

Can you give some examples of the patterns that we're missing? I'm trying to improve combining *_EXTEND_VECTOR_INREG instructions which probably has a lot of crossover.

jonpa updated this revision to Diff 89489.Feb 23 2017, 3:08 AM

Tests added for SystemZ.

Regenerated tests:
test/CodeGen/ARM/vuzp.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll

Can you give some examples of the patterns that we're missing? I'm trying to improve combining *_EXTEND_VECTOR_INREG instructions which probably has a lot of crossover.

As I remember it, it was not possible to get around the fact that if type-legalization widened a VSELECT with the previous behaviour, bad things happened which were not possible to handle in the DAGCombiner. So, the point was that the best thing was to avoid the scalarization by handling it early per this patch. This seems to be the right thing regardless of how the DAGCombiner is improved in the future. I could try to check back and get the details if you wish, or at least see if the ComputeNumSignBits() improvements would be useful generally.

Is this ok to commit now, or is there any assert or check that is needed in the new method in the case of any arbitrary target? Are the regenerated tests really ok, or should they be checked by respective target maintainers?

ARM and x86 test changes look fine.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2875

I still don't like that you're special-casing SETCC here... it isn't the only source of i1 vectors.

jonpa updated this revision to Diff 89651.Feb 24 2017, 5:12 AM

I still don't like that you're special-casing SETCC here... it isn't the only source of i1 vectors.

I see your point, but it appears to me that the current code should handle the majority of cases. Would it be acceptable to add a TODO comment about this for now, as done below?

I also removed the check for SETCC in PromoteIntOp_SELECT(), to pave the way for future extensions.

jonpa added inline comments.Feb 24 2017, 5:14 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2875

If there is anything that really should be handled and commited at the same as the SETCC, please give more details.

RKSimon added inline comments.Feb 24 2017, 9:56 AM
test/CodeGen/ARM/vuzp.ll
324–325

This comment and the test name need updating as vuzp isn't used anymore.

efriedma added inline comments.Feb 24 2017, 10:46 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2875

It would be nice to handle VSELECT(AND(SETCC, SETCC), X, Y). I can't think of anything else that's likely to come up frequently.

jonpa updated this revision to Diff 90017.Feb 28 2017, 5:51 AM

OK, I reworked this so that it can also handle logical combinations (and/or/xor) of two SETCCs.

I again thought it would have been nice to do this during pre type-legalize DAGCombiner, but it really must wait due to the necessity of widening, which isn't done in DAGCombiner. So this is done now in three places: when a VSELECT is handled for result widening, arg promotion or result splitting.

There was previously a handling for VSELECT in DAGCombiner, which also aimed to avoid scalarization. It didn't handle any other operand than SETCC (e.g. the AND), so I started experimenting and found that I could actually remove the splitting in DAGCombiner entirely, with improved results even. The scalarization is avoided with the new method instead.

The new WidenVSELECTAndMask() method minimizes the number of conversions between the two SETCCs and the logical op, and between the logical op and the VSELECT.

convertMask() is called to convert a e.g. SETCC or AND to the right VT.

New SystemZ tests for the AND / OR / XORs of two SETCCs:
To test all combinations would have been 480 tests per opcode, so I instead tried to find a mix that tested

  1. For each vector element type, at least one compare each of Widen / Legal / Split.
  2. Selects of either smaller, same, in-between or greater vector type.

This is now 180 tests without (systematic) commutation, for all three opcodes.
Is commutation needed? Should I instead test everything? (~1500 tests)

The vec-cmpsel with just cmp/select, has 109 tests.

Other tests:
AMDGPU/fmax_legacy.ll/@test_fmax_legacy_ogt_v3f32
Crashed, because it didn't work with the v3f32 type.
Added check in patch to avoid vector types which are not sized with a power of 2.

AMDGPU/vselect64.ll
Crashed, because the VSELECT should be scalarized, and type legalizer couldn't handle the output of the new method.
Added a check in patch VTWillScalarize(), which checks if a VT will be split all the way to 1 element. If so, it aborts.

NVPTX/f16x2-instructions.ll
improved and solved a TODO vectorization problem :-)

X86/2011-10-19-widen_vselect.ll:
one instruction changed place (pshufd)

X86/avx512-mask-op.ll
X86/psubus.ll
Don't know - big diff - need help.

X86/vselect-pcmp.ll
two instructions removed - legal?

X86/avx512-mask-op.ll

You're making this substantially worse because i1 vectors are legal in avx-512, so you're forcing unnatural transforms.

The other changes look fine, at a glance.

For your new SystemZ tests, CHECK-NOT can be useful to make sure you aren't generating something unexpected, but please make sure each test at least CHECKs that the compares/blends you expect are being correctly generated.

I'll look at the new code later.

tra added a subscriber: tra.Feb 28 2017, 11:57 AM

Thank you for the patch.

test/CodeGen/NVPTX/f16x2-instructions.ll
425–426

I think these are still needed (see cvt.f32... below) and should become CHECK-NOF16-DAG now.

jonpa updated this revision to Diff 90149.Mar 1 2017, 3:26 AM

You're making this substantially worse because i1 vectors are legal in avx-512, so you're forcing unnatural transforms.

Aha.
I would then expect getSetCCResultType() to return a vector of i1's. But since v32i1 is not a simple type, v32i32 is returned instead. So I had to split the VT and then check, and if a i1-vector then is returned from getSetCCResultType(), abort. I added this check to the new method.

I found that there still was a slight diff, because earlier the DAGCombiner split a VSELECT and VSETCC by making two new narrow SETCCs. The DAGTypeLegalizer instead just splits the SETCC's result vector. When I changed this behaviour so that it instead does what DAGCombiner used to do, the regression disappears. This is done by SplitVSETCC(), which is copied from DAGCombiner. I suppose it should go in as a common utility function somewhere?

Tests updated per review.

jonpa marked an inline comment as done.Mar 1 2017, 3:28 AM
jonpa added inline comments.
test/CodeGen/NVPTX/f16x2-instructions.ll
425–426

Is this looking ok?

tra added inline comments.Mar 1 2017, 10:29 AM
test/CodeGen/NVPTX/f16x2-instructions.ll
425–426

LGTM.

jonpa marked an inline comment as done.Mar 5 2017, 11:38 PM

ping.

For the SystemZ tests, would it make sense to auto-generate the CHECK lines use update_llc_test_checks.py?

Please clang-format the new code.

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
546

Does SplitVSETCC do something different from just calling SplitVector? Your patch doesn't show any changes to avx512-mask-op.ll.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2937

Could you just perform the SIGN_EXTEND/TRUNCATE to match the scalar size, then EXTRACT_SUBVECTOR/CONCAT_SUBVECTORS to match the number of elements? The invariants involving the relationship between the number of elements and the scalar sizes of MaskVT and ToMaskVT are confusing at best.

jonpa updated this revision to Diff 91167.Mar 9 2017, 5:56 AM
jonpa marked an inline comment as done.

Thanks for review - patch updated.

Please clang-format the new code.

done.

For the SystemZ tests, would it make sense to auto-generate the CHECK lines use update_llc_test_checks.py?

This results in

KeyError: "Triple 's390x-linux-gnu' is not supported"

Is the script expected to support SystemZ?

In any case, I have generated the tests so that they first CHECK the expected sequence, and then CHECK-NOT what is supposed to not be there. I could change this to a sequence of CHECK-NEXTs, instead if that's better...

lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp
546

Yes, it creates two SETCCs of half the size, instead of splitting the result vector of the wide SETCC. There is now change to that test anymore. See my comment from March 1st above.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2937

Yes, this seems better, thanks :-)

Is the script expected to support SystemZ?

Not by default, but adding a target tends to be quite trivial - you should be able to copy + tweak the arm or powerpc targets very easily.

It'd be great if you could use the script and then commit the new systemz tests with the current codegen to trunk so that this patch can show the diff.

jonpa updated this revision to Diff 91521.Mar 13 2017, 1:25 AM

Not by default, but adding a target tends to be quite trivial - you should be able to copy + tweak the arm or powerpc targets very easily.
It'd be great if you could use the script and then commit the new systemz tests with the current codegen to trunk so that this patch can show the diff.

I am new to python, but I tried your suggestion, and it seems to work at least for these two test files. A minus is that the CHECKs end up between the two lines of arguments. I suppose I should rewrite the tests to have them on a single line and then re-generate? Do the changes to the script look ok?

I am still waiting for approval of the common code changes. Eli, are you happy with it now after I rewrote it per your suggestion?

/Jonas

efriedma accepted this revision.Mar 15 2017, 12:35 PM

Please commit the new SystemZ tests separately from the code changes, so the commit shows how the generated code changes.

Otherwise LGTM.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2880

Needs to be #ifndef NDEBUG to avoid warnings in Release mode.

utils/update_llc_test_checks.py
74

You probably want to keep the BB#0 comment; that lets you verify that the first instruction in the function is actually the one you expect.

123

?

This revision is now accepted and ready to land.Mar 15 2017, 12:35 PM
jonpa updated this revision to Diff 91976.Mar 16 2017, 1:16 AM

Commited as r297930.

Thanks for review!

Just to make sure: I actually asked during review if the SplitVSETCC() function I copied into LegalizeTypesGeneric.cpp from DAGCombiner should become a class member instead somewhere. Did we miss that? I wasn't sure where the best place would be for it.

@uli: Do the new SystemZ tests and the script update look fine to you?

uweigand edited edge metadata.Mar 16 2017, 8:19 AM

@uli: Do the new SystemZ tests and the script update look fine to you?

LGTM.

@uli: Do the new SystemZ tests and the script update look fine to you?

LGTM.

If possible please can you add the script changes as a separate (pre) commit?

jonpa closed this revision.Mar 17 2017, 12:28 AM

@uli: Do the new SystemZ tests and the script update look fine to you?

LGTM.

If possible please can you add the script changes as a separate (pre) commit?

Committed r298048
M utils/update_llc_test_checks.py

LGTM.

tests commited as r298049

test/CodeGen/NVPTX/f16x2-instructions.ll