This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine][ARM] Combine pattern for REV16
ClosedPublic

Authored by SjoerdMeijer on Feb 5 2020, 2:41 AM.

Details

Summary

This adds another pattern matcher to the combiner to generate the REV16 instruction for a case that we were not handling.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Feb 5 2020, 2:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2020, 2:41 AM
RKSimon added inline comments.Feb 5 2020, 3:58 AM
llvm/test/CodeGen/Thumb2/thumb2-rev16.ll
2

Might be best if you regenerate+commit this file against trunk and then rebase the patch so it shows the full codegen diff.

Ah yes, thanks for the tip.

RKSimon added inline comments.Feb 5 2020, 5:54 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5657

This looks like it only works for MVT::i32 cases, but there doesn't seem to be any VT check?

5663

Use isConstOrConstSplat instead of dyn_cast<ConstantSDNode> ?

SjoerdMeijer marked an inline comment as done.Feb 5 2020, 7:33 AM
SjoerdMeijer added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5704

This looks like it only works for MVT::i32 cases, but there doesn't seem to be any VT check?

That is checked here, but I will add an assert to MatchBSwapHWordOrAndAnd

Thanks for looking. Comments addressed

spatel added inline comments.Feb 11 2020, 9:15 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5655

Use current formatting rules, so lowerCamel: "matchBSwap..."
(can fix the related function names as a preliminary step)

5658

Also assert that N->getOpcode() == ISD::OR ?

5667–5669

This is too specific - what if the operands of the "or" are commuted?

This patch should have a test for that possibility:

define i32 @bswap_ror_commuted(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_l8 = and i32 %l8, 4278255360
    %mask_r8 = and i32 %r8, 16711935
    %tmp = or i32 %mask_r8, %mask_l8
    ret i32 %tmp
}
5674–5675

These could use more descriptive names: ShiftAmt{1/2}.
These are using isConstOrConstSplat(), but the 'and' mask operands are not. Change the above code to match this?

Thanks for looking! Comments addressed. I have also added more test for the cases that shouldn't trigger.

spatel added inline comments.Feb 13 2020, 9:38 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5668–5671

This check works, but it makes me nervous because I think it requires demanded bits to alter the code for correctness.
Take this example where the masks are paired with the wrong shift op (and please add a test like this to the patch):

define i32 @not_rev16(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_r8 = and i32 %r8, 4278255360
    %mask_l8 = and i32 %l8, 16711935
    %tmp = or i32 %mask_r8, %mask_l8
    ret i32 %tmp
}

To be safe, I think we should enforce that the masks and shifts are paired correctly.
So we could do something like:

// Canonicalize mask ops to ensure that shift-left operand is on the left.
if (Mask2 == 0xff00ff00) {
  std::swap(N0, N1);
  std::swap(Mask1, Mask2)
}

or maybe better - go back to the earlier rev of this patch and call this function with the operands reversed:

if (SDValue BSwap = matchBSwapHWordOrAndAnd(TLI, DAG, N, N0, N1, VT,
                                            getShiftAmountTy(VT)))
  return BSwap;
// Try again with commuted operands.
if (SDValue BSwap = matchBSwapHWordOrAndAnd(TLI, DAG, N, N1, N0, VT,
                                            getShiftAmountTy(VT)))
  return BSwap;

Many thanks again for the feedback! Really liked that suggestion: so now calling the same function twice but with the operands swapped to check the commuted case. Have also added the suggested test case.

Many thanks again for the feedback! Really liked that suggestion: so now calling the same function twice but with the operands swapped to check the commuted case. Have also added the suggested test case.

There's 1 more concern for this transform - do we need to limit it when the intermediate values have other uses? This will be interesting because the answer may be different for different targets. Ie, for ARM/Thumb/AArch64, we're able to reduce the whole sequence to a single rev16, so it's always a good transform. But for x86, we're going to need a bswap and ror.

So we need to:

  1. Add a test like this:
define i32 @extra_maskop_uses2(i32 %a) {
    %l8 = shl i32 %a, 8
    %r8 = lshr i32 %a, 8
    %mask_l8 = and i32 %l8, 4278255360
    %mask_r8 = and i32 %r8, 16711935
    %or = or i32 %mask_r8, %mask_l8
    %mul = mul i32 %mask_r8, %mask_l8   ; use the mask ops for some other reason 
    %r = mul i32 %mul, %or              ; and use that result for some other reason
    ret i32 %r
}
  1. Copy that test (maybe the whole test file) over to llvm/tests/CodeGen/X86 and generate CHECK lines with utils/update_llc_test_checks.py.
  2. Possibly add some 'hasOneUse()' logic to the code to avoid regressions.

Please push the test changes to trunk before updating in this review (no need for pre-commit review unless you have questions/concerns).

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5664–5665

Nit: the input variables are "0" and "1", but this shifts the indexing to "1" and "2". It would be better to make this consistent (same for the "Shift" variables below this).

Most of the code in DAGCombiner uses "0"-based indexing.

Thanks, I will first add the tests (will indeed add the whole test file). Will do that on Monday, and after that, will return and revisit this patch

Fixed the off-by-1 error in the variable naming, and added hasOneUse checks for the AND instructions.

spatel accepted this revision.Feb 17 2020, 6:07 AM

LGTM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5664

This check seems over-protective. If only one of the 'and' ops has extra uses, the transform is probably still worth doing. It's ok to add a TODO comment here if you want and make that a small follow-up patch after adding more tests to exercise those cases.

llvm/test/CodeGen/X86/rev16.ll
227–228

Remove TODO comment - this is working as expected.
It also works with aarch64. The missed transform for arm/thumb seems to be because there's an early target-specific combine that creates "bfi", so we miss the generic pattern matching for bswap because it only runs with !LegalOperations. I'm not sure if/why we need that restriction.

This revision is now accepted and ready to land.Feb 17 2020, 6:07 AM

Thanks, this has been a good and useful introduction to DAGCombine for me :-)
Before committing, I will add a TODO saying that this hasOneUse is a bit over-protective for now (will follow up later), and will remove the TODO from the test case.

This revision was automatically updated to reflect the committed changes.