Page MenuHomePhabricator

[AArch64] Try to re-use extended operand for SETCC with vector ops.
ClosedPublic

Authored by fhahn on Feb 24 2022, 6:52 AM.

Details

Summary

Try to re-use an already extended operand for SetCC with vector operands
feeding an extended select. Doing so avoids requiring another full
extension of the SET_CC result when lowering the select.

This improves lowering for certain extend/cmp/select patterns operating.
For example with v16i8, this replaces 6 instructions for the extra extension
with 4 separate selects.

This improves the generated code for loops like the one below in
combination with D96522.

int foo(uint8_t *p, int N) {
  unsigned long long sum = 0;
  for (int i = 0; i < N ; i++, p++) {
    unsigned int v = *p;
    sum += (v < 127) ? v : 256 - v;
  }
  return sum;
}

https://clang.godbolt.org/z/Wco866MjY

On the AArch64 cores I have access to, the patch improves performance of
the vector loop by ~10%.

This could be generalized per follow-ups, but the initial version
targets one of the more important cases in combination with D96522.

Alive2 modeling:

Diff Detail

Event Timeline

fhahn created this revision.Feb 24 2022, 6:52 AM
fhahn requested review of this revision.Feb 24 2022, 6:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2022, 6:52 AM

Sorry for the delay. It took a while to find the time to figure out what was wrong, but it turned out that was in the other patch.

The change sounds OK to me, although it might be useful to generalize it to handle more types. This seems like it is useful in a few more cases: https://godbolt.org/z/fo8MMrqrE. I'm not sure if this generalises to not just select uses.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18075

Do you have a test for multiple uses?

19910

Whitespace

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 2:43 AM
fhahn updated this revision to Diff 413756.Mar 8 2022, 3:39 AM
fhahn marked 2 inline comments as done.

Rebased on top of new test case, removed whitespace change.

Sorry for the delay. It took a while to find the time to figure out what was wrong, but it turned out that was in the other patch.

The change sounds OK to me, although it might be useful to generalize it to handle more types. This seems like it is useful in a few more cases: https://godbolt.org/z/fo8MMrqrE. I'm not sure if this generalises to not just select uses.

Thanks for taking a look! Agreed that it will be helpful for a couple of other type combinations. But maybe the generalization should be done as follow-ups?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18075

I added one in 3836003e87ee, thanks!

19910

Removed, thanks!

Thanks for taking a look! Agreed that it will be helpful for a couple of other type combinations. But maybe the generalization should be done as follow-ups?

Any reason to do that in a followup, as opposed to doing it here?

fhahn updated this revision to Diff 442032.Jul 4 2022, 2:12 AM

Update to lift restrictions on specific vector types.

Thanks for taking a look! Agreed that it will be helpful for a couple of other type combinations. But maybe the generalization should be done as follow-ups?

Any reason to do that in a followup, as opposed to doing it here?

Mostly time constraints :)

I finally had some time to generalize this to support vector types in general

This sounds good as far as I can tell. If we are dealing with a dag combine, do we need to worry about odd types at all? And I think we can include eq and ne too, if alive is to be believed.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18082

I think this can skip the getScalarType call: UseMVT.getScalarSizeInBits() <= Op0MVT.getScalarSizeInBits()

18104

Could these include eq and ne conditions too?

llvm/test/CodeGen/AArch64/vselect-ext.ll
156–157

Can you remove all of these with the lower cases.

fhahn updated this revision to Diff 442777.Jul 6 2022, 10:25 PM
fhahn marked 3 inline comments as done.

This sounds good as far as I can tell. If we are dealing with a dag combine, do we need to worry about odd types at all?

Do you mean worry in terms of whether it is optimal? There are a few test cases with odd types, and it seems like they are handled OK on that front.

And I think we can include eq and ne too, if alive is to be believed.

Thanks, I added support for EQ/NE. The type of extend shouldn't matter, as long as we chose the existing one.

dmgreen accepted this revision.Jul 7 2022, 12:11 AM

This sounds good as far as I can tell. If we are dealing with a dag combine, do we need to worry about odd types at all?

Do you mean worry in terms of whether it is optimal? There are a few test cases with odd types, and it seems like they are handled OK on that front.

I meant for awkward types like i23 and i128. They don't have to be optimal, so long as they work OK. Just so long as we have a couple of tests, to make sure it doesnt run into problems.

The change sounds good with a small simplification suggestion. LGTM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18103

I think this might be able to combine the if blocks together for the ne/eq conditions:

if (Op0SExt && (isSignedIntSetCC(CC) || isIntEqualitySetCC(CC))) {
  Op0ExtV = SDValue(Op0SExt, 0);
  Op1ExtV = DAG.getNode(ISD::SIGN_EXTEND, DL, UseMVT, Op->getOperand(1));
} else if (Op0ZExt && (isUnsignedIntSetCC(CC) || isIntEqualitySetCC(CC))) {
  Op0ExtV = SDValue(Op0ZExt, 0);
  Op1ExtV = DAG.getNode(ISD::ZERO_EXTEND, DL, UseMVT, Op->getOperand(1));
} else
  return SDValue();
This revision is now accepted and ready to land.Jul 7 2022, 12:11 AM
fhahn updated this revision to Diff 443064.Jul 7 2022, 2:28 PM

Rebase on top of additional tests.

fhahn retitled this revision from [AArch64] Try to re-use extended operand for SETCC with v16i8 operands. to [AArch64] Try to re-use extended operand for SETCC with vector ops..Jul 7 2022, 4:42 PM
fhahn edited the summary of this revision. (Show Details)
fhahn updated this revision to Diff 443087.Jul 7 2022, 4:48 PM

Simplify check as suggested.

fhahn added a comment.Jul 7 2022, 4:49 PM

This sounds good as far as I can tell. If we are dealing with a dag combine, do we need to worry about odd types at all?

Do you mean worry in terms of whether it is optimal? There are a few test cases with odd types, and it seems like they are handled OK on that front.

I meant for awkward types like i23 and i128. They don't have to be optimal, so long as they work OK. Just so long as we have a couple of tests, to make sure it doesnt run into problems.

Ah got it! I added a few tests with vectors of i13 and i15. Looks like this handled well, because they are first converted to legal vector types.

The change sounds good with a small simplification suggestion. LGTM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18082

Thanks, simplified!

18103

Simplified as suggested, thanks!

18104

Yes, updated!

llvm/test/CodeGen/AArch64/vselect-ext.ll
156–157

Yeah I removed those. They were added by accident :(

This revision was landed with ongoing or failed builds.Jul 7 2022, 4:51 PM
This revision was automatically updated to reflect the committed changes.