This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve fshl cost modeling if 3rd arg is constant.
ClosedPublic

Authored by zjaffal on Mar 31 2023, 6:23 AM.

Details

Summary

In that case, the cost for i32 and i64 should be 1 (a single EXTR
instruction). For v4i32 and v2i64 it should be 3 (USHR + SHL + ORR).

Integer types smaller than i32 will get legalized to i32 and share its
cost.

This recovers a SLP regression revealed by D140392.

Diff Detail

Event Timeline

fhahn created this revision.Mar 31 2023, 6:23 AM
fhahn requested review of this revision.Mar 31 2023, 6:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2023, 6:23 AM
dmgreen added a comment.EditedApr 2 2023, 4:51 AM

Hello. I looked briefly into funnel shifts recently as I had noticed regressions (from a patch that was not in the end committed). Some things I am aware of:

  • If both operands are the same then it simplifies to a ror. I don't think that modifies the costs here though.
  • An extr with two different operands can be a higher cost instruction. Again I don't think that needs to change the costs though
  • For integers less than i32, one of the operands needs to be shifted so that the bits are in the top of the register. So the cost is 1 higher that i32/i64.
  • For vectors we should really be generating USHR+SLI, but we don't do that yet.
  • I believe everything that applies to fshl in this patch should also equally apply to fshr? So long as the shift amount is const the costs should all be the same.

It is probably worth adding a TODO/FIXME that non-const shifts are not yet added.

fhahn updated this revision to Diff 511032.Apr 5 2023, 2:53 AM

Add TODOs, use cost 2 for i8 and i16.

Hello. I looked briefly into funnel shifts recently as I had noticed regressions (from a patch that was not in the end committed). Some things I am aware of:

Thanks for taking a look!

  • If both operands are the same then it simplifies to a ror. I don't think that modifies the costs here though.
  • An extr with two different operands can be a higher cost instruction. Again I don't think that needs to change the costs though
  • For integers less than i32, one of the operands needs to be shifted so that the bits are in the top of the register. So the cost is 1 higher that i32/i64.

Yes, the thing I was struggeling with was how this could be modeled with CostTableLookup. It would automatically promote i8/i16 to MVT::i32 and I didn't find where it would account for the extra conversion cost. I added checking using the IR types for now.

  • For vectors we should really be generating USHR+SLI, but we don't do that yet.

That sounds like a good improvement!

  • I believe everything that applies to fshl in this patch should also equally apply to fshr? So long as the shift amount is const the costs should all be the same.

I *think* so. I'll create a test for fshr based on fshl.ll and then submit a follow-up patch if that sounds good to you?

It is probably worth adding a TODO/FIXME that non-const shifts are not yet added.

I added 2 TODOs as suggsted,

dmgreen added inline comments.Apr 13 2023, 1:56 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
531

I would just combine fshr into a single patch I think. They seem to be essentially the same.

539

Can you add i16 and i8 vector types too? And 64bit vector sizes like v2i32.
Can you add a FIXME that the cost could be lower if the codegen was better.

548–549

Why does this not use RetTy, as above?

551

What is this guarding against? Other vector types?

561

I don't thinks its "equal to i8 or i16", it should be "not equal to i32 or i64", and any smaller sizes should get a larger cost. Or any larger sizes not a multiple of 64.

llvm/test/Analysis/CostModel/AArch64/fshl.ll
199

v2i64

fhahn added a subscriber: zjaffal.Apr 17 2023, 3:46 AM

Thanks for the latest comments @dmgreen. I'll ask @zjaffal to take over this patch and address the comments.

zjaffal updated this revision to Diff 514648.Apr 18 2023, 8:22 AM

Add more types to CostTblEntry

use HigherCost boolean to determine the cost for scalar fshl intrinsics.

Thanks for taking this over.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
552

Is this correct, that it's breaking if it is an integer type?

559

!= 64 is covered by < 64

568

There is a debug message left here, which is coming out in the tests too.

572

Do you have any opinions of whether this should be TyL.first * Cost (with a Cost of 1/2) or TyL.first + Cost (with a cost of 0/1)? It might capture the "extra shift" a little better, for integer types at least.
As far as I can see NumElements is always 1 here?

llvm/test/Analysis/CostModel/AArch64/fshl.ll
232

Should this be with equal operands and i66? Otherwise it would not get past the isUniform check. Or is it intended to check the other code?
We are probably getting into pretty uncommon cases here either way :)

zjaffal commandeered this revision.Apr 19 2023, 2:23 AM
zjaffal added a reviewer: fhahn.
zjaffal added inline comments.Apr 19 2023, 5:00 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
572

As far as I can see NumElements is always 1 here?

I was trying to see if the same code works for vector instructions but I think it might be easier to do that in a different patch given the following code is an overestimate for the true cost.

Do you have any opinions of whether this should be TyL.first * Cost (with a Cost of 1/2) or TyL.first + Cost (with a cost of 0/1)? It might capture the "extra shift" a little better, for integer types at least.

I will test it now and see if there are any differences.

zjaffal updated this revision to Diff 514963.Apr 19 2023, 8:19 AM
  1. Merge fshr cost modeling with this patch.
  2. Remove unnecessary debug output
  3. Fix cost model for scalar types. Using TyL.first + Cost instead of multiplication.
dmgreen accepted this revision.Apr 19 2023, 10:57 AM

Thanks for taking this over and doing the updates. There are a few extra suggestions inline but otherwise LGTM.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
549

You could just pass Intrinsic::fshl if the costs are expected to be the same, so long as there is a comment explaining it. It would help reduce the size of the table.

564–569

I think this can just be unsigned Cost = HigherCost ? 1 : 0. The comment would be good to keep though.
Maybe call it "ExtraCost" too, to show it's an addition not the base cost.

llvm/test/Analysis/CostModel/AArch64/fshl.ll
4

If there isn't already one, then adding a llvm.fshr version of this test file would be good too.

236

Maybe add a test for i128 too? It might be relatively common from crypto functions and would help test one of the codepaths.

This revision is now accepted and ready to land.Apr 19 2023, 10:57 AM
This revision was automatically updated to reflect the committed changes.

I committed the patch on @zjaffal 's behalf as he is on vacation and this helps recover a regression we are seeing in our tracking.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
549

done in the committed version.

564–569

I tried that but it was causing some changes so I kept the original version for now. I updated the name.

llvm/test/Analysis/CostModel/AArch64/fshl.ll
4

yep, there's fhsr.ll which is a copy of fshl.ll with fshl replaced by fshr.

236