Page MenuHomePhabricator

[Intrinsic] Add sshl.sat/ushl.sat, saturated shift intrinsics.
ClosedPublic

Authored by ebevhan on Jul 6 2020, 5:00 AM.

Details

Summary

This patch adds two intrinsics, llvm.sshl.sat and llvm.ushl.sat,
which perform signed and unsigned saturating left shift,
respectively.

These are useful for implementing the Embedded-C fixed point
support in Clang, originally discussed in
http://lists.llvm.org/pipermail/llvm-dev/2018-August/125433.html
and
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058019.html

RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143195.html

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Was there an RFC for this?
While i agree it likely makes sense to have these for consistency,
i'm not sure why they are *needed* for implementing the Embedded-C fixed point support in Clang.

llvm/lib/IR/Verifier.cpp
5006–5009

I don't think it makes sense to limit these to scalars.

nikic added a subscriber: nikic.Jul 6 2020, 1:39 PM
ebevhan marked an inline comment as done.Jul 7 2020, 1:48 AM

Was there an RFC for this?

No explicit RFC for these particular intrinsics, but they have been mentioned in the larger scope of fixed-point support:

I notice now that Leonard's mail does not mention saturated shifts, but my older one does.

While i agree it likely makes sense to have these for consistency,
i'm not sure why they are *needed* for implementing the Embedded-C fixed point support in Clang.

Yes, "needed" might be a stronger wording than necessary. I originally wrote "useful" but was concerned it wasn't strong enough.
Of course, they aren't needed per se, but it becomes more of a hassle to select instructions for the operations if there are no intrinsics.

llvm/lib/IR/Verifier.cpp
5006–5009

The add.sat and sub.sat intrinsics were given vector operands because they were useful for some of x86's vector instructions. I couldn't see any such operations for shifts, but I can add the vector type support for consistency.

ebevhan updated this revision to Diff 275963.Jul 7 2020, 2:45 AM

Add vector support and TD isel nodes.

Thanks. This looks good to me in principle.

Alive2 support for these intrinsics: https://github.com/AliveToolkit/alive2/pull/448

llvm/include/llvm/CodeGen/ISDOpcodes.h
313–314

I'm not sure what left shift on 2 integers means.
Perhaps this needs some rewording.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
818

Op2Promoted = ZExtPromotedInteger(Op2);

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7402–7407

Have you checked if naive x != ((x << y) u/s>> y) results in worse lowering?

7419–7420

Why not just change predicate to ISD::SETUGE?

llvm/test/CodeGen/X86/sshl_sat.ll
10

Add i32 test while at it?

ebevhan updated this revision to Diff 276377.Jul 8 2020, 4:49 AM

Addressed review comments.

ebevhan marked 5 inline comments as done.Jul 8 2020, 4:51 AM
ebevhan added inline comments.
llvm/include/llvm/CodeGen/ISDOpcodes.h
313–314

I lifted it from the other node descriptions, but it doesn't really make sense for shift. I changed it a bit.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7402–7407

The CTLZ approach was the one that popped into my head first, so I went with that. But it does turn out that yours works a bit better, at least for sshl.sat, so I swapped it out.

Some more thoughts.

llvm/docs/LangRef.rst
14551

This should be

'``llvm.sshl.sat.*``' Intrinsics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14570

Here and elsewhere: i strongly suspect it should be s/saturation/saturating/

14600

same

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
808

Assert that we only ever get ISD::USHLSAT/ISD::SSHLSAT ?

817

Actually, why do we need to signext, or even zeroext it?
As the comment before function notes, we want anyext, we don't care about those new high bits,
because we are immediately going to shift them out.

----------------------------------------
Name: promote ushl
  %r = ushl_sat i8 %x, %y
  ret i8 %r
=>
  %x_wide = zext i8 %x to i16
  %y_wide = zext i8 %y to i16
  %t0 = shl i16 %x_wide, 8
  %t1 = ushl_sat i16 %t0, %y_wide
  %t2 = lshr i16 %t1, 8
  %r = trunc i16 %t2 to i8
  ret i8 %r

Done: 1
Transformation seems to be correct!

----------------------------------------
Name: promote sshl
  %r = sshl_sat i8 %x, %y
  ret i8 %r
=>
  %x_wide = zext i8 %x to i16
  %y_wide = zext i8 %y to i16
  %t0 = shl i16 %x_wide, 8
  %t1 = sshl_sat i16 %t0, %y_wide
  %t2 = ashr i16 %t1, 8
  %r = trunc i16 %t2 to i8
  ret i8 %r

Done: 1
Transformation seems to be correct!

So i think you want

SDValue Op1Promoted = GetPromotedInteger(Op1);
SDValue Op1Promoted = GetPromotedInteger(Op2);
unsigned ShiftOp = Opcode == ISD::USHLSAT ? ISD::SRL : ISD::SRA;

and maybe get rid of ShiftOp variable, or sink it closer to use.

ebevhan updated this revision to Diff 276410.Jul 8 2020, 6:44 AM

Addressed review comments.

ebevhan marked 5 inline comments as done.Jul 8 2020, 6:46 AM
ebevhan added inline comments.
llvm/docs/LangRef.rst
14551

I made the change to the rest of the saturating/fixedpoint intrinsics as well.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
817

Ah, that's true. I grabbed it from the ADDSUBSAT promotion without thinking. That needs the proper extension due to the min/max expansion, I think.

Patch as-is looks good but i'm not sure what's the RFC status here.
If these new intrinsics were already previously proposed as part of some RFC that got accepted,
can you state that in the patch's description? (with link to the thread)

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
817

Err, i was half-right i think. I guess we actually need to zext the shift amount.

ebevhan updated this revision to Diff 276432.Jul 8 2020, 7:37 AM

Fixed review comment and updated summary.

Fixed review comment and updated summary.

(note that updating commit msg does not automatically update description in phab differential)

ebevhan edited the summary of this revision. (Show Details)Jul 8 2020, 7:41 AM

Patch as-is looks good but i'm not sure what's the RFC status here.
If these new intrinsics were already previously proposed as part of some RFC that got accepted,
can you state that in the patch's description? (with link to the thread)

I added the links to the threads I mentioned earlier.

Looking back at the full discussion, it doesn't really seem like any real consensus regarding how to implement the types was reached, but the prevailing view was that an altogether new IR type was the best approach. I don't think either I or Leonard thought that was the right (or fastest, at least) way to go, though.

The final listing of intrinsics was in http://lists.llvm.org/pipermail/llvm-dev/2018-September/126311.html but the design has diverged a bit from that since then.

Fixed review comment and updated summary.

(note that updating commit msg does not automatically update description in phab differential)

I noticed! I was looking for an arc option to do that but couldn't seem to find one.

Patch as-is looks good but i'm not sure what's the RFC status here.
If these new intrinsics were already previously proposed as part of some RFC that got accepted,
can you state that in the patch's description? (with link to the thread)

I added the links to the threads I mentioned earlier.

Looking back at the full discussion, it doesn't really seem like any real consensus regarding how to implement the types was reached, but the prevailing view was that an altogether new IR type was the best approach. I don't think either I or Leonard thought that was the right (or fastest, at least) way to go, though.

The final listing of intrinsics was in http://lists.llvm.org/pipermail/llvm-dev/2018-September/126311.html but the design has diverged a bit from that since then.

i see.

I think this is fine, but just to be safe, may i suggest to do an RFC for these two intrinsics specifically,
just so we're 100% sure everyone is on the same page about them?

Fixed review comment and updated summary.

(note that updating commit msg does not automatically update description in phab differential)

I noticed! I was looking for an arc option to do that but couldn't seem to find one.

Sorry, it's just a repeating issue in many reviews :/

i see.

I think this is fine, but just to be safe, may i suggest to do an RFC for these two intrinsics specifically,
just so we're 100% sure everyone is on the same page about them?

Sure, I'll send one out.

efriedma added inline comments.
llvm/docs/LangRef.rst
14577

Not sure what "must be" means in this context; the shift amount is a variable, so we can't enforce anything about it statically.

Is it poison? Or undefined behavior? Or does the shift clamp to the min/max value?

lebedev.ri added inline comments.Jul 8 2020, 11:45 AM
llvm/docs/LangRef.rst
14577

Right, we should spell that out.
IMO this should be consistent with normal shifts, i.e. poison),

If ``b`` is (statically or
dynamically) equal to or larger than the number of bits in
``a``, this instruction returns a :ref:`poison value <poisonvalues>`.
If the arguments are vectors, each vector element of ``a`` is shifted
by the corresponding shift amount in ``b``.
ebevhan updated this revision to Diff 276683.Jul 9 2020, 3:08 AM

Addressed review comments.

ebevhan marked an inline comment as done.Jul 9 2020, 3:10 AM

Any more on this?

arsenm added a subscriber: arsenm.Wed, Jul 15, 5:38 PM

Can you also add the GlobalISel instructions etc.

llvm/docs/LangRef.rst
14767

Unrelated changes

15102

Separate change

arsenm added inline comments.Wed, Jul 15, 5:48 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
795

This code is exactly the same as for ADD/SUB sat and should be shared

ebevhan added inline comments.Thu, Jul 16, 4:36 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
795

The code is not exactly the same, but some of it is. I could try factoring it in. I find that code that does a lot is a bit harder to read, though.

Any more on this?

Still looks good to me, but i'd like to see some (positive?) feedback in the actual RFC llvm-dev thread from the usual suspects (@lattner @spatel @nikic)

llvm/docs/LangRef.rst
14767

Please just commit these changes to already-existing lines directly.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
795

While i'm usually against duplication, i'm not not very sure that in this case it is better to deduplicate.

ebevhan updated this revision to Diff 278494.Thu, Jul 16, 8:23 AM

Added GlobalISel opcodes. I'm not that familiar with GISEL, though, so I've probably not done it completely right.

ebevhan marked 3 inline comments as done.Thu, Jul 16, 8:55 AM
ebevhan added inline comments.
llvm/docs/LangRef.rst
14767

I factored out the other fixes into a separate commit.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
795

I deduplicated anyway. I don't think it ended up too bad.

nikic accepted this revision.Sun, Jul 19, 7:02 AM
nikic added a reviewer: arsenm.

LG apart from the GlobalISel part, which I know nothing about. @arsenm Could you please review the GlobalISel portion?

This revision is now accepted and ready to land.Sun, Jul 19, 7:02 AM
arsenm added inline comments.Mon, Jul 20, 9:43 AM
llvm/include/llvm/Target/GenericOpcodes.td
551

The shift amount type doesn't have to match the shift value type, so src2 should use type1

559

Ditto

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sshlsat.mir
18

These all failed to legalize.

You need something like this in AMDGPULegalizerInfo

getActionDefinitionsBuilder({G_SSHLSAT, G_USHLSAT})
  .scalarize(0)
  .clampScalar(0, S32)
  .lower();

to actually trigger any of the legalization code

ebevhan added inline comments.Mon, Aug 3, 7:12 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sshlsat.mir
18

Okay, I thought something was off.

Does this mean that any target that wants these legalized with the default legalization needs to specify this explicitly?

arsenm added inline comments.Mon, Aug 3, 7:40 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sshlsat.mir
18

Yes, nearly everything is opt-in in GlobalISel

ebevhan updated this revision to Diff 282833.Tue, Aug 4, 2:03 AM

Rebased; added more GlobalIsel legalization.

arsenm added a comment.Tue, Aug 4, 1:40 PM

GlobalISel parts LGTM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5930

Extra blank line here

lebedev.ri edited the summary of this revision. (Show Details)Tue, Aug 4, 1:48 PM

Still no replies to RFC thread :/

llvm/test/CodeGen/X86/sshl_sat.ll
12

It is best to put vector tests into a separate file from the getgo
See e.g. uadd_sat_vec.ll

llvm/test/CodeGen/X86/ushl_sat.ll
8

It seems i8 is the only "basic" bit width missing.
It would probably good to have it.

ebevhan updated this revision to Diff 283200.Wed, Aug 5, 4:48 AM

Added i8 test and moved vector test to separate file.

lebedev.ri accepted this revision.Wed, Aug 5, 5:01 AM

LGTM, but i would have preferred to see more feedback on the RFC thread.
That being said, unless you are planning on forming calls to these intrinsics in middle-end transform passes
(i.e. only planning on using them in clang codegen), i think this is okay to proceed.

LGTM, but i would have preferred to see more feedback on the RFC thread.
That being said, unless you are planning on forming calls to these intrinsics in middle-end transform passes
(i.e. only planning on using them in clang codegen), i think this is okay to proceed.

At least for now, I'm not planning to emit them in middleend. I wanted to get most of the basic support in Clang and LLVM before looking at optimization.

However, based on the discussions in D82663, I am planning on sending an RFC for moving the fixed-point semantics and value class to LLVM, and adding a Builder class similar to the MatrixBuilder.

bjope added a comment.Wed, Aug 5, 3:04 PM

LGTM, but i would have preferred to see more feedback on the RFC thread.
That being said, unless you are planning on forming calls to these intrinsics in middle-end transform passes
(i.e. only planning on using them in clang codegen), i think this is okay to proceed.

At least for now, I'm not planning to emit them in middleend. I wanted to get most of the basic support in Clang and LLVM before looking at optimization.

However, based on the discussions in D82663, I am planning on sending an RFC for moving the fixed-point semantics and value class to LLVM, and adding a Builder class similar to the MatrixBuilder.

I think middle-end rewrites such as constant folding, or rewriting (sshl.sat (sshl.sat X, Y), Z) -> (sshl.sat X, Y+Z) is OK.
As Roman said, we should probably think twice before we start rewrite arbitrary things into using these intrinsics (such as rewriting a saturated multiply into a saturated shift).

Still, we probably want to have a canonical form for things like "saturated multiplication by 2" in the future. I think that a lot of these saturated left shifts can be done using a saturated multiplication. Similarly saturated right shifts can often be described using saturated division. At least when the shift count is known. For unknown shift counts these intrinsics will make it much easier for a backend target that has such shifts in the instruction set to do the right thing.

This revision was landed with ongoing or failed builds.Fri, Aug 7, 6:10 AM
This revision was automatically updated to reflect the committed changes.