Page MenuHomePhabricator

[SelectionDAG] Provide default expansion for rotates

Authored by kparzysz on Jun 4 2018, 9:16 AM.



Implement default legalization of rotates: either in terms of the rotation in the opposite direction (if legal), or in terms of shifts and ors.

Implement generating of rotate instructions for Hexagon. Hexagon only supports rotates by an immediate value, so implement custom lowering of ROTL/ROTR on Hexagon. If a rotate is not legal, use the default expansion.

Diff Detail


Event Timeline

kparzysz created this revision.Jun 4 2018, 9:16 AM

Please split ReassociateOrForRotate into a separate patch; it's conceptually separate and likely to affect other targets.

3929 ↗(On Diff #149783)

This lowering is wrong (consider c==0, c==64). Maybe (or (shl x, (and c, w-1), (srl x, (and (sub 0, c), w-1)) instead?

kparzysz marked an inline comment as done.Jun 4 2018, 11:56 AM
kparzysz updated this revision to Diff 149820.Jun 4 2018, 11:59 AM
kparzysz retitled this revision from [SelectionDAG] Create rotates more aggressively, provide default expansion to [SelectionDAG] Provide default expansion for rotates.
kparzysz edited the summary of this revision. (Show Details)
kparzysz added a reviewer: eli.friedman.

Removed the rotation generation from this patch, applied Eli's suggestions.

RKSimon added a subscriber: RKSimon.
RKSimon added inline comments.
4985 ↗(On Diff #149820)

This might be useful as general helper function - add it to the DAGCombiner class for reuse?

3912 ↗(On Diff #149820)

Use OpVT.getScalarSizeInBits() - hopefully we might be able to use this for vector cases as well at some point soon.

3918 ↗(On Diff #149820)

Move the Sub = DAG.getNode into the if() case? I don't think its used anywhere else.

1359 ↗(On Diff #149820)

Is this relevant to rotation support?

1366 ↗(On Diff #149820)

Is this relevant to rotation support?

2100 ↗(On Diff #149820)

I don't think you need the getNode() here

kparzysz marked 4 inline comments as done.Jun 5 2018, 6:53 AM
kparzysz added inline comments.
2100 ↗(On Diff #149820)

I try to keep the SDValue->SDNode conversions explicit in general.

kparzysz updated this revision to Diff 149968.Jun 5 2018, 6:57 AM

Applied Simon's comments.

RKSimon added inline comments.Jun 5 2018, 7:29 AM
3923 ↗(On Diff #149968)

Is there any way that we can get here with a bit width that isn't power-of-two? In which case we'd have to adjust this and use UREM not AND........

kparzysz marked an inline comment as done.Jun 5 2018, 7:40 AM
kparzysz updated this revision to Diff 149982.Jun 5 2018, 7:43 AM

Replaced (and x, w-1) with (urem x, w).

kparzysz updated this revision to Diff 149987.Jun 5 2018, 7:57 AM

Remove incorrect assertion that was added in the previous patch. The version with urem works for non-power-of-2 as well.

Currently it's impossible to reach LegalizeDAG with an integer that isn't a power of two (only power-of-two integers can be specified as legal types), but I guess it's better to future-proof.

I like to see a test with explicit CHECK lines for the assembly generated using the legalization in SelectionDAGLegalize::ExpandNode, so we'll know if something changes.

Turns out the UREM approach has a downside: the UREM is legalized before it gets to the combiner, so on architectures that don't have div/mod, it will be expanded into a libcall. I'm going to change it back to an AND.

The testcase rotate.ll already has a couple of cases where the expansion takes place (f1, f3, f5, and f7), but now it only checks that there was no rotate used. I can add check for the actual assembly.

kparzysz updated this revision to Diff 150046.Jun 5 2018, 2:47 PM

Using UREM ended up calling runtime functions (since UREM is a libcall on Hexagon), so I switched it back to using AND. Eli's comment confirms that only types that are power-of-2 bits wide can exist at this point.

Per Eli's request, expanded the testcase to include checks for the actual expansion code.

efriedma added inline comments.Jun 6 2018, 3:09 PM
107 ↗(On Diff #150046)

This looks correct, but low quality. Trunk generates 3 instructions, the ideal rotate expansion would generate 5 instructions (I think), this is 6 instructions.

kparzysz updated this revision to Diff 150315.Jun 7 2018, 6:27 AM

Removed a redundant AND from rotate expansion.

kparzysz marked an inline comment as done.Jun 7 2018, 6:27 AM

LGTM @efriedma any more comments?

The AND you removed wasn't redundant: 32-0 is 32, not 0.

(When I said there was a redundant operation, I was referring to the second SUB.)

(When I said there was a redundant operation, I was referring to the second SUB.)

Yeah, the first sub came from ROTR->ROTL conversion, so it was actually 5 instructions for the ROTL alone. I think we should be able to optimize it further on targets where large shift amounts generate 0 (in DAG combiner?).

kparzysz updated this revision to Diff 150550.Jun 8 2018, 11:56 AM

It's now 5 instructions everywhere, and the and is back.

Hmm... yes, I guess the AND isn't necessary if the shift produces either zero or the input, which is true for every target I can think of. But (SHL x, 32) is defined to produce undef for a 32-bit input, not some target-specific result, so you can't depend on that. Might make sense to add a dedicated target-independent opcode to represent it.

This revision is now accepted and ready to land.Jun 11 2018, 5:03 PM
This revision was automatically updated to reflect the committed changes.