This is an archive of the discontinued LLVM Phabricator instance.

[AggressiveInstCombine] Add `shufflevector` instr support to `TruncInstCombine`
Abandoned · Public

Authored by anton-afanasyev on Mar 22 2022, 8:05 AM.

Details

Summary

Add shufflevector instruction to the expression graph post-dominated by trunc,
allowing TruncInstCombine to reduce bitwidth of expressions containing these
instructions.
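
The property this relies on can be sketched outside LLVM: a shuffle only moves lanes around, so truncating each lane after the shuffle gives the same result as shuffling already-truncated lanes. The snippet below is a minimal Python model, not the patch's code; `trunc` and `shufflevector` are hypothetical helpers that operate on lists of integers, and the mask is assumed to contain no undefined lanes.

```python
def trunc(vec, bits):
    """Keep only the low `bits` bits of every lane (models an IR trunc)."""
    lane_mask = (1 << bits) - 1
    return [x & lane_mask for x in vec]


def shufflevector(a, b, mask):
    """Select lanes from the concatenation of a and b (models an IR shufflevector).

    Assumption: every mask index is in range [0, len(a) + len(b)).
    """
    lanes = a + b
    return [lanes[i] for i in mask]


# Two 4-lane vectors of 16-bit values and a two-input shuffle mask.
a = [0x1234, 0x00FF, 0x0A0B, 0xFFFF]
b = [1, 2, 3, 4]
mask = [0, 5, 2, 7]

# Wide form: shuffle in 16 bits, then truncate to 8 bits.
trunc_of_shuffle = trunc(shufflevector(a, b, mask), 8)

# Narrow form: truncate the operands first, then shuffle in 8 bits.
shuffle_of_trunc = shufflevector(trunc(a, 8), trunc(b, 8), mask)

assert trunc_of_shuffle == shuffle_of_trunc
```

Both forms yield the same lanes, which is why the shuffle can take part in a bitwidth-reduced expression graph at all. Whether the narrower shuffle is cheaper on a given target is a separate question.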

Fixes #54149

Diff Detail

Unit Tests: Failed

Event Timeline

Herald added a project: Restricted Project. · Mar 22 2022, 8:05 AM
Herald added a subscriber: hiraditya.
anton-afanasyev requested review of this revision. · Mar 22 2022, 8:05 AM
Herald added a project: Restricted Project. · Mar 22 2022, 8:05 AM
anton-afanasyev edited the summary of this revision. · Mar 22 2022, 8:06 AM

Do we expect that unused inputs of the shuffle have already been replaced with undef?

I don't see how this could be an issue. For instance, the @unary_shuffle() test case contains a shuffle with undef (line 25). Do you mean this case?

I mean: what if we have a two-input shuffle and one of the operands is unused according to the shuffle mask?
That operand could then be replaced with undef, which doesn't affect the narrowing, whereas the original operand might.
I guess it's mainly a theoretical question.
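
To make the question concrete, here is a small sketch of how one could tell from the mask alone whether an operand is referenced. It follows LLVM's convention that mask indices below the element count select from the first operand, indices from the element count up to twice that select from the second, and -1 marks an undefined lane. The function name `used_operands` is hypothetical and is not part of the patch.

```python
def used_operands(mask, num_elts):
    """Return (uses_first, uses_second) for a two-input shuffle mask.

    Indices in [0, num_elts) read the first operand, indices in
    [num_elts, 2 * num_elts) read the second, and -1 is an undefined lane.
    """
    uses_first = any(0 <= i < num_elts for i in mask)
    uses_second = any(num_elts <= i < 2 * num_elts for i in mask)
    return uses_first, uses_second


# The second operand is never read here, so it could be replaced with undef
# without changing the result of the shuffle.
only_first = used_operands([0, 0, 3, 1], 4)

# Both operands are read; lane 2 of the result is undefined.
both = used_operands([0, 5, -1, 2], 4)
```

If the second flag is false, whatever feeds that operand cannot influence the shuffle's result, so in principle it need not constrain the narrowing either.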

This might introduce regressions as the shuffle costs for the same mask but different element types can vary considerably (SSE v4i32/v4i16 unary shuffles are really cheap but v4i8 or v4i64 can be a lot more expensive).

anton-afanasyev abandoned this revision. · May 14 2022, 7:47 AM

FWIW we might be able to perform something similar inside VectorCombine

Do you mean using TTI.getShuffleCost()? There's an issue here: we don't know the exact shuffle type at the moment we need its cost. We infer this type (given by MinBitWidth) only after the expression graph has already been built. Handling this would mean refactoring the whole pass, which looks excessive.