This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Shrink integer operations into the smallest type possible
ClosedPublic

Authored by jmolloy on Oct 8 2015, 5:58 AM.

Details

Summary

C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 36848.Oct 8 2015, 5:58 AM
jmolloy retitled this revision from to [LoopVectorize] Shrink integer operations into the smallest type possible.
jmolloy updated this object.
jmolloy added reviewers: sbaranga, hfinkel, gberry, anemet.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
sbaranga edited edge metadata.Oct 8 2015, 7:47 AM

Hi James,

I generally really like the idea of using this DemandedBits analysis here. Otherwise, I have a few comments (inline).

Do you see any changes to lnt/spec performance with this patch?

Thanks,
Silviu

lib/Analysis/VectorUtils.cpp
441

Returning possibly large structs is not ideal. Would it be better for this function to take a reference to a map instead?

549

Should be sizeof(LeaderDemandedBits) instead of 64.

lib/Transforms/Vectorize/LoopVectorize.cpp
3146

Is there any way of doing this without leaving the ext/trunc pairs?

Maybe instead of generating a trunc look if we're doing it for an ext and if so use the ext operand instead.

3797

Stray whitespace change?

5316–5318

I think you wouldn't get the correct cost here (for ICmp)?

Hi Silviu,

Thanks for the comments, I'll address them all tomorrow. I don't have LNT or spec numbers right now, but I can get them for you tomorrow.

Cheers,

James

lib/Analysis/VectorUtils.cpp
441

Ah, but we're in C++11 now! I did have to check on google that returning a class will invoke the move constructor... apparently so!

549

Agreed. I will make this change.

lib/Transforms/Vectorize/LoopVectorize.cpp
3146

Yeah, I suppose that would be neater than relying on InstCombine. I'll take a look at this.

3797

Woops, will fix.

5316–5318

Ouch, you're right. I'll fix that.

jmolloy updated this revision to Diff 37083.Oct 12 2015, 1:40 AM
jmolloy edited edge metadata.

Hi Silviu,

Thanks for the review. Updated diff attached.

Cheers,

James

Thanks, James! I only have one more comment.

-Silviu

lib/Transforms/Vectorize/LoopVectorize.cpp
3146

It looks like this solution will leave us some zext instructions with no users. I think we can clean these up as well.

jmolloy updated this revision to Diff 37093.Oct 12 2015, 3:59 AM

Thanks Silviu - that's a simple enough change. Done.

sbaranga accepted this revision.Oct 12 2015, 4:17 AM
sbaranga edited edge metadata.

LGTM!

Regarding performance numbers: as long as we don't have regressions I think it's ok (having regressions with this change would have been suspicious).

Thanks,
Silviu

This revision is now accepted and ready to land.Oct 12 2015, 4:17 AM
jmolloy closed this revision.Oct 12 2015, 5:59 AM

Thanks Silviu - r250032.