This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Don't use zero-stride vector load if there's no optimized u-arch
ClosedPublic

Authored by pcwang-thead on Nov 9 2022, 1:42 AM.

Details

Summary

For vector strided instructions, as the RVV spec says:

When rs2=x0, then an implementation is allowed, but not required, to
perform fewer memory operations than the number of active elements, and
may perform different numbers of memory operations across different
dynamic executions of the same static instruction.

So compiler shouldn't assume that fewer memory operations will be
performed when rs2=x0.

We add a target feature to specify whether u-arch supports optimized
zero-stride vector load. And we do vector splat optimization iff this
feature is supported.

This feature is enabled by default since most designs implement this
optimization.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2022, 1:42 AM
pcwang-thead requested review of this revision.Nov 9 2022, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2022, 1:42 AM
reames requested changes to this revision.Nov 10 2022, 9:38 AM
reames added a subscriber: reames.

I have no problem with adding a feature to disable this optimization, but there's two problems with this patch.

First, a bunch of existing CPUs do support this optimization. Your change doesn't update any of them, so this is strictly a regression.

Second, most designs I'm aware of do implement this optimization. As such, I really think this patch gets the default wrong. We should allow disabling optimizations for code targeting designs which don't optimize this case, but if we're targeting a generic riscv64 vector core, assuming this optimization should be the default.

Also, are you are of a specific design which doesn't optimize this case? If not, the complexity doesn't seem worthwhile.

This revision now requires changes to proceed.Nov 10 2022, 9:38 AM

Enbale this feature by default.

I have no problem with adding a feature to disable this optimization, but there's two problems with this patch.

First, a bunch of existing CPUs do support this optimization. Your change doesn't update any of them, so this is strictly a regression.

Second, most designs I'm aware of do implement this optimization. As such, I really think this patch gets the default wrong. We should allow disabling optimizations for code targeting designs which don't optimize this case, but if we're targeting a generic riscv64 vector core, assuming this optimization should be the default.

Thanks! I do agree with you and I have made it default now.

Also, are you are of a specific design which doesn't optimize this case? If not, the complexity doesn't seem worthwhile.

Yes, some of our taped-out low-end products didn't do this optimization.

craig.topper added inline comments.Nov 10 2022, 10:28 PM
llvm/lib/Target/RISCV/RISCV.td
455

FeatureNoOptimizedZeroStrideLoad -> TuneNoOptimizedZeroStrideLoad.

Rename FeatureNoOptimizedZeroStrideLoad to TuneNoOptimizedZeroStrideLoad

pcwang-thead marked an inline comment as done.Nov 10 2022, 11:20 PM
pcwang-thead edited the summary of this revision. (Show Details)Nov 10 2022, 11:41 PM
reames accepted this revision.Nov 11 2022, 9:16 AM

LGTM

If you're interested in optimizing for such a target, I'd suggest a follow up. We should probably be canonicalizing in the other direction (i.e. replace a zero stride load with a load and splat). A zero stride load can probably be matched during gather lowering.

This revision is now accepted and ready to land.Nov 11 2022, 9:16 AM

LGTM

If you're interested in optimizing for such a target, I'd suggest a follow up. We should probably be canonicalizing in the other direction (i.e. replace a zero stride load with a load and splat). A zero stride load can probably be matched during gather lowering.

Thanks! I will have a try.

This revision was landed with ongoing or failed builds.Nov 13 2022, 9:52 PM
This revision was automatically updated to reflect the committed changes.