Page MenuHomePhabricator

[RISCV] Don't use tail agnostic policy on instructions where destination is tied to source
ClosedPublic

Authored by craig.topper on Dec 28 2020, 4:45 PM.

Details

Summary

If the destination is tied, then user has some control of the
register used for input. They would have the ability to control
the value of any tail elements. By using tail agnostic we take
this option away from them.

Its not clear that the intrinsics are defined such that this isn't
supposed to work. And undisturbed is a valid implementation for agnostic
so code wouldn't even fail to work on all systems if we always used
agnostic.

The vcompress intrinsic is defined to require tail undisturbed so
at minimum we need this for that instruction or need to redefine
the intrinsic.

I've made an exception here for vmv.s.x/fmv.s.f and reduction
instructions which only write to element 0 regardless of the tail
policy. This allows us to keep the agnostic policy on those which
should allow better redundant vsetvli removal.

An enhancement would be to check for undef input and keep the
agnostic policy, but we don't have good test coverage for that yet.

Diff Detail

Event Timeline

craig.topper created this revision.Dec 28 2020, 4:45 PM
craig.topper requested review of this revision.Dec 28 2020, 4:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 28 2020, 4:45 PM
khchen added inline comments.Dec 28 2020, 8:30 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2151

If the tail policy is ignored on WritesElement0 instructions, why not set tail policy as tu?
Is it because spec said If anything, the default should be tail-agnostic/mask-agnostic, so software has to specify when it cares about the non-participating elements?

If we set anything as tu except undefined maskedoff, could it also make vsetvli removal easier.

craig.topper added inline comments.Dec 28 2020, 8:44 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2151

tu is bad for performance on a core implementing out of order or super scalar execution of vector instructions. It prevents an instruction from executing until the last instruction that wrote its destination register finishes executing even if they aren't related instructions. So I think it will be important for performance.

I left ta on the WriteElement0 instructions since that's what we use on vsetvli intrinsics so I was hoping to eliminate unnecessary vtype changes assuming ta is more common. But its admittedly a simple policy and we might need something smarter in the vsetvli removal pass to detect unneeded changes of the tail policy for instructions that don't care.

khchen accepted this revision.Dec 28 2020, 10:54 PM

thanks for fixup, LGTM.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2151

Got it, assuming ta is more common make sense to me.

This revision is now accepted and ready to land.Dec 28 2020, 10:54 PM