This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Properly handle partial writes in isConvertibleToVMV_V_V.
ClosedPublic

Authored by craig.topper on Jun 21 2023, 6:33 PM.

Details

Summary

We were only checking for the previous insructions to write exactly
the register or a super register. We ignored writes to a subregister
and continued searching for the producing instruction. We need to
abort instead.

There's another check inside the if body to abort if the registers
don't match exactly. So we just need to check for overlap so we
enter the if body.

The test is new and has not been commited yet.

Diff Detail

Event Timeline

craig.topper created this revision.Jun 21 2023, 6:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2023, 6:33 PM
craig.topper requested review of this revision.Jun 21 2023, 6:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2023, 6:33 PM

@reames FYI, I think this entire transformation may be conflating tail agnostic and tail undefined.

@reames FYI, I think this entire transformation may be conflating tail agnostic and tail undefined.

LGTM. But I am curious what does the "tail undefined" mean?

@reames FYI, I think this entire transformation may be conflating tail agnostic and tail undefined.

LGTM. But I am curious what does the "tail undefined" mean?

Philip tried to document that here https://reviews.llvm.org/D152937

The vector spec defines tail agnostic as being all ones or undisturbed (and an extra case for instructions that write masks).

If the user did something like

//vsetvli vlmax
vmv.v.i v8, v8, -1
//vsetvli less than vlmax with tail agnostic.
vadd.vi v8, v16, 1

They could expect the tail elements of vadd.vi to be all ones regardless of how hardware treats tail agnostic. The tail was preloaded with all ones. Tail agnostic for the vadd must either keep them undisturbed or replace them will all ones. I don't know if anyone would really do this. The spec discourages using the tail, but it defined how it behaves so a user could still depend on it.

Tail undefined is the concept that the upper bits can really be any value.

This transform uses the VL of a producing instruction to limit the VL of a copy. This means we don't copy the tail elements of the producing operation. So they're undefined. All we checked was the policy was "tail agnostic". We don't know if the user expected all ones.

For autovectorized code we never create anything where we expect the all ones value.

The C RVV intrinsic interface is way less clear on this. Any intrinsic that doesn't have a passthru operand, is "tail undefined" because the user has no control of register allocation. For intrinsics that "tail agnostic" but have an operand that is tied to the dest like maskedoff for masked instruction or the one of the sources for fma instructions, the user does have control of what register the tail comes from so could depend on the behavior.

I think we should document in the intrinsics API that agnostic really means undefined.

fakepaper56 accepted this revision.Jun 25 2023, 4:48 PM

Thank you for the elaboration.

This revision is now accepted and ready to land.Jun 25 2023, 4:48 PM
This revision was landed with ongoing or failed builds.Jun 25 2023, 11:09 PM
This revision was automatically updated to reflect the committed changes.