Page MenuHomePhabricator

[RISCV] Add support for vector saturating add/sub operations
ClosedPublic

Authored by frasercrmck on Jul 23 2021, 4:29 AM.

Details

Summary

This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.

Diff Detail

Event Timeline

frasercrmck created this revision.Jul 23 2021, 4:29 AM
frasercrmck requested review of this revision.Jul 23 2021, 4:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2021, 4:29 AM

These instructions can set the vxsat bit. How does that interact with other code around that might want to read the vxsat bit?

These instructions can set the vxsat bit. How does that interact with other code around that might want to read the vxsat bit?

It's a good question. The vxsat flag can only be read via a CSR, as far as I can tell.

I don't believe we expose access to CSRs via intrinsics. What guarantees do we give to software using inline asm that reads CSRs?

It appears we're modeling the VXSAT instruction defs and uses, though we're perhaps not modeling the VCSR super register correctly.

On a related note, given these instructions write VXSAT we'll probably see some write-after-write scheduling dependencies which I suspect would hamper performance. It's basically a write-only sticky flag so it might be legal to loosen those order dependencies somewhat.

These instructions can set the vxsat bit. How does that interact with other code around that might want to read the vxsat bit?

It's a good question. The vxsat flag can only be read via a CSR, as far as I can tell.

I don't believe we expose access to CSRs via intrinsics. What guarantees do we give to software using inline asm that reads CSRs?

It appears we're modeling the VXSAT instruction defs and uses, though we're perhaps not modeling the VCSR super register correctly.

On a related note, given these instructions write VXSAT we'll probably see some write-after-write scheduling dependencies which I suspect would hamper performance. It's basically a write-only sticky flag so it might be legal to loosen those order dependencies somewhat.

There was a proposal to add intrinsics wrapping inline assembly in https://github.com/riscv/rvv-intrinsic-doc/issues/36 Not sure why a decision hasn't been made on that issue. It looks like the P extension is also proposing to use the vxsat csr.

I don't believe we expose access to CSRs via intrinsics.

__builtin_readcyclecounter does already

Looks like ARM also has a saturating flag but haven't implemented the defined intrinsics for it due to complexities with needing to know when the bit is accessed. See https://reviews.llvm.org/D32282

craig.topper accepted this revision.Jul 26 2021, 12:05 PM

LGTM. We can cross the vxsat bridge when we get there.

This revision is now accepted and ready to land.Jul 26 2021, 12:05 PM

Thanks for the input. I suspect that any solution that involves builtins/intrinsics to read flags without explicit use/def chains is inherently "lossy" and the programmer can't guarantee that the compiler won't re-order or remove instructions. I'd imagine that we'd recommend that the programmer performs the saturation checks themselves in C.

If there were builtins like __builtin_add_overflow but for saturation and we wanted to use the saturation instructions then I think we'd have to clear the CSR before the operation.

This revision was landed with ongoing or failed builds.Jul 27 2021, 2:13 AM
This revision was automatically updated to reflect the committed changes.