This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Update lowerFROUND to use masked instructions.
ClosedPublic

Authored by craig.topper on Jul 27 2022, 12:51 PM.

Details

Summary

This avoids a vmerge at the end and avoids spurious fflags updates.
This isn't used for constrained intrinsic so we technically don't have
to worry about fflags, but it doesn't cost much to support it.

To support I've extend our FCOPYSIGN_VL node to support a passthru
operand. Similar to what was done for VRGATHER*_VL nodes.

I plan to do a similar update for trunc, floor, and ceil.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 27 2022, 12:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 12:51 PM
craig.topper requested review of this revision.Jul 27 2022, 12:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2022, 12:51 PM

The copysign change LGTM, please separate that and land it. Once done, rebase so the remainder is easier to read.

I'd encourage you land the mild style changes too - e.g. Src, etc.. - but that's optional.

Rebase after changing FCOPYSIGN_VL

Rebase after doing a NFC reordering of the existing code.

This revision is now accepted and ready to land.Jul 28 2022, 12:27 AM
reames accepted this revision.Jul 28 2022, 8:21 AM

LGTM to me too.

Side note - this seems reasonably likely to be profitable for this specific case since all of the instructions are very likely to be scheduled together, but have you thought about the general applicability of a transform like this? Having the mask register class be so constrained could make this unprofitable if the instructions were intermixed with other computation using a different mask.

The "general transform" I'm noting here is that we have select mask, f(x), x -> f(x, mask). This is principle applies to any code sequence, and is not at all specific to floating point.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1914

Why is the explicit splat here needed? getConstantFP seems to do generate a splat internally, and the old code relied on that behavior. Why change it to create the scalar constant (explicitly) and then splat (explicitly)?

1928

Same question as above.

LGTM to me too.

Side note - this seems reasonably likely to be profitable for this specific case since all of the instructions are very likely to be scheduled together, but have you thought about the general applicability of a transform like this? Having the mask register class be so constrained could make this unprofitable if the instructions were intermixed with other computation using a different mask.

The "general transform" I'm noting here is that we have select mask, f(x), x -> f(x, mask). This is principle applies to any code sequence, and is not at all specific to floating point.

The beginning of a general patch for that was posted https://reviews.llvm.org/D130442 the first version folds tail undisturbed vmerge.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1914

The old code used getConstantFP with either a fixed vector type or scalable vector type. For fixed vector it would create a build_vector that would later be converted by lowerBUILD_VECTOR to VFMV_V_F_VL with a VL based on the fixed vector type. For scalable vector getConstantFP would create a SPLAT_VECTOR that would be treated as a VLMAX vfmv.v.f during isel.

Since we are converting fixed length vectors to scalable in order to use _VL nodes, we need to match how BUILD_VECTOR would be converted. Using getConstantFP with the ContainerVT would create a VLMax splat instead of using the VL from the fixed vector we started with. It would still work since the .vx pattern match in isel ignores the VL on the splat, but if it wasn't folded to a .vx instruction it would create a vsetvli toggle.

This revision was landed with ongoing or failed builds.Jul 28 2022, 10:05 AM
This revision was automatically updated to reflect the committed changes.
reames added inline comments.Jul 28 2022, 10:59 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1914

Ah yes, I've stumbled into this issue before.

I'm finding myself wondering if we need a version of SPLAT_VECTOR which takes an explicit VL, and represents a splat over those lanes only. On the other hand, the overlap between BUILD_VECTOR and SPLAT_VECTOR is already confusing enough. This is probably worthy of some offline discussion; we're not going to magically resolve this here.