Page MenuHomePhabricator

[AArch64][SVE] Add initial backend support for FP splat_vector
ClosedPublic

Authored by cameron.mcinally on Feb 14 2020, 10:57 AM.

Details

Summary

This issue was found in D73711.

We'll also need separate changes to CodeGen wrt splat_vectors. There are some incorrect uses of scalable build_vectors (int and fp) scattered there. This is obvious when an assert to check for scalable build_vectors is added to DAG.getNode(...).

Notice the TODO in the f16 immediate splat tests. There seems to be a missing lowering for f16 constants. Instead of becoming ConstantFPs, the f16 constants are being explicitly loaded from the constant pool. Does anyone know where the special f16 constant lowerings live? I don't have a lot of experience with those.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2020, 10:57 AM

Fix bad copy-and-paste.

There seems to be a missing lowering for f16 constants. Instead of becoming ConstantFPs, the f16 constants are being explicitly loaded from the constant pool

See AArch64TargetLowering::isFPImmLegal ? We should generate an appropriate fmov with the right target features. I don't think "-mattr=+sve" implies those features, though.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
300

I'm not adding a pseudo-instruction is worthwhile if the only benefit is avoiding an INSERT_SUBREG.

313

What do we end up generating for a non-zero float immediate? We might need a pattern to avoid an extra mov in the general case.

In theory, we can generate other float immediates using the integer dup/dupm, but I guess most of them won't be useful for 32-bit or 64-bit floats. Some probably are, though; for example, you can generate 1.0 with dupm.

There seems to be a missing lowering for f16 constants. Instead of becoming ConstantFPs, the f16 constants are being explicitly loaded from the constant pool

See AArch64TargetLowering::isFPImmLegal ? We should generate an appropriate fmov with the right target features. I don't think "-mattr=+sve" implies those features, though.

Thanks, I'll take a look.

I also just noticed that the test file I added was renamed upstream to sve-vector-splat.ll. I'll move these new tests to that file.

llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
300

Good point. Will update it.

I'm not the original author (cherry-picked from D71712), so maybe I'm missing something though...

313

I'm fairly certain that support exists in D71712. I didn't include any of the shufflevector tests or patterns in this Diff though.

My intention was to cherry-pick something small for easy reviewing. If you'd like to see that support included in this patch, I'll add it.

efriedma added inline comments.Feb 18 2020, 10:53 AM
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td
313

Oh, that's fine, it doesn't need to be in the same patch. Just wanted to make sure you were considering it.

Update with @efriedma suggestions.

cameron.mcinally marked 5 inline comments as done.Feb 18 2020, 12:07 PM
efriedma added inline comments.Feb 18 2020, 1:08 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
878

I think setOperationAction(ISD::ConstantFP, MVT::f16, Legal) is going to cause a fatal error for some FP constants. Not that they would be impossible to lower appropriately, but I don't think we have the necessary patterns.

cameron.mcinally marked an inline comment as done.Feb 18 2020, 1:53 PM
cameron.mcinally added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
878

Ah, good call. That was also from D71712, but yeah, there are probably missing f16 patterns.

Are we're ok with not folding the f16 constants for now? Or I could wait on this patch too. Any preference?

efriedma added inline comments.Feb 18 2020, 2:30 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
878

I'm okay if we miss the optimization for now...

That said, I think making +sve imply +fullfp16 solves the immediate problem you're running into, where constantfp 0.0 is getting lowered to a constant pool.

Don't mark f16 as a legal ConstantFP for SVE.

That said, I think making +sve imply +fullfp16 solves the immediate problem you're running into, where constantfp 0.0 is getting lowered to a constant pool.

I don't have a lot of intuition built up around fp16, so I'm hesitant to change it. I will open another Diff linking the two feature flags though, so we can start a discussion.

efriedma accepted this revision.Feb 18 2020, 3:04 PM

LGTM, assuming the regression test still passes.

This revision is now accepted and ready to land.Feb 18 2020, 3:04 PM
This revision was automatically updated to reflect the committed changes.