Details
Diff Detail
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1131 | I think if this is added we need to add extra lowering for FP16 constants of any kind, and it probably shouldn't be guarded by hasSVE but done in general. Any i16 constant can be materialized with a single mov, so the worst case is that plus a gpr->fpr, which is likely better than the load we currently produce. |
- Updated title and summary
- Made f16 immediates legal for hasFullFP16 rather than just SVE
- Added f16 fpimm patterns
- Updated tests
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
1131 | Hi Dave, thanks for having a look. I've had a go at doing this - let me know if it looks alright. |
Yeah seems OK as far as I can tell.
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
1629 | COPY's are quite loose in their definition, but it is probably best to keep them copying between values of the same size. I think this can use FMOVWHr directly, so long as the instruction has already been defined. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
1629 | Is HasFullFP16 relevant? The lowering requires no special instructions so I'd even argue the setOperationAction call requires no protection. What's important is whether the f16 type is legal. Once we get to operation lowering and isel we only need to protect instructions not types. We made this mistake for bf16 for SVE and it just lead to unnecessary complexity which was eventually removed. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
1629 | I've checked and FMOVWHr does require HasFullFP16 so the question becomes, should we allow good code generation even for the case when HasFullFP16 is not available? Or is HasFullFP16 so universally available these days for us not to care? |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
1629 | FMOVWHr requires FullFP16. It could use FMOVWSr instead, but then the types are a little awkward. The call to setOperationAction is currently inside a HasFullFP16 block, so It could be moved out and we could always be using generic instructions. If we do go that route then make sure that we have tests for it. |
- Use FMOVWHr instead of COPY_TO_REGCLASS
- Add let Predicates = [HasFullFP16]
- Removed code to copy from GPR32 to FPR16
- Updated a couple of recently added tests
LGTM. Thanks.
I've checked and FMOVWHr does require HasFullFP16 so the question becomes, should we allow good code generation even for the case when HasFullFP16 is not available? Or is HasFullFP16 so universally available these days for us not to care?
I think it's fine for this patch. It's a nice general improvement for when fp16 is available, and fixes the original issue of recognizing scalable splat constants.
I think if this is added we need to add extra lowering for FP16 constants of any kind, and it probably shouldn't be guarded by hasSVE but done in general.
Any i16 constant can be materialized with a single mov, so the worst case is that plus a gpr->fpr, which is likely better than the load we currently produce.
https://godbolt.org/z/r59arn5bq