This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][CodeGen] Add codegen pattern for FLI instruction in experimental zfa extension
ClosedPublic

Authored by joshua-arch1 on Jan 11 2023, 5:49 PM.

Details

Summary

This patch implements experimental support for the RISCV Zfa extension as specified here: https://github.com/riscv/riscv-isa-manual/releases/download/draft-20221119-5234c63/riscv-spec.pdf, Ch. 25. This extension has not been ratified. Once ratified, it'll move out of experimental status.

This change adds codegen support for load-immediate instructions (fli.s/fli.d/fli.h).

Diff Detail

Event Timeline

joshua-arch1 created this revision.Jan 11 2023, 5:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 5:49 PM
joshua-arch1 requested review of this revision.Jan 11 2023, 5:49 PM
joshua-arch1 edited the summary of this revision. (Show Details)Jan 31 2023, 12:43 AM
joshua-arch1 edited the summary of this revision. (Show Details)Jan 31 2023, 9:54 PM
joshua-arch1 edited the summary of this revision. (Show Details)

Since the assmebly support patch for Zfa instructions except FLI has been accepted, could anyone please review this codegen patch?

craig.topper added inline comments.Feb 12 2023, 3:24 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1535–1548

I think we can't reach this line when Zfa is enabled now? getLoadFP32Imm/getLoadFP64Imm/getLoadFP16Imm return -1 for 0.0 and -0.0 right?

2067

FCVTMOD does not saturate. It returns the lower bits of the overflowed value. These instruction cannot be used to lower saturating conversion.

3935

Does this need to be qualified with Zfa?

8310

Does this need to be qualified with Zfa?

llvm/lib/Target/RISCV/RISCVISelLowering.h
114 ↗(On Diff #493508)

This description is incorrect. The FCVTMOD instructions do not saturate.

llvm/lib/Target/RISCV/RISCVInstrInfoZfa.td
136

This name is misleading. There is no fmvh.x.w instruction. FMV_X_W_FPR64 would be better.

291

Why do you need a COPY_TO_REGCLASS?

308

What is giving these patterns priority over the ones in RISCVInstrInfoF.td?

llvm/test/CodeGen/RISCV/double-zfa.ll
108

This doesn't look like the minimum value for normal value for double. I would expect the mantissa bits to be 0.

So it should be 0x0010000000000000 I think?

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bitcast.ll
422 ↗(On Diff #493508)

Why does this test change? It doesn't use Zfa.

craig.topper added inline comments.Feb 12 2023, 7:30 PM
llvm/lib/Target/RISCV/RISCVInstrInfoZfa.td
308

I think this works because patterns with UsesCustomInserter=1 have lower priority than patterns that don't use CustomInserter.

craig.topper added inline comments.Feb 12 2023, 7:32 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1536

This needs to be rebased.

llvm/lib/Target/RISCV/RISCVInstrInfoZfa.td
277

This isn't a "bitcast". It's an encoding conversion.

fpimm is now handled with custom code in RISCVISelDAGToDAG.cpp so this needs to be moved there.

Since the assmebly support patch for Zfa instructions except FLI has been accepted, could anyone please review this codegen patch?

I'd encourage you to land the underlying approved review. I can't speak for others, but I certainty prioritize review for changes which have fewer outstanding blocking issues. You could also split this into an FLI and non-FLI part with the same reasoning.

Since the assmebly support patch for Zfa instructions except FLI has been accepted, could anyone please review this codegen patch?

I'd encourage you to land the underlying approved review. I can't speak for others, but I certainty prioritize review for changes which have fewer outstanding blocking issues. You could also split this into an FLI and non-FLI part with the same reasoning.

Different from assembly support, I think there is nothing special with the codegen pattern for FLI instruction. It's unnecessory to split the codegen patch into an FLI and non-FLI part.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2067

So what's the difference between FCVTMOD and normal FP_TO_INT? Is NaN converted to zero in normal FP_TO_INT?

2067

What instruction sequence does FCVTMOD need to replace? I'm a bit confused.

llvm/test/CodeGen/RISCV/double-zfa.ll
108

It is IR syntax. I have written a simple case to confirm that this value is used to express 0x0010000000000000 in IR level.

Since the assmebly support patch for Zfa instructions except FLI has been accepted, could anyone please review this codegen patch?

I'd encourage you to land the underlying approved review. I can't speak for others, but I certainty prioritize review for changes which have fewer outstanding blocking issues. You could also split this into an FLI and non-FLI part with the same reasoning.

Different from assembly support, I think there is nothing special with the codegen pattern for FLI instruction. It's unnecessory to split the codegen patch into an FLI and non-FLI part.

The suggestion to split is to unblock parts of the patch. We would rather have small pieces make progress than stalling a whole patch when one part of it that can be isolated needs additional work. Smaller patches are much easier to review and re-review.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2067

FCVTMOD follows the semantics of JavaScript. It’s for being able writing JavaScript JITs for RISC-V. It’s not useful for C or Rust.

llvm/test/CodeGen/RISCV/double-zfa.ll
108

It’s the IR syntax for double just the hex representation of double.

Since the assmebly support patch for Zfa instructions except FLI has been accepted, could anyone please review this codegen patch?

I'd encourage you to land the underlying approved review. I can't speak for others, but I certainty prioritize review for changes which have fewer outstanding blocking issues. You could also split this into an FLI and non-FLI part with the same reasoning.

Different from assembly support, I think there is nothing special with the codegen pattern for FLI instruction. It's unnecessory to split the codegen patch into an FLI and non-FLI part.

The suggestion to split is to unblock parts of the patch. We would rather have small pieces make progress than stalling a whole patch when one part of it that can be isolated needs additional work. Smaller patches are much easier to review and re-review.

I see. I will split these patch into three small ones. One for FLI, one for FCVMOD and another for other instructions.

Codegen patch for instructions except FLI and FCVTMOD has been commited. https://reviews.llvm.org/D143982

Codegen patch for instructions except FLI and FCVTMOD has been commited. https://reviews.llvm.org/D143982

Important English point here. The patch has been *posted* not *committed*. Saying it's committed means that it was landed to the git repository. That appears to not be what you meant, and was actively confusing in this case.

joshua-arch1 retitled this revision from [RISCV][CodeGen] Add codegen pattern for experimental zfa extension to [RISCV][CodeGen] Add codegen pattern for FLI instruction in experimental zfa extension.Feb 16 2023, 4:39 AM
joshua-arch1 edited the summary of this revision. (Show Details)
craig.topper added inline comments.Feb 16 2023, 10:22 AM
llvm/test/CodeGen/RISCV/float-zfa.ll
123

Why 255.0? Is this a negative test?

joshua-arch1 marked 9 inline comments as done.Feb 19 2023, 5:33 PM

Ping.

llvm/test/CodeGen/RISCV/float-zfa.ll
123

Yep. I just want to ensure load floating-point immediates of other values will not generate FLI.

This revision is now accepted and ready to land.Feb 19 2023, 6:37 PM

LGTM

This CodeGen patch depends on some functions in https://reviews.llvm.org/D140460.

This comment was removed by craig.topper.
This revision was landed with ongoing or failed builds.Mar 6 2023, 10:27 PM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Mar 8 2023, 9:50 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
849

Why was this isPosZero check added? It wasn't there when the patch was approved.

joshua-arch1 added a comment.EditedMar 15 2023, 6:48 PM

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
849

Just in order to make sure loading +0.0 will not use fli.s.

craig.topper added a comment.EditedMar 15 2023, 7:09 PM

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
849

But why treat fli.s different than fli.h or fli.d?

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}

Is that because we cannot directly store a ConstantFP in DAG?

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}

Is that because we cannot directly store a ConstantFP in DAG?

The change is done in DAGCombiner::replaceStoreOfFPConstant. I think the idea is that FP constants are usually harder to create in registers than integer constants. Also CPUs usually have more integer resources than FP resources. It doesn't look like it can be disabled currently. I think all of the constants FLI handles can be done in 1 or 2 integer instructions so I'm not very concerned about this. It was more than that I might be concerned.

This comment was removed by joshua-arch1.

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}

Is that because we cannot directly store a ConstantFP in DAG?

The change is done in DAGCombiner::replaceStoreOfFPConstant. I think the idea is that FP constants are usually harder to create in registers than integer constants. Also CPUs usually have more integer resources than FP resources. It doesn't look like it can be disabled currently. I think all of the constants FLI handles can be done in 1 or 2 integer instructions so I'm not very concerned about this. It was more than that I might be concerned.

Will one FLI instruction have less cycles and perform better than two integer instructions?

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}

Is that because we cannot directly store a ConstantFP in DAG?

The change is done in DAGCombiner::replaceStoreOfFPConstant. I think the idea is that FP constants are usually harder to create in registers than integer constants. Also CPUs usually have more integer resources than FP resources. It doesn't look like it can be disabled currently. I think all of the constants FLI handles can be done in 1 or 2 integer instructions so I'm not very concerned about this. It was more than that I might be concerned.

Will one FLI instruction have less cycles and perform better than two integer instructions?

Maybe, but it will probably be microarchitecture dependent.

Delaying a store by an extra cycle doesn't seem like a big deal. Other instructions don't directly depend on a store except later loads. The compiler would often be able to see the load is to the same location as the store and forward the data without the load.

If we're storing the same value in a loop we should be hoisting the 2 integer instructions out of the loop. So they wouldn't be executed many times.

Maybe, but it will probably be microarchitecture dependent.

Delaying a store by an extra cycle doesn't seem like a big deal. Other instructions don't directly depend on a store except later loads. The compiler would often be able to see the load is to the same location as the store and forward the data without the load.

If we're storing the same value in a loop we should be hoisting the 2 integer instructions out of the loop. So they wouldn't be executed many times.

Even if we disable replaceStoreOfFPConstant in DAGCombiner, FP constant will still be converted to integer in OptimizeFloatStore() when legalizing.

This comment was removed by joshua-arch1.
craig.topper added a comment.EditedMar 21 2023, 8:38 PM

Anyone knows how to generate FLI from C-code? If I compile the following program, I cannot get FlI. ConstantFP will be converted to Constant in DAG.

void foo_double64 ()
{
  volatile double a;
  a = 0.0625;
}

You just need to use it to do some floating point arithmetic

void foo_double64 (double x)
{
  volatile double a;
  a = x + 0.0625;
}

or return a floating point value

double foo_double64 ()
{
  return 0.0625;
}

Is that because we cannot directly store a ConstantFP in DAG?

The change is done in DAGCombiner::replaceStoreOfFPConstant. I think the idea is that FP constants are usually harder to create in registers than integer constants. Also CPUs usually have more integer resources than FP resources. It doesn't look like it can be disabled currently. I think all of the constants FLI handles can be done in 1 or 2 integer instructions so I'm not very concerned about this. It was more than that I might be concerned.

For rv64 in O0, replaceStoreOfFPConstant will not convert FP constant into integers. However, for rv32, FP constant will be changed to integers both in O0 and O2. Is there any special consideration?

I can see that it won't ever do it for f32 on rv64 because it doesn't consider the possibility of using an i32 store when i32 isn't a legal type.

Beyond that I would need to see your test cases.