This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add RISCV-specific TargetTransformInfo
ClosedPublic

Authored by lenary on Jun 17 2019, 8:27 AM.

Details

Summary

LLVM Allows Targets to provide information that guides optimisations
made to LLVM IR. This is done with callbacks on a TargetTransformInfo object.

This patch adds a TargetTransformInfo class for RISC-V. This will allow us to
implement RISC-V specific callbacks as they become necessary.

This commit also adds the getIntImmCost callbacks, and tests them with a simple
constant hoisting test.

Previous review was on D63007

Diff Detail

Repository
rL LLVM

Event Timeline

lenary created this revision.Jun 17 2019, 8:27 AM

This is blocked on D62857

lenary updated this revision to Diff 205085.Jun 17 2019, 8:32 AM
lenary added a subscriber: MaskRay.
asb added a subscriber: ributzka.Jun 18 2019, 2:00 AM

Thanks for the patch! I note that test/CodeGen/RISCV/{imm-cse.ll,split-offsets.ll} have codegen changes with this. imm-cse.ll is a neutral change but split-offsets.ll seems to be a regression in instruction count that merits investigation.

With that addressed I _think_ this is fine. One aspect I'm struggling to determine one way or the other is whether the behaviour you implement here of returning TCC_FREE for getIntImmCost(const APInt &Imm) for a 12-bit value (which can be merged into an ADDI) is correct, or whether that case should only be TCC_FREE if going through getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);. The PPC backend seems to handle it that way, while AArch64 handles it similar to how you do. From reading the getIntImmCost descriptions in TargetTransformInfo.h I probably would have assumed the PPC way was "most correct".

@ributzka - which behaviour did you intend when creating these APIs?

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
9 ↗(On Diff #205085)

"file a" is missing a word. Probably should be file contains a / file describes a / some other similar phrase

44 ↗(On Diff #205085)

Maybe I'm missing some C++ magic or this is a compatibility thing, but I don't think this line is needed?

luismarques added inline comments.Jun 18 2019, 2:26 AM
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
29 ↗(On Diff #205085)

You can add an assert message. E.g. assert(foo && "some message").

lenary updated this revision to Diff 205835.Jun 20 2019, 9:11 AM

Address review feedback:

  • Return TCC_Free in many more unknown cases, to prevent constant hoisting
  • Only return TCC_Free for 12-bit immediates in arithmetic operations
  • Use RISCVMatInt::getIntMatCost to accurately estimate materialisation cost
  • Remove some useless lines and update comments and asserts.
lenary marked 3 inline comments as done.Jun 20 2019, 9:12 AM

If I recall correctly, then getIntImmCost(const APInt &Imm) predates the ConstantHoisting pass and is also used by other optimizations. The purpose of that hook is to calculate the materialization cost of a constant and is context free.

I added getIntImmCost(unsigned Opcode/Intrinsic::ID IID, unsigned Idx, const APInt &Imm, Type *Ty), because many instructions on AArch64 were able to fold constants and getIntImmCost(const APInt &Imm) would not provide accurate information to identify expensive constants.

asb added a comment.Jun 21 2019, 12:17 AM

Thanks for the update Sam. In the previous version of the patch, int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty); just called int getIntImmCost(const APInt &Imm, Type *Ty), but this one adds some logic to the former function that makes an estimate based on RISC-V instruction set knowledge. Right now you're ignoring the Idx argument and assuming that all binops can take a 12-bit immediate. Did you consider adding similar logic to the PPC and AArch64 versions of this function, having the switch setting ImmIdx appropriately and then using that to determine the cost? It looks like it wouldn't be much more complex or lengthy than what you have no to have a small switch(...) for the opcodes, and would have the advantage that getIntImmCost would be "done".

One difficulty I have with this code is I don't fully understand why other backend implementations are written as they are:

  • Defaulting to TCC_FREE seems at first look needlessly pessimistic when it's not that hard to enumerate the operations which have native instructions with immediates. Though perhaps the thinking is that some of the more complex opcodes would be decomposed to simpler native instructions that have immediates, and so defaulting to constant hoisting would be a regression
  • It's unclear why AArch64's implementation assumes mul/div/rem can take an immediate argument, given afaik the native instructions are all reg-reg.
In D63433#1553128, @asb wrote:

Thanks for the update Sam. In the previous version of the patch, int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty); just called int getIntImmCost(const APInt &Imm, Type *Ty), but this one adds some logic to the former function that makes an estimate based on RISC-V instruction set knowledge. Right now you're ignoring the Idx argument and assuming that all binops can take a 12-bit immediate. Did you consider adding similar logic to the PPC and AArch64 versions of this function, having the switch setting ImmIdx appropriately and then using that to determine the cost? It looks like it wouldn't be much more complex or lengthy than what you have no to have a small switch(...) for the opcodes, and would have the advantage that getIntImmCost would be "done".

The first version of this patch took an optimistic view. It assumed that all IR operations could take a zero immediate (via the zero register) or a 12-bit immediate, and both of these were essentially "free". All other immediates were as expensive as materialising them would be (in terms of selectiondag nodes).

I am now taking a more conservative view. While I still assume that all operations can take a zero immediate (via the zero register) for free, I now assume all other immediate values are as expensive as materialising the integer, unless I have more information (like isAddLike).

I hoped this approach would work, but it turns out this hoists a lot of small constants (like i64 1). This doesn't sound like an issue, but it causes regressions in lots of the calling convention tests on rv32. I'm going to disable constant hoisting in the cases we don't know about.

I also had to switch off hoisting entirely for intrinsics because constant hoisting will gladly hoist an "expensive" argument, even if that intrinsic requires that argument is an immediate (via ImmArg). I am working on a separate patch for ConstantHoisting to prevent this issue.

One difficulty I have with this code is I don't fully understand why other backend implementations are written as they are:

  • Defaulting to TCC_FREE seems at first look needlessly pessimistic when it's not that hard to enumerate the operations which have native instructions with immediates. Though perhaps the thinking is that some of the more complex opcodes would be decomposed to simpler native instructions that have immediates, and so defaulting to constant hoisting would be a regression

I'm happy to introduce this switch block, including examining argument indexes to ensure they match the native instructions with immediates. This would indeed prevent hoisting where a complex operation might be lowered into simpler native instructions, but given the status before this patch is applied is that no hoisting is done at all, the patch cannot be a regression.

  • It's unclear why AArch64's implementation assumes mul/div/rem can take an immediate argument, given afaik the native instructions are all reg-reg.

I'm not clear on this either. I can look at the method history.

lenary updated this revision to Diff 205981.Jun 21 2019, 6:10 AM

Address Review Comments:

  • Make getIntMatCost significantly more conservative, to avoid regressions.
  • Make getIntMatCost prevent hoisting for all intrinsics to avoid a bug.
asb accepted this revision.Jun 21 2019, 6:25 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jun 21 2019, 6:25 AM
This revision was automatically updated to reflect the committed changes.