Page MenuHomePhabricator

[RISCV] Prevent re-ordering some adds after shifts

Authored by lenary on Jun 4 2019, 7:05 AM.



DAGCombine will normally turn a (shl (add x, c1), c2) into (add (shl x, c2), c1 << c2), where c1 and c2 are constants. This can be prevented by a callback in TargetLowering.

On RISC-V, materialising the constant c1 << c2 can be more expensive than materialising c1, because materialising the former may take more instructions, and may use a register, where materialising the latter would not.

This patch implements the hook in RISCVTargetLowering to prevent this transform, in the cases where:

  • c1 fits into the immediate field in an addi instruction.
  • c1 takes fewer instructions to materialise than c1 << c2.

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Diff Detail


Event Timeline

lenary created this revision.Jun 4 2019, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 4 2019, 7:05 AM
lenary updated this revision to Diff 202939.Jun 4 2019, 7:40 AM
  • Add commets about larger constants. These can be improved at a later date
Jim added a subscriber: Jim.Jun 4 2019, 7:32 PM
lenary updated this revision to Diff 203171.Jun 5 2019, 8:55 AM
  • Generalise optimisation to check materialisation cost
asb added inline comments.Jun 5 2019, 9:06 AM
860 ↗(On Diff #203171)

I was actually thinking it might be better to define getIntImmCost for RISC-V (which at least initially uses generateInstSeq, even if that might result in a little wasted work), then call that from here (that change might affect codegen in other areas, but should be an improvement). Arguably the introduction of getIntImmCost could make sense as a separate patch (that this one depends on), if a sensible standalone test case is straight forward.

lenary updated this revision to Diff 203312.Jun 6 2019, 2:34 AM
  • Abstract away calculation of Materialisation Cost
lewis-revill added inline comments.
869 ↗(On Diff #203171)

Isn't there an inaccuracy in this method of checking the materialization cost since RISCVMatInt::generateInstSeq always calculates the cost of materializing into a register? In this case we have instructions which might use immediates, but this will calculate the cost as being the same as a single-instruction materialization into a register followed by an instruction using a register.


addi rd, rs1, C

would appear to be the same cost as:

lui rs2, (C >> 12)
add rd, rs1, rs2
lenary updated this revision to Diff 203314.Jun 6 2019, 2:41 AM
  • Remove out-of-date comments
asb added inline comments.Jun 6 2019, 3:00 AM
78 ↗(On Diff #203314)

Should add a comment to document what this does, and to document that it really does calculate the cost of materialising an integer (i.e. doesn't take into account whether there might be an opportunity for merging it into an addi). You should also document that it is invalid to call this for a Val which can't be represented with 32Bits when Is64Bit is false (that triggers an assert in generateInstSeq - I think it's probably still ok to treat that as an API misuse rather than adding more explicit error handling).

lenary edited the summary of this revision. (Show Details)Jun 6 2019, 3:35 AM
lenary retitled this revision from [RISCV] Prevent hoisting some adds after shifts to [RISCV] Prevent re-ordering some adds after shifts.Jun 6 2019, 3:48 AM
lenary marked 3 inline comments as done.Jun 6 2019, 4:06 AM
lenary added inline comments.
869 ↗(On Diff #203171)

On line 860, we check that C1 (which you're calling C) will fit into an add immediate. If it will, that counts as "free", and we definitely want to prevent the re-ordering, and we don't even check the materialisation cost.

In fact, I'm going to update the patch to always allow the re-ordering if C1 << C2 will fit into an add immediate, because then we also know that materialisation of that constant is "free", and so we should allow the re-ordering because it might help later dagcombines.

lenary updated this revision to Diff 203525.Jun 7 2019, 3:21 AM
lenary marked an inline comment as done.
  • Allow Combine if C1 << C2 will fit into an immediate
  • Explain restrictions on RISCVMatInt::getIntMatCost
lenary marked an inline comment as done.Jun 7 2019, 3:21 AM
asb added a comment.Jun 7 2019, 11:40 PM

I added some minor comments, along with a bigger suggestion. What do you think about adding a getIntMatCost taking APInt and IsRV64, which will split the immediate into XLEN-sized chunks (see comment for reference to similar code in AArch64)?

848 ↗(On Diff #203525)

Actually, with the logic there now I guess we're checking that (add val, c1) is fewer instructions than (add, val', c1 << c2) which is a similar condition but not quite the same. i.e. both c1 and c1 << c2 might be materialisable with a single instruction, but if c1 << c2 is materialised using a single lui it's still unprofitable as we can't merge it into the add.

860 ↗(On Diff #203525)

shift immediate -> add immediate

861 ↗(On Diff #203525)

End comment with a full stop

870 ↗(On Diff #203525)

I'm not totally liking calling getIntMatCost with something that isn't logically equivalent to IsRV64 (you could of course have MVT::i64 on RV32 pre-legalisation). The cost of materialising a 64-bit constant on RV64 also isn't necessarily the same as materialising an i64 split into two i32 on RV2.

In an ideal world, we'd have a getIntMatCost taking an APInt+IsRV64, with similar logic to int AArch64TTIImpl::getIntImmCost(const APInt &Imm, Type *Ty) - i.e. splitting into 32-bit/64-bit chunks. Hard to imagine this being a big deal for this particular case, but it's good infrastructure to have.

lenary updated this revision to Diff 205064.Jun 17 2019, 6:49 AM

Address review feedback

  • Introduce new getIntMatCost(const APInt &Val, bool IsRV64) API, which can materialise much wider constants than the previous method.
  • Clarify and check grammar in comments.
lenary marked 4 inline comments as done.Jun 17 2019, 6:51 AM

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Why not do this the right way already?

@lebedev.ri @craig.topper

asb added a comment.Jun 17 2019, 11:13 AM

Thanks for the update Sam, I think there might be a minor correctness issue with getIntMatCost - let me know what you think.

It would be great to add a simple sanity check for the chunking logic. Perhaps there's a test case involving -1 that produces a different answer with the current version of the patch versus a version updated to use the type size for the bit size.

78 ↗(On Diff #205064)

Given you have a description in the header, I think this repeated description for the implementation is redundant (see "Don’t duplicate the documentation comment in the header file and in the implementation file"). Though maybe I missed a de facto standard elsewhere in the codebase for duplicating a reduced version. I don't feel strongly about this, so do feel free to keep if you prefer.

81 ↗(On Diff #205064)

I think this isn't going to give the desired result if Val is e.g. -1. In that case, getMinSignedBits is going to return 1, meaning the loop is only executed once regardless of the original type width. If the original type was e.g. an i64 on RV32, then two instructions are required to materialise the constant, yet the logic in this function will only return 1.

I think the solutation is to more closely mirror the AArch64TTIImpl::getIntImmCost function I mentioned before, and add a Type parameter (and maybe also adopt similar logic for sign-extending constants to be a multiple of the PlatRegSize).

41 ↗(On Diff #205064)

Nit: Shouldn't this be Is64Bit to to match the naming in the implementation?

asb added a comment.Jun 18 2019, 12:37 AM

In future, DAGCombine could do the check to see whether c1 fits into an add immediate, which might simplify more targets hooks than just RISC-V.

Why not do this the right way already?

@lebedev.ri @craig.topper

Now this patch has been extended to cover more cases (i.e. comparing materialisation cost rather than just identifying the isLegalAddImmediate case), it wouldn't have much effect on this backend. A patch that uses isLegalAddImmediate in the relevant DAGCombine might be a small benefit for targets that don't implement the isDesirableToCommuteWithShift hook, but I don't think it would affect the implementation here (it's not obvious to me the hook implementation should assume isLegalAddImmediate had already been checked). So if it does make sense to add, I think it's definitely a separate patch to this one.

asb added inline comments.Jun 18 2019, 1:10 AM
13 ↗(On Diff #205064)

The patch has since been updated to do a direct cost comparison, rather than just looking at the case where the constant fits into an immediate. These two paragraphs should be updated to reflect that

lenary updated this revision to Diff 205350.Jun 18 2019, 7:55 AM

Address review feedback

  • Update getIntMatCost to take an integer size (in bits). This ensures we chunk the constant correctly for the legal types on the target, and account for the costs of all required chunks. I was unable to devise a simple test case for when this behaviour would not match the behaivour using getMinSignedBits, due to legalisation always splitting the wider type before the isDesirableToCommuteWithShift callback is called.

    The chunking will automatically expand each chunk to be the platform register width, so we don't need to sign extend the constant to be a multiple of that width before we start chunking.
  • Update and de-duplicate comments on tests and implementations
  • Update naming of IsRV64 parameter in RISCVMatInt.{h,cpp}
lenary marked 4 inline comments as done.Jun 18 2019, 7:58 AM
lenary added inline comments.
81 ↗(On Diff #205064)

I think I now cover chunking correctly. As in my message above, I don't need to sign extend Val before chunking, because each chunk is extended to be PlatRegSize within the loop.

asb accepted this revision.Jun 18 2019, 8:19 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jun 18 2019, 8:19 AM
This revision was automatically updated to reflect the committed changes.