Page MenuHomePhabricator

[RISCV] Support lowering FrameIndex

Authored by asb on Nov 9 2017, 9:53 AM.



Introduces the AddrFI "addressing mode", which is necessary simple because it's not possible to write a pattern that directly matches a frameindex.

Ensures callee-saved registers are accessed relative to the stackpointer. This is necessary as callee saved register spills are performed before the frame pointer is set.

Diff Detail


Event Timeline

mgrang added a subscriber: mgrang.Nov 9 2017, 10:10 AM
mgrang added inline comments.
333 ↗(On Diff #122260)

nit: Period after comment.

mgrang added inline comments.Nov 9 2017, 10:14 AM
74 ↗(On Diff #122260)

This will have an "unused variable 'Reg'" error in a non-asserts build. No?

asb updated this revision to Diff 122295.Nov 9 2017, 11:48 AM
asb marked an inline comment as done.
asb edited the summary of this revision. (Show Details)

Thanks Mandeep, updated the patch. I've moved a bit of logic from the successor patch (D39849) to this one, as I think it makes more sense to use the correct register in eliminateFrameIndex from the start, rather than adding the extra logic in the next patch.

74 ↗(On Diff #122260)

No, Reg gets used in the code below.

asb updated this revision to Diff 122635.Nov 13 2017, 4:22 AM

Rebase on latest LLVM and remove redundant return in eliminateFrameIndex.

sameer.abuasal edited reviewers, added: sabuasal; removed: sameer.abuasal.Nov 13 2017, 11:46 AM
apazos added inline comments.Nov 13 2017, 5:23 PM
94 ↗(On Diff #122635)

too many ( )

90 ↗(On Diff #122635)

you can make this the default value and eliminate the else.

122 ↗(On Diff #122635)

maybe move default to the first check.

asb updated this revision to Diff 122789.Nov 14 2017, 12:05 AM
asb marked 3 inline comments as done.

Updated patch to address review comments.

94 ↗(On Diff #122635)

The extra parens are actually necessary to avoid a compiler warning "warning: using the result of an assignment as a condition without parentheses [-Wparentheses]". This is a fairly ugly idiom and we can avoid the need for it altogether, so I've refactored the code to do so.

sabuasal edited edge metadata.Nov 16 2017, 12:30 PM

Hi @asb

Can you point me to a documentation for the addressing modes names you are using (ADDRii, MEMii ..) ? I can't find these on the GitHub page.

asb added a comment.Nov 16 2017, 12:40 PM

I should add a clarifying comment. These are codegen-only constructs that as far as I can see are required for frameindex lowering, in order to represent a frameindex+offset pair.

Why do you want the special load/store instructions?

71 ↗(On Diff #122789)

You should implement getFrameIndexReference in TargetFrameLowering (in any case). You could use it here instead of all this logic.

asb updated this revision to Diff 123759.Nov 21 2017, 3:52 AM
asb marked an inline comment as done.

Rebased and updated to implement getFrameIndexReference.

@kparzysz honestly I don't really want the special load/store instructions, but was unable to write a pattern that would allow frameindex+offset to be matched by the normal load/store. Perhaps I'm missing something obvious?

hi Alex,

this doesn't apply for me on ToT. what version is it re based on?

asb added a comment.Nov 21 2017, 12:30 PM

Just tested again with r318797. Applies cleanly, compiles, all tests pass.

FrameIndex cannot be matched directly in patterns, but it can be matched by a ComplexPattern, which then can be used in selection patterns.

We have this in Hexagon (the relevant part is the getTargetFrameIndex):

bool HexagonDAGToDAGISel::SelectAddrFI(SDValue &N, SDValue &R) {
  if (N.getOpcode() != ISD::FrameIndex)
    return false;
  auto &HFI = *HST->getFrameLowering();
  MachineFrameInfo &MFI = MF->getFrameInfo();
  int FX = cast<FrameIndexSDNode>(N)->getIndex();
  if (!MFI.isFixedObjectIndex(FX) && HFI.needsAligna(*MF))
    return false;
  R = CurDAG->getTargetFrameIndex(FX, MVT::i32);
  return true;

Then in

def AddrFI: ComplexPattern<i32, 1, "SelectAddrFI", [frameindex], []>;

And then some Pat:

def: Pat<(VT (Load (add (i32 AddrFI:$fi), ImmPred:$Off))),
         (VT (MI AddrFI:$fi, imm:$Off))>;
asb added a comment.EditedNov 21 2017, 1:32 PM

FrameIndex cannot be matched directly in patterns, but it can be matched by a ComplexPattern, which then can be used in selection patterns.

We have this in Hexagon (the relevant part is the getTargetFrameIndex):

Thanks Krzysztof. This patch does something similar, in that I define a ComplexPattern in order to match a FrameIndex. The bit I remember having problems with was using the operand produced by the complex pattern with my targets standard GPR+imm load instructions. Of course my complexpattern was trying to get both the base and offset from the frameindex. Sticking to _just_ matching the frameindex in the ComplexPattern like your Hexagon snippets does seems more promising. I'll have another look tomorrow.

asb added a comment.Nov 22 2017, 12:56 PM

I've had a good look at the alternative approach, whereby a ComplexPattern is used that _just_ matches the FrameIndex and extra load and store tablegen patterns are written using that. This also requires lowering FrameIndex in RISCVDAGToDAGISel::Select. If going that way, it probably makes sense to move HexagonDAGToDAGISel::isOrEquivalentToAdd to target-independent code rather than copying it in to RISCVDagToDAGISel.

I've managed to achieve identical codegen apart from one regression, where a simple varargs test case starts with:

t0: ch = EntryToken
      t39: i32 = add nuw FrameIndex:i32<-1>, Constant:i32<4>
    t55: ch = store<ST4[%va]> t0, t39, FrameIndex:i32<0>, undef:i32
      t9: i32,ch = CopyFromReg t0, Register:i32 %vreg2
    t11: ch = store<ST4[FixedStack-4](align=8)> t0, t9, FrameIndex:i32<-4>, undef:i32
      t13: i32,ch = CopyFromReg t0, Register:i32 %vreg3
    t15: ch = store<ST4[<unknown>]> t0, t13, FrameIndex:i32<-5>, undef:i32
      t17: i32,ch = CopyFromReg t0, Register:i32 %vreg4
    t19: ch = store<ST4[FixedStack-6](align=16)> t0, t17, FrameIndex:i32<-6>, undef:i32
      t21: i32,ch = CopyFromReg t0, Register:i32 %vreg5
    t23: ch = store<ST4[<unknown>]> t0, t21, FrameIndex:i32<-7>, undef:i32
      t25: i32,ch = CopyFromReg t0, Register:i32 %vreg6
    t27: ch = store<ST4[FixedStack-8](align=8)> t0, t25, FrameIndex:i32<-8>, undef:i32
      t29: i32,ch = CopyFromReg t0, Register:i32 %vreg7
    t31: ch = store<ST4[<unknown>]> t0, t29, FrameIndex:i32<-9>, undef:i32
  t60: ch = TokenFactor t58:1, t55, t11, t15, t19, t23, t27, t31
t44: ch,glue = CopyToReg t60, Register:i32 %X10, t58
    t4: i32,ch = CopyFromReg t0, Register:i32 %vreg1
  t7: ch = store<ST4[<unknown>]> t0, t4, FrameIndex:i32<-3>, undef:i32
t58: i32,ch = load<LD4[%2]> t7, FrameIndex:i32<-1>, undef:i32
t45: ch = RISCVISD::RET_FLAG t44, Register:i32 %X10, t44:1

ends up as the following (two addi when one would suffice):

	%vreg7<def> = COPY %X17; GPR:%vreg7
	%vreg6<def> = COPY %X16; GPR:%vreg6
	%vreg5<def> = COPY %X15; GPR:%vreg5
	%vreg4<def> = COPY %X14; GPR:%vreg4
	%vreg3<def> = COPY %X13; GPR:%vreg3
	%vreg2<def> = COPY %X12; GPR:%vreg2
	%vreg1<def> = COPY %X11; GPR:%vreg1
	SW %vreg1, <fi#-3>, 0; mem:ST4[<unknown>] GPR:%vreg1
	SW %vreg7, <fi#-9>, 0; mem:ST4[<unknown>] GPR:%vreg7
	SW %vreg6, <fi#-8>, 0; mem:ST4[FixedStack-8](align=8) GPR:%vreg6
	SW %vreg5, <fi#-7>, 0; mem:ST4[<unknown>] GPR:%vreg5
	SW %vreg4, <fi#-6>, 0; mem:ST4[FixedStack-6](align=16) GPR:%vreg4
	SW %vreg3, <fi#-5>, 0; mem:ST4[<unknown>] GPR:%vreg3
	SW %vreg2, <fi#-4>, 0; mem:ST4[FixedStack-4](align=8) GPR:%vreg2
	%vreg8<def> = ADDI <fi#-1>, 0; GPR:%vreg8
	%vreg9<def> = ADDI %vreg8<kill>, 4; GPR:%vreg9,%vreg8
	SW %vreg9<kill>, <fi#0>, 0; mem:ST4[%va] GPR:%vreg9
	%vreg10<def> = LW <fi#-1>, 0; mem:LD4[%2] GPR:%vreg10
	%X10<def> = COPY %vreg10; GPR:%vreg10
	PseudoRET %X10<imp-use>

One the one hand, the RISC-V backend is starting by focusing on correct and well tested output, with performance work coming later. This means sub-optimal codegen can be tolerated initially. On the other hand, it seems a shame to regress code generation in this way.

I'll have a look if there's a solution that doesn't result in too much additional complexity.

asb added a comment.Nov 23 2017, 3:26 AM

Just to update on my previous comment and to make sure nobody wastes time on this: looking at the issue with a fresh pair of eyes this morning the selection issues were rather straight-forward to resolve. Krzysztof's proposed solution seems cleaner overall and is now working as a drop-in replacement for the RV32I tests. I'll clean up my implementation, test against the full RV32IMAFD+RV64IMAFD patchset and refresh this patch.

asb updated this revision to Diff 124451.Nov 27 2017, 1:10 PM
asb edited the summary of this revision. (Show Details)

Rewrite to avoid the need for the fake *_FI instructions. I also move isOrEquivalentToAdd from HexagonISelDAGToDAG to the SelectionDAGISel base class, so we can make use of it.

asb updated this revision to Diff 125373.Dec 4 2017, 10:51 AM

No patch changes, just refreshing the diff. If we can get this reviewed, it unblocks the following two patches which already have reviews.

sabuasal accepted this revision.EditedDec 7 2017, 1:27 PM


@kparzysz should probably take a look too

This revision is now accepted and ready to land.Dec 7 2017, 1:27 PM
kparzysz added inline comments.Dec 7 2017, 1:33 PM
119 ↗(On Diff #125373)

This looks unused. Can it be removed?

asb updated this revision to Diff 126137.Dec 8 2017, 6:33 AM
asb marked an inline comment as done.

Rebase the patch and remove MEMii which is no longer needed.

asb added a comment.Dec 10 2017, 9:15 AM

@kparzysz, does this look good to you now?

kparzysz accepted this revision.Dec 10 2017, 2:22 PM

Yes, LGTM.

This revision was automatically updated to reflect the committed changes.