This is an archive of the discontinued LLVM Phabricator instance.

Differential D118216

[RISCV] LUI used for address computation should not isAsCheapAsAMove
ClosedPublic

Authored by Luhaocong on Jan 25 2022, 11:46 PM.

Details

Reviewers

craig.topper
asb
jrtc27
frasercrmck
benshi001

Commits

rG23a50736004e: [RISCV] LUI used for address computation should not isAsCheapAsAMove

Summary

A LUI instruction whose operand carries the RISCVII::MO_HI flag is usually paired with an ADDI, and the two
together complete an address computation. To keep the cost evaluation of the address computation consistent,
such a LUI should not be treated as a cheap move on its own, which matches how ADDI is handled.

In this test case, the change improves the unrolled-loop code: previously the rematerialized array base
addresses were not eliminated by MachineCSE because of Heuristics #1 in isProfitableToCSE.
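
As background for the LUI/ADDI pairing described in the summary, here is a minimal sketch of how an absolute 32-bit address splits into the %hi and %lo parts. The helper names hi20, lo12 and materialize are illustrative only and are not LLVM APIs; the sketch assumes RV32 and an absolute (non-PC-relative) address.

```cpp
#include <cstdint>

// Upper 20 bits placed by LUI. Adding 0x800 first compensates for ADDI
// sign-extending its 12-bit immediate.
constexpr uint32_t hi20(uint32_t addr) {
  return ((addr + 0x800u) >> 12) & 0xFFFFFu;
}

// Low 12 bits as ADDI interprets them: a signed value in [-2048, 2047].
constexpr int32_t lo12(uint32_t addr) {
  return static_cast<int32_t>(addr & 0xFFFu) - ((addr & 0x800u) ? 0x1000 : 0);
}

// Models `lui rd, hi` followed by `addi rd, rd, lo` on RV32
// (unsigned arithmetic wraps modulo 2^32, as the register would).
constexpr uint32_t materialize(uint32_t hi, int32_t lo) {
  return (hi << 12) + static_cast<uint32_t>(lo);
}

// Round trip for an address whose low part is negative after sign extension.
static_assert(materialize(hi20(0x12345FFCu), lo12(0x12345FFCu)) == 0x12345FFCu,
              "LUI + ADDI must reproduce the original address");
```

Neither half is useful alone, which is the sense in which the summary says the two instructions jointly complete the address computation.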

Diff Detail

Unit Tests: Failed

Time	Test
60,040 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
60,080 ms	x64 debian > libFuzzer.libFuzzer::large.test

Event Timeline

Luhaocong created this revision. · Jan 25 2022, 11:46 PM

Herald added subscribers: VincentWu, luke957, achieveartificialintelligence and 27 others. · Jan 25 2022, 11:46 PM

Luhaocong requested review of this revision. · Jan 25 2022, 11:46 PM

Herald added a project: Restricted Project. · Jan 25 2022, 11:46 PM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD and 2 others.

Harbormaster completed remote builds in B145679: Diff 403147. · Jan 25 2022, 11:47 PM

Luhaocong retitled this revision from [RISCV][NFC] eliminate rematerialization ofarray's base address to [RISCV][NFC] eliminate rematerialization of array's base address. · Jan 26 2022, 12:00 AM

Luhaocong edited the summary of this revision.

Luhaocong added reviewers: craig.topper, asb, jrtc27, frasercrmck, benshi001.

Luhaocong added a parent revision: D118218: [RISCV] Pre-commit test for D118216. · Jan 26 2022, 12:05 AM

Please remove NFC from the title. You changed the behavior of a test. That's clearly not "No Functional Change"

Luhaocong retitled this revision from [RISCV][NFC] eliminate rematerialization of array's base address to [RISCV] eliminate rematerialization of array's base address. · Jan 26 2022, 5:21 PM

I'd appreciate more people kicking the tires on this, but it looks like an improvement to me. Doing the usual GCC torture suite recompile (obviously it's not representative of any real-world workload), this seems to be a win in almost every single case. Thanks for the patch.

This revision is now accepted and ready to land. · Jan 28 2022, 6:32 AM

update test case.

Harbormaster completed remote builds in B147864: Diff 406308. · Feb 6 2022, 7:00 PM

The only issue I have is that the description is specifically talking about arrays' base addresses but the change has wider-reaching consequences than just LICM: CSE, register allocation, sinking, etc. It's not even a loop-unroll specific change, right? This would apply to non-unrolled loops too, presumably.

So for me it'd be best for:
a) the commit title to reflect the code change itself
b) the description to describe why "MO_HI" is not as cheap as a move in principle
c) use array base address rematerialization as one motivating example.

This seems conceptually wrong to me. isAsCheapAsAMove is meant to model how expensive an instruction is to execute, but how you compute the immediate it uses has zero bearing on that. lui rd, 0x123 and lui rd, %hi(sym) if %hi(sym) happens to be 0x123 are completely indistinguishable from the perspective of the processor. To me this screams of something elsewhere getting cost modelling wrong.
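
To make the point about indistinguishable encodings concrete, here is a small sketch of the U-type encoding of LUI. The function names and the symbol address kSymAddr are hypothetical and chosen only so that %hi(sym) evaluates to 0x123; the bit layout (imm[31:12], rd[11:7], opcode 0110111) follows the RISC-V base ISA.

```cpp
#include <cstdint>

// U-type encoding of LUI: imm[31:12] | rd[11:7] | opcode 0110111 (0x37).
constexpr uint32_t encodeLUI(unsigned rd, uint32_t imm20) {
  return ((imm20 & 0xFFFFFu) << 12) | ((rd & 0x1Fu) << 7) | 0x37u;
}

// %hi(sym) for an absolute 32-bit address, as resolved at link time.
constexpr uint32_t hi20(uint32_t addr) {
  return ((addr + 0x800u) >> 12) & 0xFFFFFu;
}

// Hypothetical symbol address, picked so that %hi(sym) == 0x123.
constexpr uint32_t kSymAddr = 0x00123456u;

// `lui a0, 0x123` and `lui a0, %hi(sym)` end up as the same 32-bit word
// (a0 is register x10).
static_assert(hi20(kSymAddr) == 0x123u, "chosen address has %hi == 0x123");
static_assert(encodeLUI(10, 0x123u) == encodeLUI(10, hi20(kSymAddr)),
              "both forms encode identically");
```

Once the relocation is resolved, the hardware sees one instruction word in both cases, so any cost difference between the two has to come from how the compiler uses the value rather than from execution cost.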

In D118216#3305266, @jrtc27 wrote:

This seems conceptually wrong to me. isAsCheapAsAMove is meant to model how expensive an instruction is to execute, but how you compute the immediate it uses has zero bearing on that. lui rd, 0x123 and lui rd, %hi(sym) if %hi(sym) happens to be 0x123 are completely indistinguishable from the perspective of the processor. To me this screams of something elsewhere getting cost modelling wrong.

Maybe heuristic #1 should ignore isAsCheapAsAMove if the instruction isTriviallyRematerializable?
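
A rough sketch of what this suggestion could look like, using a deliberately simplified model rather than the real MachineCSE code. The Candidate struct and both functions are hypothetical. The baseline models heuristic #1 as "do not CSE a cheap computation unless the existing def is local or in an immediate predecessor"; the variant is one possible reading of the suggestion above.

```cpp
// Hypothetical summary of a CSE candidate; not an LLVM type.
struct Candidate {
  bool CheapAsMove;                  // what isAsCheapAsAMove reports
  bool TriviallyRematerializable;    // can be recomputed later at no real cost
  bool DefIsLocalOrInImmediatePred;  // existing def is in this block or an immediate predecessor
};

// Simplified heuristic #1: a cheap instruction is only CSE'd when the
// existing def is nearby, to avoid stretching live ranges.
constexpr bool heuristic1AllowsCSE(const Candidate &C) {
  return !C.CheapAsMove || C.DefIsLocalOrInImmediatePred;
}

// Suggested variant (an assumption about the intent): if the instruction
// can be trivially rematerialized, skip the cheapness check, since a later
// pass can recompute the value if register pressure becomes a problem.
constexpr bool heuristic1AllowsCSEVariant(const Candidate &C) {
  return C.TriviallyRematerializable || heuristic1AllowsCSE(C);
}
```

Under this model a distant, cheap, trivially rematerializable LUI would be rejected by the baseline and accepted by the variant, without changing what isAsCheapAsAMove reports for it.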

In D118216#3305287, @craig.topper wrote:

Maybe heuristic #1 should ignore isAsCheapAsAMove if the instruction isTriviallyRematerializable?

So the problem is that LUI _is_ currently marked as cheap as a move, since we set isAsCheapAsAMove to 1 on the TableGen record, and the default case falls back on that value. What this patch does is override that and make LUIs that are part of address computations _not_ as cheap as a move.
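
The before/after behaviour described here can be modelled with a few stand-in types. Everything below is a hypothetical sketch: Opcode, OperandFlag and Inst are not the real LLVM classes, and only the LUI case is modelled (the existing special cases for ADDI, ORI, XORI and the FSGNJ variants are left out).

```cpp
// Hypothetical stand-ins for LLVM types; none of these are the real classes.
enum class Opcode { ADD, ADDI, LUI };
enum class OperandFlag { None, Hi, Lo };  // Hi plays the role of RISCVII::MO_HI

struct Inst {
  Opcode Op;
  OperandFlag ImmFlag;       // target flag on the immediate operand
  bool TableGenCheapAsMove;  // the isAsCheapAsAMove bit from the TableGen record
};

// Before the patch: LUI falls through to the default and returns the TableGen bit.
constexpr bool isCheapBefore(const Inst &I) { return I.TableGenCheapAsMove; }

// After the patch: a LUI whose immediate carries the Hi flag is reported as
// not cheap; every other instruction keeps the TableGen default.
constexpr bool isCheapAfter(const Inst &I) {
  return I.Op == Opcode::LUI ? I.ImmFlag != OperandFlag::Hi
                             : I.TableGenCheapAsMove;
}
```

In this model a plain `lui rd, 0x123` stays cheap in both versions, while `lui rd, %hi(sym)` flips from cheap to not cheap, which is exactly the asymmetry the objection in this thread is about.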

modify description

In D118216#3305096, @frasercrmck wrote:

The only issue I have is that the description is specifically talking about arrays' base addresses but the change has wider-reaching consequences than just LICM: CSE, register allocation, sinking, etc. It's not even a loop-unroll specific change, right? This would apply to non-unrolled loops too, presumably.

So for me it'd be best for:
a) the commit title to reflect the code change itself
b) the description to describe why "MO_HI" is not as cheap as a move in principle
c) use array base address rematerialization as one motivating example.

Thank you very much for your useful advice and help.

Harbormaster completed remote builds in B148441: Diff 407102. · Feb 9 2022, 3:40 AM

In D118216#3305349, @jrtc27 wrote:

So the problem is that LUI _is_ currently marked as cheap as a move, since we set isAsCheapAsAMove to 1 on the TableGen record, and the default case falls back on that value. What this patch does is override that and make LUIs that are part of address computations _not_ as cheap as a move.

I agree with you. This patch is a workable but imperfect way of tying together the cost evaluation of an address computation.
The harder problem is how to build a cost model that can analyze several related instructions at once.

This revision was landed with ongoing or failed builds. · Feb 11 2022, 11:21 PM

Closed by commit rG23a50736004e: [RISCV] LUI used for address computation should not isAsCheapAsAMove (authored by Luhaocong, committed by benshi001).

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rG23a50736004e: [RISCV] LUI used for address computation should not isAsCheapAsAMove.

I objected and still believe this patch is fundamentally wrong. The problem needs solving elsewhere, not like this. Please revert.

In D118216#3316752, @jrtc27 wrote:

I objected and still believe this patch is fundamentally wrong. The problem needs solving elsewhere, not like this. Please revert.

Although this solution is far from perfect, it does improve code quality. Could you please show a test case for which this patch produces wrong or worse assembly?
I suggest we keep it and continue searching for better solutions.

In D118216#3316773, @benshi001 wrote:

Although this solution is far from perfect, it does improve code quality. Could you please show a test case for which this patch produces wrong or worse assembly?
I suggest we keep it and continue searching for better solutions.

I don't know, but I don't really care, it is blatantly wrong to say LUI rd, <imm> is not as cheap as a move, especially so to say LUI rd, %hi(x) isn't but LUI rd, x is, it's complete nonsense. There are lots of things you can commit that would improve codegen quality but are totally wrong and would get backed out immediately.

In D118216#3316828, @jrtc27 wrote:

I don't know, but I don't really care, it is blatantly wrong to say LUI rd, <imm> is not as cheap as a move, especially so to say LUI rd, %hi(x) isn't but LUI rd, x is, it's complete nonsense. There are lots of things you can commit that would improve codegen quality but are totally wrong and would get backed out immediately.

Here is a case showing the "wrong" behavior you described; it does exist. 's1' is unexpectedly spilled because of the inaccurate cost evaluation of lui s1, %hi(g).
Although this patch produced better codegen in most cases, it is really important to describe the cost of instructions accurately.
Thanks for your review. I will revert it and look for a more reasonable approach.

void func1(void);
void func2(void);

int g;

int foo(int a) {
  int ret = 0;
  ret += g;
  if (a > 0)
    func1();
  else
    func2();
  ret += g;
  return ret;
}

foo:
        addi    sp, sp, -32
        sd      ra, 24(sp)
        sd      s0, 16(sp)
        sd      s1, 8(sp)
        lui     s1, %hi(g)
        lw      s0, %lo(g)(s1)
        blez    a0, .LBB0_2
        call    func1
        j       .LBB0_3
.LBB0_2:
        call    func2
.LBB0_3:
        lw      a0, %lo(g)(s1)
        addw    a0, a0, s0
        ld      ra, 24(sp)
        ld      s0, 16(sp)
        ld      s1, 8(sp)
        addi    sp, sp, 32
        ret

benshi001 added a reverting change: rG0b93e90971c0: Revert "[RISCV] LUI used for address computation should not isAsCheapAsAMove". · Feb 17 2022, 1:27 AM

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVInstrInfo.cpp	2 lines
llvm/test/CodeGen/RISCV/unroll-loop-cse.ll	20 lines

Diff 406308

llvm/lib/Target/RISCV/RISCVInstrInfo.cpp

   case RISCV::FSGNJ_H:
     return MI.getOperand(1).isReg() && MI.getOperand(2).isReg() &&
            MI.getOperand(1).getReg() == MI.getOperand(2).getReg();
   case RISCV::ADDI:
   case RISCV::ORI:
   case RISCV::XORI:
     return (MI.getOperand(1).isReg() &&
             MI.getOperand(1).getReg() == RISCV::X0) ||
            (MI.getOperand(2).isImm() && MI.getOperand(2).getImm() == 0);
+  case RISCV::LUI:
+    return MI.getOperand(1).getTargetFlags() != RISCVII::MO_HI;
   }
   return MI.isAsCheapAsAMove();
 }

 Optional<DestSourcePair>
 RISCVInstrInfo::isCopyInstrImpl(const MachineInstr &MI) const {
   if (MI.isMoveReg())
     return DestSourcePair{MI.getOperand(0), MI.getOperand(1)};

llvm/test/CodeGen/RISCV/unroll-loop-cse.ll

 ; CHECK-NEXT: lui a1, %hi(x)
 ; CHECK-NEXT: lw a3, %lo(x)(a1)
 ; CHECK-NEXT: lui a2, %hi(check)
 ; CHECK-NEXT: lw a4, %lo(check)(a2)
 ; CHECK-NEXT: li a0, 1
 ; CHECK-NEXT: bne a3, a4, .LBB0_6
 ; CHECK-NEXT: # %bb.1:
 ; CHECK-NEXT: addi a1, a1, %lo(x)
-; CHECK-NEXT: lw a1, 4(a1)
+; CHECK-NEXT: lw a3, 4(a1)
 ; CHECK-NEXT: addi a2, a2, %lo(check)
-; CHECK-NEXT: lw a2, 4(a2)
-; CHECK-NEXT: bne a1, a2, .LBB0_6
+; CHECK-NEXT: lw a4, 4(a2)
+; CHECK-NEXT: bne a3, a4, .LBB0_6
 ; CHECK-NEXT: # %bb.2:
-; CHECK-NEXT: lui a1, %hi(x)
-; CHECK-NEXT: addi a1, a1, %lo(x)
 ; CHECK-NEXT: lw a3, 8(a1)
-; CHECK-NEXT: lui a2, %hi(check)
-; CHECK-NEXT: addi a2, a2, %lo(check)
 ; CHECK-NEXT: lw a4, 8(a2)
 ; CHECK-NEXT: bne a3, a4, .LBB0_6
 ; CHECK-NEXT: # %bb.3:
-; CHECK-NEXT: lw a1, 12(a1)
-; CHECK-NEXT: lw a2, 12(a2)
-; CHECK-NEXT: bne a1, a2, .LBB0_6
+; CHECK-NEXT: lw a3, 12(a1)
+; CHECK-NEXT: lw a4, 12(a2)
+; CHECK-NEXT: bne a3, a4, .LBB0_6
 ; CHECK-NEXT: # %bb.4:
-; CHECK-NEXT: lui a1, %hi(x)
-; CHECK-NEXT: addi a1, a1, %lo(x)
 ; CHECK-NEXT: lw a3, 16(a1)
-; CHECK-NEXT: lui a2, %hi(check)
-; CHECK-NEXT: addi a2, a2, %lo(check)
 ; CHECK-NEXT: lw a4, 16(a2)
 ; CHECK-NEXT: bne a3, a4, .LBB0_6
 ; CHECK-NEXT: # %bb.5:
 ; CHECK-NEXT: lw a0, 20(a1)
 ; CHECK-NEXT: lw a1, 20(a2)
 ; CHECK-NEXT: xor a0, a0, a1
 ; CHECK-NEXT: snez a0, a0
 ; CHECK-NEXT: .LBB0_6: