This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Separate base from offset in lowerGlobalAddress
ClosedPublic

Authored by sabuasal on May 16 2018, 4:16 PM.

Details

Reviewers

asb
apazos

Commits

rG1dc0a8fb1828: [RISCV] Separate base from offset in lowerGlobalAddress
rL332641: [RISCV] Separate base from offset in lowerGlobalAddress

Summary

When lowering a global address, lower the base as a TargetGlobalAddress first, then
create a separate SDNode for the offset and add it to the address calculation.

This creates a DAG in which the base address of a global is reused across different
accesses. The offset can later be folded into the immediate field of the memory access
instruction.

  
With this optimization we generate:
 
  lui a0, %hi(s)
  addi a0, a0, %lo(s) ; shared base address.

  addi a1, zero, 20 ; 2 instructions per access.
  sw a1, 44(a0)

  addi a1, zero, 10
  sw a1, 8(a0)

  addi a1, zero, 30
  sw a1, 80(a0)

  Instead of:

  lui a0, %hi(s+44) ; 3 instructions per access.
  addi a1, zero, 20
  sw a1, %lo(s+44)(a0)
 
  lui a0, %hi(s+8)
  addi a1, zero, 10
  sw a1, %lo(s+8)(a0)

  lui a0, %hi(s+80)
  addi a1, zero, 30
  sw a1, %lo(s+80)(a0)
 
  This saves one instruction per access; in exchange, the shared base costs two instructions (lui and addi) once.
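The trade-off in the sequences above can be sketched as a toy cost model. This is not LLVM code: the function names are made up for illustration, and the counts assume every access is a store of a constant to a different field of the same global, exactly as in the summary.

```cpp
#include <cassert>

// Toy cost model (hypothetical helpers, not part of LLVM): static
// instruction counts for storing constants to `accesses` fields of one global.

// Shared base: `lui` + `addi` once, then `addi` (value) + `sw` per access.
constexpr unsigned sharedBaseCost(unsigned accesses) {
  return accesses == 0 ? 0 : 2 + 2 * accesses;
}

// Folded offsets: `lui %hi(s+off)`, `addi` (value), `sw %lo(s+off)` per access.
constexpr unsigned foldedOffsetCost(unsigned accesses) {
  return 3 * accesses;
}

// Net instructions saved by sharing the base; negative means a regression.
constexpr int savings(unsigned accesses) {
  return static_cast<int>(foldedOffsetCost(accesses)) -
         static_cast<int>(sharedBaseCost(accesses));
}

static_assert(savings(3) == 1, "three stores from the summary: 9 vs 8");
static_assert(savings(1) == -1, "a single access is one instruction longer");
```

Under this model the two forms break even at two accesses, and a lone access gets longer, which is consistent with the single-use TODO cases in hoist-global-addr-base.ll.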

Diff Detail

Event Timeline

sabuasal created this revision. May 16 2018, 4:16 PM

Herald added subscribers: mgrang, edward-jones, zzheng and 7 others. May 16 2018, 4:16 PM

sabuasal added reviewers: asb, apazos. May 16 2018, 4:17 PM

sabuasal edited the summary of this revision.

sabuasal retitled this revision from [RISCV] Separate base from offset in lowerGlobalAddress to [RISCV] Separate base from offset in lowerGlobalAddress (no peephole). May 16 2018, 5:05 PM

Thanks Sameer, this looks good to me. It would be worth adding some sort of comment to lowerGlobalAddress about the decision to emit a separate ADD node rather than folding the offset into the global address.

lib/Target/RISCV/RISCVISelLowering.cpp
300	It would be good to add a comment here documenting the decision not to fold the offset into the global address, e.g. "In order to maximise the opportunity for common subexpression elimination, emit a separate ADD node for the global address offset instead of folding it in the global address node. Later peephole optimisations may choose to fold it back in when profitable."
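The common subexpression argument in the comment above can be illustrated with a toy model of node uniquing. This is a sketch under stated assumptions, not the SelectionDAG API: `AddressNodePool` and `countAddressNodes` are hypothetical names, and address nodes are assumed to be uniqued purely by (symbol, offset).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for DAG node uniquing (hypothetical, not LLVM code): address
// nodes are identified by (symbol, offset), and identical keys share a node.
class AddressNodePool {
public:
  // Returns the id of the node for `symbol + offset`, creating it on first use.
  int getOrCreate(const std::string &symbol, int64_t offset) {
    auto key = std::make_pair(symbol, offset);
    auto it = nodes_.find(key);
    if (it != nodes_.end())
      return it->second;
    int id = static_cast<int>(nodes_.size());
    nodes_.emplace(key, id);
    return id;
  }
  std::size_t size() const { return nodes_.size(); }

private:
  std::map<std::pair<std::string, int64_t>, int> nodes_;
};

// Number of distinct global-address nodes needed to reach every offset.
// folded == true models the old lowering (offset inside the address node);
// folded == false models the new one (offset 0 in the node, separate ADD).
std::size_t countAddressNodes(const std::vector<int64_t> &offsets, bool folded) {
  AddressNodePool pool;
  for (int64_t off : offsets)
    pool.getOrCreate("s", folded ? off : 0);
  return pool.size();
}
```

With the offsets 44, 8 and 80 from the summary, the folded form needs three distinct address nodes, while the separated form needs one that all three accesses share.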

This revision is now accepted and ready to land. May 17 2018, 2:25 AM

sabuasal updated this revision to Diff 147358. May 17 2018, 11:12 AM

sabuasal marked an inline comment as done.

sabuasal retitled this revision from [RISCV] Separate base from offset in lowerGlobalAddress (no peephole) to [RISCV] Separate base from offset in lowerGlobalAddress.

Closed by commit rL332641: [RISCV] Separate base from offset in lowerGlobalAddress (authored by sabuasal). May 17 2018, 11:18 AM

This revision was automatically updated to reflect the committed changes.

sabuasal added a child revision: D45748: [RISCV] Add peepholes for Global Address lowering patterns. May 17 2018, 12:20 PM

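Revision Contents

Path	Size
lib/Target/RISCV/RISCVISelLowering.cpp	10 lines
test/CodeGen/RISCV/byval.ll	20 lines
test/CodeGen/RISCV/double-mem.ll	6 lines
test/CodeGen/RISCV/float-mem.ll	6 lines
test/CodeGen/RISCV/fp128.ll	80 lines
test/CodeGen/RISCV/hoist-global-addr-base.ll	111 lines
test/CodeGen/RISCV/mem.ll	6 lines
test/CodeGen/RISCV/wide-mem.ll	8 lines
test/CodeGen/RISCV/zext-with-load-is-free.ll	24 lines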

Diff 147211

lib/Target/RISCV/RISCVISelLowering.cpp


	SDValue RISCVTargetLowering::lowerGlobalAddress(SDValue Op,			SDValue RISCVTargetLowering::lowerGlobalAddress(SDValue Op,
	SelectionDAG &DAG) const {			SelectionDAG &DAG) const {
	SDLoc DL(Op);			SDLoc DL(Op);
	EVT Ty = Op.getValueType();			EVT Ty = Op.getValueType();
	GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);			GlobalAddressSDNode *N = cast<GlobalAddressSDNode>(Op);
	const GlobalValue *GV = N->getGlobal();			const GlobalValue *GV = N->getGlobal();
	int64_t Offset = N->getOffset();			int64_t Offset = N->getOffset();
				MVT XLenVT = Subtarget.getXLenVT();

	if (isPositionIndependent() || Subtarget.is64Bit())			if (isPositionIndependent() || Subtarget.is64Bit())
	report_fatal_error("Unable to lowerGlobalAddress");			report_fatal_error("Unable to lowerGlobalAddress");

	SDValue GAHi =			SDValue GAHi = DAG.getTargetGlobalAddress(GV, DL, Ty, 0, RISCVII::MO_HI);
	DAG.getTargetGlobalAddress(GV, DL, Ty, Offset, RISCVII::MO_HI);			SDValue GALo = DAG.getTargetGlobalAddress(GV, DL, Ty, 0, RISCVII::MO_LO);
	SDValue GALo =
	DAG.getTargetGlobalAddress(GV, DL, Ty, Offset, RISCVII::MO_LO);
	SDValue MNHi = SDValue(DAG.getMachineNode(RISCV::LUI, DL, Ty, GAHi), 0);			SDValue MNHi = SDValue(DAG.getMachineNode(RISCV::LUI, DL, Ty, GAHi), 0);
	SDValue MNLo =			SDValue MNLo =
	SDValue(DAG.getMachineNode(RISCV::ADDI, DL, Ty, MNHi, GALo), 0);			SDValue(DAG.getMachineNode(RISCV::ADDI, DL, Ty, MNHi, GALo), 0);
				if (Offset != 0)
				return DAG.getNode(ISD::ADD, DL, Ty, MNLo,
				DAG.getConstant(Offset, DL, XLenVT));
	return MNLo;			return MNLo;
	}			}

	SDValue RISCVTargetLowering::lowerBlockAddress(SDValue Op,			SDValue RISCVTargetLowering::lowerBlockAddress(SDValue Op,
	SelectionDAG &DAG) const {			SelectionDAG &DAG) const {
	SDLoc DL(Op);			SDLoc DL(Op);
	EVT Ty = Op.getValueType();			EVT Ty = Op.getValueType();
	BlockAddressSDNode *N = cast<BlockAddressSDNode>(Op);			BlockAddressSDNode *N = cast<BlockAddressSDNode>(Op);

test/CodeGen/RISCV/byval.ll

	}			}


	define void @caller() nounwind {			define void @caller() nounwind {
	; RV32I-LABEL: caller:			; RV32I-LABEL: caller:
	; RV32I: # %bb.0: # %entry			; RV32I: # %bb.0: # %entry
	; RV32I-NEXT: addi sp, sp, -32			; RV32I-NEXT: addi sp, sp, -32
	; RV32I-NEXT: sw ra, 28(sp)			; RV32I-NEXT: sw ra, 28(sp)
	; RV32I-NEXT: lui a0, %hi(foo+12)
	; RV32I-NEXT: lw a0, %lo(foo+12)(a0)
	; RV32I-NEXT: sw a0, 24(sp)
	; RV32I-NEXT: lui a0, %hi(foo+8)
	; RV32I-NEXT: lw a0, %lo(foo+8)(a0)
	; RV32I-NEXT: sw a0, 20(sp)
	; RV32I-NEXT: lui a0, %hi(foo+4)
	; RV32I-NEXT: lw a0, %lo(foo+4)(a0)
	; RV32I-NEXT: sw a0, 16(sp)
	; RV32I-NEXT: lui a0, %hi(foo)			; RV32I-NEXT: lui a0, %hi(foo)
	; RV32I-NEXT: lw a0, %lo(foo)(a0)			; RV32I-NEXT: lw a1, %lo(foo)(a0)
	; RV32I-NEXT: sw a0, 12(sp)			; RV32I-NEXT: sw a1, 12(sp)
				; RV32I-NEXT: addi a0, a0, %lo(foo)
				; RV32I-NEXT: lw a1, 12(a0)
				; RV32I-NEXT: sw a1, 24(sp)
				; RV32I-NEXT: lw a1, 8(a0)
				; RV32I-NEXT: sw a1, 20(sp)
				; RV32I-NEXT: lw a0, 4(a0)
				; RV32I-NEXT: sw a0, 16(sp)
	; RV32I-NEXT: addi a0, sp, 12			; RV32I-NEXT: addi a0, sp, 12
	; RV32I-NEXT: call callee			; RV32I-NEXT: call callee
	; RV32I-NEXT: lw ra, 28(sp)			; RV32I-NEXT: lw ra, 28(sp)
	; RV32I-NEXT: addi sp, sp, 32			; RV32I-NEXT: addi sp, sp, 32
	; RV32I-NEXT: ret			; RV32I-NEXT: ret
	entry:			entry:
	%call = call i32 @callee(%struct.Foo* byval @foo)			%call = call i32 @callee(%struct.Foo* byval @foo)
	ret void			ret void
	}			}

test/CodeGen/RISCV/double-mem.ll

	; RV32IFD-NEXT: fld ft0, 8(sp)			; RV32IFD-NEXT: fld ft0, 8(sp)
	; RV32IFD-NEXT: sw a0, 8(sp)			; RV32IFD-NEXT: sw a0, 8(sp)
	; RV32IFD-NEXT: sw a1, 12(sp)			; RV32IFD-NEXT: sw a1, 12(sp)
	; RV32IFD-NEXT: fld ft1, 8(sp)			; RV32IFD-NEXT: fld ft1, 8(sp)
	; RV32IFD-NEXT: fadd.d ft0, ft1, ft0			; RV32IFD-NEXT: fadd.d ft0, ft1, ft0
	; RV32IFD-NEXT: lui a0, %hi(G)			; RV32IFD-NEXT: lui a0, %hi(G)
	; RV32IFD-NEXT: fld ft1, %lo(G)(a0)			; RV32IFD-NEXT: fld ft1, %lo(G)(a0)
	; RV32IFD-NEXT: fsd ft0, %lo(G)(a0)			; RV32IFD-NEXT: fsd ft0, %lo(G)(a0)
	; RV32IFD-NEXT: lui a0, %hi(G+72)			; RV32IFD-NEXT: addi a0, a0, %lo(G)
	; RV32IFD-NEXT: fld ft1, %lo(G+72)(a0)			; RV32IFD-NEXT: fld ft1, 72(a0)
	; RV32IFD-NEXT: fsd ft0, %lo(G+72)(a0)			; RV32IFD-NEXT: fsd ft0, 72(a0)
	; RV32IFD-NEXT: fsd ft0, 8(sp)			; RV32IFD-NEXT: fsd ft0, 8(sp)
	; RV32IFD-NEXT: lw a0, 8(sp)			; RV32IFD-NEXT: lw a0, 8(sp)
	; RV32IFD-NEXT: lw a1, 12(sp)			; RV32IFD-NEXT: lw a1, 12(sp)
	; RV32IFD-NEXT: addi sp, sp, 16			; RV32IFD-NEXT: addi sp, sp, 16
	; RV32IFD-NEXT: ret			; RV32IFD-NEXT: ret
	; Use %a and %b in an FP op to ensure floating point registers are used, even			; Use %a and %b in an FP op to ensure floating point registers are used, even
	; for the soft float ABI			; for the soft float ABI
	%1 = fadd double %a, %b			%1 = fadd double %a, %b

test/CodeGen/RISCV/float-mem.ll

	; RV32IF-LABEL: flw_fsw_global:			; RV32IF-LABEL: flw_fsw_global:
	; RV32IF: # %bb.0:			; RV32IF: # %bb.0:
	; RV32IF-NEXT: fmv.w.x ft0, a1			; RV32IF-NEXT: fmv.w.x ft0, a1
	; RV32IF-NEXT: fmv.w.x ft1, a0			; RV32IF-NEXT: fmv.w.x ft1, a0
	; RV32IF-NEXT: fadd.s ft0, ft1, ft0			; RV32IF-NEXT: fadd.s ft0, ft1, ft0
	; RV32IF-NEXT: lui a0, %hi(G)			; RV32IF-NEXT: lui a0, %hi(G)
	; RV32IF-NEXT: flw ft1, %lo(G)(a0)			; RV32IF-NEXT: flw ft1, %lo(G)(a0)
	; RV32IF-NEXT: fsw ft0, %lo(G)(a0)			; RV32IF-NEXT: fsw ft0, %lo(G)(a0)
	; RV32IF-NEXT: lui a0, %hi(G+36)			; RV32IF-NEXT: addi a0, a0, %lo(G)
	; RV32IF-NEXT: flw ft1, %lo(G+36)(a0)			; RV32IF-NEXT: flw ft1, 36(a0)
	; RV32IF-NEXT: fsw ft0, %lo(G+36)(a0)			; RV32IF-NEXT: fsw ft0, 36(a0)
	; RV32IF-NEXT: fmv.x.w a0, ft0			; RV32IF-NEXT: fmv.x.w a0, ft0
	; RV32IF-NEXT: ret			; RV32IF-NEXT: ret
	%1 = fadd float %a, %b			%1 = fadd float %a, %b
	%2 = load volatile float, float* @G			%2 = load volatile float, float* @G
	store float %1, float* @G			store float %1, float* @G
	%3 = getelementptr float, float* @G, i32 9			%3 = getelementptr float, float* @G, i32 9
	%4 = load volatile float, float* %3			%4 = load volatile float, float* %3
	store float %1, float* %3			store float %1, float* %3

test/CodeGen/RISCV/fp128.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \			; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
	; RUN: | FileCheck -check-prefix=RV32I %s			; RUN: | FileCheck -check-prefix=RV32I %s

	@x = local_unnamed_addr global fp128 0xL00000000000000007FFF000000000000, align 16			@x = local_unnamed_addr global fp128 0xL00000000000000007FFF000000000000, align 16
	@y = local_unnamed_addr global fp128 0xL00000000000000007FFF000000000000, align 16			@y = local_unnamed_addr global fp128 0xL00000000000000007FFF000000000000, align 16

	; Besides anything else, these tests help verify that libcall ABI lowering			; Besides anything else, these tests help verify that libcall ABI lowering
	; works correctly			; works correctly

	define i32 @test_load_and_cmp() nounwind {			define i32 @test_load_and_cmp() nounwind {
	; RV32I-LABEL: test_load_and_cmp:			; RV32I-LABEL: test_load_and_cmp:
	; RV32I: # %bb.0:			; RV32I: # %bb.0:
	; RV32I-NEXT: addi sp, sp, -48			; RV32I-NEXT: addi sp, sp, -48
	; RV32I-NEXT: sw ra, 44(sp)			; RV32I-NEXT: sw ra, 44(sp)
	; RV32I-NEXT: lui a0, %hi(y+12)
	; RV32I-NEXT: lw a0, %lo(y+12)(a0)
	; RV32I-NEXT: sw a0, 20(sp)
	; RV32I-NEXT: lui a0, %hi(y+8)
	; RV32I-NEXT: lw a0, %lo(y+8)(a0)
	; RV32I-NEXT: sw a0, 16(sp)
	; RV32I-NEXT: lui a0, %hi(y+4)
	; RV32I-NEXT: lw a0, %lo(y+4)(a0)
	; RV32I-NEXT: sw a0, 12(sp)
	; RV32I-NEXT: lui a0, %hi(y)			; RV32I-NEXT: lui a0, %hi(y)
	; RV32I-NEXT: lw a0, %lo(y)(a0)			; RV32I-NEXT: lw a1, %lo(y)(a0)
	; RV32I-NEXT: sw a0, 8(sp)			; RV32I-NEXT: sw a1, 8(sp)
	; RV32I-NEXT: lui a0, %hi(x+12)			; RV32I-NEXT: lui a1, %hi(x)
	; RV32I-NEXT: lw a0, %lo(x+12)(a0)			; RV32I-NEXT: lw a2, %lo(x)(a1)
	; RV32I-NEXT: sw a0, 36(sp)			; RV32I-NEXT: sw a2, 24(sp)
	; RV32I-NEXT: lui a0, %hi(x+8)			; RV32I-NEXT: addi a0, a0, %lo(y)
	; RV32I-NEXT: lw a0, %lo(x+8)(a0)			; RV32I-NEXT: lw a2, 12(a0)
	; RV32I-NEXT: sw a0, 32(sp)			; RV32I-NEXT: sw a2, 20(sp)
	; RV32I-NEXT: lui a0, %hi(x+4)			; RV32I-NEXT: lw a2, 8(a0)
	; RV32I-NEXT: lw a0, %lo(x+4)(a0)			; RV32I-NEXT: sw a2, 16(sp)
				; RV32I-NEXT: lw a0, 4(a0)
				; RV32I-NEXT: sw a0, 12(sp)
				; RV32I-NEXT: addi a0, a1, %lo(x)
				; RV32I-NEXT: lw a1, 12(a0)
				; RV32I-NEXT: sw a1, 36(sp)
				; RV32I-NEXT: lw a1, 8(a0)
				; RV32I-NEXT: sw a1, 32(sp)
				; RV32I-NEXT: lw a0, 4(a0)
	; RV32I-NEXT: sw a0, 28(sp)			; RV32I-NEXT: sw a0, 28(sp)
	; RV32I-NEXT: lui a0, %hi(x)
	; RV32I-NEXT: lw a0, %lo(x)(a0)
	; RV32I-NEXT: sw a0, 24(sp)
	; RV32I-NEXT: addi a0, sp, 24			; RV32I-NEXT: addi a0, sp, 24
	; RV32I-NEXT: addi a1, sp, 8			; RV32I-NEXT: addi a1, sp, 8
	; RV32I-NEXT: call __netf2			; RV32I-NEXT: call __netf2
	; RV32I-NEXT: xor a0, a0, zero			; RV32I-NEXT: xor a0, a0, zero
	; RV32I-NEXT: snez a0, a0			; RV32I-NEXT: snez a0, a0
	; RV32I-NEXT: lw ra, 44(sp)			; RV32I-NEXT: lw ra, 44(sp)
	; RV32I-NEXT: addi sp, sp, 48			; RV32I-NEXT: addi sp, sp, 48
	; RV32I-NEXT: ret			; RV32I-NEXT: ret
	%1 = load fp128, fp128* @x, align 16			%1 = load fp128, fp128* @x, align 16
	%2 = load fp128, fp128* @y, align 16			%2 = load fp128, fp128* @y, align 16
	%cmp = fcmp une fp128 %1, %2			%cmp = fcmp une fp128 %1, %2
	%3 = zext i1 %cmp to i32			%3 = zext i1 %cmp to i32
	ret i32 %3			ret i32 %3
	}			}

	define i32 @test_add_and_fptosi() nounwind {			define i32 @test_add_and_fptosi() nounwind {
	; RV32I-LABEL: test_add_and_fptosi:			; RV32I-LABEL: test_add_and_fptosi:
	; RV32I: # %bb.0:			; RV32I: # %bb.0:
	; RV32I-NEXT: addi sp, sp, -80			; RV32I-NEXT: addi sp, sp, -80
	; RV32I-NEXT: sw ra, 76(sp)			; RV32I-NEXT: sw ra, 76(sp)
	; RV32I-NEXT: lui a0, %hi(y+12)
	; RV32I-NEXT: lw a0, %lo(y+12)(a0)
	; RV32I-NEXT: sw a0, 36(sp)
	; RV32I-NEXT: lui a0, %hi(y+8)
	; RV32I-NEXT: lw a0, %lo(y+8)(a0)
	; RV32I-NEXT: sw a0, 32(sp)
	; RV32I-NEXT: lui a0, %hi(y+4)
	; RV32I-NEXT: lw a0, %lo(y+4)(a0)
	; RV32I-NEXT: sw a0, 28(sp)
	; RV32I-NEXT: lui a0, %hi(y)			; RV32I-NEXT: lui a0, %hi(y)
	; RV32I-NEXT: lw a0, %lo(y)(a0)			; RV32I-NEXT: lw a1, %lo(y)(a0)
	; RV32I-NEXT: sw a0, 24(sp)			; RV32I-NEXT: sw a1, 24(sp)
	; RV32I-NEXT: lui a0, %hi(x+12)			; RV32I-NEXT: lui a1, %hi(x)
	; RV32I-NEXT: lw a0, %lo(x+12)(a0)			; RV32I-NEXT: lw a2, %lo(x)(a1)
	; RV32I-NEXT: sw a0, 52(sp)			; RV32I-NEXT: sw a2, 40(sp)
	; RV32I-NEXT: lui a0, %hi(x+8)			; RV32I-NEXT: addi a0, a0, %lo(y)
	; RV32I-NEXT: lw a0, %lo(x+8)(a0)			; RV32I-NEXT: lw a2, 12(a0)
	; RV32I-NEXT: sw a0, 48(sp)			; RV32I-NEXT: sw a2, 36(sp)
	; RV32I-NEXT: lui a0, %hi(x+4)			; RV32I-NEXT: lw a2, 8(a0)
	; RV32I-NEXT: lw a0, %lo(x+4)(a0)			; RV32I-NEXT: sw a2, 32(sp)
				; RV32I-NEXT: lw a0, 4(a0)
				; RV32I-NEXT: sw a0, 28(sp)
				; RV32I-NEXT: addi a0, a1, %lo(x)
				; RV32I-NEXT: lw a1, 12(a0)
				; RV32I-NEXT: sw a1, 52(sp)
				; RV32I-NEXT: lw a1, 8(a0)
				; RV32I-NEXT: sw a1, 48(sp)
				; RV32I-NEXT: lw a0, 4(a0)
	; RV32I-NEXT: sw a0, 44(sp)			; RV32I-NEXT: sw a0, 44(sp)
	; RV32I-NEXT: lui a0, %hi(x)
	; RV32I-NEXT: lw a0, %lo(x)(a0)
	; RV32I-NEXT: sw a0, 40(sp)
	; RV32I-NEXT: addi a0, sp, 56			; RV32I-NEXT: addi a0, sp, 56
	; RV32I-NEXT: addi a1, sp, 40			; RV32I-NEXT: addi a1, sp, 40
	; RV32I-NEXT: addi a2, sp, 24			; RV32I-NEXT: addi a2, sp, 24
	; RV32I-NEXT: call __addtf3			; RV32I-NEXT: call __addtf3
	; RV32I-NEXT: lw a0, 68(sp)			; RV32I-NEXT: lw a0, 68(sp)
	; RV32I-NEXT: sw a0, 20(sp)			; RV32I-NEXT: sw a0, 20(sp)
	; RV32I-NEXT: lw a0, 64(sp)			; RV32I-NEXT: lw a0, 64(sp)
	; RV32I-NEXT: sw a0, 16(sp)			; RV32I-NEXT: sw a0, 16(sp)

test/CodeGen/RISCV/hoist-global-addr-base.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 < %s | FileCheck %s

				%struct.S = type { [40 x i32], i32, i32, i32, [4100 x i32], i32, i32, i32 }
				@s = common dso_local global %struct.S zeroinitializer, align 4
				@foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0], align 2

				define dso_local void @multiple_stores() local_unnamed_addr {
				; CHECK-LABEL: multiple_stores:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lui a0, %hi(s)
				; CHECK-NEXT: addi a0, a0, %lo(s)
				; CHECK-NEXT: addi a1, zero, 20
				; CHECK-NEXT: sw a1, 164(a0)
				; CHECK-NEXT: addi a1, zero, 10
				; CHECK-NEXT: sw a1, 160(a0)
				; CHECK-NEXT: ret
				entry:
				store i32 10, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
				store i32 20, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 2), align 4
				ret void
				}

				define dso_local void @control_flow() local_unnamed_addr #0 {
				; CHECK-LABEL: control_flow:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lui a0, %hi(s)
				; CHECK-NEXT: addi a0, a0, %lo(s)
				; CHECK-NEXT: lw a1, 164(a0)
				; CHECK-NEXT: addi a2, zero, 1
				; CHECK-NEXT: blt a1, a2, .LBB1_2
				; CHECK-NEXT: # %bb.1: # %if.then
				; CHECK-NEXT: addi a1, zero, 10
				; CHECK-NEXT: sw a1, 160(a0)
				; CHECK-NEXT: .LBB1_2: # %if.end
				; CHECK-NEXT: ret
				entry:
				%0 = load i32, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 2), align 4
				%cmp = icmp sgt i32 %0, 0
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				store i32 10, i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1), align 4
				br label %if.end

				if.end: ; preds = %if.then, %entry
				ret void
				}

				;TODO: Offset shouldn't be separated in this case. We get a shorter sequence if it
				; is merged into the LUI %hi and the ADDI %lo.
				define dso_local i32* @big_offset_one_use() local_unnamed_addr {
				; CHECK-LABEL: big_offset_one_use:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lui a0, 4
				; CHECK-NEXT: addi a0, a0, 188
				; CHECK-NEXT: lui a1, %hi(s)
				; CHECK-NEXT: addi a1, a1, %lo(s)
				; CHECK-NEXT: add a0, a1, a0
				; CHECK-NEXT: ret
				entry:
				ret i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 5)
				}

				;TODO: Offset shouldn't be separated in this case. We get a shorter sequence if it
				; is merged into the LUI %hi and the ADDI %lo.
				define dso_local i32* @small_offset_one_use() local_unnamed_addr {
				; CHECK-LABEL: small_offset_one_use:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lui a0, %hi(s)
				; CHECK-NEXT: addi a0, a0, %lo(s)
				; CHECK-NEXT: addi a0, a0, 160
				; CHECK-NEXT: ret
				entry:
				ret i32* getelementptr inbounds (%struct.S, %struct.S* @s, i32 0, i32 1)
				}


				;TODO: Offset shouldn't be separated in this case. We get a shorter sequence if it
				; is merged into the LUI %hi and the ADDI %lo.
				define dso_local i32 @load_half() nounwind {
				; CHECK-LABEL: load_half:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: addi sp, sp, -16
				; CHECK-NEXT: sw ra, 12(sp)
				; CHECK-NEXT: lui a0, %hi(foo)
				; CHECK-NEXT: addi a0, a0, %lo(foo)
				; CHECK-NEXT: lhu a0, 8(a0)
				; CHECK-NEXT: addi a1, zero, 140
				; CHECK-NEXT: bne a0, a1, .LBB4_2
				; CHECK-NEXT: # %bb.1: # %if.end
				; CHECK-NEXT: mv a0, zero
				; CHECK-NEXT: lw ra, 12(sp)
				; CHECK-NEXT: addi sp, sp, 16
				; CHECK-NEXT: ret
				; CHECK-NEXT: .LBB4_2: # %if.then
				; CHECK-NEXT: call abort
				entry:
				%0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]* @foo, i32 0, i32 4), align 2
				%cmp = icmp eq i16 %0, 140
				br i1 %cmp, label %if.end, label %if.then

				if.then:
				tail call void @abort()
				unreachable

				if.end:
				ret i32 0
				}

				declare void @abort()
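The offsets in the CHECK lines above can be reproduced with a short sketch. It assumes 4-byte i32 fields with no padding in %struct.S, handles non-negative constants only, and uses hypothetical helper names (`fitsSimm12`, `splitHiLo`) that do not come from the patch. It shows why offsets 160 and 164 can sit in a load/store or addi immediate, while the offset of field 5 needs its own lui/addi pair.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` fits the 12-bit signed immediate used by RISC-V I-type
// and S-type instructions (addi, lw, sw, ...).
constexpr bool fitsSimm12(int64_t value) {
  return value >= -2048 && value <= 2047;
}

struct HiLo {
  int64_t hi; // operand for `lui` (upper 20 bits, pre-adjusted)
  int64_t lo; // operand for `addi` (sign-extended lower 12 bits)
};

// Splits a non-negative constant so that hi * 4096 + lo == value. Adding
// 0x800 before shifting compensates for `addi` sign-extending its immediate.
constexpr HiLo splitHiLo(int64_t value) {
  int64_t hi = (value + 0x800) >> 12;
  int64_t lo = value - hi * 4096;
  return {hi, lo};
}

// Byte offsets of fields in %struct.S, assuming 4-byte i32 and no padding:
// { [40 x i32], i32, i32, i32, [4100 x i32], i32, i32, i32 }
constexpr int64_t kField1 = 40 * 4;                    // 160
constexpr int64_t kField2 = kField1 + 4;               // 164
constexpr int64_t kField5 = 40 * 4 + 3 * 4 + 4100 * 4; // 16572
```

Here `splitHiLo(kField5)` yields hi = 4 and lo = 188, matching the `lui a0, 4` / `addi a0, a0, 188` pair in big_offset_one_use.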

test/CodeGen/RISCV/mem.ll

	@G = global i32 0			@G = global i32 0

	define i32 @lw_sw_global(i32 %a) nounwind {			define i32 @lw_sw_global(i32 %a) nounwind {
	; RV32I-LABEL: lw_sw_global:			; RV32I-LABEL: lw_sw_global:
	; RV32I: # %bb.0:			; RV32I: # %bb.0:
	; RV32I-NEXT: lui a2, %hi(G)			; RV32I-NEXT: lui a2, %hi(G)
	; RV32I-NEXT: lw a1, %lo(G)(a2)			; RV32I-NEXT: lw a1, %lo(G)(a2)
	; RV32I-NEXT: sw a0, %lo(G)(a2)			; RV32I-NEXT: sw a0, %lo(G)(a2)
	; RV32I-NEXT: lui a2, %hi(G+36)			; RV32I-NEXT: addi a2, a2, %lo(G)
	; RV32I-NEXT: lw a3, %lo(G+36)(a2)			; RV32I-NEXT: lw a3, 36(a2)
	; RV32I-NEXT: sw a0, %lo(G+36)(a2)			; RV32I-NEXT: sw a0, 36(a2)
	; RV32I-NEXT: mv a0, a1			; RV32I-NEXT: mv a0, a1
	; RV32I-NEXT: ret			; RV32I-NEXT: ret
	%1 = load volatile i32, i32* @G			%1 = load volatile i32, i32* @G
	store i32 %a, i32* @G			store i32 %a, i32* @G
	%2 = getelementptr i32, i32* @G, i32 9			%2 = getelementptr i32, i32* @G, i32 9
	%3 = load volatile i32, i32* %2			%3 = load volatile i32, i32* %2
	store i32 %a, i32* %2			store i32 %a, i32* %2
	ret i32 %1			ret i32 %1

test/CodeGen/RISCV/wide-mem.ll

; RV32I-NEXT: ret
ret i64 %1		ret i64 %1
}		}

@val64 = local_unnamed_addr global i64 2863311530, align 8		@val64 = local_unnamed_addr global i64 2863311530, align 8

define i64 @load_i64_global() nounwind {		define i64 @load_i64_global() nounwind {
; RV32I-LABEL: load_i64_global:		; RV32I-LABEL: load_i64_global:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: lui a0, %hi(val64)		; RV32I-NEXT: lui a1, %hi(val64)
; RV32I-NEXT: lw a0, %lo(val64)(a0)		; RV32I-NEXT: lw a0, %lo(val64)(a1)
; RV32I-NEXT: lui a1, %hi(val64+4)		; RV32I-NEXT: addi a1, a1, %lo(val64)
; RV32I-NEXT: lw a1, %lo(val64+4)(a1)		; RV32I-NEXT: lw a1, 4(a1)
; RV32I-NEXT: ret		; RV32I-NEXT: ret
%1 = load i64, i64* @val64		%1 = load i64, i64* @val64
ret i64 %1		ret i64 %1
}		}

test/CodeGen/RISCV/zext-with-load-is-free.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \		; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
; RUN: | FileCheck %s -check-prefix=RV32I		; RUN: | FileCheck %s -check-prefix=RV32I

; TODO: lbu and lhu should be selected to avoid the unnecessary masking.		; TODO: lbu and lhu should be selected to avoid the unnecessary masking.

@bytes = global [5 x i8] zeroinitializer, align 1		@bytes = global [5 x i8] zeroinitializer, align 1

define i32 @test_zext_i8() {		define i32 @test_zext_i8() {
; RV32I-LABEL: test_zext_i8:		; RV32I-LABEL: test_zext_i8:
; RV32I: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32I-NEXT: lui a0, %hi(bytes)		; RV32I-NEXT: lui a0, %hi(bytes)
; RV32I-NEXT: lbu a0, %lo(bytes)(a0)		; RV32I-NEXT: lbu a1, %lo(bytes)(a0)
; RV32I-NEXT: addi a1, zero, 136		; RV32I-NEXT: addi a2, zero, 136
; RV32I-NEXT: bne a0, a1, .LBB0_3		; RV32I-NEXT: bne a1, a2, .LBB0_3
; RV32I-NEXT: # %bb.1: # %entry		; RV32I-NEXT: # %bb.1: # %entry
; RV32I-NEXT: lui a0, %hi(bytes+1)		; RV32I-NEXT: addi a0, a0, %lo(bytes)
; RV32I-NEXT: lbu a0, %lo(bytes+1)(a0)		; RV32I-NEXT: lbu a0, 1(a0)
; RV32I-NEXT: addi a1, zero, 7		; RV32I-NEXT: addi a1, zero, 7
; RV32I-NEXT: bne a0, a1, .LBB0_3		; RV32I-NEXT: bne a0, a1, .LBB0_3
; RV32I-NEXT: # %bb.2: # %if.end		; RV32I-NEXT: # %bb.2: # %if.end
; RV32I-NEXT: mv a0, zero		; RV32I-NEXT: mv a0, zero
; RV32I-NEXT: ret		; RV32I-NEXT: ret
; RV32I-NEXT: .LBB0_3: # %if.then		; RV32I-NEXT: .LBB0_3: # %if.then
; RV32I-NEXT: addi a0, zero, 1		; RV32I-NEXT: addi a0, zero, 1
; RV32I-NEXT: ret		; RV32I-NEXT: ret
if.end:
ret i32 0		ret i32 0
}		}

@shorts = global [5 x i16] zeroinitializer, align 2		@shorts = global [5 x i16] zeroinitializer, align 2

define i32 @test_zext_i16() {		define i32 @test_zext_i16() {
; RV32I-LABEL: test_zext_i16:		; RV32I-LABEL: test_zext_i16:
; RV32I: # %bb.0: # %entry		; RV32I: # %bb.0: # %entry
; RV32I-NEXT: lui a0, 16		; RV32I-NEXT: lui a0, %hi(shorts)
; RV32I-NEXT: addi a0, a0, -120		; RV32I-NEXT: lui a1, 16
; RV32I-NEXT: lui a1, %hi(shorts)		; RV32I-NEXT: addi a1, a1, -120
; RV32I-NEXT: lhu a1, %lo(shorts)(a1)		; RV32I-NEXT: lhu a2, %lo(shorts)(a0)
; RV32I-NEXT: bne a1, a0, .LBB1_3		; RV32I-NEXT: bne a2, a1, .LBB1_3
; RV32I-NEXT: # %bb.1: # %entry		; RV32I-NEXT: # %bb.1: # %entry
; RV32I-NEXT: lui a0, %hi(shorts+2)		; RV32I-NEXT: addi a0, a0, %lo(shorts)
; RV32I-NEXT: lhu a0, %lo(shorts+2)(a0)		; RV32I-NEXT: lhu a0, 2(a0)
; RV32I-NEXT: addi a1, zero, 7		; RV32I-NEXT: addi a1, zero, 7
; RV32I-NEXT: bne a0, a1, .LBB1_3		; RV32I-NEXT: bne a0, a1, .LBB1_3
; RV32I-NEXT: # %bb.2: # %if.end		; RV32I-NEXT: # %bb.2: # %if.end
; RV32I-NEXT: mv a0, zero		; RV32I-NEXT: mv a0, zero
; RV32I-NEXT: ret		; RV32I-NEXT: ret
; RV32I-NEXT: .LBB1_3: # %if.then		; RV32I-NEXT: .LBB1_3: # %if.then
; RV32I-NEXT: addi a0, zero, 1		; RV32I-NEXT: addi a0, zero, 1
; RV32I-NEXT: ret		; RV32I-NEXT: ret
