This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Generate out-of-line jump tables for XO without 32-bit branch
ClosedPublic

Authored by john.brawn on Jun 26 2023, 7:59 AM.

Details

Reviewers

stuij
keith.walker.arm
simonwallis2
t.p.northover
efriedma

Commits

rG4fb0e0114f62: [ARM] Generate out-of-line jump tables for XO without 32-bit branch

Summary

When we only have a 16-bit pc-relative branch instruction we generate a table of addresses for a jump table. Currently this table is placed inline, but that won't work with execute-only memory, where code cannot be read as data. In this case generate the jump table out-of-line.
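The decision described in the summary can be sketched as a small standalone function. This is only an illustration under assumptions: `JumpTableKind`, `TargetTraits`, and `chooseJumpTableKind` are hypothetical names, not LLVM's API. The actual change is the one made to `ARMTargetLowering::getJumpTableEncoding()` in the diff.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two jump table placements discussed here.
enum class JumpTableKind {
  Inline,    // table emitted inside the function's code
  OutOfLine  // table of block addresses emitted in a data section
};

// Hypothetical summary of the subtarget properties that matter.
struct TargetTraits {
  bool executeOnly;     // code memory cannot be read as data
  bool has32BitBranch;  // a 32-bit direct branch is available
};

// Without a 32-bit branch the table holds addresses that must be loaded,
// so under execute-only it cannot live in the code section.
JumpTableKind chooseJumpTableKind(const TargetTraits &traits) {
  if (traits.executeOnly && !traits.has32BitBranch)
    return JumpTableKind::OutOfLine;
  return JumpTableKind::Inline;
}
```

For example, `chooseJumpTableKind(TargetTraits{true, false})` yields `JumpTableKind::OutOfLine`, while every other combination keeps the table inline.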

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

john.brawn created this revision. Jun 26 2023, 7:59 AM

Herald added a project: Restricted Project. Jun 26 2023, 7:59 AM

Herald added subscribers: hiraditya, kristof.beyls.

john.brawn requested review of this revision. Jun 26 2023, 7:59 AM

Herald added a project: Restricted Project. Jun 26 2023, 7:59 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B241180: Diff 534552. Jun 26 2023, 8:42 AM

efriedma wrote:

I'd prefer if the Thumb1 and Thumb2 codepaths work as close to the same way as possible; this looks significantly different. (Thumb2 execute-only has basically the same issues as Thumb1 execute-only, as far as I can tell; we only get away with "loading" from the text segment if we can form tbb/tbh.)

(Do we care about execute-only PIC?)

llvm/lib/Target/ARM/ARMISelLowering.cpp, line 3498: What relevant instructions does v8m.base have?

john.brawn wrote:

In D153774#4449785, @efriedma wrote:

> I'd prefer if the Thumb1 and Thumb2 codepaths work as close to the same way as possible; this looks significantly different. (Thumb2 execute-only has basically the same issues as Thumb1 execute-only, as far as I can tell; we only get away with "loading" from the text segment if we can form tbb/tbh.)

The Thumb1 and Thumb2 handling of jump tables is already different, and the only difference between execute-only and non-execute-only Thumb1 is the location of the jump table (inline vs out-of-line).

> (Do we care about execute-only PIC?)

No.

llvm/lib/Target/ARM/ARMISelLowering.cpp, line 3498: t2B, the 32-bit direct branch instruction.

efriedma wrote:

Oh, I didn't remember JUMPTABLE_INSTS was a thing. This is making more sense now; Thumb2/armv8m.base defaults to jumping to a table of jumps, which works fine with execute-only, but Thumb1 (and apparently ARM-mode) loads offsets from a table instead, which doesn't work with execute-only. So to make Thumb1 work, you force the table to be emitted out-of-line.

That makes sense, but depending how much we care, there are probably better sequences. We can use "bl" as a 32-bit branch on Thumb1, like we do in other places. And we can then compress those 32-bit branches to 16-bit branches if the offsets aren't too big.

If you just care about getting it working, this is okay as-is. LGTM
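The compression idea in the comment above amounts to a range check on each table entry. A rough sketch, assuming the 16-bit Thumb unconditional branch encodes a signed 11-bit immediate scaled by 2 (roughly ±2 KB); `fitsIn16BitBranch` is a hypothetical helper, not an LLVM function, and the offset is taken to be already relative to the PC value the branch sees:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if a PC-relative byte offset can be encoded in a
// 16-bit Thumb unconditional branch. Assumes a signed 11-bit immediate that
// is scaled by 2, giving an even offset in [-2048, 2046].
bool fitsIn16BitBranch(std::int32_t offsetBytes) {
  constexpr std::int32_t kBits = 11;
  constexpr std::int32_t kScale = 2;
  constexpr std::int32_t kMin = -(1 << (kBits - 1)) * kScale;       // -2048
  constexpr std::int32_t kMax = ((1 << (kBits - 1)) - 1) * kScale;  //  2046
  return offsetBytes % kScale == 0 && offsetBytes >= kMin &&
         offsetBytes <= kMax;
}
```

Entries whose targets pass this check could in principle use the short form; anything farther away would need the 32-bit `bl` form mentioned in the comment.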

This revision is now accepted and ready to land. Jun 27 2023, 12:21 PM

This revision was landed with ongoing or failed builds. Jun 28 2023, 5:31 AM

Closed by commit rG4fb0e0114f62: [ARM] Generate out-of-line jump tables for XO without 32-bit branch (authored by john.brawn).

This revision was automatically updated to reflect the committed changes.

john.brawn added a commit: rG4fb0e0114f62: [ARM] Generate out-of-line jump tables for XO without 32-bit branch.

Revision Contents

Path: Size

llvm/lib/Target/ARM/ARMConstantIslandPass.cpp: 15 lines
llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp: 21 lines
llvm/lib/Target/ARM/ARMISelLowering.cpp: 5 lines
llvm/test/CodeGen/ARM/execute-only.ll: 21 lines

Diff 535342

llvm/lib/Target/ARM/ARMConstantIslandPass.cpp

[599 lines not shown]
 /// jumps to begin with. In almost all cases they'll never be moved from that
 /// position.
 void ARMConstantIslands::doInitialJumpTablePlacement(
     std::vector<MachineInstr *> &CPEMIs) {
   unsigned i = CPEntries.size();
   auto MJTI = MF->getJumpTableInfo();
   const std::vector<MachineJumpTableEntry> &JT = MJTI->getJumpTables();

+  // Only inline jump tables are placed in the function.
+  if (MJTI->getEntryKind() != MachineJumpTableInfo::EK_Inline)
+    return;

   MachineBasicBlock *LastCorrectlyNumberedBB = nullptr;
   for (MachineBasicBlock &MBB : *MF) {
     auto MI = MBB.getLastNonDebugInstr();
     // Look past potential SpeculationBarriers at end of BB.
     while (MI != MBB.end() &&
            (isSpeculationBarrierEndBBOpcode(MI->getOpcode()) ||
             MI->isDebugInstr()))
       --MI;
[156 lines not shown]
 initializeFunctionInfo(const std::vector<MachineInstr*> &CPEMIs) {
   BBInfoVector &BBInfo = BBUtils->getBBInfo();
   // The known bits of the entry block offset are determined by the function
   // alignment.
   BBInfo.front().KnownBits = Log2(MF->getAlignment());

   // Compute block offsets and known bits.
   BBUtils->adjustBBOffsetsAfter(&MF->front());

+  // We only care about jump table instructions when jump tables are inline.
+  MachineJumpTableInfo *MJTI = MF->getJumpTableInfo();
+  bool InlineJumpTables =
+      MJTI && MJTI->getEntryKind() == MachineJumpTableInfo::EK_Inline;

   // Now go back through the instructions and build up our data structures.
   for (MachineBasicBlock &MBB : *MF) {
     // If this block doesn't fall through into the next MBB, then this is
     // 'water' that a constant pool island could be placed.
     if (!BBHasFallthrough(&MBB))
       WaterList.push_back(&MBB);

     for (MachineInstr &I : MBB) {
       if (I.isDebugInstr())
         continue;

       unsigned Opc = I.getOpcode();
       if (I.isBranch()) {
         bool isCond = false;
         unsigned Bits = 0;
         unsigned Scale = 1;
         int UOpc = Opc;
         switch (Opc) {
         default:
           continue; // Ignore other JT branches
         case ARM::t2BR_JT:
         case ARM::tBR_JTr:
+          if (InlineJumpTables)
-          T2JumpTables.push_back(&I);
+            T2JumpTables.push_back(&I);
           continue; // Does not get an entry in ImmBranches
         case ARM::Bcc:
           isCond = true;
           UOpc = ARM::B;
           [[fallthrough]];
         case ARM::B:
           Bits = 24;
           Scale = 4;
[30 lines not shown]
     for (MachineInstr &I : MBB) {

       if (Opc == ARM::CONSTPOOL_ENTRY || Opc == ARM::JUMPTABLE_ADDRS ||
           Opc == ARM::JUMPTABLE_INSTS || Opc == ARM::JUMPTABLE_TBB ||
           Opc == ARM::JUMPTABLE_TBH)
         continue;

       // Scan the instructions for constant pool operands.
       for (unsigned op = 0, e = I.getNumOperands(); op != e; ++op)
-        if (I.getOperand(op).isCPI() || I.getOperand(op).isJTI()) {
+        if (I.getOperand(op).isCPI() ||
+            (I.getOperand(op).isJTI() && InlineJumpTables)) {
           // We found one. The addressing mode tells us the max displacement
           // from the PC that this instruction permits.

           // Basic size info comes from the TSFlags field.
           unsigned Bits = 0;
           unsigned Scale = 1;
           bool NegOk = false;
           bool IsSoImm = false;
[1,666 lines not shown]

llvm/lib/Target/ARM/ARMExpandPseudoInsts.cpp

[17 lines not shown]
 #include "ARMBaseRegisterInfo.h"
 #include "ARMConstantPoolValue.h"
 #include "ARMMachineFunctionInfo.h"
 #include "ARMSubtarget.h"
 #include "MCTargetDesc/ARMAddressingModes.h"
 #include "llvm/CodeGen/LivePhysRegs.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineJumpTableInfo.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/Support/Debug.h"

 using namespace llvm;

 #define DEBUG_TYPE "arm-pseudo"

 static cl::opt<bool>
[1,002 lines not shown]
   case MachineOperand::MO_ExternalSymbol: {
     const char *ES = MO.getSymbolName();
     unsigned TF = MO.getTargetFlags();
     Upper8_15 = Upper8_15.addExternalSymbol(ES, TF | ARMII::MO_HI_8_15);
     Upper0_7 = Upper0_7.addExternalSymbol(ES, TF | ARMII::MO_HI_0_7);
     Lower8_15 = Lower8_15.addExternalSymbol(ES, TF | ARMII::MO_LO_8_15);
     Lower0_7 = Lower0_7.addExternalSymbol(ES, TF | ARMII::MO_LO_0_7);
     break;
   }
+  case MachineOperand::MO_JumpTableIndex: {
+    unsigned Idx = MO.getIndex();
+    unsigned TF = MO.getTargetFlags();
+    Upper8_15 = Upper8_15.addJumpTableIndex(Idx, TF | ARMII::MO_HI_8_15);
+    Upper0_7 = Upper0_7.addJumpTableIndex(Idx, TF | ARMII::MO_HI_0_7);
+    Lower8_15 = Lower8_15.addJumpTableIndex(Idx, TF | ARMII::MO_LO_8_15);
+    Lower0_7 = Lower0_7.addJumpTableIndex(Idx, TF | ARMII::MO_LO_0_7);
+    break;
+  }
   default: {
     const GlobalValue *GV = MO.getGlobal();
     unsigned TF = MO.getTargetFlags();
     Upper8_15 =
         Upper8_15.addGlobalAddress(GV, MO.getOffset(), TF | ARMII::MO_HI_8_15);
     Upper0_7 =
         Upper0_7.addGlobalAddress(GV, MO.getOffset(), TF | ARMII::MO_HI_0_7);
     Lower8_15 =
[1,707 lines not shown]
   switch (Opcode) {
     case ARM::t2MOVCCi32imm:
       ExpandMOV32BitImm(MBB, MBBI);
       return true;

     case ARM::tMOVi32imm:
       ExpandTMOV32BitImm(MBB, MBBI);
       return true;

+    case ARM::tLEApcrelJT:
+      // Inline jump tables are handled in ARMAsmPrinter.
+      if (MI.getMF()->getJumpTableInfo()->getEntryKind() ==
+          MachineJumpTableInfo::EK_Inline)
+        return false;
+
+      // Use a 32-bit immediate move to generate the address of the jump table.
+      assert(STI->isThumb() && "Non-inline jump tables expected only in thumb");
+      ExpandTMOV32BitImm(MBB, MBBI);
+      return true;
+
     case ARM::SUBS_PC_LR: {
       MachineInstrBuilder MIB =
           BuildMI(MBB, MBBI, MI.getDebugLoc(), TII->get(ARM::SUBri), ARM::PC)
               .addReg(ARM::LR)
               .add(MI.getOperand(0))
               .add(MI.getOperand(1))
               .add(MI.getOperand(2))
               .addReg(ARM::CPSR, RegState::Undef);
[509 lines not shown]

llvm/lib/Target/ARM/ARMISelLowering.cpp

[3,478 lines not shown]
   if (CP->isMachineConstantPoolEntry())
     Res =
         DAG.getTargetConstantPool(CP->getMachineCPVal(), PtrVT, CPAlign);
   else
     Res = DAG.getTargetConstantPool(CP->getConstVal(), PtrVT, CPAlign);
   return DAG.getNode(ARMISD::Wrapper, dl, MVT::i32, Res);
 }

 unsigned ARMTargetLowering::getJumpTableEncoding() const {
+  // If we don't have a 32-bit pc-relative branch instruction then the jump
+  // table consists of block addresses. Usually this is inline, but for
+  // execute-only it must be placed out-of-line.
+  if (Subtarget->genExecuteOnly() && !Subtarget->hasV8MBaselineOps())
+    return MachineJumpTableInfo::EK_BlockAddress;
   return MachineJumpTableInfo::EK_Inline;
 }

 SDValue ARMTargetLowering::LowerBlockAddress(SDValue Op,
                                              SelectionDAG &DAG) const {
   MachineFunction &MF = DAG.getMachineFunction();
   ARMFunctionInfo *AFI = MF.getInfo<ARMFunctionInfo>();
   unsigned ARMPCLabelIndex = 0;
   SDLoc DL(Op);
   EVT PtrVT = getPointerTy(DAG.getDataLayout());
   const BlockAddress *BA = cast<BlockAddressSDNode>(Op)->getBlockAddress();
   SDValue CPAddr;
   bool IsPositionIndependent = isPositionIndependent() || Subtarget->isROPI();
   if (!IsPositionIndependent) {
     CPAddr = DAG.getTargetConstantPool(BA, PtrVT, Align(4));
[18,627 lines not shown]

llvm/test/CodeGen/ARM/execute-only.ll

[39 lines not shown]
 ; CHECK-LABEL: .LJTI1_0:
 ; CHECK-NEXT: b.w
 ; CHECK-NEXT: b.w
 ; CHECK-NEXT: b.w
 ; CHECK-NEXT: b.w
 ; CHECK-NEXT: b.w
 ; CHECK-NEXT: b.w

+; CHECK-T1-LABEL: jump_table:
+; CHECK-T1: lsls [[REG_OFFSET:r[0-9]+]], {{r[0-9]+}}, #2
+; CHECK-T1-NEXT: movs [[REG_JT:r[0-9]+]], :upper8_15:.LJTI1_0
+; CHECK-T1-NEXT: lsls [[REG_JT]], [[REG_JT]], #8
+; CHECK-T1-NEXT: adds [[REG_JT]], :upper0_7:.LJTI1_0
+; CHECK-T1-NEXT: lsls [[REG_JT]], [[REG_JT]], #8
+; CHECK-T1-NEXT: adds [[REG_JT]], :lower8_15:.LJTI1_0
+; CHECK-T1-NEXT: lsls [[REG_JT]], [[REG_JT]], #8
+; CHECK-T1-NEXT: adds [[REG_JT]], :lower0_7:.LJTI1_0
+; CHECK-T1-NEXT: ldr [[REG_ENTRY:r[0-9]+]], [[[REG_JT]], [[REG_OFFSET]]]
+; CHECK-T1-NEXT: mov pc, [[REG_ENTRY]]
+; CHECK-T1: .section .rodata,"a",%progbits
+; CHECK-T1-NEXT: .p2align 2, 0x0
+; CHECK-T1-NEXT: .LJTI1_0:
+; CHECK-T1-NEXT: .long
+; CHECK-T1-NEXT: .long
+; CHECK-T1-NEXT: .long
+; CHECK-T1-NEXT: .long
+; CHECK-T1-NEXT: .long
+; CHECK-T1-NEXT: .long

 entry:
   switch i32 %c, label %return [
     i32 1, label %sw.bb
     i32 2, label %sw.bb1
     i32 3, label %sw.bb3
     i32 4, label %sw.bb4
     i32 5, label %sw.bb6
     i32 6, label %sw.bb8
[68 lines not shown]
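
The CHECK-T1 lines in execute-only.ll expect the jump table address to be materialized with a movs/lsls/adds sequence, one byte at a time. The arithmetic can be sketched in plain C++. This is an illustrative model only: `buildAddressFromBytes` is a hypothetical name, and the sketch assumes `:upper8_15:`, `:upper0_7:`, `:lower8_15:` and `:lower0_7:` select bits 31-24, 23-16, 15-8 and 7-0 of the symbol address respectively.

```cpp
#include <cassert>
#include <cstdint>

// Models the movs/lsls/adds sequence from the test: the address is built
// one byte at a time, most significant byte first.
std::uint32_t buildAddressFromBytes(std::uint32_t address) {
  // Assumed meaning of the four relocation operators.
  const std::uint32_t upper8_15 = (address >> 24) & 0xFF;
  const std::uint32_t upper0_7 = (address >> 16) & 0xFF;
  const std::uint32_t lower8_15 = (address >> 8) & 0xFF;
  const std::uint32_t lower0_7 = address & 0xFF;

  std::uint32_t reg = upper8_15;  // movs reg, :upper8_15:sym
  reg <<= 8;                      // lsls reg, reg, #8
  reg += upper0_7;                // adds reg, :upper0_7:sym
  reg <<= 8;                      // lsls reg, reg, #8
  reg += lower8_15;               // adds reg, :lower8_15:sym
  reg <<= 8;                      // lsls reg, reg, #8
  reg += lower0_7;                // adds reg, :lower0_7:sym
  return reg;
}
```

Each added byte fits in the 8-bit immediate of the 16-bit `movs`/`adds` encodings, so no literal pool load from the code section is needed, which is what makes the sequence usable under execute-only.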