This is an archive of the discontinued LLVM Phabricator instance.


Differential D143345

[RFC][RISCV] Don't disassemble `addi`s with relocations as `mv`s
Needs Review · Public

Authored by luismarques on Feb 5 2023, 7:50 AM.

Details

Reviewers

asb
reames
craig.topper
jrtc27
kito-cheng
jhenderson
MaskRay

Summary

A RISC-V mv rd, rs instruction is an alias/pseudoinstruction for addi rd, rs, 0.

When disassembling object files we'll commonly encounter many (arguably) spurious mvs: addis with unresolved relocations. The immediate operand isn't actually zero, though that is obscured by the relocation, so IMO it shouldn't be disassembled as a mv. I'd argue that even if such a relocation happens to resolve to zero it's still semantically not a mv, but that is a minor point.

This patch tries to address that situation by detecting when those addis have relocations and not printing them as mvs.
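For illustration, the rule described above can be sketched outside of LLVM. This is a minimal sketch, not the patch itself: `DecodedAddi` and `printAddi` are hypothetical names, registers are printed with numeric `x` names, and only the `mv` alias is modelled (other `addi` aliases such as `nop` and `li` are left as plain `addi`).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-in for a decoded instruction (not MCInst).
struct DecodedAddi {
  unsigned Rd;   // destination register number (0..31)
  unsigned Rs;   // source register number (0..31)
  int64_t Imm;   // immediate as encoded in the instruction word
  bool HasReloc; // true if a relocation applies to this instruction
};

// Returns the text to print for an addi under the proposed rule.
std::string printAddi(const DecodedAddi &I) {
  assert(I.Rd < 32 && I.Rs < 32 && "register number out of range");
  auto Reg = [](unsigned N) { return "x" + std::to_string(N); };
  // "mv rd, rs" stands for "addi rd, rs, 0". A zero immediate that is
  // covered by a relocation is only a placeholder, so it does not qualify.
  bool IsPlainMove = I.Rs != 0 && I.Imm == 0 && !I.HasReloc;
  if (IsPlainMove)
    return "mv " + Reg(I.Rd) + ", " + Reg(I.Rs);
  return "addi " + Reg(I.Rd) + ", " + Reg(I.Rs) + ", " + std::to_string(I.Imm);
}
```

The sketch computes the decision per instruction in a local variable, so the choice made for one instruction cannot affect how later instructions are printed.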

Please let me know if you agree with (1) my reasoning and (2) the overall approach of this patch.
If there is a rough agreement I'll further improve the patch, including optimizing the search for the relocations and not introducing a separate printInst method.
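One possible shape for the optimized relocation search, as a sketch only: if the relocation offsets are sorted and instructions are visited in ascending address order (both are assumptions here), a cursor that only moves forward replaces a full scan per instruction. `RelocCursor` and `hasRelocAt` are hypothetical names, not existing llvm-objdump code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper. Assumes SortedOffsets is in ascending order and that
// addresses are queried in ascending order (a linear disassembly sweep).
class RelocCursor {
  const std::vector<uint64_t> &Offsets;
  size_t Next = 0;       // index of the first offset not yet passed
  uint64_t LastAddr = 0; // last queried address, for the sanity check

public:
  explicit RelocCursor(const std::vector<uint64_t> &SortedOffsets)
      : Offsets(SortedOffsets) {}

  // Returns true if some relocation has exactly this offset.
  bool hasRelocAt(uint64_t Addr) {
    assert(Addr >= LastAddr && "addresses must be queried in ascending order");
    LastAddr = Addr;
    // Skip relocations that lie before the queried address.
    while (Next < Offsets.size() && Offsets[Next] < Addr)
      ++Next;
    return Next < Offsets.size() && Offsets[Next] == Addr;
  }
};
```

Over a whole section the cursor does work proportional to the number of relocations plus the number of instructions, rather than their product.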

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,080 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
60,060 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
60,080 ms	x64 debian > libFuzzer.libFuzzer::out-of-process-fuzz.test
60,060 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

luismarques created this revision. Feb 5 2023, 7:50 AM

Herald added a reviewer: jhenderson. Feb 5 2023, 7:50 AM

Herald added a reviewer: MaskRay.

Herald added a project: Restricted Project.

Herald added subscribers: luke, VincentWu, vkmr and 24 others.

luismarques requested review of this revision. Feb 5 2023, 7:50 AM

Herald added a project: Restricted Project. Feb 5 2023, 7:50 AM

Herald added subscribers: llvm-commits, pcwang-thead, eopXD.

Harbormaster completed remote builds in B211936: Diff 494909. Feb 5 2023, 8:46 AM

craig.topper added inline comments. Feb 15 2023, 11:09 AM

llvm/tools/llvm-objdump/llvm-objdump.cpp, line 535: Do we have an llvm::find_if that uses ranges?
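As a sketch of what a range-based form could look like: LLVM's `llvm/ADT/STLExtras.h` provides range wrappers such as `llvm::find_if` and `llvm::any_of`. The standalone example below mimics that style with the standard library only; `Reloc`, `anyOf`, and `hasRelocAt` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a relocation; only the offset matters here.
struct Reloc {
  uint64_t Offset;
  uint64_t getOffset() const { return Offset; }
};

// Range-style wrapper around std::any_of, so callers pass the container
// instead of a begin/end iterator pair.
template <typename Range, typename Pred>
bool anyOf(const Range &R, Pred P) {
  return std::any_of(std::begin(R), std::end(R), P);
}

// True if any relocation in Rels has the given offset.
bool hasRelocAt(const std::vector<Reloc> &Rels, uint64_t Addr) {
  return anyOf(Rels,
               [Addr](const Reloc &Rel) { return Rel.getOffset() == Addr; });
}
```

Since only a yes/no answer is needed, an `any_of`-style call states the intent more directly than comparing a `find_if` result against the end iterator.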

I think the reasoning makes sense to me. Do you also use -r when you disassemble object files? Or do you anticipate just knowing that an addi with 0 is some unrelocated address?

Basic motivation/direction here makes sense to me. +1 on the general approach.

In D143345#4129968, @craig.topper wrote:

> I think the reasoning makes sense to me. Do you also use -r when you disassemble object files? Or do you anticipate just knowing that an addi with 0 is some unrelocated address?

I often use the -r option to determine the symbol addresses being computed (in non-trivial disassemblies) or as a sanity check to verify that some immediate isn't actually zero. Sometimes I don't bother because it makes the output noisy or I just forget.

I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.

In D143345#4130539, @luismarques wrote:

> I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.

I assume it's the latter. It's true that with -r the spurious mvs are less of a problem, though I think it's still something to be avoided if we can.

In D143345#4130565, @luismarques wrote:

> In D143345#4130539, @luismarques wrote:
>
>> I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.
>
> I assume it's the latter. It's true that with -r the spurious mvs are less of a problem, though I think it's still something to be avoided if we can.

Yes it was the latter. I was just curious.

It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.

In D143345#4130585, @craig.topper wrote:

> It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.

I can work on that if there is agreement that we should go in that direction. Makes sense to me, but I wonder if some people will push back against that, regardless of implementation concerns.

In D143345#4130588, @luismarques wrote:

> In D143345#4130585, @craig.topper wrote:
>
>> It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.
>
> I can work on that if there is agreement that we should go in that direction. Makes sense to me, but I wonder if some people will push back against that, regardless of implementation concerns.

From my perspective that would be cool and useful for all targets. I can’t think of a reason why one wouldn’t prefer that, beyond the complexity of having to synthesise labels when the assembler has converted them to section-relative.
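To make the suggestion concrete, here is a minimal sketch of printing an immediate operand symbolically when a relocation is known. It is not LLVM code: `RelocInfo` and `formatImmOperand` are hypothetical names, the modifier string is assumed to be chosen by the caller from the relocation type, and addends and label synthesis are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of the relocation attached to an operand.
struct RelocInfo {
  std::string Modifier; // e.g. "%pcrel_lo"; assumed to be picked by the caller
  std::string Symbol;   // name of the symbol or label the relocation refers to
};

// Prints "modifier(symbol)" when a relocation is present, otherwise the raw
// immediate. Rel may be null, meaning no relocation applies to the operand.
std::string formatImmOperand(int64_t Imm, const RelocInfo *Rel) {
  if (Rel) {
    assert(!Rel->Symbol.empty() && "relocation needs a symbol name to print");
    return Rel->Modifier + "(" + Rel->Symbol + ")";
  }
  return std::to_string(Imm);
}
```

With a relocation named `1b` and the modifier `%pcrel_lo`, the operand would print as `%pcrel_lo(1b)` rather than `0`. The hard part mentioned above, recovering a usable label when the assembler has converted it to a section-relative reference, is outside what this sketch covers.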

Revision Contents

Path	Size
llvm/include/llvm/MC/MCInstPrinter.h	6 lines
llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.h	3 lines
llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp	16 lines
llvm/test/MC/RISCV/rvi-aliases-valid.s	6 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp	7 lines

Diff 494909

llvm/include/llvm/MC/MCInstPrinter.h

@@ ... @@ public:
   /// \p Address the address of current instruction on most targets, used to
   /// print a PC relative immediate as the target address. On targets where a PC
   /// relative immediate is relative to the next instruction and the length of a
   /// MCInst is difficult to measure (e.g. x86), this is the address of the next
   /// instruction. If Address is 0, the immediate will be printed.
   virtual void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
                          const MCSubtargetInfo &STI, raw_ostream &OS) = 0;
 
+  virtual void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
+                         const MCSubtargetInfo &STI, raw_ostream &OS,
+                         bool HasRel) {
+    printInst(MI, Address, Annot, STI, OS);
+  }
+
   /// Return the name of the specified opcode enum (e.g. "MOV32ri") or
   /// empty if we can't resolve it.
   StringRef getOpcodeName(unsigned Opcode) const;
 
   /// Print the assembler register name.
   virtual void printRegName(raw_ostream &OS, MCRegister Reg) const;
 
   bool getUseMarkup() const { return UseMarkup; }

llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.h

@@ ... @@ public:
   RISCVInstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
                    const MCRegisterInfo &MRI)
       : MCInstPrinter(MAI, MII, MRI) {}
 
   bool applyTargetSpecificCLOption(StringRef Opt) override;
 
   void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
                  const MCSubtargetInfo &STI, raw_ostream &O) override;
+  void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
+                 const MCSubtargetInfo &STI, raw_ostream &O,
+                 bool HasRel) override;
   void printRegName(raw_ostream &O, MCRegister Reg) const override;
 
   void printOperand(const MCInst *MI, unsigned OpNo, const MCSubtargetInfo &STI,
                     raw_ostream &O, const char *Modifier = nullptr);
   void printBranchOperand(const MCInst *MI, uint64_t Address, unsigned OpNo,
                           const MCSubtargetInfo &STI, raw_ostream &O);
   void printCSRSystemRegister(const MCInst *MI, unsigned OpNo,
                               const MCSubtargetInfo &STI, raw_ostream &O);

llvm/lib/Target/RISCV/MCTargetDesc/RISCVInstPrinter.cpp

@@ ... @@ if (Opt == "numeric") {
     return true;
   }
 
   return false;
 }
 
 void RISCVInstPrinter::printInst(const MCInst *MI, uint64_t Address,
                                  StringRef Annot, const MCSubtargetInfo &STI,
-                                 raw_ostream &O) {
+                                 raw_ostream &O, bool HasRel) {
   bool Res = false;
   const MCInst *NewMI = MI;
   MCInst UncompressedMI;
   if (PrintAliases && !NoAliases)
     Res = RISCVRVC::uncompress(UncompressedMI, *MI, STI);
   if (Res)
     NewMI = const_cast<MCInst *>(&UncompressedMI);
+  if (MI->getOpcode() == RISCV::ADDI) {
+    auto Rs = MI->getOperand(1);
+    if (Rs.getReg() != RISCV::X0) { // neither a "nop" nor a "li".
+      auto Imm = MI->getOperand(2);
+      if (Imm.isImm() && Imm.getImm() == 0 && HasRel)
+        NoAliases = true; // don't print the "addi" of a "la" as "mv".
+    }
+  }
   if (!PrintAliases || NoAliases || !printAliasInstr(NewMI, Address, STI, O))
     printInstruction(NewMI, Address, STI, O);
   printAnnotation(O, Annot);
 }
 
+void RISCVInstPrinter::printInst(const MCInst *MI, uint64_t Address,
+                                 StringRef Annot, const MCSubtargetInfo &STI,
+                                 raw_ostream &O) {
+  printInst(MI, Address, Annot, STI, O, false);
+}
+
 void RISCVInstPrinter::printRegName(raw_ostream &O, MCRegister Reg) const {
   O << getRegisterName(Reg);
 }
 
 void RISCVInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
                                     const MCSubtargetInfo &STI, raw_ostream &O,
                                     const char *Modifier) {
   assert((Modifier == nullptr || Modifier[0] == 0) && "No modifiers supported");

llvm/test/MC/RISCV/rvi-aliases-valid.s

@@ ... @@
 
 # CHECK-S-OBJ-NOALIAS: ebreak
 # CHECK-S-OBJ: ebreak
 sbreak
 
 # CHECK-S-OBJ-NOALIAS: ecall
 # CHECK-S-OBJ: ecall
 scall
+
+.option relax
+1: auipc t1, %pcrel_hi(1b)
+# CHECK-OBJ-NOALIAS: addi t1, t1, 0
+# CHECK-OBJ: addi t1, t1, 0
+addi t1, t1, %pcrel_lo(1b)

llvm/tools/llvm-objdump/llvm-objdump.cpp

@@ ... @@ printInst(MCInstPrinter &IP, const MCInst *MI, ArrayRef<uint8_t> Bytes,
 
     if (MI) {
       // See MCInstPrinter::printInst. On targets where a PC relative immediate
       // is relative to the next instruction and the length of a MCInst is
       // difficult to measure (x86), this is the address of the next
       // instruction.
       uint64_t Addr =
           Address.Address + (STI.getTargetTriple().isX86() ? Bytes.size() : 0);
-      IP.printInst(MI, Addr, "", STI, OS);
+      auto is_rel = [=](RelocationRef Rel) { return Rel.getOffset() == Addr; };
+      // TODO: if people agree with the overall approach of this patch then the
+      // search for a relocation will be optimized to avoid duplicate work.
+      bool HasRel =
+          std::find_if(Rels->begin(), Rels->end(), is_rel) != Rels->end();
[Inline comment, craig.topper: Do we have an llvm::find_if that uses ranges?]
+      IP.printInst(MI, Addr, "", STI, OS, HasRel);
     } else
       OS << "\t<unknown>";
   }
 };
 PrettyPrinter PrettyPrinterInst;
 
 class HexagonPrettyPrinter : public PrettyPrinter {
 public: