This is an archive of the discontinued LLVM Phabricator instance.

Differential D78702

[RFC][RISCV][MC/Objdump] Extend llvm-objdump output to support more instruction patterns
Needs Revision · Public

Authored by simoncook on Apr 23 2020, 4:49 AM.

Details

Reviewers

asb
edward-jones
jhenderson
MaskRay
lenary

Summary

This extends llvm-objdump and MCInstrAnalysis to perform more in-depth
analysis of instructions, covering patterns that cannot be recognized by
looking at a single instruction in isolation. I would like some feedback
on the approach taken so far: whether it is suitable, or whether some
parts should be moved or changed into more generic interfaces.

Currently MCInstrAnalysis has the evaluateBranch method which, given a
single MCInst that is known to be a branch, returns the target address
of that instruction if possible, with no context other than the current
PC.

In GNU objdump, the RISC-V disassembler can determine the addresses
referenced by more instructions, for example branch/call targets built
up over several instructions, and the target addresses of load and store
instructions, both those formed from immediates and those that are
GP-relative.

To add such functionality, I have extended MCInstrAnalysis with a new
evaluateInst function, which provides the same functionality as
evaluateBranch but can store information across multiple instructions,
allowing these patterns to be picked up. The default implementation
performs the same check llvm-objdump does before calling evaluateBranch
(isCall, isUnconditionalBranch or isConditionalBranch), so it is an NFC
change for other targets, whereas for RISC-V it performs a more detailed
analysis. This makes evaluateInst safe to call on every instruction, and
llvm-objdump now calls it for each one.

The RISC-V implementation keeps a cache of the known values of the
general-purpose registers and, after evaluating each instruction,
updates that state in case a later instruction needs the value. To avoid
misidentifying addresses, the cache is invalidated whenever control flow
changes. The exception is GP: if it is defined by the __global_pointer$
symbol, it is reset to that known value rather than invalidated.
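The register tracking described above can be sketched outside of LLVM as follows. This is a minimal, self-contained approximation, not the patch's code: `Op`, `Insn` and `RegTracker` are hypothetical stand-ins for `MCInst`, the `RISCV::` opcodes and `RISCVRegCache`. It assumes the same linear scan, combines the upper immediate of `lui`/`auipc` with a following `addi`, `jalr`, load or store, and forgets everything except GP when a `jalr` leaves the scan. Two details are choices made for this sketch: the `lui`/`auipc` result is sign-extended from bit 31, so that on RV64 `lui a0, 0x80000` yields 0xffffffff80000000, and a load's destination register is treated as unknown afterwards, because the loaded value cannot be derived statically.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-ins for MCInst and the RISCV:: opcodes used by the patch.
// LOAD means an integer load whose destination is a GPR.
enum class Op { AUIPC, LUI, ADDI, JALR, LOAD, STORE, OTHER };

struct Insn {
  Op Opc;
  unsigned Rd;  // destination register number, 0..31 (unused for STORE)
  unsigned Rs1; // source/base register number, 0..31
  int64_t Imm;  // AUIPC/LUI: raw 20-bit field; otherwise sign-extended imm12
};

class RegTracker {
  std::array<std::optional<uint64_t>, 32> Regs{}; // nullopt means "unknown"
  std::optional<uint64_t> GP;                     // from __global_pointer$
  bool IsRV64;

  uint64_t wrap(uint64_t V) const { return IsRV64 ? V : (V & 0xffffffffULL); }
  std::optional<uint64_t> get(unsigned R) const {
    return R == 0 ? std::optional<uint64_t>(0) : Regs[R]; // x0 is always 0
  }
  void set(unsigned R, std::optional<uint64_t> V) {
    if (R != 0) // writes to x0 are discarded
      Regs[R] = V;
  }

public:
  explicit RegTracker(bool RV64) : IsRV64(RV64) {}

  void setGP(uint64_t V) {
    GP = V;
    Regs[3] = V; // x3 is gp
  }

  // Forget every register value except a GP known from the symbol table.
  void invalidate() {
    Regs.fill(std::nullopt);
    if (GP)
      Regs[3] = *GP;
  }

  // Returns the address this instruction refers to, if it can be derived.
  std::optional<uint64_t> evaluate(const Insn &I, uint64_t Addr) {
    std::optional<uint64_t> Target;
    switch (I.Opc) {
    case Op::AUIPC:
    case Op::LUI: {
      // imm20 << 12, sign-extended from bit 31 (only visible on RV64).
      int64_t Upper = static_cast<int32_t>(static_cast<uint32_t>(I.Imm) << 12);
      uint64_t V = static_cast<uint64_t>(Upper);
      if (I.Opc == Op::AUIPC)
        V += Addr;
      set(I.Rd, wrap(V));
      return std::nullopt; // only a partial address so far
    }
    case Op::ADDI:
      if (auto Base = get(I.Rs1))
        Target = wrap(*Base + static_cast<uint64_t>(I.Imm));
      set(I.Rd, Target); // unknown source gives an unknown destination
      return Target;
    case Op::JALR:
      if (auto Base = get(I.Rs1))
        Target = wrap((*Base + static_cast<uint64_t>(I.Imm)) & ~uint64_t(1));
      invalidate(); // control leaves the linear scan here
      return Target;
    case Op::LOAD:
    case Op::STORE:
      if (auto Base = get(I.Rs1))
        Target = wrap(*Base + static_cast<uint64_t>(I.Imm));
      if (I.Opc == Op::LOAD)
        set(I.Rd, std::nullopt); // the loaded value cannot be known
      return Target;
    case Op::OTHER:
      set(I.Rd, std::nullopt);
      return std::nullopt;
    }
    return std::nullopt;
  }
};
```

For instance, `auipc ra, 0` at 0x100d2 followed by `jalr ra, -94(ra)` resolves to 0x10074, the `<main>` target in the example output below.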

For this to work, there had to be a few other changes to llvm-objdump,
some of them target-specific, that would be good to make generic:

- The MCInstrAnalysis object is no longer const. Should this analysis
  belong in this class? Given the evaluateBranch call, I think it makes
  sense.

- There is a resetAnalysis hook which provides an additional signal
  that control flow has changed; as a "new symbol/file" indicator, I
  think this should be fine.

- I've added an address printout after the instruction, just for
  RISC-V. This matches GNU objdump's behaviour, but does not
  necessarily have to land with the rest of the patch.

- I do an explicit check for __global_pointer$ and pass it through to
  MCInstrAnalysis, since MCInstrAnalysis has no concept of anything
  beyond MCInsts; there may be scope for making this more generic.

Since I haven't written tests yet, here is an example of llvm-objdump
output before and after this patch.

Before:

00010074 <main>:
   10074: 63 15 05 00                   bne     a0, zero, 10 <main+0xa>
   10078: 03 a5 c1 8f                   lw      a0, -1796(gp)
   1007c: 82 80                         c.jr    ra
   1007e: 23 ae a1 8e                   sw      a0, -1796(gp)
   10082: 7d 15                         c.addi  a0, -1
   10084: c5 bf                         c.j     -16 <main>

00010090 <_start>:
   10090: 97 21 00 00                   auipc   gp, 2
   10094: 93 81 41 ba                   addi    gp, gp, -1116
   10098: 17 15 00 00                   auipc   a0, 1
   1009c: 13 05 85 49                   addi    a0, a0, 1176
   100a0: 17 16 00 00                   auipc   a2, 1
   100a4: 13 06 06 4b                   addi    a2, a2, 1200
   100a8: 09 8e                         c.sub   a2, a0
   100aa: 81 45                         c.li    a1, 0
   100ac: 97 00 00 00                   auipc   ra, 0
   100b0: e7 80 20 1e                   jalr    ra, 482(ra)
   100b4: 17 05 00 00                   auipc   a0, 0
   100b8: 13 05 85 13                   addi    a0, a0, 312
   100bc: 97 00 00 00                   auipc   ra, 0
   100c0: e7 80 40 0f                   jalr    ra, 244(ra)
   100c4: 97 00 00 00                   auipc   ra, 0
   100c8: e7 80 40 16                   jalr    ra, 356(ra)
   100cc: 02 45                         c.lwsp  a0, 0(sp)
   100ce: 4c 00                         c.addi4spn      a1, sp, 4
   100d0: 01 46                         c.li    a2, 0
   100d2: 97 00 00 00                   auipc   ra, 0
   100d6: e7 80 20 fa                   jalr    ra, -94(ra)
   100da: 17 03 00 00                   auipc   t1, 0
   100de: 67 00 63 0e                   jalr    zero, 230(t1)

After:

00010074 <main>:
   10074: 63 15 05 00                   bne     a0, zero, 10 #1007e <main+0xa>
   10078: 03 a5 c1 8f                   lw      a0, -1796(gp) #11490 <__bss_start>
   1007c: 82 80                         c.jr    ra
   1007e: 23 ae a1 8e                   sw      a0, -1796(gp) #11490 <__bss_start>
   10082: 7d 15                         c.addi  a0, -1
   10084: c5 bf                         c.j     -16 #10074 <main>

00010090 <_start>:
   10090: 97 21 00 00                   auipc   gp, 2
   10094: 93 81 41 ba                   addi    gp, gp, -1116 #11c34 <_end+0x6e4>
   10098: 17 15 00 00                   auipc   a0, 1
   1009c: 13 05 85 49                   addi    a0, a0, 1176 #11530 <__bss_start>
   100a0: 17 16 00 00                   auipc   a2, 1
   100a4: 13 06 06 4b                   addi    a2, a2, 1200 #11550 <_end>
   100a8: 09 8e                         c.sub   a2, a0
   100aa: 81 45                         c.li    a1, 0
   100ac: 97 00 00 00                   auipc   ra, 0
   100b0: e7 80 20 1e                   jalr    ra, 482(ra) #1028e <memset>
   100b4: 17 05 00 00                   auipc   a0, 0
   100b8: 13 05 85 13                   addi    a0, a0, 312 #101ec <__libc_fini_array>
   100bc: 97 00 00 00                   auipc   ra, 0
   100c0: e7 80 40 0f                   jalr    ra, 244(ra) #101b0 <atexit>
   100c4: 97 00 00 00                   auipc   ra, 0
   100c8: e7 80 40 16                   jalr    ra, 356(ra) #10228 <__libc_init_array>
   100cc: 02 45                         c.lwsp  a0, 0(sp)
   100ce: 4c 00                         c.addi4spn      a1, sp, 4
   100d0: 01 46                         c.li    a2, 0
   100d2: 97 00 00 00                   auipc   ra, 0
   100d6: e7 80 20 fa                   jalr    ra, -94(ra) #10074 <main>
   100da: 17 03 00 00                   auipc   t1, 0
   100de: 67 00 63 0e                   jalr    zero, 230(t1) #101c0 <exit>
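The PC-relative annotations above can be checked by hand. As a sketch of the arithmetic (standard RISC-V `auipc` semantics, not code from the patch), with $\mathit{pc}$ the address of the `auipc`, $\mathit{imm}_{20}$ its upper immediate and $\mathit{imm}_{12}$ the sign-extended 12-bit immediate of the following `addi` or `jalr`:

$$
\mathit{target} = \mathit{pc} + \operatorname{sext}_{32}(\mathit{imm}_{20} \ll 12) + \mathit{imm}_{12}
$$

$$
\begin{aligned}
\texttt{auipc a0, 1; addi a0, a0, 1176} &: \quad \texttt{0x10098} + \texttt{0x1000} + \texttt{0x498} = \texttt{0x11530} \\
\texttt{auipc ra, 0; jalr ra, -94(ra)} &: \quad \texttt{0x100d2} + \texttt{0x0} - \texttt{0x5e} = \texttt{0x10074}
\end{aligned}
$$

On RV32 the sum is taken modulo $2^{32}$, and for `jalr` the lowest bit of the result is additionally cleared, which does not change these two results.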

Any other comments/suggestions on how to implement this are welcome.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed (41 tests failed; five are listed below)

	Time	Test
	12,330 ms	lldb-api.commands/expression/import-std-module/basic
	3,930 ms	lldb-api.commands/expression/import-std-module/conflicts
	6,240 ms	lldb-api.commands/expression/import-std-module/deque-basic
	6,200 ms	lldb-api.commands/expression/import-std-module/deque-dbg-info-content
	5,660 ms	lldb-api.commands/expression/import-std-module/forward_list

Event Timeline

simoncook created this revision. Apr 23 2020, 4:49 AM

Herald added a reviewer: jhenderson. Apr 23 2020, 4:49 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, evandro, luismarques and 25 others.

Harbormaster failed remote builds in B54381: Diff 259524! Apr 23 2020, 5:54 AM

MaskRay added a reviewer: MaskRay. Apr 23 2020, 9:17 AM

Thanks for the patch. Several RISC targets need more than one instruction to materialize an address, so making MCInstrAnalysis stateful is a move in the right direction.

> To add such functionality, I have extended MCInstrAnalysis with a new evaluateInst function, which provides the same functionality as evaluateBranch, but has the ability to store information across multiple instructions allowing these patterns to be picked up. The default implementation does the same check llvm-objdump does before calling evaluateBranch, so is an NFC change for other targets, but for RISC-V does a more detailed analysis. This allows evaluateInst to be safe to call on all instructions and thus is called for each instruction.

Looks good. This can benefit PC-relative instructions on other targets as well. For example, on x86 we can symbolize the target address of movq 4101(%rip), %rax. I do have a plan to add a similar interface, but you beat me to it.

> The MCInstrAnalysis object is no longer const, should this analysis belong in this class (which given the evaluateBranch call, I think makes sense)

Making the MCInstrAnalysis instance mutable is required to make it stateful.

> There is a resetAnalysis hook which provides an additional signal that control flow has changed, I think as a "new symbol/file" indicator this should be fine.

Another idea is to just construct a fresh MCInstrAnalysis instance for each new object file. This is required by ARM, which has two MCInstrAnalysis subclasses, for ARM state and Thumb state. Arguably its MCInstrAnalysis should be stateful too, to take data mapping symbols into account. In the future we may need MCSubtargetInfo. Reconstructing an MCInstrAnalysis instance does not seem like a significant cost that we need to avoid.
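A minimal sketch of this alternative, under stated assumptions: `ObjectDesc`, `ObjectAnalysis` and `createAnalysisFor` are hypothetical names, and real code would obtain the instance from the target's MCInstrAnalysis factory rather than `std::make_unique`. Per-object facts (RV32 vs. RV64, the value of `__global_pointer$`) become constructor arguments, so there is no `resetAnalysis(NewObject, Arch)` call to remember, and no state can leak from one object file into the next.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>

// Hypothetical per-object facts; llvm-objdump would derive these from the
// ObjectFile it is about to disassemble.
struct ObjectDesc {
  bool IsRV64;
  std::optional<uint64_t> GlobalPointer; // value of __global_pointer$, if any
};

// Hypothetical stateful analysis whose per-object configuration is fixed at
// construction time instead of being passed to a reset hook.
class ObjectAnalysis {
  bool IsRV64;
  std::optional<uint64_t> GP;
  std::optional<uint64_t> Pending; // per-scan state, e.g. a seen auipc result

public:
  explicit ObjectAnalysis(const ObjectDesc &O)
      : IsRV64(O.IsRV64), GP(O.GlobalPointer) {}
  bool is64Bit() const { return IsRV64; }
  std::optional<uint64_t> globalPointer() const { return GP; }
  void notePending(uint64_t V) { Pending = V; }
  bool hasPending() const { return Pending.has_value(); }
};

// Called once per object file: each object starts from a clean instance.
std::unique_ptr<ObjectAnalysis> createAnalysisFor(const ObjectDesc &O) {
  return std::make_unique<ObjectAnalysis>(O);
}
```

The per-section reset in the patch would remain a separate concern; only the per-object `NewObject`/`Arch` half of the hook is replaced here.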

> I've added an address printout after the instruction just for RISC-V; this matches GNU objdump's behaviour, but does not necessarily have to land with the rest of the patch.

Maybe something similar to my previous patches on improving the disassembly output? (D76580/D76591/D77853)
Ideally that patch should land before this one.

> I do an explicit check for __global_pointer$ and pass this through to MCInstrAnalysis since it has no concept of anything more than MCInsts, there may be scope for making this more generic.

Please change the example to use PC-relative addresses first. Honestly, I think GP is an ugly part of the ABI. This was confirmed here: https://groups.google.com/a/groups.riscv.org/forum/#!searchin/sw-dev/__global_pointer$24%7Csort:date/sw-dev/ZjYUJswknQ4/bhFnlWc8BQAJ

If we can place

if (MIA &&
    (Obj->getArch() == Triple::riscv32 ||
     Obj->getArch() == Triple::riscv64) &&
    Name == "__global_pointer$")

in a more suitable target-specific place, I will not necessarily block the change. Note that we can do the non-controversial PC-relative addresses first; that is sufficient to demonstrate the benefits.
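One possible target-specific home for that check, sketched with hypothetical names (`AnalysisBase::noteSymbol` is not an existing MCInstrAnalysis hook): llvm-objdump would pass every symbol it reads to the analysis unconditionally, and only the RISC-V subclass would recognize `__global_pointer$`. This removes the `getArch()` test from the tool at the cost of one virtual call per symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical generic hook: called for every symbol, ignored by default.
class AnalysisBase {
public:
  virtual ~AnalysisBase() = default;
  virtual void noteSymbol(std::string_view /*Name*/, uint64_t /*Addr*/) {}
};

// Hypothetical RISC-V subclass: the ABI-specific symbol name lives next to
// the rest of the RISC-V analysis instead of in llvm-objdump.cpp.
class RISCVAnalysis : public AnalysisBase {
  std::optional<uint64_t> GP;

public:
  void noteSymbol(std::string_view Name, uint64_t Addr) override {
    if (Name == "__global_pointer$")
      GP = Addr;
  }
  std::optional<uint64_t> knownGP() const { return GP; }
};

struct SymbolEntry {
  std::string_view Name;
  uint64_t Addr;
};

// Stand-in for the symbol loop in disassembleObject: no architecture check.
void scanSymbols(AnalysisBase &MIA, const std::vector<SymbolEntry> &Syms) {
  for (const SymbolEntry &S : Syms)
    MIA.noteSymbol(S.Name, S.Addr);
}
```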

> Since I haven't written tests yet, I can give an example of before/after with this patch:

Consider using PC-relative instructions and exhibiting diff -u output.

> jalr ra, -94(ra) #10074 <main>

I'd prefer we just print the target address. GNU objdump always prints the target address. I migrated some targets to print target addresses (see the diffs I linked above). If you feel we need a command line option to print an immediate instead, we can add it, but IMHO that customized output should not be the default. (I asked some people and many feel that the immediate is not useful)
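To illustrate printing the target address in the GNU style, here is a small hypothetical helper (`formatTarget` is not llvm-objdump's code) that renders a resolved address together with the nearest preceding symbol, in the `<sym+0xoff>` form visible in the example output. It assumes a flat address-to-name map; the real tool also has to take sections, symbol types and relocations into account.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: "<hex address> <symbol[+0xoffset]>", using the closest
// symbol at or below Target. Symbols maps address -> name.
std::string formatTarget(uint64_t Target,
                         const std::map<uint64_t, std::string> &Symbols) {
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "%llx",
                static_cast<unsigned long long>(Target));
  std::string Out = Buf;

  auto It = Symbols.upper_bound(Target); // first symbol above Target
  if (It == Symbols.begin())
    return Out; // no symbol at or below the target: print the address only
  --It;
  Out += " <" + It->second;
  if (uint64_t Off = Target - It->first) {
    std::snprintf(Buf, sizeof(Buf), "+0x%llx",
                  static_cast<unsigned long long>(Off));
    Out += Buf;
  }
  Out += ">";
  return Out;
}
```

With `main` at 0x10074 and `_end` at 0x11550, this produces `1007e <main+0xa>` and `11c34 <_end+0x6e4>`, matching the annotations in the example; whether the address replaces the immediate or follows it is left to the instruction printer.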

Alan Modra has recently proposed some options on the binutils mailing list to control objdump -d output: https://sourceware.org/pipermail/binutils/2020-April/110669.html If you have ideas, please speak up.

llvm/include/llvm/MC/MCInstrAnalysis.h
	Line 165: We can just construct a new MCInstrAnalysis instance. See my main comment.
	Line 171: Sigh, GP is an ugly part of the ABI. See my main comment.
llvm/tools/llvm-objdump/llvm-objdump.cpp
	Line 1580: I know that other `getArch` calls exist in this file, but for new code we should avoid them.

Created D78776

I am wondering whether we should unify MCInstrAnalysis::{evaluateBranch,evaluateMemoryOperandAddress} under a better name such as evaluateTargetAddress (non-const, because it needs to be stateful).
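A sketch of what such a unified, non-const entry point might look like. All names are hypothetical (`DecodedInsn` stands in for `MCInst`), and the single class deliberately mixes two targets' conventions purely for illustration: a RISC-V style branch is relative to the start of the instruction, while an x86 RIP-relative memory operand such as `movq 4101(%rip), %rax` is relative to the end of the instruction, which is why the hook needs both `Addr` and `Size`. Returning `std::optional` lets the caller invoke it for every instruction without first filtering on `isCall`/`isBranch`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical summary of a decoded instruction (stands in for MCInst).
enum class Kind { PcRelBranch, RipRelMemory, Other };

struct DecodedInsn {
  Kind K;
  int64_t Offset; // branch immediate or memory displacement
};

// Hypothetical unified hook covering both branch and memory-operand targets.
class TargetAddressEvaluator {
public:
  virtual ~TargetAddressEvaluator() = default;

  // Non-const so that stateful subclasses can update their caches.
  virtual std::optional<uint64_t>
  evaluateTargetAddress(const DecodedInsn &I, uint64_t Addr, uint64_t Size) {
    switch (I.K) {
    case Kind::PcRelBranch:
      // RISC-V convention: relative to the start of the instruction.
      return Addr + static_cast<uint64_t>(I.Offset);
    case Kind::RipRelMemory:
      // x86 convention: relative to the end of the instruction.
      return Addr + Size + static_cast<uint64_t>(I.Offset);
    case Kind::Other:
      break;
    }
    return std::nullopt; // instruction does not reference an address
  }
};
```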

Simon, can you please rebase? It seems D78776 was merged and now conflicts. Thank you.

In D78702#2011633, @apazos wrote:

> Simon can you please rebase, it seems D78776 got merged and now conflicts. Thank you.

I'm currently in the middle of rebasing to make this work using evaluateMemoryOperandAddress instead, and moving some of the printing back into the instruction printer as per feedback. I should have an updated patch (more likely two) in the next couple of days. I'll update this revision for the stateful half when that bit's ready.

lenary resigned from this revision. Jan 14 2021, 9:59 AM

Herald added subscribers: frasercrmck, NickHung. Jan 14 2021, 9:59 AM

rkruppe removed a subscriber: rkruppe. Jan 14 2021, 10:19 AM

MaskRay requested changes to this revision. Feb 6 2021, 8:02 PM

This revision now requires changes to proceed. Feb 6 2021, 8:02 PM

Herald added a subscriber: vkmr. Feb 6 2021, 8:02 PM

Revision Contents

Path	Size
llvm/include/llvm/MC/MCInstrAnalysis.h	26 lines
llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp	151 lines
llvm/test/MC/Disassembler/RISCV/branch-targets.txt	16 lines
llvm/test/MC/RISCV/rv64-relax-all.s	8 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp	28 lines

Diff 259524

llvm/include/llvm/MC/MCInstrAnalysis.h

//===- llvm/MC/MCInstrAnalysis.h - InstrDesc target hooks -------- C++ --===//		//===- llvm/MC/MCInstrAnalysis.h - InstrDesc target hooks -------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines the MCInstrAnalysis class which the MCTargetDescs can		// This file defines the MCInstrAnalysis class which the MCTargetDescs can
// derive from to give additional information to MC.		// derive from to give additional information to MC.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_MC_MCINSTRANALYSIS_H		#ifndef LLVM_MC_MCINSTRANALYSIS_H
#define LLVM_MC_MCINSTRANALYSIS_H		#define LLVM_MC_MCINSTRANALYSIS_H

		#include "llvm/ADT/Triple.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include <cstdint>		#include <cstdint>

namespace llvm {		namespace llvm {

class MCRegisterInfo;		class MCRegisterInfo;
class Triple;

class MCInstrAnalysis {		class MCInstrAnalysis {
protected:		protected:
friend class Target;		friend class Target;

const MCInstrInfo *Info;		const MCInstrInfo *Info;

public:		public:
[107 unchanged lines not shown]	public:
/// register moves. For example, on most X86 subtargets, a candidate for move		/// register moves. For example, on most X86 subtargets, a candidate for move
/// elimination cannot specify the same register for both source and		/// elimination cannot specify the same register for both source and
/// destination.		/// destination.
virtual bool isOptimizableRegisterMove(const MCInst &MI,		virtual bool isOptimizableRegisterMove(const MCInst &MI,
unsigned CPUID) const {		unsigned CPUID) const {
return false;		return false;
}		}

		/// Given an instruction, evaluate its operands to glean information which can
		/// be useful for multi-instruction patterns. An example is RISC-V's function
		/// calls, whose target is calculated from two (not necessarily adjacent)
		/// instructions. Assumes a linear scan of disassembly.
		virtual bool evaluateInst(const MCInst &Inst, uint64_t Addr, uint64_t Size,
		uint64_t &Target) {
		if (isCall(Inst) || isUnconditionalBranch(Inst) ||
		isConditionalBranch(Inst))
		return evaluateBranch(Inst, Addr, Size, Target);

		return false;
		}

		/// For the evaluateInst call, reset any known values.
		/// NewObject is set to true when a new object is being analyzed, in which
		/// case Arch is used to indicate the Architecture of the incoming object.
		virtual void resetAnalysis(bool NewObject = false,
		Triple::ArchType Arch = Triple::UnknownArch) {}
		[Inline comment] MaskRay: We can just construct a new MCInstrAnalysis instance. See my main comment.

		/// For the evaluateInst call, set a target's known GP register for GP-based
		/// analysis. This value is cached until resetAnalysis is called with
		/// NewObject=true
		virtual void setGPForAnalysis(uint64_t Addr) {}
		[Inline comment] MaskRay: Sigh, GP is an ugly part of the ABI. See my main comment.

/// Given a branch instruction try to get the address the branch		/// Given a branch instruction try to get the address the branch
/// targets. Return true on success, and the address in Target.		/// targets. Return true on success, and the address in Target.
virtual bool		virtual bool
evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,		evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const;		uint64_t &Target) const;

/// Given an instruction tries to get the address of a memory operand. Returns		/// Given an instruction tries to get the address of a memory operand. Returns
/// the address on success.		/// the address on success.
[15 unchanged lines not shown]

llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp

[11 unchanged lines not shown]

#include "RISCVMCTargetDesc.h"		#include "RISCVMCTargetDesc.h"
#include "RISCVELFStreamer.h"		#include "RISCVELFStreamer.h"
#include "RISCVInstPrinter.h"		#include "RISCVInstPrinter.h"
#include "RISCVMCAsmInfo.h"		#include "RISCVMCAsmInfo.h"
#include "RISCVTargetStreamer.h"		#include "RISCVTargetStreamer.h"
#include "TargetInfo/RISCVTargetInfo.h"		#include "TargetInfo/RISCVTargetInfo.h"
#include "Utils/RISCVBaseInfo.h"		#include "Utils/RISCVBaseInfo.h"
		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/CodeGen/Register.h"		#include "llvm/CodeGen/Register.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCInstrAnalysis.h"		#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCStreamer.h"		#include "llvm/MC/MCStreamer.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
[63 unchanged lines not shown]	static MCTargetStreamer *createRISCVAsmTargetStreamer(MCStreamer &S,
formatted_raw_ostream &OS,		formatted_raw_ostream &OS,
MCInstPrinter *InstPrint,		MCInstPrinter *InstPrint,
bool isVerboseAsm) {		bool isVerboseAsm) {
return new RISCVTargetAsmStreamer(S, OS);		return new RISCVTargetAsmStreamer(S, OS);
}		}

namespace {		namespace {

		// Cache of RISC-V Register Values used by RISCVMCInstrAnalysis
		class RISCVRegCache {
		uint64_t GPRKnownValues[32] = {0};
		bool GPRGoodValues[32] = {true, false};
		uint64_t KnownGPValue = 0;
		bool GPIsSet = false;

		public:
		void setReg(unsigned Reg, uint64_t Value) {
		// Ignore writes to X0
		if (Reg == RISCV::X0)
		return;
		Reg -= RISCV::X0;
		GPRKnownValues[Reg] = Value;
		GPRGoodValues[Reg] = true;
		}
		Optional<uint64_t> getReg(unsigned Reg) {
		Reg -= RISCV::X0;
		if (GPRGoodValues[Reg])
		return GPRKnownValues[Reg];
		return None;
		}
		// Invalidate all known register values.
		// If Full is false, then do not invalidate the cached value of the global
		// pointer
		void invalidate(bool Full = false) {
		for (unsigned i = 1; i < 32; i++) {
		GPRKnownValues[i] = 0;
		GPRGoodValues[i] = false;
		}
		if (Full) {
		GPIsSet = false;
		KnownGPValue = 0;
		} else if (GPIsSet) {
		GPRKnownValues[3] = KnownGPValue;
		GPRGoodValues[3] = true;
		}
		}
		void invalidateReg(unsigned Reg) {
		if (Reg == RISCV::X0)
		return;
		Reg -= RISCV::X0;
		GPRGoodValues[Reg] = false;
		}
		void setGP(uint64_t Addr) {
		GPRKnownValues[3] = Addr;
		GPRGoodValues[3] = true;
		GPIsSet = true;
		KnownGPValue = Addr;
		}
		};

class RISCVMCInstrAnalysis : public MCInstrAnalysis {		class RISCVMCInstrAnalysis : public MCInstrAnalysis {
		RISCVRegCache RegCache;
		bool IsRV64;

public:		public:
explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)		explicit RISCVMCInstrAnalysis(const MCInstrInfo *Info)
: MCInstrAnalysis(Info) {}		: MCInstrAnalysis(Info) {}

		void resetAnalysis(bool NewObject, Triple::ArchType Arch) override {
		// Reset the register cache, using NewObject to determine whether any
		// cached GP is valid
		RegCache.invalidate(NewObject);
		if (NewObject)
		IsRV64 = Arch == Triple::riscv64;
		}

		void setGPForAnalysis(uint64_t Addr) override { RegCache.setGP(Addr); }

		bool evaluateInst(const MCInst &Inst, uint64_t Addr, uint64_t Size,
		uint64_t &Target) override {
		// First evaluate branches that evaluateBranch supports
		if ((isCall(Inst) || isUnconditionalBranch(Inst) ||
		isConditionalBranch(Inst)) &&
		evaluateBranch(Inst, Addr, Size, Target)) {
		RegCache.invalidate();
		return true;
		}

		switch (Inst.getOpcode()) {
		default:
		break;
		case RISCV::AUIPC:
		case RISCV::LUI:
		case RISCV::C_LUI: {
		unsigned Reg = Inst.getOperand(0).getReg();
		uint64_t Value = Inst.getOperand(1).getImm() << 12;
		if (Inst.getOpcode() == RISCV::AUIPC)
		Value += Addr;
		RegCache.setReg(Reg, Value);
		return false;
		}
		case RISCV::ADDI: {
		unsigned DstReg = Inst.getOperand(0).getReg();
		unsigned SrcReg = Inst.getOperand(1).getReg();
		if (auto SrcVal = RegCache.getReg(SrcReg)) {
		Target = *SrcVal + Inst.getOperand(2).getImm();
		if (!IsRV64)
		Target &= 0xffffffff;
		RegCache.setReg(DstReg, Target);
		return true;
		}
		break;
		}
		case RISCV::JALR: {
		unsigned SrcReg = Inst.getOperand(1).getReg();
		if (auto SrcVal = RegCache.getReg(SrcReg)) {
		Target = *SrcVal + Inst.getOperand(2).getImm();
		if (!IsRV64)
		Target &= 0xffffffff;
		// Since this is a jump to a new BB, invalidate the whole cache
		RegCache.invalidate();
		return true;
		}
		break;
		}
		case RISCV::LB:
		case RISCV::LH:
		case RISCV::LW:
		case RISCV::LBU:
		case RISCV::LHU:
		case RISCV::LWU:
		case RISCV::LD:
		case RISCV::FLW:
		case RISCV::FLD:
		case RISCV::SB:
		case RISCV::SH:
		case RISCV::SW:
		case RISCV::FSW:
		case RISCV::SD:
		case RISCV::FSD: {
		unsigned SrcReg = Inst.getOperand(1).getReg();
		if (auto SrcVal = RegCache.getReg(SrcReg)) {
		Target = *SrcVal + Inst.getOperand(2).getImm();
		if (!IsRV64)
		Target &= 0xffffffff;
		return true;
		}
		break;
		}
		}

		// For all other instructions, it is no longer safe to assume the value of
		// any destination register, so invalidate these.
		auto &Desc = Info->get(Inst.getOpcode());
		for (unsigned i = 0, e = Desc.getNumDefs(); i < e; i++) {
		auto &Op = Inst.getOperand(i);
		if (Op.isReg() && Op.getReg() >= RISCV::X0 && Op.getReg() <= RISCV::X31)
		RegCache.invalidateReg(Op.getReg());
		}

		return false;
		}

bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,		bool evaluateBranch(const MCInst &Inst, uint64_t Addr, uint64_t Size,
uint64_t &Target) const override {		uint64_t &Target) const override {
if (isConditionalBranch(Inst)) {		if (isConditionalBranch(Inst)) {
int64_t Imm;		int64_t Imm;
if (Size == 2)		if (Size == 2)
Imm = Inst.getOperand(1).getImm();		Imm = Inst.getOperand(1).getImm();
else		else
Imm = Inst.getOperand(2).getImm();		Imm = Inst.getOperand(2).getImm();
[41 unchanged lines not shown]

llvm/test/MC/Disassembler/RISCV/branch-targets.txt

[9 unchanged lines not shown]	.option norvc
bnez a0, label1		bnez a0, label1
bnez a0, label2		bnez a0, label2
.option rvc		.option rvc
j label1		j label1
j label2		j label2
bnez a0, label1		bnez a0, label1
bnez a0, label2		bnez a0, label2
# CHECK-LABEL: <label1>:		# CHECK-LABEL: <label1>:
# CHECK-NEXT: jal zero, 0 <label1>		# CHECK-NEXT: jal zero, 0 #0 <label1>
# CHECK-NEXT: jal zero, 20 <label2>		# CHECK-NEXT: jal zero, 20 #18 <label2>
# CHECK-NEXT: bne a0, zero, -8 <label1>		# CHECK-NEXT: bne a0, zero, -8 #0 <label1>
# CHECK-NEXT: bne a0, zero, 12 <label2>		# CHECK-NEXT: bne a0, zero, 12 #18 <label2>
# CHECK-NEXT: c.j -16 <label1>		# CHECK-NEXT: c.j -16 #0 <label1>
# CHECK-NEXT: c.j 6 <label2>		# CHECK-NEXT: c.j 6 #18 <label2>
# CHECK-NEXT: c.bnez a0, -20 <label1>		# CHECK-NEXT: c.bnez a0, -20 #0 <label1>
# CHECK-NEXT: c.bnez a0, 2 <label2>		# CHECK-NEXT: c.bnez a0, 2 #18 <label2>

label2:		label2:

llvm/test/MC/RISCV/rv64-relax-all.s

	# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+c %s | llvm-objdump -d -M no-aliases --no-show-raw-insn - | FileCheck %s --check-prefix=INSTR			# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+c %s | llvm-objdump -d -M no-aliases --no-show-raw-insn - | FileCheck %s --check-prefix=INSTR

	# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+c %s --mc-relax-all | llvm-objdump -d -M no-aliases --no-show-raw-insn - | FileCheck %s --check-prefix=RELAX-INSTR			# RUN: llvm-mc -filetype=obj -triple riscv64 -mattr=+c %s --mc-relax-all | llvm-objdump -d -M no-aliases --no-show-raw-insn - | FileCheck %s --check-prefix=RELAX-INSTR

	## Check the instructions are relaxed correctly			## Check the instructions are relaxed correctly

	NEAR:			NEAR:

	# INSTR: c.beqz a0, 0 <NEAR>			# INSTR: c.beqz a0, 0 #0 <NEAR>
	# RELAX-INSTR: beq a0, zero, 0 <NEAR>			# RELAX-INSTR: beq a0, zero, 0 #0 <NEAR>
	c.beqz a0, NEAR			c.beqz a0, NEAR

	# INSTR: c.j -2 <NEAR>			# INSTR: c.j -2 #0 <NEAR>
	# RELAX-INSTR: jal zero, -4 <NEAR>			# RELAX-INSTR: jal zero, -4 #0 <NEAR>
	c.j NEAR			c.j NEAR

llvm/tools/llvm-objdump/llvm-objdump.cpp

//===-- llvm-objdump.cpp - Object file dumping utility for llvm -----------===//		//===-- llvm-objdump.cpp - Object file dumping utility for llvm -----------===//
		[Lint] clang-format suggested style edits found.
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This program is a utility that works like binutils "objdump", that is, it		// This program is a utility that works like binutils "objdump", that is, it
[1,173 unchanged lines not shown]	if (Obj->isXCOFF() && SymbolDescription)
return SymbolInfoTy(Addr, Name, None, None, false);		return SymbolInfoTy(Addr, Name, None, None, false);
else		else
return SymbolInfoTy(Addr, Name, Type);		return SymbolInfoTy(Addr, Name, Type);
}		}

static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj,		static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj,
MCContext &Ctx, MCDisassembler *PrimaryDisAsm,		MCContext &Ctx, MCDisassembler *PrimaryDisAsm,
MCDisassembler *SecondaryDisAsm,		MCDisassembler *SecondaryDisAsm,
const MCInstrAnalysis *MIA, MCInstPrinter *IP,		MCInstrAnalysis *MIA, MCInstPrinter *IP,
const MCSubtargetInfo *PrimarySTI,		const MCSubtargetInfo *PrimarySTI,
const MCSubtargetInfo *SecondarySTI,		const MCSubtargetInfo *SecondarySTI,
PrettyPrinter &PIP,		PrettyPrinter &PIP,
SourcePrinter &SP, bool InlineRelocs) {		SourcePrinter &SP, bool InlineRelocs) {
		[Lint: pre-merge checks] clang-format: please reformat the code:
		- PrettyPrinter &PIP,
		- SourcePrinter &SP, bool InlineRelocs) {
		+ PrettyPrinter &PIP, SourcePrinter &SP,
		+ bool InlineRelocs) {
const MCSubtargetInfo *STI = PrimarySTI;		const MCSubtargetInfo *STI = PrimarySTI;
MCDisassembler *DisAsm = PrimaryDisAsm;		MCDisassembler *DisAsm = PrimaryDisAsm;
bool PrimaryIsThumb = false;		bool PrimaryIsThumb = false;
if (isArmElf(Obj))		if (isArmElf(Obj))
PrimaryIsThumb = STI->checkFeatures("+thumb-mode");		PrimaryIsThumb = STI->checkFeatures("+thumb-mode");

		if (MIA)
		MIA->resetAnalysis(true, Obj->getArch());

std::map<SectionRef, std::vector<RelocationRef>> RelocMap;		std::map<SectionRef, std::vector<RelocationRef>> RelocMap;
if (InlineRelocs)		if (InlineRelocs)
RelocMap = getRelocsMap(*Obj);		RelocMap = getRelocsMap(*Obj);
bool Is64Bits = Obj->getBytesInAddress() > 4;		bool Is64Bits = Obj->getBytesInAddress() > 4;

// Create a mapping from virtual address to symbol name. This is used to		// Create a mapping from virtual address to symbol name. This is used to
// pretty print the symbols while disassembling.		// pretty print the symbols while disassembling.
std::map<SectionRef, SectionSymbolsTy> AllSymbols;		std::map<SectionRef, SectionSymbolsTy> AllSymbols;
[15 unchanged lines not shown]	if (MachO) {
DataRefImpl SymDRI = Symbol.getRawDataRefImpl();		DataRefImpl SymDRI = Symbol.getRawDataRefImpl();
uint8_t NType = (MachO->is64Bit() ?		uint8_t NType = (MachO->is64Bit() ?
MachO->getSymbol64TableEntry(SymDRI).n_type:		MachO->getSymbol64TableEntry(SymDRI).n_type:
MachO->getSymbolTableEntry(SymDRI).n_type);		MachO->getSymbolTableEntry(SymDRI).n_type);
if (NType & MachO::N_STAB)		if (NType & MachO::N_STAB)
continue;		continue;
}		}

		if (MIA &&
		(Obj->getArch() == Triple::riscv32 ||
		Obj->getArch() == Triple::riscv64) &&
		Name == "__global_pointer$")
		MIA->setGPForAnalysis(unwrapOrError(Symbol.getAddress(), FileName));

section_iterator SecI = unwrapOrError(Symbol.getSection(), FileName);		section_iterator SecI = unwrapOrError(Symbol.getSection(), FileName);
if (SecI != Obj->section_end())		if (SecI != Obj->section_end())
AllSymbols[*SecI].push_back(createSymbolInfo(Obj, Symbol));		AllSymbols[*SecI].push_back(createSymbolInfo(Obj, Symbol));
else		else
AbsoluteSymbols.push_back(createSymbolInfo(Obj, Symbol));		AbsoluteSymbols.push_back(createSymbolInfo(Obj, Symbol));
}		}

if (AllSymbols.empty() && Obj->isELF())		if (AllSymbols.empty() && Obj->isELF())
[56 unchanged lines not shown]	if (FilterSections.empty() && !DisassembleAll &&
(!Section.isText() || Section.isVirtual()))		(!Section.isText() || Section.isVirtual()))
continue;		continue;

uint64_t SectionAddr = Section.getAddress();		uint64_t SectionAddr = Section.getAddress();
uint64_t SectSize = Section.getSize();		uint64_t SectSize = Section.getSize();
if (!SectSize)		if (!SectSize)
continue;		continue;

		if (MIA)
		MIA->resetAnalysis();

// Get the list of all the symbols in this section.		// Get the list of all the symbols in this section.
SectionSymbolsTy &Symbols = AllSymbols[Section];		SectionSymbolsTy &Symbols = AllSymbols[Section];
std::vector<MappingSymbolPair> MappingSymbols;		std::vector<MappingSymbolPair> MappingSymbols;
if (hasMappingSymbols(Obj)) {		if (hasMappingSymbols(Obj)) {
for (const auto &Symb : Symbols) {		for (const auto &Symb : Symbols) {
uint64_t Address = Symb.Addr;		uint64_t Address = Symb.Addr;
StringRef Name = Symb.Name;		StringRef Name = Symb.Name;
if (Name.startswith("$d"))		if (Name.startswith("$d"))
[196 lines not shown]	for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) {
{SectionAddr + Index + VMAAdjustment, Section.getIndex()},		{SectionAddr + Index + VMAAdjustment, Section.getIndex()},
outs(), "", *STI, &SP, Obj->getFileName(), &Rels);		outs(), "", *STI, &SP, Obj->getFileName(), &Rels);
outs() << CommentStream.str();		outs() << CommentStream.str();
Comments.clear();		Comments.clear();

// If disassembly has failed, avoid analysing invalid/incomplete		// If disassembly has failed, avoid analysing invalid/incomplete
// instruction information. Otherwise, try to resolve the target of a		// instruction information. Otherwise, try to resolve the target of a
// call, tail call, etc. to a specific symbol.		// call, tail call, etc. to a specific symbol.
if (Disassembled && MIA &&		if (Disassembled && MIA) {
(MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) ||
MIA->isConditionalBranch(Inst))) {
uint64_t Target;		uint64_t Target;
if (MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target)) {		if (MIA->evaluateInst(Inst, SectionAddr + Index, Size, Target)) {
// In a relocatable object, the target's section must reside in		// In a relocatable object, the target's section must reside in
// the same section as the call instruction or it is accessed		// the same section as the call instruction or it is accessed
// through a relocation.		// through a relocation.
//		//
// In a non-relocatable object, the target may be in any section.		// In a non-relocatable object, the target may be in any section.
// In that case, locate the section(s) containing the target address		// In that case, locate the section(s) containing the target address
// and find the symbol in one of those, if possible.		// and find the symbol in one of those, if possible.
//		//
[30 lines not shown]	for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) {
*TargetSymbols,		*TargetSymbols,
[=](const SymbolInfoTy &O) { return O.Addr <= Target; });		[=](const SymbolInfoTy &O) { return O.Addr <= Target; });
if (It != TargetSymbols->begin()) {		if (It != TargetSymbols->begin()) {
TargetSym = &*(It - 1);		TargetSym = &*(It - 1);
break;		break;
}		}
}		}

		// For RISC-V it is not possible to print this until the MIA
		// analysis is complete
		if (Obj->getArch() == Triple::riscv32 ||
		Inline comment (MaskRay, not done): I know that other `getArch` calls exist in this file, but for new code we should avoid them.
		Obj->getArch() == Triple::riscv64)
		outs() << " #" << Twine::utohexstr(Target);

if (TargetSym != nullptr) {		if (TargetSym != nullptr) {
uint64_t TargetAddress = TargetSym->Addr;		uint64_t TargetAddress = TargetSym->Addr;
std::string TargetName = TargetSym->Name.str();		std::string TargetName = TargetSym->Name.str();
if (Demangle)		if (Demangle)
TargetName = demangle(TargetName);		TargetName = demangle(TargetName);

outs() << " <" << TargetName;		outs() << " <" << TargetName;
uint64_t Disp = Target - TargetAddress;		uint64_t Disp = Target - TargetAddress;
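The visible lines of this hunk apply the predicate `O.Addr <= Target` to a symbol list and, when the resulting iterator is not at the beginning, take the element just before it, i.e. the last symbol at or below the target; `Disp` is then the offset from that symbol. The call that produces `It` is in the hidden lines, so the following is only a sketch of the same lookup using `std::partition_point`, with a hypothetical `Sym` struct standing in for `SymbolInfoTy` and a hypothetical helper named `findTargetSym`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for SymbolInfoTy, reduced to the fields used here.
struct Sym {
  uint64_t Addr;
  std::string Name;
};

// Result of a lookup: the covering symbol and Target's offset from it.
struct Resolved {
  const Sym *S;
  uint64_t Disp;
};

// Returns the last symbol whose address is <= Target, or std::nullopt if
// every symbol starts above Target. Syms must be sorted by address so the
// predicate partitions the range (all "true" entries come first).
std::optional<Resolved> findTargetSym(const std::vector<Sym> &Syms,
                                      uint64_t Target) {
  assert(std::is_sorted(Syms.begin(), Syms.end(),
                        [](const Sym &L, const Sym &R) {
                          return L.Addr < R.Addr;
                        }));
  auto It = std::partition_point(
      Syms.begin(), Syms.end(),
      [=](const Sym &O) { return O.Addr <= Target; });
  if (It == Syms.begin())
    return std::nullopt; // Target lies below the first symbol.
  const Sym &S = *(It - 1);
  return Resolved{&S, Target - S.Addr};
}
```

With symbols at 0x1000, 0x1010 and 0x1100, a target of 0x1014 resolves to the second symbol with a displacement of 4, and a target below 0x1000 resolves to nothing.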
[101 lines not shown]	if (STI->checkFeatures("+thumb-mode"))
Features.AddFeature("-thumb-mode");		Features.AddFeature("-thumb-mode");
else		else
Features.AddFeature("+thumb-mode");		Features.AddFeature("+thumb-mode");
SecondarySTI.reset(TheTarget->createMCSubtargetInfo(TripleName, MCPU,		SecondarySTI.reset(TheTarget->createMCSubtargetInfo(TripleName, MCPU,
Features.getString()));		Features.getString()));
SecondaryDisAsm.reset(TheTarget->createMCDisassembler(*SecondarySTI, Ctx));		SecondaryDisAsm.reset(TheTarget->createMCDisassembler(*SecondarySTI, Ctx));
}		}

std::unique_ptr<const MCInstrAnalysis> MIA(		std::unique_ptr<MCInstrAnalysis> MIA(
TheTarget->createMCInstrAnalysis(MII.get()));		TheTarget->createMCInstrAnalysis(MII.get()));

int AsmPrinterVariant = AsmInfo->getAssemblerDialect();		int AsmPrinterVariant = AsmInfo->getAssemblerDialect();
std::unique_ptr<MCInstPrinter> IP(TheTarget->createMCInstPrinter(		std::unique_ptr<MCInstPrinter> IP(TheTarget->createMCInstPrinter(
Triple(TripleName), AsmPrinterVariant, AsmInfo, MII, *MRI));		Triple(TripleName), AsmPrinterVariant, AsmInfo, MII, *MRI));
if (!IP)		if (!IP)
reportError(Obj->getFileName(),		reportError(Obj->getFileName(),
"no instruction printer for target " + TripleName);		"no instruction printer for target " + TripleName);
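In the new column `MIA` is no longer `const`, which is consistent with the summary: `setGPForAnalysis` and `resetAnalysis` update state that `evaluateInst` then uses across instructions. The patch's RISC-V implementation is not part of these hunks, so the following is only a hypothetical sketch of that kind of state. It uses a toy instruction record (`ToyInst`, `Op`) rather than `MCInst`, remembers the value each `auipc` leaves in its destination register so that a following `jalr`, `addi`, load or store can be resolved, and resolves `gp`-based accesses once the global pointer (taken from `__global_pointer$` in the patch) is known.

```cpp
#include <array>
#include <cstdint>
#include <optional>

// Hypothetical toy instruction model; not LLVM's MCInst.
enum class Op { Auipc, Jalr, Addi, Load, Store, Other };

struct ToyInst {
  Op Opc;
  unsigned Rd;  // destination register (ignored for Store)
  unsigned Rs1; // base/source register (ignored for Auipc and Other)
  int64_t Imm;  // Auipc: signed 20-bit upper immediate; others: signed 12-bit
};

class ToyRISCVAnalysis {
  static constexpr unsigned GPReg = 3; // x3 is gp
  std::optional<uint64_t> GP;
  std::array<std::optional<uint64_t>, 32> Known{}; // tracked register values

  std::optional<uint64_t> base(unsigned R) const {
    if (R == GPReg && GP)
      return GP;
    if (R >= Known.size())
      return std::nullopt;
    return Known[R];
  }

  void write(unsigned Rd, std::optional<uint64_t> V) {
    if (Rd != 0 && Rd < Known.size()) // writes to x0 are discarded
      Known[Rd] = V;
  }

public:
  void setGP(uint64_t Addr) { GP = Addr; }
  // Forget per-register state (e.g. at the start of a section); GP is kept.
  void reset() { Known.fill(std::nullopt); }

  // Returns the address the instruction refers to, if it can be computed.
  std::optional<uint64_t> evaluate(const ToyInst &I, uint64_t PC) {
    // Compute the address before updating Rd, because Rd may equal Rs1
    // (as in "jalr ra, lo12(ra)").
    std::optional<uint64_t> Addr;
    if (I.Opc != Op::Auipc && I.Opc != Op::Other)
      if (auto B = base(I.Rs1))
        Addr = *B + static_cast<uint64_t>(I.Imm);
    switch (I.Opc) {
    case Op::Auipc: // rd = pc + (imm << 12), with wrap-around
      write(I.Rd, PC + static_cast<uint64_t>(I.Imm * 4096));
      break;
    case Op::Jalr:
      if (Addr)
        *Addr &= ~uint64_t{1}; // jalr clears bit 0 of the target
      write(I.Rd, std::nullopt); // link value is not tracked
      break;
    case Op::Addi:
      write(I.Rd, Addr); // e.g. auipc + addi materialising an address
      break;
    case Op::Load:
    case Op::Other:
      write(I.Rd, std::nullopt); // resulting value is unknown
      break;
    case Op::Store:
      break; // stores write no register
    }
    return Addr;
  }
};
```

As with the `resetAnalysis` call per section and the single `setGPForAnalysis` call per file in the diff, `reset` clears register state but keeps the global pointer. The toy model only invalidates a register when one of the listed operations writes it; a real implementation has to account for every instruction that writes a register.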
[779 lines not shown]
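The summary says the default `evaluateInst` performs the same check that llvm-objdump previously did itself (the condition in the removed left-column lines: `isCall`, `isUnconditionalBranch`, `isConditionalBranch`) before deferring to `evaluateBranch`, which is why the change is NFC for other targets. A minimal sketch of that default, assuming a hypothetical `ToyMCInst` that carries the three properties and a PC-relative offset in place of `MCInst` and its descriptor queries; `evaluateInst` is left non-const here to allow the stateful overrides the patch enables.

```cpp
#include <cstdint>

// Hypothetical stand-in for MCInst: only what the default gate needs.
struct ToyMCInst {
  bool IsCall = false;
  bool IsUncondBranch = false;
  bool IsCondBranch = false;
  int64_t PCRelOffset = 0; // offset encoded in a direct branch or call
};

class ToyInstrAnalysis {
public:
  virtual ~ToyInstrAnalysis() = default;

  // Single-instruction evaluation, in the spirit of evaluateBranch:
  // only the instruction itself and its address are available.
  virtual bool evaluateBranch(const ToyMCInst &Inst, uint64_t Addr,
                              uint64_t Size, uint64_t &Target) const {
    (void)Size; // unused in this toy model
    Target = Addr + static_cast<uint64_t>(Inst.PCRelOffset);
    return true;
  }

  // Default: the call/branch check llvm-objdump used to perform before
  // calling evaluateBranch. Safe to call on every instruction; a target
  // with multi-instruction patterns would override this.
  virtual bool evaluateInst(const ToyMCInst &Inst, uint64_t Addr,
                            uint64_t Size, uint64_t &Target) {
    if (!(Inst.IsCall || Inst.IsUncondBranch || Inst.IsCondBranch))
      return false;
    return evaluateBranch(Inst, Addr, Size, Target);
  }
};
```

Because non-branch instructions are rejected before `evaluateBranch` runs, calling `evaluateInst` for every disassembled instruction, as the new `if (Disassembled && MIA)` condition does, gives the same output as the old guarded call for targets that keep the default.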

Diff Detail

Unit Tests: Failed

Revision Contents

Diff 259524

llvm/include/llvm/MC/MCInstrAnalysis.h

llvm/lib/Target/RISCV/MCTargetDesc/RISCVMCTargetDesc.cpp

llvm/test/MC/Disassembler/RISCV/branch-targets.txt

llvm/test/MC/RISCV/rv64-relax-all.s

llvm/tools/llvm-objdump/llvm-objdump.cpp
