This is an archive of the discontinued LLVM Phabricator instance.

llvm-objdump Allow disassembly of ARM and thumb code mix in ELF object file.
Needs Revision · Public

Authored by khemant on Sep 15 2016, 3:08 PM.


Details

Reviewers

rengolin
compnerd
zatrazz
echristo
peter.smith

Summary

This patch tries to enable something that is not usually possible in the present design. If the patch looks a little odd, bear with me. I would like ideas on this issue and any workable solutions.

With no triple or mcpu given, llvm-objdump tries to infer the architecture from
the details present in the file. This works for most architectures and file
formats, but not for EM_ARM ELF objects. The presence of Thumb code, if any, is
not indicated anywhere except in .ARM.attributes and by the mapping symbols
$t.*. Also, not all architectures have all instructions available. This makes
the disassembly of a fully linked ARM ELF image that mixes Thumb and ARM code
virtually useless.

In its present state, objdump has only one disassembler and hence cannot make
use of the above information. This commit cracks open the ELF binary if it is
of EM_ARM type and tries to determine the CPU name and whether it is a
Thumb-only processor.

The tool creates a second disassembler for ARM ELF and tries it if the first
one fails. The change also makes use of the mapping symbols $a and $t, which
determine which disassembler is used. This makes usage more user friendly, as
the CPU no longer needs to be given explicitly if the attributes are present in
the file. If the attributes are not present, usage remains as before (via the
-mcpu and -triple options).
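
As an illustration of the mapping-symbol idea in the summary, here is a minimal standalone sketch of classifying a symbol name. It is not code from the patch: the names MappingKind and classifyMappingSymbol are hypothetical, and it assumes the ARM ELF convention that a mapping symbol is $a, $t or $d, optionally followed by a "." and an arbitrary suffix. The patch itself uses a looser startswith check.

```cpp
#include <string_view>

// Kind of region that a mapping symbol introduces (hypothetical helper type).
enum class MappingKind { None, Arm, Thumb, Data };

// Classify a symbol name as an ARM ELF mapping symbol.
// Accepts "$a", "$t", "$d" and the suffixed forms such as "$d.1".
MappingKind classifyMappingSymbol(std::string_view Name) {
  if (Name.size() < 2 || Name[0] != '$')
    return MappingKind::None;
  // Anything after the two-character prefix must start with '.'.
  if (Name.size() > 2 && Name[2] != '.')
    return MappingKind::None;
  switch (Name[1]) {
  case 'a':
    return MappingKind::Arm;
  case 't':
    return MappingKind::Thumb;
  case 'd':
    return MappingKind::Data;
  default:
    return MappingKind::None;
  }
}
```

A stricter check like this avoids treating an ordinary symbol that merely begins with "$t" as a mapping symbol. Whether that matters in practice depends on the inputs.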

Diff Detail

Repository: rL LLVM

Event Timeline

khemant updated this revision to Diff 71565. Sep 15 2016, 3:08 PM

khemant retitled this revision from to llvm-objdump Allow disassembly of ARM and thumb code mix in ELF object file..

khemant updated this object.

khemant added reviewers: echristo, compnerd, • rafael.

khemant set the repository for this revision to rL LLVM.

khemant added subscribers: echristo, compnerd, • rafael, llvm-commits.

Herald added subscribers: samparker, mehdi_amini, rengolin, aemerson. Sep 15 2016, 3:08 PM

khemant added inline comments. Sep 15 2016, 3:10 PM

test/tools/llvm-objdump/ARM/disassemble-arm-thumb-elf-mix.test
1	Will be adding source and the way source was compiled and linked.

• rafael removed a reviewer: • rafael. Sep 16 2016, 4:32 AM

• rafael removed a subscriber: • rafael.

This is interesting functionality, but I'm not well versed enough in objdump to know whether the code is what's expected.

The first thing I would say is that it really needs a clang-format pass as well as a lot more tests, since the change in functionality is not small.

From a quick look, it seems that you went for the "whatever works" approach, which makes the whole thing extremely complicated, and harder to maintain. I'm not sure all other architectures will be happy with the added context, complexity and conditional blocks for the sake of one arch (ARM).

I honestly think this needs a better design and it would be better to get that discussion to the llvm-dev list first, as an RFC.

cheers,
--renato

tools/llvm-objdump/llvm-objdump.cpp
1092	You're converting a pointer to an array ref to another pointer, which is really confusing. Please pick one style and stick with it. Even better, isn't there a way to decode the attributes into a nice structure? @compnerd, you spent more time in the attributes code than I did, any ideas?
1102	StringRef.startsWith("aeabi")?
1119	there has to be a better way to deal with attributes...
1362	AArch64 code doesn't mix with AArch32, so this is conceptually wrong. If this is just to reduce code duplication in this file, I understand, but there could be an error. It'd be clearer, though, if they were in separate blocks, one for ARM and one for AArch64.

This revision now requires changes to proceed. Sep 16 2016, 5:41 AM

> The first thing I would say is that it really needs a clang-format pass as well as a lot more tests, since the change in functionality is not small.

This was clang-formatted; maybe my clang-format Python script is old.

> From a quick look, it seems that you went for the "whatever works" approach, which makes the whole thing extremely complicated, and harder to maintain. I'm not sure all other architectures will be happy with the added context, complexity and conditional blocks for the sake of one arch (ARM).
>
> I honestly think this needs a better design and it would be better to get that discussion to the llvm-dev list first, as an RFC.

This is a problem with the existing design, which is not easily compatible with the ARM architecture. Take a look at the MachO dumper: it employs a similar method of trying both disassemblers. But as suggested, I will send an RFC for this functionality to the dev list and find out.

tools/llvm-objdump/llvm-objdump.cpp
1092	Got it. The only way to decode the attributes is to look at the size and parse them (they are stored in ULEB128 format).
1102	Got it.
1119	I am open to ideas. AFAIK there is no way to go past an attribute until you read its size, and since the size varies from attribute to attribute, you have to go over them until you find the one you need or finish reading all of them.
1362	This is not used for a mix of ISAs. The $d mapping can exist within a region of code that is treated as a single STT_FUNC symbol. The mapping from $x to $d and then back to $x is used to make sure your disassembler is not thrown off balance by trying to disassemble data. This code existed before this change. I made sure the text is separated into ARM, Thumb and 64-bit ARM instructions. Line 1550 will give you an idea of what I am saying.
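
To make the point about variable-length attributes concrete, here is a minimal standalone ULEB128 decoder. It is a sketch only, not the LLVM helper the patch calls (decodeULEB128 from llvm/Support/LEB128.h), and the name decodeUleb128 and its signature are assumptions made for illustration. Because each value announces its own length only through its continuation bits, a reader has to decode every tag and value in order to reach the next one.

```cpp
#include <cstddef>
#include <cstdint>

// Decode one ULEB128 value from Data, reading at most Size bytes.
// On success, stores the value and the number of bytes consumed and returns
// true. Returns false if the buffer ends mid-value or the value is too long.
bool decodeUleb128(const uint8_t *Data, size_t Size, uint64_t &Value,
                   size_t &Consumed) {
  Value = 0;
  unsigned Shift = 0;
  for (size_t I = 0; I < Size; ++I) {
    if (Shift >= 64)
      return false; // More than 64 bits of payload: reject.
    // The low 7 bits of each byte carry payload, least significant first.
    Value |= static_cast<uint64_t>(Data[I] & 0x7f) << Shift;
    // A clear high bit marks the last byte of the value.
    if ((Data[I] & 0x80) == 0) {
      Consumed = I + 1;
      return true;
    }
    Shift += 7;
  }
  return false; // Ran off the end of the buffer.
}
```

The bounds check is the main difference from reading through a raw pointer: a truncated or malformed attribute section produces a failure rather than a read past the end of the section.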

I've not got a lot to add beyond what Renato and others on llvm-dev have said. I understand that this is probably close to the best incremental change that could be made to llvm-objdump, but I think it is too intrusive a change just for ARM as it stands, and I think we need to be clearer about the requirements in some areas, such as using attributes.

Some comments that might be useful in the future:
It would be good to write down the scope of disassembly you are aiming for. For example: is it architectural disassembly, where instructions not supported on the target architecture come out as undefined (i.e. disassembly of an ARMv7-A object for an ARMv5 target), or universal disassembly, where anything that is a legal ARM or Thumb instruction in any architecture is disassembled regardless of the target architecture?

Are you intending to try and support stripped binaries with no mapping symbols or static symbol table? There are overlaps between ARM and Thumb bit patterns, and of course literal data so I fear that even trying to do this may cause more problems than it solves.

I think that putting attribute reading code directly into llvm-objdump, potentially duplicating code in llvm-readobj isn't the right thing to do. We should have an attribute reading/writing library that tools can use. This would be really useful for lld for example.

Disassembly is an architectural property and not a property of the CPU name. It is true that a CPU name implies a default set of attributes, but there are ways to alter these defaults and have different properties in the object file. If we are going to read the attributes, I think we should be reading the architecture and the various supporting attributes to work out what the target and subtarget features are. In an ideal world we shouldn't be making any disassembly decisions based on the CPU name alone.

When mapping symbols aren't available but the static symbol table still persists, it is possible to use the state of the last STT_FUNC symbol definition (bit 0 == 1 for Thumb, bit 0 == 0 for ARM) to determine ARM or Thumb. This won't work if there is a state change or literal data without another STT_FUNC or STT_OBJECT symbol, but it is a reasonable heuristic.
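
The bit 0 heuristic can be sketched as follows. This is an illustrative standalone helper, not code from the patch. The names InstrSet, FuncSymbol and decodeFuncSymbol are hypothetical, and the input is assumed to be the st_value of a 32-bit ARM ELF symbol already known to be STT_FUNC.

```cpp
#include <cstdint>

// Instruction set implied by a function symbol (hypothetical helper type).
enum class InstrSet { Arm, Thumb };

struct FuncSymbol {
  InstrSet Mode;    // Instruction set at the function entry.
  uint32_t Address; // Entry address with the Thumb bit cleared.
};

// Split the st_value of an STT_FUNC symbol into mode and real address.
// Bit 0 set means the function is entered in Thumb state.
FuncSymbol decodeFuncSymbol(uint32_t StValue) {
  InstrSet Mode = (StValue & 1u) ? InstrSet::Thumb : InstrSet::Arm;
  return {Mode, StValue & ~1u};
}
```

For example, a Thumb function placed at 0x80b4 would carry an st_value of 0x80b5, while an ARM function at 0x80c8 keeps 0x80c8. These values are illustrative and are not read from the test binary.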

I think that any llvm-objdump is going to end up with an ARM disassembler and a Thumb disassembler, however I think that there may be neater ways to switch between them in a refactored llvm-objdump. For example the mapping symbols identify a non-overlapping range of addresses that are either ARM, Thumb or literal data and there is one disassembler for each of those ranges, a design that in effect does DisassembleRange(Start, End, Disassembler) would work reasonably well.
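
One way to picture the range-based design is to turn the mapping symbols of a section into non-overlapping half-open ranges and hand each range to the matching disassembler. The sketch below is standalone and uses hypothetical names (RegionKind, MappingSym, Region, buildRegions). It only computes the ranges and leaves the DisassembleRange(Start, End, Disassembler) step to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class RegionKind { Arm, Thumb, Data };

// A mapping symbol reduced to its section-relative offset and kind.
struct MappingSym {
  uint64_t Offset;
  RegionKind Kind;
};

// A half-open range [Start, End) of one kind within a section.
struct Region {
  uint64_t Start;
  uint64_t End;
  RegionKind Kind;
};

// Build non-overlapping regions: each mapping symbol opens a region that
// runs until the next mapping symbol or the end of the section.
std::vector<Region> buildRegions(std::vector<MappingSym> Syms,
                                 uint64_t SectionSize) {
  std::stable_sort(Syms.begin(), Syms.end(),
                   [](const MappingSym &A, const MappingSym &B) {
                     return A.Offset < B.Offset;
                   });
  std::vector<Region> Regions;
  for (size_t I = 0; I < Syms.size(); ++I) {
    uint64_t Start = Syms[I].Offset;
    if (Start >= SectionSize)
      break; // Symbols past the end of the section are ignored.
    uint64_t End = I + 1 < Syms.size() ? Syms[I + 1].Offset : SectionSize;
    End = std::min(End, SectionSize);
    if (Start < End) // Skip empty regions from duplicate offsets.
      Regions.push_back({Start, End, Syms[I].Kind});
  }
  return Regions;
}
```

With ranges in hand, the main loop no longer needs a per-instruction lookup to decide which disassembler applies. Bytes before the first mapping symbol are not covered by any region here and would need a default, which is a policy choice this sketch leaves open.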

Revision Contents

Path	Size
test/tools/llvm-objdump/ARM/Inputs/arm-thumb-mix.elf-arm	binary file
test/tools/llvm-objdump/ARM/disassemble-arm-thumb-elf-mix.test	37 lines
tools/llvm-objdump/llvm-objdump.cpp	314 lines

Diff 71565

test/tools/llvm-objdump/ARM/Inputs/arm-thumb-mix.elf-arm

This binary file was added.

Property	Old Value	New Value
File Mode	null	100755

test/tools/llvm-objdump/ARM/disassemble-arm-thumb-elf-mix.test

This file was added.

				@RUN: llvm-objdump -d %p/Inputs/arm-thumb-mix.elf-arm | FileCheck %s

				@CHECK: foo:
				@CHECK-NEXT: 80b4: 80 b5 push {r7, lr}
				@CHECK-NEXT: 80b6: 00 af add r7, sp, #0
				@CHECK-NEXT: 80b8: 00 f0 22 f8 bl #68
				@CHECK-NEXT: 80bc: 40 1c adds r0, r0, #1
				@CHECK-NEXT: 80be: 80 bc pop {r7}
				@CHECK-NEXT: 80c0: 02 bc pop {r1}
				@CHECK-NEXT: 80c2: 8e 46 mov lr, r1
				@CHECK-NEXT: 80c4: 70 47 bx lr
				@CHECK-NEXT: 80c6: 00 00 movs r0, r0

				@CHECK: main:
				@CHECK-NEXT: 80c8: 00 48 2d e9 push {r11, lr}
				@CHECK-NEXT: 80cc: 0d b0 a0 e1 mov r11, sp
				@CHECK-NEXT: 80d0: 08 d0 4d e2 sub sp, sp, #8
				@CHECK-NEXT: 80d4: 00 00 a0 e3 mov r0, #0
				@CHECK-NEXT: 80d8: 04 00 8d e5 str r0, [sp, #4]
				@CHECK-NEXT: 80dc: 14 01 00 e3 movw r0, #276
				@CHECK-NEXT: 80e0: 01 00 40 e3 movt r0, #1
				@CHECK-NEXT: 80e4: 00 00 90 e5 ldr r0, [r0]
				@CHECK-NEXT: 80e8: 00 00 8d e5 str r0, [sp]
				@CHECK-NEXT: 80ec: f0 ff ff fa blx #-64 <foo>
				@CHECK-NEXT: 80f0: 00 e0 9d e5 ldr lr, [sp]
				@CHECK-NEXT: 80f4: 00 00 8e e0 add r0, lr, r0
				@CHECK-NEXT: 80f8: 0b d0 a0 e1 mov sp, r11
				@CHECK-NEXT: 80fc: 00 88 bd e8 pop {r11, pc}

				@CHECK: bar:
				@CHECK-NEXT: 8100: 01 48 ldr r0, [pc, #4]
				@CHECK-NEXT: 8102: 00 68 ldr r0, [r0]
				@CHECK-NEXT: 8104: 70 47 bx lr
				@CHECK-NEXT: 8106: c0 46 mov r8, r8

				@CHECK: $d.1:
				@CHECK-NEXT: 8108: 18 01 01 00 .word 0x00010118

tools/llvm-objdump/llvm-objdump.cpp

Show All 14 Lines
// binutils objdump.		// binutils objdump.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm-objdump.h"		#include "llvm-objdump.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
		#include "llvm/ADT/StringSwitch.h"
#include "llvm/ADT/Triple.h"		#include "llvm/ADT/Triple.h"
#include "llvm/CodeGen/FaultMaps.h"		#include "llvm/CodeGen/FaultMaps.h"
#include "llvm/DebugInfo/DWARF/DWARFContext.h"		#include "llvm/DebugInfo/DWARF/DWARFContext.h"
#include "llvm/DebugInfo/Symbolize/Symbolize.h"		#include "llvm/DebugInfo/Symbolize/Symbolize.h"
#include "llvm/MC/MCAsmInfo.h"		#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"		#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCDisassembler/MCRelocationInfo.h"		#include "llvm/MC/MCDisassembler/MCRelocationInfo.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstPrinter.h"		#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCInstrAnalysis.h"		#include "llvm/MC/MCInstrAnalysis.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCObjectFileInfo.h"		#include "llvm/MC/MCObjectFileInfo.h"
#include "llvm/MC/MCRegisterInfo.h"		#include "llvm/MC/MCRegisterInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/Object/Archive.h"		#include "llvm/Object/Archive.h"
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include "llvm/Object/COFFImportFile.h"		#include "llvm/Object/COFFImportFile.h"
#include "llvm/Object/ELFObjectFile.h"		#include "llvm/Object/ELFObjectFile.h"
#include "llvm/Object/MachO.h"		#include "llvm/Object/MachO.h"
#include "llvm/Object/ObjectFile.h"		#include "llvm/Object/ObjectFile.h"
		#include "llvm/Support/ARMBuildAttributes.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/Errc.h"		#include "llvm/Support/Errc.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"
#include "llvm/Support/GraphWriter.h"		#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
		#include "llvm/Support/LEB128.h"
#include "llvm/Support/ManagedStatic.h"		#include "llvm/Support/ManagedStatic.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/PrettyStackTrace.h"		#include "llvm/Support/PrettyStackTrace.h"
#include "llvm/Support/Signals.h"		#include "llvm/Support/Signals.h"
#include "llvm/Support/SourceMgr.h"		#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"		#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
▲ Show 20 Lines • Show All 401 Lines • ▼ Show 20 Lines	if (FileBuffer != SourceCache.end()) {
<< "\n";		<< "\n";
}		}
}		}
OldLineInfo = LineInfo;		OldLineInfo = LineInfo;
}		}

static bool isArmElf(const ObjectFile *Obj) {		static bool isArmElf(const ObjectFile *Obj) {
return (Obj->isELF() &&		return (Obj->isELF() &&
(Obj->getArch() == Triple::aarch64 ||		(Obj->getArch() == Triple::arm || Obj->getArch() == Triple::armeb ||
Obj->getArch() == Triple::aarch64_be ||
Obj->getArch() == Triple::arm || Obj->getArch() == Triple::armeb ||
Obj->getArch() == Triple::thumb ||		Obj->getArch() == Triple::thumb ||
Obj->getArch() == Triple::thumbeb));		Obj->getArch() == Triple::thumbeb));
}		}

		static bool isAarch64Elf(const ObjectFile *Obj) {
		return (Obj->isELF() && (Obj->getArch() == Triple::aarch64 ||
		Obj->getArch() == Triple::aarch64_be));
		}

class PrettyPrinter {		class PrettyPrinter {
public:		public:
virtual ~PrettyPrinter(){}		virtual ~PrettyPrinter() {}
virtual void printInst(MCInstPrinter &IP, const MCInst *MI,		virtual void printInst(MCInstPrinter &IP, const MCInst *MI,
ArrayRef<uint8_t> Bytes, uint64_t Address,		ArrayRef<uint8_t> Bytes, uint64_t Address,
raw_ostream &OS, StringRef Annot,		raw_ostream &OS, StringRef Annot,
MCSubtargetInfo const &STI, SourcePrinter *SP) {		MCSubtargetInfo const &STI, SourcePrinter *SP) {
if (SP && (PrintSource || PrintLines))		if (SP && (PrintSource || PrintLines))
SP->printSourceLine(OS, Address);		SP->printSourceLine(OS, Address);
OS << format("%8" PRIx64 ":", Address);		OS << format("%8" PRIx64 ":", Address);
if (!NoShowRawInsn) {		if (!NoShowRawInsn) {
▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	for (auto D : makeArrayRef(reinterpret_cast<const U32*>(Bytes.data()),
OS << format("%08" PRIX32 " ", static_cast<uint32_t>(D));		OS << format("%08" PRIX32 " ", static_cast<uint32_t>(D));

if (!Annot.empty())		if (!Annot.empty())
OS << "// " << Annot;		OS << "// " << Annot;
}		}
};		};
AMDGCNPrettyPrinter AMDGCNPrettyPrinterInst;		AMDGCNPrettyPrinter AMDGCNPrettyPrinterInst;

PrettyPrinter &selectPrettyPrinter(Triple const &Triple) {		PrettyPrinter *selectPrettyPrinter(Triple const &Triple) {
switch(Triple.getArch()) {		switch (Triple.getArch()) {
default:		default:
return PrettyPrinterInst;		return &PrettyPrinterInst;
case Triple::hexagon:		case Triple::hexagon:
return HexagonPrettyPrinterInst;		return &HexagonPrettyPrinterInst;
case Triple::amdgcn:		case Triple::amdgcn:
return AMDGCNPrettyPrinterInst;		return &AMDGCNPrettyPrinterInst;
}		}
}		}
}		}

template <class ELFT>		template <class ELFT>
static std::error_code getRelocationValueString(const ELFObjectFile<ELFT> *Obj,		static std::error_code getRelocationValueString(const ELFObjectFile<ELFT> *Obj,
const RelocationRef &RelRef,		const RelocationRef &RelRef,
SmallVectorImpl<char> &Result) {		SmallVectorImpl<char> &Result) {
▲ Show 20 Lines • Show All 456 Lines • ▼ Show 20 Lines	static uint8_t getElfSymbolType(const ObjectFile *Obj, const SymbolRef &Sym) {
if (auto *Elf64LEObj = dyn_cast<ELF64LEObjectFile>(Obj))		if (auto *Elf64LEObj = dyn_cast<ELF64LEObjectFile>(Obj))
return Elf64LEObj->getSymbol(Sym.getRawDataRefImpl())->getType();		return Elf64LEObj->getSymbol(Sym.getRawDataRefImpl())->getType();
if (auto *Elf32BEObj = dyn_cast<ELF32BEObjectFile>(Obj))		if (auto *Elf32BEObj = dyn_cast<ELF32BEObjectFile>(Obj))
return Elf32BEObj->getSymbol(Sym.getRawDataRefImpl())->getType();		return Elf32BEObj->getSymbol(Sym.getRawDataRefImpl())->getType();
if (auto *Elf64BEObj = cast<ELF64BEObjectFile>(Obj))		if (auto *Elf64BEObj = cast<ELF64BEObjectFile>(Obj))
return Elf64BEObj->getSymbol(Sym.getRawDataRefImpl())->getType();		return Elf64BEObj->getSymbol(Sym.getRawDataRefImpl())->getType();
llvm_unreachable("Unsupported binary format");		llvm_unreachable("Unsupported binary format");
}		}
		template <class ELFT>
static void DisassembleObject(const ObjectFile *Obj, bool InlineRelocs) {		static std::string getArmCpuDetails(const ELFFile<ELFT> *Obj, bool &ThumbOnly) {
if (StartAddress > StopAddress)		ThumbOnly = true;
error("Start address should be less than stop address");		bool CheckedThumb = false;
		bool CheckedArm = false;
const Target *TheTarget = getTarget(Obj);		StringRef CpuStr = StringRef();
		using Elf_Word = typename ELFFile<ELFT>::Elf_Word;
// Package up features to be passed to target/subtarget		for (auto &Shdr : Obj->sections()) {
SubtargetFeatures Features = Obj->getFeatures();		if (Shdr.sh_type != ELF::SHT_ARM_ATTRIBUTES)
if (MAttrs.size()) {		continue;
for (unsigned i = 0; i != MAttrs.size(); ++i)		auto Contents = Obj->getSectionContents(&Shdr);
Features.AddFeature(MAttrs[i]);		if (!Contents)
		return "";
		ArrayRef<uint8_t> AttributeSection = *Contents;
		// Check for valid attribute format
		if (AttributeSection[0] != ARMBuildAttrs::Format_Version)
		return "";
		size_t Offset = 1;
		const uint8_t *Data = AttributeSection.data();
		// Find the aeabi Vendor sub-section
		while (Offset < AttributeSection.size()) {
		Data += Offset;

		// First 4 bytes are object endian length of subsection
		uint32_t Length = *reinterpret_cast<const Elf_Word *>(Data);

		size_t SubOffset = 4;

		if (std::string("aeabi") !=
		reinterpret_cast<const char *>(Data + SubOffset)) {
		Offset += Length;
		continue;
}		}

std::unique_ptr<const MCRegisterInfo> MRI(		SubOffset += StringRef("aeabi").size() + 1;
TheTarget->createMCRegInfo(TripleName));		while (SubOffset < Length) {
		uint8_t Tag = Data[SubOffset];
		SubOffset += sizeof(uint8_t);
		uint32_t Size = *reinterpret_cast<const Elf_Word *>(Data + SubOffset);
		if (Tag != ARMBuildAttrs::File) {
		// Do not add size of Tag twice
		SubOffset += Size - sizeof(uint8_t);
		continue;
		}
		SubOffset += 4;
		while (SubOffset < Length) {
		unsigned TagLen;
		uint32_t Tag = decodeULEB128(Data + SubOffset, &TagLen);
		SubOffset += TagLen;
		if (Tag == ARMBuildAttrs::CPU_name) {
		CpuStr =
		StringRef(reinterpret_cast<const char *>(Data + SubOffset));
		SubOffset += CpuStr.size() + 1;
		continue;
		} else if (Tag == ARMBuildAttrs::ARM_ISA_use) {
		ThumbOnly = false;
		CheckedArm = true;
		} else if (Tag == ARMBuildAttrs::THUMB_ISA_use) {
		CheckedThumb = true;
		}
		if (CheckedArm && CheckedThumb && CpuStr.size() > 0)
		return CpuStr.lower();
		uint32_t ValueLen;
		decodeULEB128(Data + SubOffset, &ValueLen);
		SubOffset += ValueLen;
		}
		Offset += Length;
		}
		}
		}
		return CpuStr.lower();
		}

		static std::string getCPU(const ObjectFile *Obj, bool &ThumbOnly) {
		if (MCPU.size())
		return MCPU;

		if (Obj->isELF()) {
		if (Obj->getArch() == Triple::thumb || Obj->getArch() == Triple::thumbeb ||
		Obj->getArch() == Triple::arm || Obj->getArch() == Triple::armeb) {
		if (const ELF32LEObjectFile *ELFObj = dyn_cast<ELF32LEObjectFile>(Obj))
		return getArmCpuDetails(ELFObj->getELFFile(), ThumbOnly);

		// Big-endian 32-bit
		if (const ELF32BEObjectFile *ELFObj = dyn_cast<ELF32BEObjectFile>(Obj))
		return getArmCpuDetails(ELFObj->getELFFile(), ThumbOnly);

		return MCPU;
		}
		}
		return "";
		}

		struct DecoderContext {
		std::unique_ptr<const MCRegisterInfo> MRI;
		std::unique_ptr<const MCAsmInfo> AsmInfo;
		std::unique_ptr<const MCSubtargetInfo> STI;
		std::unique_ptr<const MCInstrInfo> MII;
		std::unique_ptr<const MCObjectFileInfo> MOFI;
		std::unique_ptr<MCContext> Ctx;
		std::unique_ptr<MCDisassembler> DisAsm;
		std::unique_ptr<const MCInstrAnalysis> MIA;
		std::unique_ptr<MCInstPrinter> IP;
		PrettyPrinter *PIP;
		std::string TargetName;

		DecoderContext()
		: MRI(nullptr), AsmInfo(nullptr), STI(nullptr), MII(nullptr),
		MOFI(nullptr), Ctx(nullptr), DisAsm(nullptr), MIA(nullptr), IP(nullptr),
		PIP(nullptr) {}

		void setContext(const Target *TheTarget, const ObjectFile *Obj,
		std::string TripleName, SubtargetFeatures &Features) {
		TargetName = TheTarget->getName();
		bool ThumbOnly = false;
		std::string CPU = getCPU(Obj, ThumbOnly);
		// A CPU with thumb only architecture cannot have ARM Subtarget
		if (TargetName == "arm" && ThumbOnly)
		CPU = "";
		MRI.reset(TheTarget->createMCRegInfo(TripleName));
if (!MRI)		if (!MRI)
report_fatal_error("error: no register info for target " + TripleName);		report_fatal_error("error: no register info for target " + TripleName);

// Set up disassembler.		// Set up disassembler.
std::unique_ptr<const MCAsmInfo> AsmInfo(		AsmInfo.reset(TheTarget->createMCAsmInfo(*MRI, TripleName));
TheTarget->createMCAsmInfo(*MRI, TripleName));
if (!AsmInfo)		if (!AsmInfo)
report_fatal_error("error: no assembly info for target " + TripleName);		report_fatal_error("error: no assembly info for target " + TripleName);
std::unique_ptr<const MCSubtargetInfo> STI(		STI.reset(TheTarget->createMCSubtargetInfo(TripleName, CPU,
TheTarget->createMCSubtargetInfo(TripleName, MCPU, Features.getString()));		Features.getString()));
if (!STI)		if (!STI)
report_fatal_error("error: no subtarget info for target " + TripleName);		report_fatal_error("error: no subtarget info for target " + TripleName);
std::unique_ptr<const MCInstrInfo> MII(TheTarget->createMCInstrInfo());		MII.reset(TheTarget->createMCInstrInfo());
if (!MII)		if (!MII)
report_fatal_error("error: no instruction info for target " + TripleName);		report_fatal_error("error: no instruction info for target " + TripleName);
std::unique_ptr<const MCObjectFileInfo> MOFI(new MCObjectFileInfo);		MOFI.reset(new MCObjectFileInfo);
MCContext Ctx(AsmInfo.get(), MRI.get(), MOFI.get());		Ctx.reset(new MCContext(AsmInfo.get(), MRI.get(), MOFI.get()));

std::unique_ptr<MCDisassembler> DisAsm(		DisAsm.reset(TheTarget->createMCDisassembler(*STI, *Ctx));
TheTarget->createMCDisassembler(*STI, Ctx));
if (!DisAsm)		if (!DisAsm)
report_fatal_error("error: no disassembler for target " + TripleName);		report_fatal_error("error: no disassembler for target " + TripleName);

std::unique_ptr<const MCInstrAnalysis> MIA(		MIA.reset(TheTarget->createMCInstrAnalysis(MII.get()));
TheTarget->createMCInstrAnalysis(MII.get()));

int AsmPrinterVariant = AsmInfo->getAssemblerDialect();		int AsmPrinterVariant = AsmInfo->getAssemblerDialect();
std::unique_ptr<MCInstPrinter> IP(TheTarget->createMCInstPrinter(		IP.reset(TheTarget->createMCInstPrinter(
Triple(TripleName), AsmPrinterVariant, *AsmInfo, *MII, *MRI));		Triple(TripleName), AsmPrinterVariant, *AsmInfo, *MII, *MRI));
if (!IP)		if (!IP)
report_fatal_error("error: no instruction printer for target " +		report_fatal_error("error: no instruction printer for target " +
TripleName);		TripleName);
IP->setPrintImmHex(PrintImmHex);		IP->setPrintImmHex(PrintImmHex);
PrettyPrinter &PIP = selectPrettyPrinter(Triple(TripleName));		PIP = selectPrettyPrinter(Triple(TripleName));
		}
		};
		static void DisassembleObject(const ObjectFile *Obj, bool InlineRelocs) {
		if (StartAddress > StopAddress)
		error("Start address should be less than stop address");

StringRef Fmt = Obj->getBytesInAddress() > 4 ? "\t\t%016" PRIx64 ": " :		DecoderContext Primary, Secondary;
"\t\t\t%08" PRIx64 ": ";		const Target *TheTarget = getTarget(Obj);
		bool ThumbMode = false;

		// Package up features to be passed to target/subtarget
		SubtargetFeatures Features = Obj->getFeatures();
		if (MAttrs.size()) {
		for (unsigned i = 0; i != MAttrs.size(); ++i)
		Features.AddFeature(MAttrs[i]);
		}
		Primary.setContext(TheTarget, Obj, TripleName, Features);
		if (isArmElf(Obj) && !isAarch64Elf(Obj)) {
		// ARM ELF binaries may have mixed ARM and thumb code. There is no flag
		// except the mapping symbols that mark these boundaries. Create a second
		// disassembler based on what was the first(inferred by MCPU or using
		// triple)
		SubtargetFeatures Features;
		std::string Name = TripleName == "thumb" ? "arm" : "thumb";
		llvm::Triple SecondaryTriple("unknown-unknown-unknown");
		SecondaryTriple.setTriple(Triple::normalize(Name));
		std::string Error;
		const Target *SecondaryTarget =
		TargetRegistry::lookupTarget("", SecondaryTriple, Error);
		if (!TheTarget)
		report_fatal_error("can't find target: " + Error);
		Secondary.setContext(SecondaryTarget, Obj, SecondaryTriple.str(), Features);
		}

		StringRef Fmt = Obj->getBytesInAddress() > 4 ? "\t\t%016" PRIx64 ": "
		: "\t\t\t%08" PRIx64 ": ";

SourcePrinter SP(Obj, TheTarget->getName());		SourcePrinter SP(Obj, TheTarget->getName());

// Create a mapping, RelocSecs = SectionRelocMap[S], where sections		// Create a mapping, RelocSecs = SectionRelocMap[S], where sections
// in RelocSecs contain the relocations for section S.		// in RelocSecs contain the relocations for section S.
std::error_code EC;		std::error_code EC;
std::map<SectionRef, SmallVector<SectionRef, 1>> SectionRelocMap;		std::map<SectionRef, SmallVector<SectionRef, 1>> SectionRelocMap;
for (const SectionRef &Section : ToolSectionFilter(*Obj)) {		for (const SectionRef &Section : ToolSectionFilter(*Obj)) {
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	for (const SectionRef &Section : ToolSectionFilter(*Obj)) {
uint64_t SectionAddr = Section.getAddress();		uint64_t SectionAddr = Section.getAddress();
uint64_t SectSize = Section.getSize();		uint64_t SectSize = Section.getSize();
if (!SectSize)		if (!SectSize)
continue;		continue;

// Get the list of all the symbols in this section.		// Get the list of all the symbols in this section.
SectionSymbolsTy &Symbols = AllSymbols[Section];		SectionSymbolsTy &Symbols = AllSymbols[Section];
std::vector<uint64_t> DataMappingSymsAddr;		std::vector<uint64_t> DataMappingSymsAddr;
std::vector<uint64_t> TextMappingSymsAddr;		std::vector<uint64_t> ThumbMappingSymsAddr;
if (isArmElf(Obj)) {		std::vector<uint64_t> ArmMappingSymsAddr;
		std::vector<uint64_t> AArch64MappingSymsAddr;
		if (isArmElf(Obj) || isAarch64Elf(Obj)) {
for (const auto &Symb : Symbols) {		for (const auto &Symb : Symbols) {
uint64_t Address = std::get<0>(Symb);		uint64_t Address = std::get<0>(Symb);
StringRef Name = std::get<1>(Symb);		StringRef Name = std::get<1>(Symb);
if (Name.startswith("$d"))		if (Name.startswith("$d"))
DataMappingSymsAddr.push_back(Address - SectionAddr);		DataMappingSymsAddr.push_back(Address - SectionAddr);
if (Name.startswith("$x"))		if (Name.startswith("$x"))
TextMappingSymsAddr.push_back(Address - SectionAddr);		AArch64MappingSymsAddr.push_back(Address - SectionAddr);
if (Name.startswith("$a"))		if (Name.startswith("$a"))
TextMappingSymsAddr.push_back(Address - SectionAddr);		ArmMappingSymsAddr.push_back(Address - SectionAddr);
if (Name.startswith("$t"))		if (Name.startswith("$t"))
TextMappingSymsAddr.push_back(Address - SectionAddr);		ThumbMappingSymsAddr.push_back(Address - SectionAddr);
}		}
}		}

std::sort(DataMappingSymsAddr.begin(), DataMappingSymsAddr.end());		std::sort(DataMappingSymsAddr.begin(), DataMappingSymsAddr.end());
std::sort(TextMappingSymsAddr.begin(), TextMappingSymsAddr.end());		std::sort(ArmMappingSymsAddr.begin(), ArmMappingSymsAddr.end());
		std::sort(ThumbMappingSymsAddr.begin(), ThumbMappingSymsAddr.end());
		std::sort(AArch64MappingSymsAddr.begin(), AArch64MappingSymsAddr.end());

// Make a list of all the relocations for this section.		// Make a list of all the relocations for this section.
std::vector<RelocationRef> Rels;		std::vector<RelocationRef> Rels;
if (InlineRelocs) {		if (InlineRelocs) {
for (const SectionRef &RelocSec : SectionRelocMap[Section]) {		for (const SectionRef &RelocSec : SectionRelocMap[Section]) {
for (const RelocationRef &Reloc : RelocSec.relocations()) {		for (const RelocationRef &Reloc : RelocSec.relocations()) {
Rels.push_back(Reloc);		Rels.push_back(Reloc);
}		}
▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	#endif
// skip byte by byte till StartAddress is reached		// skip byte by byte till StartAddress is reached
Size = 1;		Size = 1;
continue;		continue;
}		}
// AArch64 ELF binaries can interleave data and text in the		// AArch64 ELF binaries can interleave data and text in the
// same section. We rely on the markers introduced to		// same section. We rely on the markers introduced to
// understand what we need to dump. If the data marker is within a		// understand what we need to dump. If the data marker is within a
// function, it is denoted as a word/short etc		// function, it is denoted as a word/short etc
if (isArmElf(Obj) && std::get<2>(Symbols[si]) != ELF::STT_OBJECT &&		if ((isArmElf(Obj) || isAarch64Elf(Obj)) &&
!DisassembleAll) {		std::get<2>(Symbols[si]) != ELF::STT_OBJECT && !DisassembleAll) {
uint64_t Stride = 0;		uint64_t Stride = 0;

auto DAI = std::lower_bound(DataMappingSymsAddr.begin(),		auto DAI = std::lower_bound(DataMappingSymsAddr.begin(),
DataMappingSymsAddr.end(), Index);		DataMappingSymsAddr.end(), Index);
if (DAI != DataMappingSymsAddr.end() && *DAI == Index) {		if (DAI != DataMappingSymsAddr.end() && *DAI == Index) {
// Switch to data.		// Switch to data.
while (Index < End) {		while (Index < End) {
outs() << format("%8" PRIx64 ":", SectionAddr + Index);		outs() << format("%8" PRIx64 ":", SectionAddr + Index);
[34 lines not shown]
} else {		} else {
Stride = 1;		Stride = 1;
dumpBytes(Bytes.slice(Index, 1), outs());		dumpBytes(Bytes.slice(Index, 1), outs());
outs() << "\t\t.byte\t";		outs() << "\t\t.byte\t";
outs() << "0x" << format("%02" PRIx8, Bytes.slice(Index, 1)[0]);		outs() << "0x" << format("%02" PRIx8, Bytes.slice(Index, 1)[0]);
}		}
Index += Stride;		Index += Stride;
outs() << "\n";		outs() << "\n";
auto TAI = std::lower_bound(TextMappingSymsAddr.begin(),		auto TAI = std::lower_bound(ThumbMappingSymsAddr.begin(),
TextMappingSymsAddr.end(), Index);		ThumbMappingSymsAddr.end(), Index);
if (TAI != TextMappingSymsAddr.end() && *TAI == Index)		auto ARMI = std::lower_bound(ArmMappingSymsAddr.begin(),
		ArmMappingSymsAddr.end(), Index);
		auto AArchI =
		std::lower_bound(AArch64MappingSymsAddr.begin(),
		AArch64MappingSymsAddr.end(), Index);

		if ((TAI != ThumbMappingSymsAddr.end() && *TAI == Index) ||
		(ARMI != ArmMappingSymsAddr.end() && *ARMI == Index) ||
		(AArchI != AArch64MappingSymsAddr.end() && *AArchI == Index))
break;		break;
}		}
}		}
}		}

// If there is a data symbol inside an ELF text section and we are only		// If there is a data symbol inside an ELF text section and we are only
// disassembling text (applicable all architectures),		// disassembling text (applicable all architectures),
// we are in a situation where we must print the data and not		// we are in a situation where we must print the data and not
[34 lines not shown]
outs() << '\n';		outs() << '\n';
NumBytes = 0;		NumBytes = 0;
}		}
}		}
}		}
if (Index >= End)		if (Index >= End)
break;		break;

		DecoderContext *Disassembler = nullptr;
		if (isArmElf(Obj) && !isAarch64Elf(Obj)) {
		auto TAI = std::lower_bound(ThumbMappingSymsAddr.begin(),
		ThumbMappingSymsAddr.end(), Index);
		auto ARMI = std::lower_bound(ArmMappingSymsAddr.begin(),
		ArmMappingSymsAddr.end(), Index);
		if (TAI != ThumbMappingSymsAddr.end() && *TAI == Index)
		ThumbMode = true;
		else if (ARMI != ArmMappingSymsAddr.end() && *ARMI == Index)
		ThumbMode = false;

		if (ThumbMode)
		Disassembler =
		Primary.TargetName == "thumb" ? &Primary : &Secondary;
		else
		Disassembler = Primary.TargetName == "arm" ? &Primary : &Secondary;
		} else {
		Disassembler = &Primary;
		}
// Disassemble a real instruction or a data when disassemble all is		// Disassemble a real instruction or a data when disassemble all is
// provided		// provided
bool Disassembled = DisAsm->getInstruction(Inst, Size, Bytes.slice(Index),		bool Disassembled = Disassembler->DisAsm->getInstruction(
SectionAddr + Index, DebugOut,		Inst, Size, Bytes.slice(Index), SectionAddr + Index, DebugOut,
CommentStream);		CommentStream);
		// If ARM, try the other disassembler if the first one failed
		if (!Disassembled && isArmElf(Obj)) {
		if (Disassembler == &Primary && Secondary.DisAsm)
		Disassembler = &Secondary;
		else if (Disassembler == &Secondary && Primary.DisAsm)
		Disassembler = &Primary;
		if (Disassembler->DisAsm)
		Disassembled = Disassembler->DisAsm->getInstruction(
		Inst, Size, Bytes.slice(Index), SectionAddr + Index, DebugOut,
		CommentStream);
		}
if (Size == 0)		if (Size == 0)
Size = 1;		Size = 1;

PIP.printInst(*IP, Disassembled ? &Inst : nullptr,		Disassembler->PIP->printInst(
		*Disassembler->IP, Disassembled ? &Inst : nullptr,
Bytes.slice(Index, Size), SectionAddr + Index, outs(), "",		Bytes.slice(Index, Size), SectionAddr + Index, outs(), "",
*STI, &SP);		*Disassembler->STI, &SP);
outs() << CommentStream.str();		outs() << CommentStream.str();
Comments.clear();		Comments.clear();

// Try to resolve the target of a call, tail call, etc. to a specific		// Try to resolve the target of a call, tail call, etc. to a specific
// symbol.		// symbol.
		auto MIA = Disassembler->MIA.get();
if (MIA && (MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) ||		if (MIA && (MIA->isCall(Inst) || MIA->isUnconditionalBranch(Inst) ||
MIA->isConditionalBranch(Inst))) {		MIA->isConditionalBranch(Inst))) {
uint64_t Target;		uint64_t Target;
if (MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target)) {		if (MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target)) {
// In a relocatable object, the target's section must reside in		// In a relocatable object, the target's section must reside in
// the same section as the call instruction or it is accessed		// the same section as the call instruction or it is accessed
// through a relocation.		// through a relocation.
//		//
[597 lines not shown]