This is an archive of the discontinued LLVM Phabricator instance.

Differential D105917

[llvm-objdump][CallGraphSection] Extract call graph information from binary
Abandoned, Public

Authored by necipfazil on Jul 13 2021, 10:42 AM.

Details

Reviewers

morehouse
kcc
llvm-commits
jhenderson
MaskRay
lattner

Summary

Introduce the --call-graph-info option to llvm-objdump to dump call graph
information. The output includes type identifiers for indirect calls and
indirect targets, read from the call graph section if it is available.
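
As a rough illustration of how the two type id tables in this output can be used together, the sketch below matches an indirect call site to the functions that share its type id. The `TypeIdTable` alias, the `candidateTargets` helper, and any addresses passed to it are hypothetical and are not part of the patch; only the idea of keying both call sites and targets by a 64-bit type id comes from the summary.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical in-memory form of the two tables: type id -> addresses.
using TypeIdTable = std::map<uint64_t, std::vector<uint64_t>>;

// Returns the entry addresses of all functions whose type id matches the
// type id recorded for the given indirect call site. The result is empty
// when the call site has no type id or no target shares that type id.
std::vector<uint64_t> candidateTargets(const TypeIdTable &CallSitesByTypeId,
                                       const TypeIdTable &TargetsByTypeId,
                                       uint64_t CallSitePc) {
  for (const auto &Entry : CallSitesByTypeId) {
    for (uint64_t Pc : Entry.second) {
      if (Pc != CallSitePc)
        continue;
      // Found the call site; look up targets that carry the same type id.
      auto It = TargetsByTypeId.find(Entry.first);
      if (It == TargetsByTypeId.end())
        return {};
      return It->second;
    }
  }
  return {};
}
```

A type id match only narrows the set of possible callees; it does not say which one is actually called at run time.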

Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

necipfazil created this revision. Jul 13 2021, 10:42 AM

Herald added a reviewer: jhenderson. Jul 13 2021, 10:42 AM

Herald added a reviewer: MaskRay.

Herald added a subscriber: rupprecht.

necipfazil requested review of this revision. Jul 13 2021, 10:42 AM

Herald added a project: Restricted Project. Jul 13 2021, 10:42 AM

necipfazil added parent revisions: D105916: [AsmPrinter][CallGraphSection] Emit call graph section, D105915: [CallSiteInfo][CallGraphSection] Extend CallSiteInfo for indirect call type ids, D105911: [CallGraphSection] Introduce CGSectionFuncComdatCreator pass, D105907: [CallGraphSection] Add call graph section options and documentation, D105909: [clang][CallGraphSection] Add type id metadata to indirect call and targets. Jul 13 2021, 10:42 AM

Harbormaster completed remote builds in B113793: Diff 358345. Jul 13 2021, 12:21 PM

tschuett added a subscriber: tschuett. Jul 14 2021, 2:48 PM

Adapt to the new call graph section layout, refactor

Parse the call graph section based on the new section layout
Refactor code for readability and efficiency
Account for functions that the call graph section has no info for, and list them as indirect targets with unknown type id.
Add additional warning messages (for functions that the call graph section has no info for)
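
The section layout that the parser in this revision walks can be sketched as a flat sequence of 64-bit words: a format version (expected to be 0), the function entry address, the function kind (0, 1 or 2), a type id only when the kind is 2, a call site count, and then one (type id, call site address) pair per call site. The record types and the `parseCallGraphWords` name below are hypothetical, and the sketch skips the patch's cross-check of entry addresses against disassembled functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record types; the patch stores results in its own maps.
struct CallSiteEntry {
  uint64_t TypeId;
  uint64_t CallSitePc;
};

struct FunctionRecord {
  uint64_t EntryPc = 0;
  uint64_t Kind = 0;      // 0: not an indirect target, 1: unknown type id,
                          // 2: indirect target with known type id
  bool HasTypeId = false; // true only when Kind == 2
  uint64_t TypeId = 0;
  std::vector<CallSiteEntry> CallSites;
};

// Parses 64-bit words laid out as described above. Returns false on
// truncated input, an unknown format version, or an unknown function kind.
bool parseCallGraphWords(const std::vector<uint64_t> &Words,
                         std::vector<FunctionRecord> &Records) {
  Records.clear();
  std::size_t Pos = 0;
  // Reads the next word, or reports failure when the input is exhausted.
  auto Next = [&](uint64_t &Out) -> bool {
    if (Pos >= Words.size())
      return false;
    Out = Words[Pos++];
    return true;
  };
  while (Pos < Words.size()) {
    FunctionRecord R;
    uint64_t Version = 0, Count = 0;
    if (!Next(Version) || Version != 0)
      return false;
    if (!Next(R.EntryPc) || !Next(R.Kind) || R.Kind > 2)
      return false;
    if (R.Kind == 2) {
      if (!Next(R.TypeId))
        return false;
      R.HasTypeId = true;
    }
    if (!Next(Count))
      return false;
    for (uint64_t I = 0; I < Count; ++I) {
      CallSiteEntry E{0, 0};
      if (!Next(E.TypeId) || !Next(E.CallSitePc))
        return false;
      R.CallSites.push_back(E);
    }
    Records.push_back(R);
  }
  return true;
}
```

Returning `false` instead of aborting is a choice made for this sketch; the patch reports a fatal error through `reportError` in the same situations.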

Harbormaster completed remote builds in B114709: Diff 359594. Jul 17 2021, 9:03 PM

morehouse added inline comments. Jul 20 2021, 9:16 AM

llvm/tools/llvm-objdump/llvm-objdump.cpp
1093
1471	Is `MIA->isIndirectBranch() && MIA->isCall()` sufficient to detect an indirect call?
2156–2158
2220–2226
2228–2242	Nit: the above two loops could be combined.
2250–2266	Nit: the above two loops could be combined.
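
The archived view does not show which loops the two nits refer to. As one illustration of the pattern, the counting code in the latest diff tallies both function kinds in a single pass over the function map, which can be sketched as follows. The `KindCounts` struct and `countKinds` helper are hypothetical, and `std::map` stands in for the `DenseMap` used in the patch.

```cpp
#include <cstdint>
#include <map>

// Same enumerators as the FunctionKind enum in the patch.
enum FunctionKind {
  NOT_INDIRECT_TARGET = 0,
  INDIRECT_TARGET_UNKNOWN_TID = 1,
  INDIRECT_TARGET_KNOWN_TID = 2,
  NOT_LISTED = -1,
};

// Hypothetical holder for the two tallies.
struct KindCounts {
  uint64_t NotListed = 0;
  uint64_t Unknown = 0;
};

// Counts both kinds while visiting each function exactly once, instead of
// running one loop per kind.
KindCounts countKinds(const std::map<uint64_t, FunctionKind> &KindByEntryPc) {
  KindCounts Counts;
  for (const auto &El : KindByEntryPc) {
    Counts.NotListed += El.second == NOT_LISTED;
    Counts.Unknown += El.second == INDIRECT_TARGET_UNKNOWN_TID;
  }
  return Counts;
}
```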

lattner resigned from this revision. Jul 20 2021, 9:44 PM

Fix small nits

Harbormaster completed remote builds in B115436: Diff 360624. Jul 21 2021, 3:33 PM

necipfazil marked 6 inline comments as done. Jul 21 2021, 3:34 PM

necipfazil added inline comments.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1471	`MIA->isIndirectBranch()` evaluates to `false` for any call instruction.
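
Since `isIndirectBranch()` does not help for calls, the revision classifies a call by its operands instead: any register operand means an indirect call, otherwise exactly one immediate operand means a direct call. A minimal sketch of that heuristic, using a hypothetical `OperandKind` enum in place of LLVM's `MCOperand` and returning `Unknown` where the patch asserts:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the operand kinds the patch distinguishes.
enum class OperandKind { Register, Immediate, Other };

enum class CallKind { Direct, Indirect, Unknown };

// Applies the revision's stated assumption: a call with at least one
// register operand is indirect; a call with no register operand and
// exactly one immediate operand is direct.
CallKind classifyCall(const std::vector<OperandKind> &Operands) {
  bool HasRegOperand = false;
  unsigned ImmOperandCount = 0;
  for (OperandKind Kind : Operands) {
    if (Kind == OperandKind::Register)
      HasRegOperand = true;
    else if (Kind == OperandKind::Immediate)
      ++ImmOperandCount;
  }
  if (HasRegOperand)
    return CallKind::Indirect;
  if (ImmOperandCount == 1)
    return CallKind::Direct;
  return CallKind::Unknown; // the patch asserts here instead
}

// Call site address as the patch defines it: the address of the instruction
// that follows the call, i.e. the return address seen in a stack trace.
uint64_t callSitePc(uint64_t SectionAddr, uint64_t Index, uint64_t Size) {
  return SectionAddr + Index + Size;
}
```

Whether the operand heuristic holds on every target is not established in the review; the patch treats it as an assumption and guards it with an assertion.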

LGTM

This revision is now accepted and ready to land. Jul 22 2021, 8:41 AM

Please don't commit this yet. There's a lot of code, and I'd like to spend quite some time reviewing it - a first glance through raises some initial concerns, but I need to understand the whole thing to be able to properly consider them and possible suggestions.

Some initial high-level comments/questions:

Is it possible to break this patch down into smaller parts, perhaps implementing some subset of the functionality per part? That'll make things easier to review.
Why is this functionality being added to llvm-objdump rather than llvm-readobj? (There may be a perfectly valid reason to do so, but llvm-readobj tends to be the place we put things to interpret arbitrary sections).
Can we avoid the canned binary in the test, please? Use an input generated at runtime using either llvm-mc or yaml2obj (the latter is preferable, but likely requires adding functionality to yaml2obj for the new section type).

llvm/tools/llvm-objdump/ObjdumpOpts.td
36	Please place this in alphabetical order in relation to other options (i.e. before demangle).

This revision now requires changes to proceed. Jul 26 2021, 12:42 AM

In D105917#2903706, @jhenderson wrote:

Thanks for your feedback.

Please don't commit this yet. There's a lot of code, and I'd like to spend quite some time reviewing it - a first glance through raises some initial concerns, but I need to understand the whole thing to be able to properly consider them and possible suggestions.

Some initial high-level comments/questions:

Is it possible to break this patch down into smaller parts, perhaps implementing some subset of the functionality per part? That'll make things easier to review.

This patch is now split into 6 smaller patches: D107028, D107029, D107030, D107031, D107032, D107033. We can now abandon this revision.

Why is this functionality being added to llvm-objdump rather than llvm-readobj? (There may be a perfectly valid reason to do so, but llvm-readobj tends to be the place we put things to interpret arbitrary sections).

We would like to disassemble the object for information on functions and call sites. The first attempt was to use llvm-readobj, but llvm-objdump seems to provide a large amount of code reuse for our case. See especially D107029 for functions and D107030 for call sites.

Can we avoid the canned binary in the test, please? Use an input generated at runtime using either llvm-mc or yaml2obj (the latter is preferable, but likely requires adding functionality to yaml2obj for the new section type).

I added a bunch of additional (and smaller) llvm-mc/yaml2obj tests with the new set of patches.

Edit: not -> now

If I get time, I will look at the patch series next week. Before you go ahead with landing anything though, I think you need to demonstrate that this feature is both useful and desired by the community: this is quite a large amount of code, and on my first glance, neither RFC attracted any feedback, which may suggest people aren't interested in this functionality.

In D105917#2912614, @jhenderson wrote:

If I get time, I will look at the patch series next week. Before you go ahead with landing anything though, I think you need to demonstrate that this feature is both useful and desired by the community: this is quite a large amount of code, and on my first glance, neither RFC attracted any feedback, which may suggest people aren't interested in this functionality.

We expect this feature to be useful for sanitizers, especially hardware-supported memory tagging, as it allows us to compress allocation/deallocation stack traces and reconstruct them offline. This will reduce memory overhead from stack traces by 16x and make production deployment feasible in more scenarios. I'd argue that this alone makes the added complexity worthwhile.

I also imagine the call graph would be useful for guided fuzzing (e.g., funcA calls funcB, and we want coverage of funcB, so we mutate inputs that touch funcA).
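
The guided fuzzing idea above amounts to a reverse lookup on the call graph: given a function that lacks coverage, find the functions from which it can be reached. A minimal sketch under the assumption that only direct-call edges are used; the `CallEdges` alias and `functionsReaching` helper are hypothetical and not part of the patch.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical direct-call edges: caller entry address -> callee addresses.
using CallEdges = std::map<uint64_t, std::vector<uint64_t>>;

// Collects every function from which Target is reachable through one or
// more direct calls, by walking the reversed edges with a worklist.
std::set<uint64_t> functionsReaching(const CallEdges &Edges, uint64_t Target) {
  // Build the reverse mapping: callee -> callers.
  std::map<uint64_t, std::vector<uint64_t>> Callers;
  for (const auto &Entry : Edges)
    for (uint64_t Callee : Entry.second)
      Callers[Callee].push_back(Entry.first);

  std::set<uint64_t> Seen;
  std::vector<uint64_t> Worklist(1, Target);
  while (!Worklist.empty()) {
    uint64_t Func = Worklist.back();
    Worklist.pop_back();
    auto It = Callers.find(Func);
    if (It == Callers.end())
      continue;
    for (uint64_t Caller : It->second)
      if (Seen.insert(Caller).second) // visit each caller only once
        Worklist.push_back(Caller);
  }
  return Seen;
}
```

Indirect calls could be folded in by adding an edge from each indirect call site's caller to every target that shares the call site's type id, at the cost of over-approximating the graph.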

Abandoning this revision as it is split.

necipfazil removed parent revisions: D105909: [clang][CallGraphSection] Add type id metadata to indirect call and targets, D105907: [CallGraphSection] Add call graph section options and documentation, D105911: [CallGraphSection] Introduce CGSectionFuncComdatCreator pass, D105915: [CallSiteInfo][CallGraphSection] Extend CallSiteInfo for indirect call type ids, D105916: [AsmPrinter][CallGraphSection] Emit call graph section. Jul 29 2021, 3:10 PM

Revision Contents

Path                                                      Size
llvm/docs/CommandGuide/llvm-objdump.rst                   5 lines
llvm/test/tools/llvm-objdump/Inputs/call-graph-section
llvm/test/tools/llvm-objdump/call-graph-info.test         70 lines
llvm/tools/llvm-objdump/ObjdumpOpts.td                    4 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp                  319 lines

Diff 360624

llvm/docs/CommandGuide/llvm-objdump.rst

[... 19 lines not shown ...]
  --------
  At least one of the following commands are required, and some commands can be
  combined with other commands:

  .. option:: -a, --archive-headers

   Display the information contained within an archive's headers.

+ .. option:: --call-graph-info
+
+  Dump call graph information including indirect call and target IDs from call
+  graph section, if available.
+
  .. option:: -d, --disassemble

   Disassemble all text sections found in the input files.

  .. option:: -D, --disassemble-all

   Disassemble all sections found in the input files.

[... 369 lines not shown ...]

llvm/test/tools/llvm-objdump/Inputs/call-graph-section

llvm/test/tools/llvm-objdump/call-graph-info.test

This file was added.

				; RUN: llvm-objdump --call-graph-info %p/Inputs/call-graph-section | FileCheck %s

				; Source for call-graph-section
				; To regenerate this file:
				; clang -cc1 -triple x86_64-unknown-linux -fcall-graph-section call-graph-section.c -o call-graph-section
				;
				; void foo() {
				; }
				;
				; int bar(char a) {
				; return 0;
				; }
				;
				; int* baz(char* a) {
				; return 0;
				; }
				;
				; int main() {
				; void (*fp_foo)() = foo;
				; fp_foo();
				;
				; char a;
				; int (*fp_bar)(char) = bar;
				; fp_bar(a);
				;
				; int* (*fp_baz)(char*) = baz;
				; fp_baz(&a);
				;
				; foo();
				; bar(a);
				; baz(&a);
				;
				; return 0;
				; }

				; The generalized type ids for functions are as follows (ITANIUM):
				; function | clang type id | call graph type id (MD5 hash)
				; foo | _ZTSFvE.generalized | 3ecbeef531f74424
				; bar | _ZTSFicE.generalized | 308e4b8159bc8654
				; baz | _ZTSFPvS_E.generalized | 77fd97f81468de7a
				; main | _ZTSFiE.generalized | fa6809609a76afca

				; CHECK: INDIRECT TARGET TYPES (TYPEID [FUNC_ADDR,])
				; CHECK-DAG: 77fd97f81468de7a [[BAZ_PC:[[:xdigit:]]+]]
				; CHECK-DAG: 308e4b8159bc8654 [[BAR_PC:[[:xdigit:]]+]]
				; CHECK-DAG: fa6809609a76afca [[MAIN_PC:[[:xdigit:]]+]]
				; CHECK-DAG: 3ecbeef531f74424 [[FOO_PC:[[:xdigit:]]+]]
				;
				; CHECK: INDIRECT CALL TYPES (TYPEID [CALL_SITE_ADDR,])
				; CHECK-DAG: 77fd97f81468de7a [[IDIRCALL_TO_BAZ_CALLSITEPC:[[:xdigit:]]+]]
				; CHECK-DAG: 308e4b8159bc8654 [[IDIRCALL_TO_BAR_CALLSITEPC:[[:xdigit:]]+]]
				; CHECK-DAG: 3ecbeef531f74424 [[IDIRCALL_TO_FOO_CALLSITEPC:[[:xdigit:]]+]]
				;
				; CHECK: INDIRECT CALL SITES (CALLER_ADDR [CALL_SITE_ADDR,])
				; CHECK: [[MAIN_PC]]
				; CHECK-SAME: [[IDIRCALL_TO_FOO_CALLSITEPC]]
				; CHECK-SAME: [[IDIRCALL_TO_BAR_CALLSITEPC]]
				; CHECK-SAME: [[IDIRCALL_TO_BAZ_CALLSITEPC]]
				;
				; CHECK: DIRECT CALL SITES (CALLER_ADDR [(CALL_SITE_ADDR, TARGET_ADDR),])
				; CHECK: [[MAIN_PC]]
				; CHECK-SAME: {{[[:xdigit:]]+}} [[FOO_PC]]
				; CHECK-SAME: {{[[:xdigit:]]+}} [[BAR_PC]]
				; CHECK-SAME: {{[[:xdigit:]]+}} [[BAZ_PC]]
				;
				; CHECK: FUNCTIONS (FUNC_ENTRY_ADDR, SYM_NAME)
				; CHECK-DAG: [[MAIN_PC]] main
				; CHECK-DAG: [[FOO_PC]] foo
				; CHECK-DAG: [[BAR_PC]] bar
				; CHECK-DAG: [[BAZ_PC]] baz

llvm/tools/llvm-objdump/ObjdumpOpts.td

[... 25 lines not shown ...]
  def archive_headers : Flag<["--"], "archive-headers">,
    HelpText<"Display archive header information">;
  def : Flag<["-"], "a">, Alias<archive_headers>,
    HelpText<"Alias for --archive-headers">;
  def demangle : Flag<["--"], "demangle">, HelpText<"Demangle symbol names">;
  def : Flag<["-"], "C">, Alias<demangle>, HelpText<"Alias for --demangle">;
+ def call_graph_info : Flag<["--"], "call-graph-info">,
+   HelpText<"Dump call graph information including indirect call and target IDs "
+            "from call graph section, if available.">;

Inline comment (jhenderson, marked Done):

  HelpText<"Dump call graph information including indirect call and target IDs "
- "from call graph section, if available.">;
+ "from call graph section, if available">;
  def disassemble : Flag<["--"], "disassemble">,

jhenderson: Please place this in alphabetical order in relation to other options (i.e. before demangle).

  def disassemble : Flag<["--"], "disassemble">,
    HelpText<"Display assembler mnemonics for the machine instructions">;
  def : Flag<["-"], "d">, Alias<disassemble>, HelpText<"Alias for --disassemble">;
  def disassemble_all : Flag<["--"], "disassemble-all">,
    HelpText<"Display assembler mnemonics for the machine instructions">;
  def : Flag<["-"], "D">, Alias<disassemble_all>,
    HelpText<"Alias for --disassemble-all">;
[... 282 lines not shown ...]

llvm/tools/llvm-objdump/llvm-objdump.cpp

[... 172 lines not shown ...]
  #define DEBUG_TYPE "objdump"
  static uint64_t AdjustVMA;
  static bool AllHeaders;
  static std::string ArchName;
  bool objdump::ArchiveHeaders;
  bool objdump::Demangle;
+ static bool CallGraphInfo;
  bool objdump::Disassemble;
  bool objdump::DisassembleAll;
  bool objdump::SymbolDescription;
  static std::vector<std::string> DisassembleSymbols;
  static bool DisassembleZeroes;
  static std::vector<std::string> DisassemblerOptions;
  DIDumpType objdump::DwarfDumpType;
  static bool DynamicRelocations;
[... 25 lines not shown ...]
  static bool SymbolizeOperands;
  static bool DynamicSymbolTable;
  std::string objdump::TripleName;
  bool objdump::UnwindInfo;
  static bool Wide;
  std::string objdump::Prefix;
  uint32_t objdump::PrefixStrip;
+ static bool QuietDisasm;

+ enum FunctionKind {
+   NOT_INDIRECT_TARGET = 0,
+   INDIRECT_TARGET_UNKNOWN_TID = 1,
+   INDIRECT_TARGET_KNOWN_TID = 2,
+   // available in the binary but not listed in the call graph section.
+   NOT_LISTED = -1,
+ };
+ struct FunctionInfo {
+   std::string Name;
+   FunctionKind Kind;
+   using DirectCallSite = std::pair<uint64_t /*CallSite*/, uint64_t /*Callee*/>;
+   SmallVector<DirectCallSite> DirectCallSites;
+   SmallVector<uint64_t> IndirectCallSites;
+ };
+ // Map function entry pc to function info. This is inclusive of all functions
+ // regardless of whether they are listed in the call graph section.
+ DenseMap<uint64_t, FunctionInfo> FuncInfo;
+ // Set of all indirect call sites.
+ DenseSet<uint64_t> IndirectCallSites;

  DebugVarsFormat objdump::DbgVariables = DVDisabled;
  int objdump::DbgIndent = 52;
  static StringSet<> DisasmSymbolSet;
  StringSet<> objdump::FoundSectionSet;
  static StringRef ToolName;
[... 827 lines not shown; enclosing context: if (!Comments.empty()) { ...]
        FOS << MAI.getCommentString() << ' ' << Comment;
      }
      LVP.printAfterInst(FOS);
      FOS << '\n';
    } while (!Comments.empty());
    FOS.flush();
  }
+ static raw_ostream &disasmOuts() { return QuietDisasm ? nulls() : outs(); }

Inline suggestion (morehouse, marked Done):

  FOS.flush();
  }
- static raw_ostream &disAsmOuts() { return QuietDisasm ? nulls() : outs(); }
+ static raw_ostream &disasmOuts() { return QuietDisasm ? nulls() : outs(); }
  static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj,

  static void disassembleObject(const Target *TheTarget, const ObjectFile *Obj,
                                MCContext &Ctx, MCDisassembler *PrimaryDisAsm,
                                MCDisassembler *SecondaryDisAsm,
                                const MCInstrAnalysis *MIA, MCInstPrinter *IP,
                                const MCSubtargetInfo *PrimarySTI,
                                const MCSubtargetInfo *SecondarySTI,
                                PrettyPrinter &PIP,
                                SourcePrinter &SP, bool InlineRelocs) {
[... 112 lines not shown; enclosing context: if (DbgVariables != DVDisabled) { ...]
      DICtx = DWARFContext::create(*Obj);
      for (const std::unique_ptr<DWARFUnit> &CU : DICtx->compile_units())
        LVP.addCompileUnit(CU->getUnitDIE(false));
    }
    LLVM_DEBUG(LVP.dump());
    for (const SectionRef &Section : ToolSectionFilter(*Obj)) {
-     if (FilterSections.empty() && !DisassembleAll &&
+     if (((FilterSections.empty() && !DisassembleAll) || CallGraphInfo) &&
          (!Section.isText() || Section.isVirtual()))
        continue;
      uint64_t SectionAddr = Section.getAddress();
      uint64_t SectSize = Section.getSize();
      if (!SectSize)
        continue;

[... 74 lines not shown; enclosing context: for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) { ...]
      End = std::min(End, Symbols[SI + 1].Addr);
      if (Start >= End || End <= StartAddress)
        continue;
      Start -= SectionAddr;
      End -= SectionAddr;
      if (!PrintedSection) {
        PrintedSection = true;
-       outs() << "\nDisassembly of section ";
+       disasmOuts() << "\nDisassembly of section ";
        if (!SegmentName.empty())
-         outs() << SegmentName << ",";
+         disasmOuts() << SegmentName << ",";
-       outs() << SectionName << ":\n";
+       disasmOuts() << SectionName << ":\n";
      }
-     outs() << '\n';
+     disasmOuts() << '\n';
      if (LeadingAddr)
-       outs() << format(Is64Bits ? "%016" PRIx64 " " : "%08" PRIx64 " ",
+       disasmOuts() << format(Is64Bits ? "%016" PRIx64 " " : "%08" PRIx64 " ",
            SectionAddr + Start + VMAAdjustment);
      if (Obj->isXCOFF() && SymbolDescription) {
-       outs() << getXCOFFSymbolDescription(Symbols[SI], SymbolName) << ":\n";
+       disasmOuts() << getXCOFFSymbolDescription(Symbols[SI], SymbolName)
+                    << ":\n";
      } else
-       outs() << '<' << SymbolName << ">:\n";
+       disasmOuts() << '<' << SymbolName << ">:\n";
      // Don't print raw contents of a virtual section. A virtual section
      // doesn't have any contents in the file.
      if (Section.isVirtual()) {
-       outs() << "...\n";
+       disasmOuts() << "...\n";
        continue;
      }
      auto Status = DisAsm->onSymbolStart(Symbols[SI], Size,
                                          Bytes.slice(Start, End - Start),
                                          SectionAddr + Start, CommentStream);
      // To have round trippable disassembly, we fall back to decoding the
      // remaining bytes as instructions.
      //
      // If there is a failure, we disassemble the failed region as bytes before
      // falling back. The target is expected to print nothing in this case.
      //
      // If there is Success or SoftFail i.e no 'real' failure, we go ahead by
      // Size bytes before falling back.
      // So if the entire symbol is 'eaten' by the target:
      //   Start += Size  // Now Start = End and we will never decode as
      //                  // instructions
      //
      // Right now, most targets return None i.e ignore to treat a symbol
      // separately. But WebAssembly decodes preludes for some symbols.
      //
      if (Status.hasValue()) {
        if (Status.getValue() == MCDisassembler::Fail) {
-         outs() << "// Error in decoding " << SymbolName
+         disasmOuts() << "// Error in decoding " << SymbolName
              << " : Decoding failed region as bytes.\n";
          for (uint64_t I = 0; I < Size; ++I) {
-           outs() << "\t.byte\t " << format_hex(Bytes[I], 1, /*Upper=*/true)
-                  << "\n";
+           disasmOuts() << "\t.byte\t "
+                        << format_hex(Bytes[I], 1, /*Upper=*/true) << "\n";
          }
        }
      } else {
        Size = 0;
      }
      Start += Size;

[... 11 lines not shown; enclosing context: for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) { ...]
        Index = End;
      }
      bool CheckARMELFData = hasMappingSymbols(Obj) &&
                             Symbols[SI].Type != ELF::STT_OBJECT &&
                             !DisassembleAll;
      bool DumpARMELFData = false;
-     formatted_raw_ostream FOS(outs());
+     formatted_raw_ostream FOS(disasmOuts());
      std::unordered_map<uint64_t, std::string> AllLabels;
      if (SymbolizeOperands)
        collectLocalBranchTargets(Bytes, MIA, DisAsm, IP, PrimarySTI,
                                  SectionAddr, Index, End, AllLabels);
+     if (CallGraphInfo && Symbols[SI].Type == ELF::STT_FUNC) {
+       auto FuncPc = Symbols[SI].Addr;
+       auto FuncName = Symbols[SI].Name.str();
+       FuncInfo[FuncPc].Name = FuncName;
+       // Initalize to be later updated while parsing the call graph section.
+       FuncInfo[FuncPc].Kind = NOT_LISTED;
+     }
      while (Index < End) {
        // ARM and AArch64 ELF binaries can interleave data and text in the
        // same section. We rely on the markers introduced to understand what
        // we need to dump. If the data marker is within a function, it is
        // denoted as a word/short etc.
        if (CheckARMELFData) {
          char Kind = getMappingSymbolKind(MappingSymbols, Index);
          DumpARMELFData = Kind == 'd';
[... 41 lines not shown; enclosing context: for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) { ...]
              DisAsm->getInstruction(Inst, Size, Bytes.slice(Index),
                                     SectionAddr + Index, CommentStream);
          if (Size == 0)
            Size = 1;
          LVP.update({Index, Section.getIndex()},
                     {Index + Size, Section.getIndex()}, Index + Size != End);

+         if (CallGraphInfo) {
+           if (Disassembled && MIA->isCall(Inst)) {
+             // Call site address is the address of the instruction just
+             // next to the call instruction. This is the return address
+             // as appears on the stack trace.
+             uint64_t CallSitePc = SectionAddr + Index + Size;
+             uint64_t CallerPc = Symbols[SI].Addr;
+             // Check the operands to decide whether this is an direct or
+             // indirect call.
+             // Assumption: a call instruction with at least one register
+             // operand is an indirect call. Otherwise, it is a direct call
+             // with exactly one immediate operand.

morehouse (inline comment, marked Done): Is `MIA->isIndirectBranch() && MIA->isCall()` sufficient to detect an indirect call?

necipfazil (author, marked Done): `MIA->isIndirectBranch()` evaluates to `false` for any call instruction.

+             bool HasRegOperand = false;
+             unsigned int ImmOperandCount = 0;
+             const MCOperand *ImmOperand = NULL;
+             for (unsigned int I = 0; I < Inst.getNumOperands(); I++) {
+               const auto &Operand = Inst.getOperand(I);
+               if (Operand.isReg()) {
+                 HasRegOperand = true;
+               } else if (Operand.isImm()) {
+                 ImmOperandCount++;
+                 ImmOperand = &Operand;
+               }
+             }
+             // Check if the assumption holds true.
+             assert(HasRegOperand ||
+                    (!HasRegOperand && ImmOperandCount == 1) &&
+                        "Call instruction is expected to have at least one "
+                        "register operand (i.e., indirect call) or exactly "
+                        "one immediate operand (i.e., direct call).");
+             if (HasRegOperand) {
+               // Indirect call.
+               IndirectCallSites.insert(CallSitePc);
+               FuncInfo[CallerPc].IndirectCallSites.push_back(CallSitePc);
+             } else {
+               // Direct call.
+               uint64_t CalleePc;
+               bool Res = MIA->evaluateBranch(Inst, SectionAddr + Index, Size,
+                                              CalleePc);
+               assert(Res && "Failed to evaluate direct call target address.");
+               FuncInfo[CallerPc].DirectCallSites.emplace_back(CallSitePc,
+                                                               CalleePc);
+             }
+           }
+         }

          IP->setCommentStream(CommentStream);
          PIP.printInst(
              *IP, Disassembled ? &Inst : nullptr, Bytes.slice(Index, Size),
              {SectionAddr + Index + VMAAdjustment, Section.getIndex()}, FOS,
              "", *STI, &SP, Obj->getFileName(), &Rels, LVP);
          IP->setCommentStream(llvm::nulls());
[... 597 lines not shown; enclosing context: void objdump::printSymbol(const ObjectFile *O, const SymbolRef &Symbol, ...]
    }
    if (Demangle)
      outs() << ' ' << demangle(std::string(Name)) << '\n';
    else
      outs() << ' ' << Name << '\n';
  }

+ static void printCallGraphInfo(const ObjectFile *Obj) {
+   // Get direct and indirect calls through disassembly.
+   disassembleObject(Obj, /* InlineRelocs = */ false);
+   // Get the .callgraph section.
+   StringRef CallGraphSectionName(".callgraph");
+   Optional<object::SectionRef> CallGraphSection;
+   for (auto Sec : ToolSectionFilter(*Obj)) {
+     StringRef Name;
+     if (Expected<StringRef> NameOrErr = Sec.getName())
+       Name = *NameOrErr;
+     else
+       consumeError(NameOrErr.takeError());
+     if (Name == CallGraphSectionName) {
+       CallGraphSection = Sec;
+       break;
+     }
+   }
+   if (!CallGraphSection)
+     reportWarning("there is no .callgraph section", Obj->getFileName());
+   // Map type id to indirect call sites.
+   DenseMap<uint64_t, SmallVector<uint64_t>> TypeIdToIndirCallSites;
+   // Map type id to indirect targets.
+   DenseMap<uint64_t, SmallVector<uint64_t>> TypeIdToIndirTargets;
+   // Instructions that are not indirect calls but have a type id are ignored.
+   uint64_t IgnoredICallIdCount = 0;
+   // Number of valid indirect calls with type ids.
+   uint64_t ICallWithTypeIdCount = 0;
+   if (CallGraphSection) {
+     StringRef CGSecContents = unwrapOrError(
+         CallGraphSection.getValue().getContents(), Obj->getFileName());
+     // TODO: some entries are written in pointer size. are they always 64-bit?
+     if (CGSecContents.size() % sizeof(uint64_t))
+       reportError(Obj->getFileName(), "Malformed .callgraph section.");
+     size_t Size = CGSecContents.size() / sizeof(uint64_t);
+     auto *It = reinterpret_cast<const uint64_t *>(CGSecContents.data());
+     const auto *const End = It + Size;

Inline suggestion (morehouse, marked Done):

  reportError(Obj->getFileName(), "Malformed .callgraph section.");
  size_t Size = CGSecContents.size() / sizeof(uint64_t);
- auto *It = (const uint64_t *const)CGSecContents.data();
- const auto *End = (const uint64_t *const)CGSecContents.data() + Size;
+ auto *It = reinterpret_cast<const uint64_t *>(CGSecContents.data());
+ const auto *End = reinterpret_cast<const uint64_t *const>(CGSecContents.data() + Size);
  auto CGHasNext = [&]() { return It < End; };

+     auto CGHasNext = [&]() { return It < End; };
+     auto CGNext = [&]() -> uint64_t {
+       if (!CGHasNext())
+         reportError(Obj->getFileName(), "Malformed .callgraph section.");
+       return *It++;
+     };
+     // Parse the content
+     while (CGHasNext()) {
+       // Format version number.
+       uint64_t FormatVersionNumber = CGNext();
+       if (FormatVersionNumber != 0)
+         reportError(Obj->getFileName(),
+                     "Unknown format version in .callgraph section.");
+       // Function entry pc.
+       uint64_t FuncEntryPc = CGNext();
+       if (!FuncInfo.count(FuncEntryPc))
+         reportError(Obj->getFileName(),
+                     "Invalid function entry pc in .callgraph section.");
+       // Function kind.
+       uint64_t Kind = CGNext();
+       switch (Kind) {
+       case 0: // not an indirect target
+         FuncInfo[FuncEntryPc].Kind = NOT_INDIRECT_TARGET;
+         break;
+       case 1: // indirect target with unknown type id
+         FuncInfo[FuncEntryPc].Kind = INDIRECT_TARGET_UNKNOWN_TID;
+         break;
+       case 2: // indirect target with known type id
+         FuncInfo[FuncEntryPc].Kind = INDIRECT_TARGET_KNOWN_TID;
+         TypeIdToIndirTargets[CGNext()].push_back(FuncEntryPc);
+         break;
+       default:
+         reportError(Obj->getFileName(),
+                     "Unknown function kind in .callgraph section.");
+       }
+       // Read call sites.
+       uint64_t CallSiteCount = CGNext();
+       for (unsigned long I = 0; I < CallSiteCount; I++) {
+         uint64_t TypeId = CGNext();
+         uint64_t CallSitePc = CGNext();
+         if (IndirectCallSites.count(CallSitePc)) {
+           TypeIdToIndirCallSites[TypeId].push_back(CallSitePc);
+           ICallWithTypeIdCount++;
+         } else {
+           IgnoredICallIdCount++;
+         }
+       }
+     }
+   }
+   // Print any required warnings regarding the callgraph section.
+   if (IgnoredICallIdCount)
+     reportWarning("callgraph section has type ids for " +
+                       std::to_string(IgnoredICallIdCount) + " instructions " +
+                       "which are not indirect calls",
+                   Obj->getFileName());
+   if (auto ICallWithoutTypeIdCount =
+           IndirectCallSites.size() - ICallWithTypeIdCount)
+     reportWarning("callgraph section does not have type ids for " +
+                       std::to_string(ICallWithoutTypeIdCount) +
+                       " indirect calls",
+                   Obj->getFileName());

Inline suggestion (morehouse, marked Done):

  Obj->getFileName());
- auto ICallWithoutTypeIdCount =
- IndirectCallSites.size() - ICallWithTypeIdCount;
- if (ICallWithoutTypeIdCount)
+ if (uint64_t ICallWithoutTypeIdCount = IndirectCallSites.size() -
+ ICallWithTypeIdCount)
  reportWarning("callgraph section does not have type ids for " +
  std::to_string(ICallWithoutTypeIdCount) +
  " indirect calls",
  Obj->getFileName());
  uint64_t NotListedCount = 0;

+   uint64_t NotListedCount = 0;
+   uint64_t UnknownCount = 0;
+   for (const auto &El : FuncInfo) {
+     NotListedCount += El.second.Kind == NOT_LISTED;
+     UnknownCount += El.second.Kind == INDIRECT_TARGET_UNKNOWN_TID;
+   }
+   if (NotListedCount)
+     reportWarning("callgraph section does not have information for " +
+                       std::to_string(NotListedCount) + " functions",
+                   Obj->getFileName());
+   if (UnknownCount)
+     reportWarning("callgraph section has unknown type id for " +
+                       std::to_string(UnknownCount) + " indirect targets",
+                   Obj->getFileName());
+   // Print indirect targets

morehouse: Nit: the above two loops could be combined.
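A minimal sketch of the single-pass tally this nit asks for. `FunctionInfo`, `KindCounts`, and `countKinds` are simplified stand-ins assumed for this example; only the enumerator names are taken from the diff, and the patch's real types may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Enumerator names as they appear in the diff; the full enum in the patch
// may contain more values (assumption).
enum FunctionKind {
  NOT_LISTED,
  INDIRECT_TARGET_UNKNOWN_TID,
  INDIRECT_TARGET_KNOWN_TID
};

// Hypothetical, reduced version of the per-function record.
struct FunctionInfo {
  FunctionKind Kind = NOT_LISTED;
};

// Hypothetical helper type holding both tallies.
struct KindCounts {
  uint64_t NotListed = 0;
  uint64_t Unknown = 0;
};

// Walks the map once and updates both counters in the same iteration,
// instead of running one loop per counter.
inline KindCounts countKinds(const std::map<uint64_t, FunctionInfo> &FuncInfo) {
  KindCounts Counts;
  for (const auto &El : FuncInfo) {
    Counts.NotListed += El.second.Kind == NOT_LISTED;
    Counts.Unknown += El.second.Kind == INDIRECT_TARGET_UNKNOWN_TID;
  }
  return Counts;
}
```

The two warnings can then be driven from `Counts.NotListed` and `Counts.Unknown` without a second traversal of `FuncInfo`.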

  outs() << "\nINDIRECT TARGET TYPES (TYPEID [FUNC_ADDR,])";

  // Print indirect targets with unknown type.
  // For completeness, functions for which the call graph section does not
  // provide information are included.
  if (NotListedCount || UnknownCount) {
    outs() << "\nUNKNOWN";
    for (const auto &El : FuncInfo) {
      uint64_t FuncEntryPc = El.first;
      FunctionKind FuncKind = El.second.Kind;
      if (FuncKind == NOT_LISTED || FuncKind == INDIRECT_TARGET_UNKNOWN_TID)
        outs() << " " << format("%lx", FuncEntryPc);
    }
  }

  // Print indirect targets to type id mapping.
  for (const auto &El : TypeIdToIndirTargets) {
    uint64_t TypeId = El.first;
    outs() << "\n" << format("%lx", TypeId);
    for (uint64_t IndirTargetPc : El.second)
      outs() << " " << format("%lx", IndirTargetPc);
  }

morehouse: Nit: the above two loops could be combined.

  // Print indirect calls to type id mapping. Any indirect call without a
  // type id can be deduced by comparing this list to indirect call sites
  // list.

  outs() << "\n\nINDIRECT CALL TYPES (TYPEID [CALL_SITE_ADDR,])";
  for (const auto &El : TypeIdToIndirCallSites) {
    uint64_t TypeId = El.first;
    outs() << "\n" << format("%lx", TypeId);
    for (uint64_t IndirCallSitePc : El.second)
      outs() << " " << format("%lx", IndirCallSitePc);
  }

  // Print function entry to indirect call site addresses mapping from disasm.
  outs() << "\n\nINDIRECT CALL SITES (CALLER_ADDR [CALL_SITE_ADDR,])";
  for (const auto &El : FuncInfo) {
    auto CallerPc = El.first;
    auto FuncIndirCallSites = El.second.IndirectCallSites;
    if (!FuncIndirCallSites.empty()) {
      outs() << "\n" << format("%lx", CallerPc);
      for (auto IndirCallSitePc : FuncIndirCallSites)
        outs() << " " << format("%lx", IndirCallSitePc);
    }
  }

  // Print function entry to direct call site and target function entry
  // addresses mapping from disasm.
  outs()
      << "\n\nDIRECT CALL SITES (CALLER_ADDR [(CALL_SITE_ADDR, TARGET_ADDR),])";
  for (const auto &El : FuncInfo) {
    auto CallerPc = El.first;
    auto FuncDirCallSites = El.second.DirectCallSites;
    if (!FuncDirCallSites.empty()) {
      outs() << "\n" << format("%lx", CallerPc);
      for (auto DirCallSite : FuncDirCallSites) {
        auto DirCallSitePc = DirCallSite.first;
        auto CalleePc = DirCallSite.second;
        outs() << " " << format("%lx", DirCallSitePc) << " "
               << format("%lx", CalleePc);
      }
    }
  }

  // Print function entry pc to function name mapping.
  outs() << "\n\nFUNCTIONS (FUNC_ENTRY_ADDR, SYM_NAME)";
  for (const auto &El : FuncInfo) {
    uint64_t EntryPc = El.first;
    const auto &Name = El.second.Name;
    outs() << "\n" << format("%lx", EntryPc) << " " << Name;
  }
  outs() << "\n";
}
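
The call-site reading step in the function above (a count followed by type id / call site address pairs) can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch's code: the section contents are modeled as a plain vector of 64-bit words, `parseCallSites` and `CallSiteParseResult` are hypothetical names, and the truncation check is an addition that the diff does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical result record; the member names mirror variables in the diff.
struct CallSiteParseResult {
  std::map<uint64_t, std::vector<uint64_t>> TypeIdToIndirCallSites;
  uint64_t ICallWithTypeIdCount = 0;
  uint64_t IgnoredICallIdCount = 0;
  bool Truncated = false; // set when the data ends before all records are read
};

// Reads `count, (type id, call site pc) * count` from Words starting at Pos.
// A record is kept only if its call site address is a known indirect call
// site (found by disassembly in the patch); otherwise it is counted as ignored.
inline CallSiteParseResult
parseCallSites(const std::vector<uint64_t> &Words, std::size_t Pos,
               const std::set<uint64_t> &IndirectCallSites) {
  CallSiteParseResult R;
  if (Pos >= Words.size()) {
    R.Truncated = true;
    return R;
  }
  uint64_t CallSiteCount = Words[Pos++];
  for (uint64_t I = 0; I < CallSiteCount; ++I) {
    if (Words.size() - Pos < 2) { // fewer than two words left for this record
      R.Truncated = true;
      break;
    }
    uint64_t TypeId = Words[Pos++];
    uint64_t CallSitePc = Words[Pos++];
    if (IndirectCallSites.count(CallSitePc)) {
      R.TypeIdToIndirCallSites[TypeId].push_back(CallSitePc);
      ++R.ICallWithTypeIdCount;
    } else {
      ++R.IgnoredICallIdCount;
    }
  }
  return R;
}
```

The loop index here is `uint64_t` so that it has the same width as `CallSiteCount` on every platform; the diff uses `unsigned long`, which is narrower than 64 bits on some targets.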

static void printUnwindInfo(const ObjectFile *O) {
  outs() << "Unwind info:\n\n";

  if (const COFFObjectFile *Coff = dyn_cast<COFFObjectFile>(O))
    printCOFFUnwindInfo(Coff);
  else if (const MachOObjectFile *MachO = dyn_cast<MachOObjectFile>(O))
    printMachOUnwindInfo(MachO);
  else

[263 lines not shown; context: static void dumpObject(ObjectFile *O, const Archive *A = nullptr,]

  if (WeakBind)
    printWeakBindTable(O);

  // Other special sections:
  if (RawClangAST)
    printRawClangAST(O);
  if (FaultMapSection)
    printFaultMaps(O);
  if (CallGraphInfo)
    printCallGraphInfo(O);
}

static void dumpObject(const COFFImportFile *I, const Archive *A,
                       const Archive::Child *C = nullptr) {
  StringRef ArchiveName = A ? A->getFileName() : "";

  // Avoid other output when using a raw option.
  if (!RawClangAST)

[129 lines not shown; context: static void parseOtoolOptions(const llvm::opt::InputArgList &InputArgs) {]

}

static void parseObjdumpOptions(const llvm::opt::InputArgList &InputArgs) {
  parseIntArg(InputArgs, OBJDUMP_adjust_vma_EQ, AdjustVMA);
  AllHeaders = InputArgs.hasArg(OBJDUMP_all_headers);
  ArchName = InputArgs.getLastArgValue(OBJDUMP_arch_name_EQ).str();
  ArchiveHeaders = InputArgs.hasArg(OBJDUMP_archive_headers);
  CallGraphInfo = InputArgs.hasArg(OBJDUMP_call_graph_info);
  Demangle = InputArgs.hasArg(OBJDUMP_demangle);
  Disassemble = InputArgs.hasArg(OBJDUMP_disassemble);
  DisassembleAll = InputArgs.hasArg(OBJDUMP_disassemble_all);
  SymbolDescription = InputArgs.hasArg(OBJDUMP_symbol_description);
  DisassembleSymbols =
      commaSeparatedValues(InputArgs, OBJDUMP_disassemble_symbols_EQ);
  DisassembleZeroes = InputArgs.hasArg(OBJDUMP_disassemble_zeroes);
  if (const opt::Arg *A = InputArgs.getLastArg(OBJDUMP_dwarf_EQ)) {

[71 lines not shown; context: for (StringRef V : Values) {]

      DisassemblerOptions.push_back(V.str());
    }
  if (AsmSyntax) {
    const char *Argv[] = {"llvm-objdump", AsmSyntax};
    llvm::cl::ParseCommandLineOptions(2, Argv);
  }

  QuietDisasm = CallGraphInfo;

  // objdump defaults to a.out if no filenames specified.
  if (InputFilenames.empty())
    InputFilenames.push_back("a.out");
}

int main(int argc, char **argv) {
  using namespace llvm;
  InitLLVM X(argc, argv);

[75 lines not shown; context: int main(int argc, char **argv) {]

  if (DisassembleAll || PrintSource || PrintLines ||
      !DisassembleSymbols.empty())
    Disassemble = true;

  if (!ArchiveHeaders && !Disassemble && DwarfDumpType == DIDT_Null &&
      !DynamicRelocations && !FileHeaders && !PrivateHeaders && !RawClangAST &&
      !Relocations && !SectionHeaders && !SectionContents && !SymbolTable &&
      !DynamicSymbolTable && !UnwindInfo && !FaultMapSection &&
      !CallGraphInfo &&
      !(MachOOpt &&
        (Bind || DataInCode || DylibId || DylibsUsed || ExportsTrie ||
         FirstPrivateHeader || FunctionStarts || IndirectSymbols || InfoPlist ||
         LazyBind || LinkOptHints || ObjcMetaData || Rebase || Rpaths ||
         UniversalHeaders || WeakBind || !FilterSections.empty()))) {
    T->printHelp(ToolName);
    return 2;
  }

[9 lines not shown]

Diff 360624
