This is an archive of the discontinued LLVM Phabricator instance.

Differential D125008

[llvm-objdump] Print Mnemonic Histogram
Abandoned, Public

Authored by SjoerdMeijer on May 5 2022, 5:47 AM.

Details

Reviewers

jhenderson
MaskRay
ostannard
keith
DavidSpickett

Summary

This adds a new option, --mnemonic-hist, to print a histogram of all (static) instructions/mnemonics. For example:

$ llvm-objdump --triple=thumbv7 -d --mnemonic-hist
..
Instruction histogram:
         ldr:   120 (27.3973%)
         mov:    96 (21.9178%)
         blx:    56 (12.7854%)
         bl:    31 (7.07763%)
         str:    26 (5.93607%)
         add:    18 (4.10959%)
         b:    12 (2.73973%)
         sub:    10 (2.28311%)
         cmp:     9 (2.05479%)
         ..

I am probably interested in printing more information, like the encoding width, but this seems a good first step.

Diff Detail

Unit Tests: Failed

Time	Test
800 ms	x64 debian > HWAddressSanitizer-x86_64.TestCases::hwasan_symbolize.cpp
30 ms	x64 debian > LLVM.tools/llvm-objdump/ARM::mnemonic-hist0.test
30 ms	x64 debian > LLVM.tools/llvm-objdump/ARM::mnemonic-hist1.test

Event Timeline

SjoerdMeijer created this revision. May 5 2022, 5:47 AM

Herald added a project: Restricted Project. May 5 2022, 5:47 AM

Herald added subscribers: StephenFan, rupprecht, mgrang.

SjoerdMeijer requested review of this revision. May 5 2022, 5:47 AM

Herald added a project: Restricted Project. May 5 2022, 5:47 AM

SjoerdMeijer edited the summary of this revision. May 5 2022, 5:48 AM

Harbormaster completed remote builds in B162895: Diff 427288. May 5 2022, 6:53 AM

I don't know what bar there is to add stuff like this but I assume it's quite low given that it's self contained and not trying to match any equivalent GNU options. I can see the use for comparing files certainly.

llvm/test/tools/llvm-objdump/ARM/mnemonic-hist1.test
18	If you're going to pad the names to make them all line up, then you should test something that isn't 3 characters. Also, though this makes the test bigger, check that the number padding works. Maybe just make one appear 10 times, unless you can figure out a macro to emit 100 of something. Also, do you expect an ordering between the 2 instructions that both have a count of 1? As in, should it be alphabetical between those 2, or is it going to depend on order of appearance in the object file? Edit: Putting them in a std::map sorts them, so you would always see mul then sub even if sub appears first in the object, correct? Can you comment in the test file what things it's looking for?
llvm/tools/llvm-objdump/llvm-objdump.cpp
1493	std::ignore here? Or std::get<0>(IP->getMnemonic(&Inst)) directly.
2190	Probably makes no difference but why not `const std::pair<std::string, unsigned> &`.
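
The two alternatives suggested for line 1493 can be compared in isolation. This is only a minimal sketch: `getMnemonicStub` is a hypothetical stand-in for `IP->getMnemonic(&Inst)`, and its return type (`std::pair<const char *, std::uint64_t>`) is an assumption that should be checked against the LLVM headers.

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for IP->getMnemonic(&Inst): returns a mnemonic plus a
// second value that the histogram code does not use.
std::pair<const char *, std::uint64_t> getMnemonicStub() {
  return {"ldr", 42};
}

// Option 1: bind only the first element and discard the second.
std::string mnemonicViaIgnore() {
  const char *Mnemonic = nullptr;
  std::tie(Mnemonic, std::ignore) = getMnemonicStub();
  return Mnemonic;
}

// Option 2: read the first element directly, with no temporaries to declare.
std::string mnemonicViaGet() {
  return std::get<0>(getMnemonicStub());
}
```

Both helpers return the same string; the second form avoids declaring a variable for the value that is never read.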

tschuett added a subscriber: tschuett. May 5 2022, 9:00 AM

tschuett added inline comments.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2190	Maybe std::stable_sort to make it deterministic?
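
The ordering question raised in the inline comments can be sketched as follows. This is an illustration under assumptions, not the patch's code: `Entry` and `sortedByCount` are hypothetical names, and the comparator takes `const` references as suggested for line 2190.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, unsigned>;

// Copies a mnemonic -> count map into a vector ordered by descending count.
// std::map iterates in key order, and std::stable_sort preserves that order
// for equal counts, so ties come out alphabetically on every run.
std::vector<Entry> sortedByCount(const std::map<std::string, unsigned> &Hist) {
  std::vector<Entry> List(Hist.begin(), Hist.end());
  std::stable_sort(List.begin(), List.end(),
                   [](const Entry &L, const Entry &R) {
                     return L.second > R.second;
                   });
  return List;
}
```

With plain `std::sort`, the relative order of entries with equal counts is unspecified, which is what would make a FileCheck test on tied mnemonics fragile.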

I don't know what bar there is to add stuff like this but I assume it's quite low given that it's self contained and not trying to match any equivalent GNU options. I can see the use for comparing files certainly.

I think there are two bars:

Whether we can add an option that GNU objdump doesn't have: in my understanding this bar is low, but we really want to avoid a potential conflict with GNU. This means we usually cannot add a short option alias, as GNU objdump may claim a short option for anything in the future (e.g. the llvm-nm -U conflict I happened to observe today). A descriptive long option usually raises no semantic conflict concern.
Whether we should add an option: I think this bar is actually quite high. An option needs to be sufficiently useful to justify its addition.

I am probably interested in printing more information, like the encoding width, but this seems a good first step.

If we add an option, we want to provide some stability guarantee. Removing an option has an extremely high bar, as that may break some users.
So when adding an option we need to be very careful. If there is a plan to extend the option, it's best to design it well upfront.
I know that an iterative process is sometimes unavoidable due to engineering reality; in that case perhaps we can call the option experimental.
A new option needs a documentation update in llvm/docs/CommandGuide/llvm-objdump.rst.

That said, for this case I wonder why we can't just use

llvm-objdump -d --no-show-raw-insn --no-leading-addr =cat | awk '/^ / {freq[$1]++; tot++} END{ for(i in freq) print i "\t" freq[i] "\t" (freq[i]/tot) }'

My reasoning is that different users may have vastly different statistics needs.
If it is difficult to come up with something that meets many users' needs, I'd rather have users compose utilities, but I think I can be persuaded the other way if you can come up with something useful that benefits a lot from built-in support.
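
For readers less familiar with awk, the counting logic of the one-liner above can be mirrored in C++. This is a rough sketch only: `MnemonicCounts` and `countMnemonics` are hypothetical names, and the input is assumed to be disassembly text in which instruction lines start with a space.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

struct MnemonicCounts {
  std::map<std::string, unsigned> Freq; // mnemonic -> occurrences
  unsigned Total = 0;                   // number of counted lines
};

// Mirrors the awk program: for every line that starts with a space, count its
// first whitespace-separated field.
MnemonicCounts countMnemonics(std::istream &In) {
  MnemonicCounts Result;
  std::string Line;
  while (std::getline(In, Line)) {
    if (Line.empty() || Line[0] != ' ')
      continue;
    std::istringstream Fields(Line);
    std::string First;
    Fields >> First; // stays empty if the line is all whitespace
    ++Result.Freq[First];
    ++Result.Total;
  }
  return Result;
}
```

Dividing each `Freq` entry by `Total` gives the same ratio that the awk `END` block prints.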

I happen to be more concerned with the disassembly code because it is (as you may have noticed) quite difficult to maintain: you can find numerous toggles adapting its behavior. `if (Disassembled && MnemonicHistogram) {` will not be the thing that brings everything crashing down, but I'd hope that any addition in this area gets more attention.

Thanks all for looking at this!

I understand the concerns about adding/removing options. What I will do is leave a message on discourse to possibly draw some attention to this and check if anyone else has objections/preferences.

In the meantime I will work on a new revision of this which will distinguish between narrow and wide instruction encodings, and see how that fits in. That will serve two purposes. First, it's what I really need for my use case, although the option is already useful in its current form. Second, although not impossible, that would be more difficult to achieve with a one-liner on the command line. It would require a script, which could be the alternative to this option.

Personally, I prefer the built-in option because of its convenience. I could never remember that one-liner command line, would need to save it to a file, which I then don't have available on the different systems I work on etc. A python script contributed to llvm/utils for example would have this problem a lot less, but still wouldn't be as convenient as a built-in option.

https://discourse.llvm.org/t/llvm-objdump-print-instruction-histogram/62333

I haven't addressed the feedback yet, but have just added a few things to help with the discussion of where this belongs.
This now also prints the number of bytes per instruction, so for Thumb it now distinguishes between the narrow/wide encodings, and I have also added a histogram of immediates. The output now looks like this:

  Mnemonic histogram:
  Mnemonic Freq             Bytes
  ldr (T1): 112 ( 25.5708%)   224 ( 22.0472%)
  mov (T1):  95 ( 21.6895%)   190 ( 18.7008%)
  blx (T1):  56 ( 12.7854%)   112 ( 11.0236%)
   bl (T2):  31 ( 7.07763%)   124 ( 12.2047%)
  str (T1):  23 ( 5.25114%)    46 ( 4.52756%)
    b (T1):  11 ( 2.51142%)    22 ( 2.16535%)
  add (T1):  11 ( 2.51142%)    22 ( 2.16535%)
  sub (T1):  10 ( 2.28311%)    20 (  1.9685%)
  cmp (T1):   8 ( 1.82648%)    16 (  1.5748%)
  ldr (T2):   8 ( 1.82648%)    32 ( 3.14961%)
  add (T2):   7 ( 1.59817%)    28 ( 2.75591%)
  asr (T1):   6 ( 1.36986%)    12 (  1.1811%)
 ldrb (T1):   5 ( 1.14155%)    10 (0.984252%)
 ..
Immediate histogram:
      14:       425
       0:        40
      -4:        32
       3:        26
       1:        21
      10:        15
       2:        14
       8:        11
       5:         9
      16:         8
      18:         6

Harbormaster completed remote builds in B163133: Diff 427616. May 6 2022, 7:28 AM

SjoerdMeijer abandoned this revision. Mar 17 2023, 1:39 AM

Revision Contents

Path	Size
llvm/test/tools/llvm-objdump/ARM/mnemonic-hist0.test	10 lines
llvm/test/tools/llvm-objdump/ARM/mnemonic-hist1.test	17 lines
llvm/tools/llvm-objdump/ObjdumpOpts.td	3 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp	109 lines

Diff 427616

llvm/test/tools/llvm-objdump/ARM/mnemonic-hist0.test

This file was added.

#RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o - | \
#RUN: llvm-objdump --triple=thumbv7 -d --histogram - | FileCheck %s

.text
.thumb
foo:

#CHECK: Mnemonic histogram:{{[[:space:]]}}
#CHECK: {{[[:space:]]}}
#CHECK-NOT: {{.*}}:

llvm/test/tools/llvm-objdump/ARM/mnemonic-hist1.test

This file was added.

#RUN: llvm-mc -filetype=obj -triple=thumbv7 %s -o - | \
#RUN: llvm-objdump --triple=thumbv7 -d --histogram - | FileCheck %s

.syntax unified
.text
.thumb

foo:
add r0,r1,r2
sub r2,r1,r3
add r4,r0,r2
mul r5,r4,r1

#CHECK:Mnemonic histogram:
#CHECK: add: 2 (50%)
#CHECK: mul: 1 (25%)
#CHECK: sub: 1 (25%)

llvm/tools/llvm-objdump/ObjdumpOpts.td

[168 lines not shown]

 def syms : Flag<["--"], "syms">,
   HelpText<"Display the symbol table">;
 def : Flag<["-"], "t">, Alias<syms>, HelpText<"Alias for --syms">;

 def symbolize_operands : Flag<["--"], "symbolize-operands">,
   HelpText<"Symbolize instruction operands when disassembling">;

+def histogram: Flag<["--"], "histogram">,
+  HelpText<"Display histograms of mnemonics and immediates">;

 def dynamic_syms : Flag<["--"], "dynamic-syms">,
   HelpText<"Display the contents of the dynamic symbol table">;
 def : Flag<["-"], "T">, Alias<dynamic_syms>,
   HelpText<"Alias for --dynamic-syms">;

 def triple_EQ : Joined<["--"], "triple=">,
   HelpText<"Target triple to disassemble for, "
            "see --version for available targets">;
[159 lines not shown]

llvm/tools/llvm-objdump/llvm-objdump.cpp

[207 lines not shown]

 static uint64_t StartAddress;
 static bool HasStartAddressFlag;
 static uint64_t StopAddress = UINT64_MAX;
 static bool HasStopAddressFlag;

 bool objdump::SymbolTable;
 static bool SymbolizeOperands;
+static bool Histogram;
+std::map<std::string, unsigned> HistInst;
+std::map<std::string, unsigned> HistSize;
+std::map<uint64_t, unsigned> HistImm;

 static bool DynamicSymbolTable;
 std::string objdump::TripleName;
 bool objdump::UnwindInfo;
 static bool Wide;
 std::string objdump::Prefix;
 uint32_t objdump::PrefixStrip;

 DebugVarsFormat objdump::DbgVariables = DVDisabled;
[1,253 lines not shown; context: for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) {]

 IP->setCommentStream(CommentStream);

 PIP.printInst(
     *IP, Disassembled ? &Inst : nullptr, Bytes.slice(Index, Size),
     {SectionAddr + Index + VMAAdjustment, Section.getIndex()}, FOS,
     "", *STI, &SP, Obj->getFileName(), &Rels, LVP);

+if (Disassembled && Histogram) {
+  const char *Mnemonic;
+  int Bytes;
+  std::tie(Mnemonic, Bytes) = IP->getMnemonic(&Inst);
+
+  std::string ins = Mnemonic;
+  if (PrimaryIsThumb) {
+    if (Size == 2)
+      ins += " (T1)";
+    else
+      ins += " (T2)";
+  }
+
+  HistInst[ins]++;
+  HistSize[ins] += Size;
+
+  for (unsigned i = 0; i < Inst.getNumOperands(); i++) {
+    MCOperand Op = Inst.getOperand(i);
+    if (!Op.isImm())
+      continue;
+    HistImm[Op.getImm()]++;
+  }
+}

 IP->setCommentStream(llvm::nulls());

 // If disassembly has failed, avoid analysing invalid/incomplete
 // instruction information. Otherwise, try to resolve the target
 // address (jump target or memory operand address) and print it on the
 // right of the instruction.
 if (Disassembled && MIA) {
   // Branch targets are printed just after the instructions.
[655 lines not shown; context: if (Demangle)]
     SymName = demangle(SymName);

   if (O->isXCOFF() && SymbolDescription)
     SymName = getXCOFFSymbolDescription(createSymbolInfo(O, Symbol), SymName);

   outs() << ' ' << SymName << '\n';
 }

+static void printHistogram(const ObjectFile *O) {
+  std::vector<std::pair<std::string, unsigned>> List;
+  unsigned TotalBytes = 0;
+  for (auto &I : HistSize)
+    TotalBytes += I.second;
+  for (auto &I : HistInst)
+    List.push_back(I);
+
+  std::sort(List.begin(), List.end(),
+            [] (std::pair<std::string, unsigned> &L,
+                std::pair<std::string, unsigned> &R) {
+              return L.second > R.second; });
+
+  unsigned Total = 0;
+  unsigned MaxStrLen = 0;
+  for (auto &I : List) {
+    Total += I.second;
+    MaxStrLen = I.first.length() > MaxStrLen ? I.first.length() : MaxStrLen;
+  }
+
+  unsigned MaxDigits = List.empty() ? 0 :
+      unsigned(std::log10(List.front().second) + 1);
+
+  // Add 1 for a whitespace.
+  MaxStrLen++;
+  MaxDigits++;
+
+  outs() << "\nMnemonic histogram:\n\n";
+
+  // Print the columns, the format, the width should match that of the enties.
+  const char *MCol = "Mnemonic";
+  const char *FCol = "Freq";
+  const char *BCol = "Bytes";
+
+  outs() << format("%*s", MaxStrLen, MCol);
+  outs() << " "; // ":"
+  outs() << format("%*s", MaxDigits, FCol);
+  outs() << " "; // " ("
+  outs() << " ";
+  outs() << " "; // "%)"
+  outs() << format("%*s", MaxDigits + 2, BCol);
+  outs() << "\n";
+
+  // Now print the entries.
+  for (auto &I : List) {
+    outs() << format("%*s", MaxStrLen, I.first.c_str());
+    outs() << ":";
+    outs() << format("%*d", MaxDigits, I.second);
+    outs() << " (";
+    outs() << format("%8g", ((float)I.second / (float)Total) * 100);
+    outs() << "%)";
+    outs() << format("%*d", MaxDigits + 2, HistSize[I.first]);
+    outs() << " (";
+    outs() << format("%8g", ((float) HistSize[I.first] / (float) TotalBytes) * 100);
+    outs() << "%)";
+    outs() << "\n";
+  }
+
+  outs() << "\nImmediate histogram:\n\n";
+
+  std::vector<std::pair<std::uint64_t, unsigned>> ImmList;
+  for (auto &I : HistImm)
+    ImmList.push_back(I);
+
+  std::sort(ImmList.begin(), ImmList.end(),
+            [] (std::pair<std::uint64_t, unsigned> &L,
+                std::pair<std::uint64_t, unsigned> &R) {
+              return L.second > R.second; });
+
+  for (auto &I : ImmList) {
+    outs() << format("%10d", I.first);
+    outs() << ":";
+    outs() << format("%10d", I.second);
+    outs() << "\n";
+  }
+}

 static void printUnwindInfo(const ObjectFile *O) {
   outs() << "Unwind info:\n\n";

   if (const COFFObjectFile *Coff = dyn_cast<COFFObjectFile>(O))
     printCOFFUnwindInfo(Coff);
   else if (const MachOObjectFile *MachO = dyn_cast<MachOObjectFile>(O))
     printMachOUnwindInfo(MachO);
   else
[243 lines not shown; context: static void dumpObject(ObjectFile *O, const Archive *A = nullptr,]
   if (Relocations && !Disassemble)
     printRelocations(O);
   if (DynamicRelocations)
     printDynamicRelocations(O);
   if (SectionContents)
     printSectionContents(O);
   if (Disassemble)
     disassembleObject(O, Relocations);
+  if (Histogram)
+    printHistogram(O);
   if (UnwindInfo)
     printUnwindInfo(O);

   // Mach-O specific options:
   if (ExportsTrie)
     printExportsTrie(O);
   if (Rebase)
     printRebaseTable(O);
[194 lines not shown; context: static void parseObjdumpOptions(const llvm::opt::InputArgList &InputArgs) {]
   ShowLMA = InputArgs.hasArg(OBJDUMP_show_lma);
   PrintSource = InputArgs.hasArg(OBJDUMP_source);
   parseIntArg(InputArgs, OBJDUMP_start_address_EQ, StartAddress);
   HasStartAddressFlag = InputArgs.hasArg(OBJDUMP_start_address_EQ);
   parseIntArg(InputArgs, OBJDUMP_stop_address_EQ, StopAddress);
   HasStopAddressFlag = InputArgs.hasArg(OBJDUMP_stop_address_EQ);
   SymbolTable = InputArgs.hasArg(OBJDUMP_syms);
   SymbolizeOperands = InputArgs.hasArg(OBJDUMP_symbolize_operands);
+  Histogram = InputArgs.hasArg(OBJDUMP_histogram);
   DynamicSymbolTable = InputArgs.hasArg(OBJDUMP_dynamic_syms);
   TripleName = InputArgs.getLastArgValue(OBJDUMP_triple_EQ).str();
   UnwindInfo = InputArgs.hasArg(OBJDUMP_unwind_info);
   Wide = InputArgs.hasArg(OBJDUMP_wide);
   Prefix = InputArgs.getLastArgValue(OBJDUMP_prefix).str();
   parseIntArg(InputArgs, OBJDUMP_prefix_strip, PrefixStrip);
   if (const opt::Arg *A = InputArgs.getLastArg(OBJDUMP_debug_vars_EQ)) {
     DbgVariables = StringSwitch<DebugVarsFormat>(A->getValue())
[153 lines not shown]