This is an archive of the discontinued LLVM Phabricator instance.

Differential D130357

[MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.
Closed, Public

Authored by simon_tatham on Jul 22 2022, 7:17 AM.

Details

Reviewers

MaskRay
ostannard
DavidSpickett
jhenderson

Commits

rG55f1fbf005fe: [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.

Summary

Currently, when llvm-objdump is disassembling a code section and
encounters a point where no instruction can be decoded, it uses the
same policy on all targets: consume one byte of the section, emit it
as "<unknown>", and try disassembling from the next byte position.

On an architecture where instructions are always 4 bytes long and
4-byte aligned, this makes no sense at all. If a 4-byte word cannot be
decoded as an instruction, then the next place that a valid
instruction could possibly be found is 4 bytes further on.
Disassembling from a misaligned address can't possibly produce
anything that the code generator intended, or that the CPU would even
attempt to execute.

This patch introduces a new MCDisassembler virtual method called
suggestBytesToSkip, which allows each target to choose its own
resynchronization policy. For Arm (as opposed to Thumb) and AArch64,
I've filled in the new method to return a fixed width of 4.
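To make the resync loop concrete, here is a toy model in C++ with no LLVM dependency. `ToyDisassembler`, `ToyFixedWidthDisassembler` and `disassemble` are hypothetical stand-ins, not the real `MCDisassembler` API, and the toy decoder simply accepts any four bytes except `ff ff ff ff`. The input bytes are the Arm part of the new test (a `mov`, an unknown word, an `add`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a target disassembler (not LLVM's MCDisassembler).
class ToyDisassembler {
public:
  virtual ~ToyDisassembler() = default;

  // Toy decoder: any 4 available bytes decode as an instruction except
  // ff ff ff ff. Like a real decoder, it cannot tell whether the address it
  // was handed is aligned. Sets Size to 0 on failure.
  bool getInstruction(uint64_t &Size, const uint8_t *Bytes, size_t Len) const {
    bool OK = Len >= 4 && !(Bytes[0] == 0xff && Bytes[1] == 0xff &&
                            Bytes[2] == 0xff && Bytes[3] == 0xff);
    Size = OK ? 4 : 0;
    return OK;
  }

  // Default resync policy: skip a single byte (the old behavior).
  virtual uint64_t suggestBytesToSkip(const uint8_t *, size_t, uint64_t) const {
    return 1;
  }
};

// Fixed-width policy, as the patch chooses for AArch64 and Arm state.
class ToyFixedWidthDisassembler : public ToyDisassembler {
public:
  uint64_t suggestBytesToSkip(const uint8_t *, size_t, uint64_t) const override {
    return 4;
  }
};

// Returns every (offset, decoded?) pair at which decoding was attempted.
std::vector<std::pair<uint64_t, bool>>
disassemble(const ToyDisassembler &Dis, const std::vector<uint8_t> &Bytes) {
  std::vector<std::pair<uint64_t, bool>> Visited;
  uint64_t Index = 0;
  while (Index < Bytes.size()) {
    uint64_t Size = 0;
    bool OK = Dis.getInstruction(Size, Bytes.data() + Index, Bytes.size() - Index);
    Visited.emplace_back(Index, OK);
    if (Size == 0) // the call-site idiom that used to read "Size = 1"
      Size = Dis.suggestBytesToSkip(Bytes.data() + Index, Bytes.size() - Index,
                                    Index);
    assert(Size > 0 && "a skip of 0 would loop forever");
    Index += Size;
  }
  return Visited;
}
```

With the old 1-byte skip, the toy loop "decodes" a bogus instruction at offset 5 that straddles the unknown word and the `add`, and the `add` at offset 8 is never seen; with a 4-byte skip, decoding resumes at offset 8.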

Thumb is a more interesting case, because the criterion for
identifying 2-byte and 4-byte instruction encodings is very simple,
and doesn't require the particular instruction to be recognized. So
suggestBytesToSkip is also passed an ArrayRef of the bytes in
question, so that it can take that into account. The new test case
shows Thumb disassembly skipping over two unrecognized instructions,
and identifying one as 2-byte and one as 4-byte.
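A standalone sketch of the Thumb width check, written against a raw pointer and length instead of `ArrayRef` so it builds without LLVM. The name `thumbBytesToSkip` is hypothetical; the logic mirrors the Thumb branch of `ARMDisassembler::suggestBytesToSkip` in the diff, including the fallback to 2 when fewer than two bytes are available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read the next halfword little-endian; a value below 0xE800 is a complete
// 16-bit Thumb instruction, anything else is the first half of a 32-bit one.
// With fewer than 2 bytes there is nothing to inspect, so skip the minimum.
uint64_t thumbBytesToSkip(const uint8_t *Bytes, size_t Len) {
  assert((Len == 0 || Bytes != nullptr) && "non-empty buffer needs data");
  if (Len < 2)
    return 2;
  uint16_t Insn16 = static_cast<uint16_t>((Bytes[1] << 8) | Bytes[0]);
  return Insn16 < 0xE800 ? 2 : 4;
}
```

Applied to the unknown Thumb encodings in the new test, `0e b8` (halfword 0xB80E) yields 2, so decoding resumes at 0x10, and `ee ff cc dd` (first halfword 0xFFEE) yields 4, so decoding resumes at 0x1a, matching the CHECK lines.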

For targets other than Arm and AArch64, this is NFC: the base class
implementation of suggestBytesToSkip still returns 1, so that the
existing behavior is unchanged. Other targets can fill in their own
implementations as they see fit; I haven't attempted to choose a new
behavior for each one myself.

I've updated all the call sites of MCDisassembler::getInstruction in
llvm-objdump, and also one in sancov, which was the only other place I
spotted the same idiom of if (Size == 0) Size = 1 after a call to
getInstruction.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

simon_tatham created this revision. Jul 22 2022, 7:17 AM

Herald added a reviewer: jhenderson. Jul 22 2022, 7:17 AM

Herald added a project: Restricted Project.

Herald added subscribers: StephenFan, rupprecht, hiraditya and 2 others.

simon_tatham requested review of this revision. Jul 22 2022, 7:17 AM

Herald added a project: Restricted Project. Jul 22 2022, 7:17 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B177001: Diff 446822. Jul 22 2022, 8:01 AM

DavidSpickett added inline comments. Jul 25 2022, 3:27 AM

llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
181	"memory space of the region" perhaps?
182	What is "the symbol" here? Maybe from context it will be a symbol but it seems to me like you'd be in the middle of some function maybe. So symbol is "main" but I'm N bytes away from where it starts by this point. Do I have the wrong end of the stick?
183	I'd explain why the parameter exists. You'll probably end up repeating the thumb justification but hey why not. Also is the size of this known to be in some range or does the target also decide that. E.g. if thumb needed 4 bytes of context to decide, the Arm backend would pass this 4 bytes. Some other target could choose less or more or none.
186	`const ArrayRef<uint8_t>`?
llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
353	We're assuming that Address is already aligned, I assume that's safe given that we're either:
- disassembling from the start of a fn, which would be aligned
- have just disassembled an instruction, which would have been aligned, and 4 bytes in size.
Correct? I guess someone could give MC a misaligned function start but garbage in garbage out in that case? Or would you want to align up to the nearest 4 bytes.
llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
758	Is the in memory layout of thumb instructions the same between endians?
771	I might have written this early return style, but that's my lldb bias talking:
    if (!is_thumb) return 4;
    if (bytes.size < 2) return 2;
    Insn16 ....
    return Insn16 < 0xE800 ? 2 : 4;
The logic is clear either way but if you want to have one less indent.
llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test
2	I think this one is the thumb side of the testing, can you add that in a comment in the file if so. Judging by the skip 4 then skip 2 later. (if this is the thumb test do you need another one for Arm only?)
13	Is it worth adding a single byte on the end to cover the thumb path if bytes < 2? Proves you don't out of bounds the bytes you're given, at least in an asserts build (I hope).

Addressed review comments (I hope).

simon_tatham added inline comments. Jul 25 2022, 6:59 AM

llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
181	Oh yes, I copy-pasted the parameter comments from one of the existing API functions just above it without noticing that they didn't all say quite the same things or make sense in all cases.
182	Another copy-paste goof.
183	Where I've called it, I just passed all the data available. If that's not enough to make the best choice, then the caller can't magic more data out of the air anyway, so the callee will just have to do the best it can (which in Thumb I decided was advancing to the next multiple of 2 bytes – still better than the previous 1!)
186	No, because firstly, that would only make the tiny `ArrayRef` structure itself const and not the pointed-to data, and secondly, `ArrayRef` is implicitly a pointer-to-const anyway. (This too is copied from the previous API functions, so if it had been a bug, they'd have this bug as well.)
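A small illustration of that point, using a simplified, hypothetical `ByteRef` struct in place of `llvm::ArrayRef<uint8_t>` so the sketch needs no LLVM headers: the element pointer is already pointer-to-const, so adding `const` to a by-value parameter only stops the callee from reassigning its own copy of the view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical, simplified stand-in for llvm::ArrayRef<uint8_t>: a
// non-owning (pointer, length) view whose element pointer is already const.
struct ByteRef {
  const uint8_t *Data;
  size_t Length;

  // Returns a narrower view, in the spirit of ArrayRef's drop_front.
  ByteRef dropFront(size_t N) const {
    assert(N <= Length && "dropping past the end");
    return ByteRef{Data + N, Length - N};
  }
};

// Reading through the view yields const uint8_t&, so the bytes can never be
// written through it, whether or not the ByteRef object itself is const.
static_assert(std::is_same<decltype(*ByteRef().Data), const uint8_t &>::value,
              "elements are read-only through the view");

// Declaring Bytes as const would only forbid reassigning the view itself.
size_t countAfterSkipping(ByteRef Bytes, size_t N) {
  Bytes = Bytes.dropFront(N); // allowed here; a compile error if Bytes were const
  return Bytes.Length;
}
```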
llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
353	Mmmm. I did wonder about specifying `suggestBytesToSkip` so that you could give it a totally arbitrary alignment and it would do something sensible. But I didn't like the idea that every single implementation (other than the default 'skip 1 byte because anything's a valid start position') would end up having to contain very similar boilerplate. That struck me as a sign of having put the API boundary in the wrong place. In this patch there are only two implementations of `suggestBytesToSkip` (or two and a half if you count the Arm/Thumb branches of the AArch32 one). But I expect that most non-x86 targets will end up wanting to do something sensible in here – surely a lot of targets are fixed-alignment RISC style, and even m68k has an alignment constraint if I remember my Amiga-owning days correctly. So there'd end up being a lot of copies of that code!
llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
758	In the modern (BE8-style) Arm architecture, yes: instructions are stored little-endian regardless of the data endianness. In A32 that means little-endian 32-bit words, and in T32 it means little-endian 16-bit halfwords. In older versions of the architecture that was different, if I remember rightly. But the rest of LLVM doesn't support those versions either, because this code is taken directly from the code in the same file that extracts the halfword for actual disassembly.
llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test
2	It's both – the first three instructions are Arm, including an unrecognised Arm instruction, and the next six are Thumb, including a 16- and a 32-bit unrecognised Thumb instruction.

simon_tatham mentioned this in D130358: [llvm-objdump,ARM] Add PrettyPrinters for Arm and AArch64. Jul 25 2022, 6:59 AM

simon_tatham added inline comments. Jul 25 2022, 7:29 AM

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
353	Thinking about this a bit more ... another problem with trying to use `suggestBytesToSkip` to handle existing alignment problems is that it's called in the wrong place in the disassembly loop. Currently, the expected usage is that you have some starting address, and you call `getInstruction` to see if you can disassemble an instruction starting there. If you can, you advance by its width; otherwise, you call this new `suggestBytesToSkip` function and advance by that many instead. But if the user tries to start disassembly at an address that's invalid due to misalignment, then `suggestBytesToSkip` can't rescue them anyway, because by the time it's first called, it's too late to prevent an initial nonsense instruction from having been decoded at the misaligned starting location. So then you'd get a resynchronization between that first instruction and the next one, which seems even more nonsensical to me! So I think it is right that `suggestBytesToSkip` restricts itself to not creating new alignment problems, and doesn't check whether there's an existing one. The latter (if we think it needs doing at all) would be the job of some other API function.
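The two policies discussed in this thread can be compared in a few lines. `fixedWordSkip` models what the patch does for AArch64 and Arm state; `alignUpSkip` is a hypothetical alternative that was considered and not adopted. Both names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Policy used by the patch for AArch64 (and Arm state): skip one whole word.
// It preserves whatever alignment the current address already has.
uint64_t fixedWordSkip(uint64_t /*Address*/) { return 4; }

// Hypothetical alternative, NOT what the patch does: skip to the next 4-byte
// boundary, which would also repair a misaligned starting address.
uint64_t alignUpSkip(uint64_t Address) {
  uint64_t Skip = 4 - (Address % 4);
  assert(Skip >= 1 && Skip <= 4);
  return Skip;
}
```

From an aligned address the two agree; they differ only when the start address was already misaligned, which, as argued above, is a problem the skip hook is consulted too late to fix.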

LGTM

llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h
186	Right, I was reading the header as if the non const methods modified the array it's referencing, not the extent of the reference.

This revision is now accepted and ready to land. Jul 25 2022, 7:29 AM

Harbormaster completed remote builds in B177364: Diff 447315. Jul 25 2022, 7:35 AM

DavidSpickett added inline comments. Jul 25 2022, 7:41 AM

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp
353	Makes sense. If your starting point was misaligned to begin with it's either a mistake of some other tool, or you're doing something unusual where you expect to have to handle things yourself (some kind of "is this random data possibly code" tool for example).

simon_tatham added inline comments. Jul 25 2022, 8:44 AM

llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
758	Hmmm, actually, now I go back and check more carefully ... I'd forgotten the wrinkle that in the AArch32 ABI, ELF objects are supposed to have their instructions stored in endianness matching the ELF header. But ELF images have them stored in BE8, and the linker is supposed to byte-swap the right parts of the code sections based on the mapping symbols. So actually, you're right: the places in this patch and in D130358 where I've added always-little-endian accesses into the code section won't work everywhere. On the other hand, (a) LLD doesn't support that byte-reversal, and (b) the MC disassembler also reads instructions as little-endian unconditionally, so even before this patch, `llvm-objdump` would mis-disassemble an object file of that kind. So I'm not introducing any more big-endian incompatibility than was already there. And if we make major changes to fix that at a later date, then I think these extra little-endian accesses won't be forgotten about, because the llvm-objdump tests touched by these patches will fail and remind whoever is doing it!
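A sketch of how byte order can change the Thumb width decision, assuming an object whose Thumb halfwords are stored in big-endian data order (the relocatable-object case described in the comment). `readHalfword` and `thumbWidth` are hypothetical helpers; the bytes are the first halfword of the `mov.w r0, #100` from the new test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: read a 16-bit Thumb halfword in the given byte order.
uint16_t readHalfword(const uint8_t *P, bool BigEndian) {
  assert(P != nullptr);
  return BigEndian ? static_cast<uint16_t>((P[0] << 8) | P[1])
                   : static_cast<uint16_t>((P[1] << 8) | P[0]);
}

// Same width rule as the Thumb branch in the diff.
uint64_t thumbWidth(uint16_t Insn16) { return Insn16 < 0xE800 ? 2 : 4; }
```

Stored little-endian (`4f f0`), the halfword reads as 0xF04F and is correctly judged 4 bytes wide. Stored big-endian (`f0 4f`) but read little-endian, it becomes 0x4FF0 and is misjudged as 2 bytes, which is the kind of mis-disassembly the comment anticipates for such objects.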

This revision was landed with ongoing or failed builds. Jul 26 2022, 1:35 AM

Closed by commit rG55f1fbf005fe: [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy. (authored by simon_tatham).

This revision was automatically updated to reflect the committed changes.

simon_tatham added a commit: rG55f1fbf005fe: [MC,llvm-objdump,ARM] Target-dependent disassembly resync policy.

scott.linder added a subscriber: scott.linder. Oct 11 2022, 3:05 PM

scott.linder added inline comments.

llvm/tools/llvm-objdump/llvm-objdump.cpp
1073	@simon_tatham was this change intended? There is a fix at https://reviews.llvm.org/D135430 and I wanted to ping you in case we missed something.

scott.linder mentioned this in D135430: [llvm-objdump] Support nonzero section addresses in addSymbolizer. Oct 11 2022, 3:08 PM

simon_tatham added inline comments. Oct 12 2022, 1:21 AM

llvm/tools/llvm-objdump/llvm-objdump.cpp
1073	I see what you mean – it surely can't be sensible to call `Bytes.slice(Index - SectionAddr)` when `Index` is iterating from 0 up to `Bytes.size()`. This code looks suspiciously like the code in the next hunk up, around line 1027 in `collectLocalBranchTargets`. It's correct there, because the bounds on `Index` are set differently. So I suspect I pasted the same code in both places without noticing the difference. Sorry!
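A minimal sketch of the two loop styles, with hypothetical helper names: subtracting `SectionAddr` is only correct when `Index` counts addresses (presumably the case in `collectLocalBranchTargets`), not when it counts offsets from 0. The last check shows why the mistake can stay hidden: with a section address of 0, both computations agree, which fits the title of the follow-up fix D135430 ("Support nonzero section addresses in addSymbolizer").

```cpp
#include <cassert>
#include <cstdint>

// Correct when Index runs over addresses [SectionAddr, SectionAddr + Size).
uint64_t offsetForAddress(uint64_t Index, uint64_t SectionAddr) {
  assert(Index >= SectionAddr && "expected an address inside the section");
  return Index - SectionAddr;
}

// Correct when Index already runs over offsets [0, Size): no subtraction.
uint64_t offsetForOffset(uint64_t Index) { return Index; }
```

Reusing `offsetForAddress` inside an offset-based loop would trip its assertion (or, without assertions, wrap around to a huge unsigned offset) as soon as the section address is nonzero.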

Revision Contents

Path                                                            Size
llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h            23 lines
llvm/lib/MC/MCDisassembler/MCDisassembler.cpp                   5 lines
llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.h      3 lines
llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp    8 lines
llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp            30 lines
llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test  52 lines
llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr.test         4 lines
llvm/tools/llvm-objdump/llvm-objdump.cpp                        25 lines
llvm/tools/sancov/sancov.cpp                                    8 lines

Diff 447619

llvm/include/llvm/MC/MCDisassembler/MCDisassembler.h

@@ 165 unchanged lines not shown @@ public:
   // Implement similar hooks that can be used at other points during
   // disassembly. Something along the following lines:
   // - onBeforeInstructionDecode()
   // - onAfterInstructionDecode()
   // - onSymbolEnd()
   // It should help move much of the target specific code from llvm-objdump to
   // respective target disassemblers.
 
+  /// Suggest a distance to skip in a buffer of data to find the next
+  /// place to look for the start of an instruction. For example, if
+  /// all instructions have a fixed alignment, this might advance to
+  /// the next multiple of that alignment.
+  ///
+  /// If not overridden, the default is 1.
+  ///
+  /// \param Address  - The address, in the memory space of region, of the
+  ///                   starting point (typically the first byte of something
+  ///                   that did not decode as a valid instruction at all).
+  /// \param Bytes    - A reference to the actual bytes at Address. May be
+  ///                   needed in order to determine the width of an
+  ///                   unrecognized instruction (e.g. in Thumb this is a simple
+  ///                   consistent criterion that doesn't require knowing the
+  ///                   specific instruction). The caller can pass as much data
+  ///                   as they have available, and the function is required to
+  ///                   make a reasonable default choice if not enough data is
+  ///                   available to make a better one.
+  /// \return         - A number of bytes to skip. Must always be greater than
+  ///                   zero. May be greater than the size of Bytes.
+  virtual uint64_t suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                                      uint64_t Address) const;
+
 private:
   MCContext &Ctx;
 
 protected:
   // Subtarget information, for instruction decoding predicates if required.
   const MCSubtargetInfo &STI;
   std::unique_ptr<MCSymbolizer> Symbolizer;
 
@@ 24 unchanged lines not shown @@

llvm/lib/MC/MCDisassembler/MCDisassembler.cpp

@@ 14 unchanged lines not shown @@
 Optional<MCDisassembler::DecodeStatus>
 MCDisassembler::onSymbolStart(SymbolInfoTy &Symbol, uint64_t &Size,
                               ArrayRef<uint8_t> Bytes, uint64_t Address,
                               raw_ostream &CStream) const {
   return None;
 }
 
+uint64_t MCDisassembler::suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                                            uint64_t Address) const {
+  return 1;
+}
+
 bool MCDisassembler::tryAddingSymbolicOperand(MCInst &Inst, int64_t Value,
                                               uint64_t Address, bool IsBranch,
                                               uint64_t Offset, uint64_t OpSize,
                                               uint64_t InstSize) const {
   if (Symbolizer)
     return Symbolizer->tryAddingSymbolicOperand(Inst, *CommentStream, Value,
                                                 Address, IsBranch, Offset,
                                                 OpSize, InstSize);
@@ 66 unchanged lines not shown @@

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.h

@@ 24 unchanged lines not shown @@ AArch64Disassembler(const MCSubtargetInfo &STI, MCContext &Ctx,
                       MCInstrInfo const *MCII)
       : MCDisassembler(STI, Ctx), MCII(MCII) {}
 
   ~AArch64Disassembler() override = default;
 
   MCDisassembler::DecodeStatus
   getInstruction(MCInst &Instr, uint64_t &Size, ArrayRef<uint8_t> Bytes,
                  uint64_t Address, raw_ostream &CStream) const override;
+
+  uint64_t suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                              uint64_t Address) const override;
 };
 
 } // end namespace llvm
 
 #endif // LLVM_LIB_TARGET_AARCH64_DISASSEMBLER_AARCH64DISASSEMBLER_H

llvm/lib/Target/AArch64/Disassembler/AArch64Disassembler.cpp

@@ 344 unchanged lines not shown @@ for (auto Table : Tables) {
 
     if (Result != MCDisassembler::Fail)
       return Result;
   }
 
   return MCDisassembler::Fail;
 }
 
+uint64_t AArch64Disassembler::suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                                                 uint64_t Address) const {
+  // AArch64 instructions are always 4 bytes wide, so there's no point
+  // in skipping any smaller number of bytes if an instruction can't
+  // be decoded.
+  return 4;
+}
+
 static MCSymbolizer *
 createAArch64ExternalSymbolizer(const Triple &TT, LLVMOpInfoCallback GetOpInfo,
                                 LLVMSymbolLookupCallback SymbolLookUp,
                                 void *DisInfo, MCContext *Ctx,
                                 std::unique_ptr<MCRelocationInfo> &&RelInfo) {
   return new AArch64ExternalSymbolizer(*Ctx, std::move(RelInfo), GetOpInfo,
                                        SymbolLookUp, DisInfo);
 }
@@ 1,527 unchanged lines not shown @@

llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp

@@ 133 unchanged lines not shown @@ public:
   }
 
   ~ARMDisassembler() override = default;
 
   DecodeStatus getInstruction(MCInst &Instr, uint64_t &Size,
                               ArrayRef<uint8_t> Bytes, uint64_t Address,
                               raw_ostream &CStream) const override;
 
+  uint64_t suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                              uint64_t Address) const override;
+
 private:
   DecodeStatus getARMInstruction(MCInst &Instr, uint64_t &Size,
                                  ArrayRef<uint8_t> Bytes, uint64_t Address,
                                  raw_ostream &CStream) const;
 
   DecodeStatus getThumbInstruction(MCInst &Instr, uint64_t &Size,
                                    ArrayRef<uint8_t> Bytes, uint64_t Address,
                                    raw_ostream &CStream) const;
@@ 584 unchanged lines not shown @@ case ARM::t2SUBrs:
     if (MI.getOperand(0).getReg() == ARM::SP &&
         MI.getOperand(1).getReg() != ARM::SP)
       return MCDisassembler::SoftFail;
     return Result;
   default: return Result;
   }
 }
 
+uint64_t ARMDisassembler::suggestBytesToSkip(ArrayRef<uint8_t> Bytes,
+                                             uint64_t Address) const {
+  // In Arm state, instructions are always 4 bytes wide, so there's no
+  // point in skipping any smaller number of bytes if an instruction
+  // can't be decoded.
+  if (!STI.getFeatureBits()[ARM::ModeThumb])
+    return 4;
+
+  // In a Thumb instruction stream, a halfword is a standalone 2-byte
+  // instruction if and only if its value is less than 0xE800.
+  // Otherwise, it's the first halfword of a 4-byte instruction.
+  //
+  // So, if we can see the upcoming halfword, we can judge on that
+  // basis, and maybe skip a whole 4-byte instruction that we don't
+  // know how to decode, without accidentally trying to interpret its
+  // second half as something else.
+  //
+  // If we don't have the instruction data available, we just have to
+  // recommend skipping the minimum sensible distance, which is 2
+  // bytes.
+  if (Bytes.size() < 2)
+    return 2;
+
+  uint16_t Insn16 = (Bytes[1] << 8) | Bytes[0];
+  return Insn16 < 0xE800 ? 2 : 4;
+}
+
 DecodeStatus ARMDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                                              ArrayRef<uint8_t> Bytes,
                                              uint64_t Address,
                                              raw_ostream &CS) const {
   if (STI.getFeatureBits()[ARM::ModeThumb])
     return getThumbInstruction(MI, Size, Bytes, Address, CS);
   return getARMInstruction(MI, Size, Bytes, Address, CS);
 }
@@ 6,243 unchanged lines not shown @@

llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr-resync.test

This file was added.

				# RUN: yaml2obj %s \| llvm-objdump -d --mcpu=cortex-a8 - \| FileCheck %s

				DavidSpickett: I think this one is the Thumb side of the testing; if so, can you add a comment in the file saying that? I'm judging by the skip of 4 and then the skip of 2 later. (If this is the Thumb test, do you need another one for Arm only?)
				simon_tatham (author): It's both: the first three instructions are Arm, including an unrecognised Arm instruction, and the next six are Thumb, including a 16-bit and a 32-bit unrecognised Thumb instruction.
				# Test that unrecognized instructions are skipped in a way that makes
				# sense for the Arm instruction set encoding.
				#
				# The first three instructions in this file are marked by the mapping
				# symbols as in Arm state, with the one in the middle unknown, and we
				# expect the disassembler to skip 4 bytes because that's the width of
				# any Arm instruction.
				#
				# At address 0xc there's a mapping symbol that says we're now in Thumb
				# mode, and in that mode we include both a 16-bit and a 32-bit unknown
				# Thumb instruction, which the disassembler will identify by the simple
				DavidSpickett: Is it worth adding a single byte on the end to cover the Thumb path where bytes < 2? It would prove you don't read past the end of the bytes you're given, at least in an asserts build (I hope).
				# encoding criterion that tells you the instruction length without
				# having to recognize it specifically.
				#
				# Finally we end with a single byte, to ensure nothing gets confused
				# when the Thumb instruction stream doesn't contain enough data to
				# even do that check.

				# CHECK: 0: 64 00 a0 e3 mov r0, #100
				# CHECK-NEXT: 4: ff ff ff ff <unknown>
				# CHECK-NEXT: 8: 12 03 81 e0 add r0, r1, r2, lsl r3

				# CHECK: c: 64 20 movs r0, #100
				# CHECK-NEXT: e: 0e b8 <unknown>
				# CHECK-NEXT: 10: 40 18 adds r0, r0, r1
				# CHECK-NEXT: 12: 4f f0 64 00 mov.w r0, #100
				# CHECK-NEXT: 16: ee ff cc dd <unknown>
				# CHECK-NEXT: 1a: 01 eb c2 00 add.w r0, r1, r2, lsl #3
				# CHECK-NEXT: 1e: 9a <unknown>

				--- !ELF
				FileHeader:
				  Class: ELFCLASS32
				  Data: ELFDATA2LSB
				  Type: ET_REL
				  Machine: EM_ARM
				  Flags: [ EF_ARM_EABI_VER5 ]
				Sections:
				  - Name: .text
				    Type: SHT_PROGBITS
				    Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
				    AddressAlign: 0x4
				    Content: 6400a0e3ffffffff120381e064200eb840184ff06400eeffccdd01ebc2009a
				Symbols:
				  - Name: '$a'
				    Section: .text
				  - Name: '$t'
				    Section: .text
				    Value: 0x0c
				...
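As a cross-check on the CHECK lines above, the following sketch walks the test's `Content` bytes using only the length rules: 4 bytes per instruction in Arm state (from `$a` at 0), the 0xE800 halfword rule in Thumb state (from `$t` at 0xc), and never more bytes than remain. `parseHex` and `walkLengths` are hypothetical helpers, not part of the test. The sketch models instruction lengths only, not decoding; that suffices here because every Thumb halfword at or above 0xE800 begins a 32-bit instruction whether or not the disassembler recognizes it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Parse a yaml2obj-style Content hex string into bytes.
std::vector<uint8_t> parseHex(const std::string &Hex) {
  std::vector<uint8_t> Out;
  for (std::size_t I = 0; I + 1 < Hex.size(); I += 2)
    Out.push_back(
        static_cast<uint8_t>(std::stoul(Hex.substr(I, 2), nullptr, 16)));
  return Out;
}

// Walk the section with the length rules only: 4 bytes per step before
// ThumbStart, the 0xE800 halfword rule from ThumbStart onwards, and each
// step clamped to the bytes that remain. Returns (offset, size) pairs.
std::vector<std::pair<std::size_t, std::size_t>>
walkLengths(const std::vector<uint8_t> &Bytes, std::size_t ThumbStart) {
  std::vector<std::pair<std::size_t, std::size_t>> Out;
  std::size_t Index = 0;
  while (Index < Bytes.size()) {
    std::size_t Remaining = Bytes.size() - Index;
    std::size_t Want;
    if (Index < ThumbStart) {
      Want = 4; // Arm state
    } else if (Remaining < 2) {
      Want = 2; // Thumb state, no complete halfword left
    } else {
      uint16_t Insn16 =
          static_cast<uint16_t>((Bytes[Index + 1] << 8) | Bytes[Index]);
      Want = Insn16 < 0xE800 ? 2 : 4;
    }
    std::size_t Size = std::min(Remaining, Want);
    Out.emplace_back(Index, Size);
    Index += Size;
  }
  return Out;
}
```

The resulting offsets are 0x0, 0x4, 0x8, 0xc, 0xe, 0x10, 0x12, 0x16, 0x1a and 0x1e, matching the addresses in the CHECK lines, and the final step covers just the one trailing byte.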

llvm/test/tools/llvm-objdump/ELF/ARM/unknown-instr.test

	# RUN: yaml2obj %s -o %t			# RUN: yaml2obj %s -o %t
	# RUN: llvm-objdump -D --triple=thumbv8.1m.main-none-eabi %t | FileCheck %s			# RUN: llvm-objdump -D --triple=thumbv8.1m.main-none-eabi %t | FileCheck %s

	## This is a test case with "random" data/instructions, checking that			## This is a test case with "random" data/instructions, checking that
	## llvm-objdump handles such instructions cleanly. Disassembly of instructions			## llvm-objdump handles such instructions cleanly. Disassembly of instructions
	## can fail when it e.g. is not given the right set of architecture features,			## can fail when it e.g. is not given the right set of architecture features,
	## for example when the source is compiled with:			## for example when the source is compiled with:
	##			##
	## clang -march=..+ext1+ext2			## clang -march=..+ext1+ext2
	##			##
	## and disassembly is attempted with:			## and disassembly is attempted with:
	##			##
	## llvm-objdump --mattr=+ext1			## llvm-objdump --mattr=+ext1

	# CHECK: 00000000 <.text>:			# CHECK: 00000000 <.text>:
	# CHECK-NEXT: 0: cb <unknown>			# CHECK-NEXT: 0: cb f3 f7 8b <unknown>
	# CHECK-NEXT: 1: f3 f7 8b be b.w 0xffff3d1b <{{.+}}> @ imm = #-49898			# CHECK-NEXT: 4: be <unknown>

	--- !ELF			--- !ELF
	FileHeader:			FileHeader:
	  Class: ELFCLASS32			  Class: ELFCLASS32
	  Data: ELFDATA2LSB			  Data: ELFDATA2LSB
	  Type: ET_REL			  Type: ET_REL
	  Machine: EM_ARM			  Machine: EM_ARM
	Sections:			Sections:
	  - Name: .text			  - Name: .text
	    Type: SHT_PROGBITS			    Type: SHT_PROGBITS
	    Content: "cbf3f78bbe"			    Content: "cbf3f78bbe"

llvm/tools/llvm-objdump/llvm-objdump.cpp

static void collectLocalBranchTargets(
unsigned LabelCount = 0;		unsigned LabelCount = 0;
Start += SectionAddr;		Start += SectionAddr;
End += SectionAddr;		End += SectionAddr;
uint64_t Index = Start;		uint64_t Index = Start;
while (Index < End) {		while (Index < End) {
// Disassemble a real instruction and record function-local branch labels.		// Disassemble a real instruction and record function-local branch labels.
MCInst Inst;		MCInst Inst;
uint64_t Size;		uint64_t Size;
bool Disassembled = DisAsm->getInstruction(		ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index - SectionAddr);
Inst, Size, Bytes.slice(Index - SectionAddr), Index, nulls());		bool Disassembled =
		DisAsm->getInstruction(Inst, Size, ThisBytes, Index, nulls());
if (Size == 0)		if (Size == 0)
Size = 1;		Size = std::min(ThisBytes.size(),
		DisAsm->suggestBytesToSkip(ThisBytes, Index));

if (Disassembled && MIA) {		if (Disassembled && MIA) {
uint64_t Target;		uint64_t Target;
bool TargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);		bool TargetKnown = MIA->evaluateBranch(Inst, Index, Size, Target);
// On PowerPC, if the address of a branch is the same as the target, it		// On PowerPC, if the address of a branch is the same as the target, it
// means that it's a function call. Do not mark the label for this case.		// means that it's a function call. Do not mark the label for this case.
if (TargetKnown && (Target >= Start && Target < End) &&		if (TargetKnown && (Target >= Start && Target < End) &&
!Labels.count(Target) &&		!Labels.count(Target) &&
if (!SymbolizeOperands)
return;		return;

// Synthesize labels referenced by branch instructions by		// Synthesize labels referenced by branch instructions by
// disassembling, discarding the output, and collecting the referenced		// disassembling, discarding the output, and collecting the referenced
// addresses from the symbolizer.		// addresses from the symbolizer.
for (size_t Index = 0; Index != Bytes.size();) {		for (size_t Index = 0; Index != Bytes.size();) {
MCInst Inst;		MCInst Inst;
uint64_t Size;		uint64_t Size;
DisAsm->getInstruction(Inst, Size, Bytes.slice(Index), SectionAddr + Index,		ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index - SectionAddr);
		scott.linder: @simon_tatham was this change intended? There is a fix at https://reviews.llvm.org/D135430, and I wanted to ping you in case we missed something.
		simon_tatham (author): I see what you mean: it surely can't be sensible to call `Bytes.slice(Index - SectionAddr)` when `Index` is iterating from 0 up to `Bytes.size()`. This code looks suspiciously like the code in the next hunk up, around line 1027 in `collectLocalBranchTargets`, where it's correct because the bounds on `Index` are set differently. So I suspect I pasted the same code into both places without noticing the difference. Sorry!
nulls());		DisAsm->getInstruction(Inst, Size, ThisBytes, Index, nulls());
if (Size == 0)		if (Size == 0)
Size = 1;		Size = std::min(ThisBytes.size(),
		DisAsm->suggestBytesToSkip(ThisBytes, Index));
Index += Size;		Index += Size;
}		}
ArrayRef<uint64_t> LabelAddrsRef = SymbolizerPtr->getReferencedAddresses();		ArrayRef<uint64_t> LabelAddrsRef = SymbolizerPtr->getReferencedAddresses();
// Copy and sort to remove duplicates.		// Copy and sort to remove duplicates.
std::vector<uint64_t> LabelAddrs;		std::vector<uint64_t> LabelAddrs;
LabelAddrs.insert(LabelAddrs.end(), LabelAddrsRef.begin(),		LabelAddrs.insert(LabelAddrs.end(), LabelAddrsRef.begin(),
LabelAddrsRef.end());		LabelAddrsRef.end());
llvm::sort(LabelAddrs);		llvm::sort(LabelAddrs);
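For comparison with the hunk above, here is a minimal sketch of the caller-side pattern when `Index` is section-relative, as in this loop: slice the bytes at `Index`, pass `SectionAddr + Index` as the address (as the old left-hand code did), and clamp the suggested skip to the bytes that remain. `FixedWidthDisassembler` and `walkSection` are hypothetical stand-ins that use a raw pointer plus length instead of `ArrayRef`; this illustrates the indexing discussed in the thread and is not the actual D135430 fix.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an MCDisassembler: it recognizes nothing,
// and recommends skipping a fixed 4 bytes, like the Arm/AArch64 policy.
struct FixedWidthDisassembler {
  bool getInstruction(uint64_t &Size, const uint8_t * /*Data*/,
                      std::size_t /*Len*/, uint64_t /*Address*/) const {
    Size = 0; // decoding failed
    return false;
  }
  uint64_t suggestBytesToSkip(const uint8_t *, std::size_t, uint64_t) const {
    return 4;
  }
};

// Section-relative walk: Index runs from 0, so the bytes are sliced at
// Index and the address passed along is SectionAddr + Index.
// Returns the address at which each decode attempt started.
template <typename Disasm>
std::vector<uint64_t> walkSection(const Disasm &D,
                                  const std::vector<uint8_t> &Bytes,
                                  uint64_t SectionAddr) {
  std::vector<uint64_t> Starts;
  for (std::size_t Index = 0; Index < Bytes.size();) {
    const uint8_t *ThisData = Bytes.data() + Index;
    std::size_t ThisLen = Bytes.size() - Index;
    uint64_t ThisAddr = SectionAddr + Index;
    uint64_t Size = 0;
    D.getInstruction(Size, ThisData, ThisLen, ThisAddr);
    if (Size == 0)
      // Clamp so the skip never runs past the end of the section.
      Size = std::min<uint64_t>(
          ThisLen, D.suggestBytesToSkip(ThisData, ThisLen, ThisAddr));
    Starts.push_back(ThisAddr);
    Index += Size;
  }
  return Starts;
}
```

With a 6-byte section at 0x1000 that decodes nowhere, the attempts start at 0x1000 and 0x1004: the second skip is clamped from 4 to the 2 bytes left, so the loop ends exactly at the section boundary. If `Index` instead ran over absolute addresses, as in `collectLocalBranchTargets`, the slice would be taken at `Index - SectionAddr` and the address would be `Index` itself.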
for (unsigned SI = 0, SE = Symbols.size(); SI != SE; ++SI) {
auto Iter2 = AllLabels.find(SectionAddr + Index);		auto Iter2 = AllLabels.find(SectionAddr + Index);
if (Iter2 != AllLabels.end())		if (Iter2 != AllLabels.end())
FOS << "<" << Iter2->second << ">:\n";		FOS << "<" << Iter2->second << ">:\n";
}		}

// Disassemble a real instruction or a data when disassemble all is		// Disassemble a real instruction or a data when disassemble all is
// provided		// provided
MCInst Inst;		MCInst Inst;
bool Disassembled =		ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index);
DisAsm->getInstruction(Inst, Size, Bytes.slice(Index),		uint64_t ThisAddr = SectionAddr + Index;
SectionAddr + Index, CommentStream);		bool Disassembled = DisAsm->getInstruction(Inst, Size, ThisBytes,
		ThisAddr, CommentStream);
if (Size == 0)		if (Size == 0)
Size = 1;		Size = std::min(ThisBytes.size(),
		DisAsm->suggestBytesToSkip(ThisBytes, ThisAddr));

LVP.update({Index, Section.getIndex()},		LVP.update({Index, Section.getIndex()},
{Index + Size, Section.getIndex()}, Index + Size != End);		{Index + Size, Section.getIndex()}, Index + Size != End);

IP->setCommentStream(CommentStream);		IP->setCommentStream(CommentStream);

PIP.printInst(		PIP.printInst(
*IP, Disassembled ? &Inst : nullptr, Bytes.slice(Index, Size),		*IP, Disassembled ? &Inst : nullptr, Bytes.slice(Index, Size),

llvm/tools/sancov/sancov.cpp

for (object::SectionRef Section : O.sections()) {

Expected<StringRef> BytesStr = Section.getContents();		Expected<StringRef> BytesStr = Section.getContents();
failIfError(BytesStr);		failIfError(BytesStr);
ArrayRef<uint8_t> Bytes = arrayRefFromStringRef(*BytesStr);		ArrayRef<uint8_t> Bytes = arrayRefFromStringRef(*BytesStr);

for (uint64_t Index = 0, Size = 0; Index < Section.getSize();		for (uint64_t Index = 0, Size = 0; Index < Section.getSize();
Index += Size) {		Index += Size) {
MCInst Inst;		MCInst Inst;
if (!DisAsm->getInstruction(Inst, Size, Bytes.slice(Index),		ArrayRef<uint8_t> ThisBytes = Bytes.slice(Index);
SectionAddr + Index, nulls())) {		uint64_t ThisAddr = SectionAddr + Index;
		if (!DisAsm->getInstruction(Inst, Size, ThisBytes, ThisAddr, nulls())) {
if (Size == 0)		if (Size == 0)
Size = 1;		Size = std::min(ThisBytes.size(),
		DisAsm->suggestBytesToSkip(ThisBytes, ThisAddr));
continue;		continue;
}		}
uint64_t Addr = Index + SectionAddr;		uint64_t Addr = Index + SectionAddr;
// Sanitizer coverage uses the address of the next instruction - 1.		// Sanitizer coverage uses the address of the next instruction - 1.
uint64_t CovPoint = getPreviousInstructionPc(Addr + Size, TheTriple);		uint64_t CovPoint = getPreviousInstructionPc(Addr + Size, TheTriple);
uint64_t Target;		uint64_t Target;
if (MIA->isCall(Inst) &&		if (MIA->isCall(Inst) &&
MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target) &&		MIA->evaluateBranch(Inst, SectionAddr + Index, Size, Target) &&