This is an archive of the discontinued LLVM Phabricator instance.

The issues with X86 prefixes: step 2
ClosedPublic

Authored by avt77 on Aug 29 2017, 6:46 AM.

Details

Reviewers

craig.topper
echristo
RKSimon
davide
dtemirbulatov
coby
• rafael

Summary

This patch includes everything we have in D36788, plus it covers PR19251. I was forced to refactor and redesign the current implementation because it was not adapted to the various use cases of prefixes. Now it should fix all known issues.

Diff Detail

Event Timeline

avt77 created this revision.Aug 29 2017, 6:46 AM

craig.topper added inline comments.Sep 1 2017, 10:40 PM

include/llvm/MC/MCInst.h
165	sence->sense

RKSimon added inline comments.Sep 4 2017, 10:17 AM

lib/Target/X86/Disassembler/X86Disassembler.cpp

243

Flags |= 1 ?

251

These hard coded values for the flag bits are going to be tricky to maintain. Perhaps an enum ?

enum OpPrefixFlag {
  OPF_OpSize = 1,
  OPF_AdSize = 2,
  OPF_REP = 4,
  OPF_REPNE = 8,
};

void clearFlags() { Flags = 0; }
void setFlag(OpPrefixFlag F) { Flags |= (unsigned)F; }
bool isFlagSet(OpPrefixFlag F) const { return !!(Flags & (unsigned)F); }
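
The suggestion above can be sketched as a self-contained unit. This is only an illustration: `PrefixFlags` is a hypothetical stand-in for the flag storage (not the real MCInst), and `OPF_REPNE` is an assumed name for the fourth bit.

```cpp
#include <cassert>

// Hypothetical prefix bits; the values follow the reviewer's suggestion,
// and OPF_REPNE is an assumed name, not one taken from the patch.
enum OpPrefixFlag : unsigned {
  OPF_OpSize = 1,
  OPF_AdSize = 2,
  OPF_REP = 4,
  OPF_REPNE = 8,
};

// Minimal stand-in for the object that carries the flags.
class PrefixFlags {
  unsigned Flags = 0;

public:
  void clearFlags() { Flags = 0; }
  void setFlag(OpPrefixFlag F) { Flags |= static_cast<unsigned>(F); }
  bool isFlagSet(OpPrefixFlag F) const {
    return (Flags & static_cast<unsigned>(F)) != 0;
  }
};
```

Named bits keep call sites readable (`setFlag(OPF_REP)` instead of `Flags |= 4`) and make it harder to reuse a bit value by accident.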

avt77 mentioned this in D36788: The issues with X86 prefixes.Sep 19 2017, 3:43 AM

I fixed issues raised by Craig and Simon.

RKSimon added a reviewer: coby.Sep 19 2017, 6:38 AM

Craig,
You reviewed https://reviews.llvm.org/D36788 which was the first step of fixing issues with prefixes: could you review this new patch as well? In fact it's simply an extended version of D36788 and covers issues raised by echristo.

craig.topper added a reviewer: • rafael.Sep 25 2017, 10:43 AM

I'm not sure I can approve growing the size of MCInst. Though I can't see any way around it. @rafael what do you think?

The new test cases don't test anything because they aren't running FileCheck.

test/MC/Disassembler/X86/prefixes-i386.s
1 ↗	(On Diff #115822)	This needs to be named .txt, not .s. You also need to pass the output to FileCheck.
test/MC/Disassembler/X86/prefixes-x86_64.s
1 ↗	(On Diff #115822)	This file needs to be named .txt, not .s. You also need to pass the output to FileCheck.

This revision now requires changes to proceed.Sep 25 2017, 12:03 PM

I fixed the tests mentioned by Craig. As for the extension of MCInst: this opaque data (not a simple flag) could be really useful for other components (not only for the disassembler).

There was an email thread about the issues in this patch. To keep track of those emails, I'm putting them here:

It was added when I started poking around with prefixes, to implement the proper recognition of xacquire/xrelease (rL311309).
I can suggest some additional views on the matter at hand:
Enhancing prefix digestion by the parser is highly recommended; the aforementioned FIXME note describes the issues I found on a brief exploration, and surely there are more.
Currently a prefix is emitted as a standalone instruction, which I don't see much sense in.
IMHO, we should aggregate prefixes until we have consumed an 'actual' instruction, and then query whether they form a legal combination. I deem it to be the right course for the disassembler as well, unless we don't care about tolerating nonsense in disassembly (how do other disassemblers handle such phenomena?).
Personally, I see prefixes as part of a particular instruction, so at least on the concept level I'm in favor of 'Flags'.
More generally, whether producing multiple MCInsts, using a flag, or whatever other approach: it's technicalities.
Agreement should first be reached on what would be considered proper handling for both ends (parser, disassembler).
P.S. Can you guys please use Phab for further discussion? It's hard (for me) to keep track of mail correspondence.

From: Andrew Tischenko [mailto:tishenandr@xenzu.com]
BTW, JFYI, I found the following comment in the source:

// FIXME:
// Enhance prefixes integrity robustness. for example, following forms
// are currently tolerated:
// repz repnz <insn>    ; GAS errors for the use of two similar prefixes
// lock addq %rax, %rbx ; Destination operand must be of memory type
// xacquire <insn>      ; xacquire must be accompanied by 'lock'

The approach with Flags will allow us to implement it.
Andrew
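
As a rough illustration of how accumulated flags could support the checks listed in the FIXME, here is a minimal sketch. The bit names, `checkPrefixes`, and the `DestIsMemory` parameter are all hypothetical; the patch itself does not add these diagnostics, and the xacquire rule simply follows the wording of the FIXME.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag bits for illustration only; the patch keeps its own
// values in X86BaseInfo.h, which are not reproduced here.
enum PrefixBits : unsigned {
  PB_None = 0,
  PB_Lock = 1u << 0,
  PB_Rep = 1u << 1,
  PB_RepNE = 1u << 2,
  PB_XAcquire = 1u << 3,
};

// Returns an empty string when the combination passes the three checks
// from the FIXME, otherwise a short diagnostic. DestIsMemory tells
// whether the destination operand of the instruction is a memory operand.
std::string checkPrefixes(unsigned Flags, bool DestIsMemory) {
  if ((Flags & PB_Rep) && (Flags & PB_RepNE))
    return "conflicting repeat prefixes";
  if ((Flags & PB_Lock) && !DestIsMemory)
    return "lock requires a memory destination";
  if ((Flags & PB_XAcquire) && !(Flags & PB_Lock))
    return "xacquire must be accompanied by lock";
  return "";
}
```

Because all prefixes of one instruction are visible in a single mask, each rule becomes a simple bit test once the instruction's operands are known.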

On 27.09.2017 12:18, Andrew Tischenko wrote:

OK, I'll try to change the assembler properly but there are some questions:

Should I do it in the same patch?
Currently if we have:

repz repnz repe cmpsb

then we produce with 'llvm-mc -triple x86_64-unknown-unknown -x86-asm-syntax=intel -show-encoding intel-syntax.s':

rep # encoding: [0xf3]
repne # encoding: [0xf2]
rep # encoding: [0xf3]
cmpsb %es:(%rdi), (%rsi) # encoding: [0xa6]

but after the change we'll get only the following:

rep # encoding: [0xf3]
cmpsb %es:(%rdi), (%rsi) # encoding: [0xa6]

Is it OK? (IMHO, Yes.)

If YES, do we need any warnings here? (IMHO, No.)

Andrew

On 26.09.2017 22:31, Rafael Avila de Espindola wrote:

The assembler and disassembler should use the same path.
I would be OK with always producing 1 or N instructions, as long as both
the assembler and disassembler do the same. That is, it is OK to have
Flags, as long as the assembler uses that instead of creating a separate
instruction for prefixes.
It seems that allowing the disassembler to create multiple instructions
would have the advantage of not needing Flags, but that is secondary
IMHO.
Cheers,
Rafael

Craig Topper <craig.topper@gmail.com> writes:

Here's my understanding of what I think happens today.
-For a very select few instructions if the AsmParser sees a repne/repe
prefix it creates a special version of the instruction that has the REP
bits set in TSFlags. For any other instruction it emits the repne/rep/repe
as a separate MCInst.
-For the disassembler if it sees a repne/repe byte at the start that it
doesn't think goes with an instruction it will emit a MCInst containing
just the REP.
-If the disassembler encounters a repne/repe byte not at the start of the
instruction that doesn't go with the instruction we drop it and don't print
anything. The disassembler interface only allows us to return one
instruction. So we can't return a separate repne/repe instruction and a
real instruction from the same byte sequence. I don't believe the assembler
can ever produce a byte sequence that hits this case, but that doesn't mean
some binary couldn't contain that string of bytes created by hand. So this
patch is trying to preserve the extra prefix information in the one MCInst
we're allowed to emit. Maybe another option would be to allow creating
multiple MCInsts from the disassembler?
~Craig
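
The idea of preserving prefix information in the single instruction the disassembler may return can be sketched as follows. This is a simplified model, not the decoder from the patch: `DecodedInst`, `scanPrefixes`, and the `DP_*` bits are hypothetical, and mandatory prefixes (such as the f3 in `pause`) are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical prefix bits, chosen only for this sketch.
enum DisasmPrefixBits : unsigned {
  DP_Lock = 1u << 0,
  DP_Rep = 1u << 1,
  DP_RepNE = 1u << 2,
};

// Simplified result: one record per instruction, prefixes folded in.
struct DecodedInst {
  unsigned Flags = 0;           // accumulated prefix bits
  std::size_t OpcodeOffset = 0; // index of the first non-prefix byte
};

// Consume leading lock/rep/repne bytes and record them as flags instead
// of emitting each one as a separate instruction or dropping it.
DecodedInst scanPrefixes(const std::vector<std::uint8_t> &Bytes) {
  DecodedInst D;
  std::size_t I = 0;
  for (; I < Bytes.size(); ++I) {
    switch (Bytes[I]) {
    case 0xF0: D.Flags |= DP_Lock; continue;
    case 0xF2: D.Flags |= DP_RepNE; continue;
    case 0xF3: D.Flags |= DP_Rep; continue;
    default: break;
    }
    break; // first byte that is not one of the prefixes above
  }
  D.OpcodeOffset = I;
  return D;
}
```

For the byte sequence f3 f2 f3 a6 discussed in this thread, the sketch yields one record with both repeat bits set and the opcode at offset 3; how such a record should then be printed is a separate decision.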

On Tue, Sep 26, 2017 at 10:37 AM, Rafael Avila de Espindola <
rafael.espindola@gmail.com> wrote:
The question is why it is different for disassembler than for the
assembler?
How does the assembler handle trepne?
Cheers,
Rafael

Andrew Tischenko <tishenandr@xenzu.com> writes:
It is not a simple flag, it's some data. And this data could be useful
for any other component because it's opaque info which could be
sent via MCInst from one low-level target component to another.
Without this (additional) data, MCInst loses (potentially very useful)
info about the given instruction.

Andrew

On 25.09.2017 22:05, Rafael Avila de Espindola wrote:

Having a flag field that is used only on disassembly seems wrong.
Don't we support parsing our own output? I don't see trepne in any .s
test for example.

Cheers,
Rafael

Rafael, Craig,

After some investigation of AsmParser, I realised that modifying the assembler to support Flags for proper prefix handling will require changing the signatures of at least two virtual functions: ParseInstruction and MatchAndEmitInstruction. The reason is simple: ParseInstruction does the real parsing (and as a result could track prefixes) but does not deal with MCInst, while MatchAndEmitInstruction creates the MCInst. These signature changes will force massive changes across the sources, which will make the patch difficult to review.

The question is: could we accept the current patch as it is (only the disassembler is supported) and follow up with a new patch that includes the corresponding assembler changes?
Or is the big patch OK?

Alternatively, I could add a new X86Operand type (e.g. Prefix): as a result I would not change any signatures, but it would be X86-specific only. Which is better from your point of view?
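
The operand-based alternative can be pictured with a small model: the parse step appends the collected prefix bits as a trailing operand, and the match step strips that operand before matching. The types and function names below (`ParsedOp`, `appendPrefixOperand`, `takePrefixOperand`) are hypothetical simplifications, not the X86Operand API.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, simplified operand: a kind tag plus a prefix mask.
struct ParsedOp {
  enum KindTy { Token, Prefix };
  KindTy Kind;
  unsigned Prefixes;
  ParsedOp(KindTy K, unsigned P = 0) : Kind(K), Prefixes(P) {}
  bool isPrefix() const { return Kind == Prefix; }
};

using OperandList = std::vector<std::unique_ptr<ParsedOp>>;

// "Parse" step: pass the prefix bits along as a trailing operand, so no
// function signature has to change.
void appendPrefixOperand(OperandList &Ops, unsigned Flags) {
  if (Flags)
    Ops.push_back(std::make_unique<ParsedOp>(ParsedOp::Prefix, Flags));
}

// "Match" step: remove the trailing prefix operand, if present, and
// return its bits so they can be stored on the instruction.
unsigned takePrefixOperand(OperandList &Ops) {
  if (Ops.empty() || !Ops.back()->isPrefix())
    return 0;
  unsigned Flags = Ops.back()->Prefixes; // read while the list owns it
  Ops.pop_back();
  return Flags;
}
```

The trade-off named in the comment above still applies: the operand list is target-specific, so this route avoids touching shared virtual interfaces at the cost of being X86-only.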

I re-implemented the assembler's handling of X86 prefixes. Now both the X86 assembler and the X86 disassembler use the new Flags field from MCInst. As a result, it's now possible to track several prefixes for one instruction, and a prefix is no longer a separate instruction but a parameter of the one it applies to. I tried to keep the current tests unchanged where possible. I did not extend or change any diagnostics related to prefixes: that should be done in follow-up patches.

I added a test to cover PR32809.
Guys, could you speed up the review? In fact you already reviewed everything except the X86 assembler changes: I made them to have one path for both assembler and disassembler, as we agreed in our previous discussions.

craig.topper added inline comments.Oct 11 2017, 9:58 AM

lib/Target/X86/AsmParser/X86AsmParser.cpp
2865	Use isPrefix()?
2866	I'm not sure this isn't a use after free. pop_back_val is going to return a std::unique_ptr which you dereferenced, but I'm not convinced anything is keeping the std::unique_ptr alive.
lib/Target/X86/AsmParser/X86Operand.h
395	isPrefix*

avt77 added inline comments.Oct 12 2017, 1:02 AM

lib/Target/X86/AsmParser/X86AsmParser.cpp
2866	It seems that everything is OK here because the std::unique_ptr reference (C++ Utilities library, Dynamic memory management) says:
typename std::add_lvalue_reference<T>::type operator*() const; (1) (since C++11)
pointer operator->() const noexcept; (2) (since C++11)
operator* and operator-> provide access to the object owned by *this. The behavior is undefined if get() == nullptr.
Parameters: (none)
Return value: (1) Returns the object owned by *this, equivalent to *get(). (2) Returns a pointer to the object owned by *this, i.e. get().
Am I right?

I added usage of isPrefix(), as Craig requested.

I added tests covering PR21640. Now this patch covers PR7709, PR17697, PR19251, PR32809 and PR21640.

craig.topper added inline comments.Oct 12 2017, 11:50 AM

lib/Target/X86/AsmParser/X86AsmParser.cpp
2866	The problem isn't with the operator*. It's about the ordering of when the destructor for the std::unique_ptr returned by pop_back_val is executed. I believe it will happen as soon as that line ends since it's an unnamed temporary. So the destructor will run and delete the object the unique_ptr is pointing at. But you're still holding a reference to that object. So it's a use after free. I think you should assign to Prefixes while the object is still in the vector using .back(), and then once you're done with that just call pop_back() on the vector.

avt77 added inline comments.Oct 13 2017, 12:50 AM

lib/Target/X86/AsmParser/X86AsmParser.cpp
2866	I'm not sure you're right, because the references talk about "end of scope" as the point where the destructor is invoked, and here the "scope" should be the "if" statement. But to be absolutely safe I'll do it. Thanks.

Safe implementation for std::unique_ptr usage was done (raised by Craig).

Thanks for fixing that. Scope only applies to named objects, the unique_ptr here has no name. https://stackoverflow.com/questions/2298781/why-do-un-named-c-objects-destruct-before-the-scope-block-ends.
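
The lifetime rule discussed here can be checked with a small standalone sketch. `Operand` and `popBackVal` are hypothetical stand-ins for X86Operand and SmallVector's pop_back_val(), using std::vector so the example needs only the standard library (C++14 for std::make_unique).

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical operand type whose destructor records that it ran.
struct Operand {
  unsigned Prefixes;
  bool *Destroyed;
  Operand(unsigned P, bool *D) : Prefixes(P), Destroyed(D) {}
  ~Operand() { *Destroyed = true; }
};

using OperandVec = std::vector<std::unique_ptr<Operand>>;

// Stand-in for pop_back_val(): moves the last element out by value.
std::unique_ptr<Operand> popBackVal(OperandVec &V) {
  std::unique_ptr<Operand> Last = std::move(V.back());
  V.pop_back();
  return Last;
}

// Shows that the unnamed unique_ptr dies at the end of the statement
// that created it, not at the end of the enclosing block.
bool temporaryDiesAtEndOfStatement() {
  bool Destroyed = false;
  OperandVec V;
  V.push_back(std::make_unique<Operand>(4u, &Destroyed));
  // Reading through the temporary inside the same full-expression is fine.
  const unsigned P = popBackVal(V)->Prefixes;
  // The temporary is gone now; a reference taken from it would dangle.
  return Destroyed && P == 4u;
}

// Safe pattern from the review: read via back(), then pop_back().
unsigned takePrefixes(OperandVec &V) {
  unsigned P = V.back()->Prefixes;
  V.pop_back();
  return P;
}
```

The first function returns true because the owned object is deleted as soon as the statement containing the temporary finishes, which is why binding a reference to `*pop_back_val()` and using it on the next line is a use after free.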

LGTM other than the one comment.

This revision is now accepted and ready to land.Oct 13 2017, 8:28 AM

avt77 mentioned this in rL315899: This patch is a result of D37262: The issues with X86 prefixes. It closes….Oct 16 2017, 4:14 AM

The patch was committed as rL315899.

Revision Contents

Path	Size
include/llvm/MC/MCInst.h	7 lines
lib/Target/X86/AsmParser/X86AsmParser.cpp	66 lines
lib/Target/X86/AsmParser/X86Operand.h	25 lines
lib/Target/X86/Disassembler/X86Disassembler.cpp	32 lines
lib/Target/X86/Disassembler/X86DisassemblerDecoder.h	16 lines
lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp	265 lines
lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp	8 lines
lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp	6 lines
lib/Target/X86/MCTargetDesc/X86BaseInfo.h	10 lines
lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp	7 lines
test/MC/Disassembler/X86/prefixes.txt	56 lines
test/MC/Disassembler/X86/x86-64.txt	3 lines
test/MC/X86/intel-syntax-encoding.s	6 lines
test/MC/X86/x86-64.s	24 lines

Diff 118602

include/llvm/MC/MCInst.h

	[154 lines not shown]
	template <> struct isPodLike<MCOperand> { static const bool value = true; };			template <> struct isPodLike<MCOperand> { static const bool value = true; };

	/// \brief Instances of this class represent a single low-level machine			/// \brief Instances of this class represent a single low-level machine
	/// instruction.			/// instruction.
	class MCInst {			class MCInst {
	unsigned Opcode = 0;			unsigned Opcode = 0;
	SMLoc Loc;			SMLoc Loc;
	SmallVector<MCOperand, 8> Operands;			SmallVector<MCOperand, 8> Operands;
				// These flags could be used to pass some info from one target subcomponent
				// to another, for example, from disassembler to asm printer. The values of
				// the flags have any sense on target level only (e.g. prefixes on x86).
				unsigned Flags = 0;

	public:			public:
	MCInst() = default;			MCInst() = default;

	void setOpcode(unsigned Op) { Opcode = Op; }			void setOpcode(unsigned Op) { Opcode = Op; }
	unsigned getOpcode() const { return Opcode; }			unsigned getOpcode() const { return Opcode; }

				void setFlags(unsigned F) { Flags = F; }
				unsigned getFlags() const { return Flags; }

	void setLoc(SMLoc loc) { Loc = loc; }			void setLoc(SMLoc loc) { Loc = loc; }
	SMLoc getLoc() const { return Loc; }			SMLoc getLoc() const { return Loc; }

	const MCOperand &getOperand(unsigned i) const { return Operands[i]; }			const MCOperand &getOperand(unsigned i) const { return Operands[i]; }
	MCOperand &getOperand(unsigned i) { return Operands[i]; }			MCOperand &getOperand(unsigned i) { return Operands[i]; }
	unsigned getNumOperands() const { return Operands.size(); }			unsigned getNumOperands() const { return Operands.size(); }

	void addOperand(const MCOperand &Op) { Operands.push_back(Op); }			void addOperand(const MCOperand &Op) { Operands.push_back(Op); }
	[39 lines not shown]

lib/Target/X86/AsmParser/X86AsmParser.cpp

[2,307 lines not shown]
if (ComparisonCode != ~0U) {
const MCExpr *ImmOp = MCConstantExpr::create(ComparisonCode,		const MCExpr *ImmOp = MCConstantExpr::create(ComparisonCode,
getParser().getContext());		getParser().getContext());
Operands.push_back(X86Operand::CreateImm(ImmOp, NameLoc, NameLoc));		Operands.push_back(X86Operand::CreateImm(ImmOp, NameLoc, NameLoc));

PatchedName = PatchedName.substr(PatchedName.size() - CCIdx);		PatchedName = PatchedName.substr(PatchedName.size() - CCIdx);
}		}
}		}

Operands.push_back(X86Operand::CreateToken(PatchedName, NameLoc));

// Determine whether this is an instruction prefix.		// Determine whether this is an instruction prefix.
// FIXME:		// FIXME:
// Enhance prefixes integrity robustness. for example, following forms		// Enhance prefixes integrity robustness. for example, following forms
// are currently tolerated:		// are currently tolerated:
// repz repnz <insn> ; GAS errors for the use of two similar prefixes		// repz repnz <insn> ; GAS errors for the use of two similar prefixes
// lock addq %rax, %rbx ; Destination operand must be of memory type		// lock addq %rax, %rbx ; Destination operand must be of memory type
// xacquire <insn> ; xacquire must be accompanied by 'lock'		// xacquire <insn> ; xacquire must be accompanied by 'lock'
bool isPrefix = StringSwitch<bool>(Name)		bool isPrefix = StringSwitch<bool>(Name)
.Cases("lock",		.Cases("rex64", "data32", "data16", true)
"rep", "repe",
"repz", "repne",
"repnz", "rex64",
"data32", "data16", true)
.Cases("xacquire", "xrelease", true)		.Cases("xacquire", "xrelease", true)
.Cases("acquire", "release", isParsingIntelSyntax())		.Cases("acquire", "release", isParsingIntelSyntax())
.Default(false);		.Default(false);

		auto isLockRepeatPrefix = [](StringRef N) {
		return StringSwitch<bool>(N)
		.Cases("lock", "rep", "repe", "repz", "repne", "repnz", true)
		.Default(false);
		};

bool CurlyAsEndOfStatement = false;		bool CurlyAsEndOfStatement = false;

		unsigned Flags = X86::IP_NO_PREFIX;
		while (isLockRepeatPrefix(Name.lower())) {
		unsigned Prefix =
		StringSwitch<unsigned>(Name)
		.Cases("lock", "lock", X86::IP_HAS_LOCK)
		.Cases("rep", "repe", "repz", X86::IP_HAS_REPEAT)
		.Cases("repne", "repnz", X86::IP_HAS_REPEAT_NE)
		.Default(X86::IP_NO_PREFIX); // Invalid prefix (impossible)
		Flags |= Prefix;
		Name = Parser.getTok().getString();
		Parser.Lex(); // eat the prefix
		// Hack: we could have something like
		// "lock; cmpxchg16b $1" or "lock\0A\09incl" or "lock/incl"
		while (Name.startswith(";") || Name.startswith("\n") ||
		Name.startswith("\t") or Name.startswith("/")) {
		Name = Parser.getTok().getString();
		Parser.Lex(); // go to next prefix or instr
		}
		}

		if (Flags)
		PatchedName = Name;
		Operands.push_back(X86Operand::CreateToken(PatchedName, NameLoc));

// This does the actual operand parsing. Don't parse any more if we have a		// This does the actual operand parsing. Don't parse any more if we have a
// prefix juxtaposed with an operation like "lock incl 4(%rax)", because we		// prefix juxtaposed with an operation like "lock incl 4(%rax)", because we
// just want to parse the "lock" as the first instruction and the "incl" as		// just want to parse the "lock" as the first instruction and the "incl" as
// the next one.		// the next one.
if (getLexer().isNot(AsmToken::EndOfStatement) && !isPrefix) {		if (getLexer().isNot(AsmToken::EndOfStatement) && !isPrefix) {

// Parse '*' modifier.		// Parse '*' modifier.
if (getLexer().is(AsmToken::Star))		if (getLexer().is(AsmToken::Star))
Operands.push_back(X86Operand::CreateToken("*", consumeToken()));		Operands.push_back(X86Operand::CreateToken("*", consumeToken()));

// Read the operands.		// Read the operands.
while(1) {		while(1) {
if (std::unique_ptr<X86Operand> Op = ParseOperand()) {		if (std::unique_ptr<X86Operand> Op = ParseOperand()) {
Operands.push_back(std::move(Op));		Operands.push_back(std::move(Op));
[221 lines not shown]
if ((Name == "xlat" || Name == "xlatb") && Operands.size() == 2) {
if (Op1.isMem8()) {		if (Op1.isMem8()) {
Warning(Op1.getStartLoc(), "memory operand is only for determining the "		Warning(Op1.getStartLoc(), "memory operand is only for determining the "
"size, (R|E)BX will be used for the location");		"size, (R|E)BX will be used for the location");
Operands.pop_back();		Operands.pop_back();
static_cast<X86Operand &>(*Operands[0]).setTokenValue("xlatb");		static_cast<X86Operand &>(*Operands[0]).setTokenValue("xlatb");
}		}
}		}

		if (Flags)
		Operands.push_back(X86Operand::CreatePrefix(Flags, NameLoc, NameLoc));
return false;		return false;
}		}

bool X86AsmParser::processInstruction(MCInst &Inst, const OperandVector &Ops) {		bool X86AsmParser::processInstruction(MCInst &Inst, const OperandVector &Ops) {
return false;		return false;
}		}

static const char *getSubtargetFeatureName(uint64_t Val);		static const char *getSubtargetFeatureName(uint64_t Val);
[65 lines not shown]
bool X86AsmParser::MatchAndEmitATTInstruction(SMLoc IDLoc, unsigned &Opcode,
X86Operand &Op = static_cast<X86Operand &>(*Operands[0]);		X86Operand &Op = static_cast<X86Operand &>(*Operands[0]);
assert(Op.isToken() && "Leading operand should always be a mnemonic!");		assert(Op.isToken() && "Leading operand should always be a mnemonic!");
SMRange EmptyRange = None;		SMRange EmptyRange = None;

// First, handle aliases that expand to multiple instructions.		// First, handle aliases that expand to multiple instructions.
MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm);		MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm);

bool WasOriginallyInvalidOperand = false;		bool WasOriginallyInvalidOperand = false;
		unsigned Prefixes = 0;

		if (static_cast<X86Operand &>(*Operands.back()).Kind == X86Operand::Prefix) {
		X86Operand &Prefix = static_cast<X86Operand &>(*Operands.pop_back_val());
		Prefixes = Prefix.getPrefix();
		}

MCInst Inst;		MCInst Inst;

		if (Prefixes)
		Inst.setFlags(Prefixes);

// First, try a direct match.		// First, try a direct match.
switch (MatchInstruction(Operands, Inst, ErrorInfo, MatchingInlineAsm,		switch (MatchInstruction(Operands, Inst, ErrorInfo, MatchingInlineAsm,
isParsingIntelSyntax())) {		isParsingIntelSyntax())) {
default: llvm_unreachable("Unexpected match result!");		default: llvm_unreachable("Unexpected match result!");
case Match_Success:		case Match_Success:
// Some instructions need post-processing to, for example, tweak which		// Some instructions need post-processing to, for example, tweak which
// encoding is selected. Loop on it while changes happen so the		// encoding is selected. Loop on it while changes happen so the
// individual transformations can chain off each other.		// individual transformations can chain off each other.
[148 lines not shown]
bool X86AsmParser::MatchAndEmitIntelInstruction(SMLoc IDLoc, unsigned &Opcode,
uint64_t &ErrorInfo,		uint64_t &ErrorInfo,
bool MatchingInlineAsm) {		bool MatchingInlineAsm) {
assert(!Operands.empty() && "Unexpect empty operand list!");		assert(!Operands.empty() && "Unexpect empty operand list!");
X86Operand &Op = static_cast<X86Operand &>(*Operands[0]);		X86Operand &Op = static_cast<X86Operand &>(*Operands[0]);
assert(Op.isToken() && "Leading operand should always be a mnemonic!");		assert(Op.isToken() && "Leading operand should always be a mnemonic!");
StringRef Mnemonic = Op.getToken();		StringRef Mnemonic = Op.getToken();
SMRange EmptyRange = None;		SMRange EmptyRange = None;
StringRef Base = Op.getToken();		StringRef Base = Op.getToken();
		unsigned Prefixes = 0;

		if (static_cast<X86Operand &>(*Operands.back()).Kind == X86Operand::Prefix) {
		X86Operand &Prefix = static_cast<X86Operand &>(*Operands.pop_back_val());
		Prefixes = Prefix.getPrefix();
		}

// First, handle aliases that expand to multiple instructions.		// First, handle aliases that expand to multiple instructions.
MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm);		MatchFPUWaitAlias(IDLoc, Op, Operands, Out, MatchingInlineAsm);

MCInst Inst;		MCInst Inst;

		if (Prefixes)
		Inst.setFlags(Prefixes);

// Find one unsized memory operand, if present.		// Find one unsized memory operand, if present.
X86Operand *UnsizedMemOp = nullptr;		X86Operand *UnsizedMemOp = nullptr;
for (const auto &Op : Operands) {		for (const auto &Op : Operands) {
X86Operand *X86Op = static_cast<X86Operand *>(Op.get());		X86Operand *X86Op = static_cast<X86Operand *>(Op.get());
if (X86Op->isMemUnsized()) {		if (X86Op->isMemUnsized()) {
UnsizedMemOp = X86Op;		UnsizedMemOp = X86Op;
// Have we found an unqualified memory operand,		// Have we found an unqualified memory operand,
// break. IA allows only one memory operand.		// break. IA allows only one memory operand.
[297 lines not shown]

lib/Target/X86/AsmParser/X86Operand.h

[22 lines not shown]
#include <cassert>		#include <cassert>
#include <memory>		#include <memory>

namespace llvm {		namespace llvm {

/// X86Operand - Instances of this class represent a parsed X86 machine		/// X86Operand - Instances of this class represent a parsed X86 machine
/// instruction.		/// instruction.
struct X86Operand : public MCParsedAsmOperand {		struct X86Operand : public MCParsedAsmOperand {
enum KindTy {		enum KindTy { Token, Register, Immediate, Memory, Prefix } Kind;
Token,
Register,
Immediate,
Memory
} Kind;

SMLoc StartLoc, EndLoc;		SMLoc StartLoc, EndLoc;
SMLoc OffsetOfLoc;		SMLoc OffsetOfLoc;
StringRef SymName;		StringRef SymName;
void *OpDecl;		void *OpDecl;
bool AddressOf;		bool AddressOf;

struct TokOp {		struct TokOp {
const char *Data;		const char *Data;
unsigned Length;		unsigned Length;
};		};

struct RegOp {		struct RegOp {
unsigned RegNo;		unsigned RegNo;
};		};

		struct PrefOp {
		unsigned Prefixes;
		};

struct ImmOp {		struct ImmOp {
const MCExpr *Val;		const MCExpr *Val;
};		};

struct MemOp {		struct MemOp {
unsigned SegReg;		unsigned SegReg;
const MCExpr *Disp;		const MCExpr *Disp;
unsigned BaseReg;		unsigned BaseReg;
unsigned IndexReg;		unsigned IndexReg;
unsigned Scale;		unsigned Scale;
unsigned Size;		unsigned Size;
unsigned ModeSize;		unsigned ModeSize;

/// If the memory operand is unsized and there are multiple instruction		/// If the memory operand is unsized and there are multiple instruction
/// matches, prefer the one with this size.		/// matches, prefer the one with this size.
unsigned FrontendSize;		unsigned FrontendSize;
};		};

union {		union {
struct TokOp Tok;		struct TokOp Tok;
struct RegOp Reg;		struct RegOp Reg;
struct ImmOp Imm;		struct ImmOp Imm;
struct MemOp Mem;		struct MemOp Mem;
		struct PrefOp Pref;
};		};

X86Operand(KindTy K, SMLoc Start, SMLoc End)		X86Operand(KindTy K, SMLoc Start, SMLoc End)
: Kind(K), StartLoc(Start), EndLoc(End) {}		: Kind(K), StartLoc(Start), EndLoc(End) {}

StringRef getSymName() override { return SymName; }		StringRef getSymName() override { return SymName; }
void *getOpDecl() override { return OpDecl; }		void *getOpDecl() override { return OpDecl; }

[22 lines not shown]
void setTokenValue(StringRef Value) {
Tok.Length = Value.size();		Tok.Length = Value.size();
}		}

unsigned getReg() const override {		unsigned getReg() const override {
assert(Kind == Register && "Invalid access!");		assert(Kind == Register && "Invalid access!");
return Reg.RegNo;		return Reg.RegNo;
}		}

		unsigned getPrefix() const {
		assert(Kind == Prefix && "Invalid access!");
		return Pref.Prefixes;
		}

const MCExpr *getImm() const {		const MCExpr *getImm() const {
assert(Kind == Immediate && "Invalid access!");		assert(Kind == Immediate && "Invalid access!");
return Imm.Val;		return Imm.Val;
}		}

const MCExpr *getMemDisp() const {		const MCExpr *getMemDisp() const {
assert(Kind == Memory && "Invalid access!");		assert(Kind == Memory && "Invalid access!");
return Mem.Disp;		return Mem.Disp;
[260 lines not shown]
struct X86Operand : public MCParsedAsmOperand {
}		}
bool isMemOffs64_32() const {		bool isMemOffs64_32() const {
return isMemOffs() && Mem.ModeSize == 64 && (!Mem.Size || Mem.Size == 32);		return isMemOffs() && Mem.ModeSize == 64 && (!Mem.Size || Mem.Size == 32);
}		}
bool isMemOffs64_64() const {		bool isMemOffs64_64() const {
return isMemOffs() && Mem.ModeSize == 64 && (!Mem.Size || Mem.Size == 64);		return isMemOffs() && Mem.ModeSize == 64 && (!Mem.Size || Mem.Size == 64);
}		}

		bool isPrefux() const { return Kind == Prefix; }
bool isReg() const override { return Kind == Register; }		bool isReg() const override { return Kind == Register; }

bool isGR32orGR64() const {		bool isGR32orGR64() const {
return Kind == Register &&		return Kind == Register &&
(X86MCRegisterClasses[X86::GR32RegClassID].contains(getReg()) ||		(X86MCRegisterClasses[X86::GR32RegClassID].contains(getReg()) ||
X86MCRegisterClasses[X86::GR64RegClassID].contains(getReg()));		X86MCRegisterClasses[X86::GR64RegClassID].contains(getReg()));
}		}

[106 lines not shown]
CreateReg(unsigned RegNo, SMLoc StartLoc, SMLoc EndLoc,
Res->Reg.RegNo = RegNo;		Res->Reg.RegNo = RegNo;
Res->AddressOf = AddressOf;		Res->AddressOf = AddressOf;
Res->OffsetOfLoc = OffsetOfLoc;		Res->OffsetOfLoc = OffsetOfLoc;
Res->SymName = SymName;		Res->SymName = SymName;
Res->OpDecl = OpDecl;		Res->OpDecl = OpDecl;
return Res;		return Res;
}		}

		static std::unique_ptr<X86Operand>
		CreatePrefix(unsigned Prefixes, SMLoc StartLoc, SMLoc EndLoc) {
		auto Res = llvm::make_unique<X86Operand>(Prefix, StartLoc, EndLoc);
		Res->Pref.Prefixes = Prefixes;
		return Res;
		}

static std::unique_ptr<X86Operand> CreateImm(const MCExpr *Val,		static std::unique_ptr<X86Operand> CreateImm(const MCExpr *Val,
SMLoc StartLoc, SMLoc EndLoc) {		SMLoc StartLoc, SMLoc EndLoc) {
auto Res = llvm::make_unique<X86Operand>(Immediate, StartLoc, EndLoc);		auto Res = llvm::make_unique<X86Operand>(Immediate, StartLoc, EndLoc);
Res->Imm.Val = Val;		Res->Imm.Val = Val;
return Res;		return Res;
}		}

/// Create an absolute memory operand.		/// Create an absolute memory operand.
[51 lines not shown]

lib/Target/X86/Disassembler/X86Disassembler.cpp

[68 lines not shown]
// table emitter and the disassembler.		// table emitter and the disassembler.
// X86DisassemblerDecoder.h contains the public interface of the decoder,		// X86DisassemblerDecoder.h contains the public interface of the decoder,
// factored out into C for possible use by other projects.		// factored out into C for possible use by other projects.
// X86DisassemblerDecoder.c contains the source code of the decoder, which is		// X86DisassemblerDecoder.c contains the source code of the decoder, which is
// responsible for steps 1-6.		// responsible for steps 1-6.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "MCTargetDesc/X86BaseInfo.h"
#include "MCTargetDesc/X86MCTargetDesc.h"		#include "MCTargetDesc/X86MCTargetDesc.h"
#include "X86DisassemblerDecoder.h"		#include "X86DisassemblerDecoder.h"
#include "llvm/MC/MCContext.h"		#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCDisassembler/MCDisassembler.h"		#include "llvm/MC/MCDisassembler/MCDisassembler.h"
#include "llvm/MC/MCExpr.h"		#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCInst.h"		#include "llvm/MC/MCInst.h"
#include "llvm/MC/MCInstrInfo.h"		#include "llvm/MC/MCInstrInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"		#include "llvm/MC/MCSubtargetInfo.h"
[...]	int Ret = decodeInstruction(&InternalInstr, regionReader, (const void *)&R,
LoggerFn, (void *)&VStream,		LoggerFn, (void *)&VStream,
(const void *)MII.get(), Address, fMode);		(const void *)MII.get(), Address, fMode);

if (Ret) {		if (Ret) {
Size = InternalInstr.readerCursor - Address;		Size = InternalInstr.readerCursor - Address;
return Fail;		return Fail;
} else {		} else {
Size = InternalInstr.length;		Size = InternalInstr.length;
return (!translateInstruction(Instr, InternalInstr, this)) ? Success : Fail;		bool Ret = translateInstruction(Instr, InternalInstr, this);
		if (!Ret) {
		unsigned Flags = X86::IP_NO_PREFIX;
		if (InternalInstr.hasAdSize)
		Flags |= X86::IP_HAS_AD_SIZE;
		if (!InternalInstr.mandatoryPrefix) {
		if (InternalInstr.hasOpSize)
		Flags |= X86::IP_HAS_OP_SIZE;
		RKSimon (inline comment): Flags |= 1 ?
		if (InternalInstr.repeatPrefix == 0xf2)
		Flags |= X86::IP_HAS_REPEAT_NE;
		else if (InternalInstr.repeatPrefix == 0xf3 &&
		// It should not be 'pause' f3 90
		InternalInstr.opcode != 0x90)
		Flags |= X86::IP_HAS_REPEAT;
		}
		Instr.setFlags(Flags);
		RKSimon (inline comment): These hard coded values for the flag bits are going to be tricky to maintain. Perhaps an enum ? enum OpPrefixFlag { OPF_OpSize = 1, OPF_AdSize = 2, OPF_REP = 4, OPF_REP = 8, }; void clearFlags() { Flags = 0; } void setFlag(OpPrefixFlag F) { Flags |= (unsigned)F; } bool isFlagSet(OpPrefixFlag F) const { return !!(Flags & (unsigned)F); }
		}
		return (!Ret) ? Success : Fail;
}		}
}		}
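
Note: the flag computation in the new code above can be read in isolation as a small pure function. The following is only a sketch under stated assumptions: `PrefixFlag`, `DecodedPrefixes` and `computePrefixFlags` are hypothetical names, and the numeric values are placeholders rather than the real `X86::IP_*` constants, whose values are not shown in this diff.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the X86::IP_* constants used in the patch.
// The bit values here are placeholders chosen for illustration only.
enum PrefixFlag : unsigned {
  PF_None = 0,
  PF_OpSize = 1u << 0,
  PF_AdSize = 1u << 1,
  PF_RepeatNE = 1u << 2,
  PF_Repeat = 1u << 3,
};

// Minimal model of the InternalInstruction fields the patch consults.
struct DecodedPrefixes {
  bool hasAdSize = false;
  bool hasOpSize = false;
  std::uint8_t mandatoryPrefix = 0; // 0 means no mandatory prefix
  std::uint8_t repeatPrefix = 0;    // 0xf2, 0xf3, or 0 if absent
  std::uint8_t opcode = 0;
};

// Mirrors the branch structure of the patched code: AdSize is always
// reported, while OpSize and the repeat prefixes are reported only when
// no mandatory prefix was recognized. F3 90 (PAUSE) is not a REP.
unsigned computePrefixFlags(const DecodedPrefixes &P) {
  unsigned Flags = PF_None;
  if (P.hasAdSize)
    Flags |= PF_AdSize;
  if (!P.mandatoryPrefix) {
    if (P.hasOpSize)
      Flags |= PF_OpSize;
    if (P.repeatPrefix == 0xf2)
      Flags |= PF_RepeatNE;
    else if (P.repeatPrefix == 0xf3 && P.opcode != 0x90)
      Flags |= PF_Repeat;
  }
  return Flags;
}
```

Named enumerators like these are in the spirit of the inline review comment about hard-coded flag bits; whether the final patch used this exact shape is not visible here.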

//		//
// Private code that translates from struct InternalInstructions to MCInsts.		// Private code that translates from struct InternalInstructions to MCInsts.
//		//

/// translateRegister - Translates an internal register to the appropriate LLVM		/// translateRegister - Translates an internal register to the appropriate LLVM
[...]
/// translateSrcIndex - Appends a source index operand to an MCInst.		/// translateSrcIndex - Appends a source index operand to an MCInst.
///		///
/// @param mcInst - The MCInst to append to.		/// @param mcInst - The MCInst to append to.
/// @param insn - The internal instruction.		/// @param insn - The internal instruction.
static bool translateSrcIndex(MCInst &mcInst, InternalInstruction &insn) {		static bool translateSrcIndex(MCInst &mcInst, InternalInstruction &insn) {
unsigned baseRegNo;		unsigned baseRegNo;

if (insn.mode == MODE_64BIT)		if (insn.mode == MODE_64BIT)
baseRegNo = insn.prefixPresent[0x67] ? X86::ESI : X86::RSI;		baseRegNo = insn.hasAdSize ? X86::ESI : X86::RSI;
else if (insn.mode == MODE_32BIT)		else if (insn.mode == MODE_32BIT)
baseRegNo = insn.prefixPresent[0x67] ? X86::SI : X86::ESI;		baseRegNo = insn.hasAdSize ? X86::SI : X86::ESI;
else {		else {
assert(insn.mode == MODE_16BIT);		assert(insn.mode == MODE_16BIT);
baseRegNo = insn.prefixPresent[0x67] ? X86::ESI : X86::SI;		baseRegNo = insn.hasAdSize ? X86::ESI : X86::SI;
}		}
MCOperand baseReg = MCOperand::createReg(baseRegNo);		MCOperand baseReg = MCOperand::createReg(baseRegNo);
mcInst.addOperand(baseReg);		mcInst.addOperand(baseReg);

MCOperand segmentReg;		MCOperand segmentReg;
segmentReg = MCOperand::createReg(segmentRegnums[insn.segmentOverride]);		segmentReg = MCOperand::createReg(segmentRegnums[insn.segmentOverride]);
mcInst.addOperand(segmentReg);		mcInst.addOperand(segmentReg);
return false;		return false;
}		}

/// translateDstIndex - Appends a destination index operand to an MCInst.		/// translateDstIndex - Appends a destination index operand to an MCInst.
///		///
/// @param mcInst - The MCInst to append to.		/// @param mcInst - The MCInst to append to.
/// @param insn - The internal instruction.		/// @param insn - The internal instruction.

static bool translateDstIndex(MCInst &mcInst, InternalInstruction &insn) {		static bool translateDstIndex(MCInst &mcInst, InternalInstruction &insn) {
unsigned baseRegNo;		unsigned baseRegNo;

if (insn.mode == MODE_64BIT)		if (insn.mode == MODE_64BIT)
baseRegNo = insn.prefixPresent[0x67] ? X86::EDI : X86::RDI;		baseRegNo = insn.hasAdSize ? X86::EDI : X86::RDI;
else if (insn.mode == MODE_32BIT)		else if (insn.mode == MODE_32BIT)
baseRegNo = insn.prefixPresent[0x67] ? X86::DI : X86::EDI;		baseRegNo = insn.hasAdSize ? X86::DI : X86::EDI;
else {		else {
assert(insn.mode == MODE_16BIT);		assert(insn.mode == MODE_16BIT);
baseRegNo = insn.prefixPresent[0x67] ? X86::EDI : X86::DI;		baseRegNo = insn.hasAdSize ? X86::EDI : X86::DI;
}		}
MCOperand baseReg = MCOperand::createReg(baseRegNo);		MCOperand baseReg = MCOperand::createReg(baseRegNo);
mcInst.addOperand(baseReg);		mcInst.addOperand(baseReg);
return false;		return false;
}		}

/// translateImmediate - Appends an immediate operand to an MCInst.		/// translateImmediate - Appends an immediate operand to an MCInst.
///		///
[...]

lib/Target/X86/Disassembler/X86DisassemblerDecoder.h

[...]	struct InternalInstruction {
DisassemblerMode mode;		DisassemblerMode mode;
// The start of the instruction, usable with the reader		// The start of the instruction, usable with the reader
uint64_t startLocation;		uint64_t startLocation;
// The length of the instruction, in bytes		// The length of the instruction, in bytes
size_t length;		size_t length;

// Prefix state		// Prefix state

// 1 if the prefix byte corresponding to the entry is present; 0 if not		// The possible mandatory prefix
uint8_t prefixPresent[0x100];		uint8_t mandatoryPrefix;
// contains the location (for use with the reader) of the prefix byte
uint64_t prefixLocations[0x100];
// The value of the vector extension prefix(EVEX/VEX/XOP), if present		// The value of the vector extension prefix(EVEX/VEX/XOP), if present
uint8_t vectorExtensionPrefix[4];		uint8_t vectorExtensionPrefix[4];
// The type of the vector extension prefix		// The type of the vector extension prefix
VectorExtensionType vectorExtensionType;		VectorExtensionType vectorExtensionType;
// The value of the REX prefix, if present		// The value of the REX prefix, if present
uint8_t rexPrefix;		uint8_t rexPrefix;
// The location where a mandatory prefix would have to be (i.e., right before
// the opcode, or right before the REX prefix if one is present).
uint64_t necessaryPrefixLocation;
// The segment override type		// The segment override type
SegmentOverride segmentOverride;		SegmentOverride segmentOverride;
// 1 if the prefix byte, 0xf2 or 0xf3 is xacquire or xrelease		// 1 if the prefix byte, 0xf2 or 0xf3 is xacquire or xrelease
bool xAcquireRelease;		bool xAcquireRelease;

		// Address-size override
		bool hasAdSize;
		// Operand-size override
		bool hasOpSize;
		// The repeat prefix if any
		uint8_t repeatPrefix;

// Sizes of various critical pieces of data, in bytes		// Sizes of various critical pieces of data, in bytes
uint8_t registerSize;		uint8_t registerSize;
uint8_t addressSize;		uint8_t addressSize;
uint8_t displacementSize;		uint8_t displacementSize;
uint8_t immediateSize;		uint8_t immediateSize;

// Offsets from the start of the instruction to the pieces of data, which is		// Offsets from the start of the instruction to the pieces of data, which is
// needed to find relocation entries for adding symbolic operands.		// needed to find relocation entries for adding symbolic operands.
[...]

lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp

[...]	static void dbgprintf(struct InternalInstruction* insn,

va_start(ap, format);		va_start(ap, format);
(void)vsnprintf(buffer, sizeof(buffer), format, ap);		(void)vsnprintf(buffer, sizeof(buffer), format, ap);
va_end(ap);		va_end(ap);

insn->dlog(insn->dlogArg, buffer);		insn->dlog(insn->dlogArg, buffer);
}		}

		static bool isREX(struct InternalInstruction *insn, uint8_t prefix) {
		if (insn->mode == MODE_64BIT)
		return prefix >= 0x40 && prefix <= 0x4f;
		return false;
		}

/*		/*
* setPrefixPresent - Marks that a particular prefix is present at a particular		* setPrefixPresent - Marks that a particular prefix is present as mandatory
* location.
*		*
* @param insn - The instruction to be marked as having the prefix.		* @param insn - The instruction to be marked as having the prefix.
* @param prefix - The prefix that is present.		* @param prefix - The prefix that is present.
* @param location - The location where the prefix is located (in the address
* space of the instruction's reader).
*/		*/
static void setPrefixPresent(struct InternalInstruction* insn,		static void setPrefixPresent(struct InternalInstruction *insn, uint8_t prefix) {
uint8_t prefix,		uint8_t nextByte;
uint64_t location)		switch (prefix) {
{		case 0xf2:
insn->prefixPresent[prefix] = 1;		case 0xf3:
insn->prefixLocations[prefix] = location;		if (lookAtByte(insn, &nextByte))
		break;
		// TODO:
		// 1. There could be several 0x66
		// 2. if (nextByte == 0x66) and nextNextByte != 0x0f then
		// it's not mandatory prefix
		// 3. if (nextByte >= 0x40 && nextByte <= 0x4f) it's REX and we need
		// 0x0f exactly after it to be mandatory prefix
		if (isREX(insn, nextByte) || nextByte == 0x0f || nextByte == 0x66)
		// The last of 0xf2 /0xf3 is mandatory prefix
		insn->mandatoryPrefix = prefix;
		insn->repeatPrefix = prefix;
		break;
		case 0x66:
		if (lookAtByte(insn, &nextByte))
		break;
		// 0x66 can't overwrite existing mandatory prefix and should be ignored
		if (!insn->mandatoryPrefix && (nextByte == 0x0f || isREX(insn, nextByte)))
		insn->mandatoryPrefix = prefix;
		break;
}		}

/*
* isPrefixAtLocation - Queries an instruction to determine whether a prefix is
* present at a given location.
*
* @param insn - The instruction to be queried.
* @param prefix - The prefix.
* @param location - The location to query.
* @return - Whether the prefix is at that location.
*/
static bool isPrefixAtLocation(struct InternalInstruction* insn,
uint8_t prefix,
uint64_t location)
{
return insn->prefixPresent[prefix] == 1 &&
insn->prefixLocations[prefix] == location;
}		}
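
Note: the mandatory-prefix rule in the new `setPrefixPresent` can be sketched as a standalone function. This is an illustration under assumptions, not the patch's code: `PrefixState`, `isREXByte` and `notePrefix` are hypothetical names, and the byte following the prefix is passed in directly instead of being peeked through the reader with `lookAtByte`.

```cpp
#include <cstdint>

// Simplified stand-in for the prefix fields of InternalInstruction.
struct PrefixState {
  std::uint8_t mandatoryPrefix = 0; // 0 means none recorded yet
  std::uint8_t repeatPrefix = 0;    // last 0xf2/0xf3 seen, if any
};

// REX prefixes (0x40..0x4f) exist only in 64-bit mode.
bool isREXByte(bool is64Bit, std::uint8_t b) {
  return is64Bit && b >= 0x40 && b <= 0x4f;
}

// Hypothetical helper mirroring the switch in setPrefixPresent.
// nextByte is the byte that follows the prefix in the instruction stream.
void notePrefix(PrefixState &s, bool is64Bit, std::uint8_t prefix,
                std::uint8_t nextByte) {
  switch (prefix) {
  case 0xf2:
  case 0xf3:
    // A repeat prefix becomes mandatory when it is followed by REX, by the
    // 0x0f opcode escape, or by 0x66. It is always recorded as repeatPrefix.
    if (isREXByte(is64Bit, nextByte) || nextByte == 0x0f || nextByte == 0x66)
      s.mandatoryPrefix = prefix;
    s.repeatPrefix = prefix;
    break;
  case 0x66:
    // 0x66 never overwrites a mandatory prefix that was already recorded.
    if (!s.mandatoryPrefix &&
        (nextByte == 0x0f || isREXByte(is64Bit, nextByte)))
      s.mandatoryPrefix = prefix;
    break;
  default:
    break;
  }
}
```

In the patch itself this decision is provisional: `readOpcode` later resets `mandatoryPrefix` to 0 when the opcode does not start with an escape byte, so the prefix falls back to its legacy meaning.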

/*		/*
* readPrefixes - Consumes all of an instruction's prefix bytes, and marks the		* readPrefixes - Consumes all of an instruction's prefix bytes, and marks the
* instruction as having them. Also sets the instruction's default operand,		* instruction as having them. Also sets the instruction's default operand,
* address, and other relevant data sizes to report operands correctly.		* address, and other relevant data sizes to report operands correctly.
*		*
* @param insn - The instruction whose prefixes are to be read.		* @param insn - The instruction whose prefixes are to be read.
* @return - 0 if the instruction could be read until the end of the prefix		* @return - 0 if the instruction could be read until the end of the prefix
* bytes, and no prefixes conflicted; nonzero otherwise.		* bytes, and no prefixes conflicted; nonzero otherwise.
*/		*/
static int readPrefixes(struct InternalInstruction* insn) {		static int readPrefixes(struct InternalInstruction* insn) {
bool isPrefix = true;		bool isPrefix = true;
bool prefixGroups[4] = { false };
uint64_t prefixLocation;
uint8_t byte = 0;		uint8_t byte = 0;
uint8_t nextByte;		uint8_t nextByte;

bool hasAdSize = false;
bool hasOpSize = false;

dbgprintf(insn, "readPrefixes()");		dbgprintf(insn, "readPrefixes()");

while (isPrefix) {		while (isPrefix) {
prefixLocation = insn->readerCursor;

/* If we fail reading prefixes, just stop here and let the opcode reader deal with it */		/* If we fail reading prefixes, just stop here and let the opcode reader deal with it */
if (consumeByte(insn, &byte))		if (consumeByte(insn, &byte))
break;		break;

/*		/*
* If the byte is a LOCK/REP/REPNE prefix and not a part of the opcode, then		* If the byte is a LOCK/REP/REPNE prefix and not a part of the opcode, then
* break and let it be disassembled as a normal "instruction".		* break and let it be disassembled as a normal "instruction".
*/		*/
if (insn->readerCursor - 1 == insn->startLocation && byte == 0xf0)		if (insn->readerCursor - 1 == insn->startLocation && byte == 0xf0) // LOCK
break;		break;

if (insn->readerCursor - 1 == insn->startLocation		if ((byte == 0xf2 || byte == 0xf3) && !lookAtByte(insn, &nextByte)) {
&& (byte == 0xf2 || byte == 0xf3)
&& !lookAtByte(insn, &nextByte))
{
/*		/*
* If the byte is 0xf2 or 0xf3, and any of the following conditions are		* If the byte is 0xf2 or 0xf3, and any of the following conditions are
* met:		* met:
* - it is followed by a LOCK (0xf0) prefix		* - it is followed by a LOCK (0xf0) prefix
* - it is followed by an xchg instruction		* - it is followed by an xchg instruction
* then it should be disassembled as a xacquire/xrelease not repne/rep.		* then it should be disassembled as a xacquire/xrelease not repne/rep.
*/		*/
if ((byte == 0xf2 || byte == 0xf3) &&		if (((nextByte == 0xf0) ||
((nextByte == 0xf0) ||		((nextByte & 0xfe) == 0x86 || (nextByte & 0xf8) == 0x90))) {
((nextByte & 0xfe) == 0x86 || (nextByte & 0xf8) == 0x90)))
insn->xAcquireRelease = true;		insn->xAcquireRelease = true;
		if (!(byte == 0xf3 && nextByte == 0x90)) // PAUSE instruction support
		break;
		}
/*		/*
* Also if the byte is 0xf3, and the following condition is met:		* Also if the byte is 0xf3, and the following condition is met:
* - it is followed by a "mov mem, reg" (opcode 0x88/0x89) or		* - it is followed by a "mov mem, reg" (opcode 0x88/0x89) or
* "mov mem, imm" (opcode 0xc6/0xc7) instructions.		* "mov mem, imm" (opcode 0xc6/0xc7) instructions.
* then it should be disassembled as an xrelease not rep.		* then it should be disassembled as an xrelease not rep.
*/		*/
if (byte == 0xf3 &&		if (byte == 0xf3 && (nextByte == 0x88 || nextByte == 0x89 ||
(nextByte == 0x88 || nextByte == 0x89 ||		nextByte == 0xc6 || nextByte == 0xc7)) {
nextByte == 0xc6 || nextByte == 0xc7))
insn->xAcquireRelease = true;		insn->xAcquireRelease = true;
if (insn->mode == MODE_64BIT && (nextByte & 0xf0) == 0x40) {		if (nextByte != 0x90) // PAUSE instruction support
if (consumeByte(insn, &nextByte))		break;
		}
		if (isREX(insn, nextByte)) {
		uint8_t nnextByte;
		// Go to REX prefix after the current one
		if (consumeByte(insn, &nnextByte))
return -1;		return -1;
if (lookAtByte(insn, &nextByte))		// We should be able to read next byte after REX prefix
		if (lookAtByte(insn, &nnextByte))
return -1;		return -1;
unconsumeByte(insn);		unconsumeByte(insn);
}		}
if (nextByte != 0x0f && nextByte != 0x90)
break;
}		}

switch (byte) {		switch (byte) {
case 0xf0: /* LOCK */		case 0xf0: /* LOCK */
case 0xf2: /* REPNE/REPNZ */		case 0xf2: /* REPNE/REPNZ */
case 0xf3: /* REP or REPE/REPZ */		case 0xf3: /* REP or REPE/REPZ */
if (prefixGroups[0])		setPrefixPresent(insn, byte);
dbgprintf(insn, "Redundant Group 1 prefix");
prefixGroups[0] = true;
setPrefixPresent(insn, byte, prefixLocation);
break;		break;
case 0x2e: /* CS segment override -OR- Branch not taken */		case 0x2e: /* CS segment override -OR- Branch not taken */
case 0x36: /* SS segment override -OR- Branch taken */		case 0x36: /* SS segment override -OR- Branch taken */
case 0x3e: /* DS segment override */		case 0x3e: /* DS segment override */
case 0x26: /* ES segment override */		case 0x26: /* ES segment override */
case 0x64: /* FS segment override */		case 0x64: /* FS segment override */
case 0x65: /* GS segment override */		case 0x65: /* GS segment override */
switch (byte) {		switch (byte) {
[...]	case 0x65: /* GS segment override */
break;		break;
case 0x65:		case 0x65:
insn->segmentOverride = SEG_OVERRIDE_GS;		insn->segmentOverride = SEG_OVERRIDE_GS;
break;		break;
default:		default:
debug("Unhandled override");		debug("Unhandled override");
return -1;		return -1;
}		}
if (prefixGroups[1])		setPrefixPresent(insn, byte);
dbgprintf(insn, "Redundant Group 2 prefix");
prefixGroups[1] = true;
setPrefixPresent(insn, byte, prefixLocation);
break;		break;
case 0x66: /* Operand-size override */		case 0x66: /* Operand-size override */
if (prefixGroups[2])		insn->hasOpSize = true;
dbgprintf(insn, "Redundant Group 3 prefix");		setPrefixPresent(insn, byte);
prefixGroups[2] = true;
hasOpSize = true;
setPrefixPresent(insn, byte, prefixLocation);
break;		break;
case 0x67: /* Address-size override */		case 0x67: /* Address-size override */
if (prefixGroups[3])		insn->hasAdSize = true;
dbgprintf(insn, "Redundant Group 4 prefix");		setPrefixPresent(insn, byte);
prefixGroups[3] = true;
hasAdSize = true;
setPrefixPresent(insn, byte, prefixLocation);
break;		break;
default: /* Not a prefix byte */		default: /* Not a prefix byte */
isPrefix = false;		isPrefix = false;
break;		break;
}		}

if (isPrefix)		if (isPrefix)
dbgprintf(insn, "Found prefix 0x%hhx", byte);		dbgprintf(insn, "Found prefix 0x%hhx", byte);
[...]	if (byte == 0x62) {
}		}

if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) &&		if ((insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) &&
((~byte1 & 0xc) == 0xc) && ((byte2 & 0x4) == 0x4)) {		((~byte1 & 0xc) == 0xc) && ((byte2 & 0x4) == 0x4)) {
insn->vectorExtensionType = TYPE_EVEX;		insn->vectorExtensionType = TYPE_EVEX;
} else {		} else {
unconsumeByte(insn); /* unconsume byte1 */		unconsumeByte(insn); /* unconsume byte1 */
unconsumeByte(insn); /* unconsume byte */		unconsumeByte(insn); /* unconsume byte */
insn->necessaryPrefixLocation = insn->readerCursor - 2;
}		}

if (insn->vectorExtensionType == TYPE_EVEX) {		if (insn->vectorExtensionType == TYPE_EVEX) {
insn->vectorExtensionPrefix[0] = byte;		insn->vectorExtensionPrefix[0] = byte;
insn->vectorExtensionPrefix[1] = byte1;		insn->vectorExtensionPrefix[1] = byte1;
if (consumeByte(insn, &insn->vectorExtensionPrefix[2])) {		if (consumeByte(insn, &insn->vectorExtensionPrefix[2])) {
dbgprintf(insn, "Couldn't read third byte of EVEX prefix");		dbgprintf(insn, "Couldn't read third byte of EVEX prefix");
return -1;		return -1;
[...]	static int readPrefixes(struct InternalInstruction* insn) {
} else if (byte == 0xc4) {		} else if (byte == 0xc4) {
uint8_t byte1;		uint8_t byte1;

if (lookAtByte(insn, &byte1)) {		if (lookAtByte(insn, &byte1)) {
dbgprintf(insn, "Couldn't read second byte of VEX");		dbgprintf(insn, "Couldn't read second byte of VEX");
return -1;		return -1;
}		}

if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) {		if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
insn->vectorExtensionType = TYPE_VEX_3B;		insn->vectorExtensionType = TYPE_VEX_3B;
insn->necessaryPrefixLocation = insn->readerCursor - 1;		else
} else {
unconsumeByte(insn);		unconsumeByte(insn);
insn->necessaryPrefixLocation = insn->readerCursor - 1;
}

if (insn->vectorExtensionType == TYPE_VEX_3B) {		if (insn->vectorExtensionType == TYPE_VEX_3B) {
insn->vectorExtensionPrefix[0] = byte;		insn->vectorExtensionPrefix[0] = byte;
consumeByte(insn, &insn->vectorExtensionPrefix[1]);		consumeByte(insn, &insn->vectorExtensionPrefix[1]);
consumeByte(insn, &insn->vectorExtensionPrefix[2]);		consumeByte(insn, &insn->vectorExtensionPrefix[2]);

/* We simulate the REX prefix for simplicity's sake */		/* We simulate the REX prefix for simplicity's sake */

if (insn->mode == MODE_64BIT) {		if (insn->mode == MODE_64BIT)
insn->rexPrefix = 0x40		insn->rexPrefix = 0x40
| (wFromVEX3of3(insn->vectorExtensionPrefix[2]) << 3)		| (wFromVEX3of3(insn->vectorExtensionPrefix[2]) << 3)
| (rFromVEX2of3(insn->vectorExtensionPrefix[1]) << 2)		| (rFromVEX2of3(insn->vectorExtensionPrefix[1]) << 2)
| (xFromVEX2of3(insn->vectorExtensionPrefix[1]) << 1)		| (xFromVEX2of3(insn->vectorExtensionPrefix[1]) << 1)
| (bFromVEX2of3(insn->vectorExtensionPrefix[1]) << 0);		| (bFromVEX2of3(insn->vectorExtensionPrefix[1]) << 0);
}

dbgprintf(insn, "Found VEX prefix 0x%hhx 0x%hhx 0x%hhx",		dbgprintf(insn, "Found VEX prefix 0x%hhx 0x%hhx 0x%hhx",
insn->vectorExtensionPrefix[0], insn->vectorExtensionPrefix[1],		insn->vectorExtensionPrefix[0], insn->vectorExtensionPrefix[1],
insn->vectorExtensionPrefix[2]);		insn->vectorExtensionPrefix[2]);
}		}
} else if (byte == 0xc5) {		} else if (byte == 0xc5) {
uint8_t byte1;		uint8_t byte1;

if (lookAtByte(insn, &byte1)) {		if (lookAtByte(insn, &byte1)) {
dbgprintf(insn, "Couldn't read second byte of VEX");		dbgprintf(insn, "Couldn't read second byte of VEX");
return -1;		return -1;
}		}

if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0) {		if (insn->mode == MODE_64BIT || (byte1 & 0xc0) == 0xc0)
insn->vectorExtensionType = TYPE_VEX_2B;		insn->vectorExtensionType = TYPE_VEX_2B;
} else {		else
unconsumeByte(insn);		unconsumeByte(insn);
}

if (insn->vectorExtensionType == TYPE_VEX_2B) {		if (insn->vectorExtensionType == TYPE_VEX_2B) {
insn->vectorExtensionPrefix[0] = byte;		insn->vectorExtensionPrefix[0] = byte;
consumeByte(insn, &insn->vectorExtensionPrefix[1]);		consumeByte(insn, &insn->vectorExtensionPrefix[1]);

if (insn->mode == MODE_64BIT) {		if (insn->mode == MODE_64BIT)
insn->rexPrefix = 0x40		insn->rexPrefix = 0x40
| (rFromVEX2of2(insn->vectorExtensionPrefix[1]) << 2);		| (rFromVEX2of2(insn->vectorExtensionPrefix[1]) << 2);
}

switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {		switch (ppFromVEX2of2(insn->vectorExtensionPrefix[1])) {
default:		default:
break;		break;
case VEX_PREFIX_66:		case VEX_PREFIX_66:
hasOpSize = true;		insn->hasOpSize = true;
break;		break;
}		}

dbgprintf(insn, "Found VEX prefix 0x%hhx 0x%hhx",		dbgprintf(insn, "Found VEX prefix 0x%hhx 0x%hhx",
insn->vectorExtensionPrefix[0],		insn->vectorExtensionPrefix[0],
insn->vectorExtensionPrefix[1]);		insn->vectorExtensionPrefix[1]);
}		}
} else if (byte == 0x8f) {		} else if (byte == 0x8f) {
uint8_t byte1;		uint8_t byte1;

if (lookAtByte(insn, &byte1)) {		if (lookAtByte(insn, &byte1)) {
dbgprintf(insn, "Couldn't read second byte of XOP");		dbgprintf(insn, "Couldn't read second byte of XOP");
return -1;		return -1;
}		}

if ((byte1 & 0x38) != 0x0) { /* 0 in these 3 bits is a POP instruction. */		if ((byte1 & 0x38) != 0x0) /* 0 in these 3 bits is a POP instruction. */
insn->vectorExtensionType = TYPE_XOP;		insn->vectorExtensionType = TYPE_XOP;
insn->necessaryPrefixLocation = insn->readerCursor - 1;		else
} else {
unconsumeByte(insn);		unconsumeByte(insn);
insn->necessaryPrefixLocation = insn->readerCursor - 1;
}

if (insn->vectorExtensionType == TYPE_XOP) {		if (insn->vectorExtensionType == TYPE_XOP) {
insn->vectorExtensionPrefix[0] = byte;		insn->vectorExtensionPrefix[0] = byte;
consumeByte(insn, &insn->vectorExtensionPrefix[1]);		consumeByte(insn, &insn->vectorExtensionPrefix[1]);
consumeByte(insn, &insn->vectorExtensionPrefix[2]);		consumeByte(insn, &insn->vectorExtensionPrefix[2]);

/* We simulate the REX prefix for simplicity's sake */		/* We simulate the REX prefix for simplicity's sake */

if (insn->mode == MODE_64BIT) {		if (insn->mode == MODE_64BIT)
insn->rexPrefix = 0x40		insn->rexPrefix = 0x40
| (wFromXOP3of3(insn->vectorExtensionPrefix[2]) << 3)		| (wFromXOP3of3(insn->vectorExtensionPrefix[2]) << 3)
| (rFromXOP2of3(insn->vectorExtensionPrefix[1]) << 2)		| (rFromXOP2of3(insn->vectorExtensionPrefix[1]) << 2)
| (xFromXOP2of3(insn->vectorExtensionPrefix[1]) << 1)		| (xFromXOP2of3(insn->vectorExtensionPrefix[1]) << 1)
| (bFromXOP2of3(insn->vectorExtensionPrefix[1]) << 0);		| (bFromXOP2of3(insn->vectorExtensionPrefix[1]) << 0);
}

switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {		switch (ppFromXOP3of3(insn->vectorExtensionPrefix[2])) {
default:		default:
break;		break;
case VEX_PREFIX_66:		case VEX_PREFIX_66:
hasOpSize = true;		insn->hasOpSize = true;
break;		break;
}		}

dbgprintf(insn, "Found XOP prefix 0x%hhx 0x%hhx 0x%hhx",		dbgprintf(insn, "Found XOP prefix 0x%hhx 0x%hhx 0x%hhx",
insn->vectorExtensionPrefix[0], insn->vectorExtensionPrefix[1],		insn->vectorExtensionPrefix[0], insn->vectorExtensionPrefix[1],
insn->vectorExtensionPrefix[2]);		insn->vectorExtensionPrefix[2]);
}		}
} else {		} else if (isREX(insn, byte)) {
if (insn->mode == MODE_64BIT) {		if (lookAtByte(insn, &nextByte))
if ((byte & 0xf0) == 0x40) {
uint8_t opcodeByte;

if (lookAtByte(insn, &opcodeByte) || ((opcodeByte & 0xf0) == 0x40)) {
dbgprintf(insn, "Redundant REX prefix");
return -1;		return -1;
}

insn->rexPrefix = byte;		insn->rexPrefix = byte;
insn->necessaryPrefixLocation = insn->readerCursor - 2;

dbgprintf(insn, "Found REX prefix 0x%hhx", byte);		dbgprintf(insn, "Found REX prefix 0x%hhx", byte);
} else {		} else
unconsumeByte(insn);		unconsumeByte(insn);
insn->necessaryPrefixLocation = insn->readerCursor - 1;
}
} else {
unconsumeByte(insn);
insn->necessaryPrefixLocation = insn->readerCursor - 1;
}
}

if (insn->mode == MODE_16BIT) {		if (insn->mode == MODE_16BIT) {
insn->registerSize = (hasOpSize ? 4 : 2);		insn->registerSize = (insn->hasOpSize ? 4 : 2);
insn->addressSize = (hasAdSize ? 4 : 2);		insn->addressSize = (insn->hasAdSize ? 4 : 2);
insn->displacementSize = (hasAdSize ? 4 : 2);		insn->displacementSize = (insn->hasAdSize ? 4 : 2);
insn->immediateSize = (hasOpSize ? 4 : 2);		insn->immediateSize = (insn->hasOpSize ? 4 : 2);
} else if (insn->mode == MODE_32BIT) {		} else if (insn->mode == MODE_32BIT) {
insn->registerSize = (hasOpSize ? 2 : 4);		insn->registerSize = (insn->hasOpSize ? 2 : 4);
insn->addressSize = (hasAdSize ? 2 : 4);		insn->addressSize = (insn->hasAdSize ? 2 : 4);
insn->displacementSize = (hasAdSize ? 2 : 4);		insn->displacementSize = (insn->hasAdSize ? 2 : 4);
insn->immediateSize = (hasOpSize ? 2 : 4);		insn->immediateSize = (insn->hasOpSize ? 2 : 4);
} else if (insn->mode == MODE_64BIT) {		} else if (insn->mode == MODE_64BIT) {
if (insn->rexPrefix && wFromREX(insn->rexPrefix)) {		if (insn->rexPrefix && wFromREX(insn->rexPrefix)) {
insn->registerSize = 8;		insn->registerSize = 8;
insn->addressSize = (hasAdSize ? 4 : 8);		insn->addressSize = (insn->hasAdSize ? 4 : 8);
insn->displacementSize = 4;		insn->displacementSize = 4;
insn->immediateSize = 4;		insn->immediateSize = 4;
} else {		} else {
insn->registerSize = (hasOpSize ? 2 : 4);		insn->registerSize = (insn->hasOpSize ? 2 : 4);
insn->addressSize = (hasAdSize ? 4 : 8);		insn->addressSize = (insn->hasAdSize ? 4 : 8);
insn->displacementSize = (hasOpSize ? 2 : 4);		insn->displacementSize = (insn->hasOpSize ? 2 : 4);
insn->immediateSize = (hasOpSize ? 2 : 4);		insn->immediateSize = (insn->hasOpSize ? 2 : 4);
}		}
}		}

return 0;		return 0;
}		}
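
Note: the size selection at the end of `readPrefixes` amounts to a small table keyed on mode, the OpSize and AdSize overrides, and REX.W. The sketch below restates it as a pure function for reference. `Mode`, `OperandSizes` and `computeSizes` are hypothetical names introduced for this illustration, and `rexW` stands for the `insn->rexPrefix && wFromREX(insn->rexPrefix)` test in the patch.

```cpp
// Hypothetical stand-in for DisassemblerMode.
enum class Mode { Bits16, Bits32, Bits64 };

// All sizes are in bytes, as in InternalInstruction.
struct OperandSizes {
  unsigned registerSize;
  unsigned addressSize;
  unsigned displacementSize;
  unsigned immediateSize;
};

OperandSizes computeSizes(Mode mode, bool hasOpSize, bool hasAdSize,
                          bool rexW) {
  OperandSizes s{};
  switch (mode) {
  case Mode::Bits16:
    // In 16-bit mode the overrides widen to 32 bits.
    s.registerSize = hasOpSize ? 4u : 2u;
    s.addressSize = hasAdSize ? 4u : 2u;
    s.displacementSize = hasAdSize ? 4u : 2u;
    s.immediateSize = hasOpSize ? 4u : 2u;
    break;
  case Mode::Bits32:
    // In 32-bit mode the overrides narrow to 16 bits.
    s.registerSize = hasOpSize ? 2u : 4u;
    s.addressSize = hasAdSize ? 2u : 4u;
    s.displacementSize = hasAdSize ? 2u : 4u;
    s.immediateSize = hasOpSize ? 2u : 4u;
    break;
  case Mode::Bits64:
    s.addressSize = hasAdSize ? 4u : 8u;
    if (rexW) {
      // REX.W selects 64-bit registers; the OpSize override is not consulted.
      s.registerSize = 8;
      s.displacementSize = 4;
      s.immediateSize = 4;
    } else {
      s.registerSize = hasOpSize ? 2u : 4u;
      s.displacementSize = hasOpSize ? 2u : 4u;
      s.immediateSize = hasOpSize ? 2u : 4u;
    }
    break;
  }
  return s;
}
```

The only change the patch makes to this logic is where the inputs come from: `hasOpSize` and `hasAdSize` are now fields of the instruction instead of locals in `readPrefixes`.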

/*		/*
* readOpcode - Reads the opcode (excepting the ModR/M byte in the case of		* readOpcode - Reads the opcode (excepting the ModR/M byte in the case of
[...]	if (current == 0x38) {
return -1;		return -1;

insn->opcodeType = THREEBYTE_3A;		insn->opcodeType = THREEBYTE_3A;
} else {		} else {
dbgprintf(insn, "Didn't find a three-byte escape prefix");		dbgprintf(insn, "Didn't find a three-byte escape prefix");

insn->opcodeType = TWOBYTE;		insn->opcodeType = TWOBYTE;
}		}
}		} else if (insn->mandatoryPrefix)
		// The opcode with mandatory prefix must start with opcode escape.
		// If not it's legacy repeat prefix
		insn->mandatoryPrefix = 0;

/*		/*
* At this point we have consumed the full opcode.		* At this point we have consumed the full opcode.
* Anything we consume from here on must be unconsumed.		* Anything we consume from here on must be unconsumed.
*/		*/

insn->opcode = current;		insn->opcode = current;

[...]	if (insn->vectorExtensionType == TYPE_EVEX) {
break;		break;
}		}

if (lFromXOP3of3(insn->vectorExtensionPrefix[2]))		if (lFromXOP3of3(insn->vectorExtensionPrefix[2]))
attrMask |= ATTR_VEXL;		attrMask |= ATTR_VEXL;
} else {		} else {
return -1;		return -1;
}		}
} else {		} else if (!insn->mandatoryPrefix) {
if (insn->mode != MODE_16BIT && isPrefixAtLocation(insn, 0x66, insn->necessaryPrefixLocation))		// If we don't have mandatory prefix we should use legacy prefixes here
		if (insn->hasOpSize && (insn->mode != MODE_16BIT))
attrMask |= ATTR_OPSIZE;		attrMask |= ATTR_OPSIZE;
else if (isPrefixAtLocation(insn, 0x67, insn->necessaryPrefixLocation))		if (insn->hasAdSize)
attrMask |= ATTR_ADSIZE;		attrMask |= ATTR_ADSIZE;
else if (isPrefixAtLocation(insn, 0xf3, insn->necessaryPrefixLocation))		if (insn->opcodeType == ONEBYTE) {
		if (insn->repeatPrefix == 0xf3 && (insn->opcode == 0x90))
		// Special support for PAUSE
attrMask |= ATTR_XS;		attrMask |= ATTR_XS;
else if (isPrefixAtLocation(insn, 0xf2, insn->necessaryPrefixLocation))		} else {
		if (insn->repeatPrefix == 0xf2)
		attrMask |= ATTR_XD;
		else if (insn->repeatPrefix == 0xf3)
		attrMask |= ATTR_XS;
		}
		} else {
		switch (insn->mandatoryPrefix) {
		case 0xf2:
attrMask |= ATTR_XD;		attrMask |= ATTR_XD;
		break;
		case 0xf3:
		attrMask |= ATTR_XS;
		break;
		case 0x66:
		if (insn->mode != MODE_16BIT)
		attrMask |= ATTR_OPSIZE;
		break;
		case 0x67:
		attrMask |= ATTR_ADSIZE;
		break;
		}
}		}

if (insn->rexPrefix & 0x08)		if (insn->rexPrefix & 0x08)
attrMask |= ATTR_REXW;		attrMask |= ATTR_REXW;

/*		/*
* JCXZ/JECXZ need special handling for 16-bit mode because the meaning		* JCXZ/JECXZ need special handling for 16-bit mode because the meaning
* of the AdSize prefix is inverted w.r.t. 32-bit mode.		* of the AdSize prefix is inverted w.r.t. 32-bit mode.
*/		*/
if (insn->mode == MODE_16BIT && insn->opcodeType == ONEBYTE &&		if (insn->mode == MODE_16BIT && insn->opcodeType == ONEBYTE &&
insn->opcode == 0xE3)		insn->opcode == 0xE3)
attrMask ^= ATTR_ADSIZE;		attrMask ^= ATTR_ADSIZE;

/*		/*
* In 64-bit mode all f64 superscripted opcodes ignore opcode size prefix		* In 64-bit mode all f64 superscripted opcodes ignore opcode size prefix
* CALL/JMP/JCC instructions need to ignore 0x66 and consume 4 bytes		* CALL/JMP/JCC instructions need to ignore 0x66 and consume 4 bytes
*/		*/

if (insn->mode == MODE_64BIT &&		if ((insn->mode == MODE_64BIT) && insn->hasOpSize) {
isPrefixAtLocation(insn, 0x66, insn->necessaryPrefixLocation)) {
switch (insn->opcode) {		switch (insn->opcode) {
case 0xE8:		case 0xE8:
case 0xE9:		case 0xE9:
// Take care of psubsb and other mmx instructions.		// Take care of psubsb and other mmx instructions.
if (insn->opcodeType == ONEBYTE) {		if (insn->opcodeType == ONEBYTE) {
attrMask ^= ATTR_OPSIZE;		attrMask ^= ATTR_OPSIZE;
insn->immediateSize = 4;		insn->immediateSize = 4;
insn->displacementSize = 4;		insn->displacementSize = 4;
[...]	static int getID(struct InternalInstruction* insn, const void *miiArg) {
* Absolute moves need special handling.		* Absolute moves need special handling.
* -For 16-bit mode because the meaning of the AdSize and OpSize prefixes are		* -For 16-bit mode because the meaning of the AdSize and OpSize prefixes are
* inverted w.r.t.		* inverted w.r.t.
* -For 32-bit mode we need to ensure the ADSIZE prefix is observed in		* -For 32-bit mode we need to ensure the ADSIZE prefix is observed in
* any position.		* any position.
*/		*/
if (insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0)) {		if (insn->opcodeType == ONEBYTE && ((insn->opcode & 0xFC) == 0xA0)) {
/* Make sure we observed the prefixes in any position. */		/* Make sure we observed the prefixes in any position. */
if (insn->prefixPresent[0x67])		if (insn->hasAdSize)
attrMask |= ATTR_ADSIZE;		attrMask |= ATTR_ADSIZE;
if (insn->prefixPresent[0x66])		if (insn->hasOpSize)
attrMask |= ATTR_OPSIZE;		attrMask |= ATTR_OPSIZE;

/* In 16-bit, invert the attributes. */		/* In 16-bit, invert the attributes. */
if (insn->mode == MODE_16BIT)		if (insn->mode == MODE_16BIT)
attrMask ^= ATTR_ADSIZE | ATTR_OPSIZE;		attrMask ^= ATTR_ADSIZE | ATTR_OPSIZE;

if (getIDWithAttrMask(&instructionID, insn, attrMask))		if (getIDWithAttrMask(&instructionID, insn, attrMask))
return -1;		return -1;

insn->instructionID = instructionID;		insn->instructionID = instructionID;
insn->spec = specifierForUID(instructionID);		insn->spec = specifierForUID(instructionID);
return 0;		return 0;
}		}

if ((insn->mode == MODE_16BIT || insn->prefixPresent[0x66]) &&		if ((insn->mode == MODE_16BIT || insn->hasOpSize) &&
!(attrMask & ATTR_OPSIZE)) {		!(attrMask & ATTR_OPSIZE)) {
/*		/*
* The instruction tables make no distinction between instructions that		* The instruction tables make no distinction between instructions that
* allow OpSize anywhere (i.e., 16-bit operations) and that need it in a		* allow OpSize anywhere (i.e., 16-bit operations) and that need it in a
* particular spot (i.e., many MMX operations). In general we're		* particular spot (i.e., many MMX operations). In general we're
* conservative, but in the specific case where OpSize is present but not		* conservative, but in the specific case where OpSize is present but not
* in the right place we check if there's a 16-bit operation.		* in the right place we check if there's a 16-bit operation.
*/		*/
[16 lines not shown]	if (getIDWithAttrMask(&instructionIDWithOpsize,
insn->spec = spec;		insn->spec = spec;
return 0;		return 0;
}		}

specName = GetInstrName(instructionID, miiArg);		specName = GetInstrName(instructionID, miiArg);
specWithOpSizeName = GetInstrName(instructionIDWithOpsize, miiArg);		specWithOpSizeName = GetInstrName(instructionIDWithOpsize, miiArg);

if (is16BitEquivalent(specName.data(), specWithOpSizeName.data()) &&		if (is16BitEquivalent(specName.data(), specWithOpSizeName.data()) &&
(insn->mode == MODE_16BIT) ^ insn->prefixPresent[0x66]) {		(insn->mode == MODE_16BIT) ^ insn->hasOpSize) {
insn->instructionID = instructionIDWithOpsize;		insn->instructionID = instructionIDWithOpsize;
insn->spec = specifierForUID(instructionIDWithOpsize);		insn->spec = specifierForUID(instructionIDWithOpsize);
} else {		} else {
insn->instructionID = instructionID;		insn->instructionID = instructionID;
insn->spec = spec;		insn->spec = spec;
}		}
return 0;		return 0;
}		}
[781 lines not shown]
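
For readers following the absolute-move hunk above, the attribute-mask computation can be sketched in isolation. This is a minimal sketch, not the decoder's code: `isAbsoluteMove` and `absMoveAttrMask` are hypothetical helper names, and the `ATTR_*` bit values are placeholders chosen for the example rather than values taken from the diff.

```cpp
#include <cstdint>

// Decoder modes, named as in the diff.
enum Mode { MODE_16BIT, MODE_32BIT, MODE_64BIT };

// Placeholder bit values (assumption); only their distinctness matters here.
const unsigned ATTR_OPSIZE = 0x10;
const unsigned ATTR_ADSIZE = 0x20;

// Hypothetical helper: one-byte opcodes 0xA0..0xA3 are the absolute moves,
// which is what the (opcode & 0xFC) == 0xA0 test in the diff selects.
inline bool isAbsoluteMove(uint8_t Opcode) { return (Opcode & 0xFC) == 0xA0; }

// Hypothetical helper mirroring the hunk: record each prefix regardless of
// its position, then invert both attributes in 16-bit mode.
inline unsigned absMoveAttrMask(Mode M, bool HasOpSize, bool HasAdSize,
                                unsigned AttrMask = 0) {
  if (HasAdSize)
    AttrMask |= ATTR_ADSIZE;
  if (HasOpSize)
    AttrMask |= ATTR_OPSIZE;
  if (M == MODE_16BIT)
    AttrMask ^= ATTR_ADSIZE | ATTR_OPSIZE;
  return AttrMask;
}
```

In 16-bit mode with neither prefix present, the sketch returns both attributes set, which matches the "inverted" wording of the comment in the diff.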

lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp

[44 lines not shown]	void X86ATTInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
const MCInstrDesc &Desc = MII.get(MI->getOpcode());		const MCInstrDesc &Desc = MII.get(MI->getOpcode());
uint64_t TSFlags = Desc.TSFlags;		uint64_t TSFlags = Desc.TSFlags;

// If verbose assembly is enabled, we can print some informative comments.		// If verbose assembly is enabled, we can print some informative comments.
if (CommentStream)		if (CommentStream)
HasCustomInstComment =		HasCustomInstComment =
EmitAnyX86InstComments(MI, *CommentStream, getRegisterName);		EmitAnyX86InstComments(MI, *CommentStream, getRegisterName);

		unsigned Flags = MI->getFlags();
if (TSFlags & X86II::LOCK)		if (TSFlags & X86II::LOCK)
OS << "\tlock\t";		OS << "\tlock\t";
		if (!(TSFlags & X86II::LOCK) && Flags & X86::IP_HAS_LOCK)
		OS << "\tlock\n";

		if (Flags & X86::IP_HAS_REPEAT_NE)
		OS << "\trepne\n";
		else if (Flags & X86::IP_HAS_REPEAT)
		OS << "\trep\n";

// Output CALLpcrel32 as "callq" in 64-bit mode.		// Output CALLpcrel32 as "callq" in 64-bit mode.
// In Intel annotation it's always emitted as "call".		// In Intel annotation it's always emitted as "call".
//		//
// TODO: Probably this hack should be redesigned via InstAlias in		// TODO: Probably this hack should be redesigned via InstAlias in
// InstrInfo.td as soon as Requires clause is supported properly		// InstrInfo.td as soon as Requires clause is supported properly
// for InstAlias.		// for InstAlias.
if (MI->getOpcode() == X86::CALLpcrel32 &&		if (MI->getOpcode() == X86::CALLpcrel32 &&
[252 lines not shown]
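
The prefix printing added to the printer above can be read as a small pure function from flag bits to text. The following is a minimal sketch under that assumption: `prefixText` is a hypothetical name, the enum repeats only the three values it needs from `IPREFIXES`, and the `TSFlags`-driven `lock` path is left out.

```cpp
#include <string>

// Subset of the IPREFIXES values from X86BaseInfo.h in this diff.
enum : unsigned { IP_HAS_REPEAT_NE = 4, IP_HAS_REPEAT = 8, IP_HAS_LOCK = 16 };

// Hypothetical helper: build the prefix lines that precede the mnemonic.
// As in the diff, repne is checked first and rep only as the alternative.
inline std::string prefixText(unsigned Flags) {
  std::string Out;
  if (Flags & IP_HAS_LOCK)
    Out += "\tlock\n";
  if (Flags & IP_HAS_REPEAT_NE)
    Out += "\trepne\n";
  else if (Flags & IP_HAS_REPEAT)
    Out += "\trep\n";
  return Out;
}
```

Because of the `else if`, an instruction carrying both repeat flags prints only `repne` in this sketch.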

lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp

[37 lines not shown]	void X86IntelInstPrinter::printInst(const MCInst *MI, raw_ostream &OS,
StringRef Annot,		StringRef Annot,
const MCSubtargetInfo &STI) {		const MCSubtargetInfo &STI) {
const MCInstrDesc &Desc = MII.get(MI->getOpcode());		const MCInstrDesc &Desc = MII.get(MI->getOpcode());
uint64_t TSFlags = Desc.TSFlags;		uint64_t TSFlags = Desc.TSFlags;

if (TSFlags & X86II::LOCK)		if (TSFlags & X86II::LOCK)
OS << "\tlock\n";		OS << "\tlock\n";

		unsigned Flags = MI->getFlags();
		if (Flags & X86::IP_HAS_REPEAT_NE)
		OS << "\trepne\n";
		else if (Flags & X86::IP_HAS_REPEAT)
		OS << "\trep\n";

printInstruction(MI, OS);		printInstruction(MI, OS);

// Next always print the annotation.		// Next always print the annotation.
printAnnotation(OS, Annot);		printAnnotation(OS, Annot);

// If verbose assembly is enabled, we can print some informative comments.		// If verbose assembly is enabled, we can print some informative comments.
if (CommentStream)		if (CommentStream)
EmitAnyX86InstComments(MI, *CommentStream, getRegisterName);		EmitAnyX86InstComments(MI, *CommentStream, getRegisterName);
[210 lines not shown]

lib/Target/X86/MCTargetDesc/X86BaseInfo.h

[45 lines not shown]	namespace X86 {
/// avx512fintrin.h.		/// avx512fintrin.h.
enum STATIC_ROUNDING {		enum STATIC_ROUNDING {
TO_NEAREST_INT = 0,		TO_NEAREST_INT = 0,
TO_NEG_INF = 1,		TO_NEG_INF = 1,
TO_POS_INF = 2,		TO_POS_INF = 2,
TO_ZERO = 3,		TO_ZERO = 3,
CUR_DIRECTION = 4		CUR_DIRECTION = 4
};		};

		/// The constants to describe instr prefixes if there are
		enum IPREFIXES {
		IP_NO_PREFIX = 0,
		IP_HAS_OP_SIZE = 1,
		IP_HAS_AD_SIZE = 2,
		IP_HAS_REPEAT_NE = 4,
		IP_HAS_REPEAT = 8,
		IP_HAS_LOCK = 16
		};
} // end namespace X86;		} // end namespace X86;

/// X86II - This namespace holds all of the target specific flags that		/// X86II - This namespace holds all of the target specific flags that
/// instruction info tracks.		/// instruction info tracks.
///		///
namespace X86II {		namespace X86II {
/// Target Operand Flag enum.		/// Target Operand Flag enum.
enum TOF {		enum TOF {
[728 lines not shown]
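
Since each `IPREFIXES` value above is a distinct bit, several prefixes can share one `unsigned` flags word. A minimal sketch of that usage follows; the enum values are copied from the diff, while `setPrefix` and `hasPrefix` are hypothetical helpers that do not appear in the patch.

```cpp
// Values copied from the IPREFIXES enum in this diff.
enum IPREFIXES : unsigned {
  IP_NO_PREFIX = 0,
  IP_HAS_OP_SIZE = 1,
  IP_HAS_AD_SIZE = 2,
  IP_HAS_REPEAT_NE = 4,
  IP_HAS_REPEAT = 8,
  IP_HAS_LOCK = 16
};

// Hypothetical helper: return Flags with prefix bit P added.
constexpr unsigned setPrefix(unsigned Flags, IPREFIXES P) { return Flags | P; }

// Hypothetical helper: test whether prefix bit P is present in Flags.
constexpr bool hasPrefix(unsigned Flags, IPREFIXES P) {
  return (Flags & P) != 0;
}
```

For example, `setPrefix(setPrefix(IP_NO_PREFIX, IP_HAS_LOCK), IP_HAS_REPEAT)` yields a word for which `hasPrefix` reports both `IP_HAS_LOCK` and `IP_HAS_REPEAT`.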

lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp

[1,102 lines not shown]	bool X86MCCodeEmitter::emitOpcodePrefix(uint64_t TSFlags, unsigned &CurByte,
raw_ostream &OS) const {		raw_ostream &OS) const {
bool Ret = false;		bool Ret = false;
// Emit the operand size opcode prefix as needed.		// Emit the operand size opcode prefix as needed.
if ((TSFlags & X86II::OpSizeMask) == (is16BitMode(STI) ? X86II::OpSize32		if ((TSFlags & X86II::OpSizeMask) == (is16BitMode(STI) ? X86II::OpSize32
: X86II::OpSize16))		: X86II::OpSize16))
EmitByte(0x66, CurByte, OS);		EmitByte(0x66, CurByte, OS);

// Emit the LOCK opcode prefix.		// Emit the LOCK opcode prefix.
if (TSFlags & X86II::LOCK)		if (TSFlags & X86II::LOCK || MI.getFlags() & X86::IP_HAS_LOCK)
EmitByte(0xF0, CurByte, OS);		EmitByte(0xF0, CurByte, OS);

switch (TSFlags & X86II::OpPrefixMask) {		switch (TSFlags & X86II::OpPrefixMask) {
case X86II::PD: // 66		case X86II::PD: // 66
EmitByte(0x66, CurByte, OS);		EmitByte(0x66, CurByte, OS);
break;		break;
case X86II::XS: // F3		case X86II::XS: // F3
EmitByte(0xF3, CurByte, OS);		EmitByte(0xF3, CurByte, OS);
[34 lines not shown]

void X86MCCodeEmitter::		void X86MCCodeEmitter::
encodeInstruction(const MCInst &MI, raw_ostream &OS,		encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,		SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const {		const MCSubtargetInfo &STI) const {
unsigned Opcode = MI.getOpcode();		unsigned Opcode = MI.getOpcode();
const MCInstrDesc &Desc = MCII.get(Opcode);		const MCInstrDesc &Desc = MCII.get(Opcode);
uint64_t TSFlags = Desc.TSFlags;		uint64_t TSFlags = Desc.TSFlags;
		unsigned Flags = MI.getFlags();

// Pseudo instructions don't get encoded.		// Pseudo instructions don't get encoded.
if ((TSFlags & X86II::FormMask) == X86II::Pseudo)		if ((TSFlags & X86II::FormMask) == X86II::Pseudo)
return;		return;

unsigned NumOps = Desc.getNumOperands();		unsigned NumOps = Desc.getNumOperands();
unsigned CurOp = X86II::getOperandBias(Desc);		unsigned CurOp = X86II::getOperandBias(Desc);

[19 lines not shown]	encodeInstruction(const MCInst &MI, raw_ostream &OS,
if (MemoryOperand != -1) MemoryOperand += CurOp;		if (MemoryOperand != -1) MemoryOperand += CurOp;

// Emit segment override opcode prefix as needed.		// Emit segment override opcode prefix as needed.
if (MemoryOperand >= 0)		if (MemoryOperand >= 0)
EmitSegmentOverridePrefix(CurByte, MemoryOperand+X86::AddrSegmentReg,		EmitSegmentOverridePrefix(CurByte, MemoryOperand+X86::AddrSegmentReg,
MI, OS);		MI, OS);

// Emit the repeat opcode prefix as needed.		// Emit the repeat opcode prefix as needed.
if (TSFlags & X86II::REP)		if (TSFlags & X86II::REP || Flags & X86::IP_HAS_REPEAT)
EmitByte(0xF3, CurByte, OS);		EmitByte(0xF3, CurByte, OS);
		if (Flags & X86::IP_HAS_REPEAT_NE)
		EmitByte(0xF2, CurByte, OS);

// Emit the address size opcode prefix as needed.		// Emit the address size opcode prefix as needed.
bool need_address_override;		bool need_address_override;
uint64_t AdSize = TSFlags & X86II::AdSizeMask;		uint64_t AdSize = TSFlags & X86II::AdSizeMask;
if ((is16BitMode(STI) && AdSize == X86II::AdSize32) ||		if ((is16BitMode(STI) && AdSize == X86II::AdSize32) ||
(is32BitMode(STI) && AdSize == X86II::AdSize16) ||		(is32BitMode(STI) && AdSize == X86II::AdSize16) ||
(is64BitMode(STI) && AdSize == X86II::AdSize32)) {		(is64BitMode(STI) && AdSize == X86II::AdSize32)) {
need_address_override = true;		need_address_override = true;
[329 lines not shown]
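
The emitter changes above map `MCInst` flag bits to prefix bytes: `0xF3` for REP, `0xF2` for REPNE and `0xF0` for LOCK. A minimal sketch of that mapping, assuming only the per-instruction flags matter (the `TSFlags` checks, segment overrides and size prefixes are omitted), with `prefixBytesFor` as a hypothetical helper:

```cpp
#include <cstdint>
#include <vector>

// Subset of the IPREFIXES values from X86BaseInfo.h in this diff.
enum : unsigned { IP_HAS_REPEAT_NE = 4, IP_HAS_REPEAT = 8, IP_HAS_LOCK = 16 };

// Hypothetical helper: collect the prefix bytes implied by Flags.
// The repeat bytes come first because, in the diff, encodeInstruction
// emits them before emitOpcodePrefix handles LOCK.
inline std::vector<uint8_t> prefixBytesFor(unsigned Flags) {
  std::vector<uint8_t> Bytes;
  if (Flags & IP_HAS_REPEAT)
    Bytes.push_back(0xF3);
  if (Flags & IP_HAS_REPEAT_NE)
    Bytes.push_back(0xF2);
  if (Flags & IP_HAS_LOCK)
    Bytes.push_back(0xF0);
  return Bytes;
}
```

With this shape the prefix bytes land in the same buffer as the opcode, which is what the updated expectations such as `[0xf3,0xa5]` in test/MC/X86/x86-64.s check.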

test/MC/Disassembler/X86/prefixes.txt

	# RUN: llvm-mc --disassemble %s -triple=x86_64 | FileCheck %s			# RUN: llvm-mc --disassemble %s -triple=x86_64 | FileCheck %s

				# CHECK: rep
				# CHECK-NEXT: insb %dx, %es:(%rdi)
				0xf3 0x6c #rep ins
				# CHECK: rep
				# CHECK-NEXT: insl %dx, %es:(%rdi)
				0xf3 0x6d #rep ins
				# CHECK: rep
				# CHECK-NEXT: movsb (%rsi), %es:(%rdi)
				0xf3 0xa4 #rep movs
				# CHECK: rep
				# CHECK-NEXT: movsl (%rsi), %es:(%rdi)
				0xf3 0xa5 #rep movs
				# CHECK: rep
				# CHECK-NEXT: outsb (%rsi), %dx
				0xf3 0x6e #rep outs
				# CHECK: rep
				# CHECK-NEXT: outsl (%rsi), %dx
				0xf3 0x6f #rep outs
				# CHECK: rep
				# CHECK-NEXT: lodsb (%rsi), %al
				0xf3 0xac #rep lods
				# CHECK: rep
				# CHECK-NEXT: lodsl (%rsi), %eax
				0xf3 0xad #rep lods
				# CHECK: rep
				# CHECK-NEXT: stosb %al, %es:(%rdi)
				0xf3 0xaa #rep stos
				# CHECK: rep
				# CHECK-NEXT: stosl %eax, %es:(%rdi)
				0xf3 0xab #rep stos
				# CHECK: rep
				# CHECK-NEXT: cmpsb %es:(%rdi), (%rsi)
				0xf3 0xa6 #rep cmps
				# CHECK: rep
				# CHECK-NEXT: cmpsl %es:(%rdi), (%rsi)
				0xf3 0xa7 #repe cmps
				# CHECK: rep
				# CHECK-NEXT: scasb %es:(%rdi), %al
				0xf3 0xae #repe scas
				# CHECK: rep
				# CHECK-NEXT: scasl %es:(%rdi), %eax
				0xf3 0xaf #repe scas
				# CHECK: repne
				# CHECK-NEXT: cmpsb %es:(%rdi), (%rsi)
				0xf2 0xa6 #repne cmps
				# CHECK: repne
				# CHECK-NEXT: cmpsl %es:(%rdi), (%rsi)
				0xf2 0xa7 #repne cmps
				# CHECK: repne
				# CHECK-NEXT: scasb %es:(%rdi), %al
				0xf2 0xae #repne scas
				# CHECK: repne
				# CHECK-NEXT: scasl %es:(%rdi), %eax
				0xf2 0xaf #repne scas

	# CHECK: lock			# CHECK: lock
	# CHECK-NEXT: orl $16, %fs:776			# CHECK-NEXT: orl $16, %fs:776
	0xf0 0x64 0x83 0x0c 0x25 0x08 0x03 0x00 0x00 0x10			0xf0 0x64 0x83 0x0c 0x25 0x08 0x03 0x00 0x00 0x10

	# CHECK: movq %fs:768, %rdi			# CHECK: movq %fs:768, %rdi
	0x64 0x48 0x8b 0x3c 0x25 0x00 0x03 0x00 0x00			0x64 0x48 0x8b 0x3c 0x25 0x00 0x03 0x00 0x00

	# CHECK: rep			# CHECK: rep
	[34 lines not shown]
	0xf0 0x90			0xf0 0x90

	# Test that immediate is printed correctly within opsize prefix			# Test that immediate is printed correctly within opsize prefix
	# CHECK: addw $-12, %ax			# CHECK: addw $-12, %ax
	0x66,0x83,0xc0,0xf4			0x66,0x83,0xc0,0xf4

	# Test that multiple redundant prefixes work (redundant, but valid x86).			# Test that multiple redundant prefixes work (redundant, but valid x86).
	# CHECK: rep			# CHECK: rep
	# CHECK-NEXT: rep
	# CHECK-NEXT: stosq			# CHECK-NEXT: stosq
	0xf3 0xf3 0x48 0xab			0xf3 0xf3 0x48 0xab


	# Test that we can disassembler control registers above CR8			# Test that we can disassembler control registers above CR8
	# CHECK: movq %cr15, %rax			# CHECK: movq %cr15, %rax
	0x44 0x0f 0x20 0xf8			0x44 0x0f 0x20 0xf8
	# CHECK: movq %dr15, %rax			# CHECK: movq %dr15, %rax
	[13 lines not shown]
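
The byte sequences in this test file start with prefix bytes such as `0xf3`, `0xf2`, `0xf0` and `0x66`. As a rough illustration of how such leading bytes could be folded into the `IPREFIXES` bits, here is a deliberately simplified sketch. It is not the decoder logic from this patch: `scanPrefixes` and `PrefixScan` are hypothetical, and segment overrides, REX bytes and mandatory-prefix handling are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Values copied from the IPREFIXES enum in this diff.
enum : unsigned {
  IP_HAS_OP_SIZE = 1,
  IP_HAS_AD_SIZE = 2,
  IP_HAS_REPEAT_NE = 4,
  IP_HAS_REPEAT = 8,
  IP_HAS_LOCK = 16
};

// Hypothetical result type: accumulated flag bits and bytes consumed.
struct PrefixScan {
  unsigned Flags;
  std::size_t Consumed;
};

// Hypothetical helper: consume leading prefix bytes, stop at the first
// byte that is not one of the five prefixes handled here.
inline PrefixScan scanPrefixes(const std::vector<uint8_t> &Bytes) {
  PrefixScan R{0, 0};
  for (uint8_t B : Bytes) {
    unsigned Bit = 0;
    switch (B) {
    case 0x66: Bit = IP_HAS_OP_SIZE; break;
    case 0x67: Bit = IP_HAS_AD_SIZE; break;
    case 0xF0: Bit = IP_HAS_LOCK; break;
    case 0xF2: Bit = IP_HAS_REPEAT_NE; break;
    case 0xF3: Bit = IP_HAS_REPEAT; break;
    default: return R; // first non-prefix byte ends the scan
    }
    R.Flags |= Bit;
    ++R.Consumed;
  }
  return R;
}
```

Note that a flags word cannot count repetitions: for `0xf3 0xf3 0x48 0xab` the sketch records a single REP bit, whereas the left-hand column of the test above expects `rep` to be printed twice.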

test/MC/Disassembler/X86/x86-64.txt

	[480 lines not shown]
	# CHECK: lwpval $2309737967, (%esp), %edx			# CHECK: lwpval $2309737967, (%esp), %edx
	0x67 0x8f 0xea 0x68 0x12 0x0c 0x24 0xef 0xcd 0xab 0x89			0x67 0x8f 0xea 0x68 0x12 0x0c 0x24 0xef 0xcd 0xab 0x89

	# CHECK: nopq -559038737(%rbx,%rcx,8)			# CHECK: nopq -559038737(%rbx,%rcx,8)
	0x48 0x0f 0x1f 0x84 0xcb 0xef 0xbe 0xad 0xde			0x48 0x0f 0x1f 0x84 0xcb 0xef 0xbe 0xad 0xde

	# CHECK: nopq %rax			# CHECK: nopq %rax
	0x48 0x0f 0x1f 0xC0			0x48 0x0f 0x1f 0xC0

				# CHECK: xchgw %di, %ax
				0x66 0x3e 0x97

test/MC/X86/intel-syntax-encoding.s

[48 lines not shown]	// CHECK: encoding: [0x66,0x83,0xf8,0xf4]
cmp ax, -12		cmp ax, -12
// CHECK: encoding: [0x83,0xf8,0xf4]		// CHECK: encoding: [0x83,0xf8,0xf4]
cmp eax, -12		cmp eax, -12
// CHECK: encoding: [0x48,0x83,0xf8,0xf4]		// CHECK: encoding: [0x48,0x83,0xf8,0xf4]
cmp rax, -12		cmp rax, -12

acquire lock add [rax], rax		acquire lock add [rax], rax
// CHECK: encoding: [0xf2]		// CHECK: encoding: [0xf2]
// CHECK: encoding: [0xf0]		// CHECK: encoding: [0xf0,0x48,0x01,0x00]
// CHECK: encoding: [0x48,0x01,0x00]
release lock add [rax], rax		release lock add [rax], rax
// CHECK: encoding: [0xf3]		// CHECK: encoding: [0xf3]
// CHECK: encoding: [0xf0]		// CHECK: encoding: [0xf0,0x48,0x01,0x00]
// CHECK: encoding: [0x48,0x01,0x00]

// CHECK: encoding: [0x9c]		// CHECK: encoding: [0x9c]
// CHECK: encoding: [0x9d]		// CHECK: encoding: [0x9d]
pushf		pushf
popf		popf

LBB0_3:		LBB0_3:
// CHECK: encoding: [0xeb,A]		// CHECK: encoding: [0xeb,A]
[27 lines not shown]

test/MC/X86/x86-64.s

	[896 lines not shown]
	// rdar://8741045			// rdar://8741045
	lock/incl 1(%rsp)			lock/incl 1(%rsp)
	// CHECK: lock			// CHECK: lock
	// CHECK: incl 1(%rsp)			// CHECK: incl 1(%rsp)


	lock addq %rsi, (%rdi)			lock addq %rsi, (%rdi)
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: addq %rsi, (%rdi)			// CHECK: addq %rsi, (%rdi)
	// CHECK: encoding: [0x48,0x01,0x37]			// CHECK: encoding: [0xf0,0x48,0x01,0x37]

	lock subq %rsi, (%rdi)			lock subq %rsi, (%rdi)
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: subq %rsi, (%rdi)			// CHECK: subq %rsi, (%rdi)
	// CHECK: encoding: [0x48,0x29,0x37]			// CHECK: encoding: [0xf0,0x48,0x29,0x37]

	lock andq %rsi, (%rdi)			lock andq %rsi, (%rdi)
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: andq %rsi, (%rdi)			// CHECK: andq %rsi, (%rdi)
	// CHECK: encoding: [0x48,0x21,0x37]			// CHECK: encoding: [0xf0,0x48,0x21,0x37]

	lock orq %rsi, (%rdi)			lock orq %rsi, (%rdi)
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: orq %rsi, (%rdi)			// CHECK: orq %rsi, (%rdi)
	// CHECK: encoding: [0x48,0x09,0x37]			// CHECK: encoding: [0xf0,0x48,0x09,0x37]

	lock xorq %rsi, (%rdi)			lock xorq %rsi, (%rdi)
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: xorq %rsi, (%rdi)			// CHECK: xorq %rsi, (%rdi)
	// CHECK: encoding: [0x48,0x31,0x37]			// CHECK: encoding: [0xf0,0x48,0x31,0x37]

	xacquire lock addq %rax, (%rax)			xacquire lock addq %rax, (%rax)
	// CHECK: xacquire			// CHECK: xacquire
	// CHECK: encoding: [0xf2]			// CHECK: encoding: [0xf2]
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: addq %rax, (%rax)			// CHECK: addq %rax, (%rax)
	// CHECK: encoding: [0x48,0x01,0x00]			// CHECK: encoding: [0xf0,0x48,0x01,0x00]

	xrelease lock addq %rax, (%rax)			xrelease lock addq %rax, (%rax)
	// CHECK: xrelease			// CHECK: xrelease
	// CHECK: encoding: [0xf3]			// CHECK: encoding: [0xf3]
	// CHECK: lock			// CHECK: lock
	// CHECK: encoding: [0xf0]
	// CHECK: addq %rax, (%rax)			// CHECK: addq %rax, (%rax)
	// CHECK: encoding: [0x48,0x01,0x00]			// CHECK: encoding: [0xf0,0x48,0x01,0x00]

	// rdar://8033482			// rdar://8033482
	rep movsl			rep movsl
	// CHECK: rep			// CHECK: rep
	// CHECK: encoding: [0xf3]
	// CHECK: movsl			// CHECK: movsl
	// CHECK: encoding: [0xa5]			// CHECK: encoding: [0xf3,0xa5]


	// rdar://8403974			// rdar://8403974
	iret			iret
	// CHECK: iretl			// CHECK: iretl
	// CHECK: encoding: [0xcf]			// CHECK: encoding: [0xcf]
	iretw			iretw
	// CHECK: iretw			// CHECK: iretw
	[575 lines not shown]

Revision Contents

Diff 118602

include/llvm/MC/MCInst.h

lib/Target/X86/AsmParser/X86AsmParser.cpp

lib/Target/X86/AsmParser/X86Operand.h

lib/Target/X86/Disassembler/X86Disassembler.cpp

lib/Target/X86/Disassembler/X86DisassemblerDecoder.h

lib/Target/X86/Disassembler/X86DisassemblerDecoder.cpp

lib/Target/X86/InstPrinter/X86ATTInstPrinter.cpp

lib/Target/X86/InstPrinter/X86IntelInstPrinter.cpp

lib/Target/X86/MCTargetDesc/X86BaseInfo.h

lib/Target/X86/MCTargetDesc/X86MCCodeEmitter.cpp

test/MC/Disassembler/X86/prefixes.txt

test/MC/Disassembler/X86/x86-64.txt

test/MC/X86/intel-syntax-encoding.s

test/MC/X86/x86-64.s
