This is an archive of the discontinued LLVM Phabricator instance.

Support Intel "l" suffixes for x86_64 R8-R15 registers.
AbandonedPublic

Authored by mtrent on Dec 4 2019, 9:26 PM.

Details

Reviewers

ab
pete
hliao
rnk
jyknight

Summary

Intel's 64-bit architecture specifies that the low byte of registers r8-r15
can be named using either a "b" suffix ("r8b") or an "l" suffix ("r8l").
This commit adds "l"-suffix alternate strings to the r8b-r15b registers,
using TableGen's Register "AltName" mechanism.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 41979
Build 42323: arc lint + arc unit

Event Timeline

mtrent created this revision. Dec 4 2019, 9:26 PM

Herald added a project: Restricted Project. Dec 4 2019, 9:26 PM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B41903: Diff 232260. Dec 4 2019, 9:32 PM

Test case?

In D71046#1770098, @craig.topper wrote:

Test case?

Suggestion?

Do you have examples of other tools that accept this? I checked the GNU assembler and it didn't accept r8l

In D71046#1771414, @craig.topper wrote:

Do you have examples of other tools that accept this? I checked the GNU assembler and it didn't accept r8l

I don't. I know Apple's (old) GNU-based assembler does not accept r8l. I do not know if Intel provided tools that accept r8l, but that's the most likely candidate. I'm going from some (old) user reports stating it should work, as well as documentation found online, such as:

https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
https://stackoverflow.com/questions/1753602/what-are-the-names-of-the-new-x86-64-processors-registers
https://stackoverflow.com/questions/43991779/why-does-apple-use-r8l-for-the-byte-registers-instead-of-r8b

The first Intel URL documents r8l exclusively. The second Intel URL seems to favor r8b while acknowledging r8l. The Stack Overflow links seem to explain why the world prefers using the AMD register names.

I don't have overly strong feelings about this. If r8l and friends are strictly just alternate strings of r8b this seems like a reasonable request for compatibility with code written using r8l. Again, based on some developer feedback I have, there do exist people who expect r8l to work, for whatever reason. If there were a convenient way to force people to opt into this alternate syntax I could go for that, although I don't know of an existing case that handles this, and I don't think this is worth creating some new flag or classification. If someone with "sufficient authority" were to say this Intel syntax is no longer valid, or if LLVM will not support it, I'm also OK with dropping this request and returning my bug reports as "Not To Be Fixed".

I also found this NASM bug report, where they indicated they wouldn't support it: https://sourceforge.net/p/nasm/bugs/324/

I'm not sure what to do here. I'd like to see at least some other widely used tool supporting this. I worry we'll end up in a situation years from now where other tools try to match clang for what seems to have started as a quirk in Intel's documentation nearly 15 years ago.

Add tests for these alternate registers.

Harbormaster completed remote builds in B41979: Diff 232458. Dec 5 2019, 2:54 PM

In D71046#1771729, @craig.topper wrote:

I'm not sure what to do here. I'd like to see at least some other widely used tool supporting this. I worry we'll end up in a situation years from now where other tools try to match clang for what seems to have started as a quirk in Intel's documentation nearly 15 years ago.

I suppose another way to say it is, we need someone to weigh LLVM's cost of "Allowing code that uses Intel-style register names to exist" against LLVM's cost of "Encouraging Intel-style register names to exist." And this is strictly in the context of x86_64, and not, say, other assembly languages.

In my opinion the cost of code maintenance within LLVM is quite low. TableGen supports alternate strings, and the impact on the parser is negligible. Also, the register names will be canonicalized to the AMD-style names if run through a disassembler pass; folks who write "r8l" will have to read "r8b" in otool, lldb, and other tools. That suggests LLVM isn't bending over backwards to accommodate or encourage these names.

I'm not sure how to settle the cost of "future tools, years from now" against LLVM's karmic account.

Apparently fasm (the "flat assembler"; x64 Linux build, as accessible via "tio.run") accepts the "l" suffix as an alternate form of the r*b registers. Here's a dorky existence proof:

format ELF executable 3
use64

_start:

mov r8l, 0xff
mov r9l, 0xff
mov r10l, 0xff
mov r11l, 0xff
mov r12l, 0xff
mov r13l, 0xff
mov r14l, 0xff
mov r15l, 0xff
mov r8b, r8l
mov r9b, r9l
mov r10b, r10l
mov r11b, r11l
mov r12b, r12l
mov r13b, r13l
mov r14b, r14l
mov r15b, r15l

mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, 13
int 0x80
mov eax, 1

mov ebx, 0
int 0x80

msg db "Hello, World!"

Program output:
Hello, World!

Console:
flat assembler version 1.73.16 (16384 kilobytes memory, x64)
2 passes, 179 bytes.

Real time: 0.008 s
User time: 0.004 s
Sys. time: 0.004 s
CPU share: 100.87 %
Exit code: 0

So there is an example.

I contacted our documentation people yesterday to point out this difference between Intel and AMD documentation. They have agreed to fix this in the next release of the SDM.

Do we know what form that fix will take? And does that affect this PR?

Ping.

Looks like the flat assembler supports it, but doesn't document it as supported? https://flatassembler.net/docs.php?article=manual#2.1.19

I believe the Intel SDM is going to change all references to R8L to be R8B.

Adding @rnk and @jyknight as they had expressed an opinion about this in a brief chat on Discord.

Yes, I had expressed a dislike of adding these aliases, as there's no pressing need to do so.

X86_64 has been around for 20 years now -- and in all that time, none of the widely-used assemblers have supported these aliases. Adding new aliases now just adds to confusion and non-portability, which doesn't really help anyone.

Given that the only thing actually using these register names appears to be documentation which is going to be adjusted, that's even more reason not to do it.

+1, let's not do it.

Very good, I will note this is "not to be fixed" and return the request to support.

mtrent abandoned this revision. Dec 16 2019, 1:09 PM

@mtrent A new Intel SDM was released today that changes the names to R8B..R15B

Revision Contents

Path                                              Size
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp    5 lines
llvm/lib/Target/X86/X86.td                        5 lines
llvm/lib/Target/X86/X86RegisterInfo.td            18 lines
llvm/test/MC/X86/x86_64-reg-alt.s                 42 lines

Diff 232458

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp

[... 987 lines not shown; hunk context: #include "X86GenAsmMatcher.inc" ...]
   bool ParseDirective(AsmToken DirectiveID) override;
 };
 } // end anonymous namespace

 /// @name Auto-generated Match Functions
 /// {

 static unsigned MatchRegisterName(StringRef Name);
+static unsigned MatchRegisterAltName(StringRef Name);

 /// }

 static bool CheckBaseRegAndIndexRegAndScale(unsigned BaseReg, unsigned IndexReg,
                                             unsigned Scale, bool Is64BitMode,
                                             StringRef &ErrMsg) {
   // If we have both a base register and an index register make sure they are
   // both 64-bit or 32-bit registers.
[... 98 lines not shown; hunk context: bool X86AsmParser::ParseRegister(unsigned &RegNo, ...]

   if (Tok.isNot(AsmToken::Identifier)) {
     if (isParsingIntelSyntax()) return true;
     return Error(StartLoc, "invalid register name",
                  SMRange(StartLoc, EndLoc));
   }

   RegNo = MatchRegisterName(Tok.getString());
+  if (RegNo == 0)
+    RegNo = MatchRegisterAltName(Tok.getString());

   // If the match failed, try the register name as lowercase.
   if (RegNo == 0)
     RegNo = MatchRegisterName(Tok.getString().lower());
+  if (RegNo == 0)
+    RegNo = MatchRegisterAltName(Tok.getString().lower());

   // The "flags" and "mxcsr" registers cannot be referenced directly.
   // Treat it as an identifier instead.
   if (isParsingInlineAsm() && isParsingIntelSyntax() &&
       (RegNo == X86::EFLAGS || RegNo == X86::MXCSR))
     RegNo = 0;

   if (!is64BitMode()) {
[... 2,771 lines not shown ...]
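
The fallback order in the ParseRegister hunk above can be modeled outside LLVM. What follows is a minimal sketch, not LLVM's generated matcher: the `Reg` enum, the two lookup tables, and `parseRegisterName` are hypothetical stand-ins for `MatchRegisterName` and `MatchRegisterAltName`, and only two registers are listed for brevity.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical register numbering; 0 means "no match", as in the patch.
enum Reg : unsigned { NoReg = 0, R8B, R9B };

// Stand-in for the primary-name matcher (AMD-style names).
unsigned matchName(const std::string &S) {
  static const std::unordered_map<std::string, unsigned> Table = {
      {"r8b", R8B}, {"r9b", R9B}};
  auto It = Table.find(S);
  return It == Table.end() ? NoReg : It->second;
}

// Stand-in for the alternate-name matcher (Intel-style "l" suffix).
unsigned matchAltName(const std::string &S) {
  static const std::unordered_map<std::string, unsigned> Table = {
      {"r8l", R8B}, {"r9l", R9B}};
  auto It = Table.find(S);
  return It == Table.end() ? NoReg : It->second;
}

// Lowercase a copy of the token.
std::string lower(std::string S) {
  std::transform(S.begin(), S.end(), S.begin(), [](unsigned char C) {
    return static_cast<char>(std::tolower(C));
  });
  return S;
}

// Same order as the patched code: exact name, exact alternate name,
// lowercase name, lowercase alternate name.
unsigned parseRegisterName(const std::string &Tok) {
  unsigned RegNo = matchName(Tok);
  if (RegNo == NoReg)
    RegNo = matchAltName(Tok);
  if (RegNo == NoReg)
    RegNo = matchName(lower(Tok));
  if (RegNo == NoReg)
    RegNo = matchAltName(lower(Tok));
  return RegNo;
}
```

In this model `parseRegisterName("r8l")` and `parseRegisterName("R8L")` both yield `R8B`, while an unknown token such as `"r8x"` yields `NoReg`. Because the primary table is always consulted first, an alternate name can never shadow an existing primary name.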

llvm/lib/Target/X86/X86.td

[... 1,251 lines not shown ...]

 include "X86CallingConv.td"


 //===----------------------------------------------------------------------===//
 // Assembly Parser
 //===----------------------------------------------------------------------===//

+def X86AsmParser : AsmParser {
+  let ShouldEmitMatchRegisterAltName = 1;
+}
+
 def ATTAsmParserVariant : AsmParserVariant {
   int Variant = 0;

   // Variant name.
   string Name = "att";

   // Discard comments in assembly strings.
   string CommentDelimiter = "#";
[... 28 lines not shown ...]
 def IntelAsmWriter : AsmWriter {
   string AsmWriterClassName = "IntelInstPrinter";
   int Variant = 1;
 }

 def X86 : Target {
   // Information about the instructions...
   let InstructionSet = X86InstrInfo;
+  let AssemblyParsers = [X86AsmParser];
   let AssemblyParserVariants = [ATTAsmParserVariant, IntelAsmParserVariant];
   let AssemblyWriters = [ATTAsmWriter, IntelAsmWriter];
   let AllowRegisterRenaming = 1;
 }

 //===----------------------------------------------------------------------===//
 // Pfm Counters
 //===----------------------------------------------------------------------===//

 include "X86PfmCounters.td"

llvm/lib/Target/X86/X86RegisterInfo.td

 //===- X86RegisterInfo.td - Describe the X86 Register File --- tablegen --==//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file describes the X86 Register file, defining the registers themselves,
 // aliases between the registers, and the register classes built out of the
 // registers.
 //
 //===----------------------------------------------------------------------===//

-class X86Reg<string n, bits<16> Enc, list<Register> subregs = []> : Register<n> {
+class X86Reg<string n, bits<16> Enc, list<Register> subregs = [], list<string> alt = []> : Register<n, alt> {
   let Namespace = "X86";
   let HWEncoding = Enc;
   let SubRegs = subregs;
 }

 // Subregister indices.
 let Namespace = "X86" in {
   def sub_8bit : SubRegIndex<8>;
[... 37 lines not shown ...]
 def BH : X86Reg<"bh", 7>;

 // X86-64 only, requires REX.
 let CostPerUse = 1 in {
 def SIL : X86Reg<"sil", 6>;
 def DIL : X86Reg<"dil", 7>;
 def BPL : X86Reg<"bpl", 5>;
 def SPL : X86Reg<"spl", 4>;
-def R8B : X86Reg<"r8b", 8>;
-def R9B : X86Reg<"r9b", 9>;
-def R10B : X86Reg<"r10b", 10>;
-def R11B : X86Reg<"r11b", 11>;
-def R12B : X86Reg<"r12b", 12>;
-def R13B : X86Reg<"r13b", 13>;
-def R14B : X86Reg<"r14b", 14>;
-def R15B : X86Reg<"r15b", 15>;
+def R8B : X86Reg<"r8b", 8, [], ["r8l"]>;
+def R9B : X86Reg<"r9b", 9, [], ["r9l"]>;
+def R10B : X86Reg<"r10b", 10, [], ["r10l"]>;
+def R11B : X86Reg<"r11b", 11, [], ["r11l"]>;
+def R12B : X86Reg<"r12b", 12, [], ["r12l"]>;
+def R13B : X86Reg<"r13b", 13, [], ["r13l"]>;
+def R14B : X86Reg<"r14b", 14, [], ["r14l"]>;
+def R15B : X86Reg<"r15b", 15, [], ["r15l"]>;
 }

 let isArtificial = 1 in {
 // High byte of the low 16 bits of the super-register:
 def SIH : X86Reg<"", -1>;
 def DIH : X86Reg<"", -1>;
 def BPH : X86Reg<"", -1>;
 def SPH : X86Reg<"", -1>;
[... 539 lines not shown ...]
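
The register definitions above attach each "l" spelling to the same record as the "b" spelling, so both share one hardware encoding (8 through 15). As a rough illustration of why the alias cannot change the emitted bytes, here is a sketch that encodes `mov <reg>, imm8` for these registers from the encoding number alone. `encodeMovLowByteImm` is a hypothetical helper written for this note, not an LLVM function, and it assumes the caller passes an encoding in the range 8..15.

```cpp
#include <array>
#include <cstdint>

// Encode `mov <r8b..r15b>, imm8` as REX prefix, opcode, immediate.
// Precondition (assumed, not checked): 8 <= HWEnc <= 15.
std::array<std::uint8_t, 3> encodeMovLowByteImm(unsigned HWEnc,
                                                std::uint8_t Imm) {
  // REX prefix 0100WRXB with only B set: B supplies the high bit of the
  // register number that is embedded in the opcode byte.
  const std::uint8_t Rex = 0x41;
  // MOV r8, imm8 is opcode B0+rb; the low three bits select the register.
  const std::uint8_t Opcode = static_cast<std::uint8_t>(0xB0 + (HWEnc & 7));
  return {Rex, Opcode, Imm};
}
```

The function never sees a register name, only the number 8..15, which is the point: once the parser has resolved either spelling to the same register, everything downstream is identical. For encoding 8 and immediate 0xff the result is the byte sequence 41 B0 FF.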

llvm/test/MC/X86/x86_64-reg-alt.s

This file was added.

				// RUN: llvm-mc -triple x86_64-unknown-unknown %s | FileCheck %s

				movb $0, %r8b
				// CHECK: movb $0, %r8b
				movb $0, %r8l
				// CHECK: movb $0, %r8b

				movb $0, %r9b
				// CHECK: movb $0, %r9b
				movb $0, %r9l
				// CHECK: movb $0, %r9b

				movb $0, %r10b
				// CHECK: movb $0, %r10b
				movb $0, %r10l
				// CHECK: movb $0, %r10b

				movb $0, %r11b
				// CHECK: movb $0, %r11b
				movb $0, %r11l
				// CHECK: movb $0, %r11b

				movb $0, %r12b
				// CHECK: movb $0, %r12b
				movb $0, %r12l
				// CHECK: movb $0, %r12b

				movb $0, %r13b
				// CHECK: movb $0, %r13b
				movb $0, %r13l
				// CHECK: movb $0, %r13b

				movb $0, %r14b
				// CHECK: movb $0, %r14b
				movb $0, %r14l
				// CHECK: movb $0, %r14b

				movb $0, %r15b
				// CHECK: movb $0, %r15b
				movb $0, %r15l
				// CHECK: movb $0, %r15b
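
The test above feeds each register in both spellings and expects the "b" spelling back in every case. A small stand-alone model of that round trip is sketched here, assuming two hypothetical helpers, `parseLowByteReg` and `printLowByteReg`; neither is an LLVM function, and the model covers only the low-byte names of r8-r15.

```cpp
#include <string>

// Parse "r8b".."r15b" or "r8l".."r15l" into the register number 8..15.
// Returns -1 for anything else.
int parseLowByteReg(const std::string &Name) {
  if (Name.size() < 3 || Name.front() != 'r')
    return -1;
  const char Suffix = Name.back();
  if (Suffix != 'b' && Suffix != 'l')
    return -1;
  const std::string Digits = Name.substr(1, Name.size() - 2);
  if (Digits.size() > 2 || Digits[0] == '0')
    return -1;
  for (char C : Digits)
    if (C < '0' || C > '9')
      return -1;
  const int N = std::stoi(Digits);
  return (N >= 8 && N <= 15) ? N : -1;
}

// Print the canonical (AMD-style) name; the "l" spelling is never emitted.
std::string printLowByteReg(int N) {
  return "r" + std::to_string(N) + "b";
}
```

Here `printLowByteReg(parseLowByteReg("r8l"))` gives `"r8b"`, mirroring the `CHECK` lines: the alias is accepted on input only, and output always uses the canonical name.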