Intel's 64-bit architecture specifies that the low byte of registers r8-r15 can
be named using either a "b" suffix ("r8b") or an "l" suffix ("r8l").
This commit adds "l"-suffixed alternate names for the r8b-r15b registers,
using TableGen's Register "AltNames" mechanism.
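The intended behavior can be modeled with a minimal C++ sketch. It assumes a simplified register table in which each register has one primary name, used whenever the register is printed, plus a list of alternate spellings that the parser also accepts. `RegDesc`, `byteRegs`, `parseRegName`, and `printRegName` are hypothetical names for illustration only. They are not the functions TableGen actually generates for the X86 backend, and the register numbers are arbitrary.

```cpp
#include <cassert>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical, simplified register table. This is NOT what TableGen
// generates for LLVM's X86 backend; it only models the behavior.
struct RegDesc {
  unsigned Id;                             // internal register number (arbitrary here)
  std::string_view AsmName;                // primary name, used when printing
  std::vector<std::string_view> AltNames;  // extra spellings the parser accepts
};

// The eight r8b-r15b byte registers, each with an "l"-suffixed alternate name.
const std::vector<RegDesc> &byteRegs() {
  static const std::vector<RegDesc> Regs = {
      {8, "r8b", {"r8l"}},    {9, "r9b", {"r9l"}},
      {10, "r10b", {"r10l"}}, {11, "r11b", {"r11l"}},
      {12, "r12b", {"r12l"}}, {13, "r13b", {"r13l"}},
      {14, "r14b", {"r14l"}}, {15, "r15b", {"r15l"}},
  };
  return Regs;
}

// Parser side: accept either the primary name or any alternate name.
std::optional<unsigned> parseRegName(std::string_view Name) {
  for (const RegDesc &R : byteRegs()) {
    if (R.AsmName == Name)
      return R.Id;
    for (std::string_view Alt : R.AltNames)
      if (Alt == Name)
        return R.Id;
  }
  return std::nullopt;  // not one of these registers
}

// Printer/disassembler side: always emit the primary name.
std::string_view printRegName(unsigned Id) {
  for (const RegDesc &R : byteRegs())
    if (R.Id == Id)
      return R.AsmName;
  assert(false && "unknown register id");
  return {};
}
```

Under this model, `parseRegName("r8l")` and `parseRegName("r8b")` resolve to the same register, and printing that register yields `r8b`. Code written with the "l" spelling is therefore accepted, but any tool that prints registers back out only ever shows the "b" spelling.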
Repository: rG LLVM Github Monorepo
Do you have examples of other tools that accept this? I checked the GNU assembler and it didn't accept r8l.
I don't. I know Apple's (old) GNU-based assembler does not accept r8l. I do not know whether any Intel-provided tools accept r8l, but that's the most likely candidate. I'm going on some (old) user reports stating it should work, as well as documentation found online, such as:
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf
https://software.intel.com/en-us/articles/introduction-to-x64-assembly
https://stackoverflow.com/questions/1753602/what-are-the-names-of-the-new-x86-64-processors-registers
https://stackoverflow.com/questions/43991779/why-does-apple-use-r8l-for-the-byte-registers-instead-of-r8b
The first Intel URL documents r8l exclusively. The second Intel URL seems to favor r8b while acknowledging r8l. The Stack Overflow links seem to explain why the world prefers the AMD register names.
I don't have overly strong feelings about this. If r8l and friends are strictly alternate spellings of r8b and friends, this seems like a reasonable request for compatibility with code written using r8l. Again, based on developer feedback I have received, there are people who expect r8l to work, for whatever reason. If there were a convenient way to make people opt into this alternate syntax I could go for that, although I don't know of an existing case that handles this, and I don't think it is worth creating a new flag or classification for. If someone with "sufficient authority" were to say this Intel syntax is no longer valid, or that LLVM will not support it, I'm also OK with dropping this request and closing my bug reports as "Not To Be Fixed".
I also found this bug report, where NASM indicated they wouldn't support it: https://sourceforge.net/p/nasm/bugs/324/
I'm not sure what to do here. I'd like to see at least some other widely used tool supporting this. I worry we'll end up in a situation years from now where other tools try to match clang for what seems to have started as a quirk in Intel's documentation nearly 15 years ago.
I suppose another way to say it is that we need someone to weigh LLVM's cost of "allowing code that uses Intel-style register names to exist" against LLVM's cost of "encouraging Intel-style register names to exist." And this is strictly in the context of x86_64, not, say, other assembly languages.
In my opinion the cost of code maintenance within LLVM is quite low. TableGen supports alternate names, and the impact on the parser is negligible. Also, the register names will be canonicalized to the AMD-style names when run through a disassembler; folks who write "r8l" will have to read "r8b" in otool, lldb, and other tools. That suggests LLVM isn't bending over backwards to accommodate or encourage these names.
I'm not sure how to settle the cost of "future tools, years from now" against LLVM's karmic account.
Apparently fasm (the "flat assembler"), in its x64 Linux build as available on tio.run, accepts the "l" suffix as an alternate form of the r*b registers. Here's a dorky existence proof:
format ELF executable 3
use64
_start:
mov r8l, 0xff
mov r9l, 0xff
mov r10l, 0xff
mov r11l, 0xff
mov r12l, 0xff
mov r13l, 0xff
mov r14l, 0xff
mov r15l, 0xff
mov r8b, r8l
mov r9b, r9l
mov r10b, r10l
mov r11b, r11l
mov r12b, r12l
mov r13b, r13l
mov r14b, r14l
mov r15b, r15l
mov eax, 4
mov ebx, 1
mov ecx, msg
mov edx, 13
int 0x80
mov eax, 1
mov ebx, 0
int 0x80
msg db "Hello, World!"
Program output:
Hello, World!
Console:
flat assembler version 1.73.16 (16384 kilobytes memory, x64)
2 passes, 179 bytes.
Real time: 0.008 s
User time: 0.004 s
Sys. time: 0.004 s
CPU share: 100.87 %
Exit code: 0
So there is an example.
I contacted our documentation people yesterday to point out this difference between Intel and AMD documentation. They have agreed to fix this in the next release of the SDM.
Looks like the flat assembler supports it, but doesn't document it as supported? https://flatassembler.net/docs.php?article=manual#2.1.19
I believe the Intel SDM is going to change all references to R8L to be R8B.
Adding @rnk and @jyknight as they had expressed an opinion about this in a brief chat on Discord.
Yes, I had expressed a dislike of adding these aliases, as there's no pressing need to do so.
x86-64 has been around for 20 years now, and in all that time none of the widely used assemblers have supported these aliases. Adding new aliases now just adds to confusion and non-portability, which doesn't really help anyone.
Given that the only thing actually using these register names appears to be documentation which is going to be adjusted, that's even more reason not to do it.