This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] X86 mnemonic tables backend
ClosedPublic

Authored by Amir on Mar 13 2022, 11:02 PM.

Details

Summary

Add tablegen backend that generates X86 mnemonic-based opcode groupings, e.g.
isADD, isTEST, etc.

Addresses https://lists.llvm.org/pipermail/llvm-dev/2022-January/154526.html

Diff Detail

Event Timeline

Amir created this revision.Mar 13 2022, 11:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2022, 11:02 PM
Herald added a subscriber: mgorny. · View Herald Transcript
Amir published this revision for review.Mar 13 2022, 11:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2022, 11:10 PM

I was looking at the code in replaceRegWithReg where isAND, isCMP, etc. are used. I don't understand this loop.

for (unsigned Index = InstDesc.getNumDefs() + (HasLHS ? 1 : 0),
              E = InstDesc.getNumOperands();
     Index != E; ++Index) {

HasLHS is set for all AND, ADD, SUB instructions, but some of them are RMW memory operations where getNumDefs() == 0. And there are 5 source address operands. The HasLHS will be 1 and skip the first of those 5 operands. Is that correct?

Amir added a comment.Mar 14 2022, 1:02 PM

I was looking at the code in replaceRegWithReg where isAND, isCMP, etc. are used. I don't understand this loop.

for (unsigned Index = InstDesc.getNumDefs() + (HasLHS ? 1 : 0),
              E = InstDesc.getNumOperands();
     Index != E; ++Index) {

HasLHS is set for all AND, ADD, SUB instructions, but some of them are RMW memory operations where getNumDefs() == 0. And there are 5 source address operands. The HasLHS will be 1 and skip the first of those 5 operands. Is that correct?

Thanks for flagging, it doesn't seem right. The code may have borrowed from a different function (replaceMemOperandWithImm) where the only opcode group having RMW semantics was TESTmr and it was handled separately. For the purposes of replaceRegWithReg, HasLHS logic is not applicable to RMW instructions. I'll address that separately.

Amir retitled this revision from [TableGen][X86] X86 mnemonic tables backend to [TableGen] X86 mnemonic tables backend.Mar 14 2022, 9:24 PM
Amir edited the summary of this revision. (Show Details)
skan added inline comments.Mar 16 2022, 2:58 AM
llvm/utils/TableGen/X86MnemonicTables.cpp
53–59

Use StringRef::take_until

skan added inline comments.Mar 16 2022, 3:12 AM
llvm/utils/TableGen/X86MnemonicTables.cpp
45

Need to check the Form is not neither Pseudo or PrefixByte.

48

Need to check ForceDisassemble is not true for isCodeGenOnly

74

Avoid using switch clause when there is only one instruction.

Amir updated this revision to Diff 416032.Mar 16 2022, 5:24 PM

Feedback

Amir marked 4 inline comments as done.Mar 16 2022, 5:25 PM
Amir added inline comments.Mar 16 2022, 5:30 PM
llvm/utils/TableGen/X86MnemonicTables.cpp
45

I decided to reuse X86Disassembler::RecognizableInstr as it parses almost all the necessary fields, specifically Form. Otherwise I'd have to copy/paste byteFromBitsInit Record field parsing function which I didn't want.

skan added inline comments.Mar 16 2022, 6:28 PM
llvm/utils/TableGen/X86MnemonicTables.cpp
70

I'm afraid using StringRef here is not safe, AsmString is destoryed after each iteration.

Amir updated this revision to Diff 416103.Mar 17 2022, 2:17 AM

Handle CC in mnemonic printing

Amir updated this revision to Diff 416104.Mar 17 2022, 2:17 AM

Handle CC in mnemonic printing

Amir marked an inline comment as done.Mar 17 2022, 2:19 AM
Amir added inline comments.
llvm/utils/TableGen/X86MnemonicTables.cpp
70

You're right. Moved the CC printing down to the emission steps.

skan added inline comments.Mar 17 2022, 5:29 AM
llvm/utils/TableGen/X86MnemonicTables.cpp
73

It's not about CC.

std::string AsmString = I->FlattenAsmStringVariants(I->AsmString, Variant);

AsmString is contructed in each iteration and is destroyed at the end of the iteration, Mnemoic references a substring of it while you use it outside the loop.
I'm afraid it's not safe. Correct me if I am wrong.

Amir updated this revision to Diff 416299.Mar 17 2022, 1:20 PM
Amir marked an inline comment as done.

Store a string in the map

Amir marked an inline comment as done.Mar 17 2022, 1:21 PM
Amir added inline comments.
llvm/utils/TableGen/X86MnemonicTables.cpp
73

Now I understand the issue. I think I've addressed the problem by storing a std::string as a key in the new revision.

Amir marked an inline comment as done.Mar 17 2022, 1:21 PM
skan accepted this revision.Mar 17 2022, 7:00 PM

Thanks! LGTM generally.

llvm/utils/TableGen/X86MnemonicTables.cpp
74

We could use StringRef::contains, but it's a minor issue...

This revision is now accepted and ready to land.Mar 17 2022, 7:00 PM
Amir updated this revision to Diff 416402.Mar 17 2022, 11:46 PM

Reuse Mnemonic.find("${cond}")

Amir marked an inline comment as done.Mar 17 2022, 11:49 PM
Amir added inline comments.
llvm/utils/TableGen/X86MnemonicTables.cpp
74

The result of find is used later to extract the mnemonic part, so I'm keeping it.

Amir updated this revision to Diff 416421.Mar 18 2022, 1:25 AM
Amir marked an inline comment as done.

Replace $cond in-place

This revision was landed with ongoing or failed builds.Mar 18 2022, 1:44 AM
This revision was automatically updated to reflect the committed changes.

Hello!

An LLVM build of mine that has UBsan enabled has started failing since this commit. The error itself is:

[5/1719] Building X86GenMnemonicTables.inc...
FAILED: lib/Target/X86/X86GenMnemonicTables.inc
cd /mnt/c/LLVM/llvm-project/llvm/build-wsld && /mnt/c/LLVM/llvm-project/llvm/build-wsld/bin/llvm-tblgen -gen-x86-mnemonic-tables -asmwriternum=1 -I /mnt/c/LLVM/llvm-project/llvm/lib/Target/X86 -I/mnt/c/LLVM/llvm-project/llvm/build-wsld/include -I/mnt/c/LLVM/llvm-project/llvm/include -I /mnt/c/LLVM/llvm-project/llvm/lib/Target /mnt/c/LLVM/llvm-project/llvm/lib/Target/X86/X86.td --write-if-changed -o lib/Target/X86/X86GenMnemonicTables.inc -d lib/Target/X86/X86GenMnemonicTables.inc.d
../utils/TableGen/X86MnemonicTables.cpp:52:29: runtime error: load of value 96, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../utils/TableGen/X86MnemonicTables.cpp:52:29 in

It seems like at X86MnemonicTables.cpp:52 you are accessing the IsCodeGenOnly field of RI, yet the constructor in X86RecognizableInstr.cpp:95 only initializes that field conditionally (based on the early return at line 75).
Would you mind taking a look? I'd suggest initializing those bool fields to some sane defaults or at the very least make the access in X86MnemonicTables.cpp:52 conditional.

For the sake of completion: These were the build options I used:

cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/mnt/c/Libraries/WSL/LLVMd -DLLVM_ENABLE_PROJECTS="mlir;lld" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_USE_SANITIZER="Address;Undefined" -DLLVM_BUILD_TOOLS=OFF -DLLVM_BUILD_UTILS=ON -DLLVM_BUILD_TESTS=OFF -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -DLLVM_ENABLE_LIBXML2=OFF -DLLVM_INSTALL_UTILS=ON -DLLVM_FORCE_ENABLE_STATS=ON -DLLVM_ENABLE_LLD=ON

Thanks

Hello!

An LLVM build of mine that has UBsan enabled has started failing since this commit. The error itself is:

[5/1719] Building X86GenMnemonicTables.inc...
FAILED: lib/Target/X86/X86GenMnemonicTables.inc
cd /mnt/c/LLVM/llvm-project/llvm/build-wsld && /mnt/c/LLVM/llvm-project/llvm/build-wsld/bin/llvm-tblgen -gen-x86-mnemonic-tables -asmwriternum=1 -I /mnt/c/LLVM/llvm-project/llvm/lib/Target/X86 -I/mnt/c/LLVM/llvm-project/llvm/build-wsld/include -I/mnt/c/LLVM/llvm-project/llvm/include -I /mnt/c/LLVM/llvm-project/llvm/lib/Target /mnt/c/LLVM/llvm-project/llvm/lib/Target/X86/X86.td --write-if-changed -o lib/Target/X86/X86GenMnemonicTables.inc -d lib/Target/X86/X86GenMnemonicTables.inc.d
../utils/TableGen/X86MnemonicTables.cpp:52:29: runtime error: load of value 96, which is not a valid value for type 'bool'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../utils/TableGen/X86MnemonicTables.cpp:52:29 in

It seems like at X86MnemonicTables.cpp:52 you are accessing the IsCodeGenOnly field of RI, yet the constructor in X86RecognizableInstr.cpp:95 only initializes that field conditionally (based on the early return at line 75).
Would you mind taking a look? I'd suggest initializing those bool fields to some sane defaults or at the very least make the access in X86MnemonicTables.cpp:52 conditional.

For the sake of completion: These were the build options I used:

cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/mnt/c/Libraries/WSL/LLVMd -DLLVM_ENABLE_PROJECTS="mlir;lld" -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_USE_SANITIZER="Address;Undefined" -DLLVM_BUILD_TOOLS=OFF -DLLVM_BUILD_UTILS=ON -DLLVM_BUILD_TESTS=OFF -DCMAKE_C_COMPILER=clang-12 -DCMAKE_CXX_COMPILER=clang++-12 -DLLVM_ENABLE_LIBXML2=OFF -DLLVM_INSTALL_UTILS=ON -DLLVM_FORCE_ENABLE_STATS=ON -DLLVM_ENABLE_LLD=ON

Thanks

Thanks for reporting! I'm working on it.