PTX 6.3 requires ".aligned" in the MMA instruction names.
To generate the correct name, we now pass the current
PTX version to each instruction as an extra constant operand,
and InstPrinter adjusts its output accordingly.
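
The version check described above can be sketched roughly as follows. This is only an illustration in Python, not the actual C++ in the NVPTX InstPrinter: the helper names `mma_suffix` and `wmma_load_a_name`, and the encoding of the PTX version as an integer (63 for PTX 6.3), are assumptions made for the example.

```python
def mma_suffix(ptx_version: int) -> str:
    """Return the qualifier printed after 'wmma.<op>' (hypothetical helper).

    Assumes the PTX version is encoded as major * 10 + minor, e.g. 63 for 6.3.
    """
    # PTX 6.3 and later require '.aligned' in the instruction name.
    return ".sync.aligned" if ptx_version >= 63 else ".sync"


def wmma_load_a_name(ptx_version: int) -> str:
    """Build one illustrative instruction name; layout and geometry are fixed."""
    return "wmma.load.a" + mma_suffix(ptx_version) + ".row.m16n16k16.shared.f16"


if __name__ == "__main__":
    for version in (60, 63):
        print(version, wmma_load_a_name(version))
```

The point of passing the version as an operand is that the same instruction definition can print either spelling, so the decision lives in one place in the printer.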
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | ||
---|---|---|
53 ↗ | (On Diff #190741) | Rebase onto D59389? |
llvm/test/CodeGen/NVPTX/wmma.py | ||
---|---|---|
244 ↗ | (On Diff #190741) | Who is supposed to run this script? Can we check in the result of this script and make it part of the regression tests? Relatedly, for other backends we have a framework for this. See llvm/utils/update_llc_test_checks.py. The generated file looks like llvm/test/CodeGen/PowerPC/atomics-regression.ll. One advantage of checking in the generated file is that any subsequent behavioral changes are reflected in the patch. |
llvm/test/CodeGen/NVPTX/wmma.py | ||
---|---|---|
244 ↗ | (On Diff #190741) | I guess I can answer this part: lit. Still, it'd be great to check in the generated .ll files with RUN lines in them. |
llvm/test/CodeGen/NVPTX/wmma.py | ||
---|---|---|
244 ↗ | (On Diff #190741) | The script is executed by lit, which then runs llc on the generated output and checks the resulting PTX. I'm not convinced that committing the generated .ll has much value -- it's in the ballpark of a megabyte of uninteresting boilerplate, mostly consisting of enumerating 4- and 8-tuples of arguments and results. Upcoming changes will bring more supported types and will multiply the amount of generated .ll without making it any more interesting for humans. E.g., just one function out of *a lot*: declare {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src ); ; CHECK-LABEL: .func {{.*}}test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8( define {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @test_llvm_nvvm_wmma_m16n16k16_load_a_row_f16_p3i8(i8 addrspace(3)* %src ) { ; CHECK: wmma.load.a.sync.aligned.row.m16n16k16.shared.f16 ; CHECK: {{{%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+, *%hh[0-9]+}}} ; CHECK: [%rd{{[0-9]+}}] %v0 = call {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} @llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8(i8 addrspace(3)* %src ); ret {<2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>, <2 x half>} %v0; } It's easy enough to generate the file or grab the one produced by lit -- the name would be right there in the failing command. |
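
To make the shape of the generated tests concrete, here is a minimal sketch of a generator that emits one such function together with generic CHECK lines. It is not taken from wmma.py: the names `FRAG_TY`, `TEST_TEMPLATE` and `make_test` are hypothetical, only the f16 load of matrix A from shared memory is covered, and the register-tuple CHECK line is left out for brevity.

```python
from string import Template

# Eight <2 x half> fragments, as returned by the f16 load of matrix A.
FRAG_TY = "{" + ", ".join(["<2 x half>"] * 8) + "}"

# The FileCheck regex braces {{...}} are literal text for string.Template,
# which only substitutes $name placeholders.
TEST_TEMPLATE = Template("""\
declare $ret @$intrinsic(i8 addrspace(3)* %src);

; CHECK-LABEL: .func {{.*}}test_$mangled(
define $ret @test_$mangled(i8 addrspace(3)* %src) {
; CHECK: $instruction
; CHECK: [%rd{{[0-9]+}}]
  %v0 = call $ret @$intrinsic(i8 addrspace(3)* %src);
  ret $ret %v0;
}
""")


def make_test(intrinsic: str, instruction: str) -> str:
    """Render one test function (hypothetical helper, not from wmma.py)."""
    mangled = intrinsic.replace(".", "_")
    return TEST_TEMPLATE.substitute(
        ret=FRAG_TY, intrinsic=intrinsic, mangled=mangled, instruction=instruction
    )


if __name__ == "__main__":
    print(make_test(
        "llvm.nvvm.wmma.m16n16k16.load.a.row.f16.p3i8",
        "wmma.load.a.sync.aligned.row.m16n16k16.shared.f16",
    ))
```

Looping such a helper over every layout, address space and fragment type is what produces the large volume of near-identical .ll mentioned above.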
llvm/test/CodeGen/NVPTX/wmma.py | ||
---|---|---|
244 ↗ | (On Diff #190741) |
test/CodeGen/X86 already does this: it has 37 MB of autogenerated .ll files, presumably for its massive set of intrinsics.
Compared to test/CodeGen/X86/avx512vl-vec-masked-cmp.ll it isn't that bad. ;) However, I realized that wmma.py is somewhat different from utils/update_llc_test_checks.py. What utils/update_llc_test_checks.py does:
What wmma.py does:
The key difference is that update_llc_test_checks.py is not wmma-specific. Another crucial difference is that wmma.py generates very generic check lines like [%rd{{[0-9]+}}], while update_llc_test_checks.py usually prints the exact literal it extracts from the asm output, e.g. %rd1. As a result, wmma.py's output isn't as readable as I thought it would be (fewer literals), so I'm fine with not checking in the wmma.py-generated files. However, I encourage some of the NVPTX contributors (!) to add NVPTX support to update_llc_test_checks.py. With that, we could support wmma.py almost for free, along with all other kinds of PTX regression tests. |
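
The difference between a generic check line and an exact literal can be illustrated with a small sketch. The two PTX lines below are made-up sample output, not real llc results, and the helper names are hypothetical. Python's `re` stands in for the regex that FileCheck evaluates inside `{{...}}`.

```python
import re

# Two hypothetical llc outputs that differ only in register numbering.
ptx_before = "ld.param.u64 %rd1, [param_0];"
ptx_after = "ld.param.u64 %rd7, [param_0];"

# Generic pattern: the regex part of a check line such as
# "; CHECK: %rd{{[0-9]+}}".
generic = re.compile(r"%rd[0-9]+")

# Exact literal, as a script that records the current output would emit.
literal = "%rd1"


def matches_generic(text: str) -> bool:
    """True if any 64-bit register name of the form %rdN appears."""
    return generic.search(text) is not None


def matches_literal(text: str) -> bool:
    """True only if the recorded register name appears verbatim."""
    return literal in text


if __name__ == "__main__":
    for ptx in (ptx_before, ptx_after):
        print(ptx, matches_generic(ptx), matches_literal(ptx))
```

The generic form survives register renumbering but says less to a human reader. The literal form is more readable but has to be regenerated whenever the allocation changes.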
llvm/test/CodeGen/NVPTX/wmma.py | ||
---|---|---|
244 ↗ | (On Diff #190741) | IIUC, update_llc_test_checks.py effectively freezes the output generated by llc *now* so it can be checked for regressions later. The wmma.py use case is different, at least for me -- I use it to *create* the reference output that llc can't generate yet, and then use it to make sure my NVPTX back-end changes do the right thing. I'll think about splitting these two use cases. Perhaps I should keep the script to aid development but, once that's done, generate reference .ll for the implemented intrinsics and let update_llc_test_checks.py generate the checks for the generated PTX. |