NVPTX does not support the testcase llvm/test/CodeGen/Generic/2010-11-04-BigByval.ll
There are NVPTX specific testcases in the llvm/test/CodeGen/NVPTX
The test is marked as UNSUPPORTED for NVPTX due to unacceptable run time when using XFAIL
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Does it mean that the test will be disabled for everyone with NVPTX back-end compiled in? That would not be the right thing to do, IMO.
While I agree that this particular test does not add much to NVPTX testing, I think we still may want it to run.
I'm also quite puzzled about unacceptable run time when using XFAIL and what NVPTX has to do with it. Are you saying that the test has unacceptable run time when NVPTX is the default architecture targeted by LLC?
Perhaps a better approach would be to run llc with explicit triple(s). It would have the benefit of making the test agnostic of the host or build options.
The test/CodeGen/Generic/ tests without target triple or llc -mtriple= tests the default target tripe. Can NVPTX be used as a default target triple?
There is some value in having such generic tests: we avoid duplicating them into every test/CodeGen/${arch}/ directory. Such tests are a bit tricky for a contributor to update, though. By testing every possible default target means the output we can test is very limited.
yes
That would not be the right thing to do, IMO.
While I agree that this particular test does not add much to NVPTX testing, I think we still may want it to run.
I'm also quite puzzled about unacceptable run time when using XFAIL and what NVPTX has to do with it. Are you saying that the test has unacceptable run time when NVPTX is the default architecture targeted by LLC?
yes, about ~1h for array [65535 x i8]
Perhaps a better approach would be to run llc with explicit triple(s). It would have the benefit of making the test agnostic of the host or build options.
Some more info:
- The test uses the type: %big = type [131072 x i8] for functions’ by value param
- The llc crashes with arg_size >= 65536 in pass 'NVPTX DAG->DAG Pattern Instruction Selection' with failed assertion `NumValues == VTs.NumVTs && "NumValues wasn't wide enough for its operands!"'
- The test crashes after ~1h run time
- The class SDNode used during IR generation uses "unsigned short NumValues", so the max values count can be 65535 only
- Anyway, NVPTX vectorizer did not vectorize the array load/store for nvptx triple, so there are load/store instructions for every byte (max count is 65535)
- So, the llc crashes with assert for array length 131072 (defined in the test)
- Other problem is that test run time is ~1h (on my machine) for array size 65535, this is too long
- NVPTX vectorizer expects the alignment of the data 16 bytes, but the alignment for vectorizer is defined by the data type of the array, not alignment of the function param
- Therefore, the current NVPTX should be substantially updated to support the testcase
- The x86 target successfully emit short code for the array load/store, it uses other path for vectorization
I built llvm and ran tests with NVPTX default target triple
There is some value in having such generic tests: we avoid duplicating them into every test/CodeGen/${arch}/ directory. Such tests are a bit tricky for a contributor to update, though. By testing every possible default target means the output we can test is very limited.
Maybe the good solution is to define some timeout for the test ?
Then define the test as XFAIL for nvptx.
So in the future, if the test is passed, we will see it
Not sure if you're the author of this post: https://discourse.llvm.org/t/nvptx-codegen-surprisingly-slow-on-some-functions/61307, but the issue is somewhat similar -- there's a quadratic behavior in LoadStoreVectorizer.
So, what we have here is that the test is nearly a show-stopper for the affected users, we do want to keep it running everywhere else as it has a value for most of the users, and we have no easy way to satisfy both.
We have incoming LIT changes for conditional test execution (https://reviews.llvm.org/D122569). If we can expose 'default target' as a LIT feature, then we could run the test only if the default target is not NVPTX.
Great, thanks for pointing me.
The spent time on my machine is about const * (arg_size)^1.6
So, what we have here is that the test is nearly a show-stopper for the affected users, we do want to keep it running everywhere else as it has a value for most of the users, and we have no easy way to satisfy both.
We have incoming LIT changes for conditional test execution (https://reviews.llvm.org/D122569). If we can expose 'default target' as a LIT feature, then we could run the test only if the default target is not NVPTX.
I’m currently working on a patch solving a similar problem (actually, an inseparable chain of problems).
- When compiling code with large structs (~1 million elements), got “Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.”
- Fixed that, but needed a test for the fix – and it turned out that we can’t create structs with (1<<16) elements and more because NumValues in SDNode is 16-bit width (when passing structs, we create merge_values node in LowerArguments). So had to widen NumValues and other similar fields.
- Just widening does not solve the problem because some functions have quadratic complexity of NumValues (for instance, SDValue::hasOneUse→SDNode::hasNUsesOfValue). But they only need SDNode uses corresponging to a single ResNo, so I decided to split UseList in SDNode corresponding to ResNo’s.
- The patch is more or less ready, but now I work on performance testing – we need to be sure that additional fields and logic about splitted UseList does not decrease performance. It is already clear that performance increases drastically on cases when we have structs with enormous number of elements, but we also need to check normal cases too.
I suppose that the problem you are discussing here will be solved by the patch I described above in middle-term perspective.
I'm sorry, but after some verifying, I need to state that my answer yes is wrong.
- The UNSUPPORTED: nvptx is applied to default target only
- I have built two llvm builds: Default target: x86_64-unknown-linux-gnu and Default target: nvptx64. Then I ran the test with and without -march=nvptx and UNSUPPORTED: nvptx|x86. The following table shows the cases when the UNSUPPORTED is applied:
+---------------------------+--------------+--------------------+-------------------+ | default target | -march=<...> | UNSUPPORTED: <...> | Did the test run? | +---------------------------+--------------+--------------------+-------------------+ | | | nvptx | run | | | N/A +--------------------+-------------------+ | | | x86 | NOT run | | +--------------+--------------------+-------------------+ | | | nvptx | run | | x86_64-unknown-linux-gnu | nvptx +--------------------+-------------------+ | | | x86 | NOT run | | +--------------+--------------------+-------------------+ | | | nvptx | run | | | x86 +--------------------+-------------------+ | | | x86 | NOT run | +---------------------------+--------------+--------------------+-------------------+ | | | nvptx | NOT run | | | N/A +--------------------+-------------------+ | | | x86 | run | | +--------------+--------------------+-------------------+ | | | nvptx | NOT run | | nvptx | nvptx +--------------------+-------------------+ | | | x86 | run | | +--------------+--------------------+-------------------+ | | | nvptx | NOT run | | | x86 +--------------------+-------------------+ | | | x86 | run | +---------------------------+--------------+--------------------+-------------------+
So, it seems that using UNSUPPORTED: nvptx is OK.
Please review.
Were both x86 and nvptx compiled in for your experiments?
If the default target is the only target compiled in, then yes, UNSUPPORTED: nvptx would work. However, the most common case (the default) includes both x86 and nvptx. What happens in that build configurations? Will the test run?
Looking at lit docs and sources, it appears that UNSUPPORTED will disable the test if any of the expressions is true and it matches against both features and the target triple (the default target?).
This leads me to believe that mentioning nvptx there will disable the test whenever the NVPTX back-end is compiled in, regardless of whether it's the default target.
You may try specifying something that looks like a triple (UNSUPPORTED: nvptx64-|nvptx-) which would match only the target triple and not a back-end feature and configure the build target with a full triple nvptx64-nvidia-cuda. This is probably the closest we'd get to keeping the test running for everyone who does not have NVPTX as the default target.
My default build (Ubuntu 20.04, virtual machine), so UNSUPPORTED: nvptx did not stop the test running
$ cmake -S ./llvm-project/llvm -B ./llvm-build -G Ninja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On -DCMAKE_INSTALL_PREFIX=./llvm-install -DLLVM_CCACHE_BUILD=On -DBUILD_SHARED_LIBS=On
$ cmake --build ./llvm-build -- check-llvm
$ ./llvm-build/bin/llc --version
LLVM (http://llvm.org/): LLVM version 15.0.0git DEBUG build with assertions. Default target: x86_64-unknown-linux-gnu Host CPU: skylake Registered Targets: aarch64 - AArch64 (little endian) aarch64_32 - AArch64 (little endian ILP32) aarch64_be - AArch64 (big endian) amdgcn - AMD GCN GPUs arm - ARM arm64 - ARM64 (little endian) arm64_32 - ARM64 (little endian ILP32) armeb - ARM (big endian) avr - Atmel AVR Microcontroller bpf - BPF (host endian) bpfeb - BPF (big endian) bpfel - BPF (little endian) hexagon - Hexagon lanai - Lanai mips - MIPS (32-bit big endian) mips64 - MIPS (64-bit big endian) mips64el - MIPS (64-bit little endian) mipsel - MIPS (32-bit little endian) msp430 - MSP430 [experimental] nvptx - NVIDIA PTX 32-bit nvptx64 - NVIDIA PTX 64-bit ppc32 - PowerPC 32 ppc32le - PowerPC 32 LE ppc64 - PowerPC 64 ppc64le - PowerPC 64 LE r600 - AMD GPUs HD2XXX-HD6XXX riscv32 - 32-bit RISC-V riscv64 - 64-bit RISC-V sparc - Sparc sparcel - Sparc LE sparcv9 - Sparc V9 systemz - SystemZ thumb - Thumb thumbeb - Thumb (big endian) ve - VE wasm32 - WebAssembly 32-bit wasm64 - WebAssembly 64-bit x86 - 32-bit X86: Pentium-Pro and above x86-64 - 64-bit X86: EM64T and AMD64 xcore - XCore
Lit's internals:
available_features: {'debug_frame', 'webassembly-registered-target', 'aarch64-registered-target', 'amdgpu-registered-target', 'llvm-64-bits', 'plugins', 'nvptx-registered-target', 'system-linux', 'shell', 'powerpc-registered-target', 'lanai-registered-target', 'object-emission', 'riscv-registered-target', 'arm-registered-target', 'can-execute', 'hexagon-registered-target', 'mips-registered-target', 'msp430-registered-target', 'bpf-registered-target', 'cxx-shared-library', 'zlib', 'default_triple', 'x86-registered-target', 'xcore-registered-target', 'target-x86_64', 'thread_support', 'asserts', 'debug', 've-registered-target', 'x86_64-linux', 'avr-registered-target', 'native', 'systemz-registered-target', 'host-byteorder-little-endian', 'sparc-registered-target'}
target_triple: x86_64-unknown-linux-gnu
This is odd. nvptx-registered-target is present among the features and (I think) should have been matched by UNSUPPORTED: nvptx.
Documentation says that UNSUPPORTED matches Substrings of the target triple and I assumed that would be the case for the feature matches, too, but, apparently that's not the case.
So, it appears that your observation that UNSUPPORTED: nvptx does what we want is correct. It ends up matching the target triple only.
LGTM.
llvm/test/CodeGen/Generic/2010-11-04-BigByval.ll | ||
---|---|---|
4 | Please also mention that the test is intentionally disabled only for the NVPTX target (i.e. not for nvptx-registered-target feature) due to excessive runtime. |
One more check.
The following regex matches the feature and disabled the test:
UNSUPPORTED: nvptx-registered-target
or
UNSUPPORTED: nvptx{{\S*}}
Please also mention that the test is intentionally disabled only for the NVPTX target (i.e. not for nvptx-registered-target feature) due to excessive runtime.