This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX][tests] Do not run the test CodeGen/Generic/2010-11-04-BigByval.ll
ClosedPublic

Authored by i-chebykin on Apr 1 2022, 1:45 PM.

Details

Summary

NVPTX does not support the testcase llvm/test/CodeGen/Generic/2010-11-04-BigByval.ll
There are NVPTX specific testcases in the llvm/test/CodeGen/NVPTX
The test is marked as UNSUPPORTED for NVPTX due to unacceptable run time when using XFAIL

Diff Detail

Event Timeline

i-chebykin requested review of this revision.Apr 1 2022, 1:45 PM
i-chebykin created this revision.
i-chebykin edited the summary of this revision. (Show Details)
tra added a comment.Apr 1 2022, 1:54 PM

Does it mean that the test will be disabled for everyone with NVPTX back-end compiled in? That would not be the right thing to do, IMO.

While I agree that this particular test does not add much to NVPTX testing, I think we still may want it to run.

I'm also quite puzzled about unacceptable run time when using XFAIL and what NVPTX has to do with it. Are you saying that the test has unacceptable run time when NVPTX is the default architecture targeted by LLC?

Perhaps a better approach would be to run llc with explicit triple(s). It would have the benefit of making the test agnostic of the host or build options.

The test/CodeGen/Generic/ tests without target triple or llc -mtriple= tests the default target tripe. Can NVPTX be used as a default target triple?

There is some value in having such generic tests: we avoid duplicating them into every test/CodeGen/${arch}/ directory. Such tests are a bit tricky for a contributor to update, though. By testing every possible default target means the output we can test is very limited.

Does it mean that the test will be disabled for everyone with NVPTX back-end compiled in?

yes

That would not be the right thing to do, IMO.

While I agree that this particular test does not add much to NVPTX testing, I think we still may want it to run.

I'm also quite puzzled about unacceptable run time when using XFAIL and what NVPTX has to do with it. Are you saying that the test has unacceptable run time when NVPTX is the default architecture targeted by LLC?

yes, about ~1h for array [65535 x i8]

Perhaps a better approach would be to run llc with explicit triple(s). It would have the benefit of making the test agnostic of the host or build options.

Some more info:

  • The test uses the type: %big = type [131072 x i8] for functions’ by value param
  • The llc crashes with arg_size >= 65536 in pass 'NVPTX DAG->DAG Pattern Instruction Selection' with failed assertion `NumValues == VTs.NumVTs && "NumValues wasn't wide enough for its operands!"'
  • The test crashes after ~1h run time
  • The class SDNode used during IR generation uses "unsigned short NumValues", so the max values count can be 65535 only
  • Anyway, NVPTX vectorizer did not vectorize the array load/store for nvptx triple, so there are load/store instructions for every byte (max count is 65535)
  • So, the llc crashes with assert for array length 131072 (defined in the test)
  • Other problem is that test run time is ~1h (on my machine) for array size 65535, this is too long
  • NVPTX vectorizer expects the alignment of the data 16 bytes, but the alignment for vectorizer is defined by the data type of the array, not alignment of the function param
  • Therefore, the current NVPTX should be substantially updated to support the testcase
  • The x86 target successfully emit short code for the array load/store, it uses other path for vectorization

The test/CodeGen/Generic/ tests without target triple or llc -mtriple= tests the default target tripe. Can NVPTX be used as a default target triple?

I built llvm and ran tests with NVPTX default target triple

There is some value in having such generic tests: we avoid duplicating them into every test/CodeGen/${arch}/ directory. Such tests are a bit tricky for a contributor to update, though. By testing every possible default target means the output we can test is very limited.

Maybe the good solution is to define some timeout for the test ?
Then define the test as XFAIL for nvptx.
So in the future, if the test is passed, we will see it

tra added a comment.Apr 4 2022, 12:35 PM

Not sure if you're the author of this post: https://discourse.llvm.org/t/nvptx-codegen-surprisingly-slow-on-some-functions/61307, but the issue is somewhat similar -- there's a quadratic behavior in LoadStoreVectorizer.

So, what we have here is that the test is nearly a show-stopper for the affected users, we do want to keep it running everywhere else as it has a value for most of the users, and we have no easy way to satisfy both.

We have incoming LIT changes for conditional test execution (https://reviews.llvm.org/D122569). If we can expose 'default target' as a LIT feature, then we could run the test only if the default target is not NVPTX.

Great, thanks for pointing me.

Not sure if you're the author of this post: https://discourse.llvm.org/t/nvptx-codegen-surprisingly-slow-on-some-functions/61307, but the issue is somewhat similar -- there's a quadratic behavior in LoadStoreVectorizer.

The spent time on my machine is about const * (arg_size)^1.6

So, what we have here is that the test is nearly a show-stopper for the affected users, we do want to keep it running everywhere else as it has a value for most of the users, and we have no easy way to satisfy both.

We have incoming LIT changes for conditional test execution (https://reviews.llvm.org/D122569). If we can expose 'default target' as a LIT feature, then we could run the test only if the default target is not NVPTX.

I’m currently working on a patch solving a similar problem (actually, an inseparable chain of problems).

  • When compiling code with large structs (~1 million elements), got “Assertion `NumBits <= MAX_INT_BITS && "bitwidth too large"' failed.”
  • Fixed that, but needed a test for the fix – and it turned out that we can’t create structs with (1<<16) elements and more because NumValues in SDNode is 16-bit width (when passing structs, we create merge_values node in LowerArguments). So had to widen NumValues and other similar fields.
  • Just widening does not solve the problem because some functions have quadratic complexity of NumValues (for instance, SDValue::hasOneUseSDNode::hasNUsesOfValue). But they only need SDNode uses corresponging to a single ResNo, so I decided to split UseList in SDNode corresponding to ResNo’s.
  • The patch is more or less ready, but now I work on performance testing – we need to be sure that additional fields and logic about splitted UseList does not decrease performance. It is already clear that performance increases drastically on cases when we have structs with enormous number of elements, but we also need to check normal cases too.

I suppose that the problem you are discussing here will be solved by the patch I described above in middle-term perspective.

Does it mean that the test will be disabled for everyone with NVPTX back-end compiled in?

yes

That would not be the right thing to do, IMO.

While I agree that this particular test does not add much to NVPTX testing, I think we still may want it to run.

I'm sorry, but after some verifying, I need to state that my answer yes is wrong.

  1. The UNSUPPORTED: nvptx is applied to default target only
  2. I have built two llvm builds: Default target: x86_64-unknown-linux-gnu and Default target: nvptx64. Then I ran the test with and without -march=nvptx and UNSUPPORTED: nvptx|x86. The following table shows the cases when the UNSUPPORTED is applied:
	+---------------------------+--------------+--------------------+-------------------+
	|      default target       | -march=<...> | UNSUPPORTED: <...> | Did the test run? |
	+---------------------------+--------------+--------------------+-------------------+
	|                           |              | nvptx              | run               |
	|                           | N/A          +--------------------+-------------------+
	|                           |              | x86                | NOT run           |
	|                           +--------------+--------------------+-------------------+
	|                           |              | nvptx              | run               |
	|  x86_64-unknown-linux-gnu | nvptx        +--------------------+-------------------+
	|                           |              | x86                | NOT run           |
	|                           +--------------+--------------------+-------------------+
	|                           |              | nvptx              | run               |
	|                           | x86          +--------------------+-------------------+
	|                           |              | x86                | NOT run           |
	+---------------------------+--------------+--------------------+-------------------+
	|                           |              | nvptx              | NOT run           |
	|                           | N/A          +--------------------+-------------------+
	|                           |              | x86                | run               |
	|                           +--------------+--------------------+-------------------+
	|                           |              | nvptx              | NOT run           |
	| nvptx                     | nvptx        +--------------------+-------------------+
	|                           |              | x86                | run               |
	|                           +--------------+--------------------+-------------------+
	|                           |              | nvptx              | NOT run           |
	|                           | x86          +--------------------+-------------------+
	|                           |              | x86                | run               |
	+---------------------------+--------------+--------------------+-------------------+

So, it seems that using UNSUPPORTED: nvptx is OK.
Please review.

tra added a comment.Apr 12 2022, 10:53 AM
  1. The UNSUPPORTED: nvptx is applied to default target only
  2. I have built two llvm builds: Default target: x86_64-unknown-linux-gnu and Default target: nvptx64. Then I ran the test with and without -march=nvptx and UNSUPPORTED: nvptx|x86. The following table shows the cases when the UNSUPPORTED is applied:

Were both x86 and nvptx compiled in for your experiments?
If the default target is the only target compiled in, then yes, UNSUPPORTED: nvptx would work. However, the most common case (the default) includes both x86 and nvptx. What happens in that build configurations? Will the test run?

tra added a comment.Apr 12 2022, 11:08 AM

Looking at lit docs and sources, it appears that UNSUPPORTED will disable the test if any of the expressions is true and it matches against both features and the target triple (the default target?).
This leads me to believe that mentioning nvptx there will disable the test whenever the NVPTX back-end is compiled in, regardless of whether it's the default target.

You may try specifying something that looks like a triple (UNSUPPORTED: nvptx64-|nvptx-) which would match only the target triple and not a back-end feature and configure the build target with a full triple nvptx64-nvidia-cuda. This is probably the closest we'd get to keeping the test running for everyone who does not have NVPTX as the default target.

My default build (Ubuntu 20.04, virtual machine), so UNSUPPORTED: nvptx did not stop the test running

$ cmake -S ./llvm-project/llvm -B ./llvm-build -G Ninja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=On -DCMAKE_INSTALL_PREFIX=./llvm-install -DLLVM_CCACHE_BUILD=On -DBUILD_SHARED_LIBS=On

$ cmake --build ./llvm-build -- check-llvm

$ ./llvm-build/bin/llc --version

LLVM (http://llvm.org/):
  LLVM version 15.0.0git
  DEBUG build with assertions.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
    amdgcn     - AMD GCN GPUs
    arm        - ARM
    arm64      - ARM64 (little endian)
    arm64_32   - ARM64 (little endian ILP32)
    armeb      - ARM (big endian)
    avr        - Atmel AVR Microcontroller
    bpf        - BPF (host endian)
    bpfeb      - BPF (big endian)
    bpfel      - BPF (little endian)
    hexagon    - Hexagon
    lanai      - Lanai
    mips       - MIPS (32-bit big endian)
    mips64     - MIPS (64-bit big endian)
    mips64el   - MIPS (64-bit little endian)
    mipsel     - MIPS (32-bit little endian)
    msp430     - MSP430 [experimental]
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc32le    - PowerPC 32 LE
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    r600       - AMD GPUs HD2XXX-HD6XXX
    riscv32    - 32-bit RISC-V
    riscv64    - 64-bit RISC-V
    sparc      - Sparc
    sparcel    - Sparc LE
    sparcv9    - Sparc V9
    systemz    - SystemZ
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    ve         - VE
    wasm32     - WebAssembly 32-bit
    wasm64     - WebAssembly 64-bit
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
    xcore      - XCore

Lit's internals:

available_features: {'debug_frame', 'webassembly-registered-target', 'aarch64-registered-target', 'amdgpu-registered-target', 'llvm-64-bits', 'plugins', 'nvptx-registered-target', 'system-linux', 'shell', 'powerpc-registered-target', 'lanai-registered-target', 'object-emission', 'riscv-registered-target', 'arm-registered-target', 'can-execute', 'hexagon-registered-target', 'mips-registered-target', 'msp430-registered-target', 'bpf-registered-target', 'cxx-shared-library', 'zlib', 'default_triple', 'x86-registered-target', 'xcore-registered-target', 'target-x86_64', 'thread_support', 'asserts', 'debug', 've-registered-target', 'x86_64-linux', 'avr-registered-target', 'native', 'systemz-registered-target', 'host-byteorder-little-endian', 'sparc-registered-target'}

target_triple: x86_64-unknown-linux-gnu

tra accepted this revision.Apr 12 2022, 12:21 PM

This is odd. nvptx-registered-target is present among the features and (I think) should have been matched by UNSUPPORTED: nvptx.
Documentation says that UNSUPPORTED matches Substrings of the target triple and I assumed that would be the case for the feature matches, too, but, apparently that's not the case.

So, it appears that your observation that UNSUPPORTED: nvptx does what we want is correct. It ends up matching the target triple only.

LGTM.

llvm/test/CodeGen/Generic/2010-11-04-BigByval.ll
4

Please also mention that the test is intentionally disabled only for the NVPTX target (i.e. not for nvptx-registered-target feature) due to excessive runtime.

This revision is now accepted and ready to land.Apr 12 2022, 12:21 PM

One more check.
The following regex matches the feature and disabled the test:
UNSUPPORTED: nvptx-registered-target
or
UNSUPPORTED: nvptx{{\S*}}

i-chebykin removed rG LLVM Github Monorepo as the repository for this revision.

Just add comments about applied target

i-chebykin marked an inline comment as done.Apr 14 2022, 8:58 AM

I do not have commit access yet. Could someone help me to land the patch?

tra added a comment.Apr 14 2022, 10:51 AM

I do not have commit access yet. Could someone help me to land the patch?

Sure. I'll do it.

This revision was landed with ongoing or failed builds.Apr 14 2022, 2:25 PM
This revision was automatically updated to reflect the committed changes.