ninja check-cuda-simple builds cuda-tests-simple and runs the tests one variant at a time.
An individual variant of the tests can be checked with ninja check-cuda-simple-<variant>.
Detection of the lit command in the top-level CMakeLists.txt had to be moved up
so that the TEST_SUITE_LIT command is available in subdirectories.
GPU tests are not well suited to running in parallel: they are bottlenecked by the GPUs (there are few of them, and start-up overhead is high).
cmake -DCUDA_JOBS=N limits the number of simultaneous CUDA tests to N.
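
A possible invocation, as a sketch only. The source path, the Ninja generator flag, and the value 2 are illustrative assumptions, and any other options the CUDA tests need at configure time are omitted here. <variant> is kept as a placeholder for a real variant name.

```shell
# Configure with at most 2 CUDA tests running at once.
# /path/to/test-suite is a hypothetical path; substitute your own checkout.
cmake -G Ninja -DCUDA_JOBS=2 /path/to/test-suite

# Build cuda-tests-simple and run all variants, one variant at a time.
ninja check-cuda-simple

# Check a single variant; replace <variant> with an actual variant name.
ninja check-cuda-simple-<variant>
```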