
[buildbot, CUDA] Adjust GPU list for the tests on cuda-build-test-01
Closed, Public

Authored by tra on Jul 20 2018, 2:23 PM.

Details

Summary

Recent drivers changed the way GPUs are enumerated, so we've been running
the tests on the wrong GPU, which took ~45 minutes.

Diff Detail

Repository
rL LLVM

Event Timeline

tra created this revision. Jul 20 2018, 2:23 PM

Don't you want to pass in the GPU-deadbeef form (the UUID) from nvidia-smi -L, so we don't have this problem?
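A minimal sketch of that suggestion, assuming the usual `nvidia-smi -L` line format (`GPU <index>: <name> (UUID: GPU-...)`). The helper names and the sample output are hypothetical and not taken from the buildbot config; CUDA accepts these `GPU-...` strings in `CUDA_VISIBLE_DEVICES` in place of integer indices.

```python
import re

# Matches lines such as:
#   GPU 1: Tesla K40c (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
# Only the index, model name and UUID are captured.
_GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-fA-F-]+)\)\s*$")


def parse_nvidia_smi_list(output):
    """Return (index, name, uuid) tuples parsed from `nvidia-smi -L` output."""
    gpus = []
    for line in output.splitlines():
        m = _GPU_LINE.match(line.strip())
        if m:
            gpus.append((int(m.group(1)), m.group(2), m.group(3)))
    return gpus


def visible_devices_for(output, model):
    """Build a CUDA_VISIBLE_DEVICES value from the UUIDs of GPUs whose name contains `model`."""
    uuids = [uuid for _, name, uuid in parse_nvidia_smi_list(output) if model in name]
    if not uuids:
        raise LookupError("no GPU matching %r in nvidia-smi output" % model)
    return ",".join(uuids)


# Hypothetical output from a two-GPU machine.
SAMPLE = """\
GPU 0: GeForce GTX 1080 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: Tesla K40c (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
"""

print(visible_devices_for(SAMPLE, "K40"))  # GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
```

Selecting by UUID means the choice no longer depends on the position the driver assigns to each GPU.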

tra added a comment. Jul 20 2018, 2:39 PM

I'm not confident that the GUIDs remain stable, either. I have no data to tell one way or another, though.
CUDA currently allows specifying the enumeration order either by PCI bus ID or 'fastest first' (CUDA_DEVICE_ORDER=PCI_BUS_ID or FASTEST_FIRST).
If it becomes a problem, I'll just force enumeration by PCI bus ID, which should be somewhat more stable (though it may change on a BIOS update or if I add or remove other PCIe devices).
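A sketch of forcing PCI bus order for a test run, assuming the tests are launched as a subprocess with an explicit environment. `CUDA_DEVICE_ORDER` and `CUDA_VISIBLE_DEVICES` are the CUDA runtime's environment variables; the helper name and the device index are hypothetical.

```python
import os


def cuda_test_env(device_index, base_env=None):
    """Return an environment that pins CUDA to one GPU, enumerated in PCI bus order.

    With CUDA_DEVICE_ORDER=PCI_BUS_ID, the index in CUDA_VISIBLE_DEVICES refers to
    the GPU's position in ascending PCI bus ID order rather than the default
    FASTEST_FIRST order.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = str(device_index)
    return env


# Usage (hypothetical test command):
#   subprocess.run(["make", "check-cuda"], env=cuda_test_env(1), check=True)
env = cuda_test_env(1, base_env={"PATH": "/usr/bin"})
print(env["CUDA_DEVICE_ORDER"], env["CUDA_VISIBLE_DEVICES"])  # PCI_BUS_ID 1
```

This removes the dependence on the driver's notion of "fastest", but the index is still positional, so it inherits the caveats above about BIOS updates and added or removed PCIe devices.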

Another option would be to not include PTX in the binaries, so we'll know right away if we attempt to run the tests on the wrong GPU.
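Without embedded PTX, the CUDA driver has no way to JIT-compile kernels for a GPU the binary was not built for, so a test launched on the wrong GPU fails at kernel launch instead of silently running. A sketch of the corresponding flags, assuming the tests are compiled with clang's CUDA support: `--cuda-gpu-arch=` and `--no-cuda-include-ptx=` are clang options, while the helper and the choice of `sm_35` are hypothetical.

```python
def clang_cuda_arch_flags(archs, include_ptx=True):
    """Build clang flags that compile device code for each GPU arch in `archs`.

    clang embeds PTX alongside the GPU machine code (SASS) by default, and the
    driver can JIT that PTX for GPUs the binary was not compiled for. With
    include_ptx=False, --no-cuda-include-ptx=all is added so only SASS is
    embedded and a mismatched GPU reports an error instead.
    """
    flags = ["--cuda-gpu-arch=%s" % arch for arch in archs]
    if not include_ptx:
        flags.append("--no-cuda-include-ptx=all")
    return flags


print(clang_cuda_arch_flags(["sm_35"], include_ptx=False))
# ['--cuda-gpu-arch=sm_35', '--no-cuda-include-ptx=all']
```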

> I'm not confident that the GUIDs remain stable, either. I have no data to tell one way or another, though.

That's a good point, but presumably if the GUIDs change, they're not going to *permute* and point to a different GPU. That is, the failure mode is noisy? Dunno if the same is true for PCI IDs, but it's certainly not true for the integer identifiers.
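The difference in failure modes can be sketched with a toy device table: a UUID either resolves to the intended GPU or fails to resolve at all, while an integer index always resolves, possibly to a different GPU. The device names and UUIDs below are made up for illustration.

```python
def resolve_by_uuid(uuid, devices):
    """Return the device with this UUID; raise if it no longer exists (noisy failure)."""
    for dev in devices:
        if dev["uuid"] == uuid:
            return dev
    raise LookupError("GPU %s not found" % uuid)


def resolve_by_index(index, devices):
    """Return whatever device currently sits at `index` (silent if the order changed)."""
    return devices[index]


# The same two GPUs, enumerated in a different order after a driver update.
before = [
    {"name": "Tesla K40c", "uuid": "GPU-aaaa"},
    {"name": "GeForce GTX 1080", "uuid": "GPU-bbbb"},
]
after = list(reversed(before))

target_uuid, target_index = before[0]["uuid"], 0
print(resolve_by_uuid(target_uuid, after)["name"])    # Tesla K40c: still the intended GPU
print(resolve_by_index(target_index, after)["name"])  # GeForce GTX 1080: wrong GPU, no error

# If the UUIDs themselves changed, lookup fails loudly instead of picking another GPU.
changed = [dict(dev, uuid=dev["uuid"] + "-new") for dev in before]
try:
    resolve_by_uuid(target_uuid, changed)
except LookupError as err:
    print("error:", err)  # error: GPU GPU-aaaa not found
```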

tra updated this revision to Diff 156612. Jul 20 2018, 2:46 PM

Use GUIDs to identify GPUs.

tra added a comment. Jul 20 2018, 2:47 PM

OK. Updated the patch to use GUIDs.

jlebar accepted this revision. Jul 20 2018, 2:47 PM
This revision is now accepted and ready to land. Jul 20 2018, 2:47 PM
This revision was automatically updated to reflect the committed changes.