This is an archive of the discontinued LLVM Phabricator instance.

Differential D49616

[buildbot, CUDA] Adjust GPU list for the tests on cuda-build-test-01
Closed · Public

Authored by tra on Jul 20 2018, 2:23 PM.

Details

Reviewers

gkistanova
jlebar

Commits

rZORG00c08b1a2490: Revert r337773, r337779.
rZORG3be71ac2d6e8: [buildbot, CUDA] Adjust GPU list for the tests on cuda-build-test-01
rL337623: [buildbot, CUDA] Adjust GPU list for the tests on cuda-build-test-01

Summary

Recent drivers changed the way GPUs are enumerated, so we've been running
the tests on the wrong GPU, which took ~45 minutes.

Diff Detail

Repository: rL LLVM

Event Timeline

tra created this revision. Jul 20 2018, 2:23 PM

Herald added subscribers: bixia, sanjoy. Jul 20 2018, 2:23 PM

Harbormaster completed remote builds in B20565: Diff 156607. Jul 20 2018, 2:23 PM

You don't want to pass in the GPU-deadbeef form from nvidia-smi -L so we don't have this problem?

I'm not confident that the GUIDs remain stable, either. I have no data to tell one way or another, though.
CUDA currently allows specifying enumeration order by PCI_ID or by 'fastest'.
If it becomes a problem I'll just force enumeration by PCI_ID, which should be somewhat more stable (though it may change on BIOS update or if I add/remove other PCIe devices).
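
As a rough illustration of the fallback mentioned above, the sketch below builds a process environment that forces PCI bus ordering. `CUDA_DEVICE_ORDER` and `CUDA_VISIBLE_DEVICES` are environment variables read by the CUDA runtime; the helper name `cuda_env` and the idea of passing the result to a test step are assumptions for illustration, not part of this patch.

```python
import os


def cuda_env(device_ids, base_env=None):
    """Return an environment dict that pins CUDA enumeration to PCI bus order.

    device_ids: device identifiers (indices or UUID strings) to expose,
                in the order the tests should see them.
    base_env:   environment to copy; defaults to the current process's.
    """
    env = dict(os.environ) if base_env is None else dict(base_env)
    # The default order is FASTEST_FIRST; PCI_BUS_ID makes indices follow
    # the PCI bus layout instead.
    env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_ids)
    return env
```

With PCI ordering the indices can still shift when devices are added or removed, which is the caveat raised in the comment above.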

Another option would be to not include PTX in the binaries, so we'll know right away if we attempt to run the tests on the wrong GPU.
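
A minimal sketch of what that option could look like, assuming the tests are compiled with clang: `--cuda-gpu-arch` and `--no-cuda-include-ptx` are clang driver options, while the helper `cuda_arch_flags` is hypothetical and does not exist in zorg. Without embedded PTX the driver has nothing to JIT-compile for a GPU of a different architecture, so a test started on the wrong device should fail at launch instead of running slowly.

```python
def cuda_arch_flags(gpu_arch_list, include_ptx=False):
    """Build clang CUDA flags for the given GPU architectures.

    When include_ptx is False, PTX is left out of the binary for every
    architecture, so only the precompiled SASS for the listed GPUs remains.
    """
    flags = ["--cuda-gpu-arch=%s" % arch for arch in gpu_arch_list]
    if not include_ptx:
        flags.append("--no-cuda-include-ptx=all")
    return flags
```

For the architectures used on this bot, `cuda_arch_flags(["sm_35", "sm_61"])` yields one `--cuda-gpu-arch` flag per entry plus the PTX opt-out.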

I'm not confident that the GUIDs remain stable, either. I have no data to tell one way or another, though.

That's a good point, but presumably if the GUIDs change, they're not going to *permute* and point to a different GPU. That is, the failure mode is noisy? Dunno if the same is true for PCI IDs, but it's certainly not true for the integer identifiers.

Use GUIDs to identify GPUs.

Harbormaster completed remote builds in B20567: Diff 156612. Jul 20 2018, 2:47 PM

OK. Updated the patch to use GUIDs.
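
For readers who need to look up the identifiers themselves, here is a small sketch that parses the `GPU <index>: <name> (UUID: GPU-...)` lines printed by `nvidia-smi -L`. The function names and the sample UUIDs are made up for illustration; they are not the values of the GPUs on cuda-build-test-01.

```python
import re

# Matches lines such as: "GPU 0: Tesla K40c (UUID: GPU-0123...)"
_GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-fA-F-]+)\)$")


def parse_gpu_list(text):
    """Return a list of (index, name, uuid) tuples from `nvidia-smi -L` output."""
    gpus = []
    for line in text.splitlines():
        match = _GPU_LINE.match(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2), match.group(3)))
    return gpus


def uuid_for(text, name_fragment):
    """Return the UUID of the single GPU whose name contains name_fragment."""
    hits = [uuid for _, name, uuid in parse_gpu_list(text) if name_fragment in name]
    if len(hits) != 1:
        raise LookupError("expected exactly one GPU matching %r, found %d"
                          % (name_fragment, len(hits)))
    return hits[0]


# Hypothetical sample output; the UUIDs are placeholders.
SAMPLE = """\
GPU 0: GeForce GTX 1070 (UUID: GPU-00000000-1111-2222-3333-444444444444)
GPU 1: Tesla K40c (UUID: GPU-55555555-6666-7777-8888-999999999999)
"""
```

Looking devices up by name and failing loudly when the match is missing or ambiguous keeps the "noisy failure" property discussed in the comments above.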

jlebar accepted this revision. Jul 20 2018, 2:47 PM

This revision is now accepted and ready to land. Jul 20 2018, 2:47 PM

Closed by commit rL337623: [buildbot, CUDA] Adjust GPU list for the tests on cuda-build-test-01 (authored by tra). Jul 20 2018, 2:49 PM

This revision was automatically updated to reflect the committed changes.

LGTM

Revision Contents

Path: zorg/trunk/buildbot/osuosl/master/config/builders.py
Size: 4 lines

Diff 156614

zorg/trunk/buildbot/osuosl/master/config/builders.py

[... 1,548 lines not shown; enclosing context: return [ ...]
 stage1_config='Release',
 extra_cmake_args=[
 '-DLLVM_ENABLE_ASSERTIONS=ON',
 "-DCMAKE_C_COMPILER:FILEPATH=/usr/bin/clang",
 "-DCMAKE_CXX_COMPILER:FILEPATH=/usr/bin/clang++"
 ],
 externals="/home/botanist/bots/externals",
 gpu_arch_list=["sm_35", "sm_61"],
-gpu_devices=[2, 0], # K40c(sm_35), GTX1070(sm_61)
+gpu_devices=["GPU-af66efa4", # K40c(sm_35),
+"GPU-44fe2444" # GTX1070(sm_61)
+],
 extra_ts_cmake_args=[],
 enable_thrust_tests=False,
 ),
 'category' : 'clang'},

 # lldb builders
 {'name': "lldb-x86_64-darwin-13.4",
 'slavenames': ["lldb-x86_64-darwin-13.4"],
[... 134 lines not shown ...]