This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Correctly implement `getWTime` on AMDGPU
ClosedPublic

Authored by jhuber6 on Jul 4 2023, 10:33 AM.

Details

Summary

AMDGPU provides a fixed frequency clock since some generations back.
However, the frequency is variable by card and must be looked up at
runtime. This patch adds a new device environment line for the clock
frequency so that we can use it in the same way as NVPTX. This is the
correct implementation and the version in ASO should be replaced.

Diff Detail

Event Timeline

jhuber6 created this revision.Jul 4 2023, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2023, 10:33 AM
jhuber6 requested review of this revision.Jul 4 2023, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2023, 10:33 AM
tianshilei1992 added inline comments.Jul 4 2023, 10:37 AM
openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
833

This doesn't need to be 1000000000UL?

jhuber6 added inline comments.Jul 4 2023, 10:38 AM
openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
833

A 32-bit integer fits at least two billion, so we're just under here.

Do we have a test?

tianshilei1992 added inline comments.Jul 4 2023, 11:15 AM
openmp/libomptarget/test/offloading/wtime.c
25 ↗(On Diff #537130)

Probably a more reliable way is to check the diff between duration and 0.0. as long as it is greater than epsilon we are good.

jhuber6 updated this revision to Diff 537140.Jul 4 2023, 11:29 AM

Updating test

tianshilei1992 added inline comments.Jul 4 2023, 7:28 PM
openmp/libomptarget/test/offloading/wtime.c
25 ↗(On Diff #537130)

The check will fail if the time is 0.01 right? You can potentially just do compile and run since you put an assertion there.

jhuber6 updated this revision to Diff 537201.Jul 4 2023, 7:38 PM

Update test, I think the host and device miught use different deltas so it failed sometimes.

tianshilei1992 accepted this revision.Jul 4 2023, 7:40 PM

LGTM, suppose the AMD intrinsics are correct. :-D

This revision is now accepted and ready to land.Jul 4 2023, 7:40 PM

LGTM, suppose the AMD intrinsics are correct. :-D

It's taken from the device libs, and when I tested myself I got a number that was in-line with what I got on the host, and I tested it on two cards with different frequencies.

This revision was automatically updated to reflect the committed changes.