This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Improve next-gen AMDGPU plugin error messages
Closed · Public

Authored by jhuber6 on Feb 2 2023, 8:31 AM.

Details

Summary

The next-gen plugin properly prints errors. This patch improves the
error messages by including the Node-ID of the GPU that failed as well
as a textual representation of the enumeration values.

Event Timeline

jhuber6 created this revision. Feb 2 2023, 8:31 AM
Herald added a project: Restricted Project. Feb 2 2023, 8:31 AM
jhuber6 requested review of this revision. Feb 2 2023, 8:31 AM
kevinsala accepted this revision. Feb 2 2023, 9:39 AM

LGTM

openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2460

We could still print the fatal message even if hsa_agent_get_info fails, right? We can print the GPU ID, or "unknown/unspecified" if the call failed.

This revision is now accepted and ready to land. Feb 2 2023, 9:39 AM
jhuber6 added inline comments. Feb 2 2023, 9:40 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2460

Yeah, we could probably just initialize it to -1 and continue like nothing happened.

kevinsala added inline comments. Feb 2 2023, 9:46 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2460

That's fine. We have to reinterpret the value as signed, or it will print the uint32_t's maximum value.
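
The fallback discussed in this thread can be sketched roughly as follows. This is an illustration, not the code from the patch: `queryNodeId` and `formatNodeId` are hypothetical names, and the query is a stand-in that does not call `hsa_agent_get_info`. The sketch assumes the node ID is stored in a `uint32_t`, as the comment above implies.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a runtime query that can fail (the review refers
// to hsa_agent_get_info; this sketch does not call it). On failure, Node is
// left untouched.
bool queryNodeId(bool Succeeds, uint32_t &Node) {
  if (!Succeeds)
    return false;
  Node = 3; // Arbitrary example value.
  return true;
}

std::string formatNodeId(bool QuerySucceeds) {
  // Start from a sentinel so a failed query still yields a printable value.
  uint32_t Node = static_cast<uint32_t>(-1);
  (void)queryNodeId(QuerySucceeds, Node);
  // Reinterpret as signed before printing. Printing the uint32_t directly
  // would show 4294967295 for the sentinel instead of -1.
  return "GPU node " + std::to_string(static_cast<int32_t>(Node));
}
```

With this shape the error path never has to bail out early: a failed query simply produces "GPU node -1" in the message.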

arsenm added a subscriber: arsenm. Feb 2 2023, 9:56 AM
arsenm added inline comments.
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2435

Is it really possible to have multiple of these hit at a time?

jhuber6 added inline comments. Feb 2 2023, 9:57 AM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:2435

Yes, though the only case I'm aware of is a typical segfault, which presents HSA_AMD_MEMORY_FAULT_PAGE_NOT_PRESENT and HSA_AMD_MEMORY_FAULT_READ_ONLY at the same time.
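
Because several fault reasons can be set at once, a textual representation has to treat the value as a bitmask rather than a single enumerator. A minimal sketch of that idea, assuming the reasons are independent bit flags: the `FaultReason` enum, its values, the description strings, and `describeFaultReasons` are all hypothetical stand-ins so the example compiles without the HSA headers, and they are not taken from the patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the two HSA fault-reason flags named in the
// review. The real definitions live in the HSA runtime headers; the values
// here are assumptions for illustration only.
enum FaultReason : uint32_t {
  FAULT_PAGE_NOT_PRESENT = 1u << 0,
  FAULT_READ_ONLY = 1u << 1,
};

// Builds a comma-separated description of every flag set in Mask.
std::string describeFaultReasons(uint32_t Mask) {
  std::string Result;
  auto Append = [&](uint32_t Flag, const char *Name) {
    if (!(Mask & Flag))
      return;
    if (!Result.empty())
      Result += ", ";
    Result += Name;
  };
  Append(FAULT_PAGE_NOT_PRESENT, "page not present");
  Append(FAULT_READ_ONLY, "read only");
  // Nothing recognized: report that instead of an empty string.
  return Result.empty() ? "unknown" : Result;
}
```

For the segfault case described above, `describeFaultReasons(FAULT_PAGE_NOT_PRESENT | FAULT_READ_ONLY)` yields both descriptions in one message.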