This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GFX908] Add builtin support for global add atomic f16/f32
Needs RevisionPublic

Authored by mariusz-sikora-at-amd on Apr 20 2023, 5:38 AM.

Details

Summary

Global add atomic instructions f16/f32 are supported on many targets (like gfx908,
gfx90a, ... ), but only on gfx908 these instructions are not returning
old value from memory. This difference resulted to omitting them while
adding builtin support.

This change is extending support of existing builtins to support gfx908.
By default builtins are returning v2f16 or float types, but if target is
gfx908 then clang-sema will override return type to void. This will lead
later to errors if user is expecting to receive any return values from
builtins.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2023, 5:39 AM
mariusz-sikora-at-amd requested review of this revision.Apr 20 2023, 5:39 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 20 2023, 5:39 AM

This is a proposal how we could extend global fadd atomic builtins support for gfx908

I thought we had separate _rtn* builtins for this?

clang/lib/Sema/SemaChecking.cpp
4484

Would need something better to test

4491

I'm assuming mutating the AST is very not OK

clang/lib/Sema/SemaChecking.cpp
4491

I based my changes on how __sync_* is handled in SemaBuiltinAtomicOverloaded. You suggest to just add additional builtins with suffix _not_ret ?

rampitec requested changes to this revision.Apr 20 2023, 11:52 AM

We used to support it that way and decided just not doing it. It is very hard to explain why a supported atomic results in error. Someone who really needs it can use intrinsic.

This revision now requires changes to proceed.Apr 20 2023, 11:52 AM

We used to support it that way and decided just not doing it. It is very hard to explain why a supported atomic results in error. Someone who really needs it can use intrinsic.

I tend to agree. This oddity is probably best handled with an intrinsic.