This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][MC] Added support of SP3 syntax for MTBUF format modifier
ClosedPublic

Authored by dp on Jul 17 2020, 7:11 AM.

Details

Summary

Currently supported LLVM MTBUF syntax is not compatible with SP3:

op     dst, addr, rsrc, FORMAT, soffset

This change adds support for SP3 syntax:

op     dst, addr, rsrc, soffset SP3FORMAT

In addition to being compatible with SP3, this syntax allows using symbolic names for data, numeric and unified formats. Below is a list of added syntax variants.

format:<expression>
format:[<numeric-format-name>,<data-format-name>]
format:[<data-format-name>]
format:[<numeric-format-name>]
format:[<unified-format-name>]

The last syntax is supported for GFX10 only.

See llvm bug 37738

Diff Detail

Event Timeline

dp created this revision.Jul 17 2020, 7:11 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 17 2020, 7:11 AM
dp edited the summary of this revision. (Show Details)Jul 17 2020, 7:17 AM
dp updated this revision to Diff 279004.Jul 18 2020, 7:46 AM

Minor correction - removed 'SP3' from function names.

Overally looks good

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
794

Should we use any map structures here and other places? I was always curious is there something lightweight for such cases, like presorted list of IDs with binary search on it.

dp marked an inline comment as done.Jul 23 2020, 4:43 AM
dp added inline comments.
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
794

I do not know if there is anything useful in llvm libraries. We could use binary search but I believe we won't get any benefit for short tables (say 10-15 elements or so). And most parser tables are short indeed.

A speedup due to binary search use will be noticeable for long tables. But how much could we gain in real-life scenarios?

I did a little experiment with this change for gfx10. I assembled 10.000.000 of mtbuf instructions with BUF_FMT_8_UNORM (which is the second element of UfmtSymbolic) and another 10.000.000 with BUF_FMT_32_32_32_32_FLOAT (which is the 79th element). The assembly time was 77 and 85 seconds respectively. So in the worst case we have 10% slowdown and 5% on the average - this is the price for using linear search.

I believe mtbuf instructions won't take up more than 10% of real-life programs so the slowdown comparing with binary search is less than 0.5%. Is that acceptable?

arsenm added inline comments.Jul 23 2020, 1:09 PM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
794

Could you use StringLiteral to cut down on the string compare time instead of the const char*s? They'll fail on the length compare

803

Spacing off

dp marked an inline comment as done.Jul 23 2020, 1:52 PM
dp added inline comments.
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
794

Thanks Matt, that’s a good idea!

vpykhtin accepted this revision.Jul 23 2020, 10:35 PM

Ok, let's keep it simple, no much benefit with maps. LGTM.

This revision is now accepted and ready to land.Jul 23 2020, 10:35 PM
This revision was automatically updated to reflect the committed changes.
dp marked 2 inline comments as done.
dp added inline comments.Jul 24 2020, 6:45 AM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
794

Replacing 'C' strings with StringLiteral improved performance dramatically. Assembly time for tests I described above is now identical (77-78 seconds). Using other elements from the end of the table did not change the results. Thanks!