This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Vectorize misaligned global loads & stores
ClosedPublic

Authored by jrbyrnes on Mar 2 2023, 9:47 AM.

Details

Summary

Based on experiments on gfx906, gfx908, gfx90a, and gfx1030, a single wider global load / store is more performant than multiple narrower ones, independent of alignment. This is especially true when combining 8-bit loads / stores, in which case the speedup was usually 2x across all alignments.

Change-Id: I1713c6edfc189052b8a71dc1135f9a436c1042e0
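
To make the summary's claim concrete, here is a minimal host-side sketch of what "combining 8-bit loads" means: four byte-sized accesses replaced by one 32-bit access to the same, possibly misaligned, address. The function names `loadNarrow` and `loadWide` are hypothetical and are not part of the patch; the actual change affects how the AMDGPU backend answers alignment queries, not user code like this.

```cpp
#include <cstdint>
#include <cstring>

// Four separate 8-bit loads, the shape of access before vectorization.
inline void loadNarrow(const std::uint8_t *Src, std::uint8_t Out[4]) {
  for (int I = 0; I < 4; ++I)
    Out[I] = Src[I];
}

// One 32-bit access covering the same four bytes. std::memcpy makes no
// alignment assumption, so Src may sit at any byte offset.
inline void loadWide(const std::uint8_t *Src, std::uint8_t Out[4]) {
  std::uint32_t Word;
  std::memcpy(&Word, Src, sizeof(Word));
  std::memcpy(Out, &Word, sizeof(Word));
}
```

Both functions return the same bytes for any source offset; the summary's measurements suggest that on the listed GPUs the wide form is faster even when the address is not 4-byte aligned.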

Event Timeline

jrbyrnes created this revision. Mar 2 2023, 9:47 AM
Herald added a project: Restricted Project. Mar 2 2023, 9:47 AM
jrbyrnes requested review of this revision. Mar 2 2023, 9:47 AM
jrbyrnes updated this revision to Diff 501907. Mar 2 2023, 9:52 AM

Edit comment.

arsenm added inline comments. Mar 2 2023, 10:27 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 1551

Should also cover the "other" address space. We have isFlatGlobalAddrSpace, but we probably need a variant that excludes flat.

LGTM, modulo Matt's comment.

jrbyrnes updated this revision to Diff 502013. Mar 2 2023, 3:25 PM
jrbyrnes marked an inline comment as done.

Also include "other" address space in updated model.
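
A rough sketch of the kind of predicate discussed above: a "global-like" address-space check that accepts the "other" address spaces but, unlike isFlatGlobalAddrSpace, rejects flat. Everything here is an assumption for illustration: the enum values follow the commonly documented AMDGPU numbering, `MAX_KNOWN` is a placeholder for the highest target-defined address space, the set of spaces treated as global-like is a guess, and the name `isGlobalLikeExcludingFlat` is hypothetical rather than the name used in the patch.

```cpp
namespace sketch {

// Assumed address-space numbering (illustrative, not copied from the patch).
enum AddrSpace : unsigned {
  FLAT = 0,
  GLOBAL = 1,
  REGION = 2,
  LOCAL = 3,
  CONSTANT = 4,
  PRIVATE = 5,
  MAX_KNOWN = 8, // assumed upper bound of target-defined address spaces
};

// Hypothetical predicate: true for global, constant, and any "other"
// (unknown, above MAX_KNOWN) address space; false for flat and the rest.
constexpr bool isGlobalLikeExcludingFlat(unsigned AS) {
  return AS == GLOBAL || AS == CONSTANT || AS > MAX_KNOWN;
}

// Compile-time checks of the intended behavior.
static_assert(isGlobalLikeExcludingFlat(GLOBAL), "global is accepted");
static_assert(!isGlobalLikeExcludingFlat(FLAT), "flat is rejected");

} // namespace sketch
```

Keeping flat out of the predicate matters because a flat pointer may refer to memory other than global, so measurements taken on global accesses may not carry over to it.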

rampitec added inline comments. Mar 2 2023, 3:42 PM
llvm/lib/Target/AMDGPU/AMDGPU.h, line 431

I do not like the name here. It suggests it is global address space only.

jrbyrnes updated this revision to Diff 502026. Mar 2 2023, 5:08 PM
jrbyrnes marked an inline comment as done.

Naming

rampitec accepted this revision. Mar 2 2023, 5:31 PM

LGTM, please give Matt a chance to review.

This revision is now accepted and ready to land. Mar 2 2023, 5:31 PM
arsenm accepted this revision. Mar 2 2023, 6:22 PM

Thanks for the review -- passed psdb.

This revision was landed with ongoing or failed builds. Mar 3 2023, 1:19 PM
This revision was automatically updated to reflect the committed changes.