[HIP] Align device binary
Closed · Public

Authored by yaxunl on Fri, Oct 2, 5:54 AM.

Details

Summary

To load device binaries faster and share them among processes, the HIP runtime
prefers that they be aligned at 4096 bytes. The HIP runtime can still load
unaligned device binaries, but aligning them at 4096 bytes results in
faster loading and lower shared memory usage.

This patch adds an option, -bundle-align, to clang-offload-bundler that aligns
bundles at a specified alignment. The default is 1, which is NFC
compared to the existing format.
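
The offset arithmetic behind aligning a bundle can be sketched as follows. This is an illustrative sketch only, not the code from this patch: the helper names alignUp and paddingFor are hypothetical, and it assumes the alignment is a nonzero byte count. With an alignment of 1 no padding is ever added, which is why the default leaves the format unchanged.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the patch): round `offset` up to the next
// multiple of `alignment`. Works for any nonzero alignment, not only powers
// of two.
inline std::uint64_t alignUp(std::uint64_t offset, std::uint64_t alignment) {
  assert(alignment != 0 && "alignment must be nonzero");
  return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: number of padding bytes required before a bundle that
// would otherwise start at `offset`.
inline std::uint64_t paddingFor(std::uint64_t offset, std::uint64_t alignment) {
  return alignUp(offset, alignment) - offset;
}
```

For example, a bundle that would start at offset 100 needs 3996 bytes of padding to reach a 4096-byte boundary, while an alignment of 1 always yields zero padding.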

This patch then aligns the embedded fat binary, and the device binaries inside
the fat binary, at 4096 bytes.

It has been verified that this change does not significantly increase overall file size
for typical HIP applications (less than 1%).

Event Timeline

yaxunl requested review of this revision. Fri, Oct 2, 5:54 AM
yaxunl created this revision.
tra accepted this revision. Fri, Oct 2, 11:48 AM
tra added inline comments.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
374

Does the bundler always create the file from scratch or truncate it?
If it were to operate on an existing file, seek would leave the existing data in the padding region as is, which may result in nondeterministic output.
It may be better to explicitly zero the padding.

This revision is now accepted and ready to land. Fri, Oct 2, 11:48 AM
yaxunl marked an inline comment as done. Fri, Oct 2, 1:59 PM
yaxunl added inline comments.
clang/tools/clang-offload-bundler/ClangOffloadBundler.cpp
374

The bundler truncates the output file if it exists, so using seek should be fine.
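
For comparison, the alternative raised in the comment above, explicitly zeroing the padding instead of seeking past it, might look like the sketch below. It is not what the patch does. It assumes a plain std::ostream rather than the LLVM stream the bundler uses, and writeZeroPadding is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the patch): append zero bytes to `os` until
// the output size reaches a multiple of `alignment`. `currentSize` is the
// number of bytes already written. Returns the new size.
inline std::uint64_t writeZeroPadding(std::ostream &os,
                                      std::uint64_t currentSize,
                                      std::uint64_t alignment) {
  assert(alignment != 0 && "alignment must be nonzero");
  const std::uint64_t rem = currentSize % alignment;
  const std::uint64_t pad = (rem == 0) ? 0 : alignment - rem;
  // Write the padding explicitly so its contents never depend on whatever
  // was previously stored in the file.
  for (std::uint64_t i = 0; i < pad; ++i)
    os.put('\0');
  return currentSize + pad;
}
```

Writing the zeros costs a little more I/O than seeking, but the padding bytes are deterministic even if the output file is not truncated first.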

This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.
Herald added a project: Restricted Project. Fri, Oct 2, 3:12 PM