This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Improve lowering of i32->{i16,i16}/i64->{i32,i32}
Abandoned · Public · Draft

Authored by tra on Feb 2 2023, 2:23 PM.

Details

Reviewers
None
Summary

This helps generate faster code for fp16 operations, which CUDA implements via
inline asm operating on opaque i16/i32 types.
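As a rough illustration of the splits named in the title, the sketch below shows the source-level idiom that produces them: the low half is a plain truncation, and the high half is a shift by a constant followed by a truncation. The type and function names (`Halves16`, `Halves32`, `split_i32`, `split_i64`, `check_splits`) are hypothetical and do not come from the patch. PTX can unpack a register in one instruction (for example `mov.b32 {lo, hi}, r;`), but the summary does not say which instructions the patch selects, so treat that as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types, named for illustration only.
struct Halves16 { std::uint16_t lo, hi; };
struct Halves32 { std::uint32_t lo, hi; };

// i32 -> {i16, i16}: truncate for the low half,
// shift right by a constant 16 and truncate for the high half.
Halves16 split_i32(std::uint32_t v) {
    Halves16 h;
    h.lo = static_cast<std::uint16_t>(v);
    h.hi = static_cast<std::uint16_t>(v >> 16);
    return h;
}

// i64 -> {i32, i32}: the same idiom with a constant shift of 32.
Halves32 split_i64(std::uint64_t v) {
    Halves32 h;
    h.lo = static_cast<std::uint32_t>(v);
    h.hi = static_cast<std::uint32_t>(v >> 32);
    return h;
}

// Small runtime self-check of both splits.
void check_splits() {
    const Halves16 a = split_i32(0x12345678u);
    assert(a.lo == 0x5678 && a.hi == 0x1234);

    const Halves32 b = split_i64(0x0123456789ABCDEFull);
    assert(b.lo == 0x89ABCDEFu && b.hi == 0x01234567u);
}
```

In fp16 CUDA code, a pair of half values is typically carried in a 32-bit integer and each half in a 16-bit integer, so splits like `split_i32` appear whenever a packed pair is taken apart before being passed to inline asm.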

Diff Detail

Event Timeline

tra created this revision. Feb 2 2023, 2:23 PM
Herald added a project: Restricted Project. Feb 2 2023, 2:23 PM
tra updated this revision to Diff 494437. Feb 2 2023, 2:46 PM

Tinkering with tests.

tra updated this revision to Diff 494466. Feb 2 2023, 4:24 PM

Cleaned up. More tests.

tra updated this revision to Diff 494467. Feb 2 2023, 4:25 PM

Cleanup.

tra updated this revision to Diff 494474. Feb 2 2023, 4:53 PM

Removed unneeded code.

tra updated this revision to Diff 494479. Feb 2 2023, 5:23 PM

Only accept shifts with constant values.
Updated f16x2 test.

Herald added a project: Restricted Project. Feb 9 2023, 4:57 PM