This helps generate faster code for fp16 operations, which are implemented in CUDA via
inline asm operating on opaque i16/i32 types.
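For illustration only, here is a minimal C++ sketch of the source-level idiom the revision title appears to describe: splitting an i32 into two i16 halves, and an i64 into two i32 halves. The names `Halves16`, `Halves32`, `split32` and `split64` are hypothetical and do not come from the patch. The sketch shows only the input pattern, not the PTX the backend emits for it.

```cpp
#include <cstdint>

// Hypothetical helper types (not from the patch): the two halves of a wider value.
struct Halves16 {
  uint16_t lo;
  uint16_t hi;
};

struct Halves32 {
  uint32_t lo;
  uint32_t hi;
};

// i32 -> {i16, i16}: the low half is a truncation,
// the high half is a logical shift right by 16 followed by a truncation.
inline Halves16 split32(uint32_t x) {
  return Halves16{static_cast<uint16_t>(x), static_cast<uint16_t>(x >> 16)};
}

// i64 -> {i32, i32}: the same pattern one size up, shifting by 32.
inline Halves32 split64(uint64_t x) {
  return Halves32{static_cast<uint32_t>(x), static_cast<uint32_t>(x >> 32)};
}
```

As a usage example, `split32(0x3C004000u)` yields `lo == 0x4000` and `hi == 0x3C00`, which are the fp16 bit patterns for 2.0 and 1.0 when the 32-bit value is treated as a packed pair of halves.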
Details
- Reviewers
- None
Diff Detail
- Repository
- rG LLVM Github Monorepo
Differential D143216
[NVPTX] Improve lowering of i32->{i16,i16}/i64->{i32,i32}
Authored by tra on Feb 2 2023, 2:23 PM.
Revision Contents
Diff 494479
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.h
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
llvm/test/CodeGen/NVPTX/f16x2-instructions.ll
llvm/test/CodeGen/NVPTX/idioms.ll