The range of values that f16 can represent fits into i32.
Lower f16->i64 as f16->i32->i64 instead of f16->f32->i64,
since the f32->i64 conversion has a long expansion.
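The range claim can be spot-checked in a few lines of Python (a sketch for illustration, not part of the patch; `struct`'s `'e'` format decodes IEEE 754 half precision):

```python
import struct

# Largest finite f16: exponent 0b11110, mantissa all ones -> bit pattern 0x7BFF.
(max_f16,) = struct.unpack('<e', struct.pack('<H', 0x7BFF))
print(max_f16)             # 65504.0
assert max_f16 < 2 ** 31   # every finite f16 magnitude fits in a signed i32
```

Since 65504 is far below 2^31 - 1, a single f16-to-i32 conversion never overflows for finite inputs, and the i64 result is just a sign extension of the i32 one.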
Details
- Reviewers: foad, arsenm
- Commits: rG44967fc60451: AMDGPU: Simplify f16 to i64 custom lowering

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
llvm/test/CodeGen/AMDGPU/fptosi.f16.ll, line 108:
> Why are there test changes only for the vector cases?
Context was missing.
llvm/test/CodeGen/AMDGPU/fptosi.f16.ll, line 108:
> Tests only check for a few instructions, and here there was no v_cvt_f32_f16_e32.
This update also simplifies f16 -> i64 on subtargets without has16BitInsts().
Such subtargets immediately promote f16 to f32 whenever they encounter an f16 def, using an fp16_to_fp node. Recognize an fp16_to_fp input to an f32 -> i64 conversion and handle it the same way as a direct f16 -> i64 conversion. This gives similar results (with a small difference for f16 vectors) for subtargets with or without has16BitInsts(), since f16 -> i32 gets selected like f16 -> f32 -> i32.
Update tests with more detailed checks.
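The equivalence of the two lowering paths can be checked exhaustively over all 65536 f16 bit patterns. The sketch below is an illustration only: it models both paths with Python doubles (valid because f16 -> f32 and f16 -> f64 conversions are exact) and assumes round-toward-zero fp-to-int semantics, skipping NaN/inf inputs for which fp_to_sint is undefined anyway:

```python
import struct

for bits in range(1 << 16):
    # Decode this bit pattern as an IEEE 754 half-precision value.
    (h,) = struct.unpack('<e', struct.pack('<H', bits))
    if h != h or abs(h) == float('inf'):
        continue  # skip NaN and +/-inf: fp_to_sint is undefined for them
    via_i32 = int(h)          # f16 -> i32 (truncate), then sign-extend to i64
    via_f32 = int(float(h))   # f16 -> f32 (exact) -> i64 (truncate)
    assert via_i32 == via_f32
print("all finite f16 values agree")
```

Every finite f16 value truncates to the same integer on both paths, so replacing the f32 -> i64 expansion with f16 -> i32 plus a sign extension does not change results.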
I don't think this needs to be limited to Subtarget->has16BitInsts()