This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Simplify f16 to i64 custom lowering
ClosedPublic

Authored by Petar.Avramovic on Jul 20 2020, 5:48 AM.

Details

Summary

The range that f16 can represent fits into i32.
Lower as f16->i32->i64 instead of f16->f32->i64,
since f32->i64 has a long expansion.
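
As a rough illustration of the idea (a minimal sketch, not the code in this patch; the helper name lowerF16ToI64Sketch is hypothetical), the custom lowering can convert to i32 first and then widen, because every finite f16 value fits into the i32 range:

#include "llvm/CodeGen/SelectionDAG.h"
using namespace llvm;

// Sketch: lower fp_to_sint/fp_to_uint from f16 to i64 as f16 -> i32 -> extend,
// avoiding the long f32 -> i64 expansion.
static SDValue lowerF16ToI64Sketch(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  SDValue Src = Op.getOperand(0); // f16 source operand
  bool IsSigned = Op.getOpcode() == ISD::FP_TO_SINT;

  unsigned CvtOpc = IsSigned ? ISD::FP_TO_SINT : ISD::FP_TO_UINT;
  unsigned ExtOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;

  // f16 -> i32: cheap, later selected as f16 -> f32 -> i32.
  SDValue Cvt = DAG.getNode(CvtOpc, DL, MVT::i32, Src);
  // i32 -> i64: sign- or zero-extend the 32-bit result.
  return DAG.getNode(ExtOpc, DL, MVT::i64, Cvt);
}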

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript · Jul 20 2020, 5:48 AM
arsenm added inline comments. · Jul 20 2020, 6:13 AM
llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
108

Why are there test changes only for the vector cases?

Petar.Avramovic marked an inline comment as done.

Context was missing.

llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
108

The tests only check for a few instructions, and here there was no v_cvt_f32_f16_e32.
I think they check that there was an f16 -> f32 conversion followed by the long f32 -> i64 expansion, but they don't check the expansion itself.
Now f16 -> i32 is selected as f16 -> f32 -> i32, so the f16 -> f32 part is still there.

arsenm added inline comments. · Jul 20 2020, 6:31 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2722–2723

I don't think this needs to be limited to Subtarget->has16BitInsts()

llvm/test/CodeGen/AMDGPU/fptosi.f16.ll
45

These checks should be strengthened to show the new expansion.

This update also simplifies f16 -> i64 on subtargets without has16BitInsts().
Such targets immediately promote f16 to f32 whenever they encounter an f16 def, using the fp16_to_fp node. Recognize an fp16_to_fp input to the f32 -> i64 conversion and do the same thing as for the f16 -> i64 conversion. This gives similar results (with a small difference for f16 vectors) for subtargets with or without has16BitInsts(), since f16 -> i32 gets selected as f16 -> f32 -> i32.
Tests are updated with more detailed checks.
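
A minimal sketch of that case (hypothetical helper name, not the actual patch code): peek through an fp16_to_fp node feeding the f32 -> i64 conversion and reuse the same i32 path as in the direct f16 case above.

// Sketch: on subtargets without 16-bit instructions, the f16 source has
// already been promoted with fp16_to_fp. Recognize that and convert the
// promoted value straight to i32, then extend, instead of expanding f32 -> i64.
static SDValue lowerPromotedF16ToI64Sketch(SDValue Op, SelectionDAG &DAG) {
  SDLoc DL(Op);
  SDValue Src = Op.getOperand(0); // f32 source operand
  if (Src.getOpcode() != ISD::FP16_TO_FP)
    return SDValue(); // not a promoted f16, keep the regular expansion

  bool IsSigned = Op.getOpcode() == ISD::FP_TO_SINT;
  unsigned CvtOpc = IsSigned ? ISD::FP_TO_SINT : ISD::FP_TO_UINT;
  unsigned ExtOpc = IsSigned ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND;

  // The value originated as f16, so it fits into i32.
  SDValue Cvt = DAG.getNode(CvtOpc, DL, MVT::i32, Src);
  return DAG.getNode(ExtOpc, DL, MVT::i64, Cvt);
}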

arsenm accepted this revision. · Jul 21 2020, 8:04 AM
This revision is now accepted and ready to land. · Jul 21 2020, 8:04 AM
This revision was automatically updated to reflect the committed changes.