This is an archive of the discontinued LLVM Phabricator instance.

Enable roundeven.
ClosedPublic

Authored by Leonc on Nov 14 2022, 7:45 AM.

Details

Summary

Add support for roundeven and implement appropriate tests.

Diff Detail

Event Timeline

Leonc created this revision.Nov 14 2022, 7:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2022, 7:45 AM
Leonc requested review of this revision.Nov 14 2022, 7:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 14 2022, 7:45 AM

Can you also take care of globalisel here?

arsenm added inline comments.Nov 14 2022, 7:52 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
17

Does v_trunc_f32 ever raise FP exceptions?

Leonc added a comment.Nov 14 2022, 7:53 AM

Can you also take care of globalisel here?

From what I can tell globalisel already has support.

Leonc added inline comments.Nov 14 2022, 7:55 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
17

I assume we have tests for that in a different file, but if not I could make a new task to add them.

Can you also take care of globalisel here?

From what I can tell globalisel already has support.

Tests should be unified then

llvm/test/CodeGen/AMDGPU/roundeven.ll
17

That's not testable here, it's an isa behavior question

Leonc updated this revision to Diff 475367.Nov 15 2022, 1:29 AM
  • Address comments.
Leonc marked 2 inline comments as done.Nov 15 2022, 1:31 AM
Leonc added a comment.Nov 15 2022, 1:33 AM

@arsenm how's this? The tests are unified, but it seems a little misleading to have ISel and GlobalISel tests in the same directory.

foad added a comment.Nov 15 2022, 1:42 AM

@arsenm how's this? The tests are unified, but it seems a little misleading to have ISel and GlobalISel tests in the same directory.

Move it out of GlobalISel/. There are plenty of unified sdag/gisel tests in the parent directory already.

foad added inline comments.Nov 15 2022, 1:43 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/roundeven.ll
1

SDAG- is commonly used for this. ISEL_ is a bit vague.

Leonc updated this revision to Diff 475374.Nov 15 2022, 1:45 AM
  • Move test file.
Leonc updated this revision to Diff 475375.Nov 15 2022, 1:47 AM
  • Address comments.
Leonc marked an inline comment as done.Nov 15 2022, 1:51 AM
Leonc added a comment.Nov 17 2022, 4:45 AM

Couldn't reproduce pre-merge warning locally.
All tests have passed successfully.

Leonc updated this revision to Diff 476094.Nov 17 2022, 4:48 AM
  • Rebase.
Leonc added a comment.Nov 17 2022, 6:04 AM

The warning is from clang-format. It wants to make 86 changes over hundreds of lines unrelated to this patch.

foad added a comment.Nov 18 2022, 3:19 AM

The warning is from clang-format. It wants to make 86 changes over hundreds of lines unrelated to this patch.

No, it only wants correct formatting for the lines you actually changed in your patch. If you "install" git clang-format, by making sure llvm-project/clang/tools/clang-format/git-clang-format is on your $PATH, then you can run: git clang-format @^. This is what the Harbormaster build does.

Leonc updated this revision to Diff 476424.Nov 18 2022, 4:40 AM
  • Changes required by clang-format.
Leonc added a comment.Nov 18 2022, 4:41 AM

The warning is from clang-format. It wants to make 86 changes over hundreds of lines unrelated to this patch.

No, it only wants correct formatting for the lines you actually changed in your patch. If you "install" git clang-format, by making sure llvm-project/clang/tools/clang-format/git-clang-format is on your $PATH, then you can run: git clang-format @^. This is what the Harbormaster build does.

Thanks. I've applied the changes now.

arsenm requested changes to this revision.Nov 18 2022, 2:26 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

I'm perpetually confused by the variety of rounding functions.

llvm.rint => round to nearest integer, in current rounding mode (which is assumed to be round nearest even), which is implied by being non-constrained
llvm.nearbyint => same as llvm.rint, except no FP exceptions. FP exceptions aren't supported with non-constrained intrinsics, so this distinction is pointless

llvm.round -> round, away from 0
llvm.roundeven -> round halfway nearest 0

so I think this isn't the same as round, but is supposed to be the same as llvm.rint (which we have 3 different names for)

This revision now requires changes to proceed.Nov 18 2022, 2:26 PM
Leonc added inline comments.Nov 19 2022, 1:16 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

I thought llvm.roundeven is supposed to always round to nearest even regardless of the rounding mode. Does our implementation of llvm.rint ignore the rounding mode?

arsenm added inline comments.Dec 6 2022, 2:43 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

Yes. This is all legacy cruft from when people were imaging possible solutions to supporting strictfp in the distant future without an actual design in mind. The non-strict, regular floating point intrinsics all assume RTE rounding with no fp exceptions.

This cruft is bothering me; as a follow up, can you prepare a patch to deprecate the old intrinsics and auto-upgrade them so we're left with one?

Leonc added inline comments.Dec 6 2022, 8:07 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

Can do. I agree it's needlessly confusing.

craig.topper added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

For targets that don't optimize them to inline sequences, wouldn't that cause calls library calls to mutate into a different library call? Functionally it would be correct in the default environment, but might be surprising.

Depending on which one you choose as canonical it could cause link errors. I don't think you could choose roundeven as canonical since its not in older libm.

arsenm added inline comments.Dec 6 2022, 9:00 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

Part of the problem is thinking the intrinsics have anything to do with libm. You can lower the intrinsic to a different name for the libcall

craig.topper added inline comments.Dec 6 2022, 9:10 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
1195

I didn't say we couldn't. Merely saying that someone will find that surprising and probably file a bug.

Leonc updated this revision to Diff 482795.Dec 14 2022, 3:39 AM
  • Change rounding method to match GlobalISel.
Leonc added a comment.Dec 14 2022, 4:20 AM

FRINT only handles f64 so I'm updating the patch to handle the remaining scalar types.

arsenm accepted this revision.Dec 14 2022, 3:09 PM

LGTM

llvm/test/CodeGen/AMDGPU/roundeven.ll
9

Should use explicit -global-isel=0

This revision is now accepted and ready to land.Dec 14 2022, 3:09 PM
Leonc updated this revision to Diff 484137.Dec 19 2022, 5:07 PM
  • Add support for vector types.
arsenm added inline comments.Dec 19 2022, 5:13 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
309–312

Commented out vector types? The vectors should be and have defaulted to expand

Leonc updated this revision to Diff 484141.Dec 19 2022, 5:44 PM
  • Revert vector code in favour of auto-vectorisation.
Leonc marked an inline comment as done.Dec 19 2022, 5:49 PM
Leonc added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
309–312

Apologies these were debug comments. I've removed them now.

Leonc updated this revision to Diff 484143.Dec 19 2022, 5:50 PM
Leonc marked an inline comment as done.
  • Address comments.
Leonc marked an inline comment as done.Dec 19 2022, 5:50 PM
arsenm added inline comments.Dec 19 2022, 5:52 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2194–2196

Subtarget is already available as a member here, don't need to go through the function

2198–2204

You don't need a generation check here, you can just lower to whatever opcode you choose to consolidate on and let the handling of that one take care of the subtarget specific legality considerations which should already work

Leonc added inline comments.Dec 19 2022, 5:59 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

LowerFRINT handles lowering when f64 isn't supported. Based on GlobalISel's implementation that's any generation before gfx7.

Leonc updated this revision to Diff 484146.Dec 19 2022, 6:01 PM
  • Address comments.
Leonc marked an inline comment as done.Dec 19 2022, 6:02 PM
Leonc marked 6 inline comments as done.Dec 20 2022, 3:28 AM
arsenm added inline comments.Dec 20 2022, 4:12 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

You do not need to directly call LowerFRINT. You can just unconditionally produce the frint, and let that be legalized. What you have here is repeating the legality condition in a second place

Leonc updated this revision to Diff 484244.Dec 20 2022, 6:00 AM
  • Clang formatting.
Leonc added inline comments.Dec 20 2022, 6:01 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

I tried that and it crashed.

arsenm added inline comments.Dec 20 2022, 6:04 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

Define "crashed". Something else is wrong, you can rely on re-legalization of new nodes

Leonc added inline comments.Dec 20 2022, 6:05 AM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

I didn't save the log unfortunately. I'll see if I can reproduce it.

Leonc updated this revision to Diff 484246.Dec 20 2022, 6:24 AM
  • Address comments.
Leonc marked 2 inline comments as done.Dec 20 2022, 6:26 AM
Leonc added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2198–2204

You were right. I must have done something wrong the first time.

arsenm accepted this revision.Dec 20 2022, 7:27 AM
This revision was automatically updated to reflect the committed changes.
bcahoon added inline comments.Dec 20 2022, 9:02 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
479

It looks like arguments are converted from f32->f16->f32. Is that correct/efficient?

arsenm added inline comments.Dec 20 2022, 9:04 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
479

This is the broken ABI the DAG wants to give targets without legal f16. It’s a problem and ends up with different behavior for GlobalISel

Leonc added inline comments.Dec 20 2022, 9:18 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
479

Is there a workaround?

arsenm added inline comments.Dec 20 2022, 9:20 AM
llvm/test/CodeGen/AMDGPU/roundeven.ll
479

use an i16 argument and bitcast to half in the IR. Should also figure out how to fix the DAG from promoting to float