Page MenuHomePhabricator

[GlobalISel][AMDGPU] Legalize saturating add/subtract
ClosedPublic

Authored by arsenm on Jan 20 2020, 8:36 AM.

Details

Summary

Add support in LegalizerHelper for lowering G_SADDSAT etc. either using
add/subtract-with-overflow or using max/min instructions.

Enable this lowering for AMDGPU so it can be tested. The legalization
rules are still approximate and skips out on using the clamp bit to
treat these as legal, which has never been used before. This also
doesn't yet try to deal with expanding SALU cases.

Diff Detail

Event Timeline

foad created this revision.Jan 20 2020, 8:36 AM

Unit tests: pass. 62011 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

foad updated this revision to Diff 239151.Jan 20 2020, 8:58 AM

Add legalize tests for signed 16-bit operations.

Unit tests: pass. 62011 tests passed, 0 failed and 783 were skipped.

clang-tidy: unknown.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

arsenm added inline comments.Jan 20 2020, 1:04 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
4193–4196

I think this won't produce what we want with VOP3P instructions. For v3+s16, we would want G_UMIN to fewerElementsVector to v2s16, which won't be any of these. We would get vector min after lowering

5301

You can do .getReg(0) instead of ->getOperand(0).getReg()

5314–5315

Just .getReg(0)/1 works

5316–5317

Just passing -1/0 to buildConstant should work

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
519

Bot is wrong here

llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sat.ll
3 ↗(On Diff #239151)

Should probably just use update_mir_test_checks on this

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sat.mir
2 ↗(On Diff #239151)

This should get an explicit -mcpu. Maybe a runline for with/without i16 instructions

254 ↗(On Diff #239151)

Should add at least a v2s16, v3s16, v4s16, v2i32, and s64 test

arsenm added inline comments.Jan 20 2020, 1:06 PM
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sat.mir
2 ↗(On Diff #239151)

Really need at least a VI and gfx9+

arsenm added inline comments.Jan 20 2020, 1:30 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5297

I think it would also be beneficial to implement this in terms of two helper functions for the two different expansions. That way it's more flexible if a target wants to contextually use different strategies in a custom lowering

foad updated this revision to Diff 239851.Jan 23 2020, 4:23 AM
foad marked 5 inline comments as done.

Rebase and address some review comments.

foad marked 4 inline comments as done.Jan 23 2020, 4:35 AM
foad added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
4193–4196

Right, this requires more thought. I was hoping to reuse the exisiting isSupported logic without thinking too hard about it.

5297

Are there precedents for that kind of helper function that the target calls directly, instead of via the standard lower() action?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sat.mir
2 ↗(On Diff #239151)

What sort of run line do I need to disable i16 instructions?

254 ↗(On Diff #239151)

That would require G_SADDO (et al) to be legalizable for all those types, which it currently is not. Do I need to tackle that first, before proceeding with this patch?

Unit tests: fail. 62131 tests passed, 1 failed and 808 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 4 warnings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

arsenm added inline comments.Jan 23 2020, 4:55 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5297

We use Helper.lowerFMinNumMaxNum directly in AMDGPULegalizerInfo.

I'm specifically thinking we want to switch which expansion is used on AMDGPU based on whether it's scalar or vector, so we would make these legal and then use the helpers in applyMappingImpl

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-sat.mir
2 ↗(On Diff #239151)

Use a target without one. I usually use a Tahiti, Fiji, and gfx900 run line

254 ↗(On Diff #239151)

You can add the tests and use -global-isel-abort=0, and they just won't legalize and defer it to a later patch. Handling the vector cases at least should be as easy as adding the instructions to the fewerElementsVector switch

When you commit can you split this into one patch to add the opcode/irtranslator support and another for the legalizer changes.

Can't we also set op_sel on v_add_i32 to get saturating behavior?

foad updated this revision to Diff 251997.Mar 23 2020, 5:03 AM

Split opcode/irtranslator stuff out into D76600.

Use separate helper functions for expanding in terms of max/min or in terms of addo/subo.

foad retitled this revision from [GlobalISel][AMDGPU] Saturating add/subtract to [GlobalISel][AMDGPU] Legalize saturating add/subtract.Mar 23 2020, 5:04 AM
foad edited the summary of this revision. (Show Details)
foad edited the summary of this revision. (Show Details)

Can you also add some end to end IR tests make sure to stress SALU vs. VALU for a mix of scalar and vector types

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
2874–2875

By scalar vs. vector, I meant SGPR vs. VGPR. I haven't looked at whether this registers a real improvement here

arsenm added a comment.Jul 6 2020, 8:13 AM

ping. Are you going to get back to this soon, or should I adopt this? This is on the shortlist of remaining operations falling back in the OpenCL conformance tests

foad added a comment.Jul 7 2020, 12:38 AM

ping. Are you going to get back to this soon, or should I adopt this? This is on the shortlist of remaining operations falling back in the OpenCL conformance tests

In the short term I don't have time to work on this, so I would be happy for you to commandeer it.

arsenm commandeered this revision.Jul 13 2020, 5:21 AM
arsenm edited reviewers, added: foad; removed: arsenm.
arsenm updated this revision to Diff 277542.Jul 13 2020, 12:55 PM
arsenm edited the summary of this revision. (Show Details)

Rebase on other patches. Don't use custom lowering, and define legality rules as a placeholder until the clamp modifier is used

arsenm updated this revision to Diff 277553.Jul 13 2020, 1:30 PM

Map to VALU for now

arsenm updated this revision to Diff 278040.Jul 14 2020, 6:36 PM

Add gfx10 checks and expect different legality in the next patch

foad accepted this revision.Jul 23 2020, 1:35 AM

LGTM, thanks.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
443

This comment seems to be in an odd place (not next to the saturating operations) and/or redundant with the assert just above?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir
491–492

Just curious: can't G_CONSTANT represent vector constants directly?

This revision is now accepted and ready to land.Jul 23 2020, 1:35 AM
arsenm closed this revision.Jul 23 2020, 6:06 AM
arsenm marked 2 inline comments as done.
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-saddsat.mir
491–492

No