Page MenuHomePhabricator

AMDGPU/GlobalISel: cmp/select method for extract element
ClosedPublic

Authored by rampitec on May 28 2020, 11:33 AM.

Diff Detail

Event Timeline

rampitec created this revision.May 28 2020, 11:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 11:33 AM
rampitec updated this revision to Diff 266978.May 28 2020, 11:53 AM

Pre-commited test and rebased.

rampitec retitled this revision from AMDGPU/GlobalISel: cmp/select metod for extract element to AMDGPU/GlobalISel: cmp/select method for extract element.May 28 2020, 12:06 PM
arsenm added inline comments.May 28 2020, 1:23 PM
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
230–231 ↗(On Diff #266978)

For the 2 element case, we should do this in the pre-legalizer combiner (I guess the 16-bit case code would be different, since it would be bitcast and shift)

252 ↗(On Diff #266978)

Register Vec = ...?

258 ↗(On Diff #266978)

It would be better to produce a single G_UNMERGE_VALUES and index out of the result list than to produce a sequence of G_EXTRACTs.

Also FYI I'm planning on making 16-bit vector indexing bitcast promote to 32-bit vectors, so this would only see 32 and 64-bit element vectors

264–267 ↗(On Diff #266978)

You can fold these like
auto B = buildConstant(LLT::scalar(32), ...)
auto Cmp = B.buildICmp(LLT::scalar(1), CmpInst::...)

268 ↗(On Diff #266978)

Weird spacing, NumElem - 1?

rampitec marked 6 inline comments as done.May 28 2020, 2:57 PM
rampitec added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp
230–231 ↗(On Diff #266978)

Why is that? The problem is we need to know if index is divergent or not in general, so we avoid creating loops. I assume if we need some special case for 2 elements, that shall be a separate combine.

258 ↗(On Diff #266978)

Thanks. Apparently there is a bug in the CombinerHelper::matchEqualDefs() when it works with an opcode producing multiple values. I will have to fix it first.

rampitec updated this revision to Diff 267054.EditedMay 28 2020, 3:07 PM
rampitec marked an inline comment as done.

Addressed review comments.
It also contains fix for CombinerHelper::matchEqualDefs(), it likely needs a separate change.

rampitec updated this revision to Diff 267067.May 28 2020, 3:41 PM

Extracted CombinerHelper::matchEqualDefs() fix into D80767.

arsenm added inline comments.May 29 2020, 12:18 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
6967–6968 ↗(On Diff #267067)

This kind of thing belongs in TargetLowering still, it just needs to be made public

rampitec updated this revision to Diff 267354.May 29 2020, 1:21 PM
rampitec marked an inline comment as done.

Moved predicate function back to TLI, it is static anyway.

rampitec updated this revision to Diff 268016.Jun 2 2020, 4:22 PM

Moved to the RegBankSelect itself. It is too late to do it past RegBankSelect because waterfall loops are already created, and it is too early to do it before because we need divergence info.

rampitec updated this revision to Diff 268305.Jun 3 2020, 2:22 PM

Changed scalar condition type to S32.

rampitec updated this revision to Diff 268529.Jun 4 2020, 10:52 AM
arsenm accepted this revision.Jun 5 2020, 12:52 PM

LGTM. You could use the ApplyRegBankMapping observer to avoid all the explicit setRegBankCalls, but not everywhere is consistently using it yet anyway

This revision is now accepted and ready to land.Jun 5 2020, 12:52 PM
This revision was automatically updated to reflect the committed changes.