This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GlobalISel] Try to use op_sel when selecting packed instructions
Needs ReviewPublic

Authored by vangthao on Jun 3 2021, 5:51 PM.

Details

Reviewers
arsenm
foad
Summary

Also look for per-component fnegs.

Diff Detail

Event Timeline

vangthao created this revision.Jun 3 2021, 5:51 PM
vangthao requested review of this revision.Jun 3 2021, 5:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 5:51 PM
arsenm added inline comments.Jun 3 2021, 6:31 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3327

I would prefer to return the new register value instead of using the out argument

3348–3349

I don't think this is valid because it changes the bit position of the fneg

foad added inline comments.Jun 4 2021, 3:58 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3473

SmallVector is overkill, just use int Mask[2]? Or std::array if you want more type safety.

foad added inline comments.Jun 4 2021, 4:06 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3341

Return OutSrc, or NoRegister on failure?

3359–3360

This seems redundant.

3412

Could return a std::pair of Registers, or {NoRegister,NoRegister} on failure. But I'm not sure if that would be cleaner.

3416

Assert that we're looking at a G_BITCAST before doing this.

vangthao updated this revision to Diff 350699.Jun 8 2021, 12:59 PM

Addressed comments.

Functions return register value instead of using argument to return value.
Fix bug with illegally stripping FNeg instructions that are not of type 2x16 or size 16.
Added mir tests.

vangthao marked 6 inline comments as done.Jun 8 2021, 1:00 PM
foad added inline comments.Jun 9 2021, 1:40 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3326

Nit: "xor it with what??". How about "flip the value" instead?

3331

Everywhere that you use getVRegDef, do you also potentially need to look through copies as well?

3382

Just use Register() instead of MCRegister::NoRegister. Same in a few other places below.

3433

Is there a chance that the G_AND would already have been optimized away, e.g. if Lo was a constant or was otherwise known not to have any high bits set?