Page MenuHomePhabricator

AMDGPU: Add llvm.amdgcn.{read,readfirst,write}lane2 intrinsics with type overloads
Needs ReviewPublic

Authored by nhaehnle on Aug 18 2020, 10:23 AM.

Details

Reviewers
arsenm
foad
Summary

These intrinsics should work at least with standard integer and floating
point sizes, pointers, and vectors of those.

This fixes selection for non-s32 types when readfirstlane is inserted
for SGPR return values.

Moving the atomic optimizer pass in the pass pipeline so that it can be
simplified and rely on the more general support of lane intrinsics.

API users should move to these new intrinsics so that we can remove the
old versions.

Change-Id: I1c5e7e7858890e1c30d3b46c8551e74ab7027552
Based-on: https://reviews.llvm.org/D84639

Diff Detail

Unit TestsFailed

TimeTest
20 mslinux > LLVM-Unit.IR/_/IRTests::IRBuilderTest.Intrinsics
Note: Google Test filter = IRBuilderTest.Intrinsics [==========] Running 1 test from 1 test case. [----------] Global test environment set-up.
50 mslinux > LLVM.Analysis/DivergenceAnalysis/AMDGPU::intrinsics.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -mtriple=amdgcn-- -analyze -divergence -use-gpu-divergence-analysis /mnt/disks/ssd0/agent/llvm-project/llvm/test/Analysis/DivergenceAnalysis/AMDGPU/intrinsics.ll | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/Analysis/DivergenceAnalysis/AMDGPU/intrinsics.ll
30 mswindows > LLVM-Unit.IR/_/IRTests_exe::IRBuilderTest.Intrinsics
Note: Google Test filter = IRBuilderTest.Intrinsics [==========] Running 1 test from 1 test case.
130 mswindows > LLVM.Analysis/DivergenceAnalysis/AMDGPU::intrinsics.ll
Script: -- : 'RUN: at line 1'; c:\ws\w1\llvm-project\premerge-checks\build\bin\opt.exe -mtriple=amdgcn-- -analyze -divergence -use-gpu-divergence-analysis C:\ws\w1\llvm-project\premerge-checks\llvm\test\Analysis\DivergenceAnalysis\AMDGPU\intrinsics.ll | c:\ws\w1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w1\llvm-project\premerge-checks\llvm\test\Analysis\DivergenceAnalysis\AMDGPU\intrinsics.ll

Event Timeline

nhaehnle created this revision.Aug 18 2020, 10:23 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 18 2020, 10:23 AM
nhaehnle requested review of this revision.Aug 18 2020, 10:23 AM

Do we really have to use worse names here? Keeping the name works even if suboptimal for the attributes

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

nhaehnle updated this revision to Diff 286898.Aug 20 2020, 1:44 PM

Don't duplicate the intrinsics. Rely on D86317 to reduce the pain of this
change caused to downstream users.

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Note that part of my motivation here over D84639 is to support more general types on the lane intrinsics, since they also express some semantic content which would be interesting to be able to express e.g. on descriptors. I wasn't able to bend the SelectionDAG type legalization to my will, so that's why I instead "legalize" the intrinsics in the AMDGPUCodeGenPrepare pass.

Don't you just need to handle this in ReplaceNodeResults the same way?

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Well the globalisel handling should be much simpler. We have a lot of stuff that's randomly handled in the IR to work around the DAG which long term should be moved where it belongs in codegen

ReplaceNodeResults expects the result type to be changed in semi-magical ways during vector type legalization, which is non-obvious since the method can be called from different places. I think it *could* be made to work with a lot of patience, but it's really a bad interface -- and besides, by doing it in IR we reduce code duplication between SelectionDAG and GlobalISel, which is an added benefit IMO.

Well the globalisel handling should be much simpler. We have a lot of stuff that's randomly handled in the IR to work around the DAG which long term should be moved where it belongs in codegen

Is the GlobalISel handling simpler than the handling in IR? What would happen if we wanted to extend the handling to struct types?