This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Support horizontal vectorization.
ClosedPublic

Authored by FarhanaAleen on Apr 27 2018, 2:53 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

FarhanaAleen created this revision.Apr 27 2018, 2:53 PM
rampitec added inline comments.Apr 27 2018, 3:02 PM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
86 ↗(On Diff #144402)

Why is it disabled?

test/CodeGen/AMDGPU/reduction.ll
5 ↗(On Diff #144402)

Can you please complete this statement to include required op_sel if any?

6 ↗(On Diff #144402)

And the sdwa clauses too...

arsenm added inline comments.Apr 27 2018, 3:05 PM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
83 ↗(On Diff #144402)

Why do we need to do this at all? The standard pass pipeline adds this?

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
483 ↗(On Diff #144402)

Also needs to consider LT.second?

lib/Target/AMDGPU/SIISelLowering.cpp
6614–6635 ↗(On Diff #144402)

This is a separate patch

6635 ↗(On Diff #144402)

Needs a default

Added op_sel clauses.
Addressed other comments.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
83 ↗(On Diff #144402)

Right, the standard pass pipeline adds this. I thought it's a good practice to check the desired instructions directly for a lit-test. Without this switch, I will have to feed the vecotized testcase.

86 ↗(On Diff #144402)

SLP is a common optimizer, it does not need to be run as a target optimizer.

I added this switch so that SLP can be run with llc. This allows checking the machine instructions directly for a scalar testcase as opposed to a vectorized testcase.

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
483 ↗(On Diff #144402)

We don't want to consider the number of vector elements(LT.second) since we only care about the vectorization cost here.
And the type legalization cost(LT.first) is sufficient for that which give the cost for the total number of vector lanes.

lib/Target/AMDGPU/SIISelLowering.cpp
6635 ↗(On Diff #144402)

I am not sure whether I understand your comment here.

This is an auxiliary patch for enabling slp-reduction and it is easy to see the motivation of this patch when it's combined with SLP-reduction. Without this patch enabling SLP-reduction could cause performance regression. The current tests cover this piece of code.

test/CodeGen/AMDGPU/reduction.ll
6 ↗(On Diff #144402)

Added.

rampitec added inline comments.Apr 30 2018, 12:27 PM
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
86 ↗(On Diff #144402)

I am not sure we need to add options just for lit testing.

test/CodeGen/AMDGPU/reduction.ll
5 ↗(On Diff #144590)

If there are no op_sel modifiers please add {{$}} to the end of line to show there are none. Here and below.

Added {{$}}.
Removed the amdgpu-slp_vectorizer switch.

Added default label.

rampitec accepted this revision.May 1 2018, 1:40 PM

LGTM

This revision is now accepted and ready to land.May 1 2018, 1:40 PM
This revision was automatically updated to reflect the committed changes.