This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/R600: Serialize vector trunc stores to private AS
ClosedPublic

Authored by jvesely on Dec 19 2016, 5:46 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

jvesely updated this revision to Diff 82051.Dec 19 2016, 5:46 PM
jvesely retitled this revision from to AMDGPU/R600: Serialize vector trunc stores to private AS.
jvesely updated this object.
jvesely added a reviewer: tstellarAMD.
jvesely set the repository for this revision to rL LLVM.
jvesely added subscribers: hfinkel, arsenm.
arsenm added inline comments.Dec 20 2016, 9:35 AM
lib/Target/AMDGPU/R600ISelLowering.cpp
1182 ↗(On Diff #82051)

You shouldn't need a new DUMMY_CHAIN node. You should be able to create an appropriate MERGE_VALUES that will take its place

jvesely added inline comments.Dec 20 2016, 11:29 AM
lib/Target/AMDGPU/R600ISelLowering.cpp
1182 ↗(On Diff #82051)

Using MERGE_VALUES does not work (I tried using TOKEN_FACTOR as well, and it did not work either). I think that the 1<->1 chain dummy node that I create gets eliminated before lowerPrivateTruncStore can use it.
Dumping StoreNode on line 1213 confirms that after using MERGE_VALUES it still uses EntryToken as chain.

Why are all the test changes for local memory read/write, when the code changes behavior for private memory stores?

jvesely added a comment.EditedDec 21 2016, 8:27 PM

Why are all the test changes for local memory read/write, when the code changes behavior for private memory stores?

similar changes were necessary for constant/global tests but D24746, made the checks a bit more loose so they pass without changes.
The tests cast vectors with elements of sub-i32 types. The conversion results in stack slot stores/loads (Redundant MOVs mentioned in constant/global tests' TODOs ).
I don't remember exactly whether the stack loads/stores come directly from expandUnalignedStores, or ExpandOp_BITCAST (I looked into it for D24745).

hopefully, providing better machinePointerInfo will help eliminate all of those stack accesses.

For some reason, tests in store-private.ll did not see the issue when storing to generic private AS pointer, and already test for correct sequence of RMWs.

arsenm added inline comments.Jan 9 2017, 11:00 AM
lib/Target/AMDGPU/R600ISelLowering.cpp
1182 ↗(On Diff #82051)

Can you post the version of the patch that uses that? DUMMY_CHAIN is a big hack

jvesely added inline comments.Jan 9 2017, 11:16 AM
lib/Target/AMDGPU/R600ISelLowering.cpp
1182 ↗(On Diff #82051)

it does not work. both MERGE_VALUES and TOKEN_FACTOR using single chain input are removed before the individual vector stores are expanded.

jvesely added inline comments.Jan 10 2017, 5:26 PM
lib/Target/AMDGPU/R600ISelLowering.cpp
1182 ↗(On Diff #82051)

to be more specific. Those nodes won't even be created. SelectionDAG::getNode(SelectionDAG.cpp:3322), returns just the operand for MERGE_VALUES and TokenFactor if there is only one operand.
Moreover, given that inputs are legalized before the Node, I need a chain Node that is guaranteed to survive legalization process.

ping.
should I be looking into alternative approaches?
of the 3 I considered (D24745, this one, and using results of AA), this one seems to be the least ugly.

arsenm accepted this revision.Jan 20 2017, 12:02 PM

LGTM. I think there's a better solution but I'm not sure if it's worth investing in the effort

lib/Target/AMDGPU/R600Instructions.td
731 ↗(On Diff #82051)

Typo: tog et

This revision is now accepted and ready to land.Jan 20 2017, 12:02 PM
This revision was automatically updated to reflect the committed changes.