This is an archive of the discontinued LLVM Phabricator instance.

Expandload and Compressing store - documentation update
ClosedPublic

Authored by delena on Nov 16 2016, 5:17 AM.

Details

Summary

Added description of new intrinsics masked.expandload and masked.compressstore to the LLVM LangRef.

Implementation is partially committed. I'm working on the rest.
The related discussion is here:
http://lists.llvm.org/pipermail/llvm-dev/2016-September/104985.html

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 78170.Nov 16 2016, 5:17 AM
delena retitled this revision from to Expandload and Compressing store - documentation update.
delena updated this object.
delena added reviewers: Ayal, mkuper.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
Ayal added inline comments.Nov 16 2016, 11:26 PM
docs/LangRef.rst
11857 ↗(On Diff #78170)

Data selected from a vector according to a mask is stored in consecutive memory addresses (compressed store), and vice-versa (expanding load). These operations effective map to "if (cond) a[i++] = v.i" and "if (cond) v.i = a[i++]" patterns, respectively. Note that when the mask starts with '1' bits followed by '0' bits, these operations are identical to llvm.masked.store and llvm.masked.load [link to them].

11866 ↗(On Diff #78170)

"The loaded data is a number of scalar values of any integer," >> "Several scalar values of integer,"

"loaded together" >> "are loaded from consecutive memory addresses"

"spread according to the mask into one vector" >> "stored into the elements of a vector according to the mask"

11876 ↗(On Diff #78170)

spread >> spreads

If the mask >> E.g., if the mask

the "expandload" >> "expandload"

from memory >> from memory addresses ptr, ptr+1, ptr+2

"position" >> "positions" (or "places")

11902 ↗(On Diff #78170)

N?

11905 ↗(On Diff #78170)

%Aptr should have pointer type

11908 ↗(On Diff #78170)

load operation >> load operations

11909 ↗(On Diff #78170)

regular >> regular unmasked

delena updated this revision to Diff 78327.Nov 17 2016, 1:08 AM
delena marked 5 inline comments as done.

Updated, thanks Ayal for the comments.

delena updated this revision to Diff 79352.Nov 27 2016, 12:07 AM

minor change + ping..

Ayal added inline comments.Nov 28 2016, 12:27 AM
docs/LangRef.rst
11866 ↗(On Diff #79352)

"The loaded data is several values of integer," >> "Several values of integer,"

"memory address" >> "memory addresses"

11882 ↗(On Diff #79352)

Explain about the type of the first operand.

11887 ↗(On Diff #79352)

If the terms "dense" or "sparse" are to be used they should be defined to avoid confusion - a sparse representation is often the one that is condensed.
Alternatively:
"designed for sequential reading of multiple scalar values from memory into a sparse vector in a single IR operation" >>
"designed for reading multiple scalar values from adjacent memory addresses into possibly non-adjacent vector lanes in a single IR operation"

11902 ↗(On Diff #79352)

for consistency, use "'1' bits" or "'true' elements" but not both.

11903 ↗(On Diff #79352)

%Bptr should have type <8 x double>*, right?

11918 ↗(On Diff #79352)

"The stored data is a number of scalar values of any integer, floating point or pointer data type picked up from an input vector and stored as a contiguous vector in memory" >>
"A number of scalar values of integer, floating point or pointer data type are collected from an input vector and stored into adjacent memory addresses"

"The mask defines active elements from the input vector that should be stored" >>
"A mask defines which elements to collect from the vector"

11922–11923 ↗(On Diff #79352)

ptr should have pointer-to-vector types.

11928 ↗(On Diff #79352)

"Writes all selected elements from lower to higher sequentially to memory '`ptr`' as one contiguous vector." >>
"All selected elements are written into adjacent memory addresses starting at address '`ptr`', from lower to higher."

"equal to number" >> "equal to the number"

11933 ↗(On Diff #79352)

"The first operand is the vector value, which elements to be picked up and written to memory." >>
"The first operand is the input vector, from which elements are collected and written to memory."

"vector value operand." >> "input vector operand."

"The types of the mask and the value operand" >> "The mask and the input vector"

11939 ↗(On Diff #79352)

"data compressing" >> "compressing data in memory"

"to pick up single elements" >> "to collect elements from possibly non-adjacent lanes of a vector"

"store operation" >> "store operations"

"vectorizing loop" >> "vectorizing loops"

"a cross-iteration dependency" >> "cross-iteration dependences"

11943 ↗(On Diff #79352)

"dense them" >> "store them consecutively"

11955 ↗(On Diff #79352)

"densely" >> "consecutively"

delena marked 8 inline comments as done.Dec 6 2016, 2:49 AM
delena added inline comments.
docs/LangRef.rst
11882 ↗(On Diff #79352)

"The underlying type of the pointer is a scalar type of vector element" - is this description clear?

11903 ↗(On Diff #79352)

No, a pointer to scalar.

11922–11923 ↗(On Diff #79352)

I defined them as pointers to scalar values.

delena updated this revision to Diff 80545.Dec 7 2016, 12:17 AM

I changed the text according to Ayal's comments.

Ayal edited edge metadata.Dec 7 2016, 12:49 AM

This version of the documentation LGTM, thanks for addressing.

Michael, ok to land, given the discussion in http://lists.llvm.org/pipermail/llvm-dev/2016-September/104985.html?

../docs/LangRef.rst
11857 ↗(On Diff #80545)

May be clearer to write
"if(cond) a[j++] = v.i" >>
"if (cond.i) a[j++] = v.i"
and
"if (cond) v.i = a[j++]" >>
"if (cond.i) v.i = a[j++]"
showing the condition as a mask vector similar to the data vector.

The way I understand it, the mailing list discussion ended with "let's discuss this at the BoF", and the decision post-BoF was to have a working group to decide on idiom representation, etc.

Having said that, I'm ok with this going on, since I don't see a sane way to represent this in IR that's selectable in DAG.

But for the signatures, I don't have enough non-X86 context.
Hal, Adam, does this seem sensible to you too?

../docs/LangRef.rst
11861 ↗(On Diff #80545)

Bikeshedding - will "llvm.masked.load.expand.*" and "llvm.masked.store.compress.*" make more or less sense than the current names?

11882 ↗(On Diff #80545)

"are the same vector types" -> "have the same vector type"?

11887 ↗(On Diff #80545)

"in a single IR operation" is redundant.

11901 ↗(On Diff #80545)

This example is slightly confusing to me, because it's not clear "what happens to Bptr" - we need to advance it to the next iteration by adding the popcount of %mask, right?

Do we have a good way to represent that right now? It seems like "llvm.ctpop" for a <k x i1> type does the wrong thing (it's basically a nop).

anemet edited edge metadata.Dec 9 2016, 8:34 AM

Hal, Adam, does this seem sensible to you too?

My preference would be to wait for the outcome of the result of the working group on vectorization idioms. This is not the only idiom that requires control-flow so there may be some common mechanism developed and as a result this would have to be completely redone/autoupgraded, etc.

But if you and Hal feel strongly about this I won't stand in the way.

Adam

Ayal added a comment.Dec 9 2016, 1:07 PM

This patch complements the LangRef documentation, which we seem to be converging on above, following http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20161031/402101.html. Changing the signatures and/or a complete redoing/autoupgrade may be needed, but best keep documentation and codegen in sync, right?

delena marked 3 inline comments as done.Dec 11 2016, 1:36 AM
../docs/LangRef.rst
11901 ↗(On Diff #80545)

I extended the code to show popcnt operation. Mask should be converted to integer, otherwise llvm.ctpop does not help.

delena updated this revision to Diff 81015.Dec 11 2016, 1:49 AM
delena edited edge metadata.

Updated according to Michael's and Ayal's comments.
Codegen Implementation was partially committed based on RFC discussion. This doc update synchronizes LangRef and CodeGen.
I can change "expandload" to "load.expand" is the both places, if it is principal.

May I commit this patch?

Ping. The intrinsics have been implemented with partial support for a year and a half now. I'd like to use them to replace the X86 specific expand load intrinsics. Can we commit this documentation?

hsaito added a comment.EditedJun 4 2018, 10:18 AM

Ping. The intrinsics have been implemented with partial support for a year and a half now. I'd like to use them to replace the X86 specific expand load intrinsics. Can we commit this documentation?

Are the LLVM intrinsics already in the trunk? Does the behavior match with the documentation here? I expect the answer is yes, but I need someone who knows enough about actual implementation to say yes, or verify that myself before marking this good to go.

Thanks,
Hideki

I wrote this documentation after implementation. I don't work on X86 about a year, but I doubt that somebody touched this code.

hsaito accepted this revision.Jun 5 2018, 8:18 AM

I wrote this documentation after implementation. I don't work on X86 about a year, but I doubt that somebody touched this code.

Then, the doc update is long overdue, and the current description reasonably reflects the review feedback. If anything else should be done, it can be done after commit.
Sanity checked intrinsic names/args matching here and in the Intrinsics.td. LGTM.

This revision is now accepted and ready to land.Jun 5 2018, 8:18 AM
This revision was automatically updated to reflect the committed changes.