This is an archive of the discontinued LLVM Phabricator instance.

Introduce llvm.loop.parallel_accesses and llvm.access.group metadata.
ClosedPublic

Authored by Meinersbur on Sep 14 2018, 12:04 PM.

Details

Summary

The current llvm.mem.parallel_loop_access metadata has a problem in that it uses LoopIDs. LoopID unfortunately is not loop identifier. It is neither unique (there's even a regression test assigning the some LoopID to multiple loops; can otherwise happen if passes such as LoopVersioning make copies of entire loops) nor persistent (every time a property is removed/added from a LoopID's MDNode, it will also receive a new LoopID; this happens e.g. when calling Loop::setLoopAlreadyUnrolled()).
Since most loop transformation passes change the loop attributes (even if it just to mark that a loop should not be processed again as llvm.loop.isvectorized does, for the versioned and unversioned loop), the parallel access information is lost for any subsequent pass.

This patch unlinks LoopIDs and parallel accesses. llvm.mem.parallel_loop_access metadata on instruction is replaced by llvm.access.group metadata. llvm.access.group points to a distinct MDNode with no operands (avoiding the problem to ever need to add/remove operands), called "access group". Alternatively, it can point to a list of access groups. The LoopID then has an attribute llvm.loop.parallel_accesses with all the access groups that are parallel (no dependencies carries by this loop).

This intentionally avoid any kind of "ID". Loops that are clones/have their attributes modifies retain the llvm.loop.parallel_accesses attribute. Access instructions that a cloned point to the same access group. It is not necessary for each access to have it's own "ID" MDNode, but those memory access instructions with the same behavior can be grouped together.

The behavior of llvm.mem.parallel_loop_access is not changed by this patch, but should be considered deprecated.

Possible extensions/follow-up patches:

  • AutoUpgrade llvm.mem.parallel_loop_access to llvm.access.group such that we can remove its handling in the passes.

Diff Detail

Event Timeline

Meinersbur created this revision.Sep 14 2018, 12:04 PM

Thanks for making this MD more robust! It is essential for the vectorization performance of pocl. Sorry for my slow response time.

Related to OpenCL, there's usually a 3D work-item loop all of which levels are parallel. I didn't spot a test that shows multiple loop levels and thus multiple parallel access groups attached to an instruction. Moreover, the inliner MD transfer would indeed be much improved in case it then replicated the parallel access info from all loop hierarchy levels downwards. The comment which mentions that you focus on inner loop is true, but if there is (or will be) a loop interchange optimization pass that utilizes the parallel loop info, then any loop level might be potentially transferred to be the inner loop.

This is again very useful for OpenCL work-item loops as it allows optimizing memory access patterns via loop interchange, thus essentially performing outer loop vectorization when it's the best way to get performance.

This revision is now accepted and ready to land.Oct 3 2018, 12:47 AM

Thank you for your feedback. If you don'r mind, before I commit this, I will prepare a patch for clang generating this new kind of metadata (since atm nothing is generating it) and give others some time for feedback.

Dropping one of the llvm.access.group annotation instead of merging is a trade-off I had to make. It only affects inlining situation where the a loop containing a function call to a function call containing a loop, both loop being marked as parallel. I can make patch suggestion later.

I actually already created a review for the Clang part which is D52117, but uploaded the wrong diff. Corrected. I'll wait for both being accepted before committing.

I'm basically happy with this, but we shouldn't add things to the LangRef without an RFC. I don't recall seeing one.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
636

I don't understand what this FIXME is saying. What needs to be fixed?

lib/Transforms/Utils/InlineFunction.cpp
809

Is the problem with "updating all uses of one of the access groups" that we don't have a way to efficiently enumerate them? Would we need to scan the functions for branches and collect all of the loop-id metadata that's relevant first? It would be nice not to lose this information.

Meinersbur marked 3 inline comments as done.Dec 3 2018, 9:33 PM

I am going to to prepare an RFC.

lib/Transforms/Scalar/LoopVersioningLICM.cpp
636

The line below adds llvm.mem.parallel_loop_access to a LoopID, but is expected as annotations of instructions that access memory (with a the loop it is parallel to as parameter).

There is no code that looks for llvm.mem.parallel_loop_access in LoopID metadata. Hence, adding the property has no effect.

lib/Transforms/Utils/InlineFunction.cpp
809

Either search (and update) all LoopIDs that reference the access group or create a new 'meta-access-group' as outlined in the FIXME.

Of course it would be nice to not lose information, but as for any analyses there is a trade-off between accuracy and computational complexity. E.g. it would be nice if alias-analysis would be control-flow-sensitive. At there moment the only pass making use of Loop::isAnnotatedParallel() are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination. All of which only process innermost loops. That is, there would be no effect of keeping more information. There's also the FIXME still in the code such that the issue is not forgotten.

Do you still want me to implement one of the solutions?

hfinkel added inline comments.Dec 4 2018, 8:03 AM
lib/Transforms/Scalar/LoopVersioningLICM.cpp
636

OIC, okay. Thanks.

lib/Transforms/Utils/InlineFunction.cpp
809

Of course it would be nice to not lose information, but as for any analyses there is a trade-off between accuracy and computational complexity.

Clearly I know this ;)

At there moment the only pass making use of Loop::isAnnotatedParallel() are LoopVectorize, LoopVersioningLICM, LoopDistribute and LoopLoadElimination. All of which only process innermost loops. That is, there would be no effect of keeping more information.

Two things: First, this might be true today, but very soon won't be true (vectorization will soon handle outer loops, and we are developing other loop-nest transformations) and if we do the right thing now, we won't later need to go back and fix this later. Second, while it's true that full loop unrolling generally happens before inlining, we could have a loop that, at the point of inlining is an inner loop, but is later fully unrolled such that before vectorization (etc.) the loop here might become, once again, the inner loop.

Do you still want me to implement one of the solutions?

Yes.

Meinersbur updated this revision to Diff 177322.Dec 7 2018, 2:36 PM
Meinersbur marked an inline comment as done.
  • Allow multiple access groups per instructions, i.e. an instruction can be in multiple access groups. This allows a simple 'union' operation that occurs when inlining into another function. A memory access is considered parallel when at least one access group is listed in llvm.loop.parallel_accesses. This is prioritized over the 'intersect' case for combining instructions which would be dual. We only do best-effort here.
Meinersbur edited the summary of this revision. (Show Details)Dec 7 2018, 2:42 PM
Meinersbur added a reviewer: jdoerfert.
  • Rebase to trunk
Meinersbur marked an inline comment as done.
  • Reinsert parts of LangRef.rst that went missing after merge.
lib/Transforms/Utils/LoopUtils.cpp
190–219

This has moved to LoopInfo.cpp. LoopUtils.cpp belongs to libTransform, but not all users have a dependency to libTransform (such as Loop::isAnnotatedParallel()).

hfinkel added inline comments.Dec 18 2018, 5:32 PM
docs/LangRef.rst
5469

of -> if

5472

update -> updating

include/llvm/Analysis/VectorUtils.h
120

access group lists -> access-group lists

126

access group list -> access-group list

lib/Analysis/LoopInfo.cpp
328

Repeating this set of asserts seems unfortunate. Also, they have no comment. Maybe make a function?

assert(AccGroup->isDistinct() && "Access group metadata nodes must be distinct");
assert(AccGroup->getNumOperands() == 0 && "Access group metadata nodes must have zero operands");

you also repeat these asserts in lib/Analysis/VectorUtils.cpp below.

lib/Analysis/VectorUtils.cpp
474

Indentation here is odd.

477

Indentation here looks odd too.

Meinersbur marked 10 inline comments as done.
  • Address @hfinkel's review
  • clang-format
hfinkel accepted this revision.Dec 19 2018, 12:38 PM

Aside from the requested renaming (see below), this LGTM.

include/llvm/Analysis/LoopInfo.h
1016

I'd prefer to name this isValidAsAccessGroup() (because the function does not actually determine whether the MD node *is* an access group, but rather, whether it might be valid to use it as one).

  • Rename: isAccessGroup -> isValidAsAccessGroup
This revision was automatically updated to reflect the committed changes.
Meinersbur marked an inline comment as done.