This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][Falkor] Avoid HW prefetcher tag collisions (step 1)
ClosedPublic

Authored by gberry on Jul 3 2017, 1:33 PM.

Details

Summary

This patch is the first step in reducing HW prefetcher instruction tag
collisions in inner loops for Falkor. It adds a pass that annotates IR
loads with metadata to indicate that they are known to be strided loads,
and adds a target lowering hook that translates this metadata to a
target-specific MachineMemOperand flag.

A follow on change will use this MachineMemOperand flag to re-write
instructions to reduce tag collisions.

Diff Detail

Repository
rL LLVM

Event Timeline

gberry created this revision.Jul 3 2017, 1:33 PM
hfinkel added a subscriber: hfinkel.Jul 3 2017, 4:28 PM

I don't quite understand the design requirements here. Instead of adding custom metadata, would it be better just to have the prefetching pass tag the loads that it believes that it has prefetched? Right now, we only prefetch strided accesses (and I can't tell from your description whether the tag collision problem is specific to strided access because those are what we might prefetch, or because of some other reason).

I don't quite understand the design requirements here. Instead of adding custom metadata, would it be better just to have the prefetching pass tag the loads that it believes that it has prefetched? Right now, we only prefetch strided accesses (and I can't tell from your description whether the tag collision problem is specific to strided access because those are what we might prefetch, or because of some other reason).

The loads I'm interested in tagging for this series of changes are ones that the LoopDataPrefetch pass has decided *not* to insert SW prefetch instructions for. For Falkor, we only insert SW prefetch instructions for strided accesses when the stride is known to be too large for the HW prefetchers to handle. In all other cases, (i.e. when we expect the HW prefetchers to work), there is a separate issue with the HW prefetcher implementation that we want to try to work-around by inserting MOVs very late in the code generation pipeline.

My first prototype of this pass re-arranged the LoopDataPrefetch pass a bit to achieve the same goal (i.e. marking loads that we think the HW prefetcher can handle well), but it seemed like it muddled that pass a bit too much just to address one sub-target's issues.

I don't understand your point about not adding custom metadata. These loads need to be tagged in some way that will survive (as much as possible) all the way to near the end of code generation. A custom metadata annotation, translated to a target MMO flag seemed like the cleanest way to achieve this, but I'm certainly open to other suggestions.

I don't quite understand the design requirements here. Instead of adding custom metadata, would it be better just to have the prefetching pass tag the loads that it believes that it has prefetched? Right now, we only prefetch strided accesses (and I can't tell from your description whether the tag collision problem is specific to strided access because those are what we might prefetch, or because of some other reason).

The loads I'm interested in tagging for this series of changes are ones that the LoopDataPrefetch pass has decided *not* to insert SW prefetch instructions for. For Falkor, we only insert SW prefetch instructions for strided accesses when the stride is known to be too large for the HW prefetchers to handle. In all other cases, (i.e. when we expect the HW prefetchers to work), there is a separate issue with the HW prefetcher implementation that we want to try to work-around by inserting MOVs very late in the code generation pipeline.

My first prototype of this pass re-arranged the LoopDataPrefetch pass a bit to achieve the same goal (i.e. marking loads that we think the HW prefetcher can handle well), but it seemed like it muddled that pass a bit too much just to address one sub-target's issues.

I don't understand your point about not adding custom metadata. These loads need to be tagged in some way that will survive (as much as possible) all the way to near the end of code generation. A custom metadata annotation, translated to a target MMO flag seemed like the cleanest way to achieve this, but I'm certainly open to other suggestions.

Ah, okay, thanks for clarifying. I indeed misunderstood the situation. I don't object to the current direction.

gberry updated this revision to Diff 106374.Jul 12 2017, 9:31 PM

Rebase and change metadata string to "falkor.strided.access"

Ping? @hfinkel should I interpret your previous comment as an approval?

Ping? @hfinkel should I interpret your previous comment as an approval?

I don't know anything about the hardware problem you're trying to solve, so I'd prefer that someone who does look at this as well. I was mostly interested in the information-passing issues between the IR level and the MI level.

Ping? @hfinkel should I interpret your previous comment as an approval?

I don't know anything about the hardware problem you're trying to solve, so I'd prefer that someone who does look at this as well. I was mostly interested in the information-passing issues between the IR level and the MI level.

Fair enough. FWIW, I'm mostly looking for feedback on stylistic/code organization issues that need to be addressed.

mcrosier accepted this revision.Jul 13 2017, 11:58 AM

The Falkor specific changes were reviewed internally and they still LGTM. Assuming @hfinkel is okay with the target-independent parts, this patch LGTM.

lib/Target/AArch64/AArch64FalkorHWPFFix.cpp
121 ↗(On Diff #106374)
return false;
This revision is now accepted and ready to land.Jul 13 2017, 11:58 AM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Jul 16 2017, 10:35 AM
MatzeB added inline comments.
llvm/trunk/lib/Target/AArch64/AArch64FalkorHWPFFix.cpp
9–13

Start with /// \file to have doxygen pick it up.

67–69

how strange...

110

no need for auto

116

Could use a reference parameter as it cannot be nullptr.

123–124

Would be nicer without auto.

gberry marked 3 inline comments as done.Jul 17 2017, 1:20 PM
gberry added inline comments.
llvm/trunk/lib/Target/AArch64/AArch64FalkorHWPFFix.cpp
67–69

This was inherited from the LoopDataPrefetch pass. I'll circle back and check if either of these are still necessary and if so if the underlying issue is fixable.

110

It seems like most other code uses auto for df_iterators, so I've left this one alone. I did rename 'I' to 'L' and 'L' to 'LIt' to hopefully make it a bit more clear.