This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Override useAA and remove unneeded edges in the scheduling graph. (Phabricator)
ClosedPublic

Authored by ssijaric on Aug 28 2014, 8:06 AM.

Details

Summary

All,
This patch removes edges in the scheduling graph between stores/loads off the same base register if memory accesses do not overlap. An improvement on eembc/automark on a cortex-a53 device, as well as a small improvement on spec2000/eon. No significant performance changes were observed on cortex-a57, but this may be more relevant on in-order architectures. Please have a look.

Thanks,

Sanjin

(I submitted on behalf of Sanjin to ease review.)

Diff Detail

Event Timeline

mcrosier updated this revision to Diff 13038.Aug 28 2014, 8:06 AM
mcrosier retitled this revision from to [AArch64] Override useAA and remove unneeded edges in the scheduling graph. (Phabricator).
mcrosier updated this object.
mcrosier edited the test plan for this revision. (Show Details)
mcrosier added a subscriber: Unknown Object (MLST).
mcrosier added inline comments.Aug 28 2014, 8:16 AM
lib/Target/AArch64/AArch64InstrInfo.cpp
1312

Spacing of parameters.

1396

Check prior to the switch?

Just to share some experience... (I enable AA during CodeGen on many PowerPC cores)

I've found using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores, but this is somewhat dependent on the size of the reorder buffers (and sometimes the loop dispatch buffers, depending on how the architecture is setup), because its main effect is to reduce pressure on these resources.

But it is sometimes very useful on ooo cores for architecture-specific reasons. For example, on the POWER cores there are expensive load-after-store hazards that are critical to avoid. These can be avoided by forcing dispatch-group termination using special nop instructions, but you don't want to do this if you don't have to. Using AA in CodeGen to reduce the number of unneeded group-terminating nops inserted is very helpful.

There is potentially some compile-time impact to doing this, so make sure you keep that in mind.

mcrosier updated this revision to Diff 13041.Aug 28 2014, 8:43 AM

Add missing test case.

jmolloy requested changes to this revision.Aug 29 2014, 2:19 AM
jmolloy edited edge metadata.

Hi Sanjin,

I have a bunch of comments, but this patch looks in good shape already! One other thing, it looks like you're adding quite a few blank lines between functions. There should be one blank line only. A nitpick I know, but nitpicks like this keep the codebase in decent shape.

Cheers,

James

include/llvm/Target/TargetInstrInfo.h
1195

I don't like the truncation of "different" to "diff". In fact, I think "disjoint" would be more accurate: "areMemAccessesTriviallyDisjoint".

What is the behaviour is MIa or MIb are not memory access instructions? you've below had them return false but I think it should assert in this case.

1204

Unused function.

lib/CodeGen/ScheduleDAGInstrs.cpp
521

Either you need to check if these are memory accesses first, or you should rename the function or make it explicit that it will be well behaved on non-memory operands. With its current name, one would expect that a precondition is the arguments are memory instructions.

lib/Target/AArch64/AArch64InstrInfo.cpp
617

As mentioned above, unintuitive.

631

For clarity, I'd prefer if you used the same condition in each ternary instead of chaining off the first condition:

int LowOffset = OffsetA < OffsetB ? ...
int HighOffset = OffsetA < OffsetB ? ...
1396

Probably best to early-exit this before the switch.

lib/Target/AArch64/AArch64Subtarget.h
114 ↗(On Diff #13041)

This should go in in a separate commit before the rest of the code as it is orthogonal.

test/CodeGen/AArch64/arm64-triv-diff-mem-access.ll
2 ↗(On Diff #13041)

Do we need -enable-misched here? isn't it enabled by default?

6 ↗(On Diff #13041)

Tests that check debug output are generally frowned upon, unless there is no better way to write them.

I assume there is no better way to write this?

This revision now requires changes to proceed.Aug 29 2014, 2:19 AM

Thanks, Hal. We saw some improvements on an in-order cortex-a53 device, but didn't observe performance improvements on an out-of-order cortex-a57 device. We also didn't observe significant compile time degradations when compiling spec2006 with -Ofast.

Jiangning edited edge metadata.Sep 3 2014, 2:53 AM

Hi Sanjin,

spec.cpu2000.ref.181_mcf -1.86%
spec.cpu2000.ref.183_equake -3.11%
spec.cpu2000.ref.186_crafty 1.53%
spec.cpu2000.ref.254_gap 1.40%

I can confirm that we only see some fluctuations on coretex-A57 as above,
so there isn't degradation observed for spec2000 at least.

Thanks,
-Jiangning

2014-08-30 2:04 GMT+08:00 Sanjin Sijaric <ssijaric@codeaurora.org>:

Thanks, Hal. We saw some improvements on an in-order cortex-a53 device,
but didn't observe performance improvements on an out-of-order cortex-a57
device. We also didn't observe significant compile time degradations when
compiling spec2006 with -Ofast.

http://reviews.llvm.org/D5103

ssijaric commandeered this revision.Sep 5 2014, 10:52 AM
ssijaric added a reviewer: mcrosier.
ssijaric updated this revision to Diff 13347.Sep 5 2014, 2:46 PM
ssijaric edited edge metadata.

Thanks, James. The updated patch should address some of the problems. The call to MIsNeedChainEdge from addChainDependency depends on AA being enabled, so I enabled AA together with the rest of the changes. Otherwise, the test case will fail. Enabling AA may go in a separate commit, but it would have to be committed before any other changes.

Thanks for the results, Jiangning. The updated patch enables AA for the in-order cortex-a53 core only.

I recommend splitting the useAA change from the rest of it (for better bisecting ;) ).

include/llvm/Target/TargetInstrInfo.h
1200

Can you also pass an optional AliasAnalysis* here too, so the target can use AA in conjunction with whatever target-specific information it has to determine aliasing.

ssijaric updated this revision to Diff 13354.Sep 5 2014, 6:14 PM
ssijaric edited edge metadata.

Incorporate Hal's suggestions.

mcrosier accepted this revision.Sep 5 2014, 6:24 PM
mcrosier edited edge metadata.

Sanjin,
After committing the change to enable AA in AArch64Subtarget.h, this patch LGTM.

Chad

mcrosier accepted this revision.Sep 8 2014, 9:00 AM

Committed in r217370, r217371, and r217381.

jmolloy accepted this revision.Sep 8 2014, 9:40 AM
jmolloy edited edge metadata.
This revision is now accepted and ready to land.Sep 8 2014, 9:40 AM
jmolloy closed this revision.Sep 8 2014, 9:41 AM