This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Always add -tune-cpu argument to -cc1 driver
ClosedPublic

Authored by david-arm on Sep 22 2021, 8:00 AM.

Details

Summary

This patch ensures that we always tune for a given CPU on AArch64
targets when the user specifies the "-mtune=xyz" flag. In the
AArch64Subtarget if the tune flag is unset we use the CPU value
instead.

Tests added here:

clang/test/Driver/aarch64-mtune.c

Diff Detail

Event Timeline

david-arm created this revision.Sep 22 2021, 8:00 AM
david-arm requested review of this revision.Sep 22 2021, 8:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2021, 8:00 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Matt added a subscriber: Matt.Sep 22 2021, 8:27 AM

My understanding is that -mtune doesn't work sensibly for Arm backends. The tuning features and architecture features are not separated out at the subtarget level.

Is the idea to teach people to start using it? That sounds dangerous without fixing the issues with it first. Or is this to make sure it's set to something?

(Fixing it properly would be a really good thing.)

Hi @dmgreen, this is specifically being introduced for SVE targets to help make informed cost model decisions regarding the value of vscale - see D110259. We thought that using the "tune-cpu" attribute might be a good way of doing this.

Hi @dmgreen, this is specifically being introduced for SVE targets to help make informed cost model decisions regarding the value of vscale - see D110259. We thought that using the "tune-cpu" attribute might be a good way of doing this.

Yeah. I added comments there. It would feel strange to me to have a single part of the tuning features for a cpu based on a tune-cpu, where everything else uses the target-cpu. We should try and make -mtune work better, but we should aim to do that consistently in a way that doesn't just make thing _more_ confusing!

paulwalker-arm added inline comments.Sep 23 2021, 2:53 AM
clang/lib/Driver/ToolChains/Clang.cpp
1850

What benefit does -tune-cpu generic provide?

I'm wondering if the patch can be restricted to only add -tune-cpu when a -mtune= is specified with a real name or a detected name for when "native" is specified.

1857–1860

See me comment above.

david-arm updated this revision to Diff 375169.Sep 27 2021, 1:48 AM
david-arm retitled this revision from [AArch64][Clang] Always add -tune-cpu argument to -cc1 driver to [AArch64] Always add -tune-cpu argument to -cc1 driver.
david-arm edited the summary of this revision. (Show Details)
  • Updated the patch to use mtune properly and set the scheduling model accordingly
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2021, 1:48 AM

Sounds great. Glad to see us taking this route.

Unfortunately I think we do need to split the subtargetfeatures up into arch flags and tune flags. Same for the details in AArch64Subtarget::initializeProperties. It is hopefully a fairly mechanical process, but they are an important part of tuning and without them -mtune is only a half-working option.

Are you happy to give that a go too?

llvm/lib/Target/AArch64/AArch64Subtarget.h
305

Needs reformatting.

Sounds great. Glad to see us taking this route.

Unfortunately I think we do need to split the subtargetfeatures up into arch flags and tune flags. Same for the details in AArch64Subtarget::initializeProperties. It is hopefully a fairly mechanical process, but they are an important part of tuning and without them -mtune is only a half-working option.

Are you happy to give that a go too?

Hi @dmgreen, sure I can try. The only problem is that I don't really understand what to do here. I used the X86Subtarget as a guidance and the TuneCPU flag only seems to be used for scheduling and nothing else. It's not obvious to me how the TuneCPU flag decides the features as I thought it was purely for scheduling?

I think ProcessorModel is

class ProcessorModel<string n, SchedMachineModel m, list<SubtargetFeature> f,
                     list<SubtargetFeature> tunef = []>

So we are always currently passing tunef = [] from AArch64 and passing all features through f. They need to be split out and then hopefully the call to ParseSubtargetFeatures will use CPU and TuneCPU appropriately.

I think these are the features I would class as tuning:

FeatureExperimentalZeroingPseudos
FeatureZCRegMove
FeatureZCZeroingGP
FeatureNoZCZeroingFP
FeatureZCZeroing
FeatureZCZeroingFPWorkaround
FeatureStrictAlign
FeatureBalanceFPOps
FeaturePredictableSelectIsExpensive
FeatureCustomCheapAsMoveHandling
FeatureExynosCheapAsMoveHandling
FeaturePostRAScheduler
FeatureSlowMisaligned128Store
FeatureSlowPaired128
FeatureSlowSTRQro
FeatureAlternateSExtLoadCVTF32Pattern
FeatureArithmeticBccFusion
FeatureArithmeticCbzFusion
FeatureCmpBccFusion
FeatureFuseAddress
FeatureFuseAES
FeatureFuseArithmeticLogic
FeatureFuseCCSelect
FeatureFuseCryptoEOR
FeatureFuseLiterals
FeatureDisableLatencySchedHeuristic
FeatureForce32BitJumpTables
FeatureUseRSqrt
FeatureLSLFast
FeatureAggressiveFMA

The ProcA55 type features set ARMProcFamily, which seems to be used for tuning everywhere they are used, so would probably class as tuning features too.

Hi @dmgreen, would you be happy for me to do the splitting-out of arch and tuning features in a separate follow-on patch? I think it's a good idea and I don't object to doing it, but I'm not sure that it really needs to hold up this initial patch? I personally think it makes sense to live in a separate patch because it seems riskier in terms of possible effects on performance. As far as I understand it, this isn't a functional requirement, but more for completeness.

Hi @dmgreen, would you be happy for me to do the splitting-out of arch and tuning features in a separate follow-on patch? I think it's a good idea and I don't object to doing it, but I'm not sure that it really needs to hold up this initial patch? I personally think it makes sense to live in a separate patch because it seems riskier in terms of possible effects on performance. As far as I understand it, this isn't a functional requirement, but more for completeness.

I think that the ARMProcFamily subtarget features are really tuning features, which would mean that the AArch64Subtarget::initializeProperties would end up being based on tuning features and the code in D110259 would simplify as a result, not needing to pass TuneCPU around and check for specific cpus. We could change how that works code later, but I am a little worried about it being half finished, having tune-cpu not really working as it should but being required for some features. I would be fine as a separate patch so long as we are pretty confident that -mtune was going to work correctly before we start using it.

lenary added a subscriber: lenary.Sep 29 2021, 9:03 AM
david-arm updated this revision to Diff 378180.Oct 8 2021, 5:24 AM
david-arm marked an inline comment as done.
  • Fixed formatting issues.
david-arm marked 2 inline comments as done.Oct 8 2021, 5:25 AM

Gentle ping!

If D111551 was folded into this patch, would it be possible to add tests for -tune-cpu enabling/disabling features at the correct times?

clang/test/Driver/aarch64-mtune.c
4

Why do we not want to add a default tune-cpu?

35

My understanding is that -mcpu=cpu is the same as -march=something + -mtune=cpu. Why would this case not add a -tune-cpu too? Is it because that gets handled in llvm?

If D111551 was folded into this patch, would it be possible to add tests for -tune-cpu enabling/disabling features at the correct times?

Similar to the comment I left on D110259, I don't want D111551 to hold up the cost model changes, which are critical. I'd prefer them to be independent. I expect D111551 to take longer to get approval, and even once approved/merged if for any reason D111551 causes issues after merging we only have to revert that one change to AArch64.td.

clang/test/Driver/aarch64-mtune.c
4

This was in response to an earlier review comment by @paulwalker-arm asking what benefit "-mtune=generic" provded and about restricting the patch to only add tune-cpu when the user has explicitly specified one.

35

I thought this was pretty standard behaviour? We're already adding -target-cpu, which implies the arch + tuning, so isn't adding -tune-cpu redundant? Not sure what value "-target-cpu=thunderx -tune-cpu=thunderx" adds really.

If D111551 was folded into this patch, would it be possible to add tests for -tune-cpu enabling/disabling features at the correct times?

Similar to the comment I left on D110259, I don't want D111551 to hold up the cost model changes, which are critical.

They are not. The changes in the cost model in D110259 are independent of whether -mcpu or -mtune work a particular way and could probably be committed now if it was rebased away from the mtune patches :)

I'd prefer them to be independent.

That's fine. They seem like two patches doing two halves of one change to me, but having them separate sounds perfectly fine so long as we have the tests we need. I see tests that the clang -mcpu and -mtune set llvm flags as expected, but not that those target-cpu and tune-cpu options then set the correct features at the correct times.

clang/test/Driver/aarch64-mtune.c
35

It could work either way with adding both attributes or letting the target-cpu act as both if it is the only one present. So long as we are happy with it working this way it sounds good to me.

Now that we have the tests on the D111551 side, I think this patch LGTM.

Do you think it's worth adding something to the release notes? "-mtune now actually works for AArch64". But, umm, perhaps better written than that.

david-arm updated this revision to Diff 380325.Oct 18 2021, 3:04 AM
  • Added something to the ReleaseNotes file.

There are clang release notes that could also do with a line or two, after all the -mtune flag being a clang flag. The X86 notes for reference were added in https://releases.llvm.org/12.0.0/docs/ReleaseNotes.html and https://releases.llvm.org/12.0.0/tools/clang/docs/ReleaseNotes.html, from the look of it. I like "no longer ignored", but don't really have a strong opinion on the exact wording.

david-arm updated this revision to Diff 380649.Oct 19 2021, 4:42 AM
  • Added release notes for both clang and llvm.
sdesmalen added inline comments.Oct 19 2021, 5:03 AM
clang/docs/ReleaseNotes.rst
185–188 ↗(On Diff #380649)

nit: compiling with "-mcpu=generic -mtune=cortex-a57" will not enable any Cortex-a57 specific architecture features, but will enable certain optimizations specific to Cortex-a57 CPUs and enable the use of a more accurate scheduling model.

clang/test/Driver/aarch64-mtune.c
6

nit: Did you make these prefixes lower-case on purpose?

llvm/docs/ReleaseNotes.rst
80 ↗(On Diff #380649)

nit: s/we tune/it tunes/

david-arm added inline comments.Oct 19 2021, 5:05 AM
clang/test/Driver/aarch64-mtune.c
6

Sort of, in that I copied this test from x86-mtune.c, which has the same CHECK prefixes. :) I can capitalise them.

Thanks. I'm happy as soon as everyone else is happy with the wording.

david-arm updated this revision to Diff 380663.Oct 19 2021, 5:54 AM
david-arm marked 3 inline comments as done.
sdesmalen accepted this revision.Oct 19 2021, 5:56 AM

Thanks @david-arm, LGTM

This revision is now accepted and ready to land.Oct 19 2021, 5:56 AM
This revision was landed with ongoing or failed builds.Oct 19 2021, 6:58 AM
This revision was automatically updated to reflect the committed changes.