This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Split out processor/tuning features
ClosedPublic

Authored by david-arm on Oct 11 2021, 7:58 AM.

Details

Summary

Following on from an earlier patch that introduced support for -mtune
for AArch64 backends, this patch splits out the tuning features
from the processor features. This gives us the ability to enable
architectural feature set A for a given processor with "-mcpu=A"
and define the set of tuning features B with "-mtune=B".

It's quite difficult to write a test that proves we select the
right features according to the tuning attribute because most
of these relate to scheduling. I have created a test here:

CodeGen/AArch64/misched-fusion-addr-tune.ll

that demonstrates the different scheduling choices based upon
the tuning.

Diff Detail

Event Timeline

david-arm created this revision.Oct 11 2021, 7:58 AM
david-arm requested review of this revision.Oct 11 2021, 7:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2021, 7:58 AM
c-rhodes added inline comments.Oct 11 2021, 8:30 AM
llvm/lib/Target/AArch64/AArch64.td
691

Is there any value in defining this if the A35 has no tuning features? Same applies for others like Carmel.

c-rhodes added inline comments.Oct 11 2021, 8:32 AM
llvm/lib/Target/AArch64/AArch64.td
692

I think this would easier to review if you kept the existing formatting, difficult to compare these long lines.

Thanks for doing this!

Adding some newlines sounds like a good idea :)

Matt added a subscriber: Matt.Oct 11 2021, 10:32 AM
lenary added a subscriber: lenary.Oct 12 2021, 2:16 AM
david-arm marked an inline comment as done.Oct 12 2021, 7:11 AM
david-arm added inline comments.
llvm/lib/Target/AArch64/AArch64.td
691

Yeah we have to unfortunately even if it looks silly. That's because a lot of the tuning in AArch64Subtarget.cpp depends upon ARMProcFamily being set to the CPU.

692

I'll try to make it look a bit nicer with some new lines, but it will still be difficult to review because the processor features have been pulled so the lists will be different.

david-arm updated this revision to Diff 379300.Oct 13 2021, 1:31 AM
david-arm marked an inline comment as done.
  • Reintroduced the original formatting, which was often inconsistent, but does make the patch easier to review.
  • I've replaced the ProcAppleA7 feature with FeatureAppleA7SysReg because the only reason I kept this originally was to maintain a dependency in a system operands tablegen file.

@david-arm You should probably rebase your patch, because I noticed UseScalarIncVL is still marked as an architecture feature (but I can't see it in this revision yet).

To help the review of this patch, I wrote a tablegen backend to flatten all the subtarget-features for each processor and mark them as either (arch) feature or tuning feature. Then I created a diff between the before/after of this patch. The result is here: F19590126.

Would there be any use in having an llvm tool that dumps the processor features? (I was thinking of having a tool, say dump-cpu-features, which could be used in tests, e.g.

; RUN: dump-cpu-features --target=aarch64
; CHECK:       Processor: ....
; CHECK-NEXT: HasNEON = true [feature]
; CHECK-NEXT: HasSHA2 = true [feature]
; ...

If so, I could spend some more time to make this into a little tool and generate some tests.

Thanks for the updates. My comments about newlines was referring more to llvm preferring 80 line (ish) character limits, and some of the new code here going over that. tb files are not as strict as c files and sometimes go over a little if it helps readability, but formatting the new code here to be easier on the eyes would be good. I think @sdesmalen script then suggested the details here were looking OK.

I think we should be able to write a few tests that tuning features and arch features are being enabled at the correct times too.

llvm/lib/Target/AArch64/AArch64.td
694–704

Can this be moved next to all the other subtarget features above.

Hi @david-arm, I'm not sure if you rebased this correctly, because I can't see a definition for FeatureUseScalarIncVL in AArch64.td this patch. That's one of the features which should become a tuning feature as well.

llvm/lib/Target/AArch64/AArch64.td
896

nit: Could you please format these ProcessorFeatures to something a bit easier on the eyes? :)

1014–1015

nit: missing whitespace above

david-arm updated this revision to Diff 379631.Oct 14 2021, 2:24 AM
  • Uploaded the actual rebased version correctly this time. :)
  • Address review comments.
sdesmalen accepted this revision.Oct 14 2021, 5:18 AM

LGTM, thanks for the changes!

I'm not sure how you could easily write tests for this, other than creating a TableGen tool to generate the list of features per CPU, and comparing it with a predefined list of features. But that requires quite a bit of work to make it into an LLVM tool and create a lit test and tbh I'm not sure how valuable such tests would be. I have confirmed your test is NFC, so I'm happy for this patch to move forward.

This revision is now accepted and ready to land.Oct 14 2021, 5:18 AM
dmgreen requested changes to this revision.Oct 14 2021, 5:23 AM

LGTM, thanks for the changes!

I'm not sure how you could easily write tests for this, other than creating a TableGen tool to generate the list of features per CPU, and comparing it with a predefined list of features. But that requires quite a bit of work to make it into an LLVM tool and create a lit test and tbh I'm not sure how valuable such tests would be. I have confirmed your test is NFC, so I'm happy for this patch to move forward.

There does need to be some tests. Can there not be tests that show that -target-cpu does one thing and -tune-cpu does another? They only need to be .ll tests, and I'm not talking about testing every feature, but there needs to be something.

llvm/lib/Target/AArch64/AArch64.td
985

All these lines are still too long and need to cleaned up.

This revision now requires changes to proceed.Oct 14 2021, 5:23 AM
david-arm updated this revision to Diff 379711.Oct 14 2021, 7:18 AM
david-arm edited the summary of this revision. (Show Details)

Hi @dmgreen, it's quite difficult to write tests to prove that the tuning has any effect, because we can only test the side-effects of the tuning features. These mostly involve different scheduling choices. It took quite a while write a test, which I did by deliberately removing lots of tuning features from CPUs and looking for tests that broke. The test I have added basically ensures we schedule instructions differently. However, if you can think of a better test I'm willing to consider it!

Yeah scheduling changes can be a bit difficult to check well. Thanks for adding the new test. There are others like FeatureAggressiveFMA and FeatureLSLFast that can more directly alter codegen, if they are easier to test with.

If not, whilst looking around I found an issue with FeatureAltFPCmp, but otherwise this looks good to me. (And perhaps consider formatting the ProcessorModel lines to be sorter, but them being like this doesn't block the patch).

llvm/lib/Target/AArch64/AArch64.td
758

I was wrong about FeatureAltFPCmp if I said it should be a tuning feature. Looks like it is a 8.5 architecture feature controlling the axflag and xaflag instructions.

david-arm updated this revision to Diff 379947.Oct 15 2021, 3:06 AM
  • Addressed review comments.
david-arm marked 4 inline comments as done.Oct 15 2021, 3:08 AM
dmgreen accepted this revision.Oct 15 2021, 6:05 AM

Thanks.

This revision is now accepted and ready to land.Oct 15 2021, 6:05 AM
This revision was landed with ongoing or failed builds.Oct 19 2021, 7:57 AM
This revision was automatically updated to reflect the committed changes.
tmatheson added inline comments.
llvm/lib/Target/AArch64/AArch64.td
783–785

Should this (and the next line) say M4 rather than M3?