This is an archive of the discontinued LLVM Phabricator instance.

[clang][aarch64] Generate preprocessor macros for -march=armv8.6a+sve.
ClosedPublic

Authored by fpetrogalli on Jul 2 2020, 12:58 PM.

Details

Summary

The following preprocessor macros are implied when -march=armv8.6a+sve:

__ARM_FEATURE_SVE 1
__ARM_FEATURE_SVE_BF16 1
__ARM_FEATURE_SVE_MATMUL_FP32 1
__ARM_FEATURE_SVE_MATMUL_INT8 1

Diff Detail

Event Timeline

fpetrogalli created this revision.Jul 2 2020, 12:58 PM
sdesmalen added inline comments.Jul 2 2020, 2:26 PM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
118

Looking at what Clang emits for e.g. -march=armv8.5-a, it just adds a target-feature +v8.5a. The definitions in llvm/lib/Target/AArch64/AArch64.td. suggests that LLVM is already able to infer all supported features from that. e.g.

def HasV8_4aOps : SubtargetFeature<
   :
   :

def HasV8_5aOps : SubtargetFeature<
  "v8.5a", "HasV8_5aOps", "true", "Support ARM v8.5a instructions",
  [HasV8_4aOps, FeatureAltFPCmp, FeatureFRInt3264, FeatureSpecRestrict,
   FeatureSSBS, FeatureSB, FeaturePredRes, FeatureCacheDeepPersist,
   FeatureBranchTargetId]>;

def HasV8_6aOps : SubtargetFeature<
  "v8.6a", "HasV8_6aOps", "true", "Support ARM v8.6a instructions",

  [HasV8_5aOps, FeatureAMVS, FeatureBF16, FeatureFineGrainedTraps,
   FeatureEnhancedCounterVirtualization, FeatureMatMulInt8]>;

So I don't think you necessarily have to decompose the architecture version into target-features in the Clang driver as well. For Clang it matters that the right set of feature macros are defined so that the ACLE header file exposes the correct set of functions for the given architecture version. At least for the SVE ACLE that is just a small handful of features.

SjoerdMeijer added a comment.EditedJul 2 2020, 3:28 PM

I haven't looked into details of these arch extensions yet, will do that tomorrow, but I don't think there's any disagreement, i.e., these options and their behaviour are synchronised with the GCC community. Thus, it's better if you remove this wording from both description and comments in the source code. If there is divergence in behaviour, then that is probably be an oversight somewhere. I am also pretty sure that this 8.5 stuff is implemented, not sure if that is upstreamed. And also, I think we need to be more specific than "Default target features implied by -march on AArch64", because this is a big topic, which is not addressed in this patch. This patch is only about a few architecture extension. I am also for example surprised to see bfloat there, as I saw quite some activity in this area.

fpetrogalli marked an inline comment as done.Jul 2 2020, 8:00 PM

I haven't looked into details of these arch extensions yet, will do that tomorrow, but I don't think there's any disagreement, i.e., these options and their behaviour are synchronised with the GCC community. Thus, it's better if you remove this wording from both description and comments in the source code. If there is divergence in behaviour, then that is probably be an oversight somewhere. I am also pretty sure that this 8.5 stuff is implemented, not sure if that is upstreamed. And also, I think we need to be more specific than "Default target features implied by -march on AArch64", because this is a big topic, which is not addressed in this patch. This patch is only about a few architecture extension. I am also for example surprised to see bfloat there, as I saw quite some activity in this area.

I @SjoerdMeijer - I think I misinterpreted the way the target feature are generated, and which part of the codebase handles that. I wasn't aware about the common code pointed out by @sdesmalen that was translating architecture version into target feature - I thought everything needed to be translated by the driver into a list of -target-feature, that is why I thought there was discrepancy. If there is any, this patch is probably not the right way to fix it. The best thing to do is abandon this patch.

clang/lib/Driver/ToolChains/Arch/AArch64.cpp
118

Thank you for explaining this @sdesmalen, I thought that the features needed to get explicitly generated by clang.
I will create a separate patch that adds only the feature macros needed for the SVE ACLEs.

Update the patch to limit its scope to generate the feature macros for -march=armv8.6a+sve.

fpetrogalli retitled this revision from [clang] Default target features implied by `-march` on AArch64. to [clang][aarch64] Generate preprocessor macros for -march=armv8.6a+sve..Jul 2 2020, 8:46 PM
fpetrogalli edited the summary of this revision. (Show Details)
fpetrogalli edited the summary of this revision. (Show Details)Jul 2 2020, 8:46 PM
sdesmalen accepted this revision.Jul 7 2020, 12:01 AM

LGTM

clang/test/Preprocessor/aarch64-target-features.c
415

nit: unrelated change.

This revision is now accepted and ready to land.Jul 7 2020, 12:01 AM

Removed the unrelated change of the empty line.

fpetrogalli marked an inline comment as done.Jul 7 2020, 9:43 AM
SjoerdMeijer added inline comments.Jul 7 2020, 11:53 AM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
121

Would it be more consistent to move this....

267

...to somewhere here where implied target features are handled....

367–371

For example, to here.

clang/test/Preprocessor/aarch64-target-features.c
115

Can you add a run line for v8.5 if there isn't already one, and add CHECK-NOTs for these macros.

Addressed code review, moving the code and adding more testing, including the v8.5 one.

fpetrogalli marked 4 inline comments as done.Jul 7 2020, 2:00 PM
sdesmalen added inline comments.Jul 8 2020, 12:51 AM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
369

Is this correct and/or necessary? I would expect LLVM to just handle features in the order they're passed, and the architecture version is always processed first, e.g. -march=armv8.6-a+noi8mm will always first process armv8.6a before processing features like noi8mm.

fpetrogalli marked 2 inline comments as done.Jul 8 2020, 7:20 AM
fpetrogalli added inline comments.
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
369

I was expecting that too, but in in this place the +i8mm is added after whatever the user have passed to -march, which means that without this extra check the user input -mattr=armv8.6a+sve+noimm8 becomes broken because we are adding -target-feature=+i8mm after -i8mm. This behavior is guarded by a regression tests that starts failing if I don't use these extra checks. This was not needed in the original place were I added the functionality because the +i8mm was being added right after +v8.6a and before splitting up the +sve+noi8mm, so that the user input was the one (un)setting the feature.

sdesmalen added inline comments.Jul 8 2020, 9:46 AM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
369

As you said, we end up with a Feature list as follows:

parsing(-march=armv8.6-a+noi8mm)
=> Features = [ v8.6a ]

parsing(+noi8mm)
=> Features = [ v8.6a, -i8mm ]

Then going through the feature list again:
=> Features = [ v8.6a, -i8mm, +i8mm ]
                ^^^^^          ^^^^
                  \_____________/
                     adds +i8mm

To fix that, you can insert these features into the list straight after "+v8.6a", instead of appending at the end of the Features list. Either that, or calling llvm::AArch64::getDefaultExtensions() + llvm::AArch64::getExtensionFeatures() in getAArch64ArchFeaturesFromMarch like is done in DecodeAArch64Mcpu, which should do all this for free. That seems like a more invasive change though that you shouldn't try to do in this patch.

fpetrogalli marked an inline comment as done.

@sdesmalen, I have followed your suggestion to use insert instead of push_back!

Code is much nicer now, thanks!

fpetrogalli marked an inline comment as done.Jul 8 2020, 1:37 PM
sdesmalen added inline comments.Jul 13 2020, 12:48 AM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
370

Both +i8mm and +bf16 should be added at iterator V8_6Pos, because I believe that std::next(V8_6Pos) inserts it after the item that follows v8.6a, rather than after v8.6a directly.

sdesmalen added inline comments.Jul 13 2020, 7:51 AM
clang/lib/Driver/ToolChains/Arch/AArch64.cpp
370

Never mind the above comment, you pointed out offline that std::next is correct because insert inserts the value before V8_6Pos, my bad!

Looking at the documentation for std::vector::insert, it seems you can write:

if (V8_6Pos != std::end(Features))
  Features.insert(std::next(V8_6Pos), {"+bf16", "+i8mm"});

which avoids having to update V8_6Pos.

Update the insert invocation to use initializer list instead of calling it twice.

fpetrogalli marked 2 inline comments as done.Jul 13 2020, 8:54 AM
sdesmalen accepted this revision.Jul 13 2020, 9:12 AM

LGTM again!

clang/test/Preprocessor/aarch64-target-features.c
163

nit: I don't think you really need to test all permutations, one positive test (for -march=armv8.6-a+sve) and this negative test should be sufficient?

Removed extra tests that are not needed.

fpetrogalli marked an inline comment as done.Jul 14 2020, 9:41 AM
This revision was automatically updated to reflect the committed changes.