This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Support -march=native and -mtune=
ClosedPublic

Authored by SixWeining on Jul 20 2023, 5:36 AM.

Details

Summary

As described in [1][2], -mtune= is used to select the type of target
microarchitecture, defaults to the value of -march. The set of
possible values should be a superset of -march values. Currently
possible values of -march= and -mtune= are native, loongarch64
and la464.

D136146 has supported -march={loongarch64,la464} and this patch adds
support for -march=native and -mtune=.

A new ProcessorModel called loongarch64 is defined in LoongArch.td
to support -mtune=loongarch64.

llvm::sys::getHostCPUName() returns generic on unknown or future
LoongArch CPUs, e.g. the not yet added la664, leading to
llvm::LoongArch::isValidArchName() failing to parse the arch name.
In this case, use loongarch64 as the default arch name for 64-bit
CPUs.

Two preprocessor macros are defined based on user-provided -march=
and -mtune= options and the defaults.

  • __loongarch_arch
  • __loongarch_tune

Note that, to work with -fno-integrated-cc1 we leverage cc1 options
-target-cpu and -tune-cpu to pass driver options -march= and
-mtune= respectively because cc1 needs these information to define
macros in LoongArchTargetInfo::getTargetDefines.

[1]: https://github.com/loongson/LoongArch-Documentation/blob/2023.04.20/docs/LoongArch-toolchain-conventions-EN.adoc
[2]: https://github.com/loongson/la-softdev-convention/blob/v0.1/la-softdev-convention.adoc

Diff Detail

Unit TestsFailed

Event Timeline

SixWeining created this revision.Jul 20 2023, 5:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 5:36 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
SixWeining requested review of this revision.Jul 20 2023, 5:36 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 20 2023, 5:36 AM

Do we need to convert Xuerui's label alignment change to use the -mtune framework?

Do we need to convert Xuerui's label alignment change to use the -mtune framework?

How to set the numbers if TuneCPU is empty? If there is no differences among empty, loongarch64 and la464, seems it's better to do this until we support other uarchs, e.g. la[236]64.

Just like the TODO in D148622:

63   // TODO: Check TuneCPU and override defaults (that are for LA464) once we
64   // support optimizing for more uarchs.

Do we need to convert Xuerui's label alignment change to use the -mtune framework?

How to set the numbers if TuneCPU is empty? If there is no differences among empty, loongarch64 and la464, seems it's better to do this until we support other uarchs, e.g. la[236]64.

Just like the TODO in D148622:

63   // TODO: Check TuneCPU and override defaults (that are for LA464) once we
64   // support optimizing for more uarchs.

In GCC I added the numbers for "loongarch64" and "la464" although they are the same, and GCC does not have "empty". But yes we can do things later here.

In GCC I added the numbers for "loongarch64" and "la464" although they are the same, and GCC does not have "empty". But yes we can do things later here.

Ah, I just realize TuneCPU is always not empty because:

27 LoongArchSubtarget &LoongArchSubtarget::initializeSubtargetDependencies(
28     const Triple &TT, StringRef CPU, StringRef TuneCPU, StringRef FS,
29     StringRef ABIName) {
30   bool Is64Bit = TT.isArch64Bit();
31   if (CPU.empty() || CPU == "generic")
32     CPU = Is64Bit ? "generic-la64" : "generic-la32"; // empty CPU will be converted to generic-la{64,32}
33 
34   if (TuneCPU.empty())
35     TuneCPU = CPU; // empty TuneCPU will be converted to CPU, so TuneCPU is not empty when we all initializeProperties below
36 
37   ParseSubtargetFeatures(CPU, TuneCPU, FS);
38   initializeProperties(TuneCPU);

So, the problem is how to set numbers for generic-la{64,32} ? Currenly if we directly use clang hello.c, TuneCPU is generic-la64.

@xen0n Do you have any inputs? I think supporting these options can improve compatibility with gcc (although there is no ScheduleModel difference among currently supported processors) and I hope it can be merged into LLVM17.

@xen0n Do you have any inputs? I think supporting these options can improve compatibility with gcc (although there is no ScheduleModel difference among currently supported processors) and I hope it can be merged into LLVM17.

Sorry for the late reply (busy with rewriting the doc comments for D138135). I'm in support of adding those flags, and for now I think defaulting to LA464 values would suffice.

update clang release notes

xen0n accepted this revision.Jul 25 2023, 3:12 AM

Overall LGTM except one small nit, thanks!

clang/docs/ReleaseNotes.rst
902 ↗(On Diff #543902)

"Added"? Or rephrase like "The ... options and ... macros are now supported".

This revision is now accepted and ready to land.Jul 25 2023, 3:12 AM

Address @xen0n's comments.

add missing .

wangleiat accepted this revision.Jul 25 2023, 6:00 AM
This revision was landed with ongoing or failed builds.Jul 25 2023, 6:02 AM
This revision was automatically updated to reflect the committed changes.
SixWeining added inline comments.Jul 25 2023, 8:29 AM
clang/docs/ReleaseNotes.rst
138–143 ↗(On Diff #543942)

These are not for this change. (They have be cleaned up after the llvm18-init tag). I will revert it.

steven_wu added inline comments.
llvm/lib/TargetParser/LoongArchTargetParser.cpp
19

Why do we store Arch and TuneCPU as globals? We should not have as little global state in compiler as possible. Any reason why they are not stored in LoongArchTargetInfo instead like other Targets?

After another look, I will need to request to revert this because this implementation doesn't work with -fno-integrated-cc1 at all. You can't setCPU/Tune in the driver into a global variable and expect that will work during compilation.

You can reproduce with -fno-integrated-cc1 option or build clang with cmake option -DCLANG_SPAWN_CC1=On.

After another look, I will need to request to revert this because this implementation doesn't work with -fno-integrated-cc1 at all. You can't setCPU/Tune in the driver into a global variable and expect that will work during compilation.

You can reproduce with -fno-integrated-cc1 option or build clang with cmake option -DCLANG_SPAWN_CC1=On.

Thanks. I’ll investigate as soon as possible. BTW, did it break any buildbot?

After another look, I will need to request to revert this because this implementation doesn't work with -fno-integrated-cc1 at all. You can't setCPU/Tune in the driver into a global variable and expect that will work during compilation.

You can reproduce with -fno-integrated-cc1 option or build clang with cmake option -DCLANG_SPAWN_CC1=On.

Thanks. I’ll investigate as soon as possible. BTW, did it break any buildbot?

Yes, I only know this bootstrap job (has CMAKE configuration that sets CLANG_SPAWN_CC1). https://green.lab.llvm.org/green/job/clang-stage2-Rthinlto

My local revert is clean and all tests passed. I will just go ahead and revert that.

SixWeining reopened this revision.Aug 3 2023, 12:30 AM
This revision is now accepted and ready to land.Aug 3 2023, 12:30 AM
SixWeining edited the summary of this revision. (Show Details)
  • work with -fno-integrated-cc1
  • add mingsing "s in macros
SixWeining requested review of this revision.Aug 3 2023, 12:32 AM
xen0n added inline comments.Aug 3 2023, 12:43 AM
clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
187–192

This part is identical between -march= and -mtune= handling. I'd suggest extracting out a common helper for this, or making this function generic over options::OPT_m*_EQ (whichever more suitable for you and the Clang maintainers; I don't have a particular preference).

SixWeining updated this revision to Diff 546742.Aug 3 2023, 1:32 AM
SixWeining marked an inline comment as done.

Add a common helper for -m{arch,tune}=native handling.

clang/lib/Driver/ToolChains/Arch/LoongArch.cpp
187–192

Thanks. Just added.

SixWeining updated this revision to Diff 546766.Aug 3 2023, 2:26 AM

rename the common helper to postProcessTargetCPUString

xen0n accepted this revision.Aug 8 2023, 3:32 AM

This still LGTM, @steven_wu would you please take another look so this can get re-landed if confirmed working?

This revision is now accepted and ready to land.Aug 8 2023, 3:32 AM
steven_wu accepted this revision.Aug 8 2023, 9:18 AM

This still LGTM, @steven_wu would you please take another look so this can get re-landed if confirmed working?

The driver part LGTM.

MaskRay accepted this revision.Aug 8 2023, 5:48 PM

Some test suggestions

clang/lib/Basic/Targets/LoongArch.cpp
203

Use the Twine(char) constructor.

clang/test/Driver/loongarch-mtune-error.c
1

Use -### to test the driver. Consider merging this test into the loongarch-mtune.c.

SixWeining updated this revision to Diff 548442.Aug 8 2023, 7:27 PM
SixWeining marked an inline comment as done.

Address @MaskRay's comments.

SixWeining marked an inline comment as done.Aug 8 2023, 7:28 PM
SixWeining added inline comments.
clang/lib/Basic/Targets/LoongArch.cpp
203

Thanks.

clang/test/Driver/loongarch-mtune-error.c
1

Done. Thanks.

This revision was landed with ongoing or failed builds.Aug 8 2023, 7:32 PM
This revision was automatically updated to reflect the committed changes.
SixWeining marked an inline comment as done.