The file may need some refactoring first. You can let RUN lines share some common check prefixes, instead of adding a bunch of defines for every new processor.
// CHECK_X86_64_V2: ... // CHECK_X86_64_V2: ... // CHECK_X86_64_V3: ... // CHECK_PROCESSOR1_M32: // CHECK_PROCESSOR1_M64: // CHECK_PROCESSOR2_M32: // CHECK_PROCESSOR2_M64:
THX for review!
I agree. I'll do it
This is a mistake. I'll modify. And reference is missing in two switch. I'll add.
Yes. And I don't know any simple ways to exclude SGX here, any suggestions?
Hi @MaskRay, I tried to refactor, but met some difficulties. Since these defines are dictionary ordered, a new #define may insert into a common CHECK. So it is difficult to let different RUN share common CHECKs.
// RUN: %clang -march=pentium-mmx -m32 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ // RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_I386_M32,CHECK_I586_M32,CHECK_PENTIUM_MMX_M32 // CHECK_I386_M32: #define __i386 1 // CHECK_I386_M32: #define __i386__ 1 // CHECK_I386_M32: #define i386 1 // CHECK_I586_M32: #define __i586 1 // CHECK_I586_M32: #define __i586__ 1 // CHECK_I586_M32: #define __pentium 1 // CHECK_I586_M32: #define __pentium__ 1 // CHECK_I586_M32: #define __tune_i586__ 1 // CHECK_I586_M32: #define __tune_pentium__ 1 // CHECK_PENTIUM_MMX_M32: #define __MMX__ 1 // CHECK_PENTIUM_MMX_M32: #define __pentium_mmx__ 1 // CHECK_PENTIUM_MMX_M32: #define __tune_pentium_mmx__ 1
The example above will destroy the original order of the define list. Do you have some good suggestions?
Using ICLTuning suggests we should still be avoiding 512-bit ops (FeaturePrefer256Bit) - is this still true for RKL (or anything past CNL...)?
I posted PR48336 but never got any response, but from what others have reported (Travis Downs, Phoronix etc) its mainly a power issue these days, not a perf issue due to big freq drops.
Hi @MaskRay , I tried to refactor the test file by assembling the common CHECKs. But I found it will lead to too many check-prefixes ine one RUN line. For example,
// RUN: %clang -march=c3 -m32 -E -dM %s -o - 2>&1 \ // RUN: -target i386-unknown-linux \ // RUN: | FileCheck -match-full-lines %s -check-prefixes=CHECK_I386_M32C,CHECK_I486_M32C,CHECK_I486_M32S,CHECK_PENTIUM_MMX_M32S,CHECK_WINCHIP2_M32S // CHECK_WINCHIP2_M32S: #define __3dNOW__ 1 // CHECK_PENTIUM_MMX_M32S: #define __MMX__ 1 // CHECK_WINCHIP_C6_M32S: #define __MMX__ 1 // CHECK_I386_M32C: #define __i386 1 // CHECK_I386_M32C: #define __i386__ 1 // CHECK_I486_M32C: #define __i486 1 // CHECK_I486_M32C: #define __i486__ 1 // CHECK_I586_M32C: #define __i586 1 // CHECK_I586_M32C: #define __i586__ 1 // CHECK_I586_M32C: #define __pentium 1 // CHECK_I586_M32C: #define __pentium__ 1 // CHECK_PENTIUM_MMX_M32S: #define __pentium_mmx__ 1 // CHECK_I386_M32S: #define __tune_i386__ 1 // CHECK_I486_M32S: #define __tune_i486__ 1 // CHECK_PENTIUM_MMX_M32S: #define __tune_pentium_mmx__ 1 // CHECK_I386_M32C: #define i386 1
And that CPU is still a very old CPU. It is easy to imagine how long for example skylake's RUN line is. Generally speaking, X86's ISA evolution between different chips is not very clear. So this refactoring work is beyond my ETA. I uploaded a new version to reuse the most similar CHECKS. Does that look good to you?
Nothing pretty. Guess it depends on if SGX is going to not appear in more future CPUs or if this is a one off case. If it's going to continue then we could remove it from the inheritance and just give it to SKL, ICL, CNL, etc. individually.
Or we could just not default SGX on for any CPU. It's probably not all that useful in the backend anyway. Clang will put it in the target-feature attribute anyway. Having it in the backend feature lists doesn't really do anything since I don't think we have any IR intrinsics for SGX.
If you delete the definition of FeatureSGX, you need to remove the related code in X86Subtarget.h too. BTW, I don't think "there are no IR intrinsics for a feature" is a good reason to remove a feature.
I'm not sure that rocketlake has CLWB. Can you double check that? It's not listed in the cpuinfo dump on the 11700K that I found with a google search here https://www.pugetsystems.com/labs/hpc/Intel-Rocket-Lake-Compute-Performance-Results-HPL-HPCG-NAMD-and-Numpy-2116/
For now I have only an icelake-client machine and found that CLWB is not there, too. Guess I can do that modification in this patch? Rocketlake may probably lose CLWB. I'll double check.
Thanks. I also found this https://github.com/gcc-mirror/gcc/commit/c422e5f81f42a0fc197f0715f4fcd81f1be90bff can you create a new patch to do the same for llvm/clang and rebase this patch on top of it.
That would change the frontend behavior which would require a wider discussion. My suggestion was only to change the backend behavior since there was nothing testable with llc anyway.