Page MenuHomePhabricator

[X86] Remove the feature dependency handling in X86TargetInfo::setFeatureEnabledImpl to a table based lookup in X86TargetParser.cpp
ClosedPublic

Authored by craig.topper on Jul 6 2020, 6:45 PM.

Details

Summary

Previously we had to specify the forward and backwards feature dependencies separately which was error prone. And as dependencies have gotten more complex it was hard to be sure the transitive dependencies were handled correctly. The way it was written was also not super readable.

This patch replaces everything with a table that lists what features a feature is dependent on directly. Then we just have to recursively walk through the table to find the transitive dependencies. This is largely based on how we handle subtarget features in the MC layer from the tablegen descriptions.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 6 2020, 6:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2020, 6:45 PM
Herald added subscribers: jfb, hiraditya. · View Herald Transcript

Drop two header includes I was using for debugging

echristo accepted this revision.Jul 6 2020, 6:48 PM

Works for me :)

This revision is now accepted and ready to land.Jul 6 2020, 6:48 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2020, 11:15 PM

Hi @craig.topper , @Nathan-Huckleberry and I are seeing getImpliedEnabledFeatures at the top of our profiles now. It seems that llvm::X86::getImpliedFeatures is repeatedly queried for the same inputs. Should we try to cache the ImpliedBits query or try to have X86TargetInfo::setFeatureEnabled not rebuild ImpliedBits every invocation? FWIW, I'm seeing getImpliedEnabledFeatures take 2.33% of compilation for the median duration to build input file from the x86 Linux kernel.

nickdesaulniers added a comment.EditedAug 4 2020, 3:48 PM

I just collected a perf profile for an entire Linux kernel x86 build. getImpliedDisabledFeatures is at the top with 9.15% of time spent in self.

Ow. Can revert and reapply after we fix the caching problem perhaps?

-eric

It's a pretty nasty revert in our downstream tree where we have prototyped future ISAs. So I'd like a little time to take a look.

@nickdesaulniers what cpu and fetaures are on your command lines. getImpliedDisabledFeatures should only be called if some feature is explicitly being disabled.

It's a pretty nasty revert in our downstream tree where we have prototyped future ISAs. So I'd like a little time to take a look.

@nickdesaulniers what cpu and fetaures are on your command lines. getImpliedDisabledFeatures should only be called if some feature is explicitly being disabled.

Likely, as the kernel is very explicit to disable the use of all FP extensions, see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/Makefile#n57. Maybe we could just set fewer flags, but there's still probably room for improvement here, too.

MaskRay added a subscriber: MaskRay.Aug 4 2020, 4:13 PM

I can optimize getImpliedDisabledFeatures & getImpliedEnabledFeatures

It's a pretty nasty revert in our downstream tree where we have prototyped future ISAs. So I'd like a little time to take a look.

@nickdesaulniers what cpu and fetaures are on your command lines. getImpliedDisabledFeatures should only be called if some feature is explicitly being disabled.

Likely, as the kernel is very explicit to disable the use of all FP extensions, see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/Makefile#n57. Maybe we could just set fewer flags, but there's still probably room for improvement here, too.

That said, it's a 10% compile time regression for compiling something like the linux kernel or anything that's very explicit what flags they set.

MaskRay added a comment.EditedAug 4 2020, 4:32 PM

Sent D85257 to hopefully fix the compile time regression.

That said, it's a 10% compile time regression for compiling something like the linux kernel or anything that's very explicit what flags they set.

Digging into the profile a little deeper:

+    9.20%  [.] getImpliedDisabledFeatures
+    1.46%  [.] llvm::X86::getImpliedFeatures
   - 3.00% llvm::StringMapImpl::LookupBucketFor
      + 1.55% llvm::StringMap<clang::IdentifierInfo*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul> >::try_emplace<clang::IdentifierInfo*>
      - 1.11% llvm::StringMap<bool, llvm::MallocAllocator>::try_emplace<>
         + 1.11% clang::targets::X86TargetInfo::setFeatureEnabled
+    1.46%  [.] llvm::X86::getImpliedFeatures

so potentially a 13.23% compile time savings for the entire codebase.

That said, it's a 10% compile time regression for compiling something like the linux kernel or anything that's very explicit what flags they set.

Digging into the profile a little deeper:

+    9.20%  [.] getImpliedDisabledFeatures
+    1.46%  [.] llvm::X86::getImpliedFeatures
   - 3.00% llvm::StringMapImpl::LookupBucketFor
      + 1.55% llvm::StringMap<clang::IdentifierInfo*, llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096ul, 4096ul, 128ul> >::try_emplace<clang::IdentifierInfo*>
      - 1.11% llvm::StringMap<bool, llvm::MallocAllocator>::try_emplace<>
         + 1.11% clang::targets::X86TargetInfo::setFeatureEnabled
+    1.46%  [.] llvm::X86::getImpliedFeatures

so potentially a 13.23% compile time savings for the entire codebase.

I started on a patch to cache the feature map calculated in TargetInfo::CreateTargetInfo and use it whenever there isn't a target attribute instead of rebuilding it from the cached feature vector. Unfortunatley, that seems to be failing a WebAssembly test because they use some flags in initFeatureMap that are set by handleTargetFeatures which is called after initFeatureMap is called the first time. So the map is different when initFeatureMap is called from Sema and CodeGen can be different than the map created in TargetInfo::CreateTargetInfo so the cache doesn't work. I'll work with WebAssembly folks to see if this was intentional behavior.