This is an archive of the discontinued LLVM Phabricator instance.

[TLI] Per-function fveclib for math library used for vectorization
ClosedPublic

Authored by wenlei on Apr 6 2020, 11:59 PM.

Details

Summary

Encode -fveclib setting as per-function attribute so it can be threaded through to LTO backends. Accordingly, per-function TLI now reads the attribute and populated available vector function list based on that. Note that we expect functions within the same module to share fveclib setting, so vector function list is still shared between functions, as part of the shared TargetLibraryInfoImpl. Inlining between functions with different vect lib attribute is now blocked.

Diff Detail

Event Timeline

wenlei created this revision.Apr 6 2020, 11:59 PM
wenlei edited the summary of this revision. (Show Details)Apr 7 2020, 12:05 AM
wenlei added reviewers: tejohnson, hoyFB, spatel, gchatelet.

Needs testing of the inline handling, and of LTO linking IR with different attributes (which is going to hit your assert, see below).

clang/lib/CodeGen/CGCall.cpp
1987

Nit: why not "vec-lib" or just "veclib", to match the option?

llvm/include/llvm/Analysis/TargetLibraryInfo.h
47

Key comment about handling of TLII vs TLI. The former is computed once per module by the analysis (which is going to be the combined module in the case of LTO), the latter is the per-function data structure.

91

To avoid building and storing the VectorDescs and ScalarDescs for every function in the TLI, what I would do is keep 3 sets of VectorDescs/ScalarDescs on the TLII object (one set per possible veclib, built once per module during construction of the TLII), then move the new VectorLibrary member to the TLI and set it there per function based on the attribute, and use it to select which pair of VectorDescs/ScalarDescs is queried.

275–276

I don't think this will do anything currently since the TLII is built once per module by the analysis. You'll hit your assert about incompatibility below first, see comment there.

llvm/lib/Analysis/TargetLibraryInfo.cpp
1549

You'll certainly hit this assert if you try LTO linking two .ll files built with different -fveclib options, because the TLII is built once per module by the analysis.

1631

This is going to override the baseline TLI veclib with whatever is the latest function we build a TLI for (and you'll hit the assert as noted earlier if they conflict).

wenlei added a comment.Apr 8 2020, 4:40 PM

Thanks for taking a look and the suggestions, @tejohnson. Yeah, you're right about the potential conflict of attributes. Initially I thought even though we now allow this to be per-function, but since it comes from per-module switch, module level consistency can still be expected which can simplify things a bit (hence the assertion). But I overlooked the combined module from LTO.

I will get back to this later in the week - the change will be a bit more involving as there're a few other places where we populate the module level function list directly for TLII (clang and opt).

wenlei updated this revision to Diff 256350.Apr 9 2020, 11:56 AM

address feedback, allow functions within a module to have different vectlib setting. add test case for inline compatibility check.

wenlei marked 5 inline comments as done.Apr 9 2020, 11:58 AM

Great, thanks! A few minor comments below. Looks like it needs a clang-format too.

llvm/include/llvm/Analysis/TargetLibraryInfo.h
239

Suggest moving the implementation of this constructor to the .cpp file, in which case you can just set VectLibrary directly from ClVectorLibrary there and remove the member on the Impl object.

274

This is set via a flag called "inline-caller-superset-nobuiltin". Suggest changing the name to something like "inline-caller-superset-tli" to reflect new larger scope. Also add a check with that option to your new inline test case.

llvm/lib/Analysis/TargetLibraryInfo.cpp
576

Why not just have "i < NumVecLibs"?

587

ditto

1580

Should these two be llvm_unreachable?

wenlei marked 4 inline comments as done.Apr 9 2020, 1:31 PM
wenlei added inline comments.
llvm/include/llvm/Analysis/TargetLibraryInfo.h
239

There're utilities that use TargetLibraryInfo, but don't link with TargetLibraryInfo.o. And looking at TargetLibraryInfo, all of the functions are in this header, so I assumed it's intentional to keep this type self-contained in this header, as it's public API, which is why I add ClVectorLibrary to Impl to pass it back to TargetLibraryInfo. For TargetLibraryInfoImpl, it's ok to have the implementation outside of the header. I can give it a try if keeping the class implementation/definition self-contained in the header isn't important.

llvm/lib/Analysis/TargetLibraryInfo.cpp
576

Good catch, thanks..

tejohnson added inline comments.Apr 9 2020, 1:58 PM
llvm/include/llvm/Analysis/TargetLibraryInfo.h
239

I don't think there should be anything using TLI without linking with libAnalysis, which contains TargetLibraryInfo.o. I don't think it should be important to keep the implementation in the header, any more so than for other headers in the Analysis library.

wenlei updated this revision to Diff 256437.Apr 9 2020, 3:56 PM
wenlei marked an inline comment as done.

address feedback

wenlei marked 4 inline comments as done.Apr 9 2020, 3:58 PM
wenlei added inline comments.
llvm/include/llvm/Analysis/TargetLibraryInfo.h
239

Ok, I moved it to .cpp, thanks!

tejohnson accepted this revision.Apr 9 2020, 4:21 PM

lgtm. I think one check is missing in the test, see comment below.

llvm/test/Transforms/Inline/veclib-compat.ll
28 ↗(On Diff #256437)

I think NOSUPERSET should also check that there is still a call here. You can probably replace some of the duplicated checks with COMMON in this function too.

This revision is now accepted and ready to land.Apr 9 2020, 4:21 PM
wenlei updated this revision to Diff 256465.Apr 9 2020, 6:26 PM
wenlei marked an inline comment as done.

rebase, update test

This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Apr 10 2020, 1:55 AM

This change causes a ~0.5% compile-time regressions: http://llvm-compile-time-tracker.com/compare.php?from=5b18b6e9a84d985c0a907009fb71de7c1943bc88&to=60c642e74be6af86906d9f3d982728be7bd4329f&stat=instructions This is quite a lot as these things go, so it would be great if you could double check if there's any optimization potential here. In particular I'm wondering why this affects normal builds so much, even though they (presumably?) don't use any veclib at all.

This change causes a ~0.5% compile-time regressions: http://llvm-compile-time-tracker.com/compare.php?from=5b18b6e9a84d985c0a907009fb71de7c1943bc88&to=60c642e74be6af86906d9f3d982728be7bd4329f&stat=instructions This is quite a lot as these things go, so it would be great if you could double check if there's any optimization potential here. In particular I'm wondering why this affects normal builds so much, even though they (presumably?) don't use any veclib at all.

Thanks for the heads-up. This is surprising but there is a change even when veclib is not used - in order to allow each function to use different veclib without duplicating the work of populating vector function list for each function, we now always pre-populate vector function list for three supported vector libraries for each module. However 0.5% compile-time for that work given it's per-module is not expected. I suspect we may be passing/copying TLII around more than we anticipated (now we always have more stuff to copy). I will take a look. We could also turn this into a lazy initialization - only populate the needed list for module level TLII when it's first queried by a function level TLI.

This change causes a ~0.5% compile-time regressions: http://llvm-compile-time-tracker.com/compare.php?from=5b18b6e9a84d985c0a907009fb71de7c1943bc88&to=60c642e74be6af86906d9f3d982728be7bd4329f&stat=instructions This is quite a lot as these things go, so it would be great if you could double check if there's any optimization potential here. In particular I'm wondering why this affects normal builds so much, even though they (presumably?) don't use any veclib at all.

Thanks for the heads-up. This is surprising but there is a change even when veclib is not used - in order to allow each function to use different veclib without duplicating the work of populating vector function list for each function, we now always pre-populate vector function list for three supported vector libraries for each module. However 0.5% compile-time for that work given it's per-module is not expected. I suspect we may be passing/copying TLII around more than we anticipated (now we always have more stuff to copy). I will take a look. We could also turn this into a lazy initialization - only populate the needed list for module level TLII when it's first queried by a function level TLI.

Hmm, yeah that is surprising, because the TLII should be built once per module per TLI analysis, which is never invalidated. We've gone from populating one set of vec libs to 3, I wouldn't have thought that was particularly expensive, so it would be good to see what is going on here and confirm we are only building this once as expected.

Looking at the compile time data at that link, interestingly the "instructions" metric increased, but not wall time or cycles or task clock - they were all neutral.

The existing TLI provides a very convenient way to define a VecLib without LLVM knowing about it ahead of time. This feature is important for any embedded use of LLVM as a library out-of-tree (I'll add a unit-test in-tree).
I don't think it is a big change to this patch to preserve the current ability but I wanted to check first (and in the meantime I reverted in temporarily in https://reviews.llvm.org/D77925 to avoid the feature regression).

At the moment the place where you seem to use this knowledge is with the enum VectorLibrary in the TargetLibraryInfoImpl class, and the VecLibDescs array which statically contains the known VecLib.
It seems to me that if we replace this enum with a string instead to identify the VecLib everything should still hold together and this would fit with minor changes to this path. The VecLibDescs could just be a StringMap<VectorLibraryDescriptors> in this case.

That was a third-party (in my case the XLA compiler) can still register its own "XLA" VecLib and add all the descriptors.

How does it sound?

The existing TLI provides a very convenient way to define a VecLib without LLVM knowing about it ahead of time. This feature is important for any embedded use of LLVM as a library out-of-tree (I'll add a unit-test in-tree).
I don't think it is a big change to this patch to preserve the current ability but I wanted to check first (and in the meantime I reverted in temporarily in https://reviews.llvm.org/D77925 to avoid the feature regression).

At the moment the place where you seem to use this knowledge is with the enum VectorLibrary in the TargetLibraryInfoImpl class, and the VecLibDescs array which statically contains the known VecLib.
It seems to me that if we replace this enum with a string instead to identify the VecLib everything should still hold together and this would fit with minor changes to this path. The VecLibDescs could just be a StringMap<VectorLibraryDescriptors> in this case.

That was a third-party (in my case the XLA compiler) can still register its own "XLA" VecLib and add all the descriptors.

How does it sound?

I think this should work. Just reiterating something we chatted about off patch yesterday, we really need a unit test that mimics the behavior utilized by the XLA compiler, for regression testing.

I think this should work. Just reiterating something we chatted about off patch yesterday, we really need a unit test that mimics the behavior utilized by the XLA compiler, for regression testing.

Yes I pinged some of the XLA folks to make it happen.

The existing TLI provides a very convenient way to define a VecLib without LLVM knowing about it ahead of time. This feature is important for any embedded use of LLVM as a library out-of-tree (I'll add a unit-test in-tree).
I don't think it is a big change to this patch to preserve the current ability but I wanted to check first (and in the meantime I reverted in temporarily in https://reviews.llvm.org/D77925 to avoid the feature regression).

At the moment the place where you seem to use this knowledge is with the enum VectorLibrary in the TargetLibraryInfoImpl class, and the VecLibDescs array which statically contains the known VecLib.
It seems to me that if we replace this enum with a string instead to identify the VecLib everything should still hold together and this would fit with minor changes to this path. The VecLibDescs could just be a StringMap<VectorLibraryDescriptors> in this case.

That was a third-party (in my case the XLA compiler) can still register its own "XLA" VecLib and add all the descriptors.

How does it sound?

Thanks for the explanation about the revert. The proposal of using a StringMap to maintain the openness sounds good to me. And agree with @tejohnson, if the openness is a feature, it should be covered in tests, otherwise it can feel somewhat like a loophole and prone to breakage, though I can see how it can be useful.. Hope this patch can be restored with tweaks soon (we have workloads with very visible vectorization that relies on this).

This change causes a ~0.5% compile-time regressions: http://llvm-compile-time-tracker.com/compare.php?from=5b18b6e9a84d985c0a907009fb71de7c1943bc88&to=60c642e74be6af86906d9f3d982728be7bd4329f&stat=instructions This is quite a lot as these things go, so it would be great if you could double check if there's any optimization potential here. In particular I'm wondering why this affects normal builds so much, even though they (presumably?) don't use any veclib at all.

Thanks for the heads-up. This is surprising but there is a change even when veclib is not used - in order to allow each function to use different veclib without duplicating the work of populating vector function list for each function, we now always pre-populate vector function list for three supported vector libraries for each module. However 0.5% compile-time for that work given it's per-module is not expected. I suspect we may be passing/copying TLII around more than we anticipated (now we always have more stuff to copy). I will take a look. We could also turn this into a lazy initialization - only populate the needed list for module level TLII when it's first queried by a function level TLI.

Hmm, yeah that is surprising, because the TLII should be built once per module per TLI analysis, which is never invalidated. We've gone from populating one set of vec libs to 3, I wouldn't have thought that was particularly expensive, so it would be good to see what is going on here and confirm we are only building this once as expected.

Looking at the compile time data at that link, interestingly the "instructions" metric increased, but not wall time or cycles or task clock - they were all neutral.

Turns out there're a few places where we call copy ctor for TLI unnecessarily. Made some changes in D77952 to use move when possible. In addition, I should have used move for TLI.VecLibDescs in move ctor of TargetLibraryInfoImpl too.

And agree with @tejohnson, if the openness is a feature, it should be covered in tests, otherwise it can feel somewhat like a loophole and prone to breakage

The thing is that LLVM does not have much C++ unittests in general, so most of the "features" like this one that LLVM offers as a library are just an artifact of being "designed as a library" and being mindful about the layering.
From this point of view this patch is changing the design of a component that was modular/pluggable into a closed system. I'm perfectly fine with trying to add a unit-test, I just don't know yet where it would fit in the LLVM testing though.

This change causes a ~0.5% compile-time regressions: http://llvm-compile-time-tracker.com/compare.php?from=5b18b6e9a84d985c0a907009fb71de7c1943bc88&to=60c642e74be6af86906d9f3d982728be7bd4329f&stat=instructions This is quite a lot as these things go, so it would be great if you could double check if there's any optimization potential here. In particular I'm wondering why this affects normal builds so much, even though they (presumably?) don't use any veclib at all.

Thanks for the heads-up. This is surprising but there is a change even when veclib is not used - in order to allow each function to use different veclib without duplicating the work of populating vector function list for each function, we now always pre-populate vector function list for three supported vector libraries for each module. However 0.5% compile-time for that work given it's per-module is not expected. I suspect we may be passing/copying TLII around more than we anticipated (now we always have more stuff to copy). I will take a look. We could also turn this into a lazy initialization - only populate the needed list for module level TLII when it's first queried by a function level TLI.

Hmm, yeah that is surprising, because the TLII should be built once per module per TLI analysis, which is never invalidated. We've gone from populating one set of vec libs to 3, I wouldn't have thought that was particularly expensive, so it would be good to see what is going on here and confirm we are only building this once as expected.

Looking at the compile time data at that link, interestingly the "instructions" metric increased, but not wall time or cycles or task clock - they were all neutral.

Turns out there're a few places where we call copy ctor for TLI unnecessarily.

I assume you mean the TargetLibraryInfoImpl (TLII) here, not the TargetLibraryInfo (TLI), right? The latter should be cheap to copy. Are these the changes in BackendUtil.cpp in D77952? I had a question about that on that patch as I think we will be calling the initializer more. Mostly we should only be copying the TargetLibraryInfo during optimization though, and not the TLII impl object.

Made some changes in D77952 to use move when possible. In addition, I should have used move for TLI.VecLibDescs in move ctor of TargetLibraryInfoImpl too.

nikic added a comment.Apr 13 2020, 2:50 AM

I gave D77952 a try (on top of this one), but didn't see a significant improvement from that change.

Looking at the callgrind output for compilation of a small file, I see 52M total instructions, 4 calls to TLII initialization, where addition of the vector functions takes up the majority of the time, at 0.7M. Most of the cost is in the sorting. 2 of the initialization calls are default-constructed TLII without target triple, which seems suspect to me (are we not adding TLI early enough, and something pulls it in via analysis dependency?)

So for small files, just registering the vector functions does make up a non-trivial fraction of time, and lazy initialization might make sense. This isn't the whole truth though: While the largest regressions are indeed on small files, there are also quite a few > 1% regressions on very large files.

For a mid-size file with ~6000M instructions retried, the main difference I see is TargetLibraryAnalysis::run() going up from 82M to 126M, with the cost coming from the extra getFnAttribute("veclib") call in the TargetLibraryInfo constructor. Fetching attributes is surprisingly expensive, as it performs an iteration over all attributes internally. As this code is iterating over all attributes anyway in order to handle no-builtin-*, it might make sense to move the checks for "veclib" and "no-builtins" into that loop as well, which should make them essentially free.

And agree with @tejohnson, if the openness is a feature, it should be covered in tests, otherwise it can feel somewhat like a loophole and prone to breakage

The thing is that LLVM does not have much C++ unittests in general, so most of the "features" like this one that LLVM offers as a library are just an artifact of being "designed as a library" and being mindful about the layering.
From this point of view this patch is changing the design of a component that was modular/pluggable into a closed system.

The interfaces being relied on were in the underlying Impl class, I think if that is expected to be pluggable and stable it really needs unit testing to reflect that usage.

I'm perfectly fine with trying to add a unit-test, I just don't know yet where it would fit in the LLVM testing though.

Presumably the testing should be in llvm/unittests/Analysis/TargetLibraryInfoTest.cpp, which already exists but only tests the LibFuncs (builtins) interfaces.