This is an archive of the discontinued LLVM Phabricator instance.


Differential D100717

[InstCombine] Transform memcpy to ptr load/stores if TBAA says so
AbandonedPublic

Authored by aqjune on Apr 18 2021, 1:38 AM.


Details

Reviewers

lebedev.ri
nikic
fhahn
spatel
nlopes
jeroen.dobbelaere

Summary

Folding inttoptr(ptrtoint p) -> p has been known as a source of miscompilation (http://llvm.org/pr34548)
because it swaps the object that some pointer is based-on.
Recently I've been playing with CastInst::isEliminableCastPair to see how removal of it
affects performance, and got very nice results.

On x86-64, compiling single amalgamated file benchmarks (bzip2, gzip, oggenc, ph7, sqlite3, gcc)[1, 2, 3]
didn't show a single difference in assembly even after the folding is removed.
I guess this is highly related to the recent efforts (D88788, D88789, D88979) in reducing the number of
inttoptr casts unnecessarily generated by middle-end transformations.

The int->ptr casts used in the source programs are either

(1) int->ptr casts of a constant, e.g. (int*)-1
(2) manipulating bits using constants, e.g. (int*)((intptr_t)x | 0x3)

The first one cannot gain benefit from inttoptr(ptrtoint p) -> p folding anyway.
The second one may gain benefit in theory, but for the 6 benchmarks it didn't. Also, there are possible cases where folding is allowed, and we can suggest using llvm.ptrmask as well for better performance.
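
For illustration, the two cast patterns listed above can be sketched as a small standalone C++ snippet. The helper names `sentinel`, `tag`, and `untag` are hypothetical and are not taken from the benchmarks; the sketch also assumes the pointee is at least 4-byte aligned so the two low bits are free.

```cpp
#include <cassert>
#include <cstdint>

// (1) int->ptr cast of a constant, e.g. (int*)-1 used as a sentinel value.
inline int *sentinel() {
  return reinterpret_cast<int *>(static_cast<std::intptr_t>(-1));
}

// (2) Bit manipulation with a constant: set the two low bits as a tag.
// Assumption: the pointee is at least 4-byte aligned, so those bits are zero.
inline int *tag(int *x) {
  std::intptr_t bits = reinterpret_cast<std::intptr_t>(x);
  assert((bits & 0x3) == 0 && "low bits must be free for tagging");
  return reinterpret_cast<int *>(bits | 0x3);
}

// Clear the tag bits again to recover the original pointer.
inline int *untag(int *x) {
  std::intptr_t bits = reinterpret_cast<std::intptr_t>(x);
  return reinterpret_cast<int *>(bits & ~static_cast<std::intptr_t>(0x3));
}
```

In pattern (2) the pointer is rebuilt from a modified integer, which is why a plain inttoptr(ptrtoint p) -> p fold does not apply directly to it.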

For a larger benchmark, I chose LLVM (as suggested by @nlopes).
Compiling LLVM with/without this folding showed quite a few differences in assembly, however.
But they are still mostly from inttoptrs generated by the optimizations.

This patch fixes LLVM by using load/store i8* (which is char* in C/C++) instead if !tbaa.struct
says the element had a character type: https://godbolt.org/z/PTaM51fKo
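
A minimal sketch of the kind of source that can lead to such a memcpy, assuming an aggregate with a single pointer member. `Node` and `copyNode` are hypothetical names, and this may not match the linked Godbolt example.

```cpp
#include <cassert>

// Hypothetical aggregate with a single pointer member.
struct Node {
  char *name;
};

// Aggregate copy. A frontend may lower this to a memcpy of sizeof(Node)
// bytes, which is the call that InstCombine later expands to a load/store.
void copyNode(Node *dst, const Node *src) { *dst = *src; }
```

Whether the expanded load/store uses an integer or a pointer type is the question this revision is about.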

This patch reduces the number of different assembly files from 1255 to 845 (-32%).
This also reduces the total number of inttoptrs after -O3 significantly for some files.

[1] https://people.csail.mit.edu/smcc/projects/single-file-programs/
[2] https://www.sqlite.org/amalgamation.html
[3] https://ph7.symisc.net/downloads.html

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune created this revision. Apr 18 2021, 1:38 AM

Herald added subscribers: dexonsmith, pengfei, kosarev and 2 others. Apr 18 2021, 1:38 AM

aqjune requested review of this revision. Apr 18 2021, 1:38 AM

Herald added a project: Restricted Project. Apr 18 2021, 1:38 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B99367: Diff 338363. Apr 18 2021, 1:39 AM

lebedev.ri edited the summary of this revision. Apr 18 2021, 1:50 AM

aqjune edited the summary of this revision. Apr 18 2021, 1:50 AM

lebedev.ri edited the summary of this revision. Apr 18 2021, 1:52 AM

Oops, there was a race!

I'll apply your edit.

Nevermind, thank you for the link.

lebedev.ri accepted this revision. Apr 18 2021, 1:55 AM

lebedev.ri added inline comments.

llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp
455	ArrayRef<StringLiteral> / ArrayRef<StringRef>

This revision is now accepted and ready to land. Apr 18 2021, 1:55 AM

err, didn't mean to accept just yet

This revision now requires review to proceed. Apr 18 2021, 1:55 AM

lebedev.ri edited the summary of this revision. Apr 18 2021, 1:58 AM

Yep, this looks pretty good!

llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp
459–461	return is_contained(Descs, Tag1->getString());
469–471	return is_contained(Descs, Id->getString());

This revision is now accepted and ready to land. Apr 18 2021, 2:01 AM

I'm uncomfortable with using TBAA for this purpose. The problem is that not all languages use TBAA, simply because that does not match their aliasing model. They shouldn't be penalized because TBAA information gets reused for an unrelated purpose.

Might it make sense to simply always use a pointer type here (assuming size matches of course)?

In D100717#2697087, @nikic wrote:

I'm uncomfortable with using TBAA for this purpose.

The problem is that not all languages use TBAA, simply because that does not match their aliasing model.
They shouldn't be penalized because TBAA information gets reused for an unrelated purpose.

How does this penalize them?
If they don't use TBAA then there simply won't be a TBAA info on that memcpy,
and we'll use the previous behavior of performing an integer op.

Might it make sense to simply always use a pointer type here (assuming size matches of course)?

No. If it was a pointer we should treat it as such, if it was an integer we should treat it as such.
We shouldn't introduce bogus ptr<->int casts that weren't there originally.
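
The fallback behavior described in this reply can be modeled as a small decision function. This is a hedged sketch, not InstCombine's actual logic: `TbaaHint`, `TransferKind`, and `chooseTransferKind` are hypothetical names, and the size check reflects the "assuming size matches" condition mentioned in the thread.

```cpp
#include <cassert>
#include <cstddef>

// What the metadata on the memcpy says about the copied element (assumed model).
enum class TbaaHint { Missing, Pointer, NonPointer };

// Type used for the expanded load/store pair.
enum class TransferKind { Integer, Pointer };

// Use a pointer-typed transfer only when the hint says "pointer" and the copy
// is exactly pointer-sized; otherwise keep the previous integer behavior.
TransferKind chooseTransferKind(TbaaHint hint, std::size_t copySize,
                                std::size_t pointerSize) {
  if (hint == TbaaHint::Pointer && copySize == pointerSize)
    return TransferKind::Pointer;
  return TransferKind::Integer;
}
```

A frontend that emits no TBAA corresponds to `TbaaHint::Missing` here and gets the integer transfer, as before the patch.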

In D100717#2697099, @lebedev.ri wrote:

In D100717#2697087, @nikic wrote:

I'm uncomfortable with using TBAA for this purpose.

The problem is that not all languages use TBAA, simply because that does not match their aliasing model.
They shouldn't be penalized because TBAA information gets reused for an unrelated purpose.

How does this penalize them?
If they don't use TBAA then there simply won't be a TBAA info on that memcpy,
and we'll use the previous behavior of performing an integer op.

It penalizes them because they cannot use TBAA metadata, because they don't use type-based alias analysis. So they don't have a way to provide the necessary information, without also opting into alias analysis behavior that does not match their language semantics. It binds two things (type hints for memcpy and alias analysis) together that don't have any relation.

I'm assuming here that it's not possible to use TBAA without also impacting alias analysis -- but I'm not really familiar with TBAA and maybe my assumption is simply wrong. Is it possible to specify !tbaa.struct metadata on memcpy's without affecting AA behavior in any way?

Might it make sense to simply always use a pointer type here (assuming size matches of course)?

No. If it was a pointer we should treat it as such, if it was an integer we should treat it as such.
We shouldn't introduce bogus ptr<->int casts that weren't there originally.

Just to clarify my thinking here: While we cannot fold inttoptr(ptrtoint p) -> p, we can fold ptrtoint(inttoptr p) -> p, so always doing the transfer as a pointer would avoid issues if the former fold is removed, because the latter would still eliminate cast pairs if it turns out to be the wrong type.

In D100717#2697110, @nikic wrote:

In D100717#2697099, @lebedev.ri wrote:

In D100717#2697087, @nikic wrote:

I'm uncomfortable with using TBAA for this purpose.

The problem is that not all languages use TBAA, simply because that does not match their aliasing model.
They shouldn't be penalized because TBAA information gets reused for an unrelated purpose.

How does this penalize them?
If they don't use TBAA then there simply won't be a TBAA info on that memcpy,
and we'll use the previous behavior of performing an integer op.

It penalizes them because they cannot use TBAA metadata, because they don't use type-based alias analysis. So they don't have a way to provide the necessary information, without also opting into alias analysis behavior that does not match their language semantics.

Sure, but how does *this* patch penalize them?
Isn't that more about the removal of faulty inttoptr(ptrtoint p) -> p fold?

It binds two things (type hints for memcpy and alias analysis) together that don't have any relation.

I'm assuming here that it's not possible to use TBAA without also impacting alias analysis -- but I'm not really familiar with TBAA and maybe my assumption is simply wrong. Is it possible to specify !tbaa.struct metadata on memcpy's without affecting AA behavior in any way?

Might it make sense to simply always use a pointer type here (assuming size matches of course)?

No. If it was a pointer we should treat it as such, if it was an integer we should treat it as such.
We shouldn't introduce bogus ptr<->int casts that weren't there originally.

Just to clarify my thinking here: While we cannot fold inttoptr(ptrtoint p) -> p, we can fold ptrtoint(inttoptr p) -> p, so always doing the transfer as a pointer would avoid issues if the former fold is removed, because the latter would still eliminate cast pairs if it turns out to be the wrong type.

Both don't really sound sound to me, but i may be wrong.
Regardless, this seems like it will create the polar opposite of the problems fixed in the referenced patches.

It penalizes them because they cannot use TBAA metadata, because they don't use type-based alias analysis. So they don't have a way to provide the necessary information, without also opting into alias analysis behavior that does not match their language semantics. It binds two things (type hints for memcpy and alias analysis) together that don't have any relation.
I'm assuming here that it's not possible to use TBAA without also impacting alias analysis -- but I'm not really familiar with TBAA and maybe my assumption is simply wrong. Is it possible to specify !tbaa.struct metadata on memcpy's without affecting AA behavior in any way?

TBAA is used because otherwise there is no way to preserve the source’s data type down to middle-end optimizations.
To avoid the unwanted alias analysis, I think we have the following options:

- Use unsigned char for all other fields in !tbaa.struct: an unsigned char pointer is supposed to alias any pointer in C/C++, so it won't hurt.
- Update the optimization pipeline to never run TypeBasedAliasAnalysis for specific frontends.

If the language is supposed to add its own !tbaa/!tbaa.struct support, I guess either

- TypeBasedAliasAnalysis can be extended to be more pluggable (allow a frontend to add its own rules), or
- the language's own TypeBasedAliasAnalysis can simply be implemented and used instead.

Might it make sense to simply always use a pointer type here (assuming size matches of course)?

Just to clarify my thinking here: While we cannot fold inttoptr(ptrtoint p) -> p, we can fold ptrtoint(inttoptr p) -> p, so always doing the transfer as a pointer would avoid issues if the former fold is removed, because the latter would still eliminate cast pairs if it turns out to be the wrong type.

I also think it is desirable to preserve the original pointee type, because always using a pointer type again adds complexity to the semantics of load/store.
Also, it is possible to introduce inttoptr again:

store i64 %i, i64* %p
memcpy(q, p, 8)
->
store i64 %i, i64* %p
%j = load i8*, i8** %p
store i8* %j, i8** %q
->
store i64 %i, i64* %p
%j = inttoptr i64 %i to i8*
store i8* %j, i8** %q

The existence of inttoptr complicates the analyses for many attributes, e.g. nocapture/nofree/etc., so let's move towards avoiding their introduction. :)
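
At the source level, the situation in the IR sketch above roughly corresponds to memory that was written as an integer and is then copied bytewise into a pointer slot. The following standalone C++ is only an illustration under that assumption; `copyThroughInteger` is a hypothetical name, and the sketch assumes pointers and `std::uintptr_t` share the same size and representation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uintptr_t) == sizeof(char *),
              "sketch assumes pointer-sized integers");

// Write an address to memory as an integer, then copy those bytes into a
// pointer-typed slot, mirroring "store i64" followed by "memcpy(q, p, 8)".
char *copyThroughInteger(char *p) {
  std::uintptr_t slot = reinterpret_cast<std::uintptr_t>(p);  // integer store
  char *q = nullptr;
  std::memcpy(&q, &slot, sizeof q);  // bytewise copy into the pointer slot
  return q;
}
```

If the copy were always expanded as a pointer load/store, forwarding the stored integer to the load would need the inttoptr shown in the last step of the IR sketch.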

I don't see how this use of TBAA metadata is within the specification of the metadata.

IIUC the type descriptors only encode the layout and relationship between types, but not whether a scalar type is a pointer, integer, floating point or other type. It just happens to be that Clang uses names that are related to the source types. But IIUC those names are arbitrary and Clang could just as well use A, B, C,... or any other naming scheme.

Please let me know if I am missing something that provides the guarantees the patch uses.

In D100717#2697125, @fhahn wrote:

I don't see how this use of TBAA metadata is within the specification of the metadata.

IIUC the type descriptors only encode the layout and relationship between types, but not whether a scalar type is a pointer, integer, floating point or other type. It just happens to be that Clang uses names that are related to the source types. But IIUC those names are arbitrary and Clang could just as well use A, B, C,... or any other naming scheme.

Please let me know if I am missing something that provides the guarantees the patch uses.

For metadata nodes whose descriptions are simply C/C++'s type name they shouldn't carry info, but for those with special descriptions would it be reasonable if they are given special meaning?
I thought TBAA used the description because MDNode::isTBAAVtableAccess checks whether description is "vtable pointer".

In D100717#2697137, @aqjune wrote:

In D100717#2697125, @fhahn wrote:

I don't see how this use of TBAA metadata is within the specification of the metadata.

IIUC the type descriptors only encode the layout and relationship between types, but not whether a scalar type is a pointer, integer, floating point or other type. It just happens to be that Clang uses names that are related to the source types. But IIUC those names are arbitrary and Clang could just as well use A, B, C,... or any other naming scheme.

Please let me know if I am missing something that provides the guarantees the patch uses.

For metadata nodes whose descriptions are simply C/C++'s type name they shouldn't carry info, but for those with special descriptions would it be reasonable if they are given special meaning?
I thought TBAA used the description because MDNode::isTBAAVtableAccess checks whether description is "vtable pointer".

The vtable pointer case seems only to be used by the ThreadSanitizer.

I am not opposed to providing special meaning to certain nodes, but this should be mentioned in the documentation. Currently vtable pointer and any pointer are not documented as special descriptors in the langref's tbaa/tbaa.struct chapters.

@jdoerfert what do you think?
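
The string matching under discussion can be sketched in standalone C++ as follows. This is not the patch's code: `isPointerDescriptor` is a hypothetical helper that takes a plain string instead of a TBAA type descriptor node, and only the two names "any pointer" and "vtable pointer" come from this thread.

```cpp
#include <cassert>
#include <string>

// Returns true if a type-descriptor name is one of the names that would be
// treated as "this field is a pointer" (assumed list, taken from the thread).
bool isPointerDescriptor(const std::string &name) {
  static const char *const PointerDescs[] = {"any pointer", "vtable pointer"};
  for (const char *desc : PointerDescs)
    if (name == desc)
      return true;
  return false;
}
```

Because the check depends only on the spelling of the name, a frontend that picks different descriptor names would silently get no pointer hint, which is the documentation concern raised here.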

I am not sure what kind of complications to expect when you have pointers pointing to a non-0 address space. These can differ in size and representation, but that should be taken care of by the size check.

First thought: Why is there no test in which we generate no int2ptr with the patch but do without?

@fhahn: I don't see how this use of TBAA metadata is within the specification of the metadata.

I read through the code and I'm not sure if this is even a semantic change. I mean, can't we pick any
type to do the memory transfer expansion? If so, TBAA metadata as a heuristic should be totally fine.
We could also look at uses of the source and target pointers, for example. That said, we should write
in the lang ref that we do use the TBAA names for heuristics and they should be chosen to match the code/intent.

In D100717#2698680, @jdoerfert wrote:

First thought: Why is there no test in which we generate no int2ptr with the patch but do without?

@fhahn: I don't see how this use of TBAA metadata is within the specification of the metadata.

I read through the code and I'm not sure if this is even a semantic change. I mean, can't we pick any
type to do the memory transfer expansion? If so, TBAA metadata as a heuristic should be totally fine.
We could also look at uses of the source and target pointers, for example. That said, we should write
in the lang ref that we do use the TBAA names for heuristics and they should be chosen to match the code/intent.

For CHERI/Morello the type of the memory transfer does matter. For us pointers are 65/129-bit capabilities (32/64-bit addresses, bounds & permission metadata, and a hidden validity bit). The hardware clears the validity bit when storing an integer to a memory location that holds a valid capability. So if we convert a pointer-type memcpy to an i64/i128 load+store, the pointer becomes invalid and will trigger a trap on the next dereference.
We therefore have to use pointer-type loads+stores as the default if the size matches the pointer size and pointers in that AS are capabilities.
In our fork this can be checked using DL.isFatPointer(AS) (a weaker version of DL.isNonIntegralAddressSpace(AS) that allows ptrtoint, but not inttoptr).
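
The effect described above can be illustrated with a toy model of tagged memory. This is a simplified sketch and not a description of the real hardware or of the CHERI LLVM fork: `Cell`, `storeCapability`, `storeInteger`, and the copy helpers are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy memory cell: payload bits plus a hidden validity tag.
struct Cell {
  std::uint64_t bits = 0;
  bool valid = false;
};

// A capability store writes the bits and carries the validity tag along.
void storeCapability(Cell &c, std::uint64_t bits, bool valid) {
  c.bits = bits;
  c.valid = valid;
}

// An integer store writes the bits and always clears the validity tag.
void storeInteger(Cell &c, std::uint64_t bits) {
  c.bits = bits;
  c.valid = false;
}

// memcpy expanded as an integer load+store: the tag is lost.
void copyAsInteger(Cell &dst, const Cell &src) { storeInteger(dst, src.bits); }

// memcpy expanded as a pointer load+store: the tag is preserved.
void copyAsCapability(Cell &dst, const Cell &src) {
  storeCapability(dst, src.bits, src.valid);
}
```

In this model both copies produce identical payload bits, but only the capability-typed copy leaves a value that may still be dereferenced.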

A few years ago @theraven added a workaround to our fork to allow the memcpy->load+store transformation for 128-bit CHERI (since it is guarded by Size <=8): https://github.com/CTSRD-CHERI/llvm-project/commit/f477ef4d292eb45871fc6cab0262d0844ce5d5aa

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
177	I wonder if it would make sense to default to pointer load+store for non-integral address space? For CHERI we would definitely want this to be the default, but we can carry that patch downstream.
178

lebedev.ri added inline comments. Apr 19 2021, 10:26 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
177	I'm not really sure why that would be the right default, but regardless let's not conflate that here.

arichardson added inline comments. Apr 19 2021, 10:30 AM

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
177	Please ignore this suggestion. For some reason I read this as `SrcAddrSp` being the address space of the pointee, which we of course don't have here. The address space of the pointer is irrelevant in this case.

dexonsmith removed a subscriber: dexonsmith. Apr 19 2021, 11:20 AM

@aqjune can you please add requested langref wording?

Once the LangRef changes are there, I'd suggest cross-posting this to llvm-dev for visibility. This patch was discussed at the last alias analysis meeting, and I think most people were in favor of using metadata for this purpose, but weren't super fond of determining the type to use based on string matching. Maybe someone else has more detailed notes...

In D100717#2706571, @nikic wrote:

Once the LangRef changes are there, I'd suggest cross-posting this to llvm-dev for visibility. This patch was discussed at the last alias analysis meeting,

(hm, was there no reminder mail, or did i miss it?)

and I think most people were in favor of using metadata for this purpose, but weren't super fond of determining the type to use based on string matching. Maybe someone else has more detailed notes...

To be noted, this isn't really inventing a new approach though, it's already done for vtable pointers.
I feel like that if this isn't ok, then the existing code also should be removed.

In D100717#2706600, @lebedev.ri wrote:

In D100717#2706571, @nikic wrote:

Once the LangRef changes are there, I'd suggest cross-posting this to llvm-dev for visibility. This patch was discussed at the last alias analysis meeting,

(hm, was there no reminder mail, or did i miss it?)

and I think most people were in favor of using metadata for this purpose, but weren't super fond of determining the type to use based on string matching. Maybe someone else has more detailed notes...

To be noted, this isn't really inventing a new approach though, it's already done for vtable pointers.
I feel like that if this isn't ok, then the existing code also should be removed.

It is done for vtable pointers, but that is only used by the thread sanitizer. There was discussion about using the available information in tbaa for guiding the type to load/store (1) vs forcing to use tbaa to guide what kind of load/stores are used. (2)

(1) is deemed to be acceptable, if it is clearly documented. The forced naming based on what names clang produces is unfortunate, but as a stopgap this would be acceptable until we come up with a cleaner mechanism.
That cleaner mechanism could be part of the rework of new struct path tbaa (that was also discussed at the meeting).

For (2) we don't have a good idea yet. Maybe a separate !tb.struct with the same layout could be used for this particular purpose if !tbaa.struct is not available (just thinking out loud).

Thank you for the comments. I'll update this patch to contain tests that were creating inttoptr before.

In D100717#2707333, @jeroen.dobbelaere wrote:

It is done for vtable pointers, but that is only used by the thread sanitizer. There was discussion about using the available information in tbaa for guiding the type to load/store (1) vs forcing to use tbaa to guide what kind of load/stores are used. (2)

(1) is deemed to be acceptable, if it is clearly documented. The forced naming based on what names clang produces is unfortunate, but as a stopgap this would be acceptable until we come up with a cleaner mechanism.
That cleaner mechanism could be part of the rework of new struct path tbaa (that was also discussed at the meeting).

For (2) we don't have a good idea yet. Maybe a separate !tb.struct with the same layout could be used for this particular purpose if !tbaa.struct is not available (just thinking out loud).

I believe this patch falls into category (1) - this patch is mainly for removal of inttoptrs created by middle-end transformations, helping deletion of inttoptr(ptrtoint p) -> p fold.

I'll create a LangRef patch that describes the policy & share it to llvm-dev tomorrow.

In D100717#2699004, @arichardson wrote:

For CHERI/Morello the type of the memory transfer does matter. For us pointers are 65/129-bit capabilities (32/64-bit addresses, bounds & permission metadata, and a hidden validity bit). The hardware clears the validity bit when storing an integer to a memory location that holds a valid capability. So if we convert a pointer-type memcpy to an i64/i128 load+store, the pointer becomes invalid and will trigger a trap on the next dereference.
We therefore have to use pointer-type loads+stores as the default if the size matches the pointer size and pointers in that AS are capabilities.
In our fork this can be checked using DL.isFatPointer(AS) (a weaker version of DL.isNonIntegralAddressSpace(AS) that allows ptrtoint, but not inttoptr).

A few years ago @theraven added a workaround to our fork to allow the memcpy->load+store transformation for 128-bit CHERI (since it is guarded by Size <=8): https://github.com/CTSRD-CHERI/llvm-project/commit/f477ef4d292eb45871fc6cab0262d0844ce5d5aa

This is really interesting! IIUC my patch will bring benefit to this case as well? cheri-memcpy.ll also uses !tbaa.struct and my patch preserves the address space of the pointer type.

Add the inttoptr-removing example, address comments

aqjune marked 3 inline comments as done. Apr 23 2021, 6:54 AM

Harbormaster completed remote builds in B100558: Diff 340008. Apr 23 2021, 6:54 AM

Looks good.

Still two remarks:

- Can you add a langref update indicating that the vtable pointer and any pointer names might be used to drive optimization passes/code generation?
- Can you extend the testcase with a vtable pointer and an int case?

And also two questions:

- The testcase tries out a p64:64:64 and a p32:32:32 case, assuming that the p32:32:32 case should use an i64. Is this indeed the expected lowering? Or would we expect two pointer load/stores?
- When I try out a struct with single pointer fields and array pointer fields, I see that clang produces separate pointer fields for the single pointers and a large 'char' field for the array of pointers. (See https://www.godbolt.org/z/PT3qorvnq) Maybe that is something that could be changed later into a pointer field with a larger size?

extend the test case, split the 32bit cast into a separate file to memcpy 4 bytes only

aqjune mentioned this in D101185: [LangRef] tbaa: type names can be used as hints to optimizations. Apr 23 2021, 10:33 AM

In D100717#2712186, @jeroen.dobbelaere wrote:

The testcase tries out a p64:64:64 and a p32:32:32 case, assuming that the p32:32:32 case should use an i64. Is this indeed the expected lowering? Or would we expect two pointer load/stores?

I found that the test case was indeed a bit bogus; the corresponding !tbaa.struct has one pointer field, so it is 4 bytes, but the memcpy was copying 8 bytes.
I made a separate test (memcpy-tbaa32.ll) that copies 4 bytes only.

When I try out a struct with single pointer fields and array pointer fields, I see that clang produces separate pointer fields for the single pointers and a large 'char' field for the array of pointers. (See https://www.godbolt.org/z/PT3qorvnq) Maybe that is something that could be changed later into a pointer field with a larger size?

This is interesting, the !tbaa.struct is using char. I also think using a pointer type is totally fine in this case.

Harbormaster completed remote builds in B100624: Diff 340105. Apr 23 2021, 10:55 AM

In D100717#2698680, @jdoerfert wrote:

I read through the code and I'm not sure if this is even a semantic change. I mean, can't we pick any
type to do the memory transfer expansion? If so, TBAA metadata as a heuristic should be totally fine.
We could also look at uses of the source and target pointers, for example. That said, we should write
in the lang ref that we do use the TBAA names for heuristics and they should be chosen to match the code/intent.

I agree with @nikic and @fhahn that we are trying to encode new information beyond what TBAA is supposed to describe according to LangRef. I think vtable pointer handling is particularly telling - it benefits from special treatment, since it is not a "regular" pointer. Conceptually this change (along with vtable support) belongs to the same problem space as encoding non-aliasing pointers (cough, Fortran). In all these cases we need to know when something is a pointer, while simultaneously looking at the rules of the source language to short-circuit optimization heuristics.

Aside from maybe generic pointer handling, I am not super comfortable with baking this kind of logic into the backend and would be interested in finding a better solution in the long run. Though the challenge is that different source languages have completely unrelated pointer rules, which would require different heuristics.

@aqjune please can you prepare the same change for SROA?

In D100717#2713553, @penzn wrote:

In D100717#2698680, @jdoerfert wrote:

I read through the code and I'm not sure if this is even a semantic change. I mean, can't we pick any
type to do the memory transfer expansion? If so, TBAA metadata as a heuristic should be totally fine.
We could also look at uses of the source and target pointers, for example. That said, we should write
in the lang ref that we do use the TBAA names for heuristics and they should be chosen to match the code/intent.

I agree with @nikic and @fhahn that we are trying to encode new information beyond what TBAA is supposed to describe according to LangRef. I think vtable pointer handling is particularly telling - it benefits from special treatment, since it is not a "regular" pointer. Conceptually this change (along with vtable support) belongs to the same problem space as encoding non-aliasing pointers (cough, Fortran). In all these cases we need to know when something is a pointer, while simultaneously looking at the rules of the source language to short-circuit optimization heuristics.

Aside from maybe generic pointer handling, I am not super comfortable with baking this kind of logic into the backend and would be interested in finding a better solution in the long run. Though the challenge is that different source languages have completely unrelated pointer rules, which would require different heuristics.

Instead of using !tbaa as proxy for intent, couldn't we get the actual intent by looking at the users of the pointer?

In D100717#2716833, @fhahn wrote:

Instead of using !tbaa as proxy for intent, couldn't we get the actual intent by looking at the users of the pointer?

No.

In D100717#2716838, @lebedev.ri wrote:

In D100717#2716833, @fhahn wrote:

Instead of using !tbaa as proxy for intent, couldn't we get the actual intent by looking at the users of the pointer?

No.

That does not really answer my question for this specific case though, I think (or I am missing how this is related to this particular issue). If I understand the linked patch correctly, it undoes converting pointer load/stores to integer ones if possible.

But my suggestion/question is the other way around: check if a pointer is used to load a pointer instead of checking if tbaa claims it should be treated as pointer.

In D100717#2713553, @penzn wrote:

Aside from maybe generic pointer handling, I am not super comfortable with baking this kind of logic into the backend and would be interested in finding a better solution in the long run. Though the challenge is that different source languages have completely unrelated pointer rules, which would require different heuristics.

I think the underlying problem is that there are only two options for the type when canonicalizing memcpy/load/store: either an integer type or a pointer type.
Once it is canonicalized into one of these two, memory accesses with the other type require introducing casts that are not easy to remove due to the miscompilation issues.

In D100717#2716900, @fhahn wrote:

But my suggestion/question is the other way around: check if a pointer is used to load a pointer instead of checking if tbaa claims it should be treated as pointer.

Well, actually this sounds like a great idea... Let me check whether it works.

In D100717#2716812, @lebedev.ri wrote:

@aqjune please can you prepare the same change for SROA?

I'm not familiar with SROA, but will try.

In D100717#2718366, @aqjune wrote:

In D100717#2716900, @fhahn wrote:

But my suggestion/question is the other way around: check if a pointer is used to load a pointer instead of checking if tbaa claims it should be treated as pointer.

Well, actually this sounds like a great idea... Let me check whether it works.

At least one case where this won't work is when the llvm.memcpy is expanded first, before the containing function is inlined in another one, where the access types are exposed.

In D100717#2718740, @jeroen.dobbelaere wrote:

In D100717#2718366, @aqjune wrote:

In D100717#2716900, @fhahn wrote:

But my suggestion/question is the other way around: check if a pointer is used to load a pointer instead of checking if tbaa claims it should be treated as pointer.

Well, actually this sounds like a great idea... Let me check whether it works.

At least one case where this won't work is when the llvm.memcpy is expanded first, before the containing function is inlined in another one, where the access types are exposed.

That's precisely my point about that being a flipped-direction of D88789.
Nowadays, it should obviously not be a good idea.
I don't believe I will be okay with a patch doing that.

Matt added a subscriber: Matt. May 6 2021, 7:32 AM

Since there is a second opinion about using TBAA, will close this for now.

Revision Contents

Path                                                    Size
llvm/include/llvm/IR/Metadata.h                         3 lines
llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp            26 lines
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp    22 lines
llvm/test/Transforms/InstCombine/memcpy-tbaa.ll         10 lines

Diff 340008

llvm/include/llvm/IR/Metadata.h

//===- llvm/IR/Metadata.h - Metadata definitions ----------------*- C++ -*-===//		//===- llvm/IR/Metadata.h - Metadata definitions ----------------*- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
[... 1,105 lines hidden ...]
default:		default:
return false;		return false;
#define HANDLE_MDNODE_LEAF(CLASS) \		#define HANDLE_MDNODE_LEAF(CLASS) \
case CLASS##Kind: \		case CLASS##Kind: \
return true;		return true;
#include "llvm/IR/Metadata.def"		#include "llvm/IR/Metadata.def"
}		}
}		}

/// Check whether MDNode is a vtable access.		/// Check whether MDNode is a pointer/vtable access.
bool isTBAAVtableAccess() const;		bool isTBAAVtableAccess() const;
		bool isTBAAPointerAccess() const;

/// Methods for metadata merging.		/// Methods for metadata merging.
static MDNode *concatenate(MDNode *A, MDNode *B);		static MDNode *concatenate(MDNode *A, MDNode *B);
static MDNode *intersect(MDNode *A, MDNode *B);		static MDNode *intersect(MDNode *A, MDNode *B);
static MDNode *getMostGenericTBAA(MDNode *A, MDNode *B);		static MDNode *getMostGenericTBAA(MDNode *A, MDNode *B);
static MDNode *getMostGenericFPMath(MDNode *A, MDNode *B);		static MDNode *getMostGenericFPMath(MDNode *A, MDNode *B);
static MDNode *getMostGenericRange(MDNode *A, MDNode *B);		static MDNode *getMostGenericRange(MDNode *A, MDNode *B);
static MDNode *getMostGenericAliasScope(MDNode *A, MDNode *B);		static MDNode *getMostGenericAliasScope(MDNode *A, MDNode *B);
[... 364 lines hidden ...]

llvm/lib/Analysis/TypeBasedAliasAnalysis.cpp

//===- TypeBasedAliasAnalysis.cpp - Type-Based Alias Analysis -------------===//		//===- TypeBasedAliasAnalysis.cpp - Type-Based Alias Analysis -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
[... 437 lines hidden ...]
ModRefInfo TypeBasedAAResult::getModRefInfo(const CallBase *Call1,		ModRefInfo TypeBasedAAResult::getModRefInfo(const CallBase *Call1,
if (const MDNode *M1 = Call1->getMetadata(LLVMContext::MD_tbaa))		if (const MDNode *M1 = Call1->getMetadata(LLVMContext::MD_tbaa))
if (const MDNode *M2 = Call2->getMetadata(LLVMContext::MD_tbaa))		if (const MDNode *M2 = Call2->getMetadata(LLVMContext::MD_tbaa))
if (!Aliases(M1, M2))		if (!Aliases(M1, M2))
return ModRefInfo::NoModRef;		return ModRefInfo::NoModRef;

return AAResultBase::getModRefInfo(Call1, Call2, AAQI);		return AAResultBase::getModRefInfo(Call1, Call2, AAQI);
}		}

bool MDNode::isTBAAVtableAccess() const {		static bool isAccessWithDesc(const MDNode *Node,
if (!isStructPathTBAA(this)) {		const ArrayRef<StringRef> Descs) {
		lebedev.ri (inline comment; Unsubmitted, Done): ArrayRef<StringLiteral> / ArrayRef<StringRef>
if (getNumOperands() < 1)		if (!isStructPathTBAA(Node)) {
		if (Node->getNumOperands() < 1)
return false;		return false;
if (MDString *Tag1 = dyn_cast<MDString>(getOperand(0))) {		if (MDString *Tag1 = dyn_cast<MDString>(Node->getOperand(0)))
if (Tag1->getString() == "vtable pointer")		return is_contained(Descs, Tag1->getString());
return true;
}
return false;		return false;
		lebedev.ri (inline comment; Unsubmitted, Done): return is_contained(Descs, Tag1->getString());
}		}

// For struct-path aware TBAA, we use the access type of the tag.		// For struct-path aware TBAA, we use the access type of the tag.
TBAAStructTagNode Tag(this);		TBAAStructTagNode Tag(Node);
TBAAStructTypeNode AccessType(Tag.getAccessType());		TBAAStructTypeNode AccessType(Tag.getAccessType());
if(auto *Id = dyn_cast<MDString>(AccessType.getId()))		if(auto *Id = dyn_cast<MDString>(AccessType.getId()))
if (Id->getString() == "vtable pointer")		return is_contained(Descs, Id->getString());
return true;
return false;		return false;
}		}

		lebedev.ri (inline comment; Unsubmitted, Not Done): return is_contained(Descs, Id->getString());
		bool MDNode::isTBAAVtableAccess() const {
		return isAccessWithDesc(this, {"vtable pointer"});
		}

		bool MDNode::isTBAAPointerAccess() const {
		return isAccessWithDesc(this, {"vtable pointer", "any pointer"});
		}

static bool matchAccessTags(const MDNode *A, const MDNode *B,		static bool matchAccessTags(const MDNode *A, const MDNode *B,
const MDNode **GenericTag = nullptr);		const MDNode **GenericTag = nullptr);

MDNode *MDNode::getMostGenericTBAA(MDNode *A, MDNode *B) {		MDNode *MDNode::getMostGenericTBAA(MDNode *A, MDNode *B) {
const MDNode *GenericTag;		const MDNode *GenericTag;
matchAccessTags(A, B, &GenericTag);		matchAccessTags(A, B, &GenericTag);
return const_cast<MDNode*>(GenericTag);		return const_cast<MDNode*>(GenericTag);
}		}
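
The refactoring in this file boils down to a single membership test over a list of TBAA type names. A standalone sketch of that idea, using only the standard library, might look like the following. Note the assumptions: `std::find` stands in for `llvm::is_contained`, `std::string_view` stands in for `StringRef`, and `isOneOf`, `isVtableAccessName`, and `isPointerAccessName` are hypothetical names that do not exist in LLVM. The real patch additionally walks the TBAA metadata node to find the name; that part is omitted here.

```cpp
#include <algorithm>
#include <initializer_list>
#include <string_view>

// Hypothetical helper: true if `name` equals any of the descriptors.
bool isOneOf(std::string_view name,
             std::initializer_list<std::string_view> descs) {
  return std::find(descs.begin(), descs.end(), name) != descs.end();
}

// Counterpart of isTBAAVtableAccess(): only the vtable pointer name matches.
bool isVtableAccessName(std::string_view name) {
  return isOneOf(name, {"vtable pointer"});
}

// Counterpart of isTBAAPointerAccess(): vtable pointers and generic
// pointers both count as pointer accesses.
bool isPointerAccessName(std::string_view name) {
  return isOneOf(name, {"vtable pointer", "any pointer"});
}
```

Sharing one helper keeps the two predicates from drifting apart, which appears to be the motivation for `isAccessWithDesc` in the diff.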
[... 307 lines hidden ...]

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

//===- InstCombineCalls.cpp -----------------------------------------------===// //===- InstCombineCalls.cpp -----------------------------------------------===//


// //

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information. // See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

// //

//===----------------------------------------------------------------------===// //===----------------------------------------------------------------------===//

// //

[... 137 lines hidden ...]

Instruction *InstCombinerImpl::SimplifyAnyMemTransfer(AnyMemTransferInst *MI) { Instruction *InstCombinerImpl::SimplifyAnyMemTransfer(AnyMemTransferInst *MI) {

// If it is an atomic and alignment is less than the size then we will // If it is an atomic and alignment is less than the size then we will

// introduce the unaligned memory access which will be later transformed // introduce the unaligned memory access which will be later transformed

// into libcall in CodeGen. This is not evident performance gain so disable // into libcall in CodeGen. This is not evident performance gain so disable

// it now. // it now.

if (isa<AtomicMemTransferInst>(MI)) if (isa<AtomicMemTransferInst>(MI))

if (*CopyDstAlign < Size || *CopySrcAlign < Size) if (*CopyDstAlign < Size || *CopySrcAlign < Size)

return nullptr; return nullptr;

// Use an integer load+store unless we can find something better.

unsigned SrcAddrSp = unsigned SrcAddrSp =

cast<PointerType>(MI->getArgOperand(1)->getType())->getAddressSpace(); cast<PointerType>(MI->getArgOperand(1)->getType())->getAddressSpace();

unsigned DstAddrSp = unsigned DstAddrSp =

cast<PointerType>(MI->getArgOperand(0)->getType())->getAddressSpace(); cast<PointerType>(MI->getArgOperand(0)->getType())->getAddressSpace();

IntegerType* IntType = IntegerType::get(MI->getContext(), Size<<3);

Type *NewSrcPtrTy = PointerType::get(IntType, SrcAddrSp);

Type *NewDstPtrTy = PointerType::get(IntType, DstAddrSp);

// If the memcpy has metadata describing the members, see if we can get the // If the memcpy has metadata describing the members, see if we can get the

// TBAA tag describing our copy. // TBAA tag describing our copy.

MDNode *CopyMD = nullptr; MDNode *CopyMD = nullptr;

if (MDNode *M = MI->getMetadata(LLVMContext::MD_tbaa)) { if (MDNode *M = MI->getMetadata(LLVMContext::MD_tbaa)) {

CopyMD = M; CopyMD = M;

} else if (MDNode *M = MI->getMetadata(LLVMContext::MD_tbaa_struct)) { } else if (MDNode *M = MI->getMetadata(LLVMContext::MD_tbaa_struct)) {

if (M->getNumOperands() == 3 && M->getOperand(0) && if (M->getNumOperands() == 3 && M->getOperand(0) &&

mdconst::hasa<ConstantInt>(M->getOperand(0)) && mdconst::hasa<ConstantInt>(M->getOperand(0)) &&

mdconst::extract<ConstantInt>(M->getOperand(0))->isZero() && mdconst::extract<ConstantInt>(M->getOperand(0))->isZero() &&

M->getOperand(1) && M->getOperand(1) &&

mdconst::hasa<ConstantInt>(M->getOperand(1)) && mdconst::hasa<ConstantInt>(M->getOperand(1)) &&

mdconst::extract<ConstantInt>(M->getOperand(1))->getValue() == mdconst::extract<ConstantInt>(M->getOperand(1))->getValue() ==

Size && Size &&

M->getOperand(2) && isa<MDNode>(M->getOperand(2))) M->getOperand(2) && isa<MDNode>(M->getOperand(2)))

CopyMD = cast<MDNode>(M->getOperand(2)); CopyMD = cast<MDNode>(M->getOperand(2));

} }

Type* DataType = nullptr;

if (CopyMD && CopyMD->isTBAAPointerAccess() && SrcAddrSp == DstAddrSp &&

arichardson (inline comment; Unsubmitted, Not Done):

Type* DataType = nullptr;

- if (CopyMD && CopyMD->isTBAAPointerAccess() && SrcAddrSp == DstAddrSp &&

+ if (SrcAddrSp == DstAddrSp &&

+ ((CopyMD && CopyMD->isTBAAPointerAccess()) ||

+ DL.isNonIntegralAddressSpace(SrcAddrSp)) &&

Size * 8 == DL.getPointerSizeInBits(SrcAddrSp)) {

I wonder if it would make sense to default to pointer load+store for non-integral address space?
For CHERI we would definitely want this to be the default, but we can carry that patch downstream.


lebedev.ri (inline comment; Unsubmitted, Not Done):

I'm not really sure why that would be the right default,
but regardless let's not conflate that here.


arichardson (inline comment; Unsubmitted, Not Done):

Please ignore this suggestion. For some reason I read this as SrcAddrSp being the address space of the pointee, which we of course don't have here. The address space of the pointer is irrelevant in this case.


DL.getPointerSize(SrcAddrSp) == Size) {

arichardson (inline comment; Unsubmitted, Done):

if (CopyMD && CopyMD->isTBAAPointerAccess() && SrcAddrSp == DstAddrSp &&

- Size * 8 == DL.getPointerSizeInBits(SrcAddrSp)) {

+ DL.getPointerSize(SrcAddrSp) == Size) {

// TBAA tag says this is a pointer type; follow the instruction to avoid


// TBAA tag says this is a pointer type; follow the instruction to avoid

// creating redundant inttoptr/ptrtoints.

DataType = PointerType::get(

IntegerType::get(MI->getContext(), 8), SrcAddrSp);

} else {

// Use an integer load+store unless we can find something better.

DataType = IntegerType::get(MI->getContext(), Size<<3);

}

Type *NewSrcPtrTy = PointerType::get(DataType, SrcAddrSp);

Type *NewDstPtrTy = PointerType::get(DataType, DstAddrSp);

Value *Src = Builder.CreateBitCast(MI->getArgOperand(1), NewSrcPtrTy); Value *Src = Builder.CreateBitCast(MI->getArgOperand(1), NewSrcPtrTy);

Value *Dest = Builder.CreateBitCast(MI->getArgOperand(0), NewDstPtrTy); Value *Dest = Builder.CreateBitCast(MI->getArgOperand(0), NewDstPtrTy);

LoadInst *L = Builder.CreateLoad(IntType, Src); LoadInst *L = Builder.CreateLoad(DataType, Src);

// Alignment from the mem intrinsic will be better, so use it. // Alignment from the mem intrinsic will be better, so use it.

L->setAlignment(*CopySrcAlign); L->setAlignment(*CopySrcAlign);

if (CopyMD) if (CopyMD)

L->setMetadata(LLVMContext::MD_tbaa, CopyMD); L->setMetadata(LLVMContext::MD_tbaa, CopyMD);

MDNode *LoopMemParallelMD = MDNode *LoopMemParallelMD =

MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access); MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);

if (LoopMemParallelMD) if (LoopMemParallelMD)

L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD); L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
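
The type-selection logic added above can be summarized as a small decision function. The sketch below is a simplified, standalone model and not LLVM code: `CopyInfo`, `CopyType`, and `chooseCopyType` are hypothetical names, and the struct fields merely stand in for the values the patch reads from the TBAA tag and the DataLayout.

```cpp
#include <cstdint>

enum class CopyType { Integer, Pointer };

// Toy stand-in for the inputs the patch consults; not an LLVM type.
struct CopyInfo {
  bool tbaaSaysPointer;         // models CopyMD && CopyMD->isTBAAPointerAccess()
  unsigned srcAddrSpace;        // models SrcAddrSp
  unsigned dstAddrSpace;        // models DstAddrSp
  std::uint64_t sizeInBytes;    // models Size (the memcpy length)
  unsigned pointerSizeInBytes;  // models DL.getPointerSize(SrcAddrSp)
};

// Use a pointer load/store only when TBAA says the data is a pointer, both
// sides live in the same address space, and the copy is exactly one pointer
// wide. Otherwise fall back to the integer load/store.
CopyType chooseCopyType(const CopyInfo &c) {
  if (c.tbaaSaysPointer && c.srcAddrSpace == c.dstAddrSpace &&
      c.pointerSizeInBytes == c.sizeInBytes)
    return CopyType::Pointer;
  return CopyType::Integer;
}
```

Under this model an 8-byte copy with 8-byte pointers selects the pointer form, while the same 8-byte copy with 4-byte pointers falls back to the integer form, which is presumably why the CHECK32 lines in memcpy-tbaa.ll are unchanged by the patch.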

[... 2,612 lines hidden ...]

llvm/test/Transforms/InstCombine/memcpy-tbaa.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -instcombine -S -data-layout=p:64:64:64 | FileCheck %s --check-prefix=CHECK			; RUN: opt < %s -instcombine -S -data-layout=p:64:64:64 | FileCheck %s --check-prefix=CHECK
	; RUN: opt < %s -instcombine -S -data-layout=p:32:32:32 | FileCheck %s --check-prefix=CHECK32			; RUN: opt < %s -instcombine -S -data-layout=p:32:32:32 | FileCheck %s --check-prefix=CHECK32

	%struct.T = type { i32* }			%struct.T = type { i32* }

	define i32* @f(%struct.T* %s, %struct.T* %t) {			define i32* @f(%struct.T* %s, %struct.T* %t) {
	; CHECK-LABEL: @f(			; CHECK-LABEL: @f(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast %struct.T* [[S:%.*]] to i64*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast %struct.T* [[S:%.*]] to i8**
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast %struct.T* [[T:%.*]] to i64*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast %struct.T* [[T:%.*]] to i8**
	; CHECK-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP0]], align 8, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: [[TMP2:%.*]] = load i8*, i8** [[TMP0]], align 8, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: store i64 [[TMP2]], i64* [[TMP1]], align 8, !tbaa [[TBAA0]]			; CHECK-NEXT: store i8* [[TMP2]], i8** [[TMP1]], align 8, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[V_CAST:%.*]] = inttoptr i64 [[TMP2]] to i32*			; CHECK-NEXT: [[V_CAST:%.*]] = bitcast i8* [[TMP2]] to i32*
	; CHECK-NEXT: ret i32* [[V_CAST]]			; CHECK-NEXT: ret i32* [[V_CAST]]
	;			;
	; CHECK32-LABEL: @f(			; CHECK32-LABEL: @f(
	; CHECK32-NEXT: entry:			; CHECK32-NEXT: entry:
	; CHECK32-NEXT: [[TMP0:%.*]] = bitcast %struct.T* [[S:%.*]] to i64*			; CHECK32-NEXT: [[TMP0:%.*]] = bitcast %struct.T* [[S:%.*]] to i64*
	; CHECK32-NEXT: [[TMP1:%.*]] = bitcast %struct.T* [[T:%.*]] to i64*			; CHECK32-NEXT: [[TMP1:%.*]] = bitcast %struct.T* [[T:%.*]] to i64*
	; CHECK32-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP0]], align 8, !tbaa [[TBAA0:![0-9]+]]			; CHECK32-NEXT: [[TMP2:%.*]] = load i64, i64* [[TMP0]], align 8, !tbaa [[TBAA0:![0-9]+]]
	; CHECK32-NEXT: store i64 [[TMP2]], i64* [[TMP1]], align 8, !tbaa [[TBAA0]]			; CHECK32-NEXT: store i64 [[TMP2]], i64* [[TMP1]], align 8, !tbaa [[TBAA0]]
	[... 21 lines hidden ...]
