As discussed in D100717, this patch states that if a TBAA metadata node's type name is, e.g.,
"any pointer" or "vtable pointer", it can be used as a hint to drive further optimizations/passes.
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/docs/LangRef.rst | ||
---|---|---|
5794 | This sentence ('LLVM does not assign ... an `MDString`') became a separate paragraph (line 5807) and the explanation is added. |
llvm/docs/LangRef.rst | ||
---|---|---|
5807–5810 | I think we should make it open ended right away, and stress that it is for heuristics, not correctness. |
I am somewhat confused about performance vs. correctness: D100717 refers to a miscompile, so would adding this behavior also fix the miscompiles?
+1 from me.
Having this will allow instcombine to produce fewer of the patterns that the miscompiling fold folds away,
thus moving towards potentially being able to remove the miscompiling fold one day.
So no, not really, this in itself won't do anything about those miscompiles.
On a high level, encoding more Clang specifics into the tbaa spec and using it as heuristics seems a bit unfortunate to me, as it may pessimize frontends that for various reasons cannot use tbaa, especially because the additional information does not seem tbaa specific to me.
I understand it is very convenient to use tbaa in this case, but I am worried about non-Clang frontends once this heuristic becomes important for performance. What will we suggest to frontends that want to opt in to the optimizations (but for which tbaa in general is not suitable)?
llvm/docs/LangRef.rst | ||
---|---|---|
5811 | I think from that formulation it is still not clear *what* kind of meaning is assigned to those special strings. Unless I am missing something, it is still not clear *where* and *how* frontends should use/emit those special names. I think it would be good if the description would be clear on how new frontends should use the special names. |
We could introduce something like !tb.struct, which points to structured information similar to that of !tbaa.struct. When that is present, the information could also be used to expand llvm.memcpy, but without enforcing the type-based aliasing implications.
This is also my concern. It would be very helpful if somebody familiar with TBAA could clarify whether there is any way to add the necessary metadata without having an effect on aliasing. I don't think it's possible to just disable the TBAA analysis, because the optimization pipeline is generally not under your control (in cross-language linker-plugin LTO scenarios).
From my reading of LangRef, it might be possible to have a type hierarchy of "Root" <- "Dummy" <- "any pointer", where "any pointer" is used to annotate pointer types and "Dummy" for anything else. Then aliasing checks between "Dummy" and "any pointer" will always report aliasing, as it's reachable in one direction. More generally, as long as the type "hierarchy" is a linear chain, no useful aliasing information can be derived. Is that correct? Would this work in practice without causing other complications?
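A minimal sketch of the linear chain described above, written as LLVM IR. The names "Root" and "Dummy", the function `@f`, and the metadata numbering are illustrative assumptions and do not come from the patch; the sketch also assumes a recent LLVM with opaque pointers (`ptr`).

```llvm
; Illustrative only: a linear TBAA chain "Root" <- "Dummy" <- "any pointer".
define void @f(ptr %p, ptr %q) {
  ; Pointer-typed access, tagged with the "any pointer" scalar type.
  store ptr null, ptr %p, align 8, !tbaa !3
  ; Non-pointer access, tagged with the catch-all "Dummy" type.
  store i32 0, ptr %q, align 4, !tbaa !4
  ret void
}

!0 = !{!"Root"}                     ; TBAA root node
!1 = !{!"Dummy", !0, i64 0}         ; scalar type, parent is the root
!2 = !{!"any pointer", !1, i64 0}   ; scalar type, parent is "Dummy"
!3 = !{!2, !2, i64 0}               ; access tag: base, access type, offset
!4 = !{!1, !1, i64 0}               ; access tag for everything else
```

Because "Dummy" is an ancestor of "any pointer", TBAA should not be able to prove that the two stores do not alias, while a pass looking only at the type name can still tell which access is a pointer access.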
I do think what @jeroen.dobbelaere suggests would be the right way to approach this. This doesn't even have to be in addition to !tbaa.struct, but can be a replacement for it. In particular, I'm thinking that instead of having an {offset, size, tbaa} encoding, it could be {offset, size, type, tbaa}, where type is some well-defined type indicator to use for this optimization, while tbaa is an optional TBAA reference for the member. Frontends with type-based aliasing models would populate the last element, frontends without it wouldn't.
That will indeed work today. One thing that I find annoying about it is that it feels like 'fighting to not use tbaa', but it is valid and it will have the intended effect.
Something like that would also work.
FWIW, in the AA call we concluded that we should move away from "tbaa" metadata towards "type" metadata that is reusable. "tbaa" could be a subset, maybe identified with a flag, which will allow TBAA to use it.
Thank you for the input!
Nuno and I also discussed this and concluded that using tbaa might not be a good fit for this purpose.
I will revisit this and D100717 when I have enough bandwidth.
This sentence ('LLVM does not assign ... an `MDString`') became a separate paragraph (line 5807) and the explanation is added.
Similarly for the analogous sentence in the next paragraph as well.