For a select instruction where the operands are address calculations of two independent loads, the pass tries to speculate the loads and feed them into the select instruction, this allows early parallel execution of the loads and possibly memory folding into the CMOV instructions later on.
The pass currently only handles cases where the loads are elements of the same struct.
Details
Diff Detail
Event Timeline
I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
16266 | Do you need this dump()? It will break compilation of release mode. | |
lib/Target/X86/X86SpeculateSelectLoad.cpp | ||
26 | Maybe, I missed that, but why do you need this include for? | |
99 | Remove this empty line. | |
121 | NewSI can be of type Value*, and would not need the cast to SelectInst. | |
test/CodeGen/X86/speculate-select-load.ll | ||
29 | Can you add a comment explaining this case (and the one above). |
The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe
I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.
You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.
This is the transformation I am interested in doing:
struct S {
int a; int b;
}
from:
foo (S* s, int x) {
int c; if (x) c = s->a; else c = s->b;
}
to:
foo (S* s, int x) {
int c1= s->a; int c2 = a->b; c = x? c1 : c2;
}
I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)
Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:
#include <stdlib.h> typedef struct S { char padding[4088]; // not necessary, but might it make it easier for GuardMalloc or valgrind to see the bug struct S *p1; struct S *p2; } S; S *Sptr; void init() { Sptr = malloc(4096); // sorry, p2 - no space for you Sptr->p1 = "1239"; // crazy, but just to prove a point }
When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).
That's correct (unfortunately). We have a utility function to help with this (isDereferenceablePointer and isDereferenceableAndAlignedPointer) and also isSafeToLoadUnconditionally (which is more expensive than the last two, but more powerful). There may be more that can be done in certain cases, but you also need to be careful about race conditions (it could be the case that some other thread is going to modify one of the values, and by speculating the load, you're moving it to before the synchronization point).
you're right, this is problematic. Thanks, I will try to limit this to cases where it is safe to do this
If the common pattern really is just a select of pointers, then there might be an existing IR transform pass where this can be added (with the right analysis or metadata to ensure safety)? If not, a more general hoisting solution like D37121 could solve this?
After some internal discussions, we suspect that Sanjay's counter-example may have undefined behavior according to the C standard.
typedef struct S { char padding[4088]; struct S *p1; struct S *p2; } S; S* f1(S *s, int x) { S *r; if (x) r = s->p1; else r = s->p2; return r; }
Sanjay's example was to pass the address of an incomplete object of type S as the first argument to f1. f1 will always access that object through an lvalue of type S, which seems like a violation of the type based aliasing rules in section 6.5 paragraph 7 of the standard:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
I am deliberately using fuzzy wording like "may have" and "seems like". I would be happy to have the C language experts in the community weigh in on whether this is valid optimization. But both gcc and icc speculate the loads in f1. This is the code generated by gcc 7.2 at -O2:
f1: movq 4096(%rdi), %rax testl %esi, %esi cmovne 4088(%rdi), %rax ret
icc is admittedly more aggressive with this optimization. gcc will only speculate the loads when they are accessing adjacent fields of the same struct. But both compilers are using the same logic that accessing one field of a structure makes it safe to speculatively access another field of the same structure.
Note that I am not trying to argue that this patch is correct. It isn't, because it doesn't guard against the case where the same base pointer is used to access structs of two different types, e.g.
typedef struct S { char padding[4088]; struct S *p1; struct S *p2; } S; // T is a partial overlay of S typedef struct T { char padding[4088]; struct S *p1; } T; S* f1(S *s, int x) { S *r; if (x) // p1 is accessed through an lvalue of type T. So it isn't safe to speculate the load of p2. r = ((T*)s)->p1; else r = s->p2; return r; }
I'd like to pose two questions.
- Is the gcc & icc logic correct? It is valid to speculatively access a structure field in the presence of an access to a different field of the same structure?
- If yes, how can we perform this optimization in LLVM? Should the TBAA information be used for this purpose, e.g. to enhance the isSafeToLoadUnconditionally utility that Hal mentioned?
I'd be really happy to be wrong on this one! :)
Would you reference this patch/example and ask the experts on cfe-dev?
Do you need this dump()? It will break compilation of release mode.
Either remove it or use it under DEBUG() macro.