Page MenuHomePhabricator

[X86] Speculatively load operands of select instruction
Needs RevisionPublic

Authored by lsaba on Aug 30 2017, 1:23 AM.

Details

Summary

For a select instruction where the operands are address calculations of two independent loads, the pass tries to speculate the loads and feed them into the select instruction, this allows early parallel execution of the loads and possibly memory folding into the CMOV instructions later on.
The pass currently only handles cases where the loads are elements of the same struct.

Diff Detail

Event Timeline

lsaba created this revision.Aug 30 2017, 1:23 AM
igorb added a subscriber: igorb.Aug 30 2017, 6:45 AM
spatel edited edge metadata.Aug 30 2017, 1:32 PM

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

aaboud added inline comments.Aug 30 2017, 3:30 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16266

Do you need this dump()? It will break compilation of release mode.
Either remove it or use it under DEBUG() macro.

lib/Target/X86/X86SpeculateSelectLoad.cpp
26

Maybe, I missed that, but why do you need this include for?

99

Remove this empty line.

121

NewSI can be of type Value*, and would not need the cast to SelectInst.
Notice that the only use for NewSI, is "replaceAllUsersWith()" which takes Value* as argument.

test/CodeGen/X86/speculate-select-load.ll
29

Can you add a comment explaining this case (and the one above).
Something like: Selecting between address of/pointer to two members of same structure, with offset bigger than cache line (64 bytes) between them. Thus, do not load speculatively.

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

lsaba added a comment.Aug 31 2017, 7:00 AM

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

This is the transformation I am interested in doing:

struct S {

int a;
int b;

}

from:

foo (S* s, int x) {

int c;
if (x) 
  c = s->a;
else
 c = s->b;

}

to:
foo (S* s, int x) {

int c1= s->a;
int c2 = a->b;
c = x? c1 : c2;

}

I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

This is the transformation I am interested in doing:

struct S {

int a;
int b;

}

from:

foo (S* s, int x) {

 int c;
 if (x) 
   c = s->a;
 else
  c = s->b;
}

to:
foo (S* s, int x) {

 int c1= s->a;
 int c2 = a->b;
 c = x? c1 : c2;
}

I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)

Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:

#include <stdlib.h>
typedef struct S {
  char padding[4088]; // not necessary, but might it make it easier for GuardMalloc or valgrind to see the bug
  struct S *p1;
  struct S *p2;
} S;

S *Sptr;

void init() {
  Sptr = malloc(4096); // sorry, p2 - no space for you
  Sptr->p1 = "1239"; // crazy, but just to prove a point
}

When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).

lsaba updated this revision to Diff 113398.Aug 31 2017, 7:11 AM

addressing aaboud's comments + limiting the opt. to non aggregate gep accesses

lsaba marked 5 inline comments as done.Aug 31 2017, 7:12 AM
hfinkel edited edge metadata.Aug 31 2017, 7:33 AM

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

This is the transformation I am interested in doing:

struct S {

int a;
int b;

}

from:

foo (S* s, int x) {

 int c;
 if (x) 
   c = s->a;
 else
  c = s->b;
}

to:
foo (S* s, int x) {

 int c1= s->a;
 int c2 = a->b;
 c = x? c1 : c2;
}

I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)

Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:

#include <stdlib.h>
typedef struct S {
  char padding[4088]; // not necessary, but might it make it easier for GuardMalloc or valgrind to see the bug
  struct S *p1;
  struct S *p2;
} S;

S *Sptr;

void init() {
  Sptr = malloc(4096); // sorry, p2 - no space for you
  Sptr->p1 = "1239"; // crazy, but just to prove a point
}

When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).

That's correct (unfortunately). We have a utility function to help with this (isDereferenceablePointer and isDereferenceableAndAlignedPointer) and also isSafeToLoadUnconditionally (which is more expensive than the last two, but more powerful). There may be more that can be done in certain cases, but you also need to be careful about race conditions (it could be the case that some other thread is going to modify one of the values, and by speculating the load, you're moving it to before the synchronization point).

lsaba added a comment.Aug 31 2017, 7:40 AM

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

This is the transformation I am interested in doing:

struct S {

int a;
int b;

}

from:

foo (S* s, int x) {

 int c;
 if (x) 
   c = s->a;
 else
  c = s->b;
}

to:
foo (S* s, int x) {

 int c1= s->a;
 int c2 = a->b;
 c = x? c1 : c2;
}

I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)

Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:

#include <stdlib.h>
typedef struct S {
  char padding[4088]; // not necessary, but might it make it easier for GuardMalloc or valgrind to see the bug
  struct S *p1;
  struct S *p2;
} S;

S *Sptr;

void init() {
  Sptr = malloc(4096); // sorry, p2 - no space for you
  Sptr->p1 = "1239"; // crazy, but just to prove a point
}

When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).

you're right, this is problematic. Thanks, I will try to limit this to cases where it is safe to do this

spatel requested changes to this revision.Aug 31 2017, 7:41 AM

If the common pattern really is just a select of pointers, then there might be an existing IR transform pass where this can be added (with the right analysis or metadata to ensure safety)? If not, a more general hoisting solution like D37121 could solve this?

This revision now requires changes to proceed.Aug 31 2017, 7:41 AM

After some internal discussions, we suspect that Sanjay's counter-example may have undefined behavior according to the C standard.

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

S* f1(S *s, int x)
{
  S *r;
  if (x)
    r = s->p1;
  else
    r = s->p2;
  return r;
}

Sanjay's example was to pass the address of an incomplete object of type S as the first argument to f1. f1 will always access that object through an lvalue of type S, which seems like a violation of the type based aliasing rules in section 6.5 paragraph 7 of the standard:

An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,

I am deliberately using fuzzy wording like "may have" and "seems like". I would be happy to have the C language experts in the community weigh in on whether this is valid optimization. But both gcc and icc speculate the loads in f1. This is the code generated by gcc 7.2 at -O2:

f1:
        movq    4096(%rdi), %rax
        testl   %esi, %esi
        cmovne  4088(%rdi), %rax
        ret

icc is admittedly more aggressive with this optimization. gcc will only speculate the loads when they are accessing adjacent fields of the same struct. But both compilers are using the same logic that accessing one field of a structure makes it safe to speculatively access another field of the same structure.

Note that I am not trying to argue that this patch is correct. It isn't, because it doesn't guard against the case where the same base pointer is used to access structs of two different types, e.g.

typedef struct S {
  char padding[4088];
  struct S *p1;
  struct S *p2;
} S;

// T is a partial overlay of S
typedef struct T {
  char padding[4088];
  struct S *p1;
} T;

S* f1(S *s, int x)
{
  S *r;
  if (x)
    // p1 is accessed through an lvalue of type T. So it isn't safe to speculate the load of p2.
    r = ((T*)s)->p1;
  else
    r = s->p2;
  return r;
}

I'd like to pose two questions.

  1. Is the gcc & icc logic correct? It is valid to speculatively access a structure field in the presence of an access to a different field of the same structure?
  2. If yes, how can we perform this optimization in LLVM? Should the TBAA information be used for this purpose, e.g. to enhance the isSafeToLoadUnconditionally utility that Hal mentioned?
spatel added a comment.Oct 8 2017, 6:54 AM

I am deliberately using fuzzy wording like "may have" and "seems like". I would be happy to have the C language experts in the community weigh in on whether this is valid optimization.

I'd be really happy to be wrong on this one! :)
Would you reference this patch/example and ask the experts on cfe-dev?