Page MenuHomePhabricator

[LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
ClosedPublic

Authored by sbaranga on Feb 17 2016, 5:12 AM.

Details

Summary

If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).

We now discard interleaved load groups that don't have the first and
load in the group.

Diff Detail

Repository
rL LLVM

Event Timeline

sbaranga updated this revision to Diff 48174.Feb 17 2016, 5:12 AM
sbaranga retitled this revision from to [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization.
sbaranga updated this object.
sbaranga added a reviewer: hfinkel.
sbaranga added subscribers: anemet, mzolotukhin, llvm-commits.
hfinkel edited edge metadata.Feb 18 2016, 6:10 PM

How complicated would it be, instead of bailing out when we have a group without the last member, to peel off the last vector iteration instead (i.e. jump to the scalar tail loop one vector-loop iteration "early")?

It seems like that would be a better solution (although, if you agree, but it seems too complicated to implement for the release branch, I'm fine with taking this (and pulling it into the release branch), and then implementing the better solution in trunk only).

How complicated would it be, instead of bailing out when we have a group without the last member, to peel off the last vector iteration instead (i.e. jump to the scalar tail loop one vector-loop iteration "early")?

It seems like that would be a better solution (although, if you agree, but it seems too complicated to implement for the release branch, I'm fine with taking this (and pulling it into the release branch), and then implementing the better solution in trunk only).

That would work (and I agree it would be a better solution), but I don't have a good estimate of how much it that would be to implement (there are also probably a lot of edge cases there).

I can also think of another way to fix this in some cases which wouldn't involve the scalar remainder. For example, in the following code (similar to the example from the PR):

struct A {

int a;
int b;

}

for (int i = 0; i < 1000; ++i) {

a[i].b *= 2;

}

instead of doing the wide load from a[i].b we could do it from a[i].a and move the problem to knowing that the start is ok to load from (which we do know is safe), and it avoids the need for a scalar remainder. This might also work well if the loop was previously modified by LoopRotate (if it strips the first iteration, we can use the fact that we have dominating loads to the start of the memory location to deduce that the transformation would be safe).

Would it be ok to commit this as is for now?

Thanks,
Silviu

rengolin accepted this revision.Feb 19 2016, 7:30 AM
rengolin added a reviewer: rengolin.
rengolin added a subscriber: rengolin.

Hal,

We need this fix in for 3.8.0, and I think we can discuss how we deal with the problem later. For now, landing this patch on trunk and back-porting to 3.8 is a priority.

Silviu,

The patch looks good to me as it is. Let's work on the improvement later. Can you open a bug with the description, an example and the expected outcome, please?

Thanks!
--renato

This revision is now accepted and ready to land.Feb 19 2016, 7:30 AM
sbaranga closed this revision.Feb 19 2016, 7:50 AM
This revision was automatically updated to reflect the committed changes.

Thanks Renato, I've committed this in r261331.

-Silviu

How complicated would it be, instead of bailing out when we have a group without the last member, to peel off the last vector iteration instead (i.e. jump to the scalar tail loop one vector-loop iteration "early")?

It seems like that would be a better solution (although, if you agree, but it seems too complicated to implement for the release branch, I'm fine with taking this (and pulling it into the release branch), and then implementing the better solution in trunk only).

That would work (and I agree it would be a better solution), but I don't have a good estimate of how much it that would be to implement (there are also probably a lot of edge cases there).

I can also think of another way to fix this in some cases which wouldn't involve the scalar remainder. For example, in the following code (similar to the example from the PR):

struct A {
  int a;
  int b;
}
 
for (int i = 0; i < 1000; ++i) {
  a[i].b *= 2;
}

instead of doing the wide load from a[i].b we could do it from a[i].a and move the problem to knowing that the start is ok to load from (which we do know is safe), and it avoids the need for a scalar remainder. This might also work well if the loop was previously modified by LoopRotate (if it strips the first iteration, we can use the fact that we have dominating loads to the start of the memory location to deduce that the transformation would be safe).

I think that, when we can prove that's safe, that is also a good solution. I'm not sure we can always prove this, however. The underlying allocation in this case, for example, need not include a[0].a (although maybe with some higher-level TBAA-like information, we could know that it must).

Would it be ok to commit this as is for now?

As previously stated, sure :-)

Thanks,
Silviu