This is an archive of the discontinued LLVM Phabricator instance.

[SLSR] handle candidate form &B[i * S]
ClosedPublic

Authored by jingyue on Feb 5 2015, 10:57 PM.

Details

Summary

This patch enhances SLSR to handle another candidate form &B[i * S]. If
we found two candidates

S1: X = &B[i * S]
S2: Y = &B[i' * S]

and S1 dominates S2, we can replace S2 with

Y = &X[(i' - i) * S]

Diff Detail

Repository
rL LLVM

Event Timeline

jingyue updated this revision to Diff 19461.Feb 5 2015, 10:57 PM
jingyue retitled this revision from to [SLSR] handle candidate form &B[i * S].
jingyue updated this object.
jingyue edited the test plan for this revision. (Show Details)
jingyue added reviewers: hfinkel, atrick, meheff, eliben.
jingyue added a subscriber: Unknown Object (MLST).
hfinkel added inline comments.Feb 6 2015, 11:24 AM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
242 ↗(On Diff #19461)

Shouldn't you test whether or not B isa GlobalVariable? Seems like, if it is, you should set BaseGV to that, and HasBaseReg = false, and otherwise keep the current set of arguments.

317 ↗(On Diff #19461)

I don't think this is the right approach, we're inventing ScalarEvolution all over again. SE also already handles the shift case you mention in the TODO comment below.

If I take your test/Transforms/StraightLineStrengthReduce/X86/no-slsr.ll, for example, and run:

opt -analyze -scalar-evolution

We get:

Printing analysis 'Scalar Evolution Analysis' for function 'slsr_gep':
Classifying expressions for: @slsr_gep
  %p0 = getelementptr inbounds i32* %input, i64 0
  -->  %input
  %v0 = load i32* %p0
  -->  %v0
  %p1 = getelementptr inbounds i32* %input, i64 %s
  -->  ((4 * %s)<nsw> + %input)<nsw>
  %v1 = load i32* %p1
  -->  %v1
  %s2 = mul nsw i64 %s, 2
  -->  (2 * %s)
  %p2 = getelementptr inbounds i32* %input, i64 %s2
  -->  ((8 * %s) + %input)<nsw>
  %v2 = load i32* %p2
  -->  %v2
  %1 = add i32 %v0, %v1
  -->  (%v1 + %v0)
  %2 = add i32 %1, %v2
  -->  (%v2 + %v1 + %v0)
Determining loop execution counts for: @slsr_gep

And you can see, it nicely decomposes the GEPs into arithmetic expressions for you.

Andy, please shout if you disagree, and I apologize for not thinking about this before, but I'm inclined to recommend that we rework this pass so that all cases are handled by SE before we get too far. LSR, of course, uses SE, and I think this pass should too for many of the same reasons (even though there is no loop handling here, we need all of the same arithmetic normalization logic, which is fairly significant).

Also, I think you should consider handling the constant-offset case in GEPs because that will let us get cases where the last index in a structure index. like B[i*J].x, which is fairly common. You just need to make sure such an offset gets passed to isLegalAddressingMode (BaseOffset) above.

jingyue added inline comments.Feb 6 2015, 3:25 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
317 ↗(On Diff #19461)

Hi Hal,

Thanks for pointing this out! I original thought SCEV is for loop optimizations and didn't quite think along that line. I'll try to replace BaseExpr or even most of the Candidate class with SCEV. I need to run experiments to see how well this approach works, but in general, I am aligned with you that reusing SCEV is a better approach.

Jingyue

atrick edited edge metadata.Feb 6 2015, 3:51 PM

Good observation, Hal. SCEV is there for you to use, so you might as well. I agree that we don't want to reinvent the abstraction over arithmetic expressions in each pass. I will mention two issues:

(1) SCEV may end up recursively evaluating more of the expression than you need, which could be expensive. In the first patch, you had a very simple pattern match, so SCEV didn't seem worthwhile. As you try to cover more patterns, SCEV makes more sense.

(2) SCEVExpander is fraught with peril. If you can use SCEV to help discover candidates, but leave your rewriteCandidateWithBases() code as-is, then you have nothing to worry about. If you want to use SCEVExpander then you need to be very careful about the expressions you generate.

jingyue updated this revision to Diff 21524.Mar 9 2015, 3:48 PM
jingyue edited edge metadata.

Take 2 towards handling GEPs in SLSR

  • replace BaseExpr with SCEV
  • discover more patterns of GEP that can be SLSR'ed
jingyue added a comment.EditedMar 9 2015, 3:51 PM

Hi Andrew and Hal,

Sorry about the long delay in reworking this patch. I hope it is not too cache-cold :)

Two major changes against the previous patch are:

  • Replacing BaseExpr with SCEV. As we discussed before, we shouldn't reinvent another abstraction of arithmetic expressions. Reusing SCEV turns out to save lots of code!
  • The discovery phase of the current patch tries to match GEPs in the form of &B[..][i * S][..] and make it a candidate (char *)&B[..][0][..] + (i * element_size) * S. The old patch only works when i * S is the last index.

I wish I could use SCEV more extensively in this patch, e.g., to discover candidates and compute bumps. However, I didn't do that for two reasons:

  • Given an LLVM value, SCEV recursively evaluates all subexpressions of this value. The current SLSR algorithm does very shallow pattern matching and rewrites candidates based on these simple patterns. Recursively evaluating subexpressions makes SLSR harder to rewrite candidates into desired forms unless we can easily map SCEVs back to LLVM Values. For example, if S in candidate (B + i) * S can be further SCEV-evaluated, we would rewrite the candidate into some form like Y + (i' - i) * expr_of(S) instead of Y + (i' - i) * S.
  • ScalarEvolution is designed to be control-flow oblivious, and tends to strip nsw/nuw flags. This makes SLSR harder to factor the multiply of a sext'ed expression GEPs which happens very frequently for array indexing. For example, a GEP &B[i * S] is translated to B + sext(i * S) with the nsw flag on i * S stripped out. Without this flag, SLSR can hardly accept B + sext(i * S) as a valid GEP candidate.
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
242 ↗(On Diff #19461)

Done. Please check out function isCompletelyFoldable.

jingyue updated this revision to Diff 21528.Mar 9 2015, 4:00 PM

add some missing comments

sanjoy added a subscriber: sanjoy.Mar 9 2015, 4:16 PM

Few drop-by comments inline.

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
93 ↗(On Diff #21524)

Very minor: I'd call this Kind to differentiate from llvm::Type.

327 ↗(On Diff #21524)

Unless I'm missing some larger context; for this equivalence to hold, you'll have to prove that Idx * S * ElementSize does not sign-overflow. For instance, consider:

Idx->getType() == i8, pointers are 16 bit, Idx == 4, S == 4 and ElementSize == 24. sext(Idx *s S) *s ElementSize is i16 384 while sext((Idx * ElementSize) *s S) is -128.

444 ↗(On Diff #21524)

Have you considered doing a bitcast to i8*, gep on that i8* and then back instead? That will preserve pointer-ness throughout.

451 ↗(On Diff #21524)

Maybe use llvm_unreachable here?

hfinkel added inline comments.Mar 9 2015, 4:48 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
214 ↗(On Diff #21528)

Don't need the { } here.

366 ↗(On Diff #21528)

Can't you use SCEV to do that here? If you do, then you get the shift case for free. If the problem here is the loss of the information on the NSW, please comment on that.

444 ↗(On Diff #21528)

No, please don't do that. Cast the pointer to an i8* (in the appropriate address space), and use a GEP.

Thanks Hal and Sanjoy for the review. I will fix all other comments.

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
366 ↗(On Diff #21528)

NSW is one problem. The other problem, as I replied earlier, is that this may complicate the rewriting phase because the rewriting would have to translate SCEVs back to IR instructions. For example, suppose S = a + b and the immediate basis of X = S * 4 is Y = S * 3. The current code simply rewrites X as Y + S without tracing into how S is computed. However, if we use SCEV, we would represent X as SCEV (a + b) * 4 and Y as SCEV (a + b) * 3, and rewrite Y as SCEV X + (a + b). After that, we would need to translate SCEV X + (a + b) back into IR instructions X + (a + b) and further into X + S. While I think this is doable, given the relatively simple pattern matching this pass currently has, I don't feel it's worthwhile leveraging SCEVExpander or such to rewrite SCEV into IR instructions.

327 ↗(On Diff #21524)

Thanks for pointing this out! You are right. I'll fix it.

hfinkel added inline comments.Mar 10 2015, 5:07 AM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
366 ↗(On Diff #21528)

Okay, please add a detailed comment about this in the code.

jingyue updated this revision to Diff 21625.Mar 10 2015, 1:18 PM

All comments addressed. PTAL

jingyue added inline comments.Mar 10 2015, 1:23 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
214 ↗(On Diff #21528)

Done.

366 ↗(On Diff #21528)

Done.

444 ↗(On Diff #21528)

Done. Also change BumpWithGEP to BumpWithUglyGEP because we use GEPs in both cases.

93 ↗(On Diff #21524)

Done.

327 ↗(On Diff #21524)

Done. We now transform

sext(Idx *s S) *s ElementSize

to

(sext(Idx) * ElementSize) * S
444 ↗(On Diff #21524)

Done.

451 ↗(On Diff #21524)

Done.

sanjoy added inline comments.Mar 10 2015, 3:09 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
325 ↗(On Diff #21625)

Nit: *s is confusing -- there is no difference between signed vs. unsigned integer multiplication. If you mean nsw then *nsw is clearer.

378 ↗(On Diff #21625)

You can use m_NSWMul here directly.

jingyue updated this revision to Diff 21649.Mar 10 2015, 3:19 PM

Addresses Sanjoy's follow-up comments

hfinkel added inline comments.Mar 10 2015, 4:29 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
358 ↗(On Diff #21649)

I don't understand this loop, could you please add a comment explaining it. What changes between the two iterations, and why two?

490 ↗(On Diff #21649)

I think we always at least have the base implementation available. Just make TTI a hard dependency.

jingyue updated this revision to Diff 21673.Mar 10 2015, 9:09 PM

Adds more comments and makes TTI mandatory

jingyue added inline comments.Mar 10 2015, 9:10 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
358 ↗(On Diff #21649)

Let me know whether this comment helps.

490 ↗(On Diff #21649)

Done.

hfinkel added inline comments.Mar 11 2015, 12:05 AM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
259 ↗(On Diff #21673)

We always have TTI, you don't need to check for it.

272 ↗(On Diff #21673)

If Basis.Basis != nullptr, do we want to set C.Basis to that?

461 ↗(On Diff #21673)

Either remove this comment, or say that we use i8* GEPs, instead of inttoptr/ptrtoint because the latter interferes with pointer-aliasing analysis.

465 ↗(On Diff #21673)

This can be just:

unsigned AS = Basis.Ins->getType()->getPointerAddressSpace();
503 ↗(On Diff #21673)

Use range-based for.

358 ↗(On Diff #21649)

Yes, that's better.

jingyue updated this revision to Diff 21743.Mar 11 2015, 10:41 AM

some more minor changes

lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
259 ↗(On Diff #21673)

Thanks. I missed this one.

272 ↗(On Diff #21673)

Yes. In that case, C.Ins can still be rewritten with respect to Basis.Ins. For example,

... &a[s]; // Basis. Assume this is the first memory reference to a, so Basis.Basis == nullptr. 
... &a[s * 2]; // C. We still want to rewrite C to Basis + s.
465 ↗(On Diff #21673)

Thanks!

503 ↗(On Diff #21673)

I may be missing something, but I don't see any API of BasicBlock that returns an iterator_range.
http://llvm.org/docs/doxygen/html/classllvm_1_1BasicBlock.html

hfinkel added inline comments.Mar 11 2015, 10:51 AM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
272 ↗(On Diff #21673)

Okay, please do. That form should enhance ILP.

503 ↗(On Diff #21673)

Don't need one. You can use a range-based for loop with any object that has a begin()/end().

for (auto &i : *B)
  allocateCandidateAndFindBasis(&I);

or something like that.

jingyue updated this revision to Diff 21745.Mar 11 2015, 11:06 AM

use range-based for loops

meheff added inline comments.Mar 11 2015, 11:26 AM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
363 ↗(On Diff #21743)

I see that this loop enables you to avoid code duplication for the two cases (sext-wrapped and no sext), but it requires a bit of mental effort to see what it is doing. I think this would be clearer if you just broke out the code into a separate function and avoided the loop altogether. Then the code looks like:

foobarfunction(Base, ArrayIdx, ElementSize, GEP)
Value *SextIdx;
if (!match(ArrayIdx, m_SExt(m_Value(SextIdx)))

foobarfunction(Base, SextIdx, ElementSize, GEP);

In this case it is much more obvious that you're considering the two cases (sext and no sext). There is one other location in the file which uses this similar style loop construct and might be clearer expressed as a separate function call.

jingyue added inline comments.Mar 11 2015, 2:38 PM
lib/Transforms/Scalar/StraightLineStrengthReduce.cpp
363 ↗(On Diff #21743)

Both places fixed. PTAL.

461 ↗(On Diff #21673)

Missed this comment. Fixed in this patch.

jingyue updated this revision to Diff 21769.Mar 11 2015, 2:38 PM

Extract symmetric code into a helper function.

meheff edited edge metadata.Mar 11 2015, 2:58 PM

Thanks, Jingyue. LGTM

No further comments from me. I don't see any issues and will defer to the other reviewers.

Thanks!

Hal and Sanjoy, do you have more comments?

Thanks!

Hal and Sanjoy, do you have more comments?

I don't have any more comments.

hfinkel accepted this revision.Mar 26 2015, 1:24 AM
hfinkel edited edge metadata.

LGTM too.

This revision is now accepted and ready to land.Mar 26 2015, 1:24 AM
jingyue closed this revision.Mar 26 2015, 9:52 AM
This revision was automatically updated to reflect the committed changes.
llvm/trunk/test/Transforms/StraightLineStrengthReduce/slsr-mul.ll