This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold icmp(constants[x]) when the range of x is given
Needs ReviewPublic

Authored by XChy on Jul 31 2023, 2:25 AM.

Details

Summary

This patch extends foldCmpLoadFromIndexedGlobal to fold IR below:

define i1 @cmp_load_constant_array0(i64 %x){
entry:
  %cond = icmp ult i64 %x, 2
  br i1 %cond, label %case1, label %case2

case2:
  ret i1 0

case1:
  %isOK_ptr = getelementptr inbounds i32, ptr @CG, i64 %x
  %isOK = load i32, ptr %isOK_ptr
  %cond_inferred = icmp ult i32 %isOK, 3
  ret i1 %cond_inferred
}

Problem still:
Not globally optimized best.
Here I think if D152838 is accepted, that would be better.

Proof:
alive2

Related issue:
issue

Diff Detail

Unit TestsFailed

Event Timeline

XChy created this revision.Jul 31 2023, 2:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 2:25 AM
XChy requested review of this revision.Jul 31 2023, 2:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2023, 2:25 AM
XChy retitled this revision from [ValueTracking][NFC] Fold icmp(constants[x]) when the range of x is given to [ValueTracking][InstCombine] Fold icmp(constants[x]) when the range of x is given.Jul 31 2023, 2:30 AM
nikic requested changes to this revision.EditedJul 31 2023, 2:59 AM

This should be handled by the foldCmpLoadFromIndexedGlobal() fold, by first converting the icmp(load(gep(g, idx))) into icmp(idx) and then folding that icmp against the dominating condition.

That fold currently requires a very specific GEP source element type, but should be easy to generalize.

This revision now requires changes to proceed.Jul 31 2023, 2:59 AM
XChy added a comment.Jul 31 2023, 3:09 AM

This should be handled by the foldCmpLoadFromIndexedGlobal() fold, by first converting the icmp(load(gep(g, idx))) into icmp(idx) and then folding that icmp against the dominating condition.

That fold currently requires a very specific GEP source element type, but should be easy to generalize.

Sure, I'll have a check.

XChy updated this revision to Diff 545637.Jul 31 2023, 7:02 AM
XChy retitled this revision from [ValueTracking][InstCombine] Fold icmp(constants[x]) when the range of x is given to [InstCombine] Fold icmp(constants[x]) when the range of x is given.
XChy edited the summary of this revision. (Show Details)

Extends foldCmpLoadFromIndexedGlobal to fold.

XChy edited the summary of this revision. (Show Details)Jul 31 2023, 7:19 AM
XChy updated this revision to Diff 545670.Jul 31 2023, 7:59 AM

Update tests.

nikic added a comment.Jul 31 2023, 9:04 AM

It would be better to base this on collectOffset() and work in offset representation. Especially if you're interested in handling Rust, the type of the global and the GEP / loads will typically not line up (constant globals in Rust are usually i8 arrays, regardless of "real" type.)

XChy added a comment.Jul 31 2023, 9:46 AM

It would be better to base this on collectOffset() and work in offset representation. Especially if you're interested in handling Rust, the type of the global and the GEP / loads will typically not line up (constant globals in Rust are usually i8 arrays, regardless of "real" type.)

Yes, such generlization in units of byte is necessary. It seems to be related to ptradd RFC?
Currently, this fold is still in units of array element . I'll try to refactor and generlize it.

llvm/test/Transforms/InstCombine/load-cmp.ll
351

Optimized well locally, but not globally.

XChy updated this revision to Diff 546034.Aug 1 2023, 7:01 AM

Complete basically correct fold.

TODOs:

  • Multi-variable testcases, Bitwidth-changed testcases
  • Comments
  • Performance/optimization to improve. When index is ax + by + .... + C, the longest distance we can step by is gcd(a,b,c,d, ....), not just one. Based on this we fold more precisely.
nikic added a comment.Aug 1 2023, 7:21 AM

I think to start with, you should limit this to just the case where there is exactly one variable index (i.e. C*x or C*x+C2). Handling complex expressions is unlikely to be useful.

XChy updated this revision to Diff 546109.Aug 1 2023, 10:11 AM

Already limit it to one variable. But I don't know why it is unlikely to be useful. From my perspective For multi-dimensional arrays, it's possible to have more than one index variable.

Besides, there exists miscompilation, I found just now. Current state-machine doesn't emit the CmpInst/Poison to guarrantee the index/offset is always not negative. Lack of such range-check would lead to something triggering UB not to doing so. I need to check it later.

XChy marked an inline comment as not done.Aug 2 2023, 11:46 AM

Already limit it to one variable. But I don't know why it is unlikely to be useful. From my perspective For multi-dimensional arrays, it's possible to have more than one index variable.

Besides, there exists miscompilation, I found just now. Current state-machine doesn't emit the CmpInst/Poison to guarrantee the index/offset is always not negative. Lack of such range-check would lead to something triggering UB not to doing so. I need to check it later.

After trying, I don't think there is any way to do boundary-check (let things out of boundary lead to load poison) without modifying CFG. But InstCombine doesn't modify CFG. I haven't better idea here.

XChy updated this revision to Diff 547437.Aug 4 2023, 7:00 PM

Update tests and FIXME: Boundary-check

nikic added a comment.Aug 7 2023, 2:49 AM

Besides, there exists miscompilation, I found just now. Current state-machine doesn't emit the CmpInst/Poison to guarrantee the index/offset is always not negative. Lack of such range-check would lead to something triggering UB not to doing so. I need to check it later.

It's okay for transforms to remove UB (this is called "refinement" and done by many transforms). You just can't introduce new UB / poison that did not exist before.

XChy updated this revision to Diff 547726.Aug 7 2023, 5:12 AM

Eliminate boundary-check.

XChy updated this revision to Diff 555434.Sep 1 2023, 10:32 AM

Simplify LazyGetIndex function, and ready to be reviewed.

ping.

nikic is on vacation for the next week, since he requested changes plan to let him get to it when he comes back.
Might have time to review later in the week, however.

XChy added a comment.Sep 8 2023, 12:05 AM

ping.

nikic is on vacation for the next week, since he requested changes plan to let him get to it when he comes back.
Might have time to review later in the week, however.

Thanks. That's OK. I hope this patch could be finished before the Phabricator shutdowns.

XChy updated this revision to Diff 557164.Sep 21 2023, 2:31 AM

Polish comments.