This is an archive of the discontinued LLVM Phabricator instance.

Differential D86364

[ValueTracking] Interpret GEPs as a series of adds multiplied by the related scaling factor
ClosedPublic

Authored by qcolombet on Aug 21 2020, 12:36 PM.

Details

Reviewers

fhahn
nikic
aqjune
spatel
lebedev.ri
RKSimon
efriedma

Commits

rGee6abef5323d: [ValueTracking] Interpret GEPs as a series of adds multiplied by the related…

Summary

Prior to this patch, computeKnownBits would only try to deduce trailing zero bits for getelementptrs.
This patch adds the logic to treat GEPs as a series of adds, each multiplied by its scaling factor.

To do so, we refactored the logic in computeKnownBitsMul to expose an API that works on a pair of KnownBits. We then used it in conjunction with computeKnownBitsAddSub to compute the series of adds and muls when processing GEPs.

Thanks to this patch, using a GEP and performing the address computation directly "by hand" (ptrtoint, followed by adds and muls, followed by inttoptr) now yield the same computeKnownBits information.

Previously, the "by hand" approach would have given more information.
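
For illustration only, the idea in this summary can be sketched with a toy 64-bit model of known bits. This is not LLVM's KnownBits class: `Bits`, `addBits`, `mulByScale` and `gepBits` are hypothetical names, and the multiply is deliberately more conservative than a real implementation would be (it keeps only trailing zeros unless the index is a constant).

```cpp
#include <cassert>
#include <cstdint>

// Toy model: a bit set in Zero is known to be 0, a bit set in One is known
// to be 1. Hypothetical stand-in, not LLVM's KnownBits.
struct Bits {
  uint64_t Zero;
  uint64_t One;
};

Bits constantBits(uint64_t V) { return Bits{~V, V}; }

// Number of low bits known to be zero.
unsigned minTrailingZeros(Bits B) {
  unsigned N = 0;
  while (N < 64 && ((B.Zero >> N) & 1))
    ++N;
  return N;
}

// Known bits of L + R (wrapping), derived by tracking which carries are known.
Bits addBits(Bits L, Bits R) {
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0);
  uint64_t MaxSum = ~L.Zero + ~R.Zero; // every unknown bit taken as 1
  uint64_t MinSum = L.One + R.One;     // every unknown bit taken as 0
  uint64_t CarryZero = ~(MaxSum ^ L.Zero ^ R.Zero);
  uint64_t CarryOne = MinSum ^ L.One ^ R.One;
  uint64_t Known =
      (CarryZero | CarryOne) & (L.Zero | L.One) & (R.Zero | R.One);
  return Bits{~MaxSum & Known, MinSum & Known};
}

// Conservative known bits of Index * Scale for a constant Scale: exact when
// Index is fully known, otherwise only the trailing zeros are kept.
Bits mulByScale(Bits Index, uint64_t Scale) {
  if ((Index.Zero | Index.One) == ~uint64_t(0))
    return constantBits(Index.One * Scale);
  unsigned TZ = minTrailingZeros(Index) + minTrailingZeros(constantBits(Scale));
  uint64_t LowZeros = TZ >= 64 ? ~uint64_t(0) : (uint64_t(1) << TZ) - 1;
  return Bits{LowZeros, 0};
}

// Model of `getelementptr T, T* Base, Index` with sizeof(T) == ElemSize:
// one add of the index multiplied by its scaling factor.
Bits gepBits(Bits Base, Bits Index, uint64_t ElemSize) {
  return addBits(Base, mulByScale(Index, ElemSize));
}
```

With a 16-byte-aligned base (`Bits{0xF, 0}`), an unknown index and a 4-byte element, `gepBits` reports only the two low bits as zero. With a constant index of 17 it reports the low four bits as `0100`, which a pure trailing-zeros analysis cannot express beyond "two low bits are zero".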

This is related to https://llvm.org/PR47241.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
380 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
90 ms	linux > Polly.ScopInfo/NonAffine::non-affine-loop-condition-dependent-access_3.ll
390 ms	windows > lld.ELF/invalid::symtab-sh-info.s

Event Timeline

qcolombet created this revision. Aug 21 2020, 12:36 PM

Herald added a project: Restricted Project. Aug 21 2020, 12:36 PM

Herald added a subscriber: hiraditya.

qcolombet requested review of this revision. Aug 21 2020, 12:36 PM

Can you also add the IR test case from that PR? (To show that instcombine can fold it.)

Use GEPOperator instead of GEPInstr in the cast. A GEP can come from a constant expression as well.

Can you also add the IR test case from that PR? (To show that instcombine can fold it.)

This commit is actually not enough to fix that PR. instcombine will still not be able to simplify the IR because it doesn't understand that icmp ugt float*, inttoptr (i64 523 to float*) is an icmp with a compile-time constant.
Let me see if I can massage the IR to work around this issue.

Note: I also have one failure with that patch: Analysis/ScalarEvolution/scalable-vector.ll
warning: Compiler has made implicit assumption that TypeSize is not scalable. This may or may not lead to broken code.

I'll need to look closer (I guess it means you cannot deduce the size of the type at compile time.)

Let me see if I can massage the IR to work around this issue.

It doesn't work because instcombine simplifies:

%gep = getelementptr float, float* %valptr, i64 512
%gepToInt = ptrtoint float* %gep to i64
%res = icmp ugt i64 %gepToInt, 523

Into:

%gep = getelementptr float, float* %valptr, i64 512
%res = icmp ugt float* %gep, inttoptr (i64 523 to float*)

So I am back to the issue that instcombine doesn't understand that icmp ugt float* %gep, inttoptr (i64 523 to float*) is icmp ugt float*, 523

Harbormaster completed remote builds in B69164: Diff 287082. Aug 21 2020, 1:16 PM

qcolombet updated this revision to Diff 287092. Aug 21 2020, 1:19 PM

Add scalable vector support (well really abort on scalable vector types!)

Harbormaster completed remote builds in B69165: Diff 287085. Aug 21 2020, 1:29 PM

Harbormaster completed remote builds in B69169: Diff 287092. Aug 21 2020, 1:58 PM

qcolombet added inline comments. Aug 21 2020, 6:34 PM

llvm/lib/Analysis/ValueTracking.cpp
1405	The bug in instrprof-value-prof.c is that this code is only correct for arrays. For structures the index is not based on the scaling factor. Fixing that.

Rework the logic so that TrailZ and AddrKnownBits share most of the logic
Fix the handling of structure accesses in the process.
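
As a rough illustration of the array versus structure distinction behind this fix: an array index contributes `Index * sizeof(element)`, whereas a structure index selects a field whose byte offset comes from the layout. The sketch below uses a hypothetical `ToyStructLayout` with hand-written offsets. It is an assumption-laden stand-in for a real data-layout query, not LLVM's StructLayout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a struct layout: the byte offset of each field.
struct ToyStructLayout {
  std::vector<uint64_t> FieldOffsets;

  uint64_t elementOffset(std::size_t Idx) const {
    assert(Idx < FieldOffsets.size() && "field index out of range");
    return FieldOffsets[Idx];
  }
};

// Array access: the index is multiplied by the element size.
uint64_t arrayOffset(uint64_t Index, uint64_t ElemSize) {
  return Index * ElemSize;
}

// Struct access: the index only selects a field; no scaling factor applies.
uint64_t structOffset(const ToyStructLayout &SL, std::size_t FieldIdx) {
  return SL.elementOffset(FieldIdx);
}
```

For a structure laid out as `{i8, i32, i64}` with assumed offsets 0, 4 and 8, field 1 sits at byte 4. Scaling index 1 by any element size would give a different, wrong offset.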

qcolombet added inline comments. Aug 21 2020, 9:03 PM

llvm/test/Transforms/InstCombine/constant-fold-gep.ll
55	Note: The change in the input of the test is because the alignment was not consistent with the base pointer. On this store the alignment cannot be 8, since the base pointer is aligned on 16 (17 * 4 + 16 == 17 * 4 + 4 * 4 == 21 * 4 == 10 * 8 + 4 i.e., remainder with 8 cannot be zero). More easily seen looking at the previous load: @Y + 16 is aligned to 8 then we add 4 to make @Y + 17, and we claimed it was aligned to 8.
115	Test case for the structure access bug found during pretesting. With the offset calculation bug of the previous version of the patch we would have got `align 4`.
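
The alignment argument in the comment on line 55 can be checked with a small helper. This is a sketch with a hypothetical function name, assuming the base alignment is a power of two: the alignment provable after adding a constant byte offset is the smaller of the base alignment and the lowest set bit of the offset.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two guaranteed to divide (Base + Offset) when Base is a
// multiple of BaseAlign. Hypothetical helper for illustration.
uint64_t alignAfterOffset(uint64_t BaseAlign, uint64_t Offset) {
  assert(BaseAlign != 0 && (BaseAlign & (BaseAlign - 1)) == 0 &&
         "base alignment must be a power of two");
  if (Offset == 0)
    return BaseAlign;
  uint64_t OffsetAlign = Offset & (~Offset + 1); // lowest set bit of Offset
  return OffsetAlign < BaseAlign ? OffsetAlign : BaseAlign;
}
```

With a base aligned to 16 and an offset of 21 * 4 = 84 bytes, the helper returns 4, matching the observation that the store cannot be 8-byte aligned.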

Harbormaster completed remote builds in B69204: Diff 287154. Aug 21 2020, 9:23 PM

At least in the current implementation, the compile-time impact of this change is too large: http://llvm-compile-time-tracker.com/compare.php?from=12edd4b36475170d445ac93da34e4883f23a8361&to=1b942d71a1d7e7e55c91a2240eaaba617cc26d5a&stat=instructions

Possibly adding the early exit on unknown bits will help.

llvm/lib/Analysis/ValueTracking.cpp
1365	I don't understand the purpose of this variable and why we track both the precise known bits and the trailing zeros. In the one place that sets TrackAddr to false, can't you set the known bits to unknown?
1374	This should early exit if the AddrKnownBits are unknown, which would be the equivalent of the TrailZ==0 check.

This revision now requires changes to proceed. Aug 21 2020, 11:46 PM

aqjune added inline comments. Aug 22 2020, 1:44 AM

llvm/test/Transforms/InstCombine/constant-fold-gep.ll
55	I agree that the changes were necessary, otherwise the stores should always raise UB due to the alignment mismatch. It seems that the last three stores below are always raising UB as well - the size of `@Y` is 72 bytes, so the accesses below are out-of-bounds. Should we treat these in this patch (by removing the three lines or expanding the size of the global variable), or is it okay to just leave them as they were?

Hi @nikic

Thanks for the review.

At least in the current implementation, the compile-time impact of this change is too large

This change makes more calls to computeKnownBits, so the compile-time impact is expected and I don't see how we can avoid any of the calls.

Do you have recommendations on how to limit the impact?

Cheers,
-Quentin

llvm/lib/Analysis/ValueTracking.cpp
1365	The trailing zeros may still be tracked while the AddrKnownBits are unknown. This happens when we hit scalable vectors. I kept this variable to make sure we don't regress computeKnownBits in these cases. Regarding the `TrackAddr` variable, I can turn that into a check of whether AddrKnownBits is unknown; I was just not sure what the compile-time impact of doing this would be.
1374	That's not true because AddrKnownBits won't be tracked as soon as we hit a scalable vector whereas the TrailZ information can still be tracked.
llvm/test/Transforms/InstCombine/constant-fold-gep.ll
55	I think the idea here is that the test was checking whether the inbounds keyword was added.

Possibly adding the early exit on unknown bits will help.

I'll add that to see if it helps.

Use AddrKnownBits.isUnknown instead of having a dedicated bool variable.

qcolombet added inline comments. Aug 24 2020, 10:29 AM

llvm/lib/Analysis/ValueTracking.cpp
1430	@nikic FYI here is where we stop tracking the full address but still compute the trailing zeros. If we don't do that, we will lose some information compared to the previous implementation. E.g., test/Analysis/ScalarEvolution/scalable-vector.ll will fail:
:5:10: error: CHECK: expected string not found in input
; CHECK: --> (3 * sizeof(<vscale x 4 x i32>)) U: [0,-15) S: [-9223372036854775808,9223372036854775793)
^
<stdin>:4:2: note: scanning from here
--> (3 * sizeof(<vscale x 4 x i32>)) U: full-set S: full-set

Harbormaster completed remote builds in B69345: Diff 287438. Aug 24 2020, 11:06 AM

Hi @nikic,

I've made the suggested changes. How do I kick off a new compile-time measurement?

Cheers,
-Quentin

Here are the results for the new version: http://llvm-compile-time-tracker.com/compare.php?from=188f1ac301c5c6da6d2f5697952510fc39cbdd43&to=a43ce9e792896679e71314fc368a027a2c9ae544&stat=instructions

This is better than before, but still quite expensive for what it does.

llvm/lib/Analysis/ValueTracking.cpp
1430	Okay, I see. I think we should still be able to combine these with some special handling in this spot. Would something like this work? Combine IndexBits using computeKnownBitsMul as usual. For scalable vectors, the result is additionally multiplied by an unknown vscale. As an approximation, only keep the trailing zero bits of IndexBits in this case, and then continue as usual.
1442	We should add an sextOrTrunc() method to KnownBits. We already have zextOrTrunc() and anyextOrTrunc().
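
The approximation suggested for scalable vectors can be sketched as follows, again with a toy 64-bit model and hypothetical names rather than LLVM's API. The size of a scalable type is its known minimum size times an unknown `vscale`, so after the multiply only the trailing zeros contributed by the index and by the minimum size can be kept.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: a bit set in Zero is known 0, a bit set in One is known 1.
struct Bits {
  uint64_t Zero;
  uint64_t One;
};

// Number of low bits known to be zero.
unsigned minTrailingZeros(Bits B) {
  unsigned N = 0;
  while (N < 64 && ((B.Zero >> N) & 1))
    ++N;
  return N;
}

// Known bits of Index * (MinSize * vscale) for an unknown vscale: the product
// is a multiple of 2^(tz(Index) + tz(MinSize)), and nothing else is known.
Bits scaledByScalableSize(Bits Index, uint64_t MinSize) {
  assert((Index.Zero & Index.One) == 0 && "conflicting known bits");
  unsigned TZ =
      minTrailingZeros(Index) + minTrailingZeros(Bits{~MinSize, MinSize});
  uint64_t LowZeros = TZ >= 64 ? ~uint64_t(0) : (uint64_t(1) << TZ) - 1;
  return Bits{LowZeros, 0};
}
```

For the failing ScalarEvolution test quoted above, the index is the constant 3 and the minimum size of `<vscale x 4 x i32>` is 16 bytes, so the low four bits are known zero. That is consistent with the expected unsigned range `[0,-15)`, whose largest value is a multiple of 16.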

Hi @nikic,

Thanks for the updated numbers!

This is better than before, but still quite expensive for what it does.

What's an acceptable target for you?

The reason I am asking is that I fear we won't be able to shave much more from the implementation, while for us not having this information is pretty damaging. Put differently, the previous implementation lacks critical information compared to hand-written pointer arithmetic, which could lead to missed optimization opportunities and dramatic performance loss.

I agree that, in itself, this change doesn't seem worth the added compile time, but I think we should see it as an enabler for more optimizations. For instance, this change doesn't fix https://llvm.org/PR47241 but is a step in that direction. In other words, it doesn't fix it because InstCombine didn't have this kind of information before and thus doesn't know how to act on it, but longer term, having this information would likely improve several optimizations.

Cheers,
-Quentin

llvm/lib/Analysis/ValueTracking.cpp
1430	That should be doable, though I think it will degrade compile time even more, given that I would expect calling computeKnownBitsMul to be more expensive than just doing the math on `TrailZ`. Also, are you suggesting that we get rid of `TrailZ` altogether and just carry the trailing zero bits in AddrKnownBits? If so, we would add back a bunch of computeKnownBits calls since AddrKnownBits won't be unknown anymore. What do you think?
1442	Sure.

Add KnownBits::sextOrTrunc

qcolombet marked an inline comment as done. Aug 26 2020, 10:30 AM

Does this (and follow-up patches) improve any public/private benchmarks? Any data?

Does this (and follow-up patches) improve any public/private benchmarks? Any data?

Yes it improves some private benchmarks but to be fair the optimizations that it helps are not public.

I didn't do any follow-up patches yet for the public issue.
I didn't dig, but for the in-tree targets, the one benefit that I saw so far with this commit alone is the improved alignment information, e.g., test/Transforms/InstCombine/constant-fold-gep.ll. This may already help some optimizations like vectorization or instruction selecting aligned loads.

Cheers,
-Quentin

One thing we could do to help compile time is maybe add more caching to the Query API.
Right now it only caches llvm.assume AFAICT, whereas we could cache known bits for values. That's what we did in the Machine IR variant of computeKnownBits.

This would be a functional change though as in theory not having a cache could give you more information. E.g., imagine we hit a value at Depth 6 with a max depth of 6 and then later on, we hit it again at Depth 4. With a cache that value would have the information cached on the first hit, thus we would only have information for it with 0 depth worth of analysis, whereas with the current implementation the second call would have 2 depth worth of analysis.
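
The depth interaction described above can be reproduced with a toy recursion. Everything here is hypothetical: `analyze` merely counts how many operand levels it was allowed to look through before hitting the depth limit, and each value is assumed to have exactly one operand.

```cpp
#include <cassert>
#include <map>

const unsigned MaxDepth = 6;

// Toy "analysis": returns how many levels below Value were inspected. When a
// cache is supplied, the first result computed for a value is reused as is,
// whatever depth budget was left when it was computed.
unsigned analyze(int Value, unsigned Depth, std::map<int, unsigned> *Cache) {
  assert(Depth <= MaxDepth && "query started past the depth limit");
  if (Cache) {
    auto It = Cache->find(Value);
    if (It != Cache->end())
      return It->second;
  }
  unsigned Levels = 0;
  if (Depth < MaxDepth)
    Levels = 1 + analyze(Value + 1, Depth + 1, Cache); // look at the operand
  if (Cache)
    (*Cache)[Value] = Levels;
  return Levels;
}
```

If a value is first reached at depth 6, the cached answer records zero levels of analysis. A later visit at depth 4 then returns that cached zero, whereas the uncached call would have explored two levels.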

Harbormaster completed remote builds in B69636: Diff 288034. Aug 26 2020, 12:28 PM

Ping @nikic.

Let me know if the current compile time impact is acceptable or if I need to come up with a different strategy.

In D86364#2239835, @qcolombet wrote:

One thing we could do to help compile time is maybe add more caching to the Query API.
Right now it only caches llvm.assume AFAICT, whereas we could cache known bits for values. That's what we did in the Machine IR variant of computeKnownBits.

To clarify, are you referring to caching within a single known bits query here (for the case where one instruction is recursively referenced multiple times), or across multiple queries? Not sure how useful the former would be, I don't have a good idea how often we do redundant known bits computations in one query. The latter has been tried at some point, but I think the conclusion was that properly invalidating that cache in InstCombine is too fragile.

Let me know if the current compile time impact is acceptable or if I need to come up with a different strategy.

I don't think the current impact here is acceptable: This patch has very little benefit for standard targets (we really don't care about alignment all that much), but is a pretty significant compile-time regression, about 1.5% for tramp4-3d. That's not a good tradeoff.

I do have an old experiment lying around that would make this more feasible: Moving alignment inferral out of InstCombine into a separate pass that only runs once shortly before codegen (https://github.com/llvm/llvm-project/commit/d95001f63a7fb57a3c08002e05a751106ec8e10c for the basic idea -- it was a geomean 1% improvement when I tested it). This reduces the number of known bits queries on GEPs a lot and would make us less susceptible to how expensive they are.

llvm/lib/Analysis/ValueTracking.cpp
1430	I don't think doing this would have a (negative) impact on compile-time, as it only affects the case where GEPs on scalable vectors are used (i.e. very rarely). And yes, with this change it should be possible to get rid of TrailZ entirely and only use AddrKnownBits.

Hi @nikic,

Thanks for the comments.

To clarify, are you referring to caching within a single known bits query here (for the case where one instruction is recursively referenced multiple times), or across multiple queries?

I was referring to caching within a single known bits.
For GISel it actually helped quite a lot, but anyhow this is orthogonal.

This patch has very little benefit for standard targets (we really don't care about alignment all that much),

This actually helps more than alignment. The idea is that you get ranges for some addresses, and on some targets this can yield a dramatic improvement because some addressing modes are not supported for all ranges.

This reduces the number of known bits queries on GEPs a lot and would make us less susceptible to how expensive they are.

Although interesting, again I am not really after alignment here; I am after ranges. I.e., the full bit analysis is required.

Anyway, I'll rework the patch to have the ability to inject some target specific code here (probably by plumbing TargetTransformInfo in here) and we'll see how this looks.

Cheers,
-Quentin

qcolombet mentioned this in D87342: Allow targets to augment computeKnownBits with their analysis using TargetTransformInfo. Sep 8 2020, 8:32 PM

Hi @nikic,

I've made a proof of concept in D87342 on how we could have some pieces of computeKnownBits be target dependent.

Would that other approach be suitable?

Cheers,
-Quentin

Abandon in favor of D87342.

Resurrecting this diff.
I need to rebase on top of the refactoring: D88937 and D88935.

Rebase patch
Add the reviewers from D87342

Herald added a subscriber: arichardson. Oct 16 2020, 11:49 AM

Harbormaster completed remote builds in B75349: Diff 298701. Oct 16 2020, 1:14 PM

Get rid of TrailZ and track everything with AddrKnownBits.

Hi @nikic, @spatel, @lebedev.ri, @RKSimon, @efriedma,

Following the conversation in D87342, this patch has been rebased and all the previous feedback has been addressed.

We're left with the compile time issue that @nikic brought up.

Any idea on how to tackle that?

Cheers,
-Quentin

Harbormaster completed remote builds in B75624: Diff 299224. Oct 19 2020, 6:02 PM

In D86364#2340420, @qcolombet wrote:

Hi @nikic, @spatel, @lebedev.ri, @RKSimon, @efriedma,

Following the conversation in D87342, this patch has been rebased and all the previous feedback has been addressed.

We're left with the compile time issue that @nikic brought up.

Any idea on how to tackle that?

I have not stepped through this in detail, so not sure if it helps:

It's been a couple of months, so we should confirm that the time impact is still the same.
It was suggested earlier to trigger the analysis from a different/dedicated pass rather than instcombine. Is that not feasible?
We're using the more expensive analysis on all GEPs. Could we use the existing analysis as the common case and limit the more expensive one in some way (for example, only GEPs with N or more operands, inbounds, etc.)?

In D86364#2342451, @spatel wrote:

I have not stepped through this in detail, so not sure if it helps:

It's been a couple of months, so we should confirm that the time impact is still the same.

Here's some new numbers: https://llvm-compile-time-tracker.com/compare.php?from=0c0fcea557e4a7cfd51216ad20aa67c82733ab52&to=2bf154f40abacf90d0adffaaec20b01f6bd06481&stat=instructions So this is now at 0.35% geomean, with 1.3% on tramp3d-v4. This is definitely much better than where this started.

It was suggested earlier to trigger the analysis from a different/dedicated pass rather than instcombine. Is that not feasible?

Right. Most of the cost here comes from the fact that we compute alignment for all loads and stores something like 10 times (due to many instcombine runs, which are likely to also visit instructions multiple times), and those alignment calculations will commonly compute GEP known bits. I understand that the actual use case for this patch is not actually the alignment calculation, that's just where this ends up being expensive. If we remove that from the equation, I expect the compile-time impact of this patch would be pretty minimal. I started some work on this, but I'm not sure when I'll be able to finish it.

A couple special cases might improve performance:

Check for and skip zero array indexes, similar to the special case for zero struct indexes, since they show up frequently.
Accumulate constant indexes separately from variable ones: you can compute the sum of the constant indexes, and computeForAddSub with the result, instead of using computeForAddSub on each index.
Add a special case for computeForAddSub on a constant; the arithmetic is a bit simpler.
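
The first two suggestions can be sketched on top of a toy known-bits model. All names below are hypothetical and the offsets are assumed to be already multiplied by their scaling factors. Constant offsets, including zero, are summed in a plain integer, and the known-bits add runs only for variable offsets plus once for the accumulated constant.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model: a bit set in Zero is known 0, a bit set in One is known 1.
struct Bits {
  uint64_t Zero;
  uint64_t One;
};

Bits constantBits(uint64_t V) { return Bits{~V, V}; }
bool isConstant(Bits B) { return (B.Zero | B.One) == ~uint64_t(0); }

unsigned NumAddCalls = 0; // counts how often the costly add is used

// Known bits of L + R (wrapping), derived by tracking which carries are known.
Bits addBits(Bits L, Bits R) {
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0);
  ++NumAddCalls;
  uint64_t MaxSum = ~L.Zero + ~R.Zero;
  uint64_t MinSum = L.One + R.One;
  uint64_t CarryZero = ~(MaxSum ^ L.Zero ^ R.Zero);
  uint64_t CarryOne = MinSum ^ L.One ^ R.One;
  uint64_t Known =
      (CarryZero | CarryOne) & (L.Zero | L.One) & (R.Zero | R.One);
  return Bits{~MaxSum & Known, MinSum & Known};
}

// Adds already-scaled offsets to Base, folding constants into one final add.
Bits addScaledOffsets(Bits Base, const std::vector<Bits> &Offsets) {
  uint64_t AccConst = 0;
  Bits Result = Base;
  for (Bits Offset : Offsets) {
    if (isConstant(Offset)) {
      AccConst += Offset.One; // a zero offset contributes nothing
      continue;
    }
    Result = addBits(Result, Offset);
  }
  if (AccConst != 0)
    Result = addBits(Result, constantBits(AccConst));
  return Result;
}
```

With a 16-byte-aligned base and constant offsets 0, 64 and 4, a single `addBits` call is made instead of three, and the low four bits of the result are still known to be `0100`.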

Hi @nikic, @spatel

If we remove that from the equation, I expect the compile-time impact of this patch would be pretty minimal. I started some work on this, but I'm not sure when I'll be able to finish it.

That's an interesting observation. It would fit my use case to have a different API for compute known bits for all vs. compute known bits for all but only alignment for geps.

That said, from the user's standpoint, how would we decide which version we want to use? E.g., what would be the criteria for instcombine?

If we look at https://llvm.org/PR47241 for instance, instcombine would need the full range to be able to optimize that code, but maybe we don't care about this use case.
Similarly, if we look at the change in llvm/test/Transforms/InstCombine/constant-fold-address-space-pointer.ll, even if we are only interested in the alignment, the new computation can yield better results.

It's been a couple of months, so we should confirm that the time impact is still the same.

Thanks @nikic for the new measurements!
Looks like it's smaller, but the benefits may still not be worth it in general.

Let me know what you think.

It was suggested earlier to trigger the analysis from a different/dedicated pass rather than instcombine. Is that not feasible?

Repeating what I said earlier in that message, for my use case, that would certainly work. The one difficulty is that it needs to be exposed as a full-fledged version of computeKnownBits, because I don't know beforehand whether the expression DAG I am looking at contains any GEP (and in particular the root may not be a GEP). So we have to somehow convey within computeKnownBits that we want to compute only TrailZ instead of the full information. I would rather we avoid this kind of plumbing, especially because it would set a precedent for new improvements in computeKnownBits, which, if we go down that road, is not very far from D87342.

We're using the more expensive analysis on all GEPs. Could we use the existing analysis as the common case and limit the more expensive one in some way (for example, only GEPs with N or more operands, inbounds, etc.)?

I can give that a try and see how it impacts compile time. The cut-off would be somewhat arbitrary though.

Stay tuned.

Cheers,
-Quentin

Thanks @efriedma for the suggestions!

Let me try that before deciding if we want the precise vs. less precise approach.

Skip zero indices
Accumulate constant indices with constant scaling factor in a separate variable to avoid calls to computeForAddSub

Did 1 & 2 of @eli.friedman's suggestions.

After a quick look, I didn't see how to do #3 and left it out. We may want to give it a try separately as it may be a general improvement to the known bits infrastructure.

@nikic, could you do a measurement of the latest patch, please?

Then we can decide if we want to pursue either the cut-off idea or somehow expose the precise vs. less precise gep analysis.

Oops, tagged Eli's old account.
Tagging the new one @efriedma

Harbormaster completed remote builds in B75783: Diff 299495. Oct 20 2020, 4:34 PM

New numbers: https://llvm-compile-time-tracker.com/compare.php?from=0c0fcea557e4a7cfd51216ad20aa67c82733ab52&to=723ef6ad7cc817f5b00605f58d559330abae29e2&stat=instructions Down to 0.1% geomean, without any large outliers. That's good enough for me :) Thank you @efriedma for those suggestions.

I'll review the actual implementation later today.

LGTM. Thank you for your patience and continued work on this issue.

llvm/lib/Analysis/ValueTracking.cpp
1363	As far as I can see, you could just rename LocalKnown to AddrKnownBits, rather than first computing into LocalKnown and then copying it. Or, going one step further, you could directly populate `Known`, rather than populating `AddrKnownBits` and then copying it at the end.
1366	Nit: Cst -> Const. This seems like an unusual abbreviation.
1385	Move these down to the computeKnownBits call? Doesn't appear to be used in between.
llvm/test/Transforms/InstCombine/constant-fold-gep.ll
111	Typo: `CHECK;` instead of `CHECK:`. I would suggest to simply rerun update_test_checks.py instead.

This revision is now accepted and ready to land. Oct 21 2020, 11:31 AM

Use Known directly instead of using an intermediate AddrKnownBits variable
Change AccCstIndices into AccConstIndices
Move initialization code of IndexBits next to first use
Regenerate test case using update_test_check

Thanks all for your help and @nikic in particular for doing the perf measurements and pushing for a better solution.
I think this shows that reviews work and produce better code quality!

Will push shortly.

llvm/lib/Analysis/ValueTracking.cpp
1363	Good point, we don't need the intermediate variables anymore.
1366	Done!
1385	Good catch, we don't need that here anymore. Moved it down.
llvm/test/Transforms/InstCombine/constant-fold-gep.ll
111	Oh wow! Sorry about that and nice catch! Ran the update script (when I originally wrote the test, the checks were not autogenerated.)

Harbormaster completed remote builds in B75924: Diff 299775. Oct 21 2020, 12:47 PM

This revision was landed with ongoing or failed builds. Oct 21 2020, 3:07 PM

Closed by commit rGee6abef5323d: [ValueTracking] Interpret GEPs as a series of adds multiplied by the related… (authored by qcolombet).

This revision was automatically updated to reflect the committed changes.

qcolombet marked 4 inline comments as done.

qcolombet added a commit: rGee6abef5323d: [ValueTracking] Interpret GEPs as a series of adds multiplied by the related….

Interestingly, the final version ended up being completely compile-time neutral (or in the noise): https://llvm-compile-time-tracker.com/compare.php?from=e97e9851b227e98e39c27c4c8f5558e331cde8b4&to=ee6abef5323d59b983129bf3514ef6775d1d6cd5&stat=instructions

nikic mentioned this in D90637: [ValueTracking] Inbounds does not imply nsw. Nov 2 2020, 12:05 PM

Revision Contents

Path	Size
llvm/lib/Analysis/ValueTracking.cpp	89 lines
llvm/test/Transforms/InstCombine/constant-fold-address-space-pointer.ll	2 lines
llvm/test/Transforms/InstCombine/constant-fold-gep.ll	32 lines
llvm/unittests/Analysis/ValueTrackingTest.cpp	60 lines

Diff 299495

llvm/lib/Analysis/ValueTracking.cpp

static void computeKnownBitsFromOperator(const Operator *I,
case Instruction::Alloca:		case Instruction::Alloca:
Known.Zero.setLowBits(Log2(cast<AllocaInst>(I)->getAlign()));		Known.Zero.setLowBits(Log2(cast<AllocaInst>(I)->getAlign()));
break;		break;
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// Analyze all of the subscripts of this getelementptr instruction		// Analyze all of the subscripts of this getelementptr instruction
// to determine if we can prove known low zero bits.		// to determine if we can prove known low zero bits.
KnownBits LocalKnown(BitWidth);		KnownBits LocalKnown(BitWidth);
computeKnownBits(I->getOperand(0), LocalKnown, Depth + 1, Q);		computeKnownBits(I->getOperand(0), LocalKnown, Depth + 1, Q);
unsigned TrailZ = LocalKnown.countMinTrailingZeros();		KnownBits AddrKnownBits(LocalKnown);
		// Accumulate the constant indices in a separate variable
		// to minimize the number of calls to computeForAddSub.
		APInt AccCstIndices(BitWidth, 0, /*IsSigned*/ true);

gep_type_iterator GTI = gep_type_begin(I);		gep_type_iterator GTI = gep_type_begin(I);
		// If the inbounds keyword is not present, the offsets are added to the
		// base address with silently-wrapping two’s complement arithmetic.
		bool IsInBounds = cast<GEPOperator>(I)->isInBounds();
for (unsigned i = 1, e = I->getNumOperands(); i != e; ++i, ++GTI) {		for (unsigned i = 1, e = I->getNumOperands(); i != e; ++i, ++GTI) {
// TrailZ can only become smaller, short-circuit if we hit zero.		// TrailZ can only become smaller, short-circuit if we hit zero.
if (TrailZ == 0)		if (AddrKnownBits.isUnknown())
break;		break;

Value *Index = I->getOperand(i);		Value *Index = I->getOperand(i);

		// Handle case when index is zero.
		Constant *CIndex = dyn_cast<Constant>(Index);
		if (CIndex && CIndex->isZeroValue())
		continue;

		unsigned IndexBitWidth = Index->getType()->getScalarSizeInBits();
		KnownBits IndexBits(IndexBitWidth);
if (StructType *STy = GTI.getStructTypeOrNull()) {		if (StructType *STy = GTI.getStructTypeOrNull()) {
// Handle struct member offset arithmetic.		// Handle struct member offset arithmetic.

// Handle case when index is vector zeroinitializer		assert(CIndex &&
Constant *CIndex = cast<Constant>(Index);		"Access to structure field must be known at compile time");
if (CIndex->isZeroValue())
continue;

if (CIndex->getType()->isVectorTy())		if (CIndex->getType()->isVectorTy())
Index = CIndex->getSplatValue();		Index = CIndex->getSplatValue();

unsigned Idx = cast<ConstantInt>(Index)->getZExtValue();		unsigned Idx = cast<ConstantInt>(Index)->getZExtValue();
const StructLayout *SL = Q.DL.getStructLayout(STy);		const StructLayout *SL = Q.DL.getStructLayout(STy);
uint64_t Offset = SL->getElementOffset(Idx);		uint64_t Offset = SL->getElementOffset(Idx);
TrailZ = std::min<unsigned>(TrailZ,		AccCstIndices += Offset;
countTrailingZeros(Offset));		continue;
} else {		}

// Handle array index arithmetic.		// Handle array index arithmetic.
Type *IndexedTy = GTI.getIndexedType();		Type *IndexedTy = GTI.getIndexedType();
if (!IndexedTy->isSized()) {		if (!IndexedTy->isSized()) {
TrailZ = 0;		AddrKnownBits.resetAll();
break;		break;
}		}
unsigned GEPOpiBits = Index->getType()->getScalarSizeInBits();
uint64_t TypeSize = Q.DL.getTypeAllocSize(IndexedTy).getKnownMinSize();		computeKnownBits(Index, IndexBits, Depth + 1, Q);
LocalKnown.Zero = LocalKnown.One = APInt(GEPOpiBits, 0);		TypeSize IndexTypeSize = Q.DL.getTypeAllocSize(IndexedTy);
computeKnownBits(Index, LocalKnown, Depth + 1, Q);		uint64_t TypeSizeInBytes = IndexTypeSize.getKnownMinSize();
TrailZ = std::min(TrailZ,		KnownBits ScalingFactor(IndexBitWidth);
unsigned(countTrailingZeros(TypeSize) +		// Multiply by current sizeof type.
LocalKnown.countMinTrailingZeros()));		// &A[i] == A + i * sizeof(*A[i]).
}		if (IndexTypeSize.isScalable()) {
		// For scalable types the only thing we know about sizeof is
		// that this is a multiple of the minimum size.
		ScalingFactor.Zero.setLowBits(countTrailingZeros(TypeSizeInBytes));
		} else if (IndexBits.isConstant()) {
		APInt IndexCst = IndexBits.getConstant();
		APInt ScalingFactor(IndexBitWidth, TypeSizeInBytes);
		IndexCst *= ScalingFactor;
		AccCstIndices += IndexCst.sextOrTrunc(BitWidth);
		continue;
		} else {
		ScalingFactor.Zero = ~TypeSizeInBytes;
		ScalingFactor.One = TypeSizeInBytes;
}		}
		IndexBits = KnownBits::computeForMul(IndexBits, ScalingFactor);

		qcolombet (Author): @nikic FYI, this is where we stop tracking the full address but still compute the trailing zeros. If we don't do that, we lose some information compared to the previous implementation. E.g., test/Analysis/ScalarEvolution/scalable-vector.ll will fail:
		:5:10: error: CHECK: expected string not found in input
		; CHECK: --> (3 * sizeof(<vscale x 4 x i32>)) U: [0,-15) S: [-9223372036854775808,9223372036854775793)
		^
		<stdin>:4:2: note: scanning from here
		--> (3 * sizeof(<vscale x 4 x i32>)) U: full-set S: full-set
		nikic: Okay, I see. I think we should still be able to combine these with some special handling in this spot. Would something like this work? Combine IndexBits using computeKnownBitsMul as usual. For scalable vectors, the result is additionally multiplied by an unknown vscale. As an approximation, only keep the trailing zero bits of IndexBits in this case, and then continue as usual.
		qcolombet (Author): That should be doable, though I think it will degrade compile time even more, since I would expect calling computeKnownBitsMul to be more expensive than just doing the math on `TrailZ`. Also, are you suggesting that we get rid of `TrailZ` altogether and just carry the trailing zero bits in AddrKnownBits? If so, we would add back a bunch of computeKnownBits calls, since AddrKnownBits won't be unknown anymore. What do you think?
		nikic: I don't think doing this would have a (negative) impact on compile time, as it only affects the case where GEPs on scalable vectors are used (i.e. very rarely). And yes, with this change it should be possible to get rid of TrailZ entirely and only use AddrKnownBits.
Known.Zero.setLowBits(TrailZ);		// If the offsets have a different width from the pointer, according
		// to the language reference we need to sign-extend or truncate them
		// to the width of the pointer.
		IndexBits = IndexBits.sextOrTrunc(BitWidth);

		AddrKnownBits = KnownBits::computeForAddSub(
		/*Add=*/true,
		/*NSW=*/IsInBounds, AddrKnownBits, IndexBits);
		}
		if (!AddrKnownBits.isUnknown() && !AccCstIndices.isNullValue()) {
		KnownBits Index(BitWidth);
		Index.Zero = ~AccCstIndices;
		nikic: We should add an sextOrTrunc() method to KnownBits. We already have zextOrTrunc() and anyextOrTrunc().
		qcolombet (Author): Sure.
		Index.One = AccCstIndices;
		AddrKnownBits = KnownBits::computeForAddSub(
		/*Add=*/true,
		/*NSW=*/IsInBounds, AddrKnownBits, Index);
		}
		Known = AddrKnownBits;
break;		break;
}		}
case Instruction::PHI: {		case Instruction::PHI: {
const PHINode *P = cast<PHINode>(I);		const PHINode *P = cast<PHINode>(I);
// Handle the case of a simple two-predecessor recurrence PHI.		// Handle the case of a simple two-predecessor recurrence PHI.
// There's a lot more that could theoretically be done here, but		// There's a lot more that could theoretically be done here, but
// this is sufficient to catch some interesting cases.		// this is sufficient to catch some interesting cases.
if (P->getNumIncomingValues() == 2) {		if (P->getNumIncomingValues() == 2) {
[... 5,316 lines not shown ...]
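
The scalable-type branch and the review discussion above both rely on one fact: when the element size is only known to be a multiple of some minimum size, the product `index * size` still has at least as many trailing zero bits as the index and the minimum size combined. The following is a minimal standalone sketch of that bound, not the patch's code; `trailingZeros` and `scaledIndexMinTrailingZeros` are hypothetical helper names that work on plain 64-bit integers rather than on LLVM's `KnownBits`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of trailing zero bits of a nonzero 64-bit value.
inline unsigned trailingZeros(uint64_t V) {
  assert(V != 0 && "trailing zeros are not defined for zero here");
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Lower bound on the trailing zeros of Index * ElemSize, given only a lower
// bound on the trailing zeros of Index. For a scalable type, MinElemSize is
// the known minimum size; the unknown vscale multiplier can only add
// trailing zeros, so the bound still holds. The result is capped at 64.
inline unsigned scaledIndexMinTrailingZeros(unsigned IndexMinTZ,
                                            uint64_t MinElemSize) {
  return std::min(64u, IndexMinTZ + trailingZeros(MinElemSize));
}
```

For the ScalarEvolution case quoted in the discussion, the index 3 contributes no trailing zeros and the minimum size of `<vscale x 4 x i32>` is 16 bytes, so `scaledIndexMinTrailingZeros(0, 16)` yields 4, i.e. the offset is a multiple of 16.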

llvm/test/Transforms/InstCombine/constant-fold-address-space-pointer.ll

	[... 219 lines not shown ...]
	;			;
	%p = getelementptr [10 x i32], [10 x i32] addrspace(3)* @i32_array_as3, i64 0, i64 0			%p = getelementptr [10 x i32], [10 x i32] addrspace(3)* @i32_array_as3, i64 0, i64 0
	%x = load i32, i32 addrspace(3)* %p, align 4			%x = load i32, i32 addrspace(3)* %p, align 4
	ret i32 %x			ret i32 %x
	}			}

	define i32 @test_constant_cast_gep_struct_indices_as() {			define i32 @test_constant_cast_gep_struct_indices_as() {
	; CHECK-LABEL: @test_constant_cast_gep_struct_indices_as(			; CHECK-LABEL: @test_constant_cast_gep_struct_indices_as(
	; CHECK-NEXT: [[Y:%.*]] = load i32, i32 addrspace(3)* getelementptr inbounds (%struct.foo, [[STRUCT_FOO:%.*]] addrspace(3)* @constant_fold_global_ptr, i16 0, i32 2, i16 2), align 8			; CHECK-NEXT: [[Y:%.*]] = load i32, i32 addrspace(3)* getelementptr inbounds (%struct.foo, [[STRUCT_FOO:%.*]] addrspace(3)* @constant_fold_global_ptr, i16 0, i32 2, i16 2), align 16
	; CHECK-NEXT: ret i32 [[Y]]			; CHECK-NEXT: ret i32 [[Y]]
	;			;
	%x = getelementptr %struct.foo, %struct.foo addrspace(3)* @constant_fold_global_ptr, i18 0, i32 2, i12 2			%x = getelementptr %struct.foo, %struct.foo addrspace(3)* @constant_fold_global_ptr, i18 0, i32 2, i12 2
	%y = load i32, i32 addrspace(3)* %x, align 4			%y = load i32, i32 addrspace(3)* %x, align 4
	ret i32 %y			ret i32 %y
	}			}

	@constant_data_as3 = addrspace(3) constant [5 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5]			@constant_data_as3 = addrspace(3) constant [5 x i32] [i32 1, i32 2, i32 3, i32 4, i32 5]
	[... 40 lines not shown ...]

llvm/test/Transforms/InstCombine/constant-fold-gep.ll

[... 9 lines not shown ...]
@Y = internal global [3 x %struct.X] zeroinitializer		@Y = internal global [3 x %struct.X] zeroinitializer

define void @frob() {		define void @frob() {
; CHECK-LABEL: @frob(		; CHECK-LABEL: @frob(
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 0), align 16		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 0), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 1), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 1), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 2), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 2), align 8
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 0), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 0), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 1), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 1), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 2), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 1, i64 2), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 0), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 0), align 8
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 1), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 1), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 2), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 0, i64 2), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 0), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 0), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 1), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 1), align 8
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 2), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 1, i32 1, i64 2), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 0), align 16		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 0), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 1), align 4		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 1), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 2), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 0, i64 2), align 8
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 0), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 0), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 1), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 1), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 2), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 2, i32 1, i64 2), align 4
; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 1, i64 0, i32 0, i64 0), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr inbounds ([3 x %struct.X], [3 x %struct.X]* @Y, i64 1, i64 0, i32 0, i64 0), align 8
; CHECK-NEXT: store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 2, i64 0, i32 0, i64 0), align 16		; CHECK-NEXT: store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 2, i64 0, i32 0, i64 0), align 16
; CHECK-NEXT: store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 1, i64 0, i32 0, i64 1), align 8		; CHECK-NEXT: store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 1, i64 0, i32 0, i64 1), align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 0), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 0), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 1), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 1), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 2), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 2), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 3), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 3), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 4), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 4), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 5), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 5), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 6), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 6), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 7), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 7), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 8), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 8), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 9), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 9), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 10), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 10), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 11), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 11), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 12), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 12), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 13), align 4		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 13), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 14), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 14), align 8
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 15), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 15), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 16), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 16), align 8
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 17), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 17), align 4
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 18), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 18), align 8
		qcolombet (Author): Note: The change in the input of the test is because the alignment was not consistent with the base pointer. On this store the alignment cannot be 8, since the base pointer is aligned on 16 (17 * 4 + 16 == 17 * 4 + 4 * 4 == 21 * 4 == 10 * 8 + 4, i.e., the remainder with 8 cannot be zero). This is more easily seen by looking at the previous store: @Y + 16 is aligned to 8, then we add 4 to make @Y + 17, and we claimed it was aligned to 8.
		aqjune: I agree that the changes were necessary; otherwise the stores would always raise UB due to the alignment mismatch. It seems that the last three stores below always raise UB as well: the size of `@Y` is 72 bytes, so the accesses below are out-of-bounds. Should we treat these in this patch (by removing the three lines or expanding the size of the global variable), or is it okay to just leave them as they were?
		qcolombet (Author): I think the idea here is that the test was checking whether the inbounds keyword was added.
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 36), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 36), align 8
store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 19), align 8		store i32 1, i32* getelementptr ([3 x %struct.X], [3 x %struct.X]* @Y, i64 0, i64 0, i32 0, i64 19), align 8
ret void		ret void
}		}
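
The alignment argument in the inline comments on `@frob` can be checked with a small calculation. This is a sketch under the assumptions stated in that thread: the base `@Y` is 16-byte aligned and each `i32` element is 4 bytes, so inner index 17 is at byte offset 68. `alignmentAtOffset` is a hypothetical helper written for this illustration, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Largest power-of-two alignment guaranteed for (Base + Offset) when Base is
// BaseAlign-aligned. BaseAlign must be a power of two.
inline uint64_t alignmentAtOffset(uint64_t BaseAlign, uint64_t Offset) {
  assert(BaseAlign != 0 && (BaseAlign & (BaseAlign - 1)) == 0 &&
         "BaseAlign must be a power of two");
  // The guaranteed alignment is the lowest set bit of (BaseAlign | Offset).
  uint64_t Bits = BaseAlign | Offset;
  return Bits & (~Bits + 1);
}
```

With these assumptions, `alignmentAtOffset(16, 17 * 4)` gives 4, which is why `align 8` on that store was inconsistent, while `alignmentAtOffset(16, 18 * 4)` gives 8.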


; PR8883 - Constant fold exotic gep subtract		; PR8883 - Constant fold exotic gep subtract
@X = global [1000 x i8] zeroinitializer, align 16		@X = global [1000 x i8] zeroinitializer, align 16
[... 29 lines not shown ...]	entry:

%B2 = ptrtoint i8 addrspace(1)* %B to i16		%B2 = ptrtoint i8 addrspace(1)* %B to i16
%C = sub i16 0, %B2		%C = sub i16 0, %B2
%D = getelementptr i8, i8 addrspace(1)* %A, i16 %C		%D = getelementptr i8, i8 addrspace(1)* %A, i16 %C
%E = ptrtoint i8 addrspace(1)* %D to i16		%E = ptrtoint i8 addrspace(1)* %D to i16

ret i16 %E		ret i16 %E
}		}

		; Check that we improve the alignment information.
		; The base pointer is 16-byte aligned and we access the field at
		; an offset of 8-byte.
		; Every element in the @CallerInfos array is 16-byte aligned so
		; any access from the following gep is 8-byte aligned.
		%struct.CallerInfo = type { i8*, i32 }
		@CallerInfos = global [128 x %struct.CallerInfo] zeroinitializer, align 16

		; CHECK-LABEL: @test_gep_in_struct(
		; CHECK; load i32, i32* %NS7, align 8
		nikic: Typo: `CHECK;` instead of `CHECK:`. I would suggest to simply rerun update_test_checks.py instead.
		qcolombet (Author): Oh wow! Sorry about that and nice catch! Ran the update script (when I originally wrote the test, the checks were not autogenerated).
		define i32 @test_gep_in_struct(i64 %idx) {
		%NS7 = getelementptr inbounds [128 x %struct.CallerInfo], [128 x %struct.CallerInfo]* @CallerInfos, i64 0, i64 %idx, i32 1
		%res = load i32, i32* %NS7, align 1
		ret i32 %res
		qcolombet (Author): Test case for the structure access bug found during pretesting. With the offset calculation bug of the previous version of the patch we would have got `align 4`.
		}
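
The reasoning behind `@test_gep_in_struct` can be illustrated by brute force. The sketch below assumes a data layout with 8-byte pointers, so that `%struct.CallerInfo` (`{ i8*, i32 }`) occupies 16 bytes and its `i32` field sits at byte offset 8; both numbers are assumptions taken from the test's comment, and `fieldOffset` and `allOffsetsAligned` are hypothetical names. The point is that the array index is scaled by the element size, while the struct field contributes a fixed offset from the layout.

```cpp
#include <cassert>
#include <cstdint>

// Byte offset of field 1 of element Idx, assuming a 16-byte element and the
// i32 field at byte offset 8 (assumed layout, see the lead-in).
inline uint64_t fieldOffset(uint64_t Idx) {
  const uint64_t ElemSize = 16;   // scaled by the array index
  const uint64_t FieldOffset = 8; // taken from the struct layout, not scaled
  return Idx * ElemSize + FieldOffset;
}

// Returns true if the offset for every Idx in [0, Count) is a multiple of
// Align.
inline bool allOffsetsAligned(uint64_t Count, uint64_t Align) {
  assert(Align != 0 && "alignment must be nonzero");
  for (uint64_t Idx = 0; Idx < Count; ++Idx)
    if (fieldOffset(Idx) % Align != 0)
      return false;
  return true;
}
```

For all 128 elements the offsets are multiples of 8 but not of 16, which matches the `align 8` the test expects from a 16-byte-aligned base.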

llvm/unittests/Analysis/ValueTrackingTest.cpp

[... 1,202 lines not shown ...]	TEST_F(ComputeKnownBitsTest, ComputeKnownBitsAddWithRangeNoOverlap) {
EXPECT_EQ(Known.Zero.getZExtValue(), ~512llu & ~(64llu - 1));		EXPECT_EQ(Known.Zero.getZExtValue(), ~512llu & ~(64llu - 1));
EXPECT_EQ(Known.One.getZExtValue(), 512u | 32u);		EXPECT_EQ(Known.One.getZExtValue(), 512u | 32u);
// The known range is not precise given computeKnownBits works		// The known range is not precise given computeKnownBits works
// with the masks of zeros and ones, not the ranges.		// with the masks of zeros and ones, not the ranges.
EXPECT_EQ(Known.getMinValue(), 544);		EXPECT_EQ(Known.getMinValue(), 544);
EXPECT_EQ(Known.getMaxValue(), 575);		EXPECT_EQ(Known.getMaxValue(), 575);
}		}

		TEST_F(ComputeKnownBitsTest, ComputeKnownBitsGEPWithRange) {
		parseAssembly(
		"define void @test(i64* %p) {\n"
		" %A = load i64, i64* %p, !range !{i64 64, i64 65536}\n"
		" %APtr = inttoptr i64 %A to float*"
		" %APtrPlus512 = getelementptr float, float* %APtr, i32 128\n"
		" %c = icmp ugt float* %APtrPlus512, inttoptr (i32 523 to float*)\n"
		" call void @llvm.assume(i1 %c)\n"
		" ret void\n"
		"}\n"
		"declare void @llvm.assume(i1)\n");
		AssumptionCache AC(*F);
		KnownBits Known = computeKnownBits(A, M->getDataLayout(), /* Depth */ 0, &AC,
		F->front().getTerminator());
		EXPECT_EQ(Known.Zero.getZExtValue(), ~(65536llu - 1));
		EXPECT_EQ(Known.One.getZExtValue(), 0u);
		Instruction &APtrPlus512 = findInstructionByName(F, "APtrPlus512");
		Known = computeKnownBits(&APtrPlus512, M->getDataLayout(), /* Depth */ 0, &AC,
		F->front().getTerminator());
		// We know of one less zero because 512 may have produced a 1 that
		// got carried all the way to the first trailing zero.
		EXPECT_EQ(Known.Zero.getZExtValue(), ~(65536llu - 1) << 1);
		EXPECT_EQ(Known.One.getZExtValue(), 0u);
		// The known range is not precise given computeKnownBits works
		// with the masks of zeros and ones, not the ranges.
		EXPECT_EQ(Known.getMinValue(), 0u);
		EXPECT_EQ(Known.getMaxValue(), 131071);
		}

		// 4*128 + [32, 64) doesn't produce overlapping bits.
		// Make sure we get all the individual bits properly.
		// This test is useful to check that we account for the scaling factor
		// in the gep. Indeed, gep float, [32,64), 128 is not 128 + [32,64).
		TEST_F(ComputeKnownBitsTest, ComputeKnownBitsGEPWithRangeNoOverlap) {
		parseAssembly(
		"define void @test(i64* %p) {\n"
		" %A = load i64, i64* %p, !range !{i64 32, i64 64}\n"
		" %APtr = inttoptr i64 %A to float*"
		" %APtrPlus512 = getelementptr float, float* %APtr, i32 128\n"
		" %c = icmp ugt float* %APtrPlus512, inttoptr (i32 523 to float*)\n"
		" call void @llvm.assume(i1 %c)\n"
		" ret void\n"
		"}\n"
		"declare void @llvm.assume(i1)\n");
		AssumptionCache AC(*F);
		KnownBits Known = computeKnownBits(A, M->getDataLayout(), /* Depth */ 0, &AC,
		F->front().getTerminator());
		EXPECT_EQ(Known.Zero.getZExtValue(), ~(64llu - 1));
		EXPECT_EQ(Known.One.getZExtValue(), 32u);
		Instruction &APtrPlus512 = findInstructionByName(F, "APtrPlus512");
		Known = computeKnownBits(&APtrPlus512, M->getDataLayout(), /* Depth */ 0, &AC,
		F->front().getTerminator());
		EXPECT_EQ(Known.Zero.getZExtValue(), ~512llu & ~(64llu - 1));
		EXPECT_EQ(Known.One.getZExtValue(), 512u | 32u);
		// The known range is not precise given computeKnownBits works
		// with the masks of zeros and ones, not the ranges.
		EXPECT_EQ(Known.getMinValue(), 544);
		EXPECT_EQ(Known.getMaxValue(), 575);
		}
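
The values expected by the two GEP tests above can be reproduced with a toy model of known-bits addition. This is a minimal sketch on plain 64-bit integers, written in the spirit of a carry-based known-bits add; `ToyKnownBits`, `makeConstant`, and `addKnownBits` are hypothetical names, and the code is not LLVM's `KnownBits::computeForAddSub`. The GEP `getelementptr float, float* %APtr, i32 128` is modelled as adding the constant 128 * 4 = 512 to the known bits of the base.

```cpp
#include <cassert>
#include <cstdint>

// Toy 64-bit known-bits pair: a bit set in Zero is known to be 0, a bit set
// in One is known to be 1, and a bit set in neither is unknown.
struct ToyKnownBits {
  uint64_t Zero;
  uint64_t One;
  uint64_t minValue() const { return One; }
  uint64_t maxValue() const { return ~Zero; }
};

// Known bits of a fully known constant.
inline ToyKnownBits makeConstant(uint64_t C) { return {~C, C}; }

// Known bits of L + R with no incoming carry, derived from the smallest and
// largest possible sums.
inline ToyKnownBits addKnownBits(const ToyKnownBits &L, const ToyKnownBits &R) {
  assert((L.Zero & L.One) == 0 && (R.Zero & R.One) == 0 &&
         "a bit cannot be known zero and known one at once");
  uint64_t PossibleSumZero = L.maxValue() + R.maxValue();
  uint64_t PossibleSumOne = L.minValue() + R.minValue();
  // Carry into each bit position, where it can be determined.
  uint64_t CarryKnownZero = ~(PossibleSumZero ^ L.Zero ^ R.Zero);
  uint64_t CarryKnownOne = PossibleSumOne ^ L.One ^ R.One;
  // A result bit is known only if both operand bits and the carry are known.
  uint64_t Known = (L.Zero | L.One) & (R.Zero | R.One) &
                   (CarryKnownZero | CarryKnownOne);
  return {~PossibleSumZero & Known, PossibleSumOne & Known};
}
```

For a base in [32, 64), i.e. `ToyKnownBits{~uint64_t(63), 32}`, adding `makeConstant(512)` gives One == 512 | 32 and a range of [544, 575], the same values the NoOverlap test checks. For a base in [64, 65536), the possible carry out of bit 15 costs one known zero bit, leaving a maximum of 131071.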

class IsBytewiseValueTest : public ValueTrackingTest,		class IsBytewiseValueTest : public ValueTrackingTest,
public ::testing::WithParamInterface<		public ::testing::WithParamInterface<
std::pair<const char *, const char *>> {		std::pair<const char *, const char *>> {
protected:		protected:
};		};

const std::pair<const char *, const char *> IsBytewiseValueTests[] = {		const std::pair<const char *, const char *> IsBytewiseValueTests[] = {
{		{
[... 556 lines not shown ...]

Revision Contents

Diff 299495
