
[DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth

Authored by bjope on Jan 15 2022, 2:20 PM.



In code review for D117104 two slightly odd checks were found
in DAGCombiner::reduceLoadWidth. They typically checked whether
BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison is only valid if BitsB is a power
of two.
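A small sketch (not from the patch itself) of why the mask trick is only a valid "is a multiple of" test when the divisor is a power of two:

```python
def is_multiple_masked(bits_a: int, bits_b: int) -> bool:
    # The style of check found in reduceLoadWidth: uses (bits_b - 1)
    # as a bit mask. Only equivalent to bits_a % bits_b == 0 when
    # bits_b is a power of two.
    return (bits_a & (bits_b - 1)) == 0

# Agrees with the modulo test for power-of-two divisors:
assert is_multiple_masked(128, 64) == (128 % 64 == 0)  # both True
assert is_multiple_masked(96, 64) == (96 % 64 == 0)    # both False

# Disagrees for a non-power-of-two divisor: 96 IS a multiple of 48,
# but the mask test says otherwise (96 & 47 == 32).
assert 96 % 48 == 0
assert not is_multiple_masked(96, 48)
```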

The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.

Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative and wrong whenever ExtVTBits wasn't a
power of two. The latter situation was triggered by a number of
lit tests, so we could not simply assert on ExtVTBits being a power of
two.
When attempting to simply remove the checks I found some problems
that seem to have been guarded by the checks (maybe just out of
luck). A typical example is a pattern like this:

t1 = load i96* ptr
t2 = srl t1, 64
t3 = truncate t2 to i64

When DAGCombine is visiting the truncate reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.

In the past we bailed out due to 96 not being a multiple of 64.
If we simply remove that check, we would end up replacing the
load with a new load that reads 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits that weren't accessed
by the original load.
This patch instead utilizes the fact that the logical right shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into

t3 = load i32* ptr+offset, zext to i64
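A toy model of this fold (my own illustration, assuming a little-endian target and hypothetical byte contents): shifting the i96 value right by 64 leaves exactly the 32 bits stored at byte offset 8, so a zextload of i32 from ptr+8 produces the same i64 result.

```python
# Hypothetical 12-byte (i96) little-endian value in memory.
mem = bytes(range(1, 13))                     # 0x01, 0x02, ..., 0x0c

full = int.from_bytes(mem, "little")          # t1 = load i96* ptr
# (i64 (truncate (srl t1, 64))): bits [64, 96), zero-extended to 64.
t3 = (full >> 64) & ((1 << 64) - 1)

# The folded form: zextload i32 at ptr + 8 bytes.
folded = int.from_bytes(mem[8:12], "little")

assert t3 == folded
```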

Another case is shown in the new X86/combine-srl-load.ll test case:

t1 = load i32* ptr
t2 = srl i32 t1, 8
t3 = truncate t2 to i16

In the past we bailed out due to the shift count (8) not being a
multiple of 16. Now the narrowing kicks in and we get

t3 = load i16* ptr+offset
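The same kind of model (again my own sketch with made-up bytes, assuming little-endian) for the X86 case: the shift count of 8 corresponds to a one-byte offset, so the srl+truncate is equivalent to an i16 load at ptr+1.

```python
mem = bytes([0xAA, 0xBB, 0xCC, 0xDD])         # hypothetical i32 in memory

t1 = int.from_bytes(mem, "little")            # t1 = load i32* ptr
t3 = (t1 >> 8) & 0xFFFF                       # srl 8, truncate to i16

# The narrowed form: load i16 at ptr + 1 byte.
narrow = int.from_bytes(mem[1:3], "little")

assert t3 == narrow
```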

Diff Detail

Event Timeline

bjope created this revision.Jan 15 2022, 2:20 PM
bjope requested review of this revision.Jan 15 2022, 2:20 PM
Herald added a project: Restricted Project.Jan 15 2022, 2:20 PM
spatel added inline comments.Jan 17 2022, 12:05 PM
6 ↗(On Diff #400319)

This is a generic missed fold. I added more tests and a proposed fix:

bjope updated this revision to Diff 400773.Jan 18 2022, 2:03 AM

Rebased (now including the BSWAP fold from D117508 to avoid regression in a PPC test case).

Also added a new X86 test as a regression test for new fold when the shift
count isn't a multiple of the narrowed load width.

bjope edited the summary of this revision. (Show Details)Jan 18 2022, 2:06 AM
bjope added inline comments.Jan 18 2022, 2:10 AM
7 ↗(On Diff #400773)

I've assumed that DAGCombiner::isLegalNarrowLdSt would have bailed out if this would result in an unaligned memory access for the target (it does use TLI.allowsMemoryAccess).

It would perhaps have been nice to have a test case for some target where this matters. Any suggestions?

bjope added inline comments.Jan 18 2022, 2:27 AM
7 ↗(On Diff #400773)

Well, I've verified it by using riscv64 as a target. Not sure if I need to add a test case for that.

spatel added inline comments.Jan 18 2022, 9:45 AM

happend -> happen


Why set these variables if we are exiting the function? Can we specify in the comment which function does the narrowing of the load for this pattern?

4 ↗(On Diff #400773)

It would be better to pre-commit this test with the baseline CHECKs. I don't think there's a specific test file for x86 used for verifying this type of fold, but "shift-folding.ll" has a test that looks similar, so you could add it to that file.

bjope added inline comments.Jan 18 2022, 10:32 AM

Oh, that return statement looks like a mistake. I started out by simply returning here, but then I realized that it should be OK to use ZEXTLOAD here, as long as that assert never triggers about replacing sextload by zextload. The idea was to remove the return statement again, not including it in the patch.

The example with (i64 (truncate (i96 (srl (load x), 64)))) would however be optimized even without this special case. But then it would start off by first rewriting srl+load into a larger zextload, which is later combined with the truncate to form a smaller zextload. By removing the return here we trigger the full fold directly based on the truncate.

bjope updated this revision to Diff 400915.Jan 18 2022, 10:53 AM
bjope edited the summary of this revision. (Show Details)

Rebased given parent patch with test cases for pre-commit (D117588).

spatel accepted this revision.Jan 18 2022, 12:06 PM


This revision is now accepted and ready to land.Jan 18 2022, 12:06 PM
This revision was landed with ongoing or failed builds.Jan 24 2022, 3:24 AM
This revision was automatically updated to reflect the committed changes.