This is an archive of the discontinued LLVM Phabricator instance.

[IRBuilder] Introduce folder using inst-simplify, use for Or fold.
ClosedPublic

Authored by fhahn on Jan 10 2022, 5:53 AM.

Details

Summary

Alternative to D116817.

This introduces a new value-based folding interface for Or (FoldOr),
which takes 2 values and returns an existing Value or a constant if the
Or can be simplified. Otherwise nullptr is returned. This replaces the
more restrictive CreateOr which takes 2 constants.

This is the used to implement a folder that uses InstructionSimplify.
The logic to simplify Or instructions is moved there. Subsequent
patches are going to transition other CreateXXX to the more general
FoldXXX interface.

Diff Detail

Event Timeline

fhahn created this revision.Jan 10 2022, 5:53 AM
fhahn requested review of this revision.Jan 10 2022, 5:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2022, 5:53 AM

Thanks for looking into this.
This seems conceptually fine to me.

llvm/include/llvm/Analysis/InstSimplifyFolder.h
53–55

I don't think you need to do this explicitly?

nikic added inline comments.Jan 10 2022, 6:11 AM
llvm/include/llvm/Analysis/InstSimplifyFolder.h
55

InstSimplify already delegates to target folding for constant operands. I think you should not explicitly handle this here, and InstSimplifyFolder shouldn't accept a ConstantFolder either. For the transitional phase you can hardcode use of TargetFolder.

fhahn updated this revision to Diff 398619.Jan 10 2022, 6:53 AM

Remove folder argument, always use TargetFolder. This caused a change in llvm/test/Transforms/SROA/basictest-opaque-ptrs.ll because it now uses the TargetFolder for SROA.

fhahn added inline comments.Jan 10 2022, 6:56 AM
llvm/include/llvm/Analysis/InstSimplifyFolder.h
55

Removed. The only reason I made the constant folder an argument was that SROA was only using the regular ConstantFolder before. With TargetFolder being now used unconditionally, it also applies target-specific constant folding, but I guess that should be fine.

lebedev.ri accepted this revision.Jan 10 2022, 7:00 AM

LGTM, thanks, but would be good to have a second review.

This revision is now accepted and ready to land.Jan 10 2022, 7:00 AM
nikic added a comment.Jan 10 2022, 7:08 AM

Why do we need / want to use the InstSimplifyFolder in SROA?

fhahn updated this revision to Diff 398640.Jan 10 2022, 7:42 AM

Updated a bunch of failing clang tests that now have redundant ORs (after removing the fold from IRBuilder::CreateOr).

Why do we need / want to use the InstSimplifyFolder in SROA?

I don't have any strong opinions either way. The reason I updated SROA to use it was that there are a few tests that otherwise create some redundant ORs that otherwise are simplified.

Herald added a project: Restricted Project. · View Herald Transcript
nikic added a comment.Jan 10 2022, 7:55 AM

Why do we need / want to use the InstSimplifyFolder in SROA?

I don't have any strong opinions either way. The reason I updated SROA to use it was that there are a few tests that otherwise create some redundant ORs that otherwise are simplified.

I'd go for "do not simplify" as the default assumption, if we don't have any particular motivation to the contrary.

fhahn updated this revision to Diff 398647.Jan 10 2022, 8:00 AM

Do not use InstSimplifyFolder for SROA, update llvm/test/Transforms/SROA/basictest.ll & llvm/test/Transforms/SROA/basictest-opaque-ptrs.ll

lebedev.ri accepted this revision.Jan 10 2022, 8:11 AM
nikic accepted this revision.Jan 10 2022, 8:17 AM

LGTM

If I remember correctly, the Or and And folds for 0 and all 1s are there to optimize bitfield codegen for the frontend. Are we losing that optimization if this patch goes in before switching clang to InstSimplifyFolder?

If I remember correctly, the Or and And folds for 0 and all 1s are there to optimize bitfield codegen for the frontend. Are we losing that optimization if this patch goes in before switching clang to InstSimplifyFolder?

True, but why is that optimization there in the first place?
Presumably it shouldn't matter nowadays.

craig.topper added a comment.EditedJan 10 2022, 10:50 AM

If I remember correctly, the Or and And folds for 0 and all 1s are there to optimize bitfield codegen for the frontend. Are we losing that optimization if this patch goes in before switching clang to InstSimplifyFolder?

True, but why is that optimization there in the first place?
Presumably it shouldn't matter nowadays.

Probably for -O0 where there is no optimizer,

commit 0fc4311f0ea4034b4bad1c82c652dc04bb763cda
Author: Chris Lattner <sabre@nondot.org>
Date:   Wed Nov 4 05:00:12 2009 +0000

    make IRBuilder zap "X|0" and "X&-1" when building IR, this happens
    during bitfield codegen and slows down -O0 compile times by making
    useless IR.  rdar://7362516
    
    llvm-svn: 86006
fhahn updated this revision to Diff 398722.Jan 10 2022, 12:52 PM

Rebased and made sure comments are consistent between different folders.

If I remember correctly, the Or and And folds for 0 and all 1s are there to optimize bitfield codegen for the frontend. Are we losing that optimization if this patch goes in before switching clang to InstSimplifyFolder?

True, but why is that optimization there in the first place?
Presumably it shouldn't matter nowadays.

Probably for -O0 where there is no optimizer,

commit 0fc4311f0ea4034b4bad1c82c652dc04bb763cda
Author: Chris Lattner <sabre@nondot.org>
Date:   Wed Nov 4 05:00:12 2009 +0000

    make IRBuilder zap "X|0" and "X&-1" when building IR, this happens
    during bitfield codegen and slows down -O0 compile times by making
    useless IR.  rdar://7362516

    llvm-svn: 86006

I collected numbers using the compile-time tracker and the data seems to indicate this is not an issue any longer, there are no notable changes for any config: http://llvm-compile-time-tracker.com/compare.php?from=baa5b54d43419b5d4a1ec36f3788c36d52187e11&to=e22fc3bb9b024699fdfd12b89c68205565215b27&stat=instructions

Rebased and made sure comments are consistent between different folders.

If I remember correctly, the Or and And folds for 0 and all 1s are there to optimize bitfield codegen for the frontend. Are we losing that optimization if this patch goes in before switching clang to InstSimplifyFolder?

True, but why is that optimization there in the first place?
Presumably it shouldn't matter nowadays.

Probably for -O0 where there is no optimizer,

commit 0fc4311f0ea4034b4bad1c82c652dc04bb763cda
Author: Chris Lattner <sabre@nondot.org>
Date:   Wed Nov 4 05:00:12 2009 +0000

    make IRBuilder zap "X|0" and "X&-1" when building IR, this happens
    during bitfield codegen and slows down -O0 compile times by making
    useless IR.  rdar://7362516

    llvm-svn: 86006

I collected numbers using the compile-time tracker and the data seems to indicate this is not an issue any longer, there are no notable changes for any config: http://llvm-compile-time-tracker.com/compare.php?from=baa5b54d43419b5d4a1ec36f3788c36d52187e11&to=e22fc3bb9b024699fdfd12b89c68205565215b27&stat=instructions

Do the programs in compile-time tracker make much use of bitfields? Is there any indication in rdar://7362516 what program needed this? It looks like assigning a bitfield to 0 is enough to generate an or with 0 that survives through fast isel at -O0 on X86. But a few assignments to 0 wouldn't make for a huge problem, so I imagine there must have been something pathological about some program.

Do the programs in compile-time tracker make much use of bitfields?

I'd expect some bitfields, but nothing excessive/pathological.

Is there any indication in rdar://7362516 what program needed this? It looks like assigning a bitfield to 0 is enough to generate an or with 0 that survives through fast isel at -O0 on X86. But a few assignments to 0 wouldn't make for a huge problem, so I imagine there must have been something pathological about some program.

I had a look at rdar://7362516 and there is nothing in the issue & history that seems to indicate that the motivation was a pathological case, but just a very simple struct with 4 bitfields. Clang generates a lot of other redundant IR that doesn't get folded by IRBuilder (e.g. add or shifts of 0), which survive in a similar way with fast isel. AFAICT there's nothing that would indicate that or X, 0 is a special case that would warrant special treatment. rdar://7362516 also mentions that IRBuilder should fold shifts with 0 as well, but it looks like that's not happening at the moment either.

I am inclined to remove the special case from IRBuilder, which applies the 'IRBuilder should not optimize directly, only via a specified folder' policy also to CreateOr. I don't think we should use InstSimplifyFolder in Clang, because Clang usually doesn't try very hard to optimize IR and InstructionSimplify provides a lot of folds. But if we get additional feedback after the commit that the or X, 0 fold is very valuable for Clang, this can be provided to Clang via a custom ClangFolder.

fhahn updated this revision to Diff 398881.Jan 11 2022, 2:41 AM

Another rebase. I am planning to land this later today, unless there are additional concerns with respect to the fold in Clang after my latest response.

Another rebase. I am planning to land this later today, unless there are additional concerns with respect to the fold in Clang after my latest response.

Thanks, Florian. I have no additional concerns.

Allen added a subscriber: Allen.Jul 13 2022, 7:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2022, 7:15 AM