Page MenuHomePhabricator

Allow targets to augment computeKnownBits with their analysis using TargetTransformInfo
AbandonedPublic

Authored by qcolombet on Sep 8 2020, 8:32 PM.

Details

Summary

This is a proof of concept that shows how we could have the targets provide more information for computeKnownBits analysis.

This was motivated by the discussion in D86364, where we could make the computeKnownBits analysis smarter but it seems that the compile time would not be worth it for all the targets. In other words, this patch shows how we could allow the targets to put some extract effort on the computeKnownBits analysis.

Diff Detail

Unit TestsFailed

TimeTest
90 mslinux > LLVM.CodeGen/AMDGPU::opt-pipeline.ll
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/bin/opt -O0 -mtriple=amdgcn--amdhsa -disable-output -disable-verify -debug-pass=Structure /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll 2>&1 | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck -check-prefix=GCN-O0 /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/AMDGPU/opt-pipeline.ll
60 mswindows > LLVM.CodeGen/AArch64/GlobalISel::combine-trunc.mir
Script: -- : 'RUN: at line 2'; c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\llc.exe -o - -mtriple=aarch64-unknown-unknown -run-pass=aarch64-prelegalizer-combiner -verify-machineinstrs C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\GlobalISel\combine-trunc.mir | c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\GlobalISel\combine-trunc.mir
120 mswindows > LLVM.CodeGen/AMDGPU::opt-pipeline.ll
Script: -- : 'RUN: at line 1'; c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\opt.exe -O0 -mtriple=amdgcn--amdhsa -disable-output -disable-verify -debug-pass=Structure C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\opt-pipeline.ll 2>&1 | c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\filecheck.exe -check-prefix=GCN-O0 C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\opt-pipeline.ll

Event Timeline

qcolombet created this revision.Sep 8 2020, 8:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2020, 8:32 PM
qcolombet requested review of this revision.Sep 8 2020, 8:32 PM

The bulk of the changes are plumbing TTI into the computeKnownBits APIs. On top of that, clang-format did its magic and there are a lot of changes.

llvm/unittests/Analysis/ValueTrackingTest.cpp
1124

This illustrates how the target can put some extract effort on some instructions.

1248

Note: This test illustrates how it would work with a custom TTI but is not commit-able as is because it relies on BasicTTIImplBase, which is part of the CodeGen library.

qcolombet updated this revision to Diff 290634.Sep 8 2020, 9:05 PM
  • Use TargetTransformInfoImplCRTPBase instead of BasicTTIImpl to avoid requiring the CodeGen library for the unit tests
llvm/unittests/Analysis/ValueTrackingTest.cpp
1248

Fixed that part.

qcolombet retitled this revision from WIP: Plumb TargetTransformInfo into computeKnownBits to Allow targets to augment computeKnownBits with their analysis using TargetTransformInfo.

Clean up the proof of concept:

  • Add the proper includes
  • Fixup clang-format acting up
qcolombet updated this revision to Diff 292026.Sep 15 2020, 2:23 PM
  • Remove const_cast that was a left over from the proof of concept
qcolombet updated this revision to Diff 292053.Sep 15 2020, 4:31 PM
  • Fix a few call sites where I was passing TTI instead of the boolean for UseInstrInfo (it doesn't help that the compiler didn't warm on these)

Now, it should really be NFC, unless you set the TTI, like in the unit test.

Gentle ping @nikic, @aqjune, @spatel, @lebedev.ri, @RKSimon.

Being able to have the target provide more information is very useful to us.
This commit is NFC (or at least if should be if I didn't screw something up) when the target doesn't override the computeKnownBits method or when the TTI is not passed around.

Ping @fhahn, @nikic, @aqjune, @spatel, @lebedev.ri, @RKSimon

It's been almost a month and nobody commented on the approach.

Not again that the patch is NFC when the TTI is not set. Is it okay to proceed?

RKSimon accepted this revision.Mon, Oct 5, 11:54 AM

Sorry for the delay @qcolombet - I think the sheer size of the patch kept putting me off! But you're right, most of its trivial plumbing. There's a few bits that it'd be great to (pre-)commit separately if possible.

llvm/include/llvm/Analysis/TargetTransformInfo.h
52

sorting

llvm/lib/Analysis/ValueTracking.cpp
468

Pull this out into its own commit

llvm/unittests/Analysis/ValueTrackingTest.cpp
1127

These 2 tests above look like they can pre-committed?

This revision is now accepted and ready to land.Mon, Oct 5, 11:54 AM
nikic added a comment.Mon, Oct 5, 12:01 PM

I do have a concern about the general direction here: When we allowed targets to hook into InstCombine, one of the hard design constraints was that the target is only allowed to affect combines for target intrinsics, and nothing else. This patch seems to go against that restriction by allowing to replace general analysis behavior and affect otherwise target-independent folds. Maybe @spatel and @lebedev.ri have something to say regarding that.

I do have a concern about the general direction here: When we allowed targets to hook into InstCombine, one of the hard design constraints was that the target is only allowed to affect combines for target intrinsics, and nothing else. This patch seems to go against that restriction by allowing to replace general analysis behavior and affect otherwise target-independent folds. Maybe @spatel and @lebedev.ri have something to say regarding that.

Yep, i don't really like that.

I do have a concern about the general direction here: When we allowed targets to hook into InstCombine, one of the hard design constraints was that the target is only allowed to affect combines for target intrinsics, and nothing else. This patch seems to go against that restriction by allowing to replace general analysis behavior and affect otherwise target-independent folds. Maybe @spatel and @lebedev.ri have something to say regarding that.

Sorry for the delayed reply. I skimmed the earlier patch and this one now, and I agree, this could open the door to unintended behavior for seemingly target-independent passes.
It's not clear to me if there's anything other than the GEP example planning to use an override. Can we create a dedicated GEP analysis function that would limit the compile-time impact? Or could we use/create some option that would enable the more expensive analysis selectively?

RKSimon requested changes to this revision.Mon, Oct 5, 1:51 PM

reopening

This revision now requires changes to proceed.Mon, Oct 5, 1:51 PM

@nikic, @spatel, @lebedev.ri, @RKSimon, thank you for your feedbacks!

I do have a concern about the general direction here: When we allowed targets to hook into InstCombine, one of the hard design constraints was that the target is only allowed to affect combines for target intrinsics, and nothing else. This patch seems to go against that restriction by allowing to replace general analysis behavior and affect otherwise target-independent folds.

To be honest, I don't really like it either, but given it looks like only us care about precise GEPs, I don't see how we can reconcile both constrains.

That said, this patch doesn't affect the combines per se, since this patch doesn't modify instcombine. It only affects the computeKnownBits analysis (i.e., it is more a complicated version of what in spirit @spatel is suggesting with a dedicated GEP analysis for the precise case.) and I was not planning to have instcombine taking advantage of this. But yes, this opens that door.

It's not clear to me if there's anything other than the GEP example planning to use an override.

I personally don't plan to have any other override here, but that doesn't mean that other needs may arise in the future.

Can we create a dedicated GEP analysis function that would limit the compile-time impact?

What do you have in mind?

Or could we use/create some option that would enable the more expensive analysis selectively?

I have to admit I am not a fan of that because it opens the door of having to do the same for pretty much everything that people would consider too expensive for their target. Put differently why GEPs are different?

Cheers,
-Quentin

qcolombet added inline comments.Mon, Oct 5, 1:54 PM
llvm/unittests/Analysis/ValueTrackingTest.cpp
1127

Correct.

TargetTransformInfo exists because some IR transforms really don't work without the extra information. For example, vectorization doesn't make sense without some target-specific parameters. But the further we go down the path of target-specific tweaks, the harder it becomes to make any changes at all: you end up with bugs that can only be reproduced on on specific targets, and the impacts on compile time and code quality become harder to predict. This is why, for example, it's really hard to make changes to DAGCombine.

I think changing computeKnownBits to allow arbitrary target-controlled computation is going too far on the side of unpredictability; it's used by a bunch of passes in contexts you wouldn't really expect to behave in target-specific ways.

spatel added a comment.Tue, Oct 6, 9:57 AM

Can we create a dedicated GEP analysis function that would limit the compile-time impact?

What do you have in mind?

I didn't check the details, so this might be DOA...but I see that we have things like llvm::computeOverflowForSignedAdd() that are specializations/wrappers around the generic computeKnownBits(). If the motivating cases all start with a gep, then could we add the extra logic to a wrapper around the regular computeKnownBits()? That wrapper is only called from places like InstCombinerImpl::visitGetElementPtrInst(), so it would limit the compile-time increase?

If the motivating cases all start with a gep, then could we add the extra logic to a wrapper around the regular computeKnownBits()?

Unfortunately the motivating example doesn't start with a gep, or at least, there is no guarantee it would start with a gep.

Anyway, I can already push the NFC changes like the added tests cases and refactoring of KnownBits.

In the meantime, any other idea on how we could have GEPs being more precise?
Should I resurrect https://reviews.llvm.org/D86364?

If the motivating cases all start with a gep, then could we add the extra logic to a wrapper around the regular computeKnownBits()?

Unfortunately the motivating example doesn't start with a gep, or at least, there is no guarantee it would start with a gep.

Anyway, I can already push the NFC changes like the added tests cases and refactoring of KnownBits.

Sure, having those in place will help to focus on the real diffs.

In the meantime, any other idea on how we could have GEPs being more precise?
Should I resurrect https://reviews.llvm.org/D86364?

Ok - I think that's the theoretically/philosophically right way to proceed (target-independent). I don't have any ideas yet on how to limit the cost, but maybe we can collectively find some more savings in an update of that patch.

I don't have any ideas yet on how to limit the cost, but maybe we can collectively find some more savings in an update of that patch.

Sounds like a plan :).

Thanks!

(I'll add the gep test in a separate PR to demonstrate the lack of precision for these.)

qcolombet abandoned this revision.Tue, Oct 6, 4:35 PM

Abandoning this diff in favor of D86364 based on our discussion.