This is an archive of the discontinued LLVM Phabricator instance.

Differential D132208

[LoopIntWrapPredication] Loop Integer Wrapping Predication Pass
Needs Review, Public

Authored by kachkov98 on Aug 19 2022, 12:57 AM.

Details

Reviewers

anton-afanasyev
kito-cheng
asb
craig.topper
reames

Summary

Motivation for the LoopIntWrapPredication pass
Consider the following example:

for (unsigned i = 0; i < N; ++i)
  for (unsigned j = 0; j < N; ++j)
    C[i*N+j] = foo();

Here the pointer C is 64 bits wide and the variables i and j are 32 bits wide. According to the C standard, unsigned overflow is defined behavior (the value wraps), so i*N+j is computed in 32-bit arithmetic, zero-extended to 64 bits, and used as the offset in a GEP instruction. If the induction variables were signed instead, the calculation would carry the nsw flag, and the IndVarSimplify pass could promote both induction variables to 64-bit types and eliminate the sext inside the hot loop.

The same is not possible with unsigned variables, yet this is a quite common pattern in code, so the pass versions the loop: it generates a runtime check which ensures that overflow never occurs, and sets NUW flags on the address calculation chain. Scalar evolution is used to get the possible range of the expression, and a loop-invariant condition is inserted which checks that this range does not overflow. To keep the pass simple, it does not perform loop versioning directly. Instead, it inserts branching code for the calculation chain inside the loop and relies on a subsequent pass to unswitch the branch, which is possible because the branch condition is loop-invariant. In pseudo-code, the result looks like this:

for (unsigned i = 0; i < N; ++i)
  for (unsigned j = 0; j < N; ++j)
    if (overflow(N*N))
      *(C + zext(i*N+j)) = foo();
    else
      *(C + zext(i*N+j /*nuw*/)) = foo();

After unswitching:

if (overflow(N*N))
  for (unsigned i = 0; i < N; ++i)
    for (unsigned j = 0; j < N; ++j)
      *(C + zext(i*N+j)) = foo();
else
  for (unsigned i = 0; i < N; ++i)
    for (unsigned j = 0; j < N; ++j)
      *(C + zext(i*N+j /*nuw*/)) = foo();

After IndVarSimplify:

if (overflow(N*N))
  for (unsigned i = 0; i < N; ++i)
    for (unsigned j = 0; j < N; ++j)
      *(C + zext(i*N+j)) = foo();
else
  for (uint64_t i = 0; i < N; ++i)
    for (uint64_t j = 0; j < N; ++j)
      *(C + (i*N+j)) = foo();
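
The end state can also be written as plain C++. This is only an illustrative sketch of the versioned loop, not the code the pass emits: the helper name mulWrapsU32 and the stored value i + j (standing in for foo()) are assumptions made for the example, and the check is written conservatively as "N*N does not fit in 32 bits".

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: true if the 32-bit product N*N would wrap.
bool mulWrapsU32(uint32_t N) {
  return static_cast<uint64_t>(N) * N > std::numeric_limits<uint32_t>::max();
}

// Original form: the index is computed in 32 bits (may wrap) and then
// zero-extended when it is used as an array offset.
void fillWrapping(std::vector<int> &C, uint32_t N) {
  for (uint32_t i = 0; i < N; ++i)
    for (uint32_t j = 0; j < N; ++j)
      C[static_cast<uint64_t>(i * N + j)] = static_cast<int>(i + j);
}

// Versioned form: one loop-invariant check selects between the original
// loop and a copy whose induction variables are widened to 64 bits.
void fillVersioned(std::vector<int> &C, uint32_t N) {
  if (mulWrapsU32(N)) {
    fillWrapping(C, N); // slow path, semantics unchanged
    return;
  }
  const uint64_t Wide = N;
  for (uint64_t i = 0; i < Wide; ++i)
    for (uint64_t j = 0; j < Wide; ++j)
      C[i * Wide + j] = static_cast<int>(i + j); // no zext in the loop body
}
```

For any N where the check passes, both functions write the same values to the same elements; the widened copy simply has no extension in the inner loop.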

Results
This pass shows good results on the Coremark benchmark on our RISC-V hardware, increasing the score by 18%. Coremark has exactly the pattern described above: it uses unsigned types for induction variables that never overflow at runtime (this issue was also mentioned here: https://github.com/eembc/coremark/issues/22). Analysis of the matrix functions (especially matrix_mul_matrix) shows that a significant number of instructions inside the innermost loop perform this zero extension, although unsigned overflow of the array index never occurs.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kachkov98 created this revision. Aug 19 2022, 12:57 AM

Herald added a project: Restricted Project. Aug 19 2022, 12:57 AM

Herald added subscribers: ormris, hiraditya, mgorny.

kachkov98 requested review of this revision. Aug 19 2022, 12:57 AM

Herald added a subscriber: llvm-commits.

kachkov98 edited the summary of this revision. Aug 19 2022, 1:20 AM

kachkov98 added a subscriber: anton-afanasyev.

Herald added subscribers: luke957, luismarques, s.egerton and 3 others. Aug 19 2022, 1:20 AM

kachkov98 added reviewers: anton-afanasyev, kito-cheng, asb, craig.topper. Aug 19 2022, 1:28 AM

Herald added a subscriber: StephenFan. Aug 19 2022, 1:28 AM

craig.topper added a reviewer: reames. Aug 19 2022, 1:35 AM

Harbormaster completed remote builds in B182163: Diff 453905. Aug 19 2022, 1:40 AM

kachkov98 edited the summary of this revision. Aug 19 2022, 2:02 AM

test-suite statistics:

Tests: 2428
Metric: loop-int-wrap-predication.NumPredicatedChains

Program                                       loop-int-wrap-predication.NumPredicatedChains                              
MultiSourc.../Applications/JM/ldecod/ldecod    30.00                                       
MultiSourc...chmarks/Prolangs-C/agrep/agrep    18.00                                       
MultiSourc.../Applications/JM/lencod/lencod    11.00                                       
MultiSourc...ch/consumer-lame/consumer-lame     7.00                                       
MultiSourc...e/Applications/ClamAV/clamscan     6.00                                       
MultiSourc...reeBench/fourinarow/fourinarow     5.00                                       
MultiSourc...Benchmarks/7zip/7zip-benchmark     4.00                                       
MultiSourc.../DOE-ProxyApps-C++/CLAMR/CLAMR     4.00                                       
MultiSourc...nch/mpeg2/mpeg2dec/mpeg2decode     2.00                                       
MultiSourc...arks/Rodinia/backprop/backprop     2.00                                       
MultiSourc...ch/consumer-jpeg/consumer-jpeg     1.00                                       
MultiSourc.../mediabench/jpeg/jpeg-6a/cjpeg     1.00                                       
MultiSourc...enchmarks/mafft/pairlocalalign     1.00

In D132208#3734597, @kachkov98 wrote:

test-suite statistics: [table quoted in full; see the comment above]

Do you also have numbers on the runtime performance impact for a wider range of benchmarks, like SPEC & co?

craig.topper added inline comments. Aug 19 2022, 6:57 PM

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp
71	The description on line 32 mentions `shl`, but `Shl` isn't here
247	What guarantees the loop has a preheader? Placement in the pipeline would probably guarantee it but I'm not sure anything does when running the pass by itself. Unless I missed something.
249	This assert is probably unnecessary. A terminator is required for the Preheader to even be recognized as a predecessor of the loop.
276	proceed -> process?

Review changes

kachkov98 marked 2 inline comments as done. Aug 22 2022, 1:03 AM

kachkov98 added inline comments.

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp
71	It looks like processing of shl is not profitable, since it's not handled by SimplifyIndVar (https://llvm.org/doxygen/SimplifyIndVar_8cpp_source.html#l01551)
247	The Loop Pass Manager should run the LoopSimplify pass, and this canonical form ensures that the loop has a preheader: https://llvm.org/docs/LoopTerminology.html#loop-simplify-form

Harbormaster completed remote builds in B182517: Diff 454400. Aug 22 2022, 1:56 AM

neat! it'll be great if for loop bounds that don't overflow and can be inferred at compile time, the first version just get DCE'd.

In D132208#3741322, @hiraditya wrote:

neat! it'll be great if for loop bounds that don't overflow and can be inferred at compile time, the first version just get DCE'd.

From my experiments, if SCEV can prove that the expression has NUW, IndVarSimplifyPass already handles the case and successfully widens the induction variable, so it makes sense to insert only non-trivial runtime checks.
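
The distinction between a trivial and a non-trivial check can be sketched outside of LLVM. The snippet below is a simplified model, not the pass's logic or SCEV's range analysis: the names WrapCheck and classify are hypothetical, and MaxN stands for whatever upper bound on N happens to be known at compile time. It uses the same conservative criterion as the overflow(N*N) pseudo-code in the summary.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical outcome of the static analysis for the index i*N+j.
enum class WrapCheck { ProvenSafe, NeedsRuntimeCheck };

// MaxN is the largest value N is known to take. N*N grows with N, so if
// MaxN*MaxN fits in 32 bits, no smaller N can make the index wrap either.
constexpr WrapCheck classify(uint32_t MaxN) {
  return static_cast<uint64_t>(MaxN) * MaxN <=
                 std::numeric_limits<uint32_t>::max()
             ? WrapCheck::ProvenSafe
             : WrapCheck::NeedsRuntimeCheck;
}

// If N comes from a 16-bit value, the bound alone settles the question.
static_assert(classify(std::numeric_limits<uint16_t>::max()) ==
                  WrapCheck::ProvenSafe,
              "a 16-bit N can never wrap a 32-bit index");
```

Only the NeedsRuntimeCheck case is interesting for predication; in the ProvenSafe case there is nothing left to guard.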

hiraditya added inline comments. Aug 23 2022, 7:33 AM

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp
86	What happens when `Inst->use_empty() == true`?

Is there another induction variable optimization pass that this can be merged with?

kachkov98 added inline comments. Aug 23 2022, 7:45 AM

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp
86	We are checking that Inst has exactly one use (line 82)

craig.topper added inline comments. Aug 23 2022, 8:20 AM

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp
247	I agree that it works if run as part of the Loop Pass Manager. But if you run the pass standalone from the opt command line, it may not be in loop simplify form. The pass either needs to protect itself or require the LoopSimplify analysis.

kachkov98 added inline comments. Aug 23 2022, 8:37 AM

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp

247

I've tried to run only this pass from opt with -debug-pass-manager option, and this is the output:

Running pass: VerifierPass on [module]
Running analysis: VerifierAnalysis on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: PreservedCFGCheckerAnalysis on foo
Running pass: LoopSimplifyPass on foo (14 instructions)
Running analysis: LoopAnalysis on foo
Running analysis: DominatorTreeAnalysis on foo
Running analysis: AssumptionAnalysis on foo
Running analysis: TargetIRAnalysis on foo
Running pass: LCSSAPass on foo (14 instructions)
Running analysis: AAManager on foo
Running analysis: TargetLibraryAnalysis on foo
Running analysis: BasicAA on foo
Running analysis: ScopedNoAliasAA on foo
Running analysis: TypeBasedAA on foo
Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on foo
Running analysis: ScalarEvolutionAnalysis on foo
Running analysis: InnerAnalysisManagerProxy<llvm::LoopAnalysisManager, llvm::Function> on foo
Running pass: LoopIntWrapPredicationPass on Loop at depth 1 containing: %for.body<header><latch><exiting>
Invalidating analysis: PreservedCFGCheckerAnalysis on foo
Invalidating analysis: VerifierAnalysis on [module]
Running pass: VerifierPass on [module]
Running analysis: VerifierAnalysis on [module]
Running pass: PrintModulePass on [module]

So the LoopSimplify pass is run even in standalone mode, as part of the loop pass manager pipeline.
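
To make the preheader question concrete, here is a toy model of the property under discussion. ToyCFG and findPreheader are hypothetical names and this is not LLVM's API; it only sketches the rule that a loop has a preheader when exactly one block outside the loop branches to the header and that block has no other successor.

```cpp
#include <set>
#include <vector>

// Toy CFG: block IDs are indices; Succs[B] lists the successors of block B.
struct ToyCFG {
  std::vector<std::vector<int>> Succs;
};

// Returns the preheader block of the loop, or -1 if there is none.
int findPreheader(const ToyCFG &G, const std::set<int> &LoopBlocks,
                  int Header) {
  int Candidate = -1;
  for (int B = 0; B < static_cast<int>(G.Succs.size()); ++B) {
    if (LoopBlocks.count(B))
      continue; // back edges from inside the loop do not count
    for (int S : G.Succs[B]) {
      if (S != Header)
        continue;
      if (Candidate != -1 && Candidate != B)
        return -1; // more than one block enters the loop
      Candidate = B;
    }
  }
  // The entering block must branch only to the header.
  if (Candidate != -1 && G.Succs[Candidate].size() != 1)
    return -1;
  return Candidate;
}
```

In this model, a block that conditionally branches either into the loop or around it is not a preheader, which is the kind of input a pass would have to guard against if nothing had canonicalized the loop beforehand.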

More fixes

Statistics with SPEC2017:

Metric: loop-int-wrap-predication.NumPredicatedChains,loop-int-wrap-predication.NumProcessedLoops

Program                                                                         loop-int-wrap-predication.NumPredicatedChains loop-int-wrap-predication.NumProcessedLoops
                   test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  30.00                                         13.00                                     
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test  26.00                                         20.00                                     
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test  26.00                                         20.00                                     
               test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test  18.00                                         10.00                                     
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test  18.00                                         16.00                                     
                   test-suite :: MultiSource/Applications/JM/lencod/lencod.test  11.00                                         11.00                                     
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   7.00                                          7.00                                     
                    test-suite :: MultiSource/Applications/ClamAV/clamscan.test   6.00                                          6.00                                     
      test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test   5.00                                          5.00                                     
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   4.00                                          4.00                                     
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   4.00                                          4.00                                     
        test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   4.00                                          4.00                                     
                  test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test   4.00                                          3.00                                     
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test   4.00                                          4.00                                     
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test   4.00                                          4.00                                     
test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test   2.00                                          2.00                                     
            test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test   2.00                                          2.00                                     
               test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test   1.00                                          1.00                                     
                test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test   1.00                                          1.00                                     
                 test-suite :: MultiSource/Benchmarks/mafft/pairlocalalign.test   1.00                                          1.00                                     
        test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test   1.00                                          1.00                                     
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test   1.00                                          1.00

I've checked that all tests (including SPEC) now pass on an x86 machine. Getting reliable performance results on RISC-V will take some time; I will try to update this information soon.

Harbormaster completed remote builds in B182859: Diff 454854. Aug 23 2022, 10:13 AM

Isn't this pattern handled by LoopFlatten? cc @SjoerdMeijer

In D132208#3748455, @xbolva00 wrote:

Isn't this pattern handled by LoopFlatten? cc @SjoerdMeijer

Yes, it's a very similar pattern, but the goal of this pass is different: it inserts a runtime overflow check so that the code can be simplified under the condition that overflow does not happen. Even after LoopFlatten, the resulting SCEV can't be proven not to overflow at compile time.

In D132208#3742749, @hiraditya wrote:

Is there another induction variable optimization pass that this can be merged with?

I haven't found any other passes that could benefit from this transformation. In general, the idea is similar to LoopVersioningLICM, but in that pass the runtime checks are inserted to relax pointer aliasing restrictions, and it was hard to reuse code from it.

yakush added a subscriber: yakush. Sep 23 2022, 11:55 AM

Rebased on ToT
Add predication on separate conditions
Add simple cost model (do not predicate big loops)

Compile-time results on llvm-test-suite + SPEC2006 + SPEC2017:

Program                                                                         loop-int-wrap-predication.NumPredicatedChains

         test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 2029.00                                      
                 test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test  250.00                                      
              test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test   24.00                                      
             test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test   24.00                                      
                    test-suite :: SingleSource/UnitTests/matrix-types-spec.test   17.00                                      
                      test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test   12.00                                      
                   test-suite :: MultiSource/Applications/JM/lencod/lencod.test    8.00                                      
  test-suite :: External/SPEC/CINT2017rate/523.xalancbmk_r/523.xalancbmk_r.test    7.00                                      
 test-suite :: External/SPEC/CINT2017speed/623.xalancbmk_s/623.xalancbmk_s.test    7.00                                      
       test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test    7.00                                      
          test-suite :: External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk.test    7.00                                      
                  test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test    7.00                                      
                    test-suite :: MultiSource/Applications/ClamAV/clamscan.test    6.00                                      
            test-suite :: SingleSource/UnitTests/Vectorizer/runtime-checks.test    6.00                                      
               test-suite :: MultiSource/Benchmarks/Prolangs-C/agrep/agrep.test    6.00                                      
      test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    5.00                                      
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test    5.00                                      
              test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test    4.00                                      
        test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test    4.00                                      
                          test-suite :: MultiSource/Benchmarks/PAQ8p/paq8p.test    4.00                                      
            test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test    3.00                                      
           test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test    3.00                                      
            test-suite :: MultiSource/Benchmarks/Rodinia/backprop/backprop.test    2.00                                      
 test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test    2.00                                      
                test-suite :: SingleSource/Benchmarks/Misc-C++/oopack_v1p8.test    2.00                                      
  test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test    2.00                                      
test-suite :: MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode.test    2.00                                      
                 test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test    1.00                                      
               test-suite :: External/SPEC/CINT2017speed/657.xz_s/657.xz_s.test    1.00                                      
                test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test    1.00                                      
                test-suite :: External/SPEC/CINT2017rate/557.xz_r/557.xz_r.test    1.00                                      
  test-suite :: MultiSource/Benchmarks/MiBench/consumer-jpeg/consumer-jpeg.test    1.00                                      
        test-suite :: External/SPEC/CINT2006/462.libquantum/462.libquantum.test    1.00                                      
        test-suite :: MultiSource/Benchmarks/mediabench/jpeg/jpeg-6a/cjpeg.test    1.00                                      
         test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test    1.00

The large number of cases in 510.parest_r is due to template instantiations of almost identical code (sparse_matrix class methods). In this benchmark (on ref data), I observed a 1.7% improvement on a T-Head RVB-ICE RISC-V board with a XuanTie C910 core (before: 3 090 706 ms, after: 3 038 376 ms). Results on SPEC2006:

Benchmark        baseline, ms    optimized, ms   diff (optimized / baseline)
400.perlbench    1 493 603,64    1 497 198,00    100,24%
401.bzip2        1 905 760,43    1 903 672,13     99,89%
403.gcc          1 495 701,76    1 492 812,78     99,81%
429.mcf          2 009 662,24    2 018 930,79    100,46%
445.gobmk        1 502 378,31    1 495 364,06     99,53%
456.hmmer        1 900 147,45    1 860 074,98     97,89%
458.sjeng        1 837 567,39    1 839 056,42    100,08%
462.libquantum   1 171 120,66    1 156 115,71     98,72%
464.h264ref      2 386 369,44    2 359 021,70     98,85%
471.omnetpp      1 688 319,54    1 685 899,41     99,86%
473.astar        1 670 467,01    1 661 233,42     99,45%
483.xalancbmk    1 450 736,57    1 463 321,58    100,87%
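
The two kinds of percentages quoted here are computed differently, which is easy to misread. The following small sketch (function names are illustrative, not from the patch) shows both: the improvement figure given for 510.parest_r is the relative reduction in run time, while the diff column is the optimized time as a percentage of the baseline, so values below 100% mean a speedup.

```cpp
#include <cstdio>

// Relative improvement in percent: how much shorter the optimized run is.
double improvementPercent(double BaselineMs, double OptimizedMs) {
  return (BaselineMs - OptimizedMs) / BaselineMs * 100.0;
}

// The diff column: optimized time as a percentage of the baseline time.
double ratioPercent(double BaselineMs, double OptimizedMs) {
  return OptimizedMs / BaselineMs * 100.0;
}

// Prints both figures for two rows taken from the measurements above.
void printSummary() {
  std::printf("510.parest_r improvement: %.2f%%\n",
              improvementPercent(3090706.0, 3038376.0));
  std::printf("456.hmmer diff: %.2f%%\n",
              ratioPercent(1900147.45, 1860074.98));
}
```

The two measures are complementary: for any row, improvementPercent and ratioPercent sum to 100.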

Harbormaster completed remote builds in B201900: Diff 481188. Dec 8 2022, 9:56 AM

Ping (this pass is mostly profitable for the RISC-V target, because zext has a non-zero cost there; the intention is to report Coremark results for RISC-V with this optimization enabled, as with DFA jump threading)

Revision Contents

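Path                                                                  Size
llvm/include/llvm/InitializePasses.h                                  1 line
llvm/include/llvm/Transforms/Scalar.h                                 7 lines
llvm/include/llvm/Transforms/Scalar/LoopIntWrapPredication.h          22 lines
llvm/lib/Passes/PassBuilder.cpp                                       1 line
llvm/lib/Passes/PassBuilderPipelines.cpp                              7 lines
llvm/lib/Passes/PassRegistry.def                                      1 line
llvm/lib/Transforms/Scalar/CMakeLists.txt                             1 line
llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp                 436 lines
llvm/lib/Transforms/Scalar/Scalar.cpp                                 1 line
llvm/test/Transforms/LoopIntWrapPredication/2d-array-linear.ll        92 lines
llvm/test/Transforms/LoopIntWrapPredication/2d-array-transposed.ll    93 lines
llvm/test/Transforms/LoopIntWrapPredication/add-to-sub.ll             53 lines
llvm/test/Transforms/LoopIntWrapPredication/basic.ll                  49 lines
llvm/test/Transforms/LoopIntWrapPredication/non-invariant-trip-count.ll  51 lines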

Diff 481188

llvm/include/llvm/InitializePasses.h

 void initializeLoopDistributeLegacyPass(PassRegistry&);
 void initializeLoopExtractorLegacyPassPass(PassRegistry &);
 void initializeLoopGuardWideningLegacyPassPass(PassRegistry&);
 void initializeLoopFuseLegacyPass(PassRegistry&);
 void initializeLoopIdiomRecognizeLegacyPassPass(PassRegistry&);
 void initializeLoopInfoWrapperPassPass(PassRegistry&);
 void initializeLoopInstSimplifyLegacyPassPass(PassRegistry&);
 void initializeLoopInterchangeLegacyPassPass(PassRegistry &);
+void initializeLoopIntWrapPredicationLegacyPassPass(PassRegistry &);
 void initializeLoopFlattenLegacyPassPass(PassRegistry&);
 void initializeLoopLoadEliminationPass(PassRegistry&);
 void initializeLoopPassPass(PassRegistry&);
 void initializeLoopPredicationLegacyPassPass(PassRegistry&);
 void initializeLoopRerollLegacyPassPass(PassRegistry &);
 void initializeLoopRotateLegacyPassPass(PassRegistry&);
 void initializeLoopSimplifyCFGLegacyPassPass(PassRegistry&);
 void initializeLoopSimplifyPass(PassRegistry&);

llvm/include/llvm/Transforms/Scalar.h

 //===----------------------------------------------------------------------===//
 //
 // LoopPredication - This pass does loop predication on guards.
 //
 Pass *createLoopPredicationPass();

 //===----------------------------------------------------------------------===//
 //
+// LoopIntWrapPredication - This pass does predication on overflowing binary
+// operators inside loops.
+//
+Pass *createLoopIntWrapPredicationPass();
+
+//===----------------------------------------------------------------------===//
+//
 // LoopInterchange - This pass interchanges loops to provide a more
 // cache-friendly memory access patterns.
 //
 Pass *createLoopInterchangePass();

 //===----------------------------------------------------------------------===//
 //
 // LoopFlatten - This pass flattens nested loops into a single loop.

llvm/include/llvm/Transforms/Scalar/LoopIntWrapPredication.h

This file was added.

+#ifndef LLVM_TRANSFORMS_SCALAR_LOOPINTWRAPPREDICATION_H
+#define LLVM_TRANSFORMS_SCALAR_LOOPINTWRAPPREDICATION_H
+
+#include "llvm/Analysis/LoopAnalysisManager.h"
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Loop;
+class LPMUpdater;
+
+/// Performs Loop Integer Wrapping Predication Pass.
+class LoopIntWrapPredicationPass
+    : public PassInfoMixin<LoopIntWrapPredicationPass> {
+public:
+  PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM,
+                        LoopStandardAnalysisResults &AR, LPMUpdater &U);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_SCALAR_LOOPINTWRAPPREDICATION_H

llvm/lib/Passes/PassBuilder.cpp

 #include "llvm/Transforms/Scalar/LoopBoundSplit.h"
 #include "llvm/Transforms/Scalar/LoopDataPrefetch.h"
 #include "llvm/Transforms/Scalar/LoopDeletion.h"
 #include "llvm/Transforms/Scalar/LoopDistribute.h"
 #include "llvm/Transforms/Scalar/LoopFlatten.h"
 #include "llvm/Transforms/Scalar/LoopFuse.h"
 #include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
 #include "llvm/Transforms/Scalar/LoopInstSimplify.h"
+#include "llvm/Transforms/Scalar/LoopIntWrapPredication.h"
 #include "llvm/Transforms/Scalar/LoopInterchange.h"
 #include "llvm/Transforms/Scalar/LoopLoadElimination.h"
 #include "llvm/Transforms/Scalar/LoopPassManager.h"
 #include "llvm/Transforms/Scalar/LoopPredication.h"
 #include "llvm/Transforms/Scalar/LoopReroll.h"
 #include "llvm/Transforms/Scalar/LoopRotation.h"
 #include "llvm/Transforms/Scalar/LoopSimplifyCFG.h"
 #include "llvm/Transforms/Scalar/LoopSink.h"

llvm/lib/Passes/PassBuilderPipelines.cpp

Show First 20 Lines • Show All 87 Lines • ▼ Show 20 Lines
#include "llvm/Transforms/Scalar/InstSimplifyPass.h"		#include "llvm/Transforms/Scalar/InstSimplifyPass.h"
#include "llvm/Transforms/Scalar/JumpThreading.h"		#include "llvm/Transforms/Scalar/JumpThreading.h"
#include "llvm/Transforms/Scalar/LICM.h"		#include "llvm/Transforms/Scalar/LICM.h"
#include "llvm/Transforms/Scalar/LoopDeletion.h"		#include "llvm/Transforms/Scalar/LoopDeletion.h"
#include "llvm/Transforms/Scalar/LoopDistribute.h"		#include "llvm/Transforms/Scalar/LoopDistribute.h"
#include "llvm/Transforms/Scalar/LoopFlatten.h"		#include "llvm/Transforms/Scalar/LoopFlatten.h"
#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"		#include "llvm/Transforms/Scalar/LoopIdiomRecognize.h"
#include "llvm/Transforms/Scalar/LoopInstSimplify.h"		#include "llvm/Transforms/Scalar/LoopInstSimplify.h"
		#include "llvm/Transforms/Scalar/LoopIntWrapPredication.h"
#include "llvm/Transforms/Scalar/LoopInterchange.h"		#include "llvm/Transforms/Scalar/LoopInterchange.h"
#include "llvm/Transforms/Scalar/LoopLoadElimination.h"		#include "llvm/Transforms/Scalar/LoopLoadElimination.h"
#include "llvm/Transforms/Scalar/LoopPassManager.h"		#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Scalar/LoopRotation.h"		#include "llvm/Transforms/Scalar/LoopRotation.h"
#include "llvm/Transforms/Scalar/LoopSimplifyCFG.h"		#include "llvm/Transforms/Scalar/LoopSimplifyCFG.h"
#include "llvm/Transforms/Scalar/LoopSink.h"		#include "llvm/Transforms/Scalar/LoopSink.h"
#include "llvm/Transforms/Scalar/LoopUnrollAndJamPass.h"		#include "llvm/Transforms/Scalar/LoopUnrollAndJamPass.h"
#include "llvm/Transforms/Scalar/LoopUnrollPass.h"		#include "llvm/Transforms/Scalar/LoopUnrollPass.h"
▲ Show 20 Lines • Show All 104 Lines • ▼ Show 20 Lines
static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",		static cl::opt<bool> EnableUnrollAndJam("enable-unroll-and-jam",
cl::init(false), cl::Hidden,		cl::init(false), cl::Hidden,
cl::desc("Enable Unroll And Jam Pass"));		cl::desc("Enable Unroll And Jam Pass"));

static cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(false),		static cl::opt<bool> EnableLoopFlatten("enable-loop-flatten", cl::init(false),
cl::Hidden,		cl::Hidden,
cl::desc("Enable the LoopFlatten Pass"));		cl::desc("Enable the LoopFlatten Pass"));

		static cl::opt<bool> EnableLoopIntWrapPredication(
		"enable-loop-int-wrap-predication", cl::init(false), cl::Hidden,
		cl::desc("Enable Loop Integer Wrapping Predication Pass"));

static cl::opt<bool>		static cl::opt<bool>
EnableDFAJumpThreading("enable-dfa-jump-thread",		EnableDFAJumpThreading("enable-dfa-jump-thread",
cl::desc("Enable DFA jump threading"),		cl::desc("Enable DFA jump threading"),
cl::init(false), cl::Hidden);		cl::init(false), cl::Hidden);

static cl::opt<bool>		static cl::opt<bool>
EnableHotColdSplit("hot-cold-split",		EnableHotColdSplit("hot-cold-split",
cl::desc("Enable hot-cold splitting pass"));		cl::desc("Enable hot-cold splitting pass"));
[335 lines not shown]
LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,		LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/false));		/*AllowSpeculation=*/false));

// Disable header duplication in loop rotation at -Oz.		// Disable header duplication in loop rotation at -Oz.
LPM1.addPass(		LPM1.addPass(
LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));		LoopRotatePass(Level != OptimizationLevel::Oz, isLTOPreLink(Phase)));
// TODO: Investigate promotion cap for O1.		// TODO: Investigate promotion cap for O1.
LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,		LPM1.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
/*AllowSpeculation=*/true));		/*AllowSpeculation=*/true));
		if (EnableLoopIntWrapPredication)
		LPM1.addPass(LoopIntWrapPredicationPass());
LPM1.addPass(		LPM1.addPass(
SimpleLoopUnswitchPass(/* NonTrivial */ Level == OptimizationLevel::O3 &&		SimpleLoopUnswitchPass(/* NonTrivial */ Level == OptimizationLevel::O3 &&
EnableO3NonTrivialUnswitching));		EnableO3NonTrivialUnswitching));
if (EnableLoopFlatten)		if (EnableLoopFlatten)
LPM1.addPass(LoopFlattenPass());		LPM1.addPass(LoopFlattenPass());

LPM2.addPass(LoopIdiomRecognizePass());		LPM2.addPass(LoopIdiomRecognizePass());
LPM2.addPass(IndVarSimplifyPass());		LPM2.addPass(IndVarSimplifyPass());
[1,420 lines not shown]

llvm/lib/Passes/PassRegistry.def

	[526 lines not shown]
	LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))			LOOP_PASS("print<iv-users>", IVUsersPrinterPass(dbgs()))
	LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))			LOOP_PASS("print<loopnest>", LoopNestPrinterPass(dbgs()))
	LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))			LOOP_PASS("print<loop-cache-cost>", LoopCachePrinterPass(dbgs()))
	LOOP_PASS("loop-predication", LoopPredicationPass())			LOOP_PASS("loop-predication", LoopPredicationPass())
	LOOP_PASS("guard-widening", GuardWideningPass())			LOOP_PASS("guard-widening", GuardWideningPass())
	LOOP_PASS("loop-bound-split", LoopBoundSplitPass())			LOOP_PASS("loop-bound-split", LoopBoundSplitPass())
	LOOP_PASS("loop-reroll", LoopRerollPass())			LOOP_PASS("loop-reroll", LoopRerollPass())
	LOOP_PASS("loop-versioning-licm", LoopVersioningLICMPass())			LOOP_PASS("loop-versioning-licm", LoopVersioningLICMPass())
				LOOP_PASS("loop-int-wrap-predication", LoopIntWrapPredicationPass())
	#undef LOOP_PASS			#undef LOOP_PASS

	#ifndef LOOP_PASS_WITH_PARAMS			#ifndef LOOP_PASS_WITH_PARAMS
	#define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)			#define LOOP_PASS_WITH_PARAMS(NAME, CLASS, CREATE_PASS, PARSER, PARAMS)
	#endif			#endif
	LOOP_PASS_WITH_PARAMS("simple-loop-unswitch",			LOOP_PASS_WITH_PARAMS("simple-loop-unswitch",
	"SimpleLoopUnswitchPass",			"SimpleLoopUnswitchPass",
	[](std::pair<bool, bool> Params) {			[](std::pair<bool, bool> Params) {
	[19 lines not shown]

llvm/lib/Transforms/Scalar/CMakeLists.txt

[28 lines not shown]
add_llvm_component_library(LLVMScalarOpts		add_llvm_component_library(LLVMScalarOpts
LoopBoundSplit.cpp		LoopBoundSplit.cpp
LoopSink.cpp		LoopSink.cpp
LoopDeletion.cpp		LoopDeletion.cpp
LoopDataPrefetch.cpp		LoopDataPrefetch.cpp
LoopDistribute.cpp		LoopDistribute.cpp
LoopFuse.cpp		LoopFuse.cpp
LoopIdiomRecognize.cpp		LoopIdiomRecognize.cpp
LoopInstSimplify.cpp		LoopInstSimplify.cpp
		LoopIntWrapPredication.cpp
LoopInterchange.cpp		LoopInterchange.cpp
LoopFlatten.cpp		LoopFlatten.cpp
LoopLoadElimination.cpp		LoopLoadElimination.cpp
LoopPassManager.cpp		LoopPassManager.cpp
LoopPredication.cpp		LoopPredication.cpp
LoopRerollPass.cpp		LoopRerollPass.cpp
LoopRotation.cpp		LoopRotation.cpp
LoopSimplifyCFG.cpp		LoopSimplifyCFG.cpp
[55 lines not shown]

llvm/lib/Transforms/Scalar/LoopIntWrapPredication.cpp

This file was added.

				#include "llvm/Transforms/Scalar/LoopIntWrapPredication.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/AssumptionCache.h"
				#include "llvm/Analysis/CodeMetrics.h"
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/MemorySSA.h"
				#include "llvm/Analysis/MemorySSAUpdater.h"
				#include "llvm/Analysis/TargetTransformInfo.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/MDBuilder.h"
				#include "llvm/IR/PassManager.h"
				#include "llvm/InitializePasses.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
				#include <optional>

				using namespace llvm;

				#define DEBUG_TYPE "loop-int-wrap-predication"

				static const char *LoopIntWrapPredicationMetadata =
				"llvm.loop.int_wrap_predication.disable";

				static cl::opt<uint32_t> LoopPredicationThreshold(
				"loop-predication-threshold", cl::Hidden, cl::init(50),
				cl::desc("The cost threshold for predicating a loop"));
				static cl::opt<double>
				OverflowProbability("overflow-probability", cl::Hidden, cl::init(0.01),
				cl::desc("Weight of branch with overflow condition"));

				STATISTIC(NumPredicatedChains, "Number of predicated chains");
				STATISTIC(NumProcessedLoops, "Number of processed loops");

				// AffineExpr - holds range of values that ArithmeticChain can have inside loop
				// nest: [Start, Start + TripCount * Step), where Step = 1 (this value is fixed
				// on AffineExpr construction).
				struct AffineExpr {
				const SCEV *Start;
				const SCEV *TripCount;
				};

				namespace llvm {
				// DenseMapInfo specialization that allows AffineExpr as a key in a DenseMap.
				template <> struct DenseMapInfo<AffineExpr> {
				static inline AffineExpr getEmptyKey() {
				const SCEV *EmptyKey = DenseMapInfo<const SCEV *>::getEmptyKey();
				return AffineExpr{EmptyKey, EmptyKey};
				}

				static inline AffineExpr getTombstoneKey() {
				const SCEV *TombstoneKey = DenseMapInfo<const SCEV *>::getTombstoneKey();
				return AffineExpr{TombstoneKey, TombstoneKey};
				}

				static unsigned getHashValue(const AffineExpr &Expr) {
				return detail::combineHashValue(
				DenseMapInfo<const SCEV *>::getHashValue(Expr.Start),
				DenseMapInfo<const SCEV *>::getHashValue(Expr.TripCount));
				}

				static bool isEqual(const AffineExpr &LHS, const AffineExpr &RHS) {
				return LHS.Start == RHS.Start && LHS.TripCount == RHS.TripCount;
				}
				};
				} // namespace llvm
				craig.topper (Not Done): The description on line 32 mentions `shl`, but `Shl` isn't here.
				kachkov98 (Author, Done): It looks like processing of shl is not profitable, since it's not handled by SimplifyIndVar (https://llvm.org/doxygen/SimplifyIndVar_8cpp_source.html#l01551).

				// Arithmetic chain - chain of arithmetic instructions (add/sub/mul) that
				// starts from induction variable and finishes on zext; this zext can be
				// eliminated by widening of induction variable in case when instructions in
				// this chain will not overflow.
				class ArithmeticChain {
				public:
				ArithmeticChain(const Loop &L, const Use &U);
				ArithmeticChain(const ArithmeticChain &) = delete;
				ArithmeticChain(ArithmeticChain &&) = default;
				ArithmeticChain &operator=(const ArithmeticChain &) = delete;
				ArithmeticChain &operator=(ArithmeticChain &&) = default;

				std::optional<AffineExpr> getAffineExpr(ScalarEvolution &SE) const;
				void doPredication(Loop &L, Value *Cond, DominatorTree &DT, LoopInfo &LI,
				hiraditya (Not Done): What happens when `Inst->use_empty() == true`?
				kachkov98 (Author, Not Done): We are checking that Inst has exactly one use (line 82).
				ScalarEvolution &SE, MemorySSAUpdater *MSSAU) const;

				bool empty() const { return Chain.empty() || !ZExt; }

				void print(raw_ostream &OS) const {
				OS << "Chain:\n";
				for (auto *Inst : Chain)
				OS << *Inst << '\n';
				OS << "ZExt:\n" << *ZExt << '\n';
				}

				void dump() const { print(dbgs()); }

				private:
				SmallVector<BinaryOperator *, 4> Chain;
				ZExtInst *ZExt = nullptr;
				};

				static BinaryOperator *getChainInst(const Loop &L, const Use &U) {
				auto *Inst = dyn_cast<BinaryOperator>(U.getUser());
				if (!Inst)
				return nullptr;
				unsigned Opc = Inst->getOpcode();
				if (Opc != Instruction::Add && Opc != Instruction::Sub &&
				Opc != Instruction::Mul)
				return nullptr;
				const Value *OtherOperand = Inst->getOperand(1 - U.getOperandNo());
				if (!L.isLoopInvariant(OtherOperand))
				return nullptr;
				return Inst;
				}

				ArithmeticChain::ArithmeticChain(const Loop &L, const Use &U) {
				const Use *CurUse = &U;
				while (BinaryOperator *Inst = getChainInst(L, *CurUse)) {
				if (!Inst->hasOneUse())
				break;
				Chain.push_back(Inst);
				CurUse = &*Inst->use_begin();
				}
				ZExt = dyn_cast<ZExtInst>(CurUse->getUser());
				}

				// For the recurrence chain AR, which represents an expression inside some number
				// of nested loops, determine if it can be flattened to the form:
				// Start + i * RequiredStep, where i = 0 .. TripCount-1
				// and return Start and TripCount parameters if it is possible.
				static std::optional<AffineExpr>
				getFlattenedAffineExpr(ScalarEvolution &SE, const SCEVAddRecExpr *AR,
				const SCEV *RequiredStep) {
				if (!AR->isAffine())
				return std::nullopt;
				auto *Start = AR->getStart();
				auto *Step = AR->getStepRecurrence(SE);
				auto *L = AR->getLoop();
				auto *TripCount = SE.getBackedgeTakenCount(L);
				if (isa<SCEVCouldNotCompute>(TripCount))
				return std::nullopt;
				// In rotated loop form, the number of header executions is one more than the
				// number of taken back edges.
				if (L->isRotatedForm())
				TripCount = SE.getTripCountFromExitCount(TripCount, false);
				auto *StartAR = dyn_cast<SCEVAddRecExpr>(Start);
				if (!StartAR) {
				if (Step == RequiredStep)
				return AffineExpr{Start, TripCount};
				return std::nullopt;
				}
				// Linear case - parent loop should have Stride = TripCount * Step.
				if (Step == RequiredStep) {
				if (auto Res =
				getFlattenedAffineExpr(SE, StartAR, SE.getMulExpr(TripCount, Step)))
				return AffineExpr{Res->Start, SE.getMulExpr(TripCount, Res->TripCount)};
				return std::nullopt;
				}
				// Possibly transposed case - parent loop should have required step and its
				// TripCount * RequiredStep = Step of current loop.
				if (auto Res = getFlattenedAffineExpr(SE, StartAR, RequiredStep))
				if (SE.getMulExpr(Res->TripCount, RequiredStep) == Step)
				return AffineExpr{Res->Start, SE.getMulExpr(TripCount, Res->TripCount)};
				return std::nullopt;
				}

				std::optional<AffineExpr>
				ArithmeticChain::getAffineExpr(ScalarEvolution &SE) const {
				assert(!Chain.empty());
				auto *AR = dyn_cast<SCEVAddRecExpr>(SE.getSCEV(Chain.back()));
				if (!AR || AR->hasNoUnsignedWrap())
				return std::nullopt;
				LLVM_DEBUG(dbgs() << "SCEV: " << *AR << '\n');
				// For the simplicity of overflow check, process the most common case with
				// Step = 1.
				return getFlattenedAffineExpr(SE, AR, SE.getOne(AR->getType()));
				}

				static Value *createNUWChainInst(IRBuilder<> &Builder,
				Instruction::BinaryOps Opc, Value *LHS,
				Value *RHS, const Twine &Name) {
				switch (Opc) {
				case Instruction::Add:
				return Builder.CreateNUWAdd(LHS, RHS, Name);
				case Instruction::Sub:
				return Builder.CreateNUWSub(LHS, RHS, Name);
				case Instruction::Mul:
				return Builder.CreateNUWMul(LHS, RHS, Name);
				default:
				llvm_unreachable("Unexpected opcode");
				}
				}

				void ArithmeticChain::doPredication(Loop &L, Value *Cond, DominatorTree &DT,
				LoopInfo &LI, ScalarEvolution &SE,
				MemorySSAUpdater *MSSAU) const {
				// Create if-then-else CFG.
				BasicBlock *HeadBB = ZExt->getParent();
				BasicBlock *TailBB = SplitBlock(HeadBB, ZExt, &DT, &LI, MSSAU);
				LLVMContext &C = HeadBB->getContext();
				BasicBlock *ThenBB = BasicBlock::Create(C, ZExt->getName() + ".then",
				HeadBB->getParent(), TailBB);
				BasicBlock *ElseBB = BasicBlock::Create(C, ZExt->getName() + ".else",
				HeadBB->getParent(), TailBB);
				BranchInst *NewBr = BranchInst::Create(ThenBB, ElseBB, Cond);
				uint32_t OverflowWeight = UINT32_MAX * OverflowProbability;
				uint32_t NoOverflowWeight = UINT32_MAX * (1.0 - OverflowProbability);
				// Set branch weight metadata from the overflow probability so that
				// subsequent passes treat the overflow branch as unlikely.
				NewBr->setMetadata(
				LLVMContext::MD_prof,
				MDBuilder(C).createBranchWeights(OverflowWeight, NoOverflowWeight));
				ReplaceInstWithInst(HeadBB->getTerminator(), NewBr);
				if (MSSAU)
				MSSAU->applyUpdates(DominatorTree::UpdateType{DT.Delete, HeadBB, TailBB},
				DT);
				for (BasicBlock *BranchBB : {ThenBB, ElseBB}) {
				BranchInst *Br = BranchInst::Create(TailBB, BranchBB);
				Br->setDebugLoc(ZExt->getDebugLoc());
				L.addBasicBlockToLoop(BranchBB, LI);
				DT.addNewBlock(BranchBB, HeadBB);
				if (MSSAU)
				MSSAU->applyUpdates(
				{DominatorTree::UpdateType{DT.Insert, HeadBB, BranchBB},
				DominatorTree::UpdateType{DT.Insert, BranchBB, TailBB}},
				DT);
				}
				// Create versions with no integer wrapping and with possible wrapping.
				Value *ThenRes = nullptr;
				Value *ElseRes = nullptr;
				IRBuilder<> Builder(ElseBB->getTerminator());
				for (BinaryOperator *Inst : Chain) {
				Value *PrevThenRes = ThenRes;
				Value *PrevElseRes = ElseRes;
				// Then branch (possible wrap version).
				Inst->moveBefore(ThenBB->getTerminator());
				ThenRes = Inst;
				// Else branch (nowrap version).
				Instruction::BinaryOps Opc = Inst->getOpcode();
				Value *LHS = Inst->getOperand(0);
				Value *RHS = Inst->getOperand(1);
				// InstCombine can turn 'sub nuw' to 'add' losing the no-wrap flag, detect
				// such cases and replace them back.
				if (Opc == Instruction::Add && SE.isKnownNegative(SE.getSCEV(RHS))) {
				craig.topper (Not Done): What guarantees the loop has a preheader? Placement in the pipeline would probably guarantee it, but I'm not sure anything does when running the pass by itself. Unless I missed something.
				kachkov98 (Author, Not Done): The Loop Pass Manager should run the LoopSimplify pass, and this canonical form ensures that the loop has a preheader: https://llvm.org/docs/LoopTerminology.html#loop-simplify-form
				craig.topper (Not Done): I agree that it works if run as part of the Loop Pass Manager. But if you run the pass standalone from the opt command line, it may not be in loop simplify form. The pass either needs to protect itself or require the LoopSimplify analysis.
				kachkov98 (Author, Done): I've tried to run only this pass from opt with the -debug-pass-manager option, and this is the output:
				Running pass: VerifierPass on [module]
				Running analysis: VerifierAnalysis on [module]
				Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
				Running analysis: PreservedCFGCheckerAnalysis on foo
				Running pass: LoopSimplifyPass on foo (14 instructions)
				Running analysis: LoopAnalysis on foo
				Running analysis: DominatorTreeAnalysis on foo
				Running analysis: AssumptionAnalysis on foo
				Running analysis: TargetIRAnalysis on foo
				Running pass: LCSSAPass on foo (14 instructions)
				Running analysis: AAManager on foo
				Running analysis: TargetLibraryAnalysis on foo
				Running analysis: BasicAA on foo
				Running analysis: ScopedNoAliasAA on foo
				Running analysis: TypeBasedAA on foo
				Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on foo
				Running analysis: ScalarEvolutionAnalysis on foo
				Running analysis: InnerAnalysisManagerProxy<llvm::LoopAnalysisManager, llvm::Function> on foo
				Running pass: LoopIntWrapPredicationPass on Loop at depth 1 containing: %for.body<header><latch><exiting>
				Invalidating analysis: PreservedCFGCheckerAnalysis on foo
				Invalidating analysis: VerifierAnalysis on [module]
				Running pass: VerifierPass on [module]
				Running analysis: VerifierAnalysis on [module]
				Running pass: PrintModulePass on [module]
				So the LoopSimplify pass is launched even in standalone mode, as part of the loop pass manager pipeline.
				Opc = Instruction::Sub;
				RHS = Builder.CreateNeg(RHS);
				craig.topper (Done): This assert is probably unnecessary. A terminator is required for the Preheader to even be recognized as a predecessor of the loop.
				};
				if (LHS == PrevThenRes)
				LHS = PrevElseRes;
				if (RHS == PrevThenRes)
				RHS = PrevElseRes;
				ElseRes =
				createNUWChainInst(Builder, Opc, LHS, RHS, Inst->getName() + ".nowrap");
				}
				// Insert Phi node.
				auto *Phi =
				PHINode::Create(ZExt->getSrcTy(), 2, ZExt->getName() + ".phi", ZExt);
				Phi->addIncoming(ThenRes, ThenBB);
				Phi->addIncoming(ElseRes, ElseBB);
				ZExt->setOperand(0, Phi);
				}

				// Generate overflow checking code.
				static Value *generateOverflowCheck(ScalarEvolution &SE, const AffineExpr &Expr,
				Instruction *Loc) {
				SmallVector<Value *, 4> OverflowChecks;
				SCEVExpander Expander(SE, SE.getDataLayout(), "overflowcheck");
				Expander.setInsertPoint(Loc);
				IRBuilder<> Builder(Loc);
				// Check overall TripCount overflow.
				Value *MulRes = nullptr;
				if (auto *Mul = dyn_cast<SCEVMulExpr>(Expr.TripCount)) {
				MulRes = Expander.expandCodeFor(Mul->getOperand(0));
				craig.topper (Done): proceed -> process?
				for (auto *MulOp : drop_begin(Mul->operands())) {
				CallInst *MulOverflow = Builder.CreateCall(
				Intrinsic::getDeclaration(
				Loc->getModule(), Intrinsic::umul_with_overflow, Mul->getType()),
				{MulRes, Expander.expandCodeFor(MulOp)});
				MulRes = Builder.CreateExtractValue(MulOverflow, 0);
				OverflowChecks.push_back(Builder.CreateExtractValue(MulOverflow, 1));
				}
				} else
				MulRes = Expander.expandCodeFor(Expr.TripCount);
				// Check overflow of Start + TripCount.
				CallInst *AddOverflow = Builder.CreateCall(
				Intrinsic::getDeclaration(Loc->getModule(), Intrinsic::uadd_with_overflow,
				MulRes->getType()),
				{Expander.expandCodeFor(Expr.Start), MulRes});
				OverflowChecks.push_back(Builder.CreateExtractValue(AddOverflow, 1));
				return Builder.CreateOr(OverflowChecks);
				}

				static uint32_t getLoopCost(const Loop &L, const TargetTransformInfo &TTI,
				AssumptionCache &AC) {
				TargetTransformInfo::TargetCostKind CostKind =
				L.getHeader()->getParent()->hasMinSize()
				? TargetTransformInfo::TCK_CodeSize
				: TargetTransformInfo::TCK_SizeAndLatency;
				SmallPtrSet<const Value *, 4> EphValues;
				CodeMetrics::collectEphemeralValues(&L, &AC, EphValues);
				InstructionCost Cost = 0;
				for (auto *BB : L.blocks())
				for (auto &I : *BB)
				if (!EphValues.count(&I))
				Cost += TTI.getInstructionCost(&I, CostKind);
				return Cost.isValid() ? *Cost.getValue() : UINT32_MAX;
				}

				static bool processLoop(Loop &L, DominatorTree &DT, LoopInfo &LI,
				ScalarEvolution &SE, TargetTransformInfo &TTI,
				AssumptionCache &AC, MemorySSA *MSSA) {
				// Skip the loop if it was already processed.
				if (findStringMetadataForLoop(&L, LoopIntWrapPredicationMetadata))
				return false;
				auto *IndVar = L.getInductionVariable(SE);
				if (!IndVar)
				return false;
				uint32_t LoopCost = getLoopCost(L, TTI, AC);
				// If LoopCost is greater than threshold, it's not profitable to version loop.
				if (LoopCost > LoopPredicationThreshold)
				return false;
				LLVM_DEBUG(dbgs() << "Processing loop: " << L.getName() << '\n'
				<< "Induction variable: " << IndVar->getName() << '\n');
				BasicBlock *Preheader = L.getLoopPreheader();
				assert(Preheader);
				Instruction *Loc = Preheader->getTerminator();
				SmallVector<std::pair<ArithmeticChain, AffineExpr>, 4> Chains;
				SmallDenseMap<AffineExpr, Value *, 4> Conds;
				// Collect arithmetic chains from IndVar uses.
				for (auto &U : IndVar->uses()) {
				ArithmeticChain Chain(L, U);
				if (Chain.empty())
				continue;
				LLVM_DEBUG(Chain.dump());
				auto AffineExpr = Chain.getAffineExpr(SE);
				if (!AffineExpr)
				continue;
				LLVM_DEBUG(dbgs() << "Start: " << *AffineExpr->Start
				<< " TripCount: " << *AffineExpr->TripCount << '\n');
				Chains.emplace_back(std::move(Chain), *AffineExpr);
				Conds.insert({*AffineExpr, nullptr});
				}
				if (Chains.empty())
				return false;
				// If there are N conditions and we need to version loop N times, cost of
				// duplication will be (2 ^ N - 1) * LoopCost; this value should be below
				// LoopPredicationThreshold, so N <= log2(Threshold / LoopCost + 1).
				if (Conds.size() > Log2_32(LoopPredicationThreshold / LoopCost + 1))
				return false;
				// Generate overflow checking code for each AffineExpr.
				for (auto &[Expr, Val] : Conds)
				Val = generateOverflowCheck(SE, Expr, Loc);
				// Predicate all chains with their overflow condition.
				std::optional<MemorySSAUpdater> MSSAU;
				if (MSSA)
				MSSAU = MemorySSAUpdater(MSSA);
				for (auto &[Chain, Expr] : Chains)
				Chain.doPredication(L, Conds[Expr], DT, LI, SE, MSSAU ? &*MSSAU : nullptr);
				NumPredicatedChains += Chains.size();
				// Mark loop to not process it again.
				addStringMetadataToLoop(&L, LoopIntWrapPredicationMetadata);
				++NumProcessedLoops;
				if (MSSA && VerifyMemorySSA)
				MSSA->verifyMemorySSA();
				return true;
				}

				PreservedAnalyses
				LoopIntWrapPredicationPass::run(Loop &L, LoopAnalysisManager &AM,
				LoopStandardAnalysisResults &AR,
				LPMUpdater &U) {
				if (!processLoop(L, AR.DT, AR.LI, AR.SE, AR.TTI, AR.AC, AR.MSSA))
				return PreservedAnalyses::all();
				auto PA = getLoopPassPreservedAnalyses();
				if (AR.MSSA)
				PA.preserve<MemorySSAAnalysis>();
				return PA;
				}

				namespace {

				class LoopIntWrapPredicationLegacyPass : public LoopPass {
				public:
				static char ID;

				LoopIntWrapPredicationLegacyPass() : LoopPass(ID) {
				initializeLoopIntWrapPredicationLegacyPassPass(
				*PassRegistry::getPassRegistry());
				}

				bool runOnLoop(Loop *L, LPPassManager &LPM) override {
				if (skipLoop(L))
				return false;

				Function &F = *L->getHeader()->getParent();

				auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
				auto &LI = getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
				auto &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
				auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
				auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
				auto *MSSAWP = getAnalysisIfAvailable<MemorySSAWrapperPass>();
				return processLoop(*L, DT, LI, SE, TTI, AC,
				MSSAWP ? &MSSAWP->getMSSA() : nullptr);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<LoopInfoWrapperPass>();
				AU.addRequired<ScalarEvolutionWrapperPass>();
				AU.addRequired<TargetTransformInfoWrapperPass>();
				AU.addRequired<AssumptionCacheTracker>();
				getLoopAnalysisUsage(AU);
				}
				};

				} // end anonymous namespace

				char LoopIntWrapPredicationLegacyPass::ID = 0;

				INITIALIZE_PASS_BEGIN(LoopIntWrapPredicationLegacyPass,
				"loop-int-wrap-predication",
				"Predicate overflowing binary operators", false, false)
				INITIALIZE_PASS_DEPENDENCY(LoopPass)
				INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(ScalarEvolutionWrapperPass)
				INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
				INITIALIZE_PASS_END(LoopIntWrapPredicationLegacyPass,
				"loop-int-wrap-predication",
				"Predicate overflowing binary operators", false, false)

				Pass *llvm::createLoopIntWrapPredicationPass() {
				return new LoopIntWrapPredicationLegacyPass();
				}
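
As read from `generateOverflowCheck` above, the emitted condition multiplies the trip-count factors with one `llvm.umul.with.overflow` per extra factor, adds `Start` with one `llvm.uadd.with.overflow`, and ORs the overflow bits. The sketch below models that condition in plain C++ for 32-bit values. `mayWrap32` is a hypothetical helper introduced only for illustration and is not part of the patch; it also assumes the factors are plain integers rather than SCEV operands.

```cpp
#include <cstdint>
#include <initializer_list>
#include <limits>

// Hypothetical helper (not in the patch): returns true when the range
// [Start, Start + product(Factors)) may wrap in 32-bit unsigned arithmetic.
bool mayWrap32(uint32_t Start, std::initializer_list<uint32_t> Factors) {
  constexpr uint64_t Max = std::numeric_limits<uint32_t>::max();
  bool Overflow = false;
  uint32_t TripCount = 1; // multiplying 1 by the first factor never overflows
  for (uint32_t F : Factors) {
    uint64_t Wide = uint64_t{TripCount} * F;  // exact product in 64 bits
    Overflow |= Wide > Max;                   // models the umul overflow bit
    TripCount = static_cast<uint32_t>(Wide);  // wrapped result, like element 0
  }
  uint64_t End = uint64_t{Start} + TripCount; // exact sum in 64 bits
  Overflow |= End > Max;                      // models the uadd overflow bit
  return Overflow;
}
```

Under these assumptions, `mayWrap32(0, {N1, N2})` corresponds to the condition the 2D tests expect in the inner-loop preheader. A `true` result only means the wrapping version of the chain must be kept; it does not prove that an index actually wraps.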

llvm/lib/Transforms/Scalar/Scalar.cpp

[59 lines not shown]
void llvm::initializeScalarOpts(PassRegistry &Registry) {		void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeLegacyLICMPassPass(Registry);		initializeLegacyLICMPassPass(Registry);
initializeLegacyLoopSinkPassPass(Registry);		initializeLegacyLoopSinkPassPass(Registry);
initializeLoopFuseLegacyPass(Registry);		initializeLoopFuseLegacyPass(Registry);
initializeLoopDataPrefetchLegacyPassPass(Registry);		initializeLoopDataPrefetchLegacyPassPass(Registry);
initializeLoopDeletionLegacyPassPass(Registry);		initializeLoopDeletionLegacyPassPass(Registry);
initializeLoopAccessLegacyAnalysisPass(Registry);		initializeLoopAccessLegacyAnalysisPass(Registry);
initializeLoopInstSimplifyLegacyPassPass(Registry);		initializeLoopInstSimplifyLegacyPassPass(Registry);
initializeLoopInterchangeLegacyPassPass(Registry);		initializeLoopInterchangeLegacyPassPass(Registry);
		initializeLoopIntWrapPredicationLegacyPassPass(Registry);
initializeLoopFlattenLegacyPassPass(Registry);		initializeLoopFlattenLegacyPassPass(Registry);
initializeLoopPredicationLegacyPassPass(Registry);		initializeLoopPredicationLegacyPassPass(Registry);
initializeLoopRotateLegacyPassPass(Registry);		initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);		initializeLoopStrengthReducePass(Registry);
initializeLoopRerollLegacyPassPass(Registry);		initializeLoopRerollLegacyPassPass(Registry);
initializeLoopUnrollPass(Registry);		initializeLoopUnrollPass(Registry);
initializeLoopUnrollAndJamPass(Registry);		initializeLoopUnrollAndJamPass(Registry);
initializeWarnMissedTransformationsLegacyPass(Registry);		initializeWarnMissedTransformationsLegacyPass(Registry);
[223 lines not shown]

llvm/test/Transforms/LoopIntWrapPredication/2d-array-linear.ll

This file was added.

				; RUN: opt -passes=loop-int-wrap-predication -S < %s | FileCheck %s

				declare i32 @f()

				define void @foo(i32 %N1, i32 %N2, ptr %C) {
				; CHECK-LABEL: @foo(
				; CHECK: for.body3.lr.ph.us: ; preds = %for.cond1.preheader.us
				; CHECK-NEXT: [[MUL:%.*]] = mul i32 %i.015.us, %N1
				; CHECK-NEXT: [[UMUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %N1, i32 %N2)
				; CHECK-NEXT: [[UMULRES:%.*]] = extractvalue { i32, i1 } [[UMUL]], 0
				; CHECK-NEXT: [[OVERFLOW1:%.*]] = extractvalue { i32, i1 } [[UMUL]], 1
				; CHECK-NEXT: [[UADD:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 0, i32 [[UMULRES]])
				; CHECK-NEXT: [[OVERFLOW2:%.*]] = extractvalue { i32, i1 } [[UADD]], 1
				; CHECK-NEXT: [[OVERFLOW:%.*]] = or i1 [[OVERFLOW1]], [[OVERFLOW2]]
				; CHECK-NEXT: br label %for.body3.us
				; CHECK: for.body3.us:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ 0, %for.body3.lr.ph.us ], [ %inc.us, %for.body3.us.split ]
				; CHECK-NEXT: [[CALL:%.*]] = tail call signext i32 @f()
				; CHECK-NEXT: br i1 [[OVERFLOW]], label %idxprom.us.then, label %idxprom.us.else
				; CHECK-DAG: idxprom.us.then:
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[INDVAR]], [[MUL]]
				; CHECK-NEXT: br label %for.body3.us.split
				; CHECK-DAG: idxprom.us.else:
				; CHECK-NEXT: [[ADD_NOWRAP:%.*]] = add nuw i32 [[INDVAR]], [[MUL]]
				; CHECK-NEXT: br label %for.body3.us.split
				; CHECK: for.body3.us.split:
				; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[ADD]], %idxprom.us.then ], [ [[ADD_NOWRAP]], %idxprom.us.else ]
				; CHECK-NEXT: [[IDX:%.*]] = zext i32 [[PHI]] to i64
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds i32, ptr %C, i64 [[IDX]]
				; CHECK-NEXT: store i32 [[CALL]], ptr [[PTR]], align 4
				entry:
				%cmp14 = icmp ult i32 0, %N2
				br i1 %cmp14, label %for.cond1.preheader.lr.ph, label %for.end6

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp212 = icmp ult i32 0, %N1
				br i1 %cmp212, label %for.cond1.preheader.lr.ph.split.us, label %for.cond1.preheader.lr.ph.split

				for.cond1.preheader.lr.ph.split.us: ; preds = %for.cond1.preheader.lr.ph
				br label %for.cond1.preheader.us

				for.cond1.preheader.us: ; preds = %for.inc4.us, %for.cond1.preheader.lr.ph.split.us
				%i.015.us = phi i32 [ 0, %for.cond1.preheader.lr.ph.split.us ], [ %inc5.us, %for.inc4.us ]
				br label %for.body3.lr.ph.us

				for.body3.lr.ph.us: ; preds = %for.cond1.preheader.us
				%mul.us = mul i32 %i.015.us, %N1
				br label %for.body3.us

				for.body3.us: ; preds = %for.body3.lr.ph.us, %for.body3.us
				%j.013.us = phi i32 [ 0, %for.body3.lr.ph.us ], [ %inc.us, %for.body3.us ]
				%call.us = tail call signext i32 @f()
				%add.us = add i32 %j.013.us, %mul.us
				%idxprom.us = zext i32 %add.us to i64
				%arrayidx.us = getelementptr inbounds i32, ptr %C, i64 %idxprom.us
				store i32 %call.us, ptr %arrayidx.us, align 4
				%inc.us = add nuw i32 %j.013.us, 1
				%cmp2.us = icmp ult i32 %inc.us, %N1
				br i1 %cmp2.us, label %for.body3.us, label %for.cond1.for.inc4_crit_edge.us

				for.cond1.for.inc4_crit_edge.us: ; preds = %for.body3.us
				br label %for.inc4.us

				for.inc4.us: ; preds = %for.cond1.for.inc4_crit_edge.us
				%inc5.us = add i32 %i.015.us, 1
				%cmp.us = icmp ult i32 %inc5.us, %N2
				br i1 %cmp.us, label %for.cond1.preheader.us, label %for.cond.for.end6_crit_edge.split.us

				for.cond.for.end6_crit_edge.split.us: ; preds = %for.inc4.us
				br label %for.cond.for.end6_crit_edge

				for.cond1.preheader.lr.ph.split: ; preds = %for.cond1.preheader.lr.ph
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.cond1.preheader.lr.ph.split, %for.inc4
				%i.015 = phi i32 [ 0, %for.cond1.preheader.lr.ph.split ], [ %inc5, %for.inc4 ]
				br label %for.inc4

				for.inc4: ; preds = %for.cond1.preheader
				%inc5 = add i32 %i.015, 1
				%cmp = icmp ult i32 %inc5, %N2
				br i1 %cmp, label %for.cond1.preheader, label %for.cond.for.end6_crit_edge.split

				for.cond.for.end6_crit_edge.split: ; preds = %for.inc4
				br label %for.cond.for.end6_crit_edge

				for.cond.for.end6_crit_edge: ; preds = %for.cond.for.end6_crit_edge.split.us, %for.cond.for.end6_crit_edge.split
				br label %for.end6

				for.end6: ; preds = %for.cond.for.end6_crit_edge, %entry
				ret void
				}
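
The test above relies on one property: when the preheader check reports no overflow, the zero-extended narrow index `j + i * N1` equals the same expression evaluated in the wider type, so the `nuw` version of the chain is sound. The sketch below checks that property by brute force on a scaled-down model. The 16-bit/32-bit type pair is an assumption chosen so the loops stay small (the test itself uses i32 and i64), and both function names are hypothetical.

```cpp
#include <cstdint>

using Narrow = uint16_t; // assumed stand-in for the i32 induction variables
using Wide = uint32_t;   // assumed stand-in for the i64 GEP index

// Models the expected check: umul.with.overflow(N1, N2) followed by
// uadd.with.overflow(0, product); adding a zero start cannot overflow.
bool overflowCheck(Narrow N1, Narrow N2) {
  Wide Product = Wide{N1} * Wide{N2};
  return Product > UINT16_MAX;
}

// True when zext(j + i * N1) matches the wide index on every iteration of
// the loop nest (outer bound N2, inner bound N1, as in the test).
bool narrowIndexMatchesWide(Narrow N1, Narrow N2) {
  for (Wide I = 0; I < N2; ++I)
    for (Wide J = 0; J < N1; ++J) {
      Wide Exact = J + I * N1;                       // cannot wrap in 32 bits
      Narrow Wrapped = static_cast<Narrow>(Exact);   // truncation models wrap
      if (Wide{Wrapped} != Exact)
        return false;
    }
  return true;
}
```

In this model the check is conservative: for `N1 = N2 = 256` it reports overflow because the product is 65536, yet the largest index actually used is 65535 and no iteration wraps.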

llvm/test/Transforms/LoopIntWrapPredication/2d-array-transposed.ll

This file was added.

				; RUN: opt -passes=loop-int-wrap-predication -S < %s | FileCheck %s

				declare i32 @f()

				define void @foo(i32 %N1, i32 %N2, ptr %C) {
				; CHECK-LABEL: @foo(
				; CHECK: for.body3.lr.ph.us: ; preds = %for.cond1.preheader.us
				; CHECK-NEXT: [[UMUL:%.*]] = call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %N1, i32 %N2)
				; CHECK-NEXT: [[UMULRES:%.*]] = extractvalue { i32, i1 } [[UMUL]], 0
				; CHECK-NEXT: [[OVERFLOW1:%.*]] = extractvalue { i32, i1 } [[UMUL]], 1
				; CHECK-NEXT: [[UADD:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 0, i32 [[UMULRES]])
				; CHECK-NEXT: [[OVERFLOW2:%.*]] = extractvalue { i32, i1 } [[UADD]], 1
				; CHECK-NEXT: [[OVERFLOW:%.*]] = or i1 [[OVERFLOW1]], [[OVERFLOW2]]
				; CHECK-NEXT: br label %for.body3.us
				; CHECK: for.body3.us:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ 0, %for.body3.lr.ph.us ], [ %inc.us, %for.body3.us.split ]
				; CHECK-NEXT: [[CALL:%.*]] = tail call signext i32 @f()
				; CHECK-NEXT: br i1 [[OVERFLOW]], label %idxprom.us.then, label %idxprom.us.else
				; CHECK-DAG: idxprom.us.then:
				; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[INDVAR]], %N2
				; CHECK-NEXT: [[ADD:%.*]] = add i32 %i.015.us, [[MUL]]
				; CHECK-NEXT: br label %for.body3.us.split
				; CHECK-DAG: idxprom.us.else:
				; CHECK-NEXT: [[MUL_NOWRAP:%.*]] = mul nuw i32 [[INDVAR]], %N2
				; CHECK-NEXT: [[ADD_NOWRAP:%.*]] = add nuw i32 %i.015.us, [[MUL_NOWRAP]]
				; CHECK-NEXT: br label %for.body3.us.split
				; CHECK: for.body3.us.split:
				; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[ADD]], %idxprom.us.then ], [ [[ADD_NOWRAP]], %idxprom.us.else ]
				; CHECK-NEXT: [[IDX:%.*]] = zext i32 [[PHI]] to i64
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds i32, ptr %C, i64 [[IDX]]
				; CHECK-NEXT: store i32 [[CALL]], ptr [[PTR]], align 4
				entry:
				%cmp14 = icmp ult i32 0, %N2
				br i1 %cmp14, label %for.cond1.preheader.lr.ph, label %for.end6

				for.cond1.preheader.lr.ph: ; preds = %entry
				%cmp212 = icmp ult i32 0, %N1
				br i1 %cmp212, label %for.cond1.preheader.lr.ph.split.us, label %for.cond1.preheader.lr.ph.split

				for.cond1.preheader.lr.ph.split.us: ; preds = %for.cond1.preheader.lr.ph
				br label %for.cond1.preheader.us

				for.cond1.preheader.us: ; preds = %for.inc4.us, %for.cond1.preheader.lr.ph.split.us
				%i.015.us = phi i32 [ 0, %for.cond1.preheader.lr.ph.split.us ], [ %inc5.us, %for.inc4.us ]
				br label %for.body3.lr.ph.us

				for.body3.lr.ph.us: ; preds = %for.cond1.preheader.us
				br label %for.body3.us

				for.body3.us: ; preds = %for.body3.lr.ph.us, %for.body3.us
				%j.013.us = phi i32 [ 0, %for.body3.lr.ph.us ], [ %inc.us, %for.body3.us ]
				%call.us = tail call signext i32 @f()
				%mul.us = mul i32 %j.013.us, %N2
				%add.us = add i32 %i.015.us, %mul.us
				%idxprom.us = zext i32 %add.us to i64
				%arrayidx.us = getelementptr inbounds i32, ptr %C, i64 %idxprom.us
				store i32 %call.us, ptr %arrayidx.us, align 4
				%inc.us = add nuw i32 %j.013.us, 1
				%cmp2.us = icmp ult i32 %inc.us, %N1
				br i1 %cmp2.us, label %for.body3.us, label %for.cond1.for.inc4_crit_edge.us

				for.cond1.for.inc4_crit_edge.us: ; preds = %for.body3.us
				br label %for.inc4.us

				for.inc4.us: ; preds = %for.cond1.for.inc4_crit_edge.us
				%inc5.us = add i32 %i.015.us, 1
				%cmp.us = icmp ult i32 %inc5.us, %N2
				br i1 %cmp.us, label %for.cond1.preheader.us, label %for.cond.for.end6_crit_edge.split.us

				for.cond.for.end6_crit_edge.split.us: ; preds = %for.inc4.us
				br label %for.cond.for.end6_crit_edge

				for.cond1.preheader.lr.ph.split: ; preds = %for.cond1.preheader.lr.ph
				br label %for.cond1.preheader

				for.cond1.preheader: ; preds = %for.cond1.preheader.lr.ph.split, %for.inc4
				%i.015 = phi i32 [ 0, %for.cond1.preheader.lr.ph.split ], [ %inc5, %for.inc4 ]
				br label %for.inc4

				for.inc4: ; preds = %for.cond1.preheader
				%inc5 = add i32 %i.015, 1
				%cmp = icmp ult i32 %inc5, %N2
				br i1 %cmp, label %for.cond1.preheader, label %for.cond.for.end6_crit_edge.split

				for.cond.for.end6_crit_edge.split: ; preds = %for.inc4
				br label %for.cond.for.end6_crit_edge

				for.cond.for.end6_crit_edge: ; preds = %for.cond.for.end6_crit_edge.split.us, %for.cond.for.end6_crit_edge.split
				br label %for.end6

				for.end6: ; preds = %for.cond.for.end6_crit_edge, %entry
				ret void
				}
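As a reading aid, the loop-invariant check that the CHECK lines of this test expect in `for.body3.lr.ph.us` can be written as the C sketch below. The function name `transposed_index_may_wrap` is hypothetical, and the GCC/Clang builtins `__builtin_mul_overflow` and `__builtin_add_overflow` stand in for the `llvm.umul.with.overflow.i32` and `llvm.uadd.with.overflow.i32` intrinsics. This is an interpretation of the test expectations, not code taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the expected preheader check:
 * returns true when the 32-bit index j * N2 + i might wrap,
 * i.e. when the loop must keep the flag-free mul/add. */
bool transposed_index_may_wrap(uint32_t n1, uint32_t n2) {
    uint32_t prod, sum;
    /* llvm.umul.with.overflow.i32(%N1, %N2) */
    bool mul_wraps = __builtin_mul_overflow(n1, n2, &prod);
    /* llvm.uadd.with.overflow.i32(0, product); kept to match the test */
    bool add_wraps = __builtin_add_overflow((uint32_t)0, prod, &sum);
    return mul_wraps || add_wraps;
}
```

When the helper returns false, the test expects the `mul nuw` / `add nuw` variants on the `idxprom.us.else` path to be used.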

llvm/test/Transforms/LoopIntWrapPredication/add-to-sub.ll

This file was added.

				; RUN: opt -passes=loop-int-wrap-predication -S < %s | FileCheck %s

				declare i32 @f()

				define void @foo(i32 %offset, i32 %N, ptr %C) {
				; CHECK-LABEL: @foo(
				; CHECK: for.body.lr.ph:
				; CHECK-NEXT: [[OFFSET_INC:%.*]] = add i32 %offset, 1
				; CHECK-NEXT: [[UMAX:%.*]] = call i32 @llvm.umax.i32(i32 %N, i32 [[OFFSET_INC]])
				; CHECK-NEXT: [[TRIP_COUNT:%.*]] = sub i32 [[UMAX]], %offset
				; CHECK-NEXT: [[START:%.*]] = add i32 %offset, -1
				; CHECK-NEXT: [[UADD:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 [[START]], i32 [[TRIP_COUNT]])
				; CHECK-NEXT: [[OVERFLOW:%.*]] = extractvalue { i32, i1 } [[UADD]], 1
				; CHECK-NEXT: br label %for.body
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ %offset, %for.body.lr.ph ], [ %inc, %for.body.split ]
				; CHECK-NEXT: [[CALL:%.*]] = tail call signext i32 @f()
				; CHECK-NEXT: br i1 [[OVERFLOW]], label %idxprom.then, label %idxprom.else
				; CHECK-DAG: idxprom.then:
				; CHECK-NEXT: [[SUB:%.*]] = add i32 [[INDVAR]], -1
				; CHECK-NEXT: br label %for.body.split
				; CHECK-DAG: idxprom.else:
				; CHECK-NEXT: [[SUB_NOWRAP:%.*]] = sub nuw i32 [[INDVAR]], 1
				; CHECK-NEXT: br label %for.body.split
				; CHECK: for.body.split:
				; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[SUB]], %idxprom.then ], [ [[SUB_NOWRAP]], %idxprom.else ]
				; CHECK-NEXT: [[IDX:%.*]] = zext i32 [[PHI]] to i64
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds i32, ptr %C, i64 [[IDX]]
				; CHECK-NEXT: store i32 [[CALL]], ptr [[PTR]], align 4
				entry:
				%cmp3 = icmp ult i32 0, %N
				br i1 %cmp3, label %for.body.lr.ph, label %for.cond.cleanup

				for.body.lr.ph: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.body.lr.ph, %for.body
				%i.04 = phi i32 [ %offset, %for.body.lr.ph ], [ %inc, %for.body ]
				%call = tail call signext i32 @f()
				%sub = add i32 %i.04, -1
				%idxprom = zext i32 %sub to i64
				%arrayidx = getelementptr inbounds i32, ptr %C, i64 %idxprom
				store i32 %call, ptr %arrayidx, align 4
				%inc = add nuw i32 %i.04, 1
				%cmp = icmp ult i32 %inc, %N
				br i1 %cmp, label %for.body, label %for.cond.for.cond.cleanup_crit_edge

				for.cond.for.cond.cleanup_crit_edge: ; preds = %for.body
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %entry, %for.cond.for.cond.cleanup_crit_edge
				ret void
				}
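The preheader arithmetic expected by the CHECK lines of this test is less obvious than in the other tests, so here is a C sketch of it. The function name `sub_index_may_wrap` and the local variable names are hypothetical; `__builtin_add_overflow` is the GCC/Clang builtin standing in for `llvm.uadd.with.overflow.i32`. It is a reading of the expected output, not code from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the expected preheader check for
 * the index i - 1, with i starting at `offset` and running while i < n. */
bool sub_index_may_wrap(uint32_t offset, uint32_t n) {
    uint32_t offset_inc = offset + 1u;                   /* add i32 %offset, 1 */
    uint32_t umax = n > offset_inc ? n : offset_inc;     /* llvm.umax.i32 */
    uint32_t trip_count = umax - offset;                 /* sub i32 umax, %offset */
    uint32_t start = offset - 1u;                        /* add i32 %offset, -1 */
    uint32_t end;
    /* llvm.uadd.with.overflow.i32(start, trip_count) */
    return __builtin_add_overflow(start, trip_count, &end);
}
```

With `offset == 0` the start value wraps to `UINT32_MAX`, so the check reports a possible wrap and the loop keeps the plain `add i32 ..., -1`; otherwise the test expects the `sub nuw` form to be selected.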

llvm/test/Transforms/LoopIntWrapPredication/basic.ll

This file was added.

				; RUN: opt -passes=loop-int-wrap-predication -S < %s | FileCheck %s

				declare i32 @f()

				define void @foo(i32 %offset, i32 %N, ptr %C) {
				; CHECK-LABEL: @foo(
				; CHECK: for.body.lr.ph:
				; CHECK-NEXT: [[UADD:%.*]] = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %offset, i32 %N)
				; CHECK-NEXT: [[OVERFLOW:%.*]] = extractvalue { i32, i1 } [[UADD]], 1
				; CHECK-NEXT: br label %for.body
				; CHECK: for.body:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body.split ]
				; CHECK-NEXT: [[CALL:%.*]] = tail call signext i32 @f()
				; CHECK-NEXT: br i1 [[OVERFLOW]], label %idxprom.then, label %idxprom.else
				; CHECK-DAG: idxprom.then:
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[INDVAR]], %offset
				; CHECK-NEXT: br label %for.body.split
				; CHECK-DAG: idxprom.else:
				; CHECK-NEXT: [[ADD_NOWRAP:%.*]] = add nuw i32 [[INDVAR]], %offset
				; CHECK-NEXT: br label %for.body.split
				; CHECK: for.body.split:
				; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ [[ADD]], %idxprom.then ], [ [[ADD_NOWRAP]], %idxprom.else ]
				; CHECK-NEXT: [[IDX:%.*]] = zext i32 [[PHI]] to i64
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds i32, ptr %C, i64 [[IDX]]
				; CHECK-NEXT: store i32 [[CALL]], ptr [[PTR]], align 4
				entry:
				%cmp3 = icmp ult i32 0, %N
				br i1 %cmp3, label %for.body.lr.ph, label %for.cond.cleanup

				for.body.lr.ph: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.body.lr.ph, %for.body
				%i.04 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
				%call = tail call signext i32 @f()
				%add = add i32 %i.04, %offset
				%idxprom = zext i32 %add to i64
				%arrayidx = getelementptr inbounds i32, ptr %C, i64 %idxprom
				store i32 %call, ptr %arrayidx, align 4
				%inc = add nuw i32 %i.04, 1
				%cmp = icmp ult i32 %inc, %N
				br i1 %cmp, label %for.body, label %for.cond.for.cond.cleanup_crit_edge

				for.cond.for.cond.cleanup_crit_edge: ; preds = %for.body
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %entry, %for.cond.for.cond.cleanup_crit_edge
				ret void
				}
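To make the intent of this basic test easier to follow, the sketch below shows in C what the predicated loop amounts to once the loop-invariant branch has been unswitched. The names `index_may_wrap` and `fill` are hypothetical, the callback parameter replaces the external `@f`, and `__builtin_add_overflow` is the GCC/Clang builtin standing in for `llvm.uadd.with.overflow.i32`. It assumes a later pass performs the unswitching and widening; the patch itself only inserts the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when offset + n wraps in 32 bits,
 * matching the overflow bit the test expects in for.body.lr.ph. */
bool index_may_wrap(uint32_t offset, uint32_t n) {
    uint32_t sum;
    return __builtin_add_overflow(offset, n, &sum);
}

/* Hypothetical versioned form of the loop in this test. */
void fill(uint32_t offset, uint32_t n, int32_t *c, int32_t (*f)(void)) {
    assert(c && f);
    if (index_may_wrap(offset, n)) {
        /* Wrapping path: keep 32-bit arithmetic, then zero-extend. */
        for (uint32_t i = 0; i < n; ++i)
            c[(uint64_t)(uint32_t)(i + offset)] = f();
    } else {
        /* No-wrap path: the index can be computed directly in 64 bits. */
        uint64_t base = offset;
        for (uint64_t i = 0; i < n; ++i)
            c[base + i] = f();
    }
}
```

Both paths store to the same elements whenever the check reports no wrap; the second form simply avoids the per-iteration zero-extension.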

llvm/test/Transforms/LoopIntWrapPredication/non-invariant-trip-count.ll

This file was added.

				; RUN: opt -passes=loop-int-wrap-predication -S < %s | FileCheck %s

				; COM: Linear memory access with a variable number of steps in the inner loop:
				; COM: unsigned offset = 0;
				; COM: for (unsigned i = 0; i < N; ++i) {
				; COM:   offset += i;
				; COM:   for (unsigned j = 0; j < i + 1; ++j)
				; COM:     C[offset + j] = f();
				; COM: }

				declare i32 @f()

				define void @foo(i32 %N, ptr %C) {
				; COM: The pass is not applied because the trip count of the inner loop is SCEVCouldNotCompute.
				; CHECK-LABEL: @foo(
				; CHECK: for.body4:
				; CHECK-NEXT: [[INDVAR:%.*]] = phi i32 [ 0, %for.body ], [ %inc, %for.body4 ]
				; CHECK-NEXT: [[CALL:%.*]] = tail call signext i32 @f()
				; CHECK-NEXT: [[ADD:%.*]] = add i32 [[INDVAR]], %add
				; CHECK-NEXT: [[IDX:%.*]] = zext i32 [[ADD]] to i64
				; CHECK-NEXT: [[PTR:%.*]] = getelementptr inbounds i32, ptr %C, i64 [[IDX]]
				; CHECK-NEXT: store i32 [[CALL]], ptr [[PTR]], align 4
				entry:
				%cmp16 = icmp ult i32 0, %N
				br i1 %cmp16, label %for.body, label %for.end8

				for.cond.loopexit: ; preds = %for.body4
				%add2.le = add nuw i32 %i.017, 1
				%cmp = icmp ult i32 %add2.le, %N
				br i1 %cmp, label %for.body, label %for.end8

				for.body: ; preds = %entry, %for.cond.loopexit
				%offset.018 = phi i32 [ %add, %for.cond.loopexit ], [ 0, %entry ]
				%i.017 = phi i32 [ %add2.le, %for.cond.loopexit ], [ 0, %entry ]
				%add = add i32 %offset.018, %i.017
				br label %for.body4

				for.body4: ; preds = %for.body, %for.body4
				%j.015 = phi i32 [ 0, %for.body ], [ %inc, %for.body4 ]
				%call = tail call signext i32 @f()
				%add5 = add i32 %j.015, %add
				%idxprom = zext i32 %add5 to i64
				%arrayidx = getelementptr inbounds i32, ptr %C, i64 %idxprom
				store i32 %call, ptr %arrayidx, align 4
				%inc = add nuw i32 %j.015, 1
				%cmp3.not = icmp ugt i32 %inc, %i.017
				br i1 %cmp3.not, label %for.cond.loopexit, label %for.body4

				for.end8: ; preds = %for.cond.loopexit, %entry
				ret void
				}

Diff 481188
