This is an archive of the discontinued LLVM Phabricator instance.

[NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.
Needs ReviewPublic

Authored by kmitropoulou on Jun 13 2022, 3:04 AM.

Details

Summary

In the following example, the load can either read the value 100 or the value 500. Therefore, it is straightforward that we can replace the load with a phi node as it is shown below:

Example 1:
Before Load Coercion:

       BB1:                        BB2:
        store i32 100, i32* %P      store i32 500, i32* %P
	                      \    /
			    BB3:
			       %V = load i32, i32* %P

After Load Coercion:

       BB1:                        BB2:
        store i32 100, i32* %P      store i32 500, i32* %P
	                      \    /
	                   BB3:
                             %phi = phi i32 [ 100, %BB1], [ 500, %BB2 ]

In the following example, the load has only one dependency in BB1. To eliminate the load we add an artificial dependency in BB2 (%V1) and we replace the load of BB3 with a phi node. In the worst case scenario, the load of BB2 (%V1) will be executed as many times as the load in BB3.

Example 2:
Before Load Coercion:

       BB1:                         BB2:
        store i32 100, i32* %P       |
	                       \    /
			      BB3:
			         %V = load i32, i32* %P

After Load Coercion:

       BB1:                        BB2:
        store i32 100, i32* %P      %V1 = load i32, i32* %P
	                       \    /
	                       BB3:
                                 %phi = phi i32 [ 100, %BB1], [ %V1, %BB2 ]

The algorithm has two main steps:

Collection of loads: In Example 3, the load of BB3 is dependent on the the memory phi of BB3. In this case, the operands of the memory phi are the also the operands of the phi that will replace the load of BB3.

Example 3:

     BB1:                         BB2: 
      1 = MemoryDef(liveOnEntry)   2 = MemoryDef(liveOnEntry)
      store i32 100, i32* %P       store i32 500, i32* %P
                             \    /
                              BB3:
		               3 = MemoryPhi({T,1},{F,2})
			       %V = load i32, i32* %P

Next, we check if there are any live-in loads that can be replaced by a phi node. In the following example, we can add a fake dependency in BB2. In this way, we can replace the load of BB3 with a phi node.

Example 4:

   BB1:                                 BB2:
    0 = MemoryDef(liveOnEntry)           |
    %V1 = load <2xi32>, <2xi32>* %P      |
                                 \      /
        	                BB3:
			           %V = load i32, i32* %P

Code generation: As it is shown in Examples 1 and 2, there are two code generation scenarios: i. If the load has as many dependencies as its predecessors, then we just replace the load with a phi as it is shown in Example 1. ii. If one of the predecessors of the basic block of the load does not have a dependency, then we emit a fake dependency in this predecessor. Next, we can replace the load with a phi node as it is shown in Example 2.

Event Timeline

kmitropoulou created this revision.Jun 13 2022, 3:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2022, 3:04 AM
kmitropoulou requested review of this revision.Jun 13 2022, 3:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2022, 3:04 AM
kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 3:07 AM
kmitropoulou added reviewers: asbirlea, nikic, fhahn, davide.
kmitropoulou added inline comments.Jun 13 2022, 3:10 AM
llvm/test/Transforms/NewGVN/basic-cyclic-opt.ll
258

I need to check why NewGVN does not eliminate the add.

llvm/test/Transforms/NewGVN/rle-nonlocal.ll
20

I need to investigate this more.

kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 3:33 AM
kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 3:56 AM
kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 4:01 AM
kmitropoulou edited the summary of this revision. (Show Details)Jun 13 2022, 11:19 AM
kmitropoulou edited the summary of this revision. (Show Details)

Updating D127628: [NewGVN][LoadCoercion][3/3] Replace load with a phi node

Updating D127628: [NewGVN][LoadCoercion][3/3] Replace load with a phi node

Updating D127628: [NewGVN][LoadCoercion][3/3] Replace load with a phi node

foad added a subscriber: foad.Aug 22 2022, 1:40 AM

In example 2 you insert a new instruction %V1 = load i32, i32* %P at the end of BB2. I'm not sure this is safe in general. For example, if BB2 has other successors:

BB1     BB2
   \   /   \
    BB3     BB4

then the new load instruction will be executed even if BB2 is going to branch to BB4, and in that case the load might trap.

As another example, what if BB3 looked like this:

BB3:
  call @foo // this function never returns
  %V = load i32, i32* %P // this load always traps

If you insert a load at the end of BB2 then it will trap before calling @foo, which changes the behaviour of the program.

Updating D127628: [NewGVN][LoadCoercion][3/3] Replace load with a phi node

In example 2 you insert a new instruction %V1 = load i32, i32* %P at the end of BB2. I'm not sure this is safe in general. For example, if BB2 has other successors:

BB1     BB2
   \   /   \
    BB3     BB4

then the new load instruction will be executed even if BB2 is going to branch to BB4, and in that case the load might trap.

In this case, I bail out.

As another example, what if BB3 looked like this:

BB3:
  call @foo // this function never returns
  %V = load i32, i32* %P // this load always traps

If you insert a load at the end of BB2 then it will trap before calling @foo, which changes the behaviour of the program.

I added implicit control flow tracking to detect such cases.

kmitropoulou added inline comments.Sep 15 2022, 10:20 PM
llvm/test/Transforms/NewGVN/basic-cyclic-opt.ll
258

If load coercion is successful, we remove the load and we update its uses with the coerced value. In this case, the updated instructions can be eliminated further. For this reason, we run value numbering for the uses of the load that was optimized.

kmitropoulou retitled this revision from [NewGVN][LoadCoercion][3/3] Replace load with a phi node to [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block..Sep 15 2022, 10:21 PM

Updating D127628: [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.

Updating D127628: [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.

Updating D127628: [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.

Updating D127628: [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.

Updating D127628: [NewGVN][LoadCoercion][3/3] Partial redundant load elimination for load instructions that are dependent on a MemoryPhi or another load instruction in a predecessor basic block.