Any idea why this is hardcoded?
Is it okay to change the value from 6 to 8 (or some other value) to resolve the following problem?
Right now, at -O2, -early-cse replaces a load with the value of an earlier store before -instcombine runs,
which leaves -instcombine unable to recognize subsequent duplicate loads as duplicates to be removed.
For example, consider the following code snippet:
  %i = alloca %struct.A, align 4
  %j = alloca %struct.A, align 4
  %0 = bitcast %struct.A* %i to i8*
  call void @llvm.lifetime.start(i64 4, i8* %0) #3
  %call = call i32 @_Z3onev()
  %a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
  store i32 %call, i32* %a2.i, align 4, !tbaa !1
  %1 = bitcast %struct.A* %i to i8*
  %2 = call {}* @llvm.invariant.start(i64 4, i8* %1) #3
  %3 = bitcast %struct.A* %j to i8*
  call void @llvm.lifetime.start(i64 4, i8* %3) #3
  %4 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
  %5 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
  %6 = load i32, i32* %4, align 4
  store i32 %6, i32* %5, align 4, !tbaa !6
  %7 = bitcast %struct.A* %j to i8*
  %8 = call {}* @llvm.invariant.start(i64 4, i8* %7) #3
  call void @_Z3bar1A(i32 %6)
  call void @_Z3bar1A(i32 %6)
  call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
  %9 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
  %10 = load i32, i32* %9, align 4 ; <--- duplicate. Should be removed.
  call void @_Z3bar1A(i32 %10)
After -early-cse, the above becomes
  %i = alloca %struct.A, align 4
  %j = alloca %struct.A, align 4
  %0 = bitcast %struct.A* %i to i8*
  call void @llvm.lifetime.start(i64 4, i8* %0) #3
  %call = call i32 @_Z3onev()
  %a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
  store i32 %call, i32* %a2.i, align 4, !tbaa !1
  %1 = call {}* @llvm.invariant.start(i64 4, i8* %0) #3
  %2 = bitcast %struct.A* %j to i8*
  call void @llvm.lifetime.start(i64 4, i8* %2) #3
  %3 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
  store i32 %call, i32* %3, align 4, !tbaa !6
  %4 = call {}* @llvm.invariant.start(i64 4, i8* %2) #3
  call void @_Z3bar1A(i32 %call)
  call void @_Z3bar1A(i32 %call)
  call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
  %5 = load i32, i32* %a2.i, align 4 ; <--- duplicate. Should be removed.
  call void @_Z3bar1A(i32 %5)
where the first load from %i has been replaced with the value (%call) stored into %a2.i, which points into %i.
For -instcombine to remove the duplicate load above, either
- -early-cse should not merge the first load into the store -- thereby treating the load as not a "trivially redundant instruction", or
- -instcombine should allow FindAvailableLoadedValue() to scan more than 6 instructions.
Note that @llvm.invariant.start() calls are ignored in the count, just like @llvm.lifetime.start().
A quick run found 8 to be the minimum value needed for this example.
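To illustrate why the scan limit matters, here is a toy model of a bounded backward scan. It is only a sketch and not LLVM's FindAvailableLoadedValue(): the Inst type, the Kind enum, findAvailableValue() and makeExampleBlock() are all made up for illustration, aliasing and clobbering calls are not modeled, and the way instructions are counted may differ from the real implementation. The model assumes that marker intrinsics (standing in for @llvm.lifetime.start and @llvm.invariant.start) are skipped without being counted.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy instruction kinds; this is NOT LLVM's IR representation.
enum class Kind { Store, Load, Marker, Other };

struct Inst {
  Kind kind;
  std::string ptr;   // address operand for Store/Load, empty otherwise
  std::string value; // stored value for Store, empty otherwise
};

// Scan backwards from index `from` (exclusive) for a store to `ptr`,
// examining at most `maxToScan` instructions. Marker instructions are
// skipped and do not count against the limit (an assumption of this model).
std::optional<std::string> findAvailableValue(const std::vector<Inst> &block,
                                              std::size_t from,
                                              const std::string &ptr,
                                              unsigned maxToScan = 6) {
  unsigned scanned = 0;
  for (std::size_t i = from; i-- > 0;) {
    const Inst &inst = block[i];
    if (inst.kind == Kind::Marker)
      continue; // ignored in the count
    if (scanned++ == maxToScan)
      return std::nullopt; // budget exhausted, give up
    if (inst.kind == Kind::Store && inst.ptr == ptr)
      return inst.value; // the stored value can replace the load
  }
  return std::nullopt; // reached the start of the block
}

// Hypothetical block: a store to "p", six unrelated instructions with two
// markers mixed in, then a load from "p" at the last index.
std::vector<Inst> makeExampleBlock() {
  std::vector<Inst> block;
  block.push_back({Kind::Store, "p", "v"});
  for (int i = 0; i < 6; ++i) {
    block.push_back({Kind::Other, "", ""});
    if (i % 3 == 0)
      block.push_back({Kind::Marker, "", ""});
  }
  block.push_back({Kind::Load, "p", ""});
  return block;
}
```

In this made-up block the store is the seventh counted instruction behind the load, so findAvailableValue(block, 9, "p") with the default limit of 6 gives up, while passing 7 finds the stored value. Taking the limit as a parameter rather than a constant is the point of the sketch.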
The usual way to make a value like this configurable in LLVM is "cl::opt". Take a look at UnrollCount for an example.