Any idea why the scan limit of 6 instructions is hardcoded?
Is it okay to change the value from 6 to 8 (or some other value) to resolve the following problem?
Right now, at -O2, -early-cse replaces a load with the value of the preceding store, too early for -instcombine to recognize
a subsequent duplicate load as an actual duplicate to be removed.
For example, consider the following code snippet:
%i = alloca %struct.A, align 4
%j = alloca %struct.A, align 4
%0 = bitcast %struct.A* %i to i8*
call void @llvm.lifetime.start(i64 4, i8* %0) #3
%call = call i32 @_Z3onev()
%a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
store i32 %call, i32* %a2.i, align 4, !tbaa !1
%1 = bitcast %struct.A* %i to i8*
%2 = call {}* @llvm.invariant.start(i64 4, i8* %1) #3
%3 = bitcast %struct.A* %j to i8*
call void @llvm.lifetime.start(i64 4, i8* %3) #3
%4 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
%5 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
%6 = load i32, i32* %4, align 4
store i32 %6, i32* %5, align 4, !tbaa !6
%7 = bitcast %struct.A* %j to i8*
%8 = call {}* @llvm.invariant.start(i64 4, i8* %7) #3
call void @_Z3bar1A(i32 %6)
call void @_Z3bar1A(i32 %6)
call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
%9 = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
%10 = load i32, i32* %9, align 4 ; <--- duplicate. Should be removed.
call void @_Z3bar1A(i32 %10)
After -early-cse, the above becomes:
%i = alloca %struct.A, align 4
%j = alloca %struct.A, align 4
%0 = bitcast %struct.A* %i to i8*
call void @llvm.lifetime.start(i64 4, i8* %0) #3
%call = call i32 @_Z3onev()
%a2.i = getelementptr inbounds %struct.A, %struct.A* %i, i64 0, i32 0
store i32 %call, i32* %a2.i, align 4, !tbaa !1
%1 = call {}* @llvm.invariant.start(i64 4, i8* %0) #3
%2 = bitcast %struct.A* %j to i8*
call void @llvm.lifetime.start(i64 4, i8* %2) #3
%3 = getelementptr inbounds %struct.A, %struct.A* %j, i64 0, i32 0
store i32 %call, i32* %3, align 4, !tbaa !6
%4 = call {}* @llvm.invariant.start(i64 4, i8* %2) #3
call void @_Z3bar1A(i32 %call)
call void @_Z3bar1A(i32 %call)
call void @_Z4foo2PK1AS1_(%struct.A* nonnull %j, %struct.A* nonnull %i)
%5 = load i32, i32* %a2.i, align 4 ; <--- duplicate. Should be removed.
call void @_Z3bar1A(i32 %5)
where the first load from %i has been replaced with the value stored into %a2.i, which points to %i.
For -instcombine to remove the duplicate load above, either
- -early-cse should not merge the first load into the store -- thereby treating the load as not a "trivially redundant instruction", or
- -instcombine should allow FindAvailableLoadedValue() to scan more than 6 instructions.
Note that @llvm.invariant.start() calls are ignored in the count, just like @llvm.lifetime.start().
A quick run found 8 to be the minimum value needed for this example.
The usual way to do this in LLVM is "cl::opt". Take a look at UnrollCount for an example.