This is an archive of the discontinued LLVM Phabricator instance.

[Mem2Reg] Enable promotion for bitcastable load/store values
AbandonedPublic

Authored by sdmitriev on Jan 16 2019, 2:13 PM.

Details

Summary

This patch enables Mem2Reg to handle loads/stores from/to bitcasted alloca
values as long as the loaded/stored value is bitcastable to the allocated
type (see example below). Such instruction sequences can be introduced by
the InstCombine pass as part of the load canonicalization.

  
%f = alloca float, align 4
...
%0 = getelementptr inbounds float, float* %A, i64 %idx
%1 = bitcast float* %0 to i32*
%2 = load i32, i32* %1, align 4
%3 = bitcast float* %f to i32*
store i32 %2, i32* %3, align 4

Diff Detail

Event Timeline

sdmitriev created this revision.Jan 16 2019, 2:13 PM

Associated patch for clang tests https://reviews.llvm.org/D56811

What are you trying to accomplish here? The standard optimization doesn't ever invoke "-mem2reg", and "-sroa" already handles your testcase.

I am changing PromoteMemToReg function which can be called from other places besides mem2reg pass, so this change is not limited to mem2reg pass only. Regarding SROA, it does not handle alloca instructions outside of function entry block. We need to create allocas outside of entry in some cases and the use of PromoteMemToReg() seems to be the only option for registerizing them.

I am changing PromoteMemToReg function which can be called from other places besides mem2reg pass, so this change is not limited to mem2reg pass only.

So this is primarily for the benefit of some out-of-tree pass?

PromoteMemToReg is critical for performance, so changing the data structures is going to require a really deep review.

We need to create allocas outside of entry in some cases and the use of PromoteMemToReg() seems to be the only option for registerizing them.

For common use-cases (e.g. clang), LLVM IR contains very few allocas outside of the entry block. If you're working with a language which needs to produce a lot of dynamic allocas for some reason, it's probably worth implementing an optimization to hoist allocas into the entry block, where it's legal. Not sure what sort of language construct would require that, though.

We use this in the internal product based on LLVM. I am not changing existing PromoteMemToReg functionality, just extending it to allow alloca use-def chain to have GEPs with zero indices and bitcasts between alloca and dependent loads/stores as long as the load/store type is bitcastable to the allocated type. This could be beneficial for the common case as well.

Does anyone have any comments on this match?

Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2019, 9:31 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
jfb added a comment.Mar 8 2019, 9:42 AM

I am changing PromoteMemToReg function which can be called from other places besides mem2reg pass, so this change is not limited to mem2reg pass only.

So this is primarily for the benefit of some out-of-tree pass?

PromoteMemToReg is critical for performance, so changing the data structures is going to require a really deep review.

We need to create allocas outside of entry in some cases and the use of PromoteMemToReg() seems to be the only option for registerizing them.

For common use-cases (e.g. clang), LLVM IR contains very few allocas outside of the entry block. If you're working with a language which needs to produce a lot of dynamic allocas for some reason, it's probably worth implementing an optimization to hoist allocas into the entry block, where it's legal. Not sure what sort of language construct would require that, though.

It seems Eli had open questions here. It would be good to answer them before proceeding.

Ok. As I said earlier we need to created allocas outside of the entry block, and hoisting allocas to the entry block as Eli suggested above would definitely solve the registerization problem because of SROA, but in our case these allocas cannot be hoisted therefore SROA just does not look at them. So calling PromoteMemToReg for such allocas seems to be the only way to force registerization for them. That is the reason why I am trying to enhance it to handle bitcasts even though this functionality already exists in SROA.

jfb added a comment.Mar 8 2019, 10:19 AM

Ok. As I said earlier we need to created allocas outside of the entry block, and hoisting allocas to the entry block as Eli suggested above would definitely solve the registerization problem because of SROA, but in our case these allocas cannot be hoisted therefore SROA just does not look at them. So calling PromoteMemToReg for such allocas seems to be the only way to force registerization for them. That is the reason why I am trying to enhance it to handle bitcasts even though this functionality already exists in SROA.

Why can't the allocas be hoisted? It goes against the grain of what IR expects, and adds some complexity as well as duplicates optimizations. If we start supporting this use case better, I'd rather understand why. Maybe it has other implications.

We need this for the internal product which is based on LLVM and there are reasons why hoisting cannot be done in our case (this is actually a different topic).
But regarding the IR expectations, I do not think there are any restrictions for alloca placement. Consider the following example

void foo(int x) {
  do {
    alloca(1);
  } while (x--);
}

It will have the following IR after front end with alloca outside of the entry block

define dso_local void @foo(i32 %x) #0 {
entry:
  %x.addr = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4
  br label %do.body

do.body:                                          ; preds = %do.cond, %entry
  %0 = alloca i8, i64 1, align 16
  br label %do.cond

do.cond:                                          ; preds = %do.body
  %1 = load i32, i32* %x.addr, align 4
  %dec = add nsw i32 %1, -1
  store i32 %dec, i32* %x.addr, align 4
  %tobool = icmp ne i32 %1, 0
  br i1 %tobool, label %do.body, label %do.end

do.end:                                           ; preds = %do.cond
  ret void
}

BTW, hoisting alloca to the entry block for this example would not be legal, though I think it would still be legal to try registerizing it (for this particular example it does not make sense, but I think it can be modified where it would make sense).

sdmitriev abandoned this revision.Sep 26 2019, 9:53 AM