This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Transforms/Scalar/LICM.cpp
1855	Removed.
2145–2146	I am not sure about writing to a constant. Can you elaborate with an example where it would fail?
llvm/test/Transforms/LICM/reg-promote.ll
17 ↗	(On Diff #447229)	I used `mem2reg` to remove allocas from the test, but it changes the program in a way that prevents the optimization I am proposing.
187 ↗	(On Diff #447229)	Removed.

Harbormaster completed remote builds in B178719: Diff 449225.Aug 2 2022, 2:50 AM

Can we make this work off the existing clang option -mthread-model=single?

Use ThreadModel::Single instead of custom flag.

In D130466#3694507, @efriedma wrote:

Can we make this work off the existing clang option -mthread-model=single?

Done.

I have removed the flag. Now, if ThreadModel::Single is set, LICM will hoist load.

Harbormaster completed remote builds in B179015: Diff 449641.Aug 3 2022, 6:55 AM

fhahn added inline comments.Aug 3 2022, 9:44 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	This doesn't look quite right. You are just comparing the enum value, not the actual configured ThreadModel I think. You likely have to check the value returned by `getThreadModel`.

xbolva00 added inline comments.Aug 3 2022, 9:50 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	and no tests?

gsocshubham added inline comments.Aug 4 2022, 7:41 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	`getThreadModel()` is declared here - https://github.com/llvm/llvm-project/blob/d0541b47000739c68c540170c6b9790ec1ea3b77/llvm/include/llvm/CodeGen/CommandFlags.h#L42 The definition is present in clang - https://github.com/llvm/llvm-project/blob/448adfee05b737a26dda34e7ae2cd4948760fff0/clang/include/clang/Driver/ToolChain.h#L573 `virtual std::string getThreadModel() const { return "posix"; }` I am not sure how I can use `getThreadModel()` in LICM.cpp. Is there any other way to know the ThreadModel?
2145–2146	I have added a test here - `hoist-load-without-store.ll` - where the load is hoisted out of the loop. I did not understand what you mean by no tests.

xbolva00 added inline comments.Aug 4 2022, 8:04 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	So maybe clang can run llvm with -mllvm -your-new-flag-to-allow-data-races if threadmodel is single?

xbolva00 added inline comments.Aug 4 2022, 8:06 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	Tests that we perform this optimization only when threadmodel is single

hiraditya added inline comments.Aug 5 2022, 7:57 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	Yeah, we need tests that check the optimization is performed when the thread model is single, and is not performed otherwise.

In D130466#3694507, @efriedma wrote:

Can we make this work off the existing clang option -mthread-model=single?

-mthread-model single does not work for aarch64 and sparc; I get the error below without any change in the code -

clang-15: error: invalid thread model 'single' in '-mthread-model single' for this target

Target: aarch64-unknown-linux-gnueabihf
Thread model: posix

Since -mthread-model single is not supported on most architectures, can we keep AllowDataRaces as a separate flag, independent of -mthread-model?

I have compiled below testcase with options -

../install-aarch64/bin/clang -mthread-model single thread.cpp -S -Ofast
clang-15: error: invalid thread model 'single' in '-mthread-model single' for this target

-------thread.cpp----------

int u, v, restrict, i;

void f(int a[restrict], int b[restrict], int n) {
    for (i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

Also, I tried with SPARC -

../install-sparc/bin/clang -mthread-model single thread.cpp -S -Ofast
clang-15: error: invalid thread model 'single' in '-mthread-model single' for this target

while with thread model posix, the above commands work fine.

gsocshubham added inline comments.Aug 9 2022, 2:31 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	Passing `-mllvm -your-new-flag-to-allow-data-races` to clang will work with my original submission, in which there is a new flag AllowDataRaces in LICM. I have tried the following approach to let LICM know whether the thread model is single: a. In `clang/lib/CodeGen/BackendUtil.cpp`, we have the thread model info via `LangOpts.getThreadModel()`. b. I pass a bool variable to PassBuilder, which in turn passes it to LICM. c. Based on this bool variable, LICM knows whether the thread model is single and hence whether to hoist the load out of the loop. I did this as a trial, modelled on the `AllowSpeculation` variable that is already passed to LICM. I don't think it is the optimal/suggested solution. Following my other comment, I first want to get clarity on the `-mthread-model single` option. Can you comment on it?

gsocshubham added inline comments.Aug 9 2022, 3:42 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	If we pass thread model information as above, then we don't need the AllowDataRaces flag at all. But I think there should be a better way than the above for the clang driver to pass information to LLVM passes.

@greened - Can you give suggestions on above comments?

In D130466#3709163, @gsocshubham wrote:

I have compiled below testcase with options -

../install-aarch64/bin/clang -mthread-model single thread.cpp -S -Ofast
clang-15: error: invalid thread model 'single' in '-mthread-model single' for this target

-------thread.cpp----------

int u, v, restrict, i;

void f(int a[restrict], int b[restrict], int n) {
    for (i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

Also, I tried with SPARC -

../install-sparc/bin/clang -mthread-model single thread.cpp -S -Ofast
clang-15: error: invalid thread model 'single' in '-mthread-model single' for this target

while with thread model posix, the above commands work fine.

Check for examples here: test/Driver/thread-model.c

And list of architectures supporting single thread model in https://clang.llvm.org/doxygen/ToolChain_8cpp_source.html#l00729

Fix review comments

Updated patch with updated test LICM/promote-sink-store.ll - shows load hoisting and store sinking if flag is set.

Added a test LICM/without-allow-data-race.ll which prohibits load hoisting and store sinking if the flag is not set.

AllowDataRaces flag can be passed directly to clang -

../install/bin/clang -Ofast -S promote-sink-store.ll -emit-llvm -target aarch64-linux -mllvm -allow-data-races

where IR is obtained from below C testcase -

int u, v;

void f(int a[restrict], int b[restrict], int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

TODO -

Remove the -allow-data-races flag. But how feasible would it be to pass thread model information from clang to the LICM pass via PassBuilder?

gsocshubham added inline comments.Aug 12 2022, 2:43 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2145–2146	@hiraditya - I have added test `LICM/without-allow-data-race.ll` which shows that optimization is not done when flag is not set - just for confirmation purpose. I think that it will be redundant test compared to `LICM/promote-sink-store.ll` and I should probably remove it once patch is ready to commit.

Harbormaster completed remote builds in B180871: Diff 452118.Aug 12 2022, 3:15 AM

Sink store if architecture supports thread model single.

1. If thread model single is set, then the store is sunk.
2. Architectures which do not support -mthread-model single can use the command line argument -thread-model-single instead.

Let me know whether or not to keep approach (2).

Herald added a subscriber: ormris.Aug 12 2022, 5:55 AM

Now for arm-linux when compiled with -Ofast -

int u, v;

void f(int a[restrict], int b[restrict], int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

we get -

target triple = "armv4t-unknown-linux"

@u = dso_local local_unnamed_addr global i32 0, align 4
@v = dso_local local_unnamed_addr global i32 0, align 4

; Function Attrs: nofree norecurse nosync nounwind
define dso_local arm_aapcscc void @f(ptr noalias nocapture noundef readonly %0, ptr noalias nocapture noundef readonly %1, i32 noundef %2) local_unnamed_addr #0 {
  %4 = load i32, ptr @u, align 4, !tbaa !5
  %5 = load i32, ptr @v, align 4, !tbaa !5
  %6 = icmp sgt i32 %2, 0
  br i1 %6, label %7, label %27

7:                                                ; preds = %3
  %8 = add i32 %4, %2
  br label %9

9:                                                ; preds = %7, %18
  %10 = phi i32 [ %25, %18 ], [ 0, %7 ]
  %11 = phi i32 [ %19, %18 ], [ %4, %7 ]
  %12 = phi i32 [ %24, %18 ], [ %5, %7 ]
  %13 = getelementptr inbounds i32, ptr %0, i32 %10
  %14 = load i32, ptr %13, align 4, !tbaa !5
  %15 = icmp eq i32 %14, 0
  br i1 %15, label %18, label %16

16:                                               ; preds = %9
  store i32 %12, ptr @v, align 4, !tbaa !5
  %17 = add nsw i32 %11, 1
  store i32 %17, ptr @u, align 4, !tbaa !5
  br label %30

18:                                               ; preds = %9
  %19 = add nsw i32 %11, 1
  %20 = getelementptr inbounds i32, ptr %1, i32 %10
  %21 = load i32, ptr %20, align 4, !tbaa !5
  %22 = icmp ne i32 %21, 0
  %23 = zext i1 %22 to i32
  %24 = add nsw i32 %12, %23
  %25 = add nuw nsw i32 %10, 1
  %26 = icmp eq i32 %25, %2
  br i1 %26, label %27, label %9, !llvm.loop !9

27:                                               ; preds = %18, %3
  %28 = phi i32 [ %5, %3 ], [ %24, %18 ]
  %29 = phi i32 [ %4, %3 ], [ %8, %18 ]
  store i32 %29, ptr @u, align 4, !tbaa !5
  store i32 %28, ptr @v, align 4, !tbaa !5
  br label %30

30:                                               ; preds = %27, %16
  ret void
}

Note that the stores to u and v were both sunk out of the loop.

store i32 %29, ptr @u, align 4, !tbaa !5
store i32 %28, ptr @v, align 4, !tbaa !5
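
To make the effect of the sunk stores concrete, here is a source-level sketch in C++ of what the promoted IR above computes. It is a hand-written illustration, not compiler output: the names `f_original`, `f_promoted`, `u_reg`, `v_reg` and `demo` are assumptions for this example, and plain `const int *` parameters stand in for the `restrict` arrays. Note that `f_promoted` writes `v` on every path, even when no `b[i]` is nonzero. That unconditional store is what could introduce a data race if another thread were able to observe `v`, which is why the transform is tied to the single-threaded case.

```cpp
#include <cassert>

int u = 0, v = 0;

// Original shape: u and v live in memory and are updated inside the loop.
void f_original(const int *a, const int *b, int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

// Hand-written equivalent of the promoted IR: load once before the loop,
// keep the running values in locals, store once on the way out.
void f_promoted(const int *a, const int *b, int n) {
    int u_reg = u; // corresponds to the hoisted load of @u
    int v_reg = v; // corresponds to the hoisted load of @v
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u_reg;
            break;
        }
        ++u_reg;
        if (b[i])
            ++v_reg;
    }
    u = u_reg; // sunk store
    v = v_reg; // sunk store, executed even if v was never incremented
}

// Single-threaded check that both versions leave u and v in the same state.
void demo() {
    const int a[] = {0, 0, 1, 0};
    const int b[] = {1, 0, 1, 1};
    u = v = 0;
    f_original(a, b, 4);
    const int u_expected = u, v_expected = v;
    u = v = 0;
    f_promoted(a, b, 4);
    assert(u == u_expected && v == v_expected);
}
```

With a single thread the two functions are indistinguishable from the outside; the difference only becomes observable when some other thread reads or writes `u` or `v` while the loop runs.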

What do you think about passing thread model information from clang to LICM?

a. I had to change a lot of files just to pass a single bool variable, which does not seem practical.
b. I could not think of any approach other than the above. Let me know if someone has a better one.
c. Is there no way to know the thread model from within the optimizations? Is it only present in the clang frontend? Are there any other phases which use thread model info?

Allen added a subscriber: Allen.Aug 12 2022, 6:08 AM

Harbormaster completed remote builds in B180900: Diff 452161.Aug 12 2022, 7:04 AM

The clang changes are sort of right, in the sense that it's getting value of the right flag to the right place, but it's not how we pass around that sort of information.

Usually, we do one of two things:

1. Treat it as a property of the target, and add a method to TargetTransformInfo to retrieve it from the target.
2. Encode the information directly in the IR, as a function attribute or something like that.

llvm/lib/Transforms/Scalar/LICM.cpp
113	Since this is specifically for LICM, maybe call it licm-force-thread-model-single

Change thread model single licm flag name to licm-force-thread-model-single from thread-model-single.

In D130466#3719718, @efriedma wrote:

The clang changes are sort of right, in the sense that it's getting value of the right flag to the right place, but it's not how we pass around that sort of information.

Usually, we do one of two things:

Treat it as a property of the target, and add a method to TargetTransformInfo to retrieve it from the target.

@efriedma - ThreadModel is available in TargetOptions; how can I make it visible in TargetTransformInfo?

I do not see any usage of ThreadModel in CodeGen/Transforms/Target. I see it only in the front end.
The only way I can think of is to pass the information from clang to CodeGen, store it in TargetTransformInfo, and then use it in LICM. Can you point me to an existing example that does this? It would be helpful.

Clang (exist here) -> CodeGen (how to store here) -> Transform(want to use here)

In above patch, I passed ThreadModel from Clang to Transform but passing from Clang to CodeGen seems correct. Can you point me to such example?

Encode the information directly in the IR, as a function attribute or something like that.

The second approach does not seem feasible in this case, as the information might be lost by the time it reaches LICM. I will go with (1).

lkail added a subscriber: lkail.Aug 16 2022, 6:32 AM

Harbormaster completed remote builds in B181502: Diff 452975.Aug 16 2022, 6:37 AM

The ThreadModel is passed from clang->LLVM; see llvm/include/llvm/Target/TargetOptions.h

Most interesting parts of TargetTransformInfo are implemented in llvm/include/llvm/CodeGen/BasicTTIImpl.h . In there, you should be able to go from TargetLowering->TargetMachine->TargetOptions, or something like that.

In D130466#3726690, @efriedma wrote:

The ThreadModel is passed from clang->LLVM; see llvm/include/llvm/Target/TargetOptions.h

Most interesting parts of TargetTransformInfo are implemented in llvm/include/llvm/CodeGen/BasicTTIImpl.h . In there, you should be able to go from TargetLowering->TargetMachine->TargetOptions, or something like that.

Makes sense.

One can get TargetOptions using TargetLowering->getTargetMachine()->DefaultOptions->ThreadModel, where DefaultOptions is a TargetOptions, but only if TargetLowering is available in LICM.

Getting TargetLowering or TargetMachine in LICM does not seem straightforward, since neither is directly available there, which suggests there is no way to know the thread model in LICM.

-> In LICM, we have TargetTransformInfo and TargetLibraryInfo, but again they do not seem to give access to TargetLowering/TargetOptions in any way.

The current implementation seems complete as per https://github.com/llvm/llvm-project/issues/50537 in which there is option -licm-force-thread-model-single if the target supports thread model single.

We pass thread model from clang to licm which enables this transformation when thread model is single.

@efriedma - WDYT?

If we can get TargetLowering/TargetMachine in LICM then we do not need to pass thread model from clang to LICM. But I do not see any optimization making use of TargetLowering/TargetMachine as it is unavailable. Do you have any suggestions on it?

TargetTransformInfo can access TargetLowering. See getTLI() in BasicTTIImpl.h.

Fix review comments -

Rename LICM flag to -licm-force-thread-model-single

Use thread model info from TargetOptions instead of passing from clang to llvm

In D130466#3736041, @efriedma wrote:

TargetTransformInfo can access TargetLowering. See getTLI() in BasicTTIImpl.h.

Understood. I have updated it.

I have fixed all the suggestions. WDYT? @eli.friedman

gsocshubham added inline comments.Aug 22 2022, 1:51 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
111	What should be the return value here? Should it be `ThreadModel::INVALID`, or should I keep it the same? Should I add a new value to the existing enum, like below?

namespace ThreadModel {
enum Model {
  INVALID, // Thread model not known?
  POSIX,   // POSIX Threads
  Single   // Single Threaded Environment
};
}

The above enum is defined in `llvm/Target/TargetOptions.h`.

tschuett removed a subscriber: tschuett.Aug 22 2022, 1:53 AM

Harbormaster completed remote builds in B182527: Diff 454415.Aug 22 2022, 3:42 AM

Use Options instead of DefaultOptions while fetching thread model.

Harbormaster completed remote builds in B183521: Diff 455799.Aug 26 2022, 12:05 AM

Any review comments - @momchil.velikov @eli.friedman

I think that instead of "ThreadModel::Model getThreadModel()" as the TTI API, we should just make the API something like "bool isSingleThreaded()". Which is basically, can any other thread run at the "same" time. I don't really want to include TargetOptions.h everywhere, and it makes the reason we're exposing it on TargetTransformInfo a bit more clear.

I don't think you ever addressed my comment about constants. The case where that would come up as a practical issue is if you have an argument marked "dereferenceable"; we know it's legal to load from an arbitrary dereferenceable pointer, but it's not legal to store to one. We can only store to locations we know are modifiable (allocas, non-constant global variables, etc.)
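
As a rough illustration of the narrower API shape suggested above (`bool isSingleThreaded()` rather than `ThreadModel::Model getThreadModel()`), here is a self-contained C++ sketch. Everything in it is a toy stand-in: `ToyThreadModel`, `ToyTargetOptions`, `ToyTTI`, `mayAssumeNoOtherThreads` and `demo` are hypothetical names for this example and are not the real LLVM classes or the code in the patch. The point is only that the pass sees a yes/no answer and never needs the options type itself.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are NOT the real LLVM types.
enum class ToyThreadModel { POSIX, Single };

struct ToyTargetOptions {
    ToyThreadModel Model = ToyThreadModel::POSIX;
};

class ToyTTI {
    ToyTargetOptions Options; // kept private: passes never see the enum

public:
    explicit ToyTTI(ToyTargetOptions O) : Options(O) {}

    // The only question a pass needs answered: can another thread run at
    // the "same" time as this code?
    bool isSingleThreaded() const {
        return Options.Model == ToyThreadModel::Single;
    }
};

// Pass-side helper (hypothetical): either the target reports a single
// thread, or a testing flag forces that assumption.
bool mayAssumeNoOtherThreads(const ToyTTI &TTI, bool ForceSingleThreadFlag) {
    return TTI.isSingleThreaded() || ForceSingleThreadFlag;
}

void demo() {
    ToyTargetOptions Posix;
    ToyTargetOptions Single;
    Single.Model = ToyThreadModel::Single;
    assert(!mayAssumeNoOtherThreads(ToyTTI(Posix), false));
    assert(mayAssumeNoOtherThreads(ToyTTI(Single), false));
    assert(mayAssumeNoOtherThreads(ToyTTI(Posix), true));
}
```

Keeping the enum behind the interface is what avoids including the options header in every client, which was the stated motivation for the change.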

Change ThreadModel::Model getThreadModel() to bool isSingleThreaded()

Remove inclusion of TargetOptions at various places.

Prohibit this transformation if store points to constant memory using bool PointToConstantMemory.

In D130466#3752338, @efriedma wrote:

I think that instead of "ThreadModel::Model getThreadModel()" as the TTI API, we should just make the API something like "bool isSingleThreaded()". Which is basically, can any other thread run at the "same" time. I don't really want to include TargetOptions.h everywhere, and it makes the reason we're exposing it on TargetTransformInfo a bit more clear.

Understood. Updated as above.
Thanks!

I don't think you ever addressed my comment about constants. The case where that would come up as a practical issue is if you have an argument marked "dereferenceable"; we know it's legal to load from an arbitrary dereferenceable pointer, but it's not legal to store to one. We can only store to locations we know are modifiable (allocas, non-constant global variables, etc.)

I have added a check to avoid this transformation if store points to constant memory. Let me know WDYT on it. Have I addressed all your comments now? @eli.friedman

Harbormaster completed remote builds in B184336: Diff 456922.Aug 31 2022, 5:26 AM

Prohibit this transformation if store points to constant memory using bool PointToConstantMemory.

You can't use pointsToConstantMemory to prove the property you need. The problem is that it doesn't do what you want if it can't prove anything. You need a function that says the memory isn't writable unless it can prove the memory is writable (maybe call it something like "pointsToWritableMemory").

Getting the proof requirements wrong is a common trap for newcomers to writing compiler optimizations; please watch out for it in the future.
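
The distinction above can be shown with a small C++ model. This is a sketch under stated assumptions: `ToyPointer`, `provablyConstant`, `provablyWritable`, `unsoundMayInsertStore`, `soundMayInsertStore` and `demo` are hypothetical names for this example, not LLVM APIs. The model has one pointer kind about which nothing can be proven (a dereferenceable argument), and that is exactly where negating a "proven constant" query gives the wrong answer.

```cpp
#include <cassert>

// Hypothetical classification of what a store address is based on.
enum class ToyPointer {
    Alloca,
    MutableGlobal,
    ConstantGlobal,
    DereferenceableArgument // may point at read-only memory; nothing is proven
};

// "Points to constant memory" style query: true only when constness is PROVEN.
// A false result means "unknown", not "writable".
bool provablyConstant(ToyPointer P) {
    return P == ToyPointer::ConstantGlobal;
}

// The query the transform needs: true only when writability is PROVEN.
bool provablyWritable(ToyPointer P) {
    return P == ToyPointer::Alloca || P == ToyPointer::MutableGlobal;
}

// Unsound: treats "could not prove constant" as "safe to store".
bool unsoundMayInsertStore(ToyPointer P) { return !provablyConstant(P); }

// Sound: refuses unless writability was established.
bool soundMayInsertStore(ToyPointer P) { return provablyWritable(P); }

void demo() {
    // The two checks agree wherever something can be proven ...
    assert(unsoundMayInsertStore(ToyPointer::Alloca));
    assert(soundMayInsertStore(ToyPointer::Alloca));
    assert(!unsoundMayInsertStore(ToyPointer::ConstantGlobal));
    assert(!soundMayInsertStore(ToyPointer::ConstantGlobal));
    // ... and differ on the unknown case, where only the sound one is safe.
    assert(unsoundMayInsertStore(ToyPointer::DereferenceableArgument));
    assert(!soundMayInsertStore(ToyPointer::DereferenceableArgument));
}
```

The general rule the sketch is meant to convey: when an analysis fails to prove anything, the transform must fall back to the conservative answer, so the query has to be phrased so that "unknown" maps to "do not transform".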

In D130466#3761744, @efriedma wrote:

Prohibit this transformation if store points to constant memory using bool PointToConstantMemory.

You can't use pointsToConstantMemory to prove the property you need. The problem is that it doesn't do what you want if it can't prove anything. You need a function that says the memory isn't writable unless it can prove the memory is writable (maybe call it something like "pointsToWritableMemory").

Getting the proof requirements wrong is a common trap for newcomers to writing compiler optimizations; please watch out for it in the future.

In practice, would that be alloca and non-constant globals, or can we handle any other cases as well? I don't think doing this for noalias calls would be legal, because the noalias semantics don't say anything about the memory being writable, right?

llvm/test/Transforms/LICM/promote-sink-store.ll
2	Please run the tests through `-instnamer` so the instructions have names. Also, it should be possible to significantly reduce this test; it currently contains code not relevant to the transform. You'll also want to add a negative test for the constant memory case.

In D130466#3763325, @nikic wrote:

In practice, would that be alloca and non-constant globals, or can we handle any other cases as well? I don't think doing this for noalias calls would be legal, because the noalias semantics don't say anything about the memory being writable, right?

Hm ... we already assume that this is fine for noalias calls (in the non-capturing case), so I guess that's a pre-existing issue that is orthogonal to this patch. I believe the right treatment here would be to only omit the isNotCapturedBeforeOrInLoop() check in the single-thread case, but leave everything else alone.

In D130466#3761744, @efriedma wrote:

Prohibit this transformation if store points to constant memory using bool PointToConstantMemory.

You can't use pointsToConstantMemory to prove the property you need. The problem is that it doesn't do what you want if it can't prove anything. You need a function that says the memory isn't writable unless it can prove the memory is writable (maybe call it something like "pointsToWritableMemory").

Understood. Should I add isNotCapturedBeforeOrInLoop() as an extra check as mentioned by @nikic ?

Getting the proof requirements wrong is a common trap for newcomers to writing compiler optimizations; please watch out for it in the future.

Noted. Thanks!

gsocshubham added inline comments.Sep 2 2022, 3:15 AM

llvm/test/Transforms/LICM/promote-sink-store.ll
2	Sure. I will update it in the next push once we finalize the check on whether to use isNotCapturedBeforeOrInLoop() or should we write a custom function pointsToWritableMemory() to check alloca and non-constant globals?

gsocshubham marked an inline comment as done and an inline comment as not done.Sep 2 2022, 10:00 AM

In D130466#3766569, @gsocshubham wrote:

In D130466#3761744, @efriedma wrote:

Prohibit this transformation if store points to constant memory using bool PointToConstantMemory.

You can't use pointsToConstantMemory to prove the property you need. The problem is that it doesn't do what you want if it can't prove anything. You need a function that says the memory isn't writable unless it can prove the memory is writable (maybe call it something like "pointsToWritableMemory").

Understood. Should I add isNotCapturedBeforeOrInLoop() as an extra check as mentioned by @nikic ?

First off, please rebase your patch on main; @nikic made changes in rG315aef66.

Anyway, with that patch, the relevant bit of code currently looks like this:

Value *Object = getUnderlyingObject(SomePtr);
SafeToInsertStore =
    (isNoAliasCall(Object) || isa<AllocaInst>(Object) ||
     (isa<Argument>(Object) && cast<Argument>(Object)->hasByValAttr())) &&
    isNotCapturedBeforeOrInLoop(Object, CurLoop, DT);

There are basically two checks here: one, that the location is readable and writable (it's a malloc/alloca/byval argument/etc.), and two, that the location isn't accessed by some other thread (isNotCapturedBeforeOrInLoop).

If you know there aren't any other threads, you can just skip the isNotCapturedBeforeOrInLoop() check, and you're basically done.

On top of that, you probably also want to expand the list of locations that are known to be readable and writable to include global variables. (The current code doesn't check for them because the isNotCapturedBeforeOrInLoop() check can't succeed for global variables.)

I pushed another change that should make this code clearer: https://github.com/llvm/llvm-project/commit/388b684354cc71bd4043ddccbcfd91fb338d8b1e In single-thread mode, the isThreadLocalObject() check can be skipped. isWritableObject() can be extended to handle constant globals.
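
The two-part check described above can be modelled in a few lines of self-contained C++. This is a simplified sketch, not the LICM implementation: `ToyKind`, `ToyObject`, `isKnownWritable`, `isKnownThreadLocal`, `safeToInsertStore` and `demo` are hypothetical names, the object kinds are reduced to an enum, and capture analysis is reduced to a single flag. It assumes, as the comments above describe, that globals can never pass the capture check and that a plain pointer argument is not known to be writable.

```cpp
#include <cassert>

// Hypothetical, simplified view of the underlying object of a store address.
enum class ToyKind {
    NoAliasCall,   // e.g. result of a malloc-like call
    Alloca,
    ByValArgument,
    MutableGlobal,
    ConstantGlobal,
    PlainArgument  // ordinary pointer argument, writability unknown
};

struct ToyObject {
    ToyKind Kind;
    bool CapturedBeforeOrInLoop;
};

// Check 1: is the location known to be readable and writable?
bool isKnownWritable(const ToyObject &O) {
    switch (O.Kind) {
    case ToyKind::NoAliasCall:
    case ToyKind::Alloca:
    case ToyKind::ByValArgument:
    case ToyKind::MutableGlobal:
        return true;
    case ToyKind::ConstantGlobal:
    case ToyKind::PlainArgument:
        return false;
    }
    return false;
}

// Check 2: is the location known not to be accessed by another thread?
// With a single thread this holds trivially, so the capture check is skipped.
bool isKnownThreadLocal(const ToyObject &O, bool SingleThreaded) {
    if (SingleThreaded)
        return true;
    const bool FunctionLocal = O.Kind == ToyKind::NoAliasCall ||
                               O.Kind == ToyKind::Alloca ||
                               O.Kind == ToyKind::ByValArgument;
    return FunctionLocal && !O.CapturedBeforeOrInLoop;
}

bool safeToInsertStore(const ToyObject &O, bool SingleThreaded) {
    return isKnownWritable(O) && isKnownThreadLocal(O, SingleThreaded);
}

void demo() {
    const ToyObject Global{ToyKind::MutableGlobal, false};
    const ToyObject CapturedAlloca{ToyKind::Alloca, true};
    assert(!safeToInsertStore(Global, /*SingleThreaded=*/false));
    assert(safeToInsertStore(Global, /*SingleThreaded=*/true));
    assert(!safeToInsertStore(CapturedAlloca, /*SingleThreaded=*/false));
    assert(safeToInsertStore(CapturedAlloca, /*SingleThreaded=*/true));
}
```

In this model the single-thread assumption only ever relaxes the second check. A constant global or a plain argument stays rejected in both modes because it fails the first one.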

Address review comments.

Run -instnamer on lit tests.

TODO - Add a negative test for the constant memory case.

I am thinking about what changes need to be made to the testcase below to turn it into a negative test -

int u, v;

void f(int a[restrict], int b[restrict], int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

Harbormaster completed remote builds in B185384: Diff 458416.Sep 7 2022, 4:48 AM

I had rebased with master while pushing changes. But while pushing, some other patch got merged which causes conflict.

Let me rebase it again and push it again.

gsocshubham added inline comments.Sep 8 2022, 1:32 AM

llvm/test/Transforms/LICM/promote-sink-store.ll
2	@nikic - Can you please point me to a testcase for constant memory case?

Add global variable check for sinking store.

In D130466#3776571, @gsocshubham wrote:

Add global variable check for sinking store.

@eli.friedman @nikic - Can you please give review on my latest changes?

Harbormaster completed remote builds in B185581: Diff 458684.Sep 8 2022, 2:38 AM

Update SafeToInsertStore check.

Harbormaster completed remote builds in B185788: Diff 458981.Sep 9 2022, 2:02 AM

Fix build bot failure caused by change in clang-format version.

Combine thread model single check with global variable check.

Harbormaster completed remote builds in B185807: Diff 459009.Sep 9 2022, 4:25 AM

@eli.friedman and @nikic - Any thoughts on above changes?

nikic added inline comments.Sep 10 2022, 2:33 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2152	The check for GlobalVariable should go inside isWritableObject() and needs to check !isConstant().

xbolva00 added inline comments.Sep 10 2022, 3:11 AM

llvm/lib/Transforms/Scalar/LICM.cpp
114	You should rename this variable.

gsocshubham added inline comments.Sep 10 2022, 4:00 AM

llvm/lib/Transforms/Scalar/LICM.cpp
114	Do you have any suggestion? Probably, `ForceThreadModelSingle`?

Fix all review comments -

Move global variable constant check inside IsWritableObject()

Rename flag ThreadModelSingle to SingleThread

llvm/lib/Transforms/Scalar/LICM.cpp
114	I have renamed flag to `SingleThread`. @xbolva00 - Let me know WDYT

Harbormaster completed remote builds in B186007: Diff 459281.Sep 10 2022, 5:56 AM

nikic added inline comments.Sep 10 2022, 8:52 AM

llvm/lib/Transforms/Scalar/LICM.cpp
2151–2152	These checks should be in isThreadLocalObject(). If there is only a single thread, then the object is always thread-local. Your current code incorrectly skips the writability check.
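
The diff this comment refers to is not shown in the thread, so the following C++ sketch only illustrates one plausible shape of the problem. The function names and the boolean inputs are hypothetical. The first version lets the single-thread assumption bypass the whole condition, including writability; the second folds it into the thread-locality check only, as the comment asks.

```cpp
#include <cassert>

// Assumed buggy shape: single-thread mode short-circuits everything,
// so the writability requirement is silently dropped.
bool safeToInsertStoreBuggy(bool Writable, bool ThreadLocal,
                            bool SingleThreaded) {
    return SingleThreaded || (Writable && ThreadLocal);
}

// Fixed shape: with one thread every object counts as thread-local,
// but the object must still be known writable.
bool safeToInsertStoreFixed(bool Writable, bool ThreadLocal,
                            bool SingleThreaded) {
    const bool EffectivelyThreadLocal = SingleThreaded || ThreadLocal;
    return Writable && EffectivelyThreadLocal;
}

void demo() {
    // A non-writable location (e.g. a constant global) in single-thread mode:
    assert(safeToInsertStoreBuggy(false, false, true));  // wrongly allowed
    assert(!safeToInsertStoreFixed(false, false, true)); // correctly rejected
    // A writable global in single-thread mode is accepted by the fixed form.
    assert(safeToInsertStoreFixed(true, false, true));
}
```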

Move checks (TTI->isSingleThreaded() || SingleThread) inside bool isThreadLocalObject()

Harbormaster completed remote builds in B186016: Diff 459293.Sep 10 2022, 10:57 AM

gsocshubham marked an inline comment as done.Sep 11 2022, 12:36 AM

gsocshubham added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
2151–2152	Done. Thanks! Please check it.

Is this patch ready to be merged?

@nikic @dmgreen @eli.friedman

I believe the logic is now correct, but testing could be improved. Some suggestions:

Use a single file with two RUN lines, only one of which has -licm-force-thread-model-single. Combined with --check-prefixes, this will make it clear in which cases behavior differs based on the flag and in which cases it is the same.
The current test cases look too complicated. For globals, I believe a very simple loop that just loads / stores the global inside the loop should be enough. Without the flag this will lead to only load promotion, and with it to store promotion.
We should have a variant of that test with a constant global (should have no store promotion, independent of the flag).
We should have a variant of the test with a function argument (unknown if writable, so should have no store promotion).
We should have a variant of the test with an alloca that is captured (store promotion legal with the flag, because capture does not impact thread safety).

llvm/lib/Transforms/Scalar/LICM.cpp
1934	Unused AAResults argument?

Add unit tests.

TODO - Remove attributes from test cases.

Harbormaster completed remote builds in B187104: Diff 460701.Sep 16 2022, 5:36 AM

Remove attributes from LIT tests.

Run --instnamer on LIT tests.

Rename testcases as below -

promote-sink-store-arg.ll
-> No store promotion

promote-sink-store-capture.ll
-> Store promotion

promote-sink-store-constant-global.ll
-> No store promotion

promote-sink-store-global.ll
-> Store promotion

Harbormaster completed remote builds in B187713: Diff 461517.Sep 20 2022, 3:04 AM

@nikic @efriedma - Can you please give review on latest LIT tests? Is there any other way to remove alloca apart from mem2reg? Using mem2reg changes state of the program where I want to show current optimization.

I will rebase patch and update it since failure is due to ongoing patches in master.

PING - @nikic @eli.friedman

Rebase with master branch.

Harbormaster completed remote builds in B188337: Diff 462391.Sep 22 2022, 9:39 PM

gsocshubham marked an inline comment as done.Sep 28 2022, 7:12 AM

gsocshubham added inline comments.

llvm/lib/Transforms/Scalar/LICM.cpp
1934	Removed.

gsocshubham marked an inline comment as done.Sep 28 2022, 7:56 AM

gsocshubham added inline comments.

llvm/test/Transforms/LICM/promote-sink-store.ll
4	FTMS stands for `Force Thread Model Single`; NFTMS stands for `No Force Thread Model Single`.

Address all the review comments.

Add 2 RUN lines in LIT tests with prefixes - LICM-TMS and LICM-NO-TMS where TMS stands for Thread Model Single

Add a minimal test for globals where store promotion will happen only if flag is set and load promotion happens irrespective of the flag.

-> promote-sink-store-global.ll

Add a minimal test for constant globals where store promotion will not happen irrespective of the flag.

-> promote-sink-store-constant-global.ll

Add a minimal test for function argument where store promotion will not happen.

-> promote-sink-store-arg.ll

Add a minimal test for alloca that is captured where store promotion will happen.

-> promote-sink-store-capture.ll

At last there is a testcase named - "promote-sink-store.ll" which is the testcase attached in original bug description from github - https://github.com/llvm/llvm-project/issues/50537

gsocshubham added inline comments.Sep 28 2022, 12:56 PM

llvm/test/Transforms/LICM/promote-sink-store-arg.ll
73 ↗	(On Diff #463654)	This IR is obtained from the C code below:

void f(int n, int u) {
    for (int i = 0; i < n; ++i) {
        u = i;
    }
}

llvm/test/Transforms/LICM/promote-sink-store-constant-global.ll
77 ↗	(On Diff #463654)	This IR is obtained from the C testcase below:

const int u = 7;

void f(int n) {
    int x;
    for (int i = 0; i < n; ++i) {
        x = u;
    }
}

llvm/test/Transforms/LICM/promote-sink-store-global.ll
85 ↗	(On Diff #463654)	This IR is obtained from the C code below:

int u, v;

void f(int n) {
    int a;
    for (int i = 0; i < n; ++i) {
        u = i;
        a = v;
    }
}

llvm/test/Transforms/LICM/promote-sink-store.ll
92	This IR is obtained from the testcase below, attached to the bug description:

int u, v;

void f(int a[restrict], int b[restrict], int n) {
    for (int i = 0; i < n; ++i) {
        if (a[i]) {
            ++u;
            break;
        }
        ++u;
        if (b[i])
            ++v;
    }
}

In D130466#3783514, @nikic wrote:

I believe the logic is now correct, but testing could be improved. Some suggestions:

Use a single file with two RUN lines, only one of which has -licm-force-thread-model-single. Combined with --check-prefixes, this will make it clear in which cases behavior differs based on the flag and in which cases it is the same.

Added 2 RUN lines in all the LIT tests.

The current test cases look too complicated. For globals, I believe a very simple loop that just loads / stores the global inside the loop should be enough. Without the flag this will lead to only load promotion, and with it to store promotion.

I have used a simple loop to achieve above.

We should have a variant of that test with a constant global (should have no store promotion, independent of the flag).

Done.

We should have a variant of the test with a function argument (unknown if writable, so should have no store promotion).

Done.

We should have a variant of the test with an alloca that is captured (store promotion legal with the flag, because capture does not impact thread safety).

Done.

Harbormaster completed remote builds in B189241: Diff 463654.Sep 28 2022, 2:47 PM

Review request - @momchil.velikov @dmgreen @eli.friedman @nikic @fhahn

Can someone give final review? If this patch is fine, is it ready to merge?

All review comments have been addressed with no regression.

Use below testcase for capture case - test/Transforms/LICM/promote-sink-store-capture.ll

void f(int n, int u) {
    for (int i = 0, x = u; i < n; ++i) {
        x = i;
    }
}

instead of -

void f(int a[restrict], int n, int u) {
    for (int i = 0; i < n; ++i) {
        int x = u;
        if (a[i]) {
            ++x;
            break;
        }
        ++x;
    }
}

gsocshubham added inline comments.Sep 29 2022, 12:32 AM

llvm/test/Transforms/LICM/promote-sink-store-capture.ll
83 ↗	(On Diff #463767)	C testcase used to generate the IR below:

void f(int n, int u) {
    for (int i = 0, x = u; i < n; ++i) {
        x = i;
    }
}

Note: for the capture case, store promotion currently happens irrespective of the flag `-licm-force-thread-model-single`, i.e. with just `-licm`. In the revision before last, captured store promotion did not happen without that flag; some other patch in between may have caused the change.

Harbormaster completed remote builds in B189317: Diff 463767.Sep 29 2022, 1:18 AM

Can you please run the test cases through -sroa (as in, commit the test case with SROA applied, not add it to the RUN line)? The current tests contain unnecessary alloca+load/store+lifetime.start/end.

llvm/lib/Transforms/Scalar/LICM.cpp
115	Update description to match new option name?

arsenm added a subscriber: arsenm.Sep 29 2022, 8:44 AM

arsenm added inline comments.

llvm/test/Transforms/LICM/promote-sink-store-arg.ll
98 ↗	(On Diff #463767)	How is the threadedness relevant in a case that doesn't use atomic loads and stores?

Address review comments -

Run -sroa on the LIT tests wherever applicable.

Update LICM flag description.

Add a new LIT test "promote-sink-atomic-store-arg.ll" to show atomic store behaviour.

I am able to run -sroa successfully on promote-sink-store-global.ll

I could not run -sroa on the LIT tests below, because everything gets optimized away and no load/store is left to showcase the effect of this optimization (please check the godbolt links below):

a. promote-sink-store-constant-global.ll - Constant global case - https://clang.godbolt.org/z/n5E69ebbz

IR obtained from -

const int u = 7;

int f(int n) {
    int x, i;
    for (i = 0; i < n; ++i) {
        x = u;
    }
    return x + u;
}

b. promote-sink-store-capture.ll - Capture case - https://clang.godbolt.org/z/fjxn7xc46

IR obtained from -

void f(int n, int u) {
    for (int i = 0, x = u; i < n; ++i) {
        x = u;
    }
}

c. promote-sink-store-arg.ll - Function argument case - https://clang.godbolt.org/z/d85qhxbsr

IR obtained from -

void f(int n, int u) {
    for (int i = 0; i < n; ++i) {
        u = i;
    }
}

gsocshubham added inline comments.Sep 30 2022, 3:50 AM

llvm/lib/Transforms/Scalar/LICM.cpp
115	Done.
llvm/test/Transforms/LICM/promote-sink-atomic-store-arg.ll
73 ↗	(On Diff #464209)	This IR is obtained from the C testcase below:

void f(int n, int u) {
    for (int i = 0; i < n; ++i) {
        __atomic_store_n(&u, i, __ATOMIC_SEQ_CST);
    }
}
llvm/test/Transforms/LICM/promote-sink-store-arg.ll
98 ↗	(On Diff #463767)	I have added a new test below, promote-sink-atomic-store-arg.ll, which contains an atomic store. Let me know what you think of the new test.

gsocshubham added inline comments.Sep 30 2022, 4:00 AM

llvm/test/Transforms/LICM/promote-sink-store-global.ll
85 ↗	(On Diff #463654)	The latest IR is now generated from the C code below, followed by -sroa:

int u, v;

int f(int n) {
    int a;
    for (int i = 0; i < n; ++i) {
        u = i;
        a = v;
    }
    return u + a;
}

Harbormaster completed remote builds in B189642: Diff 464209.Sep 30 2022, 4:48 AM

@nikic - Does the patch look fine now? Is it ready to merge?

Let me know suggestions if any. Thanks for your earlier -sroa comment.

In D130466#3827345, @gsocshubham wrote:

@nikic - Does the patch look fine now? Is it ready to merge?

Let me know suggestions if any. Thanks for your earlier -sroa comment.

PING.

In D130466#3824014, @nikic wrote:

Can you please run the test cases through -sroa (as in, commit the test case with SROA applied, not add it to the RUN line)? The current tests contain unnecessary alloca+load/store+lifetime.start/end.

I have run -sroa and shared respective dumps.

Please let me know if you need any other changes in this current patch.

llvm/lib/Transforms/Scalar/LICM.cpp
115	@nikic - Updated the description.
llvm/test/Transforms/LICM/promote-sink-store-arg.ll
98 ↗	(On Diff #463767)	@arsenm - Any thoughts on above?

In D130466#3783514, @nikic wrote:

I believe the logic is now correct, but testing could be improved. Some suggestions:

Use a single file with two RUN lines, only one of which has -licm-force-thread-model-single. Combined with --check-prefixes, this will make it clear in which cases behavior differs based on the flag and in which cases it is the same.

The current test cases look too complicated. For globals, I believe a very simple loop that just loads / stores the global inside the loop should be enough. Without the flag this will lead to only load promotion, and with it to store promotion.

We should have a variant of that test with a constant global (should have no store promotion, independent of the flag).

We should have a variant of the test with a function argument (unknown if writable, so should have no store promotion).

We should have a variant of the test with an alloca that is captured (store promotion legal with the flag, because capture does not impact thread safety).

Should I keep both check prefixes, with and without the flag? That increases the number of lines in each testcase.

After reducing the testcases to the bare minimum of loads/stores needed for the LICM optimization, the line counts are:

promote-sink-atomic-store-arg.ll - 30 lines
promote-sink-store-arg.ll - 30 lines
promote-sink-store-capture.ll - 40 lines
promote-sink-store-constant-global.ll - 20 lines
promote-sink-store-global.ll - 20 lines
promote-sink-store.ll - 35 lines

Plus the checks generated by llvm/utils.
These counts include the blank lines in between.

gsocshubham added reviewers: nikic, arsenm, xbolva00, hiraditya.Oct 6 2022, 4:01 AM

Herald added a subscriber: wdng. Oct 6 2022, 4:01 AM

Add simple LIT tests -

Remove complex testcases and add simple ones without allocas, as requested by @nikic.
Run -instnamer and -sroa.
Separate checks with and without the flag -licm-force-thread-model-single are shown only where the behaviour differs when the flag is set (to reduce the number of lines in the LIT tests). TMS stands for Thread Model Single.

Note: an alloca is present in the capture case, as requested in the comment above by @nikic: "We should have a variant of the test with an alloca that is captured (store promotion legal with the flag, because capture does not impact thread safety)."

Behaviour of test cases -

constant-global-sroa-instnamer.ll
-> We cannot store to a constant global, hence the same behaviour irrespective of the flag.

capture-instnamer.ll
-> Store promotion happens.

arg-sroa-instnamer.ll
-> No store promotion irrespective of flag.

promote-sink-store-global.ll
-> Store promotion happens if flag is set.

gsocshubham added inline comments.Oct 7 2022, 2:29 AM

llvm/test/Transforms/LICM/promote-sink-store-arg.ll
73 ↗	(On Diff #463654)	The updated test IR is obtained by running -sroa and -instnamer on:

int i, n;

void f(int u[n]) {
    for (i = 0; i < n; ++i) {
        if (u[i])
            u[i] = i;
    }
}

Here the store to `u` will not be promoted.
llvm/test/Transforms/LICM/promote-sink-store-capture.ll
83 ↗	(On Diff #463767)	The updated test IR is obtained by running -sroa and -instnamer on:

int i;

void f(int u) {
    i = 0;
    for (int x = u; i < 20; ++i) {
        x = i;
    }
}

Here the store to `x` will be promoted.
llvm/test/Transforms/LICM/promote-sink-store-constant-global.ll
77 ↗	(On Diff #463654)	The updated test IR is obtained by running -sroa and -instnamer on:

const int u;

int f(int n) {
    int x, i;
    for (i = 0; i < n; ++i) {
        x = u;
    }
    return x + u;
}

Harbormaster completed remote builds in B190908: Diff 466013.Oct 7 2022, 3:20 AM

nikic mentioned this in rGb6676f3c1258: [LICM] Add test for single thread model promotion (NFC).Oct 7 2022, 8:13 AM

I'm still not really happy with the minimality of these tests. They still contain many unnecessary parts. Generally, unless you are writing debuginfo tests, it is better to write test IR by hand instead of trying to generate it using clang.

I've pushed a set of minimal tests at https://github.com/llvm/llvm-project/commit/b6676f3c12588cd1333de9bb3cee3a53bc71771e. Can you please rebase over those tests, and then rerun update_test_checks.py with these adjusted RUN lines?

; RUN: opt -S -licm < %s | FileCheck %s --check-prefixes=CHECK,MT
; RUN: opt -S -licm -licm-force-thread-model-single < %s | FileCheck %s --check-prefixes=CHECK,ST

Update RUN lines with flag -licm-force-thread-model-single and rerun update_test_checks.py on promote-single-thread.ll

Remove previously submitted LIT tests.

In D130466#3842927, @nikic wrote:

I'm still not really happy with the minimality of these tests. They still contain many unnecessary parts. Generally, unless you are writing debuginfo tests, it is better to write test IR by hand instead of trying to generate it using clang.

Understood!

I've pushed a set of minimal tests at https://github.com/llvm/llvm-project/commit/b6676f3c12588cd1333de9bb3cee3a53bc71771e. Can you please rebase over those tests, and then rerun update_test_checks.py with these adjusted RUN lines?
; RUN: opt -S -licm < %s | FileCheck %s --check-prefixes=CHECK,MT
; RUN: opt -S -licm -licm-force-thread-model-single < %s | FileCheck %s --check-prefixes=CHECK,ST

@nikic - Thanks a lot for the above tests. I have rebased and updated the patch accordingly.

Now the stores in promote_global() and promote_captured_alloca() get promoted, as per your comments in the testcase. Is this patch good to merge now?

LGTM

This revision is now accepted and ready to land.Oct 7 2022, 10:00 AM

In D130466#3843156, @nikic wrote:

LGTM

Thanks. Can you please commit it for me with the details below? I do not have commit access.

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

Harbormaster completed remote builds in B190964: Diff 466105.Oct 7 2022, 11:00 AM

Sure, I'll land this on Monday if nobody beats me to it.

How is this different from using syncscope("singlethread")?

In D130466#3844483, @arsenm wrote:

How is this different from using syncscope("singlethread")?

They don't have any relation, syncscope is a property of atomic operations.

To remove all the LICM-specific aspects, the question here is basically whether it is safe to introduce a spurious (non-atomic) store of the form store (load p), p without introducing a data race.
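
A source-level sketch may make the question concrete. The C++ below is a hand-written illustration, not output of LICM; `g`, `countOriginal` and `countPromoted` are hypothetical names. The promoted form loads `g` once before the loop and stores it back unconditionally on exit, so when no element of `a` is non-zero it performs exactly the spurious `store (load p), p`.

```cpp
#include <cassert>

int g = 0; // hypothetical global standing in for the location `p`

// Original loop: `g` is written only on iterations where a[i] is non-zero.
// If no element is non-zero, `g` is never stored to.
void countOriginal(const int *a, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i)
    if (a[i])
      ++g;
}

// Hand-promoted form: one load before the loop, one store after it.
// The final store runs even when the loop body never wrote `g`, which is
// the spurious `store (load p), p`.
void countPromoted(const int *a, int n) {
  assert(n >= 0);
  int tmp = g; // load p
  for (int i = 0; i < n; ++i)
    if (a[i])
      ++tmp;
  g = tmp; // store, executed on every exit
}
```

With a single thread both functions leave `g` in the same state. If a second thread wrote `g` while the loop was running, the unconditional store in `countPromoted` could overwrite that write, which is why the transformation needs either a thread-safety argument or a single-thread model.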

In D130466#3843921, @nikic wrote:

Sure, I'll land this on Monday if nobody beats me to it.

Sounds good.

It seems like everyone is fine with it!

Closed by commit rGb920407cf595: [LICM] Disable thread-safety checks in single-thread model (authored by gsocshubham, committed by nikic). Oct 10 2022, 7:51 AM

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGb920407cf595: [LICM] Disable thread-safety checks in single-thread model.

@nikic - Thanks for committing it.

Thanks for your help!

We see that for a single-threaded system or with option '-mllvm -licm-force-thread-model-single', the generated code quality worsens for a very simple example: https://godbolt.org/z/5xhY1EcGb.

Would it be possible to fix this?

In D130466#4174726, @Bhramar.vatsa wrote:

We see that for a single-threaded system or with option '-mllvm -licm-force-thread-model-single', the generated code quality worsens for a very simple example: https://godbolt.org/z/5xhY1EcGb.

Would it be possible to fix this?

This is not really a regression at the IR level. Ideally the loop would be removed entirely after indvars loop exit replacement, but it currently doesn't support this pattern where the value from the next-to-last iteration is taken. There's a couple open issues for that, but somehow I can't find a single one of them...

Ah, I found one of the issues I remembered: https://github.com/llvm/llvm-project/issues/51396

Yes, it looks like similar conditions are triggered by the code shown in that issue. But perhaps the reason isn't that the indvars simplification pass couldn't deduce the value from the last-but-one iteration (as seen in the example I shared: https://godbolt.org/z/5xhY1EcGb), but rather the code structure introduced by this change?

indvarsimplify did work fine in the absence of the option '-licm-force-thread-model-single'. With this option, the store is sunk into the exit block in the first LICM pass itself (just before loop rotation). This presents a different loop body to loop rotation, which then doesn't produce a guard condition. In the end, indvarsimplify sees the exit block depending not just on the induction variable's value but also on the original value of 'theGlobalVar', which perhaps makes it difficult to reduce this to just an exit value computation.

By contrast, in the case where it works (without the LICM option), indvarsimplify sees the loop as invariant, i.e. the exit values depend only on values from within the loop, which perhaps helps in reducing the loop to just an exit value computation.

In other words, with this LICM change, the store, which was otherwise skipped, is now always performed, so the code in the exit block has to choose between two values to store: the original value of the global, or one less than the exit value from the loop. The original value of the global location is propagated through the loop, making it more complicated than required? So perhaps this propagation is not required; instead, the code could be transformed so that the choice of value is made directly in the exit block (e.g. a select between the original value and the computed exit value based on the loop condition, similar to loop rotation's guard condition)? This would leave the loop with only induction variable computations, so it could still be reduced to a calculation of exit values.
However, I do understand that this might be out of scope for this change, as it didn't introduce the transformation but just exploited it.
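
The shape being discussed can be sketched at the source level. The godbolt example is not reproduced here, so the C++ below uses a hypothetical loop that is only assumed to resemble it; `storeInLoop`, `storeSunk`, `storeSelected` and `checkEquivalent` are illustrative names, and only `theGlobalVar` comes from the comment above. `storeSunk` models the store sunk into the exit block with the original value carried through the loop, and `storeSelected` models the suggested select in the exit block.

```cpp
#include <cassert>

int theGlobalVar = 0;

// Assumed original loop: the global receives the index of each iteration.
void storeInLoop(int n) {
  for (int i = 0; i < n; ++i)
    theGlobalVar = i;
}

// After store promotion: the original value is loaded up front and carried
// through the loop, and a single store is performed in the exit block.
void storeSunk(int n) {
  int carried = theGlobalVar;
  for (int i = 0; i < n; ++i)
    carried = i;
  theGlobalVar = carried;
}

// Suggested shape: the loop is reduced to its exit value (n) and the exit
// block selects between the original value and one less than the exit value.
void storeSelected(int n) {
  int original = theGlobalVar;
  theGlobalVar = (n > 0) ? n - 1 : original;
}

// Runs all three variants from the same starting value and checks that they
// leave the same value in the global.
void checkEquivalent(int n, int start) {
  theGlobalVar = start;
  storeInLoop(n);
  int expected = theGlobalVar;

  theGlobalVar = start;
  storeSunk(n);
  assert(theGlobalVar == expected);

  theGlobalVar = start;
  storeSelected(n);
  assert(theGlobalVar == expected);
}
```

In `storeSelected` nothing inside the loop depends on the old value of the global any more, which is the property the comment above hopes would let the loop collapse to an exit value computation.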

nikic mentioned this in D146233: [LICM] Don't promote store to global even in single-thread mode.Mar 16 2023, 1:55 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	5 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	6 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h	11 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Transforms/Scalar/LICM.cpp	32 lines
llvm/test/Transforms/LICM/promote-sink-store.ll	94 lines
llvm/test/Transforms/LICM/without-force-thread-model-single.ll	90 lines

Diff 459293

llvm/include/llvm/Analysis/TargetTransformInfo.h

[...] context: public:
   bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const;

   /// Return true if globals in this address space can have initializers other
   /// than `undef`.
   bool canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const;

   unsigned getAssumedAddrSpace(const Value *V) const;

+  bool isSingleThreaded() const;
+
   std::pair<const Value *, unsigned>
   getPredicatedAddrSpace(const Value *V) const;

   /// Rewrite intrinsic call \p II such that \p OldV will be replaced with \p
   /// NewV, which has a different address space. This should happen for every
   /// operand index that collectFlatAddressOperands returned for the intrinsic.
   /// \returns nullptr if the intrinsic was not handled. Otherwise, returns the
   /// new value (which may be the original \p II with modified operands).
[...] context: public:
   virtual bool isAlwaysUniform(const Value *V) = 0;
   virtual unsigned getFlatAddressSpace() = 0;
   virtual bool collectFlatAddressOperands(SmallVectorImpl<int> &OpIndexes,
                                           Intrinsic::ID IID) const = 0;
   virtual bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const = 0;
   virtual bool
   canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const = 0;
   virtual unsigned getAssumedAddrSpace(const Value *V) const = 0;
+  virtual bool isSingleThreaded() const = 0;
   virtual std::pair<const Value *, unsigned>
   getPredicatedAddrSpace(const Value *V) const = 0;
   virtual Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II,
                                                   Value *OldV,
                                                   Value *NewV) const = 0;
   virtual bool isLoweredToCall(const Function *F) = 0;
   virtual void getUnrollingPreferences(Loop *L, ScalarEvolution &,
                                        UnrollingPreferences &UP,
[...] context: public:
   canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const override {
     return Impl.canHaveNonUndefGlobalInitializerInAddressSpace(AS);
   }

   unsigned getAssumedAddrSpace(const Value *V) const override {
     return Impl.getAssumedAddrSpace(V);
   }

+  bool isSingleThreaded() const override { return Impl.isSingleThreaded(); }
+
   std::pair<const Value *, unsigned>
   getPredicatedAddrSpace(const Value *V) const override {
     return Impl.getPredicatedAddrSpace(V);
   }

   Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,
                                           Value *NewV) const override {
     return Impl.rewriteIntrinsicWithAddressSpace(II, OldV, NewV);
[...]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[...] context: public:
   bool isNoopAddrSpaceCast(unsigned, unsigned) const { return false; }
   bool canHaveNonUndefGlobalInitializerInAddressSpace(unsigned AS) const {
     return AS == 0;
   };

   unsigned getAssumedAddrSpace(const Value *V) const { return -1; }

+  bool isSingleThreaded() const { return false; }
+
   std::pair<const Value *, unsigned>
   getPredicatedAddrSpace(const Value *V) const {
     return std::make_pair(nullptr, -1);
   }

   Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,
                                           Value *NewV) const {
     return nullptr;
[...]

Inline comment on `isSingleThreaded()`:
gsocshubham (author): What should be the return value here? Should it be `ThreadModel::INVALID`, or should I keep it the same? Should I add a new value to the existing enum, as below?

namespace ThreadModel {
enum Model {
  INVALID, // Thread model not known?
  POSIX,   // POSIX Threads
  Single   // Single Threaded Environment
};
}

The above enum is defined in `llvm/Target/TargetOptions.h`.

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[...]
 #include "llvm/IR/Type.h"
 #include "llvm/IR/Value.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/MachineValueType.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Target/TargetMachine.h"
+#include "llvm/Target/TargetOptions.h"
 #include <algorithm>
 #include <cassert>
 #include <cstdint>
 #include <limits>
 #include <utility>

 namespace llvm {

[...] context: public:
   bool isNoopAddrSpaceCast(unsigned FromAS, unsigned ToAS) const {
     return getTLI()->getTargetMachine().isNoopAddrSpaceCast(FromAS, ToAS);
   }

   unsigned getAssumedAddrSpace(const Value *V) const {
     return getTLI()->getTargetMachine().getAssumedAddrSpace(V);
   }

+  bool isSingleThreaded() const {
+    return getTLI()->getTargetMachine().Options.ThreadModel ==
+           ThreadModel::Single;
+  }
+
   std::pair<const Value *, unsigned>
   getPredicatedAddrSpace(const Value *V) const {
     return getTLI()->getTargetMachine().getPredicatedAddrSpace(V);
   }

   Value *rewriteIntrinsicWithAddressSpace(IntrinsicInst *II, Value *OldV,
                                           Value *NewV) const {
     return nullptr;
[...]

llvm/include/llvm/Transforms/Utils/LoopUtils.h

[...]
 /// loop invariant. It takes a set of must-alias values, Loop exit blocks
 /// vector, loop exit blocks insertion point vector, PredIteratorCache,
 /// LoopInfo, DominatorTree, Loop, AliasSet information for all instructions
 /// of the loop and loop safety information as arguments.
 /// Diagnostics is emitted via \p ORE. It returns changed status.
 /// \p AllowSpeculation is whether values should be hoisted even if they are not
 /// guaranteed to execute in the loop, but are safe to speculatively execute.
 bool promoteLoopAccessesToScalars(
-    const SmallSetVector<Value *, 8> &, SmallVectorImpl<BasicBlock *> &,
-    SmallVectorImpl<Instruction *> &, SmallVectorImpl<MemoryAccess *> &,
-    PredIteratorCache &, LoopInfo *, DominatorTree *, const TargetLibraryInfo *,
-    Loop *, MemorySSAUpdater &, ICFLoopSafetyInfo *,
-    OptimizationRemarkEmitter *, bool AllowSpeculation);
+    AAResults *, const SmallSetVector<Value *, 8> &,
+    SmallVectorImpl<BasicBlock *> &, SmallVectorImpl<Instruction *> &,
+    SmallVectorImpl<MemoryAccess *> &, PredIteratorCache &, LoopInfo *,
+    DominatorTree *, const TargetLibraryInfo *, TargetTransformInfo *, Loop *,
+    MemorySSAUpdater &, ICFLoopSafetyInfo *, OptimizationRemarkEmitter *,
+    bool AllowSpeculation);

 /// Does a BFS from a given node to all of its children inside a given loop.
 /// The returned vector of nodes includes the starting point.
 SmallVector<DomTreeNode *, 16> collectChildrenInLoop(DomTreeNode *N,
                                                      const Loop *CurLoop);

 /// Returns the instructions that use values defined in the loop.
 SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);
[...]

llvm/lib/Analysis/TargetTransformInfo.cpp

[...]
 bool TargetTransformInfo::canHaveNonUndefGlobalInitializerInAddressSpace(
     unsigned AS) const {
   return TTIImpl->canHaveNonUndefGlobalInitializerInAddressSpace(AS);
 }

 unsigned TargetTransformInfo::getAssumedAddrSpace(const Value *V) const {
   return TTIImpl->getAssumedAddrSpace(V);
 }

+bool TargetTransformInfo::isSingleThreaded() const {
+  return TTIImpl->isSingleThreaded();
+}
+
 std::pair<const Value *, unsigned>
 TargetTransformInfo::getPredicatedAddrSpace(const Value *V) const {
   return TTIImpl->getPredicatedAddrSpace(V);
 }

 Value *TargetTransformInfo::rewriteIntrinsicWithAddressSpace(
     IntrinsicInst *II, Value *OldV, Value *NewV) const {
   return TTIImpl->rewriteIntrinsicWithAddressSpace(II, OldV, NewV);
[...]

llvm/lib/Transforms/Scalar/LICM.cpp

Show First 20 Lines • Show All 69 Lines • ▼ Show 20 Lines
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/PredIteratorCache.h"		#include "llvm/IR/PredIteratorCache.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
		#include "llvm/Target/TargetOptions.h"
#include "llvm/Transforms/Scalar.h"		#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/AssumeBundleBuilder.h"		#include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"		#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"
#include "llvm/Transforms/Utils/SSAUpdater.h"		#include "llvm/Transforms/Utils/SSAUpdater.h"
#include <algorithm>		#include <algorithm>
#include <utility>		#include <utility>
Show All 18 Lines
static cl::opt<bool>		static cl::opt<bool>
DisablePromotion("disable-licm-promotion", cl::Hidden, cl::init(false),		DisablePromotion("disable-licm-promotion", cl::Hidden, cl::init(false),
cl::desc("Disable memory promotion in LICM pass"));		cl::desc("Disable memory promotion in LICM pass"));

static cl::opt<bool> ControlFlowHoisting(		static cl::opt<bool> ControlFlowHoisting(
"licm-control-flow-hoisting", cl::Hidden, cl::init(false),		"licm-control-flow-hoisting", cl::Hidden, cl::init(false),
cl::desc("Enable control flow (and PHI) hoisting in LICM"));		cl::desc("Enable control flow (and PHI) hoisting in LICM"));

		static cl::opt<bool> SingleThread("licm-force-thread-model-single", cl::Hidden,
		efriedmaUnsubmitted Done Reply Inline Actions Since this is specifically for LICM, maybe call it licm-force-thread-model-single efriedma: Since this is specifically for LICM, maybe call it licm-force-thread-model-single
		cl::init(false),
		xbolva00Unsubmitted Not Done Reply Inline Actions You should rename this variable. xbolva00: You should rename this variable.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Do you have any suggestion? Probably, `ForceThreadModelSingle`? gsocshubham: Do you have any suggestion? Probably, `ForceThreadModelSingle`?
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I have renamed flag to `SingleThread`. @xbolva00 - Let me know WDYT gsocshubham: I have renamed flag to `SingleThread`. @xbolva00 - Let me know WDYT
		cl::desc("Allow data races in LICM pass"));
		nikicUnsubmitted Not Done Reply Inline Actions Update description to match new option name? nikic: Update description to match new option name?
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions @nikic - Updated the description. gsocshubham: @nikic - Updated the description.

static cl::opt<uint32_t> MaxNumUsesTraversed(		static cl::opt<uint32_t> MaxNumUsesTraversed(
"licm-max-num-uses-traversed", cl::Hidden, cl::init(8),		"licm-max-num-uses-traversed", cl::Hidden, cl::init(8),
cl::desc("Max num uses visited for identifying load "		cl::desc("Max num uses visited for identifying load "
"invariance in loop using invariant start (default = 8)"));		"invariance in loop using invariant start (default = 8)"));

// Experimental option to allow imprecision in LICM in pathological cases, in		// Experimental option to allow imprecision in LICM in pathological cases, in
// exchange for faster compile. This is to be removed if MemorySSA starts to		// exchange for faster compile. This is to be removed if MemorySSA starts to
// address the same issue. LICM calls MemorySSAWalker's		// address the same issue. LICM calls MemorySSAWalker's
▲ Show 20 Lines • Show All 355 Lines • ▼ Show 20 Lines	if (!HasCatchSwitch) {
// loop invariant, so run this in a loop.		// loop invariant, so run this in a loop.
bool Promoted = false;		bool Promoted = false;
bool LocalPromoted;		bool LocalPromoted;
do {		do {
LocalPromoted = false;		LocalPromoted = false;
for (const SmallSetVector<Value *, 8> &PointerMustAliases :		for (const SmallSetVector<Value *, 8> &PointerMustAliases :
collectPromotionCandidates(MSSA, AA, L)) {		collectPromotionCandidates(MSSA, AA, L)) {
LocalPromoted \|= promoteLoopAccessesToScalars(		LocalPromoted \|= promoteLoopAccessesToScalars(
PointerMustAliases, ExitBlocks, InsertPts, MSSAInsertPts, PIC, LI,		AA, PointerMustAliases, ExitBlocks, InsertPts, MSSAInsertPts, PIC,
DT, TLI, L, MSSAU, &SafetyInfo, ORE, LicmAllowSpeculation);		LI, DT, TLI, TTI, L, MSSAU, &SafetyInfo, ORE,
		LicmAllowSpeculation);
}		}
Promoted \|= LocalPromoted;		Promoted \|= LocalPromoted;
} while (LocalPromoted);		} while (LocalPromoted);

// Once we have promoted values across the loop body we have to		// Once we have promoted values across the loop body we have to
// recursively reform LCSSA as any nested loop may now have values defined		// recursively reform LCSSA as any nested loop may now have values defined
// within the loop used in the outer loop.		// within the loop used in the outer loop.
// FIXME: This is really heavy handed. It would be a bit better to use an		// FIXME: This is really heavy handed. It would be a bit better to use an
[1,348 lines hidden]	for (unsigned i = 0, e = LoopExitBlocks.size(); i != e; ++i) {
NewMemAcc = MSSAU.createMemoryAccessInBB(		NewMemAcc = MSSAU.createMemoryAccessInBB(
NewSI, nullptr, NewSI->getParent(), MemorySSA::Beginning);		NewSI, nullptr, NewSI->getParent(), MemorySSA::Beginning);
} else {		} else {
NewMemAcc =		NewMemAcc =
MSSAU.createMemoryAccessAfter(NewSI, nullptr, MSSAInsertPoint);		MSSAU.createMemoryAccessAfter(NewSI, nullptr, MSSAInsertPoint);
}		}
MSSAInsertPts[i] = NewMemAcc;		MSSAInsertPts[i] = NewMemAcc;
MSSAU.insertDef(cast<MemoryDef>(NewMemAcc), true);		MSSAU.insertDef(cast<MemoryDef>(NewMemAcc), true);
// FIXME: true for safety, false may still be correct.		// FIXME: true for safety, false may still be correct.
		efriedma [Not Done]: Not sure how this is related.
		gsocshubham (Author) [Done]: Removed.
}		}
}		}

void doExtraRewritesBeforeFinalDeletion() override {		void doExtraRewritesBeforeFinalDeletion() override {
if (CanInsertStoresInExitBlocks)		if (CanInsertStoresInExitBlocks)
insertStoresInLoopExitBlocks();		insertStoresInLoopExitBlocks();
}		}

[37 lines hidden]	bool isWritableObject(const Value *Object) {
// See https://github.com/llvm/llvm-project/issues/51838.		// See https://github.com/llvm/llvm-project/issues/51838.
if (isa<AllocaInst>(Object))		if (isa<AllocaInst>(Object))
return true;		return true;

// TODO: Also handle sret.		// TODO: Also handle sret.
if (auto *A = dyn_cast<Argument>(Object))		if (auto *A = dyn_cast<Argument>(Object))
return A->hasByValAttr();		return A->hasByValAttr();

		if (auto *G = dyn_cast<GlobalVariable>(Object))
		return !G->isConstant();

// TODO: Noalias has nothing to do with writability, this should check for		// TODO: Noalias has nothing to do with writability, this should check for
// an allocator function.		// an allocator function.
return isNoAliasCall(Object);		return isNoAliasCall(Object);
}		}

bool isThreadLocalObject(const Value *Object, const Loop *L,		bool isThreadLocalObject(const Value *Object, const Loop *L, DominatorTree *DT,
DominatorTree *DT) {		TargetTransformInfo *TTI) {
// The object must be function-local to start with, and then not captured		// The object must be function-local to start with, and then not captured
// before/in the loop.		// before/in the loop.
return isIdentifiedFunctionLocal(Object) &&		return (isIdentifiedFunctionLocal(Object) &&
isNotCapturedBeforeOrInLoop(Object, L, DT);		isNotCapturedBeforeOrInLoop(Object, L, DT)) ||
		(TTI->isSingleThreaded() || SingleThread);
}		}

} // namespace		} // namespace

/// Try to promote memory values to scalars by sinking stores out of the		/// Try to promote memory values to scalars by sinking stores out of the
/// loop and moving loads to before the loop. We do this by looping over		/// loop and moving loads to before the loop. We do this by looping over
/// the stores in the loop, looking for stores to Must pointers which are		/// the stores in the loop, looking for stores to Must pointers which are
/// loop invariant.		/// loop invariant.
///		///
bool llvm::promoteLoopAccessesToScalars(		bool llvm::promoteLoopAccessesToScalars(
const SmallSetVector<Value *, 8> &PointerMustAliases,		AAResults *AA, const SmallSetVector<Value *, 8> &PointerMustAliases,
		nikic [Done]: Unused AAResults argument?
		gsocshubham (Author) [Done]: Removed.
SmallVectorImpl<BasicBlock *> &ExitBlocks,		SmallVectorImpl<BasicBlock *> &ExitBlocks,
SmallVectorImpl<Instruction *> &InsertPts,		SmallVectorImpl<Instruction *> &InsertPts,
SmallVectorImpl<MemoryAccess *> &MSSAInsertPts, PredIteratorCache &PIC,		SmallVectorImpl<MemoryAccess *> &MSSAInsertPts, PredIteratorCache &PIC,
LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI,		LoopInfo *LI, DominatorTree *DT, const TargetLibraryInfo *TLI,
Loop *CurLoop, MemorySSAUpdater &MSSAU, ICFLoopSafetyInfo *SafetyInfo,		TargetTransformInfo *TTI, Loop *CurLoop, MemorySSAUpdater &MSSAU,
OptimizationRemarkEmitter *ORE, bool AllowSpeculation) {		ICFLoopSafetyInfo *SafetyInfo, OptimizationRemarkEmitter *ORE,
		bool AllowSpeculation) {
// Verify inputs.		// Verify inputs.
assert(LI != nullptr && DT != nullptr && CurLoop != nullptr &&		assert(LI != nullptr && DT != nullptr && CurLoop != nullptr &&
SafetyInfo != nullptr &&		SafetyInfo != nullptr &&
"Unexpected Input to promoteLoopAccessesToScalars");		"Unexpected Input to promoteLoopAccessesToScalars");

LLVM_DEBUG({		LLVM_DEBUG({
dbgs() << "Trying to promote set of must-aliased pointers:\n";		dbgs() << "Trying to promote set of must-aliased pointers:\n";
for (Value *Ptr : PointerMustAliases)		for (Value *Ptr : PointerMustAliases)
[187 lines hidden]	if (SawUnorderedAtomic && Alignment < MDL.getTypeStoreSize(AccessTy))
return false;		return false;

// If we couldn't prove we can hoist the load, bail.		// If we couldn't prove we can hoist the load, bail.
if (!DereferenceableInPH) {		if (!DereferenceableInPH) {
LLVM_DEBUG(dbgs() << "Not promoting: Not dereferenceable in preheader\n");		LLVM_DEBUG(dbgs() << "Not promoting: Not dereferenceable in preheader\n");
return false;		return false;
}		}

// We know we can hoist the load, but don't have a guaranteed store.		// We know we can hoist the load, but don't have a guaranteed store.
// Check whether the location is writable and thread-local. If it is, then we		// Check whether the location is writable and thread-local. If it is, then we
		efriedma [Not Done]: I think you need a few more safety checks, even if you're assuming the program is single-threaded. Just because it's legal to load from a memory location doesn't mean it's legal to store to it. For example, you can't write to a constant.
		gsocshubham (Author) [Done]: I am not sure about writing to a constant. Can you elaborate with an example where it would fail?
		fhahn [Not Done]: This doesn't look quite right. I think you are just comparing the enum value, not the actual configured ThreadModel. You likely have to check the value returned by `getThreadModel`.
		xbolva00 [Not Done]: And no tests?
		gsocshubham (Author) [Done]: I have added a test, `hoist-load-without-store.ll`, where the load is hoisted out of the loop. I did not understand what you mean by "no tests".
		xbolva00 [Not Done]: Tests that we perform this optimization only when the thread model is single.
		hiraditya [Not Done]: Yeah, we need tests that check that the optimization is performed when the thread model is single and is not performed otherwise.
		gsocshubham (Author) [Done]: @hiraditya - I have added the test `LICM/without-allow-data-race.ll`, which shows that the optimization is not done when the flag is not set, just for confirmation. I think it is redundant compared to `LICM/promote-sink-store.ll`, and I should probably remove it once the patch is ready to commit.
		gsocshubham (Author) [Done]: `getThreadModel()` is declared here - https://github.com/llvm/llvm-project/blob/d0541b47000739c68c540170c6b9790ec1ea3b77/llvm/include/llvm/CodeGen/CommandFlags.h#L42 The definition is present in clang - https://github.com/llvm/llvm-project/blob/448adfee05b737a26dda34e7ae2cd4948760fff0/clang/include/clang/Driver/ToolChain.h#L573 `virtual std::string getThreadModel() const { return "posix"; }` I am not sure how I can use `getThreadModel()` in LICM.cpp. Is there any other way to know the ThreadModel?
		xbolva00 [Not Done]: So maybe clang can run llvm with -mllvm -your-new-flag-to-allow-data-races if threadmodel is single?
		gsocshubham (Author) [Done]: Passing `-mllvm -your-new-flag-to-allow-data-races` to clang will work with my original submission, in which there is a new flag AllowDataRaces in LICM. I have tried an approach to let LICM know whether the thread model is single, as below - a. In `clang/lib/CodeGen/BackendUtil.cpp`, we have the thread model info via `LangOpts.getThreadModel()`. b. I pass a bool variable to PassBuilder, which in turn passes it to LICM. c. Based on this bool variable, LICM knows whether the thread model is single and hence whether to hoist the load from the loop. I did a trial approach by referring to the `AllowSpeculation` variable, which is passed to LICM. I don't think it is the optimal/suggested solution. From my other comment, I first want to get clear on the `-mthread-model single` option. Can you comment on it?
		gsocshubham (Author) [Done]: If we pass thread model information as above, then we don't need the flag AllowDataRaces at all. But I think there should be a more efficient way, apart from the above, for the clang driver to pass information to llvm passes.
// can insert stores along paths which originally didn't have them without		// can insert stores along paths which originally didn't have them without
// violating the memory model.		// violating the memory model.
if (StoreSafety == StoreSafetyUnknown) {		if (StoreSafety == StoreSafetyUnknown) {
Value *Object = getUnderlyingObject(SomePtr);		Value *Object = getUnderlyingObject(SomePtr);
if (isWritableObject(Object) && isThreadLocalObject(Object, CurLoop, DT))		if (isWritableObject(Object) &&
		isThreadLocalObject(Object, CurLoop, DT, TTI))
		nikic [Not Done]: The check for GlobalVariable should go inside isWritableObject() and needs to check !isConstant().
		nikic [Done]: These checks should be in isThreadLocalObject(). If there is only a single thread, then the object is always thread-local. Your current code incorrectly skips the writability check.
		gsocshubham (Author) [Done]: Done. Thanks! Please check it.
StoreSafety = StoreSafe;		StoreSafety = StoreSafe;
}		}

// If we've still failed to prove we can sink the store, hoist the load		// If we've still failed to prove we can sink the store, hoist the load
// only, if possible.		// only, if possible.
if (StoreSafety != StoreSafe && !FoundLoadToPromote)		if (StoreSafety != StoreSafe && !FoundLoadToPromote)
// If we cannot hoist the load either, give up.		// If we cannot hoist the load either, give up.
return false;		return false;
[190 lines hidden]
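
The review comments above converge on one rule: a store may be sunk only if the object is writable AND it is either provably thread-local or the program is single-threaded. The following is a minimal sketch of that rule, not the patch itself. `ObjectInfo`, `isWritable`, `isThreadLocal` and `canSinkStore` are hypothetical stand-ins, and the single `SingleThreaded` flag is an assumption that collapses `TTI->isSingleThreaded() || SingleThread` into one boolean.

```cpp
#include <cassert>

// Hypothetical, simplified description of the underlying object of the
// promoted pointer. This is NOT an LLVM type; each field models one query
// that the patched LICM code makes.
struct ObjectInfo {
  bool IsAlloca = false;          // isa<AllocaInst>(Object)
  bool IsByValArgument = false;   // Argument with the byval attribute
  bool IsGlobal = false;          // isa<GlobalVariable>(Object)
  bool IsConstantGlobal = false;  // G->isConstant(), only used if IsGlobal
  bool IsNoAliasCall = false;     // isNoAliasCall(Object)
  bool IsFunctionLocal = false;   // isIdentifiedFunctionLocal(Object)
  bool IsCaptured = false;        // !isNotCapturedBeforeOrInLoop(...)
};

// Follows the order of checks in the patched isWritableObject():
// allocas and byval arguments are writable, globals only if not constant.
bool isWritable(const ObjectInfo &O) {
  if (O.IsAlloca || O.IsByValArgument)
    return true;
  if (O.IsGlobal)
    return !O.IsConstantGlobal;
  return O.IsNoAliasCall;
}

// With a single thread every object is trivially thread-local; otherwise
// the object must be function-local and not captured.
bool isThreadLocal(const ObjectInfo &O, bool SingleThreaded) {
  return (O.IsFunctionLocal && !O.IsCaptured) || SingleThreaded;
}

// Both conditions are required; single-threadedness never bypasses the
// writability check.
bool canSinkStore(const ObjectInfo &O, bool SingleThreaded) {
  return isWritable(O) && isThreadLocal(O, SingleThreaded);
}
```

Under this model a non-constant global becomes a promotion candidate only when `SingleThreaded` is true, while a constant global is rejected in both modes.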

llvm/test/Transforms/LICM/promote-sink-store.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -licm -licm-force-thread-model-single -S %s | FileCheck %s
				nikic [Not Done]: Please run the tests through `-instnamer` so the instructions have names. Also, it should be possible to significantly reduce this test; it currently contains code not relevant to the transform. You'll also want to add a negative test for the constant memory case.
				gsocshubham (Author) [Not Done]: Sure. I will update it in the next push once we finalize whether to use isNotCapturedBeforeOrInLoop() or to write a custom function pointsToWritableMemory() that checks for allocas and non-constant globals.
				gsocshubham (Author) [Done]: @nikic - Can you please point me to a testcase for the constant memory case?

				@u = dso_local local_unnamed_addr global i32 0, align 4
				gsocshubham (Author) [Done]: FTMS stands for `Force Thread Model Single`; NFTMS stands for `No Force Thread Model Single`.
				@v = dso_local local_unnamed_addr global i32 0, align 4

				define dso_local void @f(ptr noalias nocapture noundef readonly %arg, ptr noalias nocapture noundef readonly %arg1, i32 noundef %arg2) local_unnamed_addr {
				; CHECK-LABEL: @f(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[I:%.*]] = icmp sgt i32 [[ARG2:%.*]], 0
				; CHECK-NEXT: br i1 [[I]], label [[BB3:%.*]], label [[BB26:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: [[I4:%.*]] = load i32, ptr @v, align 4
				; CHECK-NEXT: [[I5:%.*]] = load i32, ptr @u, align 4
				; CHECK-NEXT: [[I6:%.*]] = zext i32 [[ARG2]] to i64
				; CHECK-NEXT: [[V_PROMOTED:%.*]] = load i32, ptr @v, align 1
				; CHECK-NEXT: br label [[BB7:%.*]]
				; CHECK: bb7:
				; CHECK-NEXT: [[I203:%.*]] = phi i32 [ [[V_PROMOTED]], [[BB3]] ], [ [[I202:%.*]], [[BB21:%.*]] ]
				; CHECK-NEXT: [[I8:%.*]] = phi i64 [ 0, [[BB3]] ], [ [[I23:%.*]], [[BB21]] ]
				; CHECK-NEXT: [[I9:%.*]] = phi i32 [ [[I5]], [[BB3]] ], [ [[I14:%.*]], [[BB21]] ]
				; CHECK-NEXT: [[I10:%.*]] = phi i32 [ [[I4]], [[BB3]] ], [ [[I22:%.*]], [[BB21]] ]
				; CHECK-NEXT: [[I11:%.*]] = getelementptr inbounds i32, ptr [[ARG:%.*]], i64 [[I8]]
				; CHECK-NEXT: [[I12:%.*]] = load i32, ptr [[I11]], align 4
				; CHECK-NEXT: [[I13:%.*]] = icmp eq i32 [[I12]], 0
				; CHECK-NEXT: [[I14]] = add nsw i32 [[I9]], 1
				; CHECK-NEXT: br i1 [[I13]], label [[BB15:%.*]], label [[BB25:%.*]]
				; CHECK: bb15:
				; CHECK-NEXT: [[I16:%.*]] = getelementptr inbounds i32, ptr [[ARG1:%.*]], i64 [[I8]]
				; CHECK-NEXT: [[I17:%.*]] = load i32, ptr [[I16]], align 4
				; CHECK-NEXT: [[I18:%.*]] = icmp eq i32 [[I17]], 0
				; CHECK-NEXT: br i1 [[I18]], label [[BB21]], label [[BB19:%.*]]
				; CHECK: bb19:
				; CHECK-NEXT: [[I20:%.*]] = add nsw i32 [[I10]], 1
				; CHECK-NEXT: br label [[BB21]]
				; CHECK: bb21:
				; CHECK-NEXT: [[I202]] = phi i32 [ [[I203]], [[BB15]] ], [ [[I20]], [[BB19]] ]
				; CHECK-NEXT: [[I22]] = phi i32 [ [[I10]], [[BB15]] ], [ [[I20]], [[BB19]] ]
				; CHECK-NEXT: [[I23]] = add nuw nsw i64 [[I8]], 1
				; CHECK-NEXT: [[I24:%.*]] = icmp eq i64 [[I23]], [[I6]]
				; CHECK-NEXT: br i1 [[I24]], label [[BB25]], label [[BB7]]
				; CHECK: bb25:
				; CHECK-NEXT: [[I201:%.*]] = phi i32 [ [[I202]], [[BB21]] ], [ [[I203]], [[BB7]] ]
				; CHECK-NEXT: [[I14_LCSSA:%.*]] = phi i32 [ [[I14]], [[BB21]] ], [ [[I14]], [[BB7]] ]
				; CHECK-NEXT: store i32 [[I201]], ptr @v, align 1
				; CHECK-NEXT: store i32 [[I14_LCSSA]], ptr @u, align 4
				; CHECK-NEXT: br label [[BB26]]
				; CHECK: bb26:
				; CHECK-NEXT: ret void
				;
				bb:
				%i = icmp sgt i32 %arg2, 0
				br i1 %i, label %bb3, label %bb26

				bb3: ; preds = %bb
				%i4 = load i32, ptr @v, align 4
				%i5 = load i32, ptr @u, align 4
				%i6 = zext i32 %arg2 to i64
				br label %bb7

				bb7: ; preds = %bb21, %bb3
				%i8 = phi i64 [ 0, %bb3 ], [ %i23, %bb21 ]
				%i9 = phi i32 [ %i5, %bb3 ], [ %i14, %bb21 ]
				%i10 = phi i32 [ %i4, %bb3 ], [ %i22, %bb21 ]
				%i11 = getelementptr inbounds i32, ptr %arg, i64 %i8
				%i12 = load i32, ptr %i11, align 4
				%i13 = icmp eq i32 %i12, 0
				%i14 = add nsw i32 %i9, 1
				br i1 %i13, label %bb15, label %bb25

				bb15: ; preds = %bb7
				%i16 = getelementptr inbounds i32, ptr %arg1, i64 %i8
				%i17 = load i32, ptr %i16, align 4
				%i18 = icmp eq i32 %i17, 0
				br i1 %i18, label %bb21, label %bb19

				bb19: ; preds = %bb15
				%i20 = add nsw i32 %i10, 1
				store i32 %i20, ptr @v, align 4
				br label %bb21

				bb21: ; preds = %bb19, %bb15
				%i22 = phi i32 [ %i10, %bb15 ], [ %i20, %bb19 ]
				%i23 = add nuw nsw i64 %i8, 1
				%i24 = icmp eq i64 %i23, %i6
				br i1 %i24, label %bb25, label %bb7

				bb25: ; preds = %bb21, %bb7
				store i32 %i14, ptr @u, align 4
				br label %bb26

				bb26: ; preds = %bb25, %bb
				gsocshubham (Author) [Done]: This IR is obtained from the testcase below, attached to the bug description: `int u, v; void f(int a[restrict], int b[restrict], int n) { for (int i = 0; i < n; ++i) { if (a[i]) { ++u; break; } ++u; if (b[i]) ++v; } }`
				ret void
				}
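
To make the CHECK lines above easier to read, here is a source-level sketch of what the promotion does to the loop from the bug description. It is a hand-written illustration, not compiler output: the globals `u` and `v` are wrapped in a hypothetical `Globals` struct so the two versions can be compared, and `promoted` is an assumed C++ rendering of the expected IR (load of `v` hoisted to the preheader, store of `v` sunk to the exit block).

```cpp
#include <cassert>

// Hypothetical wrapper for the two globals of the original testcase.
struct Globals {
  int u = 0;
  int v = 0;
};

// The loop from the bug description: `v` is stored only on iterations
// where b[i] is nonzero.
void original(Globals &G, const int *A, const int *B, int N) {
  for (int I = 0; I < N; ++I) {
    if (A[I]) {
      ++G.u;
      break;
    }
    ++G.u;
    if (B[I])
      ++G.v;
  }
}

// Shape of the loop after promotion: both globals live in locals inside
// the loop and are written back once on exit, even if no iteration
// modified `v`. That unconditional store is why the transform needs the
// single-thread assumption for a global.
void promoted(Globals &G, const int *A, const int *B, int N) {
  if (N <= 0)
    return;            // the guard skips the loop and both stores
  int U = G.u;         // hoisted load of u
  int V = G.v;         // hoisted load of v ("v.promoted" in the IR)
  for (int I = 0; I < N; ++I) {
    ++U;               // u is incremented on both paths
    if (A[I])
      break;
    if (B[I])
      ++V;
  }
  G.v = V;             // sunk, now unconditional, store of v
  G.u = U;
}
```

In a single-threaded run both functions leave `u` and `v` in the same state; they differ only in when and how often memory is written.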

llvm/test/Transforms/LICM/without-force-thread-model-single.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -licm -S %s | FileCheck %s

				@u = dso_local local_unnamed_addr global i32 0, align 4
				@v = dso_local local_unnamed_addr global i32 0, align 4

				define dso_local void @f(ptr noalias nocapture noundef readonly %arg, ptr noalias nocapture noundef readonly %arg1, i32 noundef %arg2) local_unnamed_addr {
				; CHECK-LABEL: @f(
				; CHECK-NEXT: bb:
				; CHECK-NEXT: [[I:%.*]] = icmp sgt i32 [[ARG2:%.*]], 0
				; CHECK-NEXT: br i1 [[I]], label [[BB3:%.*]], label [[BB26:%.*]]
				; CHECK: bb3:
				; CHECK-NEXT: [[I4:%.*]] = load i32, ptr @v, align 4
				; CHECK-NEXT: [[I5:%.*]] = load i32, ptr @u, align 4
				; CHECK-NEXT: [[I6:%.*]] = zext i32 [[ARG2]] to i64
				; CHECK-NEXT: br label [[BB7:%.*]]
				; CHECK: bb7:
				; CHECK-NEXT: [[I8:%.*]] = phi i64 [ 0, [[BB3]] ], [ [[I23:%.*]], [[BB21:%.*]] ]
				; CHECK-NEXT: [[I9:%.*]] = phi i32 [ [[I5]], [[BB3]] ], [ [[I14:%.*]], [[BB21]] ]
				; CHECK-NEXT: [[I10:%.*]] = phi i32 [ [[I4]], [[BB3]] ], [ [[I22:%.*]], [[BB21]] ]
				; CHECK-NEXT: [[I11:%.*]] = getelementptr inbounds i32, ptr [[ARG:%.*]], i64 [[I8]]
				; CHECK-NEXT: [[I12:%.*]] = load i32, ptr [[I11]], align 4
				; CHECK-NEXT: [[I13:%.*]] = icmp eq i32 [[I12]], 0
				; CHECK-NEXT: [[I14]] = add nsw i32 [[I9]], 1
				; CHECK-NEXT: br i1 [[I13]], label [[BB15:%.*]], label [[BB25:%.*]]
				; CHECK: bb15:
				; CHECK-NEXT: [[I16:%.*]] = getelementptr inbounds i32, ptr [[ARG1:%.*]], i64 [[I8]]
				; CHECK-NEXT: [[I17:%.*]] = load i32, ptr [[I16]], align 4
				; CHECK-NEXT: [[I18:%.*]] = icmp eq i32 [[I17]], 0
				; CHECK-NEXT: br i1 [[I18]], label [[BB21]], label [[BB19:%.*]]
				; CHECK: bb19:
				; CHECK-NEXT: [[I20:%.*]] = add nsw i32 [[I10]], 1
				; CHECK-NEXT: store i32 [[I20]], ptr @v, align 4
				; CHECK-NEXT: br label [[BB21]]
				; CHECK: bb21:
				; CHECK-NEXT: [[I22]] = phi i32 [ [[I10]], [[BB15]] ], [ [[I20]], [[BB19]] ]
				; CHECK-NEXT: [[I23]] = add nuw nsw i64 [[I8]], 1
				; CHECK-NEXT: [[I24:%.*]] = icmp eq i64 [[I23]], [[I6]]
				; CHECK-NEXT: br i1 [[I24]], label [[BB25]], label [[BB7]]
				; CHECK: bb25:
				; CHECK-NEXT: [[I14_LCSSA:%.*]] = phi i32 [ [[I14]], [[BB21]] ], [ [[I14]], [[BB7]] ]
				; CHECK-NEXT: store i32 [[I14_LCSSA]], ptr @u, align 4
				; CHECK-NEXT: br label [[BB26]]
				; CHECK: bb26:
				; CHECK-NEXT: ret void
				;
				bb:
				%i = icmp sgt i32 %arg2, 0
				br i1 %i, label %bb3, label %bb26

				bb3: ; preds = %bb
				%i4 = load i32, ptr @v, align 4
				%i5 = load i32, ptr @u, align 4
				%i6 = zext i32 %arg2 to i64
				br label %bb7

				bb7: ; preds = %bb21, %bb3
				%i8 = phi i64 [ 0, %bb3 ], [ %i23, %bb21 ]
				%i9 = phi i32 [ %i5, %bb3 ], [ %i14, %bb21 ]
				%i10 = phi i32 [ %i4, %bb3 ], [ %i22, %bb21 ]
				%i11 = getelementptr inbounds i32, ptr %arg, i64 %i8
				%i12 = load i32, ptr %i11, align 4
				%i13 = icmp eq i32 %i12, 0
				%i14 = add nsw i32 %i9, 1
				br i1 %i13, label %bb15, label %bb25

				bb15: ; preds = %bb7
				%i16 = getelementptr inbounds i32, ptr %arg1, i64 %i8
				%i17 = load i32, ptr %i16, align 4
				%i18 = icmp eq i32 %i17, 0
				br i1 %i18, label %bb21, label %bb19

				bb19: ; preds = %bb15
				%i20 = add nsw i32 %i10, 1
				store i32 %i20, ptr @v, align 4
				br label %bb21

				bb21: ; preds = %bb19, %bb15
				%i22 = phi i32 [ %i10, %bb15 ], [ %i20, %bb19 ]
				%i23 = add nuw nsw i64 %i8, 1
				%i24 = icmp eq i64 %i23, %i6
				br i1 %i24, label %bb25, label %bb7

				bb25: ; preds = %bb21, %bb7
				store i32 %i14, ptr @u, align 4
				br label %bb26

				bb26: ; preds = %bb25, %bb
				ret void
				}
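
The review also asks for a negative test for constant memory and for an example of where writing to a constant would fail. The sketch below is one way to picture that failure, under stated assumptions: `Cell` is a hypothetical model of a memory location whose `ReadOnly` flag stands in for a global marked `constant`, and the exception stands in for the fault a real store to read-only memory could cause. None of these names come from the patch.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical memory cell. ReadOnly models a `constant` global; a store
// to it raises a simulated fault instead of crashing the process.
class Cell {
public:
  Cell(int InitialValue, bool IsReadOnly)
      : Value(InitialValue), ReadOnly(IsReadOnly) {}
  int load() const { return Value; }
  void store(int V) {
    if (ReadOnly)
      throw std::runtime_error("store to read-only memory");
    Value = V;
  }

private:
  int Value;
  bool ReadOnly;
};

// Original shape: the store executes only if some B[I] is nonzero.
void conditionalStore(Cell &V, const int *B, int N) {
  for (int I = 0; I < N; ++I)
    if (B[I])
      V.store(V.load() + 1);
}

// Promoted shape: load hoisted, store sunk to the loop exit, so the store
// executes whenever the loop is entered.
void promotedStore(Cell &V, const int *B, int N) {
  if (N <= 0)
    return;
  int Tmp = V.load();
  for (int I = 0; I < N; ++I)
    if (B[I])
      ++Tmp;
  V.store(Tmp);
}

// Returns true if calling F raised the simulated fault.
template <typename Fn> bool faults(Fn F) {
  try {
    F();
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

With an all-zero `B`, `conditionalStore` never touches a read-only cell, while `promotedStore` faults on it. This holds even with a single thread, which is why the writability check cannot be skipped.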

[LICM] - Add option to force thread model single (Closed, Public)

Revision Contents

Diff 459293

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/include/llvm/Transforms/Utils/LoopUtils.h

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Transforms/Scalar/LICM.cpp

llvm/test/Transforms/LICM/promote-sink-store.ll

llvm/test/Transforms/LICM/without-force-thread-model-single.ll