This is an archive of the discontinued LLVM Phabricator instance.

[DSE] Switch to MemorySSA-backed DSE by default.
ClosedPublic

Authored by fhahn on Sep 4 2020, 2:11 PM.

Details

Summary

The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference from the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA graph, we only visit/check memory instructions.
But removing unrelated dead instructions is not really DSE's job and
other passes will clean them up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not correctly handling
instructions that may throw in that case. To cover this with
MemorySSA-backed DSE, we need an update to llvm.coro.begin to treat its
return value as belonging to the same underlying object as the passed
pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For MultiSource/SPEC2000/SPEC2006, the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:

                                     Legacy Pass Manager
                             exec instrs    size-text
O3                            + 0.60%       - 0.27%
ReleaseThinLTO                + 1.00%       - 0.42%
ReleaseLTO-g                  + 0.77%       - 0.33%
ReleaseThinLTO (link only)    + 0.87%       - 0.42%
ReleaseLTO-g (link only)      + 0.78%       - 0.33%

http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

                                     New Pass Manager
                             exec instrs    size-text
O3                            + 0.95%       - 0.25%
ReleaseThinLTO                + 1.34%       - 0.41%
ReleaseLTO-g                  + 1.71%       - 0.35%
ReleaseThinLTO (link only)    + 0.96%       - 0.41%
ReleaseLTO-g (link only)      + 2.21%       - 0.35%

http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Diff Detail

Event Timeline

fhahn created this revision.Sep 4 2020, 2:11 PM
Herald added projects: Restricted Project, Restricted Project.Sep 4 2020, 2:11 PM
fhahn requested review of this revision.Sep 4 2020, 2:11 PM
xbolva00 added inline comments.
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
115

Default 150?

lebedev.ri edited the summary of this revision.Sep 4 2020, 11:10 PM
fhahn updated this revision to Diff 290140.Sep 6 2020, 8:35 AM

Adjust limit in description, thanks!

llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
115

Yes indeed, fixed!

fhahn added a comment.Sep 8 2020, 12:45 PM

Note that we now also have to compute the PostDominatorTree, which adds an extra bit of compile-time overhead. By adjusting the pipeline a bit more, we can re-use ADCE's PDTs in most cases, which gives a -0.18% geomean improvement for -O3 (D87322): http://llvm-compile-time-tracker.com/compare.php?from=15fdd6cd7c24c745df1bb419e72ff66fd138aa7e&to=481f494515fc89cb7caea8d862e40f2c910dc994&stat=instructions

Ideally we would preserve the PDT, but my first try with using DomTreeUpdater introduced a significant compile-time regression, so I put that on hold for now.

I don't think it is worth holding off flipping the default for those changes. They can be done as follow-ups, and there will probably also be a few other small issues to iron out. The compile-time numbers shared are without the recent improvements to MemDepAnalysis (which sped up legacy DSE a bit), so the actual compile-time change will be slightly bigger.

fhahn updated this revision to Diff 290658.Sep 9 2020, 1:03 AM

rebase on top of latest changes.

fhahn added a comment.Sep 9 2020, 6:29 AM

As another data point, I have collected compile-time numbers for ARM64 with -Os -g LTO & -Oz -g LTO. Geomean changes below. Compile time here is actual execution time on a stabilized system.

                 Compile time     code size
-Oz RLTO           0.67%             -0.12%
-Os -g RLTO        0.06%             -0.14%

I think flipping the switch now should be reasonable compile-time-wise, and we also have some ideas on how to reduce compile time even further once it is enabled by default.

xbolva00 accepted this revision.Sep 9 2020, 6:53 AM
xbolva00 added a subscriber: spatel.

Thanks, amazing work!

I agree that this should be enabled now rather than waiting any longer.

@nikic @spatel ?

This revision is now accepted and ready to land.Sep 9 2020, 6:53 AM
nikic accepted this revision.Sep 9 2020, 9:22 AM

LGTM as well, per llvm-dev discussion. I think you've done all the due diligence we can expect.

Do you plan to also work on MSSA-based MemCpyOpt? I expect that one will have a large positive impact on Rust code, where lack of cross-BB memcpy elision is a big issue right now.
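A minimal sketch (hypothetical IR, not from this discussion) of the kind of cross-BB pattern meant: the copy through the %tmp alloca could be forwarded directly from %src to %dst, but when a branch separates the two memcpy calls the intermediate copy currently survives, since the MemDep-based MemCpyOpt reasons mostly within a single basic block:

define void @cross_bb_copy(i8* noalias %dst, i8* noalias %src, i1 %c) {
entry:
  %tmp = alloca [32 x i8], align 1
  %tmp.ptr = getelementptr inbounds [32 x i8], [32 x i8]* %tmp, i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %tmp.ptr, i8* %src, i64 32, i1 false)
  br i1 %c, label %then, label %exit

then:
  ; could become memcpy(%dst, %src) with the temporary copy removed
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %tmp.ptr, i64 32, i1 false)
  br label %exit

exit:
  ret void
}

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i1 immarg)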

asbirlea accepted this revision.Sep 9 2020, 11:23 AM

Green light from me as well. I think there's still room for improvement for the NPM, but this is the right step to move forward infrastructure-wise.

Regarding the MemCpyOpt question from nikic@, please let me know if you want to take that over.

fhahn added a comment.Sep 9 2020, 3:01 PM

It looks like a recently introduced GVN crash is exposed by this change (probably D87061). I will commit once this is fixed.

LGTM as well, per llvm-dev discussion. I think you've done all the due diligence we can expect.

Do you plan to also work on MSSA-based MemCpyOpt? I expect that one will have a large positive impact on Rust code, where lack of cross-BB memcpy elision is a big issue right now.

I probably won't have time to work on MSSA-based MemCpyOpt for the next couple of weeks unfortunately. But I think there is a lot that can be re-used/shared from the DSE implementation. Happy to discuss approaches.

asbirlea requested changes to this revision.Sep 9 2020, 7:02 PM

I'm running into a crash with this, could you please hold off until tomorrow? I'm working on getting a reproducer.

This revision now requires changes to proceed.Sep 9 2020, 7:02 PM
asbirlea added inline comments.Sep 9 2020, 8:24 PM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
316

The crash is hitting this unreachable for the intrinsic "llvm.memcpy.inline".
Simple repro: opt -dse test1.ll

define void @test(i8* noalias nocapture %0, i8* noalias nocapture readonly %1) {
  tail call void @llvm.memcpy.inline(i8* align 1 %0, i8* align 1 %1, i64 3, i1 false)
  ret void
}

declare void @llvm.memcpy.inline(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64 immarg, i1 immarg)
fhahn added inline comments.Sep 10 2020, 5:34 AM
llvm/lib/Transforms/Scalar/DeadStoreElimination.cpp
316

Thanks! The problem was that MemorySSA-based DSE only relies on whether we can find read/write memory locations. For memcpy.inline, MemoryLocation knows how to extract the arguments, but the DSE functions were not updated to account for that.

In the legacy DSE, we silently failed to optimize/eliminate memcpy.inline calls, but with the MemorySSA version we got a crash highlighting that the code needs updating.

That should be fixed by a5ec99da6ea7
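For reference, a hypothetical example (not taken from a5ec99da6ea7) of the kind of input that should now be classified instead of hitting the unreachable: the inline memcpy writes four bytes that are completely overwritten by the following store, so once DSE understands the intrinsic it can reason about the copy like any other dead write:

define void @overwritten(i8* noalias nocapture %dst, i8* noalias nocapture readonly %src) {
  tail call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* align 4 %dst, i8* align 4 %src, i64 4, i1 false)
  %dst.i32 = bitcast i8* %dst to i32*
  store i32 0, i32* %dst.i32, align 4   ; fully overwrites the copied bytes
  ret void
}

declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64 immarg, i1 immarg)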

fhahn added a comment.Sep 10 2020, 9:58 AM

I'm running into a crash with this, could you please hold off until tomorrow? I'm working on getting a reproducer.

@asbirlea Thanks for the reproducer, the issue should be sorted out. The GVN issue has also been fixed and I am planning to submit this today, unless you think we should wait a bit longer.

fhahn updated this revision to Diff 291078.Sep 10 2020, 1:59 PM

Final rebase after new memcpy test.

asbirlea accepted this revision.Sep 10 2020, 2:08 PM

I'm running additional testing in the background. I don't have anything holding this back at this point.
Either check in and if something shows up, I'll let you know to resolve or revert, or wait another 1-2 days for the testing I'm running to complete.
I'm ok with either option, and may even be inclined for the former, because there may be additional parties coming forward if issues arise, and I'd like these to come up earlier rather than later.

This revision is now accepted and ready to land.Sep 10 2020, 2:08 PM
fhahn added a comment.Sep 10 2020, 2:10 PM

I'm running additional testing in the background. I don't have anything holding this back at this point.
Either check in and if something shows up, I'll let you know to resolve or revert, or wait another 1-2 days for the testing I'm running to complete.
I'm ok with either option, and may even be inclined for the former, because there may be additional parties coming forward if issues arise, and I'd like these to come up earlier rather than later.

SGTM, I expect a few hiccups and it's probably best to get started early. I'll commit shortly.

This revision was landed with ongoing or failed builds.Sep 10 2020, 2:26 PM
This revision was automatically updated to reflect the committed changes.

Thanks, should already be fixed in d85ac6d577ac

Heads-up: We're seeing fairly widespread test failures with this in Chromium that go away when we force this flag off. It's possible that it's due to only a small number of issues and they might be previously-benign UB (no idea), but I figured I'd give an early note. We'll send an update once we've reduced a failing test.

Hi,

I'm seeing what I think is a miscompile with this:

opt -S -dse -o - dse.ll

on

@x = global [10 x [10 x i16]] zeroinitializer, align 1

define i16 @f() {
entry:
  br label %do.body

do.body:                                          ; preds = %if.end, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %if.end ]
  %j.0 = phi i16 [ 0, %entry ], [ %add, %if.end ]
  %arrayidx2 = getelementptr inbounds [10 x [10 x i16]], [10 x [10 x i16]]* @x, i16 0, i16 %i.0, i16 %j.0
  store i16 2, i16* %arrayidx2, align 1
  %exitcond = icmp eq i16 %i.0, 4
  br i1 %exitcond, label %if.end10, label %if.end

if.end:                                           ; preds = %do.body
  %inc = add nuw nsw i16 %i.0, 1
  %add = add nuw nsw i16 %j.0, 2
  br label %do.body

if.end10:                                         ; preds = %do.body
  store i16 1, i16* %arrayidx2, align 1
  ret i16 0
}

gives

@x = global [10 x [10 x i16]] zeroinitializer, align 1

define i16 @f() {
entry:
  br label %do.body

do.body:                                          ; preds = %if.end, %entry
  %i.0 = phi i16 [ 0, %entry ], [ %inc, %if.end ]
  %j.0 = phi i16 [ 0, %entry ], [ %add, %if.end ]
  %arrayidx2 = getelementptr inbounds [10 x [10 x i16]], [10 x [10 x i16]]* @x, i16 0, i16 %i.0, i16 %j.0
  %exitcond = icmp eq i16 %i.0, 4
  br i1 %exitcond, label %if.end10, label %if.end

if.end:                                           ; preds = %do.body
  %inc = add nuw nsw i16 %i.0, 1
  %add = add nuw nsw i16 %j.0, 2
  br label %do.body

if.end10:                                         ; preds = %do.body
  store i16 1, i16* %arrayidx2, align 1
  ret i16 0
}

So the store in the loop has been removed, which I think is incorrect.
The

store i16 2

in the loop should be carried out at (i,j) == (0,0), (1,2), (2,4), (3,6) and (4,8), and then the last of them should be overwritten by the

store i16 1

after the loop, but now the store in the loop is removed so we're only left with a written 1 at (4,8).

fhahn added a comment.Sep 14 2020, 4:09 AM

Hi,

I'm seeing what I think is a miscompile with this:

Thanks for the reproducer. We have to be more careful around pointers that may reference multiple locations in memory. I pushed f715d81c9df3 which limits elimination to stores that only reference a single location (or are in the same BB). The check is conservative and can be improved in the future, but the impact on the number of stores eliminated seems very low in practice. It would be great if you could check if that fixes the original reproducer.

fhahn added a comment.Sep 14 2020, 4:12 AM

Heads-up: We're seeing fairly widespread test failures with this in Chromium that go away when we force this flag off. It's possible that it's due to only a small number of issues and they might be previously-benign UB (no idea), but I figured I'd give an early note. We'll send an update once we've reduced a failing test.

Thanks for the heads-up. I just pushed another fix for the issue @uabelho reported. I'd guess at least some of the failures you are seeing should be related. If not, I think it makes sense to revert this once we have the reproducers and can work on ironing out the remaining issues.

Hi,

I'm seeing what I think is a miscompile with this:

Thanks for the reproducer. We have to be more careful around pointers that may reference multiple locations in memory. I pushed f715d81c9df3 which limits elimination to stores that only reference a single location (or are in the same BB). The check is conservative and can be improved in the future, but the impact on the number of stores eliminated seems very low in practice. It would be great if you could check if that fixes the original reproducer.

Great! I'll check and get back during the day.

Hi,

I'm seeing what I think is a miscompile with this:

Thanks for the reproducer. We have to be more careful around pointers that may reference multiple locations in memory. I pushed f715d81c9df3 which limits elimination to stores that only reference a single location (or are in the same BB). The check is conservative and can be improved in the future, but the impact on the number of stores eliminated seems very low in practice. It would be great if you could check if that fixes the original reproducer.

Great! I'll check and get back during the day.

Yep, the original reproducer passes now. Thanks!

dmajor added a subscriber: dmajor.Sep 14 2020, 1:27 PM

Heads-up: We're seeing fairly widespread test failures with this in Chromium that go away when we force this flag off. It's possible that it's due to only a small number of issues and they might be previously-benign UB (no idea), but I figured I'd give an early note. We'll send an update once we've reduced a failing test.

Thanks for the heads-up. I just pushed another fix for the issue @uabelho reported. I'd guess at least some of the failures you are seeing should be related. If not, I think it makes sense to revert this once we have the reproducers and can work on ironing out the remaining issues.

We're also seeing test failures in Firefox that are not fixed by f715d81c9df3. I'll attempt to reduce but it may take some time.

fhahn added a comment.Sep 14 2020, 2:02 PM

Heads-up: We're seeing fairly widespread test failures with this in Chromium that go away when we force this flag off. It's possible that it's due to only a small number of issues and they might be previously-benign UB (no idea), but I figured I'd give an early note. We'll send an update once we've reduced a failing test.

Thanks for the heads-up. I just pushed another fix for the issue @uabelho reported. I'd guess at least some of the failures you are seeing should be related. If not, I think it makes sense to revert this once we have the reproducers and can work on ironing out the remaining issues.

We're also seeing test failures in Firefox that are not fixed by f715d81c9df3. I'll attempt to reduce but it may take some time.

I also pushed a fix for a MemorySSA issue that was exposed through DSE + MemorySSA in c4f1b3144184. This might also fix (some) of the failures you are seeing.

I also pushed a fix for a MemorySSA issue that was exposed through DSE + MemorySSA in c4f1b3144184. This might also fix (some) of the failures you are seeing.

Same failures with c4f1b3144184.

I'd suggest reverting if the failures are blocking, but we do need a reproducer so it can be recommitted after a fix is in place.

fhahn added a comment.Sep 15 2020, 5:53 AM

I'd suggest reverting if the failures are blocking, but we do need a reproducer so it can be recommitted after a fix is in place.

+1, but I think ideally we would have reproducers for the remaining issues first. @dmajor @thakis please let me know if this is blocking you and it is not feasible to provide reproducers before reverting.

It's not blocking us because we added an explicit flag to force this off to our build config for "normal" builds. (We still get the default-on behavior on our bots that build with trunk clang.) But if others are seeing problems with this too, I think it makes sense to revert so others don't spend time tracking down miscompiles to this commit.

Sadly our ToT bots were broken all day yesterday because someone broke -Warray-bounds in clang. That got reverted 5h ago, but our bots need 6-7h to cycle, so we don't know how our tests look at clang trunk.

(The last build before the -Warray-bounds issue also timed out during compile, so it's possible that that's a real issue and if so then the current build will also time out during compile and we'll have to analyze and fix that first before we get data about the state of our tests.)

fhahn added a comment.Sep 15 2020, 6:44 AM

It's not blocking us because we added an explicit flag to force this off to our build config for "normal" builds. (We still get the default-on behavior on our bots that build with trunk clang.) But if others are seeing problems with this too, I think it makes sense to revert so others don't spend time tracking down miscompiles to this commit.

Sadly our ToT bots were broken all day yesterday because someone broke -Warray-bounds in clang. That got reverted 5h ago, but our bots need 6-7h to cycle, so we don't know how our tests look at clang trunk.

Ok, if you could let me know once you have some results, that would be great. If there still are problems, I'll go ahead and revert the patch later today.

Here is a reduced-ish repro, I can keep poking at it but maybe this is sufficient for you to notice something: https://godbolt.org/z/endf6d.

Center pane is old DSE, right pane is new DSE. I highlighted the problem area with a nop at line 56 of the source, and line 59 of each disassembly. The array writes following the nop have been eliminated, even though they will be needed on a future iteration of the for loop.

Reduced a bit more: https://godbolt.org/z/j59evK (C++) and https://godbolt.org/z/8xG688 (IR) -- the store at line 43 of while.end has been removed.

Reduced a bit more: https://godbolt.org/z/j59evK (C++) and https://godbolt.org/z/8xG688 (IR) -- the store at line 43 of while.end has been removed.

Thanks! I reverted the change for now in fb109c42d91c.

I think the problem here is that the load in the loop has liveOnEntry as its defining access (which is outside the loop), so according to MemorySSA, the load does not read the memory written by the store. A slightly further reduced IR example with the MemorySSA annotations printed: https://godbolt.org/z/G34oM4. I think this is similar to the issue https://bugs.llvm.org/show_bug.cgi?id=47498.

@asbirlea do you have any thoughts on the issue? Do I understand correctly that in the example, %use = load i32, i32* %ptr.1, align 4 should have ; 8 = MemoryPhi({entry,1},{loop.latch,7}) as its defining access?

define i32 @test() {
entry:
  %nodeStack = alloca [12 x i32], align 4
  %nodeStack.cast = bitcast [12 x i32]* %nodeStack to i8*
; 1 = MemoryDef(liveOnEntry)
  %c.1 = call i1 @cond(i32 1)
  br i1 %c.1, label %cleanup, label %loop.header

loop.header:                                      ; preds = %loop.latch, %entry
; 8 = MemoryPhi({entry,1},{loop.latch,7})
  %depth.1 = phi i32 [ %depth.1.be, %loop.latch ], [ 0, %entry ]
  %cmp = icmp sgt i32 %depth.1, 0
  br i1 %cmp, label %cond.read, label %cond.store

cond.read:                                        ; preds = %loop.header
  %sub = add nsw i32 %depth.1, -1
  %ptr.1 = getelementptr inbounds [12 x i32], [12 x i32]* %nodeStack, i32 0, i32 %sub
; MemoryUse(liveOnEntry)
  %use = load i32, i32* %ptr.1, align 4
; 2 = MemoryDef(8)
  %c.2 = call i1 @cond(i32 %use)
  br i1 %c.2, label %loop.latch, label %cond.store

cond.store:                                       ; preds = %cond.read, %loop.header
; 9 = MemoryPhi({loop.header,8},{cond.read,2})
  %ptr.2 = getelementptr inbounds [12 x i32], [12 x i32]* %nodeStack, i32 0, i32 %depth.1
; 3 = MemoryDef(9)
  store i32 10, i32* %ptr.2, align 4
  %inc = add nsw i32 %depth.1, 1
; 4 = MemoryDef(3)
  %c.3 = call i1 @cond(i32 20)
  br i1 %c.3, label %cleanup, label %loop.latch

loop.latch:                                       ; preds = %cond.store, %cond.read
; 7 = MemoryPhi({cond.read,2},{cond.store,4})
  %depth.1.be = phi i32 [ %sub, %cond.read ], [ %inc, %cond.store ]
  br label %loop.header

cleanup:                                          ; preds = %cond.store, %entry
; 6 = MemoryPhi({entry,1},{cond.store,4})
; 5 = MemoryDef(6)
  call void @llvm.lifetime.end.p0i8(i64 48, i8* nonnull %nodeStack.cast)
  ret i32 20
}

Yes, the load should have the Phi as the defining access. I'm still looking into where this information should come from, but it's not hitting the phi translation.
Thank you for the revert, I'll update with the fix once I have it available.

Yes, the load should have the Phi as the defining access. I'm still looking into where this information should come from, but it's not hitting the phi translation.
Thank you for the revert, I'll update with the fix once I have it available.

Thanks! I also started looking into where things go wrong. It appears that this happens during MemorySSA use optimization (https://github.com/llvm/llvm-project/blob/master/llvm/lib/Analysis/MemorySSA.cpp#L1447). It seems like getClobberingMemoryAccess does not account for the fact that the query location could map to multiple memory locations if it is loop-dependent. Just querying AA for both pointers will return noalias, because they only alias for different concrete phi values. I am not sure if we have a safe way to detect that a location actually refers to a range of locations. I am currently experimenting with being more conservative in getClobberingMemoryAccess for locations that we cannot prove to be loop-invariant with respect to possible cycles in the function.

Right!
I believe the solution is to set the location size to unknown after a phi translation with the same ptr, but I need to properly verify that.

// include/llvm/Analysis/MemorySSA.h:1233
CurrentPair.second = Location.getWithNewSize(LocationSize::unknown());
return;

I checked in a fix in https://reviews.llvm.org/rGfc8200633122, but I haven't yet verified it addresses all the failures reported.

I checked in a fix in https://reviews.llvm.org/rGfc8200633122, but I haven't yet verified it addresses all the failures reported.

Thanks, I've confirmed that this fixes our tests in their original form (not reduced).

Our day-to-day testing of LLVM trunk is limited though, maybe one-tenth of our full suite. Since this code has some risks, I could start a full run to throw more testing at it. But that takes more machine capacity and human effort to remove flaky failures, so I'd prefer to wait until the odds are good that there won't be further changes. Let me know when you think it's ready.

I checked in a fix in https://reviews.llvm.org/rGfc8200633122, but I haven't yet verified it addresses all the failures reported.

Thanks, I've confirmed that this fixes our tests in their original form (not reduced).

Our day-to-day testing of LLVM trunk is limited though, maybe one-tenth of our full suite. Since this code has some risks, I could start a full run to throw more testing at it. But that takes more machine capacity and human effort to remove flaky failures, so I'd prefer to wait until the odds are good that there won't be further changes. Let me know when you think it's ready.

I think I managed to come up with another test case that unfortunately is not covered by the recent fix. I think we might need to be even more conservative when walking across MemoryPhis, and put up D87778, which includes the test case; it is crafted so that the result of PHI translation dominates the relevant blocks during translation.

fhahn added a comment.Sep 18 2020, 3:15 AM

All reported problems should be fixed now and we also made the fix for the issue @dmajor reported more conservative to catch more problematic cases. I just enabled DSE + MemorySSA again in 9d172c8e9c84. Please let me know if there are any further issues.

Thanks for the heads up. I'll schedule a more intensive test run over the weekend while there's less demand for machines.

I threw as many tests as I could find at 9d172c8e9c84, and I don't see any regressions compared to its parent revision.

fhahn added a comment.Sep 21 2020, 1:19 PM

I threw as many tests as I could find at 9d172c8e9c84, and I don't see any regressions compared to its parent revision.

That's great to know, thank you very much! And thanks everybody for providing very helpful test cases.

fhahn added a comment.Oct 6 2020, 9:10 AM

FYI this was temporarily reverted again roughly a week ago, because there seems to be a miscompile caused by LLVM introducing invalid lifetime.end calls and exposed by the more powerful DSE, which @asbirlea is currently trying to narrow down.

fhahn added a comment.Oct 6 2020, 2:19 PM

I put up an impromptu round-table to discuss MemorySSA related topics, including potential new transformations like MemCpyOpt using MemorySSA: https://whova.com/portal/webapp/llvm_202010/CommunityBoard/topic/342030/

If there's any interest, I think it would be great to sync up.

Hi!

I found a new problem after this was turned on again in 51ff04567b2f.
With

opt -S -o - bbi-49235.ll -memoryssa -gvn -dse -verify-memoryssa

I hit

opt: ../lib/Analysis/MemorySSA.cpp:2063: void llvm::MemorySSA::verifyOrderingDominationAndDefUses(llvm::Function &) const: Assertion `&*ALI == *AAI && "Not the same accesses in the same order"' failed.

with bbi-49235.ll being

define void @k() {
entry:
  %tobool = icmp ne i16 1, 0
  %0 = xor i1 %tobool, true
  call void @llvm.assume(i1 %0)
  call void @f()
  ret void
}

declare void @f() local_unnamed_addr

; Function Attrs: nofree nosync nounwind willreturn
declare void @llvm.assume(i1 noundef) #0

attributes #0 = { nofree nosync nounwind willreturn }

Hi!

I found a new problem after this was turned on again in 51ff04567b2f.

I wrote https://bugs.llvm.org/show_bug.cgi?id=48072 about it

thakis added a comment.Nov 6 2020, 1:34 PM

Very belated update: We've been running with this on our clang/tot bots for a few weeks now and turned this on (or, more accurately, stopped turning it off) for "normal" builds today. All tests pass and so far everything's happy.

fhahn added a comment.Nov 6 2020, 1:49 PM

Very belated update: We've been running with this on our clang/tot bots for a few weeks now and turned this on (or, more accurately, stopped turning it off) for "normal" builds today. All tests pass and so far everything's happy.

That's great, thanks for letting us know! There might be a few more features down the road, so it's good to know that you were able to make the initial switch.

I wrote https://bugs.llvm.org/show_bug.cgi?id=49859 about a miscompile that started occurring when turning on MemorySSA again in DSE.
I suspect that LICM doesn't keep MemorySSA updated properly.