This is an archive of the discontinued LLVM Phabricator instance.

[DSE]Split memset when the memset is small enough to be lowered to stores
Needs ReviewPublic

Authored by junbuml on May 13 2016, 4:02 PM.

Details

Summary

This change split a memset into two parts if it is partially overlapped with later stores.
We split a memset only when it is small enough to be lowered to stores in backend
and the total number of expected stores after splitting should be smaller than the original one.

Diff Detail

Event Timeline

junbuml updated this revision to Diff 57266.May 13 2016, 4:02 PM
junbuml retitled this revision from to [DSE][WIP]Split memset when the memset is small enough to be lowered to stores.
junbuml updated this object.
junbuml updated this revision to Diff 57483.May 17 2016, 8:25 AM
junbuml retitled this revision from [DSE][WIP]Split memset when the memset is small enough to be lowered to stores to [DSE]Split memset when the memset is small enough to be lowered to stores.
junbuml edited subscribers, added: llvm-commits; removed: mcrosier.
junbuml updated this revision to Diff 57934.May 20 2016, 8:43 AM

Minor changes in test cases.

junbuml updated this revision to Diff 58241.May 24 2016, 8:39 AM

Rebased and fixed typo.

dberlin edited edge metadata.May 24 2016, 8:47 AM

These tests don't seem to give me a lot of information about what you expect to happen and when.
Can you modify them to actually test that stores get eliminated (because if they are not, this whole thing seems speculative enough that it probably should be value profile guided in some fashion)

test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll
11

Errr, so what gets eliminated here?

35

Again, if your goal is to test that perform dead store elimination, you should test that, not just test that we turn the memset into stores.

These tests don't seem to give me a lot of information about what you expect to happen and when. Can you modify them to actually test that stores get eliminated (because if they are not, this whole thing seems speculative enough that it probably should be value profile guided in some fashion)

Thanks Daniel for your quick feedback. I will change the tests to check final stores impacted by this patch in DSE.

junbuml updated this revision to Diff 58269.May 24 2016, 10:39 AM
junbuml edited edge metadata.

Updated test cases which will now check the final stores impacted by this change.

eeckstein removed a reviewer: qcolombet.

So, in terms of high level optimization, this looks correct at a glance.
I'd wait for quentin to say whether he thinks this makes sense for LLVM's backends.
Assuming he thinks it looks good on that front, looks good to me.

lib/Transforms/Scalar/DeadStoreElimination.cpp
238

How and when can this happen?

junbuml added inline comments.May 24 2016, 1:23 PM
lib/Transforms/Scalar/DeadStoreElimination.cpp
238

If there is no information about LegalIntWidth extracted from datalayout, this could be 0. If we remove the target datalayout in input IR, this will be 0.

The first seems plausible, but datalayout is not optional anymore, IIRC, so
the second should be impossible.

qcolombet added inline comments.May 26 2016, 5:45 PM
include/llvm/Analysis/TargetTransformInfo.h
465

What about the MinSize attribute? (Oz)
Shouldn’t we just pass the Function (or its directly its list of attribute) and let the target checks whatever attribute it feels appropriate?

lib/Transforms/Scalar/DeadStoreElimination.cpp
238

Like Daniel said, DL are mandatory now. If that may still happen, (int would be an illegal type??) add a comment on when this is the case and add a test case for that!

241

Add a comment why this is not profitable. Something along the line of “this is going to be only one store”.

492

Add a comment with ASCII art like the other cases to explain what we handle here.

1069

Make a method out of this to break the nested indentation.

test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll
2

I would prefer two different tests:

  1. One that checks that dse does what is expected.
  2. One that checks that given such input (statically available, not produced by opt), the backend does what is expected.
79

Use opt -instnamer

Thanks Quentin for the review. I will address your comments soon.

lib/Transforms/Scalar/DeadStoreElimination.cpp
238

As far as I check, I didn't encounter any complain when injecting an IR without "target datalayout" into opt. Also there are many test cases without datalayout. It seems also possible to assign empty string like target datalayout="". Did the mandatory mean in IRs generated from the frontend?

If DL.getLargestLegalIntTypeSizeInBits() returns 0, I think we should bail out with the assumption that we don't have proper information about backend.

qcolombet edited edge metadata.May 31 2016, 4:30 PM
qcolombet added a subscriber: qcolombet.

+Mehdi, who set the mandatory data layout.

mehdi_amini added inline comments.
lib/Transforms/Scalar/DeadStoreElimination.cpp
238

Having DL.getLargestLegalIntTypeSizeInBits() returning 0 would mean that someone specifically asked for not having int types supported at all.
Handling this case means that LLVM acknowledges that it aims at supporting this case. Is it possible/desirable to support this?
Otherwise an assertion would be more appropriate IMO.

+Mehdi, who set the mandatory data layout.

Can you please clarify about the mandatory? The reason I'm asking is because I can still see many test cases without datalayout.

Having DL.getLargestLegalIntTypeSizeInBits() returning 0 would mean that someone specifically asked for not having int types supported at all. Handling this case means that LLVM acknowledges that it aims at supporting this case. Is it possible/desirable to support this? Otherwise an assertion would be more appropriate IMO.

If having DL.getLargestLegalIntTypeSizeInBits() returning 0 is undesirable, I think we should assert in getLargestLegalIntTypeSizeInBits().

http://llvm.org/docs/doxygen/html/Module_8cpp_source.html#l00375
"const DataLayout &Module::getDataLayout() const { return DL; }"
It returns a reference, so it is guaranteed to not be null.
"No data layout in the Textual IR means "use the default data layout" for the triple."

Thanks for clarifying this. So, we are guaranteed to DL even without datalayout in the textual IR. However, this doesn't mean that we can always get the non-zero from DL.getLargestLegalIntTypeSizeInBits(). As I test, we can use an empty datalayout in IR like : target datalayout="".

Wouldn't DL.getLargestLegalIntTypeSizeInBits() == 0 imply that you can have icmp or select? Hence almost nothing in llvm would work?

Although we can make DL.getLargestLegalIntTypeSizeInBits() return 0, I also doubt when this is meaningful. If this doesn't make sense, I will add assert in getLargestLegalIntTypeSizeInBits(). Can anyone give us any feedback if there is any situation where we need to consider getLargestLegalIntTypeSizeInBits() == 0 ? As far as I test, I didn't see any unit test failure with the assert in getLargestLegalIntTypeSizeInBits() == 0.

Hi Mehdi,

Do you expect DL.getLargestLegalIntTypeSizeInBits() always return non-zero?
However, using the test case added in this patch, we can easily make
DL.getLargestLegalIntTypeSizeInBits() return 0 whenever we remove
datalayout in the IR below. Please correct me if I miss something here.

; RUN: opt < %s -basicaa -dse -mtriple=arm64-linux-gnu -S

; target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

define void @test_32_8to15(i64* nocapture %P, i64 %n64) #0 {
entry:
%0 = bitcast i64* %P to i8*
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 32, i32 8, i1 false)
%arrayidx1 = getelementptr inbounds i64, i64* %P, i64 1 store i64 %n64,
i64* %arrayidx1 ret void }

Thanks,
Jun

junbuml updated this revision to Diff 59390.Jun 2 2016, 9:07 AM
junbuml edited edge metadata.
  • Bail out if DL.getLargestLegalIntTypeSizeInBits() returns 0, based on the assumption that we don't have proper information about backend.
  • For getMaxStoresPerMemset(), pass Function as a parameter to allow targets to check whatever desirable attribute. The current default behavior is adoppted from shouldLowerMemFuncForSize() that is used when lowering memset in backend.
  • Split the test case into two : one that check output of DSE and one that check stores lowered in backend.
  • Add extra comments as Quentin pointed.

Please take a look.

Thanks,
Jun

Couple additional comments.

include/llvm/Analysis/TargetTransformInfo.h
465

Is nullptr permitted for F?
If not, then use a reference.

include/llvm/Target/TargetLowering.h
966 ↗(On Diff #59390)

Same question, nullptr OK or not?

lib/CodeGen/TargetLoweringBase.cpp
1648 ↗(On Diff #59390)

What the comment says is correct. However, I wonder if we want to have this difference here.
I would have expected something simpler like return F->optForSize() || F->optForMinSize() for all OS.

test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll
8

Don’t hard code %0.
Use a regexp.

junbuml updated this revision to Diff 59767.Jun 6 2016, 1:01 PM

Addressed Quentin's comments. Thanks Quentin for your review!

mcrosier resigned from this revision.Mar 6 2017, 1:05 PM