This change split a memset into two parts if it is partially overlapped with later stores.
We split a memset only when it is small enough to be lowered to stores in backend
and the total number of expected stores after splitting should be smaller than the original one.
Details
Diff Detail
Event Timeline
These tests don't seem to give me a lot of information about what you expect to happen and when.
Can you modify them to actually test that stores get eliminated (because if they are not, this whole thing seems speculative enough that it probably should be value profile guided in some fashion)
test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll | ||
---|---|---|
11 | Errr, so what gets eliminated here? | |
35 | Again, if your goal is to test that perform dead store elimination, you should test that, not just test that we turn the memset into stores. |
These tests don't seem to give me a lot of information about what you expect to happen and when. Can you modify them to actually test that stores get eliminated (because if they are not, this whole thing seems speculative enough that it probably should be value profile guided in some fashion)
Thanks Daniel for your quick feedback. I will change the tests to check final stores impacted by this patch in DSE.
So, in terms of high level optimization, this looks correct at a glance.
I'd wait for quentin to say whether he thinks this makes sense for LLVM's backends.
Assuming he thinks it looks good on that front, looks good to me.
lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
238 | How and when can this happen? |
lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
238 | If there is no information about LegalIntWidth extracted from datalayout, this could be 0. If we remove the target datalayout in input IR, this will be 0. |
The first seems plausible, but datalayout is not optional anymore, IIRC, so
the second should be impossible.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
465 | What about the MinSize attribute? (Oz) | |
lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
238 | Like Daniel said, DL are mandatory now. If that may still happen, (int would be an illegal type??) add a comment on when this is the case and add a test case for that! | |
241 | Add a comment why this is not profitable. Something along the line of “this is going to be only one store”. | |
492 | Add a comment with ASCII art like the other cases to explain what we handle here. | |
1069 | Make a method out of this to break the nested indentation. | |
test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll | ||
2 | I would prefer two different tests:
| |
79 | Use opt -instnamer |
Thanks Quentin for the review. I will address your comments soon.
lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
238 | As far as I check, I didn't encounter any complain when injecting an IR without "target datalayout" into opt. Also there are many test cases without datalayout. It seems also possible to assign empty string like target datalayout="". Did the mandatory mean in IRs generated from the frontend? If DL.getLargestLegalIntTypeSizeInBits() returns 0, I think we should bail out with the assumption that we don't have proper information about backend. |
lib/Transforms/Scalar/DeadStoreElimination.cpp | ||
---|---|---|
238 | Having DL.getLargestLegalIntTypeSizeInBits() returning 0 would mean that someone specifically asked for not having int types supported at all. |
+Mehdi, who set the mandatory data layout.
Can you please clarify about the mandatory? The reason I'm asking is because I can still see many test cases without datalayout.
Having DL.getLargestLegalIntTypeSizeInBits() returning 0 would mean that someone specifically asked for not having int types supported at all. Handling this case means that LLVM acknowledges that it aims at supporting this case. Is it possible/desirable to support this? Otherwise an assertion would be more appropriate IMO.
If having DL.getLargestLegalIntTypeSizeInBits() returning 0 is undesirable, I think we should assert in getLargestLegalIntTypeSizeInBits().
http://llvm.org/docs/doxygen/html/Module_8cpp_source.html#l00375
"const DataLayout &Module::getDataLayout() const { return DL; }"
It returns a reference, so it is guaranteed to not be null.
"No data layout in the Textual IR means "use the default data layout" for the triple."
Thanks for clarifying this. So, we are guaranteed to DL even without datalayout in the textual IR. However, this doesn't mean that we can always get the non-zero from DL.getLargestLegalIntTypeSizeInBits(). As I test, we can use an empty datalayout in IR like : target datalayout="".
Wouldn't DL.getLargestLegalIntTypeSizeInBits() == 0 imply that you can have icmp or select? Hence almost nothing in llvm would work?
Although we can make DL.getLargestLegalIntTypeSizeInBits() return 0, I also doubt when this is meaningful. If this doesn't make sense, I will add assert in getLargestLegalIntTypeSizeInBits(). Can anyone give us any feedback if there is any situation where we need to consider getLargestLegalIntTypeSizeInBits() == 0 ? As far as I test, I didn't see any unit test failure with the assert in getLargestLegalIntTypeSizeInBits() == 0.
Hi Mehdi,
Do you expect DL.getLargestLegalIntTypeSizeInBits() always return non-zero?
However, using the test case added in this patch, we can easily make
DL.getLargestLegalIntTypeSizeInBits() return 0 whenever we remove
datalayout in the IR below. Please correct me if I miss something here.
; RUN: opt < %s -basicaa -dse -mtriple=arm64-linux-gnu -S
; target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
define void @test_32_8to15(i64* nocapture %P, i64 %n64) #0 {
entry:
%0 = bitcast i64* %P to i8*
call void @llvm.memset.p0i8.i64(i8* %0, i8 0, i64 32, i32 8, i1 false)
%arrayidx1 = getelementptr inbounds i64, i64* %P, i64 1 store i64 %n64,
i64* %arrayidx1 ret void }
Thanks,
Jun
- Bail out if DL.getLargestLegalIntTypeSizeInBits() returns 0, based on the assumption that we don't have proper information about backend.
- For getMaxStoresPerMemset(), pass Function as a parameter to allow targets to check whatever desirable attribute. The current default behavior is adoppted from shouldLowerMemFuncForSize() that is used when lowering memset in backend.
- Split the test case into two : one that check output of DSE and one that check stores lowered in backend.
- Add extra comments as Quentin pointed.
Please take a look.
Thanks,
Jun
Couple additional comments.
include/llvm/Analysis/TargetTransformInfo.h | ||
---|---|---|
465 | Is nullptr permitted for F? | |
include/llvm/Target/TargetLowering.h | ||
966 ↗ | (On Diff #59390) | Same question, nullptr OK or not? |
lib/CodeGen/TargetLoweringBase.cpp | ||
1648 ↗ | (On Diff #59390) | What the comment says is correct. However, I wonder if we want to have this difference here. |
test/Transforms/DeadStoreElimination/SplitMemintrinsic.ll | ||
8 | Don’t hard code %0. |
What about the MinSize attribute? (Oz)
Shouldn’t we just pass the Function (or its directly its list of attribute) and let the target checks whatever attribute it feels appropriate?