User Details
- User Since
- Aug 12 2016, 3:40 PM (354 w, 5 d)
Dec 19 2022
Should be ready for re-review now.
Updated per comments. The optimization is now enabled by default and tests have been updated accordingly.
Dec 15 2022
I know this is a big patch. Please feel free to let me know if it's too much to review and I can hand it to someone else. Thanks for your time :)
Nov 7 2022
Nov 5 2022
As an alternative, I wonder if we could teach SROA to do a form of argument promotion for nocapture noalias readonly dereferenceable aligned arguments. If the only thing blocking an alloca from SROA is being passed into a function by nocapture noalias readonly dereferenceable aligned pointer, insert a new alloca and memcpy to "reform" the structure right before the call, change the call to pass a pointer to the new alloca containing the "reformed" structure, and then the original alloca becomes SROAable. This could avoid all those ugly spills and reloads in the drop_in_place function bodies... assuming we could come up with some kind of heuristic to know when it's worth inserting copies to perform SROA in these instances.
Looking at this again, there are still quite a few spills that turn into effectively memcpys, so this isn't a panacea, but I'll take a ~25% stack size win. In any case turning the fields into SSA values will probably be a prerequisite for further optimizations.
No argument promotion: https://gist.github.com/pcwalton/38ff053dd1973ca383b85183e13de0fc
Oct 31 2022
Oct 30 2022
Failure seems like an unrelated Windows Fortran build error.
Rebased to make sure CI succeeds.
Address review comments.
Rebased. This should be ready for sign off now.
Rebased. This should be ready for sign off now.
Oct 29 2022
I'm unsure if this needs review, but I thought I would ask to be safe.
I split the tests out to D137033.
I noticed that the memcpy in the asan test case is volatile without any changes, so I shouldn't need to touch that test case now.
Wasn't sure if this needed review, but I wanted to be safe.
OK, I pushed the baseline test changes. @nikic Let me know if this looks good and I can push this.
Addresses comments.
Rebased on top of updated D136993.
Addressed review comments.
Addressed review comments.
Oct 28 2022
Review comments addressed. I tried to fix the asan test as well; let me know if that approach looks good.
Addressed comments. Note that D136993 contains the diff this is now based on top of.
I'm not sure if this needs review or not, but I thought I'd ask to be safe.
Oct 27 2022
Updated to address review comments.
Addressed review comments.
Oct 26 2022
Fix AMDGPU alias analysis and new test failure.
Fix AMDGPU alias analysis.
Should be ready for review again. I'll look at the failing test cases in the meantime.
Merge pointsToConstantMemory() into getModRefMask().
Oct 25 2022
Oct 24 2022
This is great! A couple of things I noticed on my test case:
Jun 24 2022
I verified that the failing tests don't break locally.
Jun 23 2022
The test failures in libomp don't look related to me.
Jun 17 2022
This should be ready for review now.
So this was a fun one. The previous version of this patch accidentally caused a
pre-existing bug to surface and break Chromium. In certain circumstances
including the Chromium test case, available_externally symbols would cause
invalid lookups in the SymSize table. Before this patch, the lookups were
done with the [] operator, which would succeed but cause an unspecified value
to be emitted in the .debug_aranges table. But this patch moves those lookups
to be done by dereferencing the result of the .find() method, which causes
asserts if the key is not found.
Jun 16 2022
Minimal reproduction:
Jun 8 2022
Updated the diff to address comments.
Jun 7 2022
Jun 2 2022
Test failures look unrelated?
Jun 1 2022
Abandoned in favor of D126835.
May 31 2022
I see. I don't have commit access unfortunately.
Sorry for the delay. I was looking at this on Friday but didn't get around to finishing it. Feel free to revert in the meantime.
May 25 2022
Closing in favor of D126257.
Oh, what I mean is that our choices in the case in which zero-sized global symbols are forbidden at the IR level would be (1) don't emit source-level zero-sized symbols into LLVM IR (or don't emit DWARF metadata for them), or (2) round all zero-sized symbols up to one byte. If we pick (1), I don't think they would show up in the debug info, which could be confusing for programmers. If we pick (2), then certain abstractions which are zero-cost today in Rust become non-zero-cost.
May 24 2022
Wouldn't that mean that those zero-sized data symbols won't show up in the debugger anymore? That might be confusing for Rust users, who do use zero-sized globals reasonably commonly (to attach methods to).
How's this?
Emit a constant 1 instead of a more complicated MCExpr when emitting symbols of length 1.
May 23 2022
@dblaikie Here's a new version of the patch that takes the alternate approach you suggested of rounding lengths up to 1 byte. Feel free to take either diff and I'll close the other.