- User Since
- Nov 22 2013, 10:24 PM (263 w, 3 d)
Sep 12 2018
Fix comment as requested by review, fix a small bug found in more extensive testing,
add a test case for the latter.
Sep 9 2018
Aug 11 2018
Fix ir names in tests broken by previous commits and proplerly match u/smin.
I have a general concern against such changes. In fact, you are introducing an alternative way to express the same thing. umin(a, b) and umax(~a, ~b) are the same, but now have 2 possible notations. It means that whatever pass or analysis that needs to recognize this pattern needs to be aware of both. Whenever the new node was not supported, it might be missed optimization opportunities. And we cannot know for sure how many such places there are, or will be. What motivation do you have for making this change? Is it strong enough to take a risk of missing optimization opportunities that I've just pointed out?
Aug 5 2018
Fix a small bug discovered during production testing (
when expanding umin/umax, don't go to integers just because
types are unequal - the only reason to do is if one of the
operands in an integer and the other is not).
Aug 1 2018
Jul 30 2018
Jul 25 2018
Add a small fix found during testing that I forgot to commit.
May 22 2018
If my memory recovey is successful and I understand the problem correctly, the best solution to me is to separate the mutation and analysis code in StackProtector. Suppose we split it into two passes "StackProtectorAnalysis" and "InsertStackProtectorPrologue", will it solve the problem? During the re-run, we only run StackProtectorAnalysis, but never again insert prologue. StackProtectorAnalysis should be kept as-is, without more special logic to detect whether the prologue is already generated (as this patch does). What do you think?
May 17 2018
This was merged in rL322311, closing to get it off my dashboard. In the future, ideally add
Differential Revision: https://reviews.llvm.org/D41960
with that syntax to the commit message, so Phabricator can close the revision automatically.
May 16 2018
Remove incorrect comment from test case.
Add LLVM_DUMP_METHOD to the added dump() method.
May 12 2018
Updated to use getLoopLatches and add a test.
Yeah, I've been thinking about what the best way to test this is. It seems somewhat hard to reliably trigger
a verifier fault by removing an instruction - llvm.stackprotector is the one I see most commonly, but I don't
really want to make the test depend on that behavior. One option is to teach bugpoint to use a custom verifier
pass and then write a custom verifier pass that e.g. checks for the presence of a call. That seems like a decently
attractive option, because it would be a useful feature in its own right. Some frontends have their own invariants
and corresponding verifier passes that they want to maintain in their early passes. Would be nice if bugpoint could
support that. The only thing I'm not quite sure off is to how to do this nicely. Presumably, we could use the analysis
name as a pass and run that through the pass manager. I'm just not sure how to get the analysis result out of that.
May 11 2018
May 4 2018
Jan 31 2018
This happens fairly frequently for us in julia, because we use i8 to represent tags for tagged unions and then switch on them to discover which of the union components is active.
Without this patch I saw ~3x performance regressions in the worst benchmark (because it prevented other optimizations from seeing optimization opportunities based on the
active union component). I did not see compilation performance impact in my testing.
Jan 18 2018
Jan 11 2018
Looks good to me with the minor tweak to the test. To keep the breadcrumb for future reference, I had changed this in D33179.
Oct 26 2017
Oct 25 2017
error: cannot initialize return object of type 'void *' with an rvalue of type 'FILE *const *' (aka '_IO_FILE *const *') EXPLICIT_SYMBOL(stderr); ^~~~~~~~~~~~~~~~~~~~~~~
Sep 25 2017
Sep 19 2017
this is the julia pass pipeline (https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp#L148). IIRC the original list of passes came from VMKit,
but the pass list was adjusted as needed over the years.
Sep 17 2017
Fix a small bug and also look through a single level of bitcasts.
Since IRBuilder automatically inserts bitcasts to i8*, it seems
prudent to handle that case.
Sep 15 2017
Jul 7 2017
The implementation of this looks good to me. I guess I'm not entirely sure why it's necessary for that type to be distinct in the first place, but that's probably a different discussion from just fixing this bug.
Jun 29 2017
Jun 28 2017
Factor out intersection code into a method of AAMDNodes
Jun 20 2017
Jun 13 2017
Jun 10 2017
Having pondered this some more, I wonder if what we're missing is an annotation on the addrspace cast itself that indicates whether or not GEPs may be commuted past it (could call it inbounds or something else). It seems like in many cases (including allocas). The frontend (or whoever else is doing language/target specific work) can often know whether the entire object is available in the target address space (e.g. because the entire stack always is).
LGTM, FWIW. I encountered this issue as well and this seems like the correct solution.
Jun 9 2017
Bump on review for this.
I'm gonna go ahead and commit this anyway, since this fixes a miscompile. If there are any comments I'll take them in post-commit review.
Jun 2 2017
Bump. @sanjoy does this look good to you?
Bump. Would be great if somebody could review this. I know Sink it's not really actively developed, but I think this is a pretty straightforward fix.
Jun 1 2017
Sure, see if you think this is the right way to do it. If so, mark it as accepted, and I'll go ahead and commit it. I feel like we've given other people enough time to weigh in ;).
I don't think that change is entirely necessary. I don't have any strong objections to it, but I also don't see a good reason to require it. In any case, let me get this in to be able to re-land D33655 and we can revisit the more disruptive change later.
There's already such a test case, but the cloning currently doesn't assert properly (but it can generate incorrect code). D33655 fixes that up, so I think the testing is covered once that LLVM commit goes in. I'll hold off a little while to give @aprantl a chance to look at this as well, but I do want to get this in in short order, so I can recommit D33655.
Finalize all subprograms when we're done emitting them.
The one we're interested in is a side effect, but doing
this uniformly might be cleaner and help avoid similar errors in the future.
Sure, I'll commit it.
May 31 2017
Handle absolute path case
Not sure what you're asking. llvm-config is certainly generally installed there, but --bindir is wrong without this patch, so people using that will be confused (at least that was the case when I wrote this patch - it's been a while).
I don't think there's really a natural place in DIBuilder to do this, but
I suspect that most frontends would have one.
Will you only invoke it on functions to be cloned in clang or on all subprograms?
May 30 2017
FYI, reverted due to buildbot failure. Will investigate later today.
May 29 2017
Well, note really, but that's sort of separate from this revision. A more accurate (though IMO still slightly confusing) description is the one that's on the flag:
/// If this flag is set, the remapper knows that only local values within a /// function (such as an instruction or argument) are mapped, not global /// values like functions and global metadata. RF_NoModuleLevelChanges = 1,
May 28 2017
I'm really confused by the entirety of the changes to the CloneFunction parts of this. The whole point of the ValueMapper is to fix up value/metadata references. Why do we need a manual version? Also what's up with the DISubprogram::getDistinct? The ValueMapper will create new distinct nodes as it goes along unless RF_MoveDistinctMDs is set (which it isn't inside CloneFunction). Could the problem perhaps have been an incorrectly unset ModuleLevelChanges flag? I'm seeing a ton of assertion failures after this change. I'll submit an appropriate PR to revert the CloneFunction parts of this.
May 27 2017
Thanks! I'm a little confused why the compiler is complaining there - I see this pattern used all over the place for 1-element array construction, but your fix is of course fine.
May 22 2017
Add a C++ test to make sure the SCEV code path is tested
May 20 2017
May 19 2017
Bump. Can somebody review this, so we can get this in. @arsenm maybe?
Please see the discussion in D31924 - it's not clear that instcombine is allowed to introduce addrspacecasts that weren't in the original program. I know there's a couple cases like that in the code base already, but no need to add more.
May 17 2017
Changed the patch to do the argument re-arranging in the caller instead. I think
that's more localized to just this non-integral pointer change.
May 16 2017
May 15 2017
Do you mean non-integral pointers?
May 14 2017
Was there a decision reached here on what the correct semantics are? There are other places in LLVM (I found one in instcombine - there may be others) which do make the assumption that this change is proposing to introduce to the langref. Personally, I don't think this transformation should be allowed. I know there are architectures where different address spaces have different GEP behavior (though I'm not sure if this is the case for any in-tree backend). Nevertheless, if people do feel like this should be allowed (e.g. because such architectures should use something other than addrspacecast to convert between such address spaces), that's fine with me as well, but I think there should a clear statement in the langref on way or the other. As is, different people read the langref differently.
May 12 2017
May 11 2017
May 9 2017
Bump. Does this look good?
Apr 27 2017
Add some CHECK lines to the test case
Apr 26 2017
Apr 17 2017
Run clang-format over the diff
Now with testcases for the extra cases I discovered.
Apr 12 2017
There were a couple more of instances of the same pattern in this code.
I'll write some tests for all these cases next week (famous last words).
Apr 11 2017
Apr 10 2017
Updated to include the test for DIFinder
Apr 6 2017
I wasn't suggesting we blindly trust the user on shuffles, but that we somehow distinguish between masks generated by some earlier LLVM pass and masks passed in by the user. For the former I feel like it would make a lot of sense to be more aggressive about folding them than we are now, which for the latter the user may have used some target specific knowledge, so we may want to be less aggressive. I'm also fine with a TTI hook, happy to write the code. Somebody has to make a call here, and I don't think I'm the best person to do so - I just want all those redundant shuffles out of my code ;).
Address review comments
I'm a bit late to the party here, but I'm facing similar problems, so I'm interested in a clean solution. I wonder though if what we want to express isn't some sort of "type-based dereferencability annotations". For example the semantics I care about are essentially, "if you know you have a defereferencable pointer, you can go and dereference any other valid (managed) pointers the pointed to data references (recursively)". This has to be true for me, because the GC walks all pointers recursively that way. Of course the problem with this is that the compiler doesn't know which part of the referenced data are valid pointers for this purpose (and it can't just be based on the struct type, because we do store pointers to unmanaged data). So if we had a way to express to the compiler "here are the pointers in this struct that you may assume dereferencable", that would work very well for me. Would that solve your problem as well?
Apr 5 2017
Apr 4 2017
If a TTI hook is not the right way to go here, maybe we can find a way to actually express what we want here, maybe via metadata or some other specification on shufflevector that would tell the optimizer not to touch the shufflemask unless it's confident it can improve upon it. As it is, I see my CPU with a perfectly good shuffle instruction shuffle the same vector three times, because LLVM refuses to combine these shuffles - pretty embarrassing ;).
Hmm, unless I did something wrong, the backend generates equivalent code whether or not we merge the shuffles. Or did you mean that we could do a better job with the three IR instruction form even though we currently don't?
Can you try https://reviews.llvm.org/D31681? I wasn't able to reproduce the problem with your example, it kept crashing other parts of the compiler ;).