- User Since
- Nov 22 2013, 10:24 PM (225 w, 6 d)
Jan 31 2018
This happens fairly frequently for us in Julia, because we use i8 to represent tags for tagged unions and then switch on them to discover which of the union's components is active.
Without this patch I saw ~3x performance regressions in the worst benchmark (because it prevented other optimizations from seeing optimization opportunities based on the
active union component). I did not see compilation performance impact in my testing.
Jan 18 2018
Jan 11 2018
Looks good to me with the minor tweak to the test. To keep the breadcrumb for future reference, I had changed this in D33179.
Oct 26 2017
Oct 25 2017
error: cannot initialize return object of type 'void *' with an rvalue of type 'FILE *const *' (aka '_IO_FILE *const *')
    EXPLICIT_SYMBOL(stderr);
    ^~~~~~~~~~~~~~~~~~~~~~~
Sep 25 2017
Sep 19 2017
This is the Julia pass pipeline (https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp#L148). IIRC the original list of passes came from VMKit, but the pass list was adjusted as needed over the years.
Sep 17 2017
Fix a small bug and also look through a single level of bitcasts.
Since IRBuilder automatically inserts bitcasts to i8*, it seems
prudent to handle that case.
Sep 15 2017
Jul 7 2017
The implementation of this looks good to me. I guess I'm not entirely sure why it's necessary for that type to be distinct in the first place, but that's probably a different discussion from just fixing this bug.
Jun 29 2017
Jun 28 2017
Factor out intersection code into a method of AAMDNodes
Jun 20 2017
Jun 13 2017
Jun 10 2017
Having pondered this some more, I wonder if what we're missing is an annotation on the addrspace cast itself that indicates whether or not GEPs may be commuted past it (could call it inbounds or something else). In many cases (including allocas), the frontend (or whoever else is doing language/target-specific work) can often know whether the entire object is available in the target address space (e.g. because the entire stack always is).
LGTM, FWIW. I encountered this issue as well and this seems like the correct solution.
Jun 9 2017
Bump on review for this.
I'm gonna go ahead and commit this anyway, since this fixes a miscompile. If there are any comments I'll take them in post-commit review.
Jun 2 2017
Bump. @sanjoy does this look good to you?
Bump. Would be great if somebody could review this. I know Sink isn't really actively developed, but I think this is a pretty straightforward fix.
Jun 1 2017
Sure, see if you think this is the right way to do it. If so, mark it as accepted, and I'll go ahead and commit it. I feel like we've given other people enough time to weigh in ;).
I don't think that change is entirely necessary. I don't have any strong objections to it, but I also don't see a good reason to require it. In any case, let me get this in to be able to re-land D33655 and we can revisit the more disruptive change later.
There's already such a test case, but the cloning currently doesn't assert properly (but it can generate incorrect code). D33655 fixes that up, so I think the testing is covered once that LLVM commit goes in. I'll hold off a little while to give @aprantl a chance to look at this as well, but I do want to get this in in short order, so I can recommit D33655.
Finalize all subprograms when we're done emitting them.
The one we're interested in is a side effect, but doing
this uniformly might be cleaner and help avoid similar errors in the future.
Sure, I'll commit it.
May 31 2017
Handle absolute path case
Not sure what you're asking. llvm-config is certainly generally installed there, but --bindir is wrong without this patch, so people using that will be confused (at least that was the case when I wrote this patch - it's been a while).
I don't think there's really a natural place in DIBuilder to do this, but
I suspect that most frontends would have one.
Will you only invoke it on functions to be cloned in clang or on all subprograms?
May 30 2017
FYI, reverted due to buildbot failure. Will investigate later today.
May 29 2017
Well, not really, but that's sort of separate from this revision. A more accurate (though IMO still slightly confusing) description is the one that's on the flag:
/// If this flag is set, the remapper knows that only local values within a
/// function (such as an instruction or argument) are mapped, not global
/// values like functions and global metadata.
RF_NoModuleLevelChanges = 1,
May 28 2017
I'm really confused by the entirety of the changes to the CloneFunction parts of this. The whole point of the ValueMapper is to fix up value/metadata references. Why do we need a manual version?

Also, what's up with the DISubprogram::getDistinct? The ValueMapper will create new distinct nodes as it goes along unless RF_MoveDistinctMDs is set (which it isn't inside CloneFunction). Could the problem perhaps have been an incorrectly unset ModuleLevelChanges flag?

I'm seeing a ton of assertion failures after this change. I'll submit an appropriate PR to revert the CloneFunction parts of this.
May 27 2017
Thanks! I'm a little confused why the compiler is complaining there - I see this pattern used all over the place for 1-element array construction, but your fix is of course fine.
May 22 2017
Add a C++ test to make sure the SCEV code path is tested
May 20 2017
May 19 2017
Bump. Can somebody review this, so we can get this in. @arsenm maybe?
Please see the discussion in D31924 - it's not clear that instcombine is allowed to introduce addrspacecasts that weren't in the original program. I know there's a couple cases like that in the code base already, but no need to add more.
May 17 2017
Changed the patch to do the argument re-arranging in the caller instead. I think
that's more localized to just this non-integral pointer change.
May 16 2017
May 15 2017
Do you mean non-integral pointers?
May 14 2017
Was there a decision reached here on what the correct semantics are? There are other places in LLVM (I found one in instcombine - there may be others) which do make the assumption that this change is proposing to introduce to the langref. Personally, I don't think this transformation should be allowed. I know there are architectures where different address spaces have different GEP behavior (though I'm not sure if this is the case for any in-tree backend). Nevertheless, if people do feel like this should be allowed (e.g. because such architectures should use something other than addrspacecast to convert between such address spaces), that's fine with me as well, but I think there should be a clear statement in the langref one way or the other. As is, different people read the langref differently.
May 12 2017
May 11 2017
May 9 2017
Bump. Does this look good?
Apr 27 2017
Add some CHECK lines to the test case
Apr 26 2017
Apr 17 2017
Run clang-format over the diff
Now with testcases for the extra cases I discovered.
Apr 12 2017
There were a couple more instances of the same pattern in this code.
I'll write some tests for all these cases next week (famous last words).
Apr 11 2017
Apr 10 2017
Updated to include the test for DIFinder
Apr 6 2017
I wasn't suggesting we blindly trust the user on shuffles, but that we somehow distinguish between masks generated by some earlier LLVM pass and masks passed in by the user. For the former I feel like it would make a lot of sense to be more aggressive about folding them than we are now, while for the latter the user may have used some target-specific knowledge, so we may want to be less aggressive. I'm also fine with a TTI hook, happy to write the code. Somebody has to make a call here, and I don't think I'm the best person to do so - I just want all those redundant shuffles out of my code ;).
Address review comments
I'm a bit late to the party here, but I'm facing similar problems, so I'm interested in a clean solution. I wonder though if what we want to express isn't some sort of "type-based dereferenceability annotations". For example the semantics I care about are essentially, "if you know you have a dereferenceable pointer, you can go and dereference any other valid (managed) pointers the pointed-to data references (recursively)". This has to be true for me, because the GC walks all pointers recursively that way. Of course the problem with this is that the compiler doesn't know which parts of the referenced data are valid pointers for this purpose (and it can't just be based on the struct type, because we do store pointers to unmanaged data). So if we had a way to express to the compiler "here are the pointers in this struct that you may assume dereferenceable", that would work very well for me. Would that solve your problem as well?
Apr 5 2017
Apr 4 2017
If a TTI hook is not the right way to go here, maybe we can find a way to actually express what we want here, maybe via metadata or some other specification on shufflevector that would tell the optimizer not to touch the shufflemask unless it's confident it can improve upon it. As it is, I see my CPU with a perfectly good shuffle instruction shuffle the same vector three times, because LLVM refuses to combine these shuffles - pretty embarrassing ;).
Hmm, unless I did something wrong, the backend generates equivalent code whether or not we merge the shuffles. Or did you mean that we could do a better job with the three IR instruction form even though we currently don't?
Can you try https://reviews.llvm.org/D31681? I wasn't able to reproduce the problem with your example, it kept crashing other parts of the compiler ;).
The shuffles in my test case are equivalent to vector concatenation, but I have a local test case which is more complicated. Do you have a specific example where the backend would generate better code with having the middle end generate code of the form:
%v1 = shufflevector %v, undef, mask1
%v2 = shufflevector %v, undef, mask2
shufflevector %v1, %v2, mask3

rather than

shufflevector %v, undef, mask4
I would have hoped that just lowering a shuffle of one vector would have been sufficiently improved in the backend by now. You also mentioned this was a limitation for large vector types. Does it make sense to have a TTI hook to determine whether the target can efficiently lower such shuffles?
Apr 3 2017
Mar 31 2017
The corresponding discussion for the original patch was on llvm-commits: http://marc.info/?t=136735490000006&r=1&w=2
Mar 30 2017
Mar 7 2017
Mar 3 2017
Feb 20 2017
Ok, I've split everything but the register re-picking into D30173 and D30177. Let's get those reviewed and once that's done, I'll rebase this with just the register re-picking code. I hope I've also addressed the review comments for the relevant parts in those revisions.
Feb 17 2017
I'll split up this review further and let's go from there.
Feb 15 2017
Feb 9 2017
There are really only two major changes here:
- Updating the registers at the end
- isDependencyBreak support
Feb 8 2017
Jan 31 2017
Jan 19 2017
Sorry to make you re-review this, but the solution I had put up previously turned out to be insufficient to address all the regressions we were seeing on benchmarks. Combining your previous changes with the two pending revisions, we're seeing 2-3x improvements on a number of our benchmarks and no regressions! As requested, I split up the revision to pull out the vxorps insertion optimization into a separate revision.
Split out the optimization part into D28915