- User Since
- Nov 16 2012, 6:02 PM (264 w, 4 d)
A couple minor comments (feel free to address as you commit).
Could you use matchSelectPattern (from ValueTracking) here instead of matching the individual min/max patterns? That will also give us absolute value and the floating-point min/max as well. I think that you can hash_combine the fields of SelectPatternResult, and the logic might be simpler overall too (hopefully).
The current code generation is also odd on x86 (for the same reason). Returning the aggregate forces the values to be placed in specific GPRs. That makes the vectorization not worthwhile. Can you please also add this test into the X86 directory (running for x86, obviously) as well?
Is this something that RDF copy propagation would do for us as well? @kparzysz might know. FWIW, if this is something that RDF would help with, I'd rather see our efforts put there rather than recreating a less-powerful version of that infrastructure just for this specific case.
Mon, Dec 11
This looks fine, and I imagine we'll need to do this almost all intrinsics.
You have a bunch of variable names here with underscores, which is not our usual convention. BB1_chain -> BB1Chain, etc.
I think that we're on the same page. I'll also start reviewing the other patches.
Thu, Dec 7
I then thought about implementing a custom isel for SystemZ / INSERT_VECTOR_ELT, which seemed to be possible but tedious. It would still have to be proved that the INSERT_VECTOR_ELT was not dependent somehow on the prefetch node, which seems more doable here in a limited context.
Tue, Dec 5
Mon, Dec 4
I think that this is fine, as a work around, but I don't quite understand the problem based on this description. I'm thinking that either:
- Like on the P8 (and previous), there are some in-order resources in the decoder/dispatcher (which should be modeled as in-order resources).
- There are resources for which we're significantly underestimating the buffer size (the current P9 scheduling model only sets the BufferSize for BR (which is fine, but as we don't currently schedule branches, it's essentially documentation). Maybe we should be setting BufferSize for other things.
(or some combination of the two). Based on the description in the patch and in the patch description, it sounds like the latter, but I'm not sure.
I think this is a good idea.
Sat, Dec 2
Tue, Nov 28
I think this looks fine (and I agree that the underlying premise makes sense). Before you commit, could you also please add some tests with nested union members?
Tue, Nov 21
When Sema sees this code during compilation, it can not tell whether there is an error. Calling foo from the host code is perfectly valid. Calling it from device code is not. CUDADiagIfDeviceCode creates 'postponed' diagnostics which only gets emitted if we ever need to generate code for the function on device.
Thu, Nov 16
Nov 10 2017
Nov 9 2017
Nov 8 2017
Nov 7 2017
Nov 5 2017
Nov 3 2017
Nov 2 2017
So, with this change, what happens? Do the prefetches just all get chained together along with the last store (or other memory operation)?
Nov 1 2017
I'm happy; please go ahead when the other reviewers are too.
This looks pretty good.
Oct 31 2017
Oct 30 2017
Looks right to me. Regarding a helper, the comment is longer than the code, so you should name the helper so that you can move the comment too. Maybe getObjectOffsetWrapFlags or similar.
This is really interesting. So we're seeing a number of cases where, after instruction selection, we're doing something that makes operations have constant operands. Do you know what that is? Maybe one thing we could do with this is have it assert, instead of transforming the code, and then reduce the test cases in order to understand why this happens.
This logic certainly looks cleaner.
Oct 29 2017
Oct 27 2017
This is pretty neat.
First, should this logic be here or should it be in IndVarSimplify? I ask because there are transformations that can introduce new AddRecs into an existing loop without replacing the induction variable. Only the transformation using the SCEVExpander knows if it is planning to actually replace the induction variable.
Please upload diffs with full context.
I want to take a step back and discuss what problem this is trying to solve and how to best solve it. I understand that GCC has a flag for this (added in 4.7), but it is not clear to me that this is the best approach. Two thoughts:
Oct 26 2017
Oh, and please retitle this before you commit. You should talk about caller-preserved registers, not TLS.
Also, do we have a problem with prefetches in this regard too. For ISD::PREFETCH I see just this: