- User Since: Sep 22 2017, 3:13 PM (56 w, 2 d)
Fri, Oct 19
Tue, Oct 16
No custom decoding is required
Fixed the tests to check the boundaries and one value in the middle.
Changed the test ordering, as the previous order generated empty lines that FileCheck does not handle.
To send patch....
Thu, Oct 11
Removed the processor and the specific target triple from the test.
Tue, Oct 2
Updated test to use llvm-mc instead of clang.
Simplified the tests. Now using the standard function for emitting data symbols, where required.
Mon, Oct 1
Sep 12 2018
Sep 7 2018
Sep 6 2018
Fixed the test's march triple.
Sep 5 2018
Aug 14 2018
Fixed the comment explaining the elaborated growth cost function.
Using FileCheck in test file.
Aug 13 2018
Aug 8 2018
Aug 7 2018
Added the missing test file.
Aug 3 2018
I agree that it won't handle all cases. But handling those would require more generic thinking, such as creating a new pass for this. Something like an abstract known-bits analysis that tells that two values hold the same bits coming from a given instruction, or some simplification by demanded bits of the same values. It is feasible, but it is not my intention to do it now.
Did you look at (new)-GVN to see if it fits in there?
I must confess that I did not quite understand the whole workflow of NewGVN, but from what I did see, it mostly wouldn't fit. It seems to behave like InstCombine, expecting to replace the current instruction being visited. And it would require creating a value just to detect whether there is a leader for that value and then reuse it. It is not that complicated, but quite awkward IMO.
If this is in instcombine (in addition to missing the pattern when there is no 'or'), I think you have to limit the transform based on uses as Roman mentioned in an earlier comment.
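To make the suggested use-based limit concrete, here is a minimal sketch of such a guard, assuming an InstCombine-style matcher; the function name and the exact pattern are illustrative, not the patch's actual code:

```cpp
#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/Value.h"
using namespace llvm;
using namespace PatternMatch;

// Hypothetical guard: only fold when the inner 'and' is used solely by
// the 'or' being visited. m_OneUse rejects the match otherwise, so the
// rewrite can never leave the old 'and' alive alongside the new shift.
static bool shouldFoldShiftedMask(Value *Or) {
  Value *X;
  ConstantInt *Mask;
  return match(Or, m_c_Or(m_Value(),
                          m_OneUse(m_And(m_Shr(m_Value(X), m_ConstantInt()),
                                         m_ConstantInt(Mask)))));
}
```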
Aug 2 2018
Jul 30 2018
Added a test checking that the inner 'and' is replaced by the new shift operation.
Converted the function to return bool, as it no longer needs to create the Or operation after the replaceAll.
Since we are replacing all the uses of the DeadAnd, there is no need to create and replace the Or being visited; replacing all uses does the job already.
Replaced all uses of the innermost 'and' with the new shift.
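For context, a minimal sketch of what these updates describe; DeadAnd, MaskedVal, and ShiftAmt are illustrative names, not the patch's actual ones:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Rewrite the inner 'and' (DeadAnd) as a shift of the already-masked
// value, then redirect every use of it. Because the 'or' being visited
// is itself a user of DeadAnd, it picks up NewShift automatically, so
// no replacement Or has to be created; returning bool is enough.
static bool rewriteDeadAnd(Instruction *DeadAnd, Value *MaskedVal,
                           Value *ShiftAmt) {
  IRBuilder<> Builder(DeadAnd);
  Value *NewShift = Builder.CreateLShr(MaskedVal, ShiftAmt, "masked.lshr");
  DeadAnd->replaceAllUsesWith(NewShift);
  return true; // changed; DeadAnd is now trivially dead
}
```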
Jul 25 2018
Relaxed conditions for which the transformation is applied.
Added more tests for ashr.
Jul 24 2018
Removed unused arguments.
All required values are obtained during the pattern matching.
Moved the code to a separate function, and placed the call after more is known about the operands.
Added the ashr case, which was being wrongly treated as lshr.
Added comments, including one arguing that this function would be unnecessary if 'and' instructions were moved before any type of shift operation.
Using m_c_Or, and passing operands as arguments.
Did most of the fixes. I just don't know how to capture non-leaf nodes of the pattern being matched. Using other match operations would actually be more complicated than just passing the operands as arguments to the new function, now that I already know they are AND operations due to the function call placement.
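For what it's worth, PatternMatch can bind interior nodes with m_CombineAnd, which applies two matchers to the same value; a hedged sketch with illustrative names:

```cpp
#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace PatternMatch;

// Sketch: wrapping the inner m_And in m_CombineAnd(..., m_Value(InnerAnd))
// both matches the 'and' and captures the node itself, so a non-leaf node
// of the pattern can be bound without a second traversal.
static bool matchInnerAnd(Value *Or, Value *&X, Value *&InnerAnd) {
  ConstantInt *Mask;
  return match(Or,
               m_c_Or(m_Value(),
                      m_CombineAnd(m_And(m_Value(X), m_ConstantInt(Mask)),
                                   m_Value(InnerAnd))));
}
```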
Jul 20 2018
Detect the desired pattern from the binary operation using its results.
Sorry, I thought that since it is an entire function it wouldn't matter. Lesson learned.
Jul 17 2018
Jul 16 2018
Using hasNUses() instead of numUses() ==
Jul 14 2018
Replaced num_uses with !hasNUsesOfValue as requested.
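The reason behind these use-count changes, as I understand them: getNumUses()/num_uses walk the entire use list, whereas the hasNUses-style predicates stop as soon as the answer is known (SDNode::hasNUsesOfValue is the SelectionDAG analogue). A sketch of the IR-side idiom:

```cpp
#include "llvm/IR/Value.h"
using namespace llvm;

// Value::getNumUses() is linear in the number of uses, so comparing it
// with a small constant does needless work on heavily-used values.
// hasNUses(N) inspects at most N + 1 uses before answering.
static bool hasExactlyTwoUses(const Value *V) {
  return V->hasNUses(2);          // stops early
  // return V->getNumUses() == 2; // same result, but walks every use
}
```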
Jul 13 2018
Moved the test to the correct folder.
Only accept instructions with 2 uses (the AND and SHIFT operations), so that looping through the uses is not expensive and is avoided in most cases.
Removed recursive bit value computations (computeKnownBits).
Fixed execution costs. See below.
Constrained the transformation to happen only when the masked value has exactly 2 users (an AND and a SHIFT).
Removed value tracking operations.
Jul 12 2018
Jul 11 2018
Jul 9 2018
Ah, I see that now. But I'm not convinced this is the right approach. Why are we waiting to optimize this in the backend? This is a universally good optimization, so it should be in IR.
Agreed, I also intend to implement this transformation in the IR. But there are cases where this is only seen after some instructions have been combined in the DAG, so why not here also? And indeed, it is a requirement for a future patch that detects opportunities to reduce load and store widths.
Added checks for the shift opcodes, to exit early if none is found.
Validate the mask widths (although it would be an error in the code if they differed).
Sorry about that.
Beyond that, I don't understand the motivation. The patch increases the latency of a computation. Why is that beneficial? The x86 diff doesn't look like a win to me.
It reduces the number of computation operations, from 3 to 2, and the number of constants kept for performing the masking, from 2 to 1.
I don't see how it increases the latency, if you are going to perform the masking and the shift anyway.
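To make the operation-count argument concrete, here is a small standalone sketch of the identity the fold relies on; this is the logical-shift case only (the ashr handling discussed in the Jul 24 updates needs extra care), and the mask and shift values are arbitrary:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (uint32_t x : {0u, 1u, 0x80000000u, 0xDEADBEEFu}) {
    const uint32_t m = 0xFF00FF00u, s = 8;
    // Before: three data ops (and, shift, and) and two mask constants.
    uint32_t before = (x & m) | ((x >> s) & (m >> s));
    // After: the already-masked value is shifted, so the second 'and'
    // and the shifted mask constant disappear: two ops, one constant.
    uint32_t t = x & m;
    uint32_t after = t | (t >> s);
    assert(before == after);
  }
}
```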
I don't know what the ARM/AArch64 output looked like before this patch. Always check in the baseline tests before making a code change, so we have that as a reference (and in case the patch is reverted, we won't lose the test coverage that motivated the code patch).
Jul 6 2018
One bot is failing on the test CodeGen/AArch64/FoldRedundantShiftedMasking.ll.
I'm trying to figure out why, as it doesn't happen with my debug or release builds.
Jul 5 2018
Gave the patch more context.
AArch64 tests go into the AArch64 folder.
Jul 4 2018
Replaced the tests so that they do not depend on the load width reduction.
Added 1 positive test per case, and 2 negative tests: one where a mask is not a constant, and one where the shift amount is not a constant.
Seems good to me.
The patch fixes the crash, but I would like to know whether the desired updates aren't already done somewhere, needing only to be copy-pasted?
Jul 3 2018
Please add the new intrinsics to the target-specific combine function for the VLDUP NEON load/store intrinsics.
ARMISelLowering.cpp, line 11477. The switch dies on llvm_unreachable.
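A standalone sketch of the failure mode being described, with made-up intrinsic names (the real switch in ARMISelLowering.cpp maps NEON intrinsic IDs to their combine parameters):

```cpp
#include <cstdio>
#include <cstdlib>

// Illustrative enum standing in for the real NEON intrinsic IDs.
enum class IntrinsicID { VLD2, VLD3, NewLoad /* newly added intrinsic */ };

// The crash pattern: a switch that must enumerate every intrinsic that
// can reach it. Before the fix, the newly added intrinsics had no case,
// so control reached the llvm_unreachable-style default and aborted.
static unsigned numVecsFor(IntrinsicID ID) {
  switch (ID) {
  case IntrinsicID::VLD2:    return 2;
  case IntrinsicID::VLD3:    return 3;
  case IntrinsicID::NewLoad: return 2; // the case that was missing
  }
  std::fprintf(stderr, "unexpected NEON intrinsic in combine\n");
  std::abort();
}

int main() { return numVecsFor(IntrinsicID::NewLoad) == 2 ? 0 : 1; }
```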
Jun 27 2018
Removed flag as requested.