- User Since
- Jul 30 2013, 7:58 PM (259 w, 1 d)
There's a shift uop in the BTR/BTS/BTC memory flow so thats port06 restricted. And the bit test uop should also be port 06 restricted just like the register form.
Tue, Jul 17
Mon, Jul 16
Sun, Jul 15
Which microarchitecture cares about switching PD/PS? To my knowledge, no Intel architecture cares. Do any of the AMD architectures care?
Fri, Jul 13
I think this may have caused PR38125
It's 2 patterns repeated 3 times. And as of yesterday these patterns are all qualified with "optsize || !sse41) and we have new patterns to select blend under (sse41 & !optsize). This fixes a long standing issue where we turned movss/sd into blend later in the pipeline through an accidental double call to commuteInstruction. So with that fixed its now 12 patterns that can be removed. I need to rebase this patch.
Though JMP_1 has a one byte offset and will get relaxed to JMP_4 by the assembler if the offset isn't in the range [128,-127]. TAILJMPd64 always encodes to the same encoding as JMP_4 and doesn't go through relaxation in the assembler.
The domain switching isn't really intentiional. The code I removed from X86ISelLowering was removing some bitcasts that made this optimize better.
This seems right to me. I assume from the use of CALL64pcrel32 in the other branch that outlining is 64-bit only?
Thu, Jul 12
Why does this need the 'r' at the end of WriteBTr? By existing convention the memory form would be called WriteBTLd. So WriteBT should be interpreted as non-memory anyway.
Wed, Jul 11
Tue, Jul 10
Mon, Jul 9
What happens with cttz when -mattr=bmi which enables the tzcnt instruction
Sun, Jul 8
Remove a change that snuck into my last update.
Remove bad comment
Judging by the code in SystemZISelDAGToDAG.cpp that calls UpdateNodeOperands during select, I don't think it works the way I thought. The SystemZ code calls ReplaceNode if UpdateNodeOperands does CSE. I tried to change that to just assign over Node. Which would make it similar to what I'm trying to do here. But it ended up leaving a node unselected.
@niravd, let me make sure I understand what will happen in the CSE case. We'll walk the isel matching table using the CSE node, but at the end of the match we'll call MorphNodeTo using the original node that is still held in NodeToMatch in the target independent isel code? Then later we'll revisit the CSE node when its turn in the topological order comes up?
Rebase and capture the value returned form UpdateNodeOperands.
Without an And the target independent node wouldn't allow this. I thought about trying to insert an And into the DAG during DAG combine which would make simplify demanded bits trigger. But it breaks recognizition of (x << (32 - n)) >> (32 - n)) for bzhi. It would become (x << (-n & 0x1f)) >> (-n & 0x1f) and now we don't know if 'n' is safe to feed directly to bzhi.
Sat, Jul 7
We need tests cases that use tzcnt too.
Fri, Jul 6
I definitely believe these instructions would be different than single shifts. LGTM
Thu, Jul 5
Should v32i8 do the same or is there something different about it?
Rebase and full context
Wed, Jul 4
Looks like X86ISD::SHUF128 isn't recognized as a target shuffle. That explains some of the regressions here. I'll submit a separate patch for that and we can rebase this one.
I forget to mention, I tried to write the code in a way that it should be somewhat straightforward to extend to 512 bits with 128 bit lanes.
Replace the contents of lowerVectorShuffleByMerging128BitLanes. Now we try to form 2 lane correcting shuffles and a repeated shuffle mask.
This specifically looks for (and (shl -1, X), Y) or (and (shr -1, X), Y). If Y in either of those is a constant, I think we would have rewritten the LHS side of the shift already and removed the 'and' entirely.
Tue, Jul 3
On Intel hardware, in register renaming it basically rewrites the input register to an internal all zeros register. The instruction still needs to execute and now compares zero with zero and gets all ones. But by rewriting the input register to all zeros instead of the real input, the instruction doesn't need to wait for the previous writer of the input register to execute.
Mon, Jul 2
There are actually two selection DAGs for that test. I'm not sure why yet. The final schedule is printed separately for each DAG.
-Added a negative test
-Hopefully fixed all the grammatical/spelling errors.
-Attempted to clarify some more about prefer-vector-width and builtins.
@niravd would it be better to just do the full manual isel instead of trying to do it this way? Selecting the negate is easy. It just the variations of the shift that kind of ugly due to legacy vs BMI2 and BMI2 allowing load folding.
-Rebase the intrinsic headers and builtins file
-Add documentation for the attribute. Open to feedback on improvements here
-Add tests for wrong number of arguments to the attribute.
Sun, Jul 1
Sat, Jun 30
Sorry for the archaelogy here.
Fri, Jun 29
Place the new nodes before the original shift amount in the topological order of the DAG. This ensures that all shifts that had the same shift amount will be selected and run this transform before any of the new nodes are selected. The selection DAG should CSE all the new nodes together for multiple shifts.
That's what i meant. I think I didn't understand this statement "It will still need to produce two instructions (which is safe, since there is one-use check on the mask)."
If you do the one use check in the DAG combine to qualify the reverse, then we can use existing isel patterns to select each shift individually right?
Is it possible to reverse this as a DAG combine? I'd rather not produce 2 instructions from one pattern if we can avoid it.
I'll run our benchmark list. This was more of an observation that we were different than icc, gcc, and msvc.
I need to look at this in more detail, but does this handle the case where the flags are still live past the cmov? Moving the add down would break that.
Thu, Jun 28
@greened, did you forget to Accept when you wrote LGTM or should I wait for another reviewer?
Yeah I think the other patches took care of my perf regression. I'm fine to abandon this if you don't think the PowerPC changes are worthwhile.
@spatel, what are you thoughts on this?