Page MenuHomePhabricator

[X86] Move matching of (and (srl/sra, C), (1<<C) - 1) to BEXTR/BEXTRI instruction to custom isel

Authored by craig.topper on Sep 7 2017, 1:52 PM.



Recognizing this pattern during DAG combine hides information about the 'and' and the shift from other combines. I think it should be recognized at isel so its as late as possible. But it can't be done with table based isel because you need to be able to look at both immediates. This patch moves it to custom isel in X86ISelDAGToDAG.cpp.

This does break a couple tests in tbm_patterns because we are now emitting an and_flag node or (cmp and, 0) that we dont' recognize yet. We already had this problem for several other TBM patterns so I think this fine and we can address of them together.

I've also fixed a bug where the combine to BEXTR was preventing us from using a trick of zero extending AH to handle extracts of bits 15:8. We might still want to use BEXTR if it enables load folding. But honestly I hope we narrowed the load instead before got to isel.

I think we should probably also support matching BEXTR from (srl/srl (and mask << C), C). But that should be a different patch.

Diff Detail


Event Timeline

craig.topper created this revision.Sep 7 2017, 1:52 PM
aymanmus added inline comments.
2129 ↗(On Diff #114248)

Maybe add an assert to guarantee an 'AND' Node.

2190 ↗(On Diff #114248)

Why do you manually handle the memory folding here instead of letting the regular mechanism take care of that?

2272 ↗(On Diff #114248)

remove extra line

craig.topper added inline comments.Sep 11 2017, 9:15 AM
2190 ↗(On Diff #114248)

The regular mechanism is tablegen patterns and the isel table. But we're bypassing all of that here.

We could be lazy here and let the peephole pass that can also fold memory using the fold folding tables take care of it. But we normally try to fold as much as possible at isel.

aymanmus added inline comments.Sep 12 2017, 2:19 AM
2190 ↗(On Diff #114248)

Sorry if I'm being picky about that, but I can't understand the reason behind this.
In this case there is no special handling of memory folding (e.g. no special condition for applying/not applying the transformation), so why would we prefer to handle some of the cases here, some via tablegen patterns and the rest with the folding tables?

Wouldn't it be easier and more well-designed if the transformation would be completely controlled by one "owner" (lets say folding tables) and only exceptions are handled here or through tablegen patterns?

Normal loads are usually folded by isel patterns
Stack reloads created by the register allocator are folded using the load folding tables. Since these loads don't exist during isel we can't fold them there.
The peephole pass can also fold loads using the loading fold tables. At one point in time this was definitely weaker than the isel mechanism even ignoring the missing instructions in the isel table. I'm not sure what the status is now.

In this case I'd really like to be able to write an isel pattern for this instruction but I can't because there's no way to check the relationship of the two immediates. So I'm supplying custom code at the time of isel. If we were able to write the isel pattern we'd fold the load at that time. So I'm folding the load manually so that it happens during isel like it would if I could write the pattern. This is similar to what we do for integer MUL and DIV instructions that we also can't handle with isel patterns and do manually.

aymanmus accepted this revision.Sep 12 2017, 9:00 AM

Thanks for the answer.

This revision is now accepted and ready to land.Sep 12 2017, 9:00 AM
This revision was automatically updated to reflect the committed changes.