This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] isel (add (and X, 0x1FFFFFFFE), Y) as (SH1ADD (SRLI X, 1), Y)
ClosedPublic

Authored by craig.topper on May 27 2022, 10:48 PM.

Details

Summary

This pattern is what we get after DAG combine for C code like this.

short *ptr1, *ptr2, *ptr3;
unsigned diff = ptr1 - ptr2;
return ptr3[diff];

Diff Detail

Event Timeline

craig.topper created this revision.May 27 2022, 10:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 10:48 PM
craig.topper requested review of this revision.May 27 2022, 10:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 10:48 PM
reames accepted this revision.May 28 2022, 11:40 AM

LGTM as is.

And a thought for you; is there a more generic form of this?

Instead of going straight to shNadd, could we use zext.w instead? Something along the lines of:
srli a0, a0, N mask low
zext.w a0, a0
mask high
slli a0, a0, N restore position
add a0, a0, a1
add other value

The first three seem like a generic pattern for and X, C is any contiguous 32 bit mask.

From this form, we could then recognize the shNadd from the last three right?

Not sure if this works, and if it does, not a required follow up. Just an idea for you to think about.

This revision is now accepted and ready to land.May 28 2022, 11:40 AM

LGTM as is.

And a thought for you; is there a more generic form of this?

Instead of going straight to shNadd, could we use zext.w instead? Something along the lines of:
srli a0, a0, N mask low
zext.w a0, a0
mask high
slli a0, a0, N restore position
add a0, a0, a1
add other value

The first three seem like a generic pattern for and X, C is any contiguous 32 bit mask.

From this form, we could then recognize the shNadd from the last three right?

Not sure if this works, and if it does, not a required follow up. Just an idea for you to think about.

Where would we recognize the shNadd.uw? Post process peephole? Machine IR?

I had thought about trying to match the AND and 32-bit mask to srli+slli.uw later in MachineIR after LICM in case the mask can be pulled out of a loop. That would leave only an AND in the loop which would be cheaper than 2 instructions. The srli+shNadd.uw I've done here seemed like it was always profitable even in a loop.

This revision was landed with ongoing or failed builds.May 29 2022, 6:40 PM
This revision was automatically updated to reflect the committed changes.

LGTM as is.

And a thought for you; is there a more generic form of this?

Instead of going straight to shNadd, could we use zext.w instead? Something along the lines of:
srli a0, a0, N mask low
zext.w a0, a0
mask high
slli a0, a0, N restore position
add a0, a0, a1
add other value

The first three seem like a generic pattern for and X, C is any contiguous 32 bit mask.

From this form, we could then recognize the shNadd from the last three right?

Not sure if this works, and if it does, not a required follow up. Just an idea for you to think about.

Where would we recognize the shNadd.uw? Post process peephole? Machine IR?

Shouldn't DAGCombine be able to do this? (Might be wrong here, still recent to this part of things.)

I had thought about trying to match the AND and 32-bit mask to srli+slli.uw later in MachineIR after LICM in case the mask can be pulled out of a loop. That would leave only an AND in the loop which would be cheaper than 2 instructions. The srli+shNadd.uw I've done here seemed like it was always profitable even in a loop.

I see your point here. I think I agree.