This is an archive of the discontinued LLVM Phabricator instance.

[AARCH64][NEON] Reuse extended vdup value in low version of long operations when doing tryCombineLongOpWithDup
Needs ReviewPublic

Authored by sunho on Jan 30 2022, 5:31 AM.

Details

Reviewers
fhahn
dmgreen
Summary

Fixes https://github.com/llvm/llvm-project/issues/53261

tryCombineLongOpWithDup is a pass in Selection DAG combine stage that extends vdup vector to double sized vector so that patterns of high version of long operations (such as umull2, sabal2) can be used. The problem occurs when the original vdup vector is also used in low version of long operation. High version will be patched to use extended vdup vector, but low version will be in tact. This causes redundant vdup instructions to be generated because low version long operation is not reusing extended vdup vector where it can.

This patch fixed this issue by iterating through the users of original vdup vector and patching the low version of long operations to use extract_low(new extended vdup vector).

Diff Detail

Event Timeline

sunho created this revision.Jan 30 2022, 5:31 AM
sunho requested review of this revision.Jan 30 2022, 5:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2022, 5:31 AM
sunho edited the summary of this revision. (Show Details)Jan 30 2022, 5:39 AM
sunho added reviewers: fhahn, dmgreen.
sunho edited the summary of this revision. (Show Details)Jan 30 2022, 5:52 AM
sunho updated this revision to Diff 404661.Jan 31 2022, 12:16 PM

Tidy up test case and code.

sunho edited the summary of this revision. (Show Details)Jan 31 2022, 12:19 PM
sunho added a comment.Feb 3 2023, 2:40 PM

Ping? Also wonder if this is not a problem anymore.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2023, 2:40 PM
Herald added a subscriber: StephenFan. · View Herald Transcript

Sorry for the delay. We tend not to iterate uses if we can avoid it.

Could we use DAG.getNodeIfExists to common the nodes that are both 64bit and 128bit, if both exist? Like the existing transform in performDUPCombine but for MOVI and DUPLANEs. That way might be a little more generic, where we might be able to handle more cases than just the umull too.