This is an archive of the discontinued LLVM Phabricator instance.

[x86] Fix infinite loop inside DAG combiner lzcnt's optimization.
ClosedPublic

Authored by pgousseau on Apr 1 2022, 5:57 AM.

Details

Summary

The issue affects targets supporting fast-lzcnt such as btver2.
This removes extraneous zext/trunc node insertions to fix the infinite loop.
This fixes Issue https://github.com/llvm/llvm-project/issues/54694

Diff Detail

Event Timeline

pgousseau created this revision.Apr 1 2022, 5:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2022, 5:57 AM
pgousseau requested review of this revision.Apr 1 2022, 5:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2022, 5:57 AM
pgousseau updated this revision to Diff 419736.Apr 1 2022, 6:10 AM

Added a regression test.

Do you know which commit introduced the problem, or which transforms conflict?
This looks like a pretty heavy hammer.

Do you know which commit introduced the problem, or which transforms conflict?
This looks like a pretty heavy hammer.

Thank you Roman for reviewing!
The regression appeared after https://github.com/llvm/llvm-project/commit/8a156d1c2795189389fadbf33702384f522f2506 (thank you @wristow for finding it)
As @RKSimon found out, the IR does not get generated anymore from the original C++ test case but the issue is still present.
I am hopeful we can reintroduce the optimization (at a later codegen stage maybe?) but I would like to do some performance analysis first to see if it is still a win.

spatel added a comment.Apr 1 2022, 7:15 AM

Do you know which commit introduced the problem, or which transforms conflict?
This looks like a pretty heavy hammer.

Thank you Roman for reviewing!
The regression appeared after https://github.com/llvm/llvm-project/commit/8a156d1c2795189389fadbf33702384f522f2506 (thank you @wristow for finding it)
As @RKSimon found out, the IR does not get generated anymore from the original C++ test case but the issue is still present.
I am hopeful we can reintroduce the optimization (at a later codegen stage maybe?) but I would like to do some performance analysis first to see if it is still a win.

That change might have exposed the bug from source/IR, but the backend loop has existed since at least the 10.0 release based on a godbolt check:
https://godbolt.org/z/Tfnaevboz

Do you know which commit introduced the problem, or which transforms conflict?
This looks like a pretty heavy hammer.

Thank you Roman for reviewing!
The regression appeared after https://github.com/llvm/llvm-project/commit/8a156d1c2795189389fadbf33702384f522f2506 (thank you @wristow for finding it)
As @RKSimon found out, the IR does not get generated anymore from the original C++ test case but the issue is still present.
I am hopeful we can reintroduce the optimization (at a later codegen stage maybe?) but I would like to do some performance analysis first to see if it is still a win.

That change might have exposed the bug from source/IR, but the backend loop has existed since at least the 10.0 release based on a godbolt check:
https://godbolt.org/z/Tfnaevboz

Then i think just fixing it may be better than removing some chunks of code.

Do you know which commit introduced the problem, or which transforms conflict?
This looks like a pretty heavy hammer.

Thank you Roman for reviewing!
The regression appeared after https://github.com/llvm/llvm-project/commit/8a156d1c2795189389fadbf33702384f522f2506 (thank you @wristow for finding it)
As @RKSimon found out, the IR does not get generated anymore from the original C++ test case but the issue is still present.
I am hopeful we can reintroduce the optimization (at a later codegen stage maybe?) but I would like to do some performance analysis first to see if it is still a win.

That change might have exposed the bug from source/IR, but the backend loop has existed since at least the 10.0 release based on a godbolt check:
https://godbolt.org/z/Tfnaevboz

Then i think just fixing it may be better than removing some chunks of code.

Yes, I agree fixing would be best, unfortunately I can’t see an obvious way to fix it inside DAG combiner nor if one the transformation involved is doing something wrong.
My understanding of the issue is that some optimizations disagree on which transformation is the best, causing a ping pong effect. I thought we could eventually fix it by moving the optimisation to the instruction selection stage, any other ideas?

pgousseau updated this revision to Diff 420219.Apr 4 2022, 11:04 AM

@RKSimon suggested the transformation might be caused by unneeded trunc/zext operations.
It is causing the DAG combiner to needlessly sink them under the OR operations so I have removed them where possible and this fixes the infinite loop.
The instructions generated end up being reordered is some cases but I think it is equivalent.

pgousseau updated this revision to Diff 420247.Apr 4 2022, 11:10 AM

Remove unused parameter.

RKSimon added inline comments.Apr 5 2022, 12:08 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47664–47665

Is Ret ever null at this point?

pgousseau updated this revision to Diff 420464.Apr 5 2022, 5:33 AM
pgousseau retitled this revision from [x86] Disable optimization involved in an infinite loop inside DAG combiner. to [x86] Fix infinite loop inside DAG combiner lzcnt's optimization..
pgousseau edited the summary of this revision. (Show Details)

Remove unneeded Ret value check.

pgousseau marked an inline comment as done.Apr 5 2022, 5:34 AM

Fixed! thanks

RKSimon accepted this revision.Apr 5 2022, 7:24 AM

LGTM - thanks for dealing with this

This revision is now accepted and ready to land.Apr 5 2022, 7:24 AM
This revision was landed with ongoing or failed builds.Apr 5 2022, 9:32 AM
This revision was automatically updated to reflect the committed changes.