This is an archive of the discontinued LLVM Phabricator instance.

DeadMachineInstructionElim: Switch to using LiveRegUnits
ClosedPublic

Authored by arsenm on Apr 21 2022, 6:42 AM.

Details

Reviewers
qcolombet
MatzeB
Summary

Theoretically improves compile time for targets with many overlapping
registers

Diff Detail

Event Timeline

arsenm created this revision.Apr 21 2022, 6:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 6:42 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
arsenm requested review of this revision.Apr 21 2022, 6:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2022, 6:42 AM
Herald added a subscriber: wdng. · View Herald Transcript
MatzeB added a comment.EditedApr 26 2022, 8:25 PM

I don't know what the state of this is nowadays, but doesn't this risk problems with the old problem of XMM and YMM not being differentiated within the register units? Calling convention for windows says XMM registers are preserved but YMM register are clobbered. With this change we would likely register XMM and YMM as clobbered. I haven't thought things through here, is this pass still safe with this in mind?

I don't know what the state of this is nowadays, but doesn't this risk problems with the old problem of XMM and YMM not being differentiated within the register units? Calling convention for windows says XMM registers are preserved but YMM register are clobbered. With this change we would likely register XMM and YMM as clobbered. I haven't thought things through here, is this pass still safe with this in mind?

I don't know anything about that, but shouldn't that be expressed with the existing constraints on the calls/returns anyway?

I don't know what the state of this is nowadays, but doesn't this risk problems with the old problem of XMM and YMM not being differentiated within the register units? Calling convention for windows says XMM registers are preserved but YMM register are clobbered. With this change we would likely register XMM and YMM as clobbered. I haven't thought things through here, is this pass still safe with this in mind?

I don't know anything about that, but shouldn't that be expressed with the existing constraints on the calls/returns anyway?

I don't see why this would warrant special casing, but given that no existing tests failed I assume this is fine

arsenm updated this revision to Diff 447144.Jul 24 2022, 10:45 AM

Fix missed test update that shows this fixes a bug where a mov was previously incorrectly deleted

qcolombet accepted this revision.Aug 25 2022, 11:32 AM
This revision is now accepted and ready to land.Aug 25 2022, 11:32 AM
nikic added a subscriber: nikic.Sep 12 2022, 9:28 AM

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

This doesn't make much sense. My preliminary analysis says that this is somehow phys_regs_and_masks fault (which really doesn't buy any real code simplification over iterating the operands)

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

This doesn't make much sense. My preliminary analysis says that this is somehow phys_regs_and_masks fault (which really doesn't buy any real code simplification over iterating the operands)

Hopefully d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba fixes this

nikic added a comment.Sep 12 2022, 2:56 PM

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

This doesn't make much sense. My preliminary analysis says that this is somehow phys_regs_and_masks fault (which really doesn't buy any real code simplification over iterating the operands)

Hopefully d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba fixes this

It does recover most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=ab56719acd98778fb2e48fa425ac7c8d27bdea86&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

There is still a bit of residual regression left though: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

Theoretically improves compile time for targets with many overlapping registers

In practice, this is a major compile time regression instead: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=b5041527c75de2f409aa9e2e6deba12b17834c59&stat=instructions

This doesn't make much sense. My preliminary analysis says that this is somehow phys_regs_and_masks fault (which really doesn't buy any real code simplification over iterating the operands)

Hopefully d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba fixes this

It does recover most of the regression: http://llvm-compile-time-tracker.com/compare.php?from=ab56719acd98778fb2e48fa425ac7c8d27bdea86&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

There is still a bit of residual regression left though: http://llvm-compile-time-tracker.com/compare.php?from=6c44a7179f1747ec38d580e6b50bde98555ad811&to=d90f7cb559e32c2cbf1f9839d7e8e0cc0be189ba&stat=instructions

I think this is explainable by LiveRegUnits::removeRegsNotPreserved being slower than clearBitsNotInMask. Ideally we would avoid this by switching from using regmasks to regunit masks