This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add AArch64Subtarget::isFusion function.
ClosedPublic

Authored by fhahn on Jul 3 2017, 10:02 AM.

Details

Summary

isFusion returns true if the subtarget supports any kind of instruction
fusion, similar to ARMSubtarget::isFusion. This was suggested in D34142.

This changes the current behavior slightly, because the macro fusion mutation
is now added to the PostRA MachineScheduler in case the subtarget supports
any kind of fusion. I think that makes sense because if the PostRA
MachineScheduler is run, there is potential that instructions scheduled back to
back are re-scheduled.

Diff Detail

Event Timeline

fhahn created this revision.Jul 3 2017, 10:02 AM
javed.absar added inline comments.Jul 4 2017, 1:10 AM
lib/Target/AArch64/AArch64TargetMachine.cpp
290

This doesn't look like NFCI - as previously this would be triggered on hasFuseAES and hasFuseLiteral only, but now it will include Bcc and cbz fusion as well.

fhahn edited the summary of this revision. (Show Details)Jul 4 2017, 10:14 AM
fhahn added reviewers: joelkevinjones, joel_k_jones.
fhahn added a subscriber: joel_k_jones.

Adding @joel_k_jones as a reviewer, as this change potentially changes code generated for thunderx2t99.

lib/Target/AArch64/AArch64TargetMachine.cpp
290

Indeed, that should have been clearer from the description. I'll update the description. I think it makes sense to add the macro fusion mutation in case any kind of fusion is supported, because if the PostRA machinescheduler is run, there's always potential that instructions scheduled back to back are re-scheduled.

Currently, this could potentially only change the code generated for thunderx2t99, which is the only subtarget that uses the PostRA machinescheduler AND has ArithmeticBccFusion (and not FuseAES or FuseLiterals).

joelkevinjones edited edge metadata.

Stefan has been doing the back-end as well.

fhahn added a comment.Jul 5 2017, 4:00 PM

Great thank you! The change should only have minimal impact on code generated on a thunderx2t99, the worst thing that could happen is that arithmetic instructions that have been fused with BCC during instruction scheduling would now be scheduled back-to-back in the PostRA MachineScheduler too, whereas before they might not have been, because the constraints weren't added for the PostRA MachineScheduler.

evandro added inline comments.Jul 10 2017, 3:44 PM
lib/Target/AArch64/AArch64TargetMachine.cpp
290

I agree. It's probably just luck that instructions fused with the ExitSU, when the latter depends tightly on the former, have little chance of moving post the RA.

fhahn added a comment.Jul 12 2017, 5:00 AM

Ping. @joelkevinjones @steleman are you happy for this to go in?

steleman edited edge metadata.EditedJul 12 2017, 9:10 AM

Ping. @joelkevinjones @steleman are you happy for this to go in?

I am running SPECint2K6 with and without your patch for comparison and
I will let you know as soon as I have all the numbers.

Ping. @joelkevinjones @steleman are you happy for this to go in?

I am running SPECint2K6 with and without your patch for comparison and
I will let you know as soon as I have all the numbers.

Your patch shows an improvement for 462.libquantum:

  1. Without your patch: 315.392 = 65.695
  2. With your patch: 301.580 = 68.704

The other benchmarks either didn't show much of a difference, or the difference
was well within the noise range.

So, I have no objections to this patch.

joelkevinjones accepted this revision.Jul 12 2017, 11:59 AM

Thanks Stefan. LGTM

This revision is now accepted and ready to land.Jul 12 2017, 11:59 AM
fhahn edited the summary of this revision. (Show Details)Jul 12 2017, 1:53 PM
fhahn closed this revision.Jul 12 2017, 1:53 PM
fhahn added a comment.Jul 12 2017, 1:55 PM

Thanks, great that it improves 462.libquantum slightly tx2