This is an archive of the discontinued LLVM Phabricator instance.

Implement machine unroller utility class
Needs ReviewPublic

Authored by jverma on Oct 8 2018, 4:28 PM.

Details

Reviewers
kparzysz
Summary

This patch implements the target independent MachineUnroller utility class which provides APIs to perform loop unrolling at the MI level. Only small inner-most loops with the run-time trip count and a single basic block are handled. The unroller is invoked from the software pipeliner if it's determined to improve the resource usage of the loop. With the increased ILP, the pipeliner often generates better code for the small loops with high latency instructions and under utilized resources.

For now, this feature is enabled only for Hexagon. To enable it for other targets, they must extend the MachineUnroller class and provide their own implementation of the target specific APIs. In addition, the target must also implement createMachineUnroller function which creates and returns the pointer to the target's MachineUnroller object.

Diff Detail

Event Timeline

jverma created this revision.Oct 8 2018, 4:28 PM

Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:

  • Add full context to the patch (-U99999)
  • Replace the copyright headers to be more "llvmy"

Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:

  • Add full context to the patch (-U99999)
  • Replace the copyright headers to be more "llvmy"

Hi Dave,

Sorry, I missed the copyright header. I will make it inline with the standard llvm header. I didn't quite get what you meant by 'Add the full context to the patch'? Do you want me to provide some background to the patch?

Thanks,
Jyotsna

Hello, see the part about context in https://llvm.org/docs/Phabricator.html#phabricator-request-review-web. It's easier to review things if we can see the code around the patch as well as the code in the patch.

jverma updated this revision to Diff 168841.Oct 9 2018, 11:37 AM
arsenm added a subscriber: arsenm.Oct 9 2018, 6:40 PM

I'd prefer to rename this to MachineLoopUnroll to match the IR pass name

arsenm added inline comments.Oct 9 2018, 6:51 PM
include/llvm/CodeGen/MachineUnroller.h
1

Missing C++ mode comment

lib/CodeGen/MachinePipeliner.cpp
907

I think this is misleading since it isn't a size. InstrCount?

941–948

Why is this under NDEBUG? This looks problematic

960–961

You should avoid using FP types for this

lib/CodeGen/MachineUnroller.cpp
437–439

Isn't this just terminators()?

jverma added inline comments.Oct 10 2018, 1:51 PM
include/llvm/CodeGen/MachineUnroller.h
1

Will fix.

lib/CodeGen/MachinePipeliner.cpp
941–948

It is just for the debugging purpose. There is a command line flag (pipeliner-unroll-max) that can be used to set the unrolling limit and is available only with a debug build.

960–961

Sure. In this case, we do need it here. Since the loops handled by the machine unroller are fairly small in size, ResMII happens to be a small value as well, usually in the range of 1 to 5. For this reason, the change in UnrollResMIIRatio from one unroll factor (i) to another is pretty small and can't be captured without it being a float.

lib/CodeGen/MachineUnroller.cpp
437–439

Will fix.

I'd prefer to rename this to MachineLoopUnroll to match the IR pass name

I don't really have a preference here. I will be happy to change the name If anyone else feels the same way.

arsenm added inline comments.Oct 11 2018, 2:04 AM
lib/CodeGen/MachinePipeliner.cpp
941–948

The flag should always be available or just removed entirely. Flags that disappear under some builds are really annoying

zzheng added inline comments.Oct 11 2018, 1:54 PM
lib/CodeGen/MachinePipeliner.cpp
960–961

You can scale both ratio to get rid of FP type.

UnrollResMIIRatio * (i * MinUnrollFactor)
= UnrollResMIIRatio / i * (i * MinUnrollFactor)
= UnrollResMIIRatio * MinUnrollFactor;
MinResMIIRatio * (i * MinUnrollFactor)
= MinResMII / MinUnrollFactor  * (i * MinUnrollFactor)
= MinResIIRatio * i;

if (UnrollResMIIRatio * MinUnrollFactor < MinResIIRatio * i ) {
...
}

instead two float divs it's now cost two int muls, without lost precision, assuming no mul overflow in real cases.

jverma added inline comments.Oct 12 2018, 8:54 AM
lib/CodeGen/MachinePipeliner.cpp
941–948

That's understandable. I will remove NDEBUG and make it available for all builds.

960–961

Thanks a lot ! I will fix it.

jverma updated this revision to Diff 169458.Oct 12 2018, 10:45 AM

Ping!

Can someone please review this patch?

I am working on an out of tree target and this would be useful.
So what is the status of this patch?

I am working on an out of tree target and this would be useful.
So what is the status of this patch?

Same here, but unfortunately I'm unable to say that I'm capable of reviewing this work.
One suggestion I have, at least from my use-case, is to modify the heuristic in MachinePipeliner to include the target in the decision.

Continuing to unroll until the ResMII is optimal may not be the best option for every target. Depending on the loop, the target might be able to generate better code if unrolled 2x instead of 4x, even if 4x would be more attractive, theoretically.

I believe you might be able to do this with a target-specific implementation of MachineUnroller, but I don't think that's a great answer. The unroller should unroll, not analyze.