This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Unroll loop with iv of pointer type
Changes Planned · Public

Authored by jaykang10 on Jul 29 2021, 9:13 AM.

Details

Summary

At the moment, the AArch64 target does not unroll loops whose induction variable (IV) has pointer type.

Following @efriedma's comment on https://bugs.llvm.org/show_bug.cgi?id=51178, we need to enable UP.Runtime and UP.Force in order to unroll such a loop.

This patch enables these unroll preference options unconditionally for the AArch64 target.
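
As a rough illustration of what the summary describes, the sketch below sets the two flags in a target hook. It is not the actual diff: `UnrollingPreferences` here is a stand-in struct that models only the two fields named above (in LLVM they live in `TargetTransformInfo::UnrollingPreferences`), and `setAArch64UnrollPrefs` is a hypothetical name for the place where the AArch64 target would set them.

```cpp
#include <cassert>

// Stand-in for llvm::TargetTransformInfo::UnrollingPreferences.
// Only the two fields mentioned in the summary are modelled (assumption:
// the real struct has many more knobs that this sketch ignores).
struct UnrollingPreferences {
  bool Runtime = false; // allow unrolling when the trip count is unknown at compile time
  bool Force = false;   // unroll even loops that fail runtime unrolling
};

// Hypothetical hook: enables both options for every loop, with no
// heuristic, which is what "universally" means in the summary.
void setAArch64UnrollPrefs(UnrollingPreferences &UP) {
  UP.Runtime = true;
  UP.Force = true;
}

// Usage: both flags are off by default and on after the hook runs.
void exampleUsage() {
  UnrollingPreferences UP;
  setAArch64UnrollPrefs(UP);
  assert(UP.Runtime && UP.Force);
}
```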

Event Timeline

jaykang10 created this revision. Jul 29 2021, 9:13 AM
jaykang10 requested review of this revision. Jul 29 2021, 9:13 AM
Herald added a project: Restricted Project. Jul 29 2021, 9:13 AM
jaykang10 added a comment. Edited Jul 29 2021, 9:15 AM

I will follow up with performance numbers from benchmarks.

If you see something wrong with this patch, or you have another idea for unrolling the target loop, please share it.

nikic added a subscriber: nikic. Jul 29 2021, 9:24 AM

This looks very dubious. UP.Force means that you are unrolling without folding anything. You just repeat the loop body and keep all the branches. This is unlikely to be profitable in general.

Note that this has nothing to do with pointer types -- you simply have a loop where the only exit is unpredictable.
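
To make the point about forced unrolling concrete, here is a hand-written picture of it. The loop is a hypothetical example, not the one from the bug report: the IV is a pointer and the only exit depends on loaded data, so the trip count cannot be computed ahead of time. The second function shows roughly what unrolling by 2 without folding looks like at the source level: the body is repeated and every exit check is still there.

```cpp
#include <cassert>

// Hypothetical loop with a pointer IV whose only exit is data dependent.
const char *findNul(const char *p) {
  while (*p != '\0')
    ++p;
  return p;
}

// The same loop unrolled by 2 with nothing folded: two copies of the
// body, and each copy keeps its own exit test and branch.
const char *findNulUnrolled2(const char *p) {
  for (;;) {
    if (*p == '\0')
      break;
    ++p;
    if (*p == '\0')
      break;
    ++p;
  }
  return p;
}

// Usage: both versions stop at the same terminator.
void compareVersions() {
  const char s[] = "hello";
  assert(findNul(s) == findNulUnrolled2(s));
}
```

The unrolled version executes the same number of compares and branches per element as the original; only the back-edge is taken half as often, while the loop body roughly doubles in size.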

@nikic Thanks for your comment. Let me explain a bit more.

If the IV has pointer type, the IV update appears as a GEP.

AArch64 has pre/post-indexed addressing modes. If we unroll a loop whose pointer-typed IV is updated by a GEP, there will be more opportunities to generate instructions that use the pre/post-indexed addressing modes. That is why I mentioned the pointer-typed IV and tried to unroll the loop with it, even though the exit count is not computable.
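
For readers unfamiliar with these addressing modes, the sketch below shows the source-level shapes they correspond to. The functions are hypothetical examples, and the instructions in the comments only illustrate the AArch64 syntax for each mode; whether a compiler actually selects them for a given loop depends on the rest of the pipeline.

```cpp
#include <cassert>
#include <cstdint>

// Post-indexed shape: load through p, then advance p by one element.
// A single post-indexed load such as
//   ldr w8, [x0], #4
// can express "load from [x0], then x0 += 4".
uint32_t sumUntilZero(const uint32_t *p) {
  uint32_t sum = 0;
  for (uint32_t v = *p++; v != 0; v = *p++)
    sum += v;
  return sum;
}

// Pre-indexed shape: advance p first, then load through the new value.
// A single pre-indexed load such as
//   ldr w8, [x0, #4]!
// can express "x0 += 4, then load from [x0]".
uint32_t sumAfterFirst(const uint32_t *p) {
  uint32_t sum = 0;
  uint32_t v;
  while ((v = *++p) != 0)
    sum += v;
  return sum;
}

// Usage on small zero-terminated arrays.
void addressingModeExample() {
  const uint32_t a[] = {1, 2, 3, 0};
  assert(sumUntilZero(a) == 6);
  const uint32_t b[] = {5, 1, 2, 0};
  assert(sumAfterFirst(b) == 3);
}
```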

I need to add heuristics that recognize the profitable cases in general. I am collecting performance numbers in order to do that.

I am sorry for the poor example and explanation...

If you see something wrong, please let me know. I would like to discuss it and hear more ideas.

Fundamentally, the issue with forced unrolling is that it increases code size without actually reducing the number of operations we execute per iteration. If there's a significant reduction in the number of operations per iteration here because we can take advantage of addressing modes, then that might be worthwhile. But we need to be careful with the cost modeling.

You might want to look at ARMTTIImpl for ideas.

@eli.friedman Thanks for your comment. It is helpful.

Hmm... It looks like other optimization passes remove the addressing-mode opportunities that unrolling creates. Maybe I need to find another way.

jaykang10 planned changes to this revision. Aug 19 2021, 4:34 AM