Page MenuHomePhabricator

Optimize long branch for MIPS64 by removing calculation of %higher and %highest
ClosedPublic

Authored by sstankovic on Apr 3 2014, 1:49 PM.

Details

Summary

[mips] Optimize long branch for MIPS64 by removing %higher and %highest. %higher and %highest can have non-zero values only for offsets greater than 4GB, which is highly unlikely, if not impossible when compiling a single function. This makes long branch for MIPS64 3 instructions smaller.

Diff Detail

Event Timeline

mseaborn added inline comments.Apr 3 2014, 2:15 PM
lib/Target/Mips/MipsLongBranch.cpp
327

Does 64-bit MIPS not have an instruction for setting bits 16-31 of a register while zeroing the others (as 32-bit MIPS' "lui" instruction does)?

What does lui64 do?

This part looks OK to me, but I'm not familiar with 64-bit MIPS, so you might want to get someone who is sign off on the change too.

352–353

Is this change based on your not-yet-committed change? If you, presumably you should remove LONG_BRANCH_LUi64 too? Or you might want to commit this change first so that you don't have to add LONG_BRANCH_LUi64?

sstankovic updated this revision to Unknown Object (????).Apr 4 2014, 2:54 PM

Patch is updated.

dsanders added inline comments.Apr 15 2014, 9:06 AM
lib/Target/Mips/MipsLongBranch.cpp
327

Does 64-bit MIPS not have an instruction for setting bits 16-31 of a register while
zeroing the others (as 32-bit MIPS' "lui" instruction does)?

Not while zeroing the other bits but 32-bit results are generally sign-extended to 64-bit. lui is nearly the right operation but sign-extends instead of zero extending.

What does lui64 do?

lui64 doesn't exist in the assembler or the ISA spec so we ought to avoid using that term when writing assembly examples. It's the same as the 32-bit lui instruction but sign-extended to 64-bits.


It looks like you should be able to remove another insn by replacing:

daddiu $at, $zero, 0
daddiu $at, $at, %hi($tgt - $baltgt)

with:

daddiu $at, $zero, %hi($tgt - $baltgt)
349–363

This logic makes sense to me but I didn't notice a guarantee that $tgt and $baltgt are in the same function. Have I missed it?

@Sasa: You have confused me a bit by having interdependent changes in flight at the same time, which is why I've been slow coming back to this.

When you post an updated patch in reply to a review, could you possibly reply to each of the review comments, even if it's just to say "Done"?

I wrote "Is this change based on your not-yet-committed change? If so, presumably you should remove LONG_BRANCH_LUi64 too?". Since the current version of the patch does remove LONG_BRANCH_LUi64, it looks like the answer to my first question was "yes", but it would be better if you answered directly, so that I don't have to deduce it from the change to the patch.

@Sasa: You have confused me a bit by having interdependent changes in flight at the same time, which is why
I've been slow coming back to this.

When you post an updated patch in reply to a review, could you possibly reply to each of the review comments,
even if it's just to say "Done"?

I wrote "Is this change based on your not-yet-committed change? If so, presumably you should remove
LONG_BRANCH_LUi64 too?". Since the current version of the patch does remove LONG_BRANCH_LUi64,
it looks like the answer to my first question was "yes", but it would be better if you answered directly, so that
I don't have to deduce it from the change to the patch.

Actually I did answered, but I don't know why the answers were not visible; maybe because I sent them together with the updated patch, and they were related to the code that was removed. The answers were only visible in the link that shows the change since the last diff. To your question:

Is this change based on your not-yet-committed change? If you, presumably you should remove LONG_BRANCH_LUi64 too? Or you might want to commit this change first so that you don't have to add LONG_BRANCH_LUi64?

I answered:

I removed LONG_BRANCH_LUi64.

And to your question:

What does lui64 do?

I answered:

There is no lui64 (I fixed that comment), there is only a lui instruction (you can see it in .td file lib/Target/Mips/Mips64InstrInfo.td - mnemonic for MIPS64 LUi64 instruction is just "lui"). lui on MIPS64 sets bits 16-31 of a register, zeroes bits 15-0, and sign-extends bit 31 into higher part of the 64-bit register. MIPS64 supports all (or almost all) MIPS32 instructions, and they all operate like this - they change low 32 bits just like they do on MIPS32, and they sign-extend bit 31 into high 32 bits. MIPS64 also has instructions that operate on all 64 bits (their mnemonics start with d (from double)), but there is no instruction that sets bits 16-31 while clearing the others.

I suppose that I should not send inline comments together with the updated patch, but rather before or after I send the patch.

lib/Target/Mips/MipsLongBranch.cpp
327

There is no lui64 (I fixed that comment), there is only a lui instruction (you can see it in .td file lib/Target/Mips/Mips64InstrInfo.td - mnemonic for MIPS64 LUi64 instruction is just "lui"). lui on MIPS64 sets bits 16-31 of a register, zeroes bits 15-0, and sign-extends bit 31 into higher part of the 64-bit register. MIPS64 supports all (or almost all) MIPS32 instructions, and they all operate like this - they change low 32 bits just like they do on MIPS32, and they sign-extend bit 31 into high 32 bits. MIPS64 also has instructions that operate on all 64 bits (their mnemonics start with d (from double)), but there is no instruction that sets bits 16-31 while clearing the others.

352–353

I removed LONG_BRANCH_LUi64.

sstankovic added inline comments.Apr 16 2014, 10:26 AM
lib/Target/Mips/MipsLongBranch.cpp
327

You are right, I'll implement this.

349–363

Do you want that I add the assert that that $tgt and $baltgt are in the current function? $baltgt is always in the current function because it is part of the long branch. As for $tgt, it is the target operand of the original branch that is expanded into long branch. I suppose that branches always jump to the blocks in the same function, but I may be wrong.

sstankovic updated this revision to Unknown Object (????).Apr 16 2014, 10:32 AM

Additionally optimize long branch for MIPS64 by removing one more instruction.

sstankovic updated this revision to Diff 8897.Apr 28 2014, 2:39 PM
sstankovic updated this object.
sstankovic edited edge metadata.

Updated the patch to reflect changes to patch D3089.

@Mark: Are there any new comments on this patch?

LGTM for the code change, with the caveat about tail calls below and the warning that I'm not an expert on MIPS. :-)

lib/Target/Mips/MipsLongBranch.cpp
345

Including this snippet seems a bit too verbose to me. You don't need to document what the code used to do -- we have version control for that. :-)

Just say you assume the jump is within-function, so the offset is within +/- 2GB.

352

I don't think you answered dsanders' question about why it's valid to assume that the jump is within-function.

I was wondering whether tail calls to other functions would become jumps that would need handling via MipsLongBranch. The answer seems to be that LLVM always compiles function calls to indirect jumps on MIPS. (However, GCC does compile tail calls to direct jumps.) That answer seems a bit crazy, so I'm not sure if it's right. You should check. Does the MIPS linker insert veneers to handle long direct jumps the same way ARM linkers do?

sstankovic updated this revision to Diff 9447.May 15 2014, 9:37 AM
sstankovic edited edge metadata.

I modified the comment.

sstankovic added inline comments.May 15 2014, 9:42 AM
lib/Target/Mips/MipsLongBranch.cpp
345

Done

352

I don't think you answered dsanders' question about why it's valid to assume that the
jump is within-function.

I didn't answer because I wasn't aware of the cases where jumps are outside of a function. I just suggested that we can add assert that the branch target is within function.

I was wondering whether tail calls to other functions would become jumps that would > need handling via MipsLongBranch. The answer seems to be that LLVM always
compiles function calls to indirect jumps on MIPS. (However, GCC does compile tail > calls to direct jumps.) That answer seems a bit crazy, so I'm not sure if it's right. You > should check. Does the MIPS linker insert veneers to handle long direct jumps the
same way ARM linkers do?

I checked tail calls on LLVM and GCC, and they generate similar code when given explicit relocation model on the command line. When relocation model is omitted the code differs because default for MIPS GCC is non-pic, and for MIPS LLVM is pic. For pic code (which this patch handles) tail calls are replaced by load from GOT and indirect jump:

lw $t9, %call16(func)($gp)
jr $t9

and this doesn't need long branch.

As for veneers, MIPS GNU ld linker doesn't insert them.

sstankovic accepted this revision.Jun 5 2014, 1:37 AM
sstankovic added a reviewer: sstankovic.

Committed in r209678.

This revision is now accepted and ready to land.Jun 5 2014, 1:37 AM
sstankovic closed this revision.Jun 5 2014, 1:37 AM