This is an archive of the discontinued LLVM Phabricator instance.

[RFC][RISCV] Don't disassemble `addi`s with relocations as `mv`s
Needs ReviewPublic

Authored by luismarques on Feb 5 2023, 7:50 AM.

Details

Summary

A RISC-V mv rd, rs instruction is an alias/pseudoinstruction for addi rd, rs, 0.

When disassembling object files we'll commonly encounter many (arguably) spurious mvs: addis with unresolved relocations. The immediate operand isn't actually zero, though that is obscured by the relocation, so IMO it shouldn't be disassembled as a mv. I'd argue that even if such a relocation happens to resolve to zero it's still semantically not a mv, but that is a minor point.

This patch tries to address that situation by detecting when those addis have relocations and not printing them as mvs.

Please let me know if you agree with (1) my reasoning and (2) the overall approach of this patch.
If there is a rough agreement I'll further improve the patch, including optimizing the search for the relocations and not introducing a separate printInst method.

Diff Detail

Event Timeline

luismarques created this revision.Feb 5 2023, 7:50 AM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
luismarques requested review of this revision.Feb 5 2023, 7:50 AM
craig.topper added inline comments.Feb 15 2023, 11:09 AM
llvm/tools/llvm-objdump/llvm-objdump.cpp
535

Do we have an llvm::find_if that uses ranges?

I think the reasoning makes sense to me. Do you also use -r when you disassemble object files? Or do you anticipate just knowing that an addi with 0 is some unrelocated address?

Basic motivation/direction here makes sense to me. +1 on the general approach.

I think the reasoning makes sense to me. Do you also use -r when you disassemble object files? Or do you anticipate just knowing that an addi with 0 is some unrelocated address?

I often use the -r option to determine the symbol addresses being computed (in non-trivial disassemblies) or as a sanity check to verify that some immediate isn't actually zero. Sometimes I don't bother because it makes the output noisy or I just forget.

I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.

I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.

I assume it's the latter. It's true that with -r the spurious mvs are less of a problem, though I think it's still something to be avoided if we can.

I'm not sure I understand your second question, though (it's late here...). I assume it's more than just a curiosity and it has some relation to this patch but it's not clear to me what the implication is. Yes, without -r the disassemblies can be a bit puzzling at times, and you need to anticipate that some 0 immediates are misleading, and they won't actually be zero in the final binary. You have the same problem with mvs, just exacerbated because then you have to think that they aren't even true mv instructions at all. Is your point that when you do use -r it should be clear that the mvs aren't actually moves and the addis aren't actually zero? Apologies if I'm being dense.

I assume it's the latter. It's true that with -r the spurious mvs are less of a problem, though I think it's still something to be avoided if we can.

Yes it was the latter. I was just curious.

It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.

It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.

I can work on that if there is agreement that we should go in that direction. Makes sense to me, but I wonder if some people will push back against that, regardless of implementation concerns.

It'd be really cool if we could use the relocation to print %pcrel_lo(sym) instead of 0. But I don't know how much work that would be.

I can work on that if there is agreement that we should go in that direction. Makes sense to me, but I wonder if some people will push back against that, regardless of implementation concerns.

From my perspective that would be cool and useful for all targets. I can’t think of a reason why one wouldn’t prefer that, beyond the complexity of having to synthesise labels when the assembler has converted them to section-relative.