If .L1 is not within +-4KiB range,
convert
bge a0, a1, .L1
to
blt a0, a1, 8 j .L1
In this patch, if the symbol is unresolved at assembly time, do not do
this relaxation.
Fix the bug reported in https://bugs.llvm.org/show_bug.cgi?id=47910
Differential D108961
[RISCV] MC relaxation for out-of-range conditional branch. craig.topper on Aug 30 2021, 5:47 PM. Authored by
Details If .L1 is not within +-4KiB range, convert bge a0, a1, .L1 to blt a0, a1, 8 j .L1 In this patch, if the symbol is unresolved at assembly time, do not do Fix the bug reported in https://bugs.llvm.org/show_bug.cgi?id=47910
Diff Detail
Event TimelineComment Actions I know GNU as does this but I don't see why that's a good idea nor why we should; if you use a conditional branch to a label that's too far away then that's your problem and you should have written non-broken code. Having it insert the long branch form is too much magic for me... Comment Actions Yes, I'm in two minds about this. I'm usually in favour of trying to match GNU tools behaviour in order to minimise porting issues / provide a more predictable user experience. But fighting against that, I'm really not a fan of these "magic" transformations either. The fear with adding anything like this is of course adding a very rarely executed path that ends up having bugs in obscure cases. As such, if we were to go in this direction I'd be keen to see more extensive test coverage. Comment Actions FWIW, I *think* the origin of this stems from: (a) GCC/binutils divide so GCC doesn't actually know how far away things are and thus can't pick the right branch all the time. Fixing it up in the assembler was presumably the easier solution rather than fixing GCC to emit correct assembly. That then has the unfortunate effect of becoming something hand-written assembly relies on (e.g. bbl was, as was FreeBSD in one place very early on before LLVM was mature enough to build it). Comment Actions That's a good point re the origin of this kind of magic. It probably strengthens the argument in my mind for supporting this - being able to take .s generated by GCC and assemble it with clang/LLVM is a good thing (and we should probably be doing more such testing). Comment Actions
The description about GCC is not fully correct, GCC also has capability estimate jump range, and convert cond branch to inverted cond branch and a jump during the compilation stage, GCC didn't emit any compressed instruction so the instruction distance estimation is over-conservative. So when we need this magic?
Comment Actions Is that not a problem for every architecture though?
There's nothing stopping an assembler from detecting that a label will/won't/might be out of range for a given branch by doing best-case and worst-case analysis on the instruction sequence and emitting an error in the case where it's definitely going to be out of bounds however much you relax, and a warning when it's unsure because it depends on how much relaxation there is (which is likely to be relatively rare, normally these cases are _way_ out of bounds for user-provided assembly). Comment Actions
Yeah, but RISC-V is more frequently hit due to shorter range for conditional branch, I believe integrated assembler is right way to resolve this, but sadly it's not existing in GCC :(
Technically that's feasible solution, but I think that's would be more like philosophical issue here :p Comment Actions I don't particularly like adding magic to RISC-V just because it hits an issue shared by many architectures more often than them. If I can break every architecture other than RISC-V with a sufficiently-large blob of inline assembly then it seems to me like GCC needs fixing, not that RISC-V should have a special hacky workaround just for it that gets exposed to and relied upon by users?
Comment Actions Some objection from me. x86 does have such a relaxation (breaking WYSIWYG), but this is not a thing that any other RISC arch I know does. # thumb: Error: branch out of range .thumb beq .Lfoo .rept 129 nop .endr .Lfoo: RISCV already uses lib/CodeGen/BranchRelaxation.cpp on the compiler side, so I am not sure why we need this assembler side solution. https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html#Size-of-an-asm says For completeness, such long branches are typically not performance bottleneck. Comment Actions Will the BranchRelaxation pass account for .insn directives in inline asm correctly? Or will we need to do additional work for that associated with D108602 Comment Actions I haven't looked at it closely yet. BranchRelaxation.cpp uses TargetInstrInfo::getInstSizeInBytes to compute the size. Comment Actions
That's not target dependent feature, and it's roughly way to estimate, unless GCC has something like MC layer, otherwise it's never accurate.
RISC-V don't have special relocation to handle this relaxation, and binutils didn't handle that, although that could be recognized by R_RISCV_BRANCH + R_RISCV_JAL pair, but as you said it's might be complicated and not worth. Comment Actions It is not a new feature for RISC-V. We already have c.beqz to beq, c.bnez to bne, c.j to jal, etc. This patch just extends the relaxation patterns to conditional branches. If MC relaxation is not a good idea, should we remove all the patterns in the relaxInstruction target hook? Comment Actions My understanding is that's more a quirk of our implementation and is there to un-compress branches that get erroneously compressed (since a bare symbol is a valid operand for compressed branches), but yes, that does currently cause things to behave a bit strangely and ideally wouldn't be there. Comment Actions binutils also does this for at least Xtensa http://sourceware.org/binutils/docs/as/Xtensa-Branch-Relaxation.html There was this pull request to document the GNU behavior in the riscv-asm-manual https://github.com/riscv-non-isa/riscv-asm-manual/pull/58/files The inline assembly counting in LLVM is known to be incomplete see https://bugs.llvm.org/show_bug.cgi?id=42539 I think maybe .insn will work with the current implementation of getInlineAsmLength because it mostly just counts the number of lines and multiplies by MaxInstLength. With a special case for .space directive. Comment Actions If we deal with special directives such as .rept in getInlineAsmLength, we should be able to get the correct branch instruction according to the estimated offset. I agree that the compiler should be responsible to generate the correct code, not depends on MC to fix it. However, hand-written assembly code is another story. How about to turn MC relaxation off by default and let users turn on it if users want the compiler to help them to take care of these transformation? Comment Actions I found some pseudo instruction are expanded to multiple MI instructions in RISCVExpandPseudoInsts.cpp. And the BranchRelaxation pass will run before the expandPseudo pass. Is it possible to cause the fixup value out-of-range on some branch instructions? If this will happen, maybe add MC layer branch relaxation can deal with it. Comment Actions Hello, we are currently trying to implement an LLVM backend on RISC-V for the GraalVM Native Image project and we run into some issues that are solved by this patch. In summary, we produce LLVM bitcode from Java code and feed it to llc. The problem is that we cannot really control the size of the functions and it can cause branches to out of range locations. However, this issue seems stale and controversial. Is there any news to share? This comment was removed by Zeavee. Comment Actions @reames has discovered that CodeGen's BranchRelaxation doesn't understand how RISC-V emit alignment directives when relaxation is enabled. Align directives always emit (align - mininstsize) bytes of NOPs. The linker is responsible for cleaning up the extra NOPs to meet the requested alignment. BranchRelaxation doesn't know this and undercounts the size of the alignment NOPs. This shows up when -falign-loops is used for example. Adding this relaxation to the assembler is the quickest fix and matches the GNU assembler anyway. Though it still wouldn't handle the case that requires a branch to become indirect. Comment Actions As @craig.topper mentioned above, I stumbled into this issue last week when debugging an LTO failure. The exact issue I hit was because BranchRelaxation doesn't account for the padding required in RISCV's alignment directive. (Which is different from every other target.) We could, of course, patch the particular issue in BranchRelaxation, but I want to make an argument for reversing course on this patch as well. If we'd landed this patch, my hard to debug LTO correctness issue would have been a minor code quality issue at worst, and I'd not have lost a week of my life. :) First, we are currently incompatible with the GNU toolchain. LLVM's assembler will not successfully assemble programs which 'as' will. This shows up both in hand written assembly, but also, in theory at the moment, in output from gcc. target: bne a0, a1, target .rep 1024 nop .endr bne a0, a1, target Personally, I consider this a very strong reason to accept this patch (or some variant thereof). To the point where I almost hesitate to bring up other lines of arguments because I'm worried they'll distract from the core point of cross toolchain compatibility. We *need* cross toolchain compatibility; any exceptions to this need to be *very* strongly justified, and I don't consider any of the arguments made on this thread to date to come anywhere close to that bar. Second, I don't believe the correctness invariant relied on from BranchRelaxation is a good idea. At the moment, we *must* get all instruction sizes used in branch relaxation to be upper bounds on the actual bytes generated. We've now had multiple examples where this was not upheld in edge cases, and I don't see anything in the code which particularly enforces said invariant. I've posted a couple patches to add a few relevant asserts here and there, but this is a cross cutting invariant with broad implications. If we can remove it it, that removes a major correctness risk. There was an argument made on thread previously that this patch adds rarely executed behavior which is likely to bit rot. I think this argument gets things completely backwards. The fundamental case requiring relaxation has to be implemented somewhere, and the assembler implementation is actually the easier to test of the alternatives. The assembler change has dramatically less "blast radius" than the compiler BranchRelaxation approach. Now, I do need to acknowledge that there's related branch relaxation case which the assembler *can't* easily handle. Specifically, when we cross out of +/- 2MB range, we need a scratch register to materialize the long branch. To my knowledge, GCC doesn't handle this case at all. Third, I think we actually have potentially significant missed optimizations here. BranchRelaxation is inherently conservative. As one example, @craig.topper tells me that every branch is modeled as 4 bytes, even if the branch target would allow the branch to be compressible. As such, for branch dense code (e.g. loop nests) we can end up relaxation branches which don't need relaxed. I haven't (yet) seen a case compelling enough on it's own to address, but in theory, such cases are certainly possible. Since we have a correctness need to change schemes here anyways, we might as well fix the optimization problem at the same time.
Comment Actions I'm very sorry for that, I know from personal experience how galling it is to spend a long time debugging something to later find a patch that fixed it had already been proposed. I don't personally have a reliable system for tracking patches that don't get re-pinged by their authors, but there was a bug report for this issue here, so definitely a failure there...
I wanted to clarify this point - my concern was adding a rarely executed and insufficiently tested code path (which has in the past also been the source of hard to debug issues).
Comment Actions Code wise, LGTM. Please wait on landing until we settle the high level approach questions. Comment Actions Code changes LGTM as well. I can't think of other cases worth testing either (but of course, more eyes always welcome there). Comment Actions Vote +1 from my personal side. Comment Actions We discussed this today at the RISCV sync up meeting. Unfortunately, @jrtc27 wasn't in attendance. Out of the attendees, @asb, @craig.topper, and I spoke in support of landing this. @luismarques remembered being hesitant, and was going to revisit. I'm going to reach out to @jrtc27 to discuss, and see if her previous opposition still stands given new information. (edit: Reached out via email on 2/2/23) Comment Actions To summarise my position after chatting things through with Philip, I'm in favour of landing this as-is^. For compiler-generated code, I would personally rather it not be something relied upon, but I understand the reasons it does from a practical perspective and that others have differing philosophical positions. My bigger concern was with hand-written assembly, especially with .option norelax (and possibly .option norvc) where there is the expectation that what you write is exactly what you get at the byte level, not just semantically (c.f. Linux's alternatives and static branches, which make use of that). For compatibility with GNU as on real-world code we do unfortunately need to support this, and so having it on by default is the way we have to go. I would, however, like to see both assemblers provide a standard way to turn this off in future, and perhaps an eventual path for compilers to explicitly opt-in to using this such that the default could be changed many years down the road (and thus not "surprise" developers), but that's all future work that isn't blocking for any of this. ^ NB: This is not a code review though, I haven't studied the changes themselves, hence making this just a comment and not an "official" approval Comment Actions Thanks @jrtc27 - marking this explicitly as LGTM now the remaining concerns have I think all been addressed.
|
isInt<13>(Offset).
(Offset > 4094 || Offset < -4096); is not different from (Offset > 4095 || Offset < -4096);