This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][LLD] Add RISCV zcmt optimise in linker relaxation
Needs ReviewPublic

Authored by VincentWu on Sep 25 2022, 4:27 AM.

Details

Summary

This patch implements optimizations for the zcmt extension in lld.

A new TableJumpSectio has been added.

Scans each R_RISCV_CALL/R_RISCV_CALL_PLT relocType in each section before the linker relaxation, recording the name of the symbol.

In finalizeContents the recorded symbol names are sorted in descending order by the number of jumps.

Optimise and insert a new cm.jt/cm.jalt during the relax process. in the process, we reused the`R_RISCV_JAL` relocType

co-author: @ScottEgerton

Diff Detail

Event Timeline

VincentWu created this revision.Sep 25 2022, 4:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2022, 4:27 AM
VincentWu requested review of this revision.Sep 25 2022, 4:27 AM
kito-cheng added a comment.EditedSep 26 2022, 4:14 AM

I think this should really write a psABI spec proposal first.

lld/ELF/Arch/RISCV.cpp
617

This relaxation type remove 6 byte, so this should has higher priority than R_RISCV_CALL/R_RISCV_CALL_PLT to jal?

lld/ELF/SyntheticSections.cpp
1165

Should be SHT_PROGBITS, SHT_RISCV_ATTRIBUTES is only used for .riscv.attribute.

1188–1192

This should be enough.

1273–1287

Hmm, how do you deal with more than one local symbol with same name?

e.g.
Following example has 2 different bar:

foo1.c

static int bar (){return 1;}

int foo1(){return bar();}

foo2.c

static int bar (){return 2;}

int foo2(){return bar();}

main.c

int foo1();
int foo2();
int main(){return foo1()+foo2();}
lld/ELF/SyntheticSections.h
379

addCMJTEntryCandidate/getaddCMJTEntryIndex, and same for addEntryZero/getEntryZero

402

entriesZero -> CMJTEntryCandidates
finalizedEntriesZero -> finalizedCMJTEntries
and same for entriesRa/finalizedEntriesRa.

We only need to record symbol name for finalEntries, so std::vector<std::string> is enough?

409

CMJTEntry/CMJALTEntry rather than Zero/Ra, e.g. maxCMJTEntrySize/maxCMJALTEntrySize, startCMJTEntryIdx/startCMJALTEntryIdx.

lld/ELF/Writer.cpp
457

newlib's implementation using __tbljalvec_base$? and maybe use STB_GLOBAL like __global_pointer$?

https://github.com/linsinan1995/riscv-glibc/commit/5d2e0376316f64acfc047dc3e29fd266a456fb8f

VincentWu updated this revision to Diff 462882.Sep 26 2022, 6:29 AM
VincentWu marked 6 inline comments as done.

address part of comments

lld/ELF/SyntheticSections.cpp
1237

We only need to record symbol name for finalEntries, so std::vector<std::string> is enough?

yep but we need sort it in descending order by the number of jumps.
it will be spand more linking time to copy again.

so I chose memory space instead of time.

did you mean it is acceptable if i spend more time to copy it again?

sinan added a subscriber: sinan.Sep 27 2022, 1:37 AM
sinan added inline comments.
lld/ELF/SyntheticSections.cpp
1165
craig.topper added inline comments.Sep 30 2022, 10:31 AM
lld/ELF/Options.td
321

RISCV-> RISC-V

MaskRay added inline comments.Oct 1 2022, 12:03 AM
lld/ELF/Options.td
320

FF: we require that new options must start with --, not -

lld/ELF/SyntheticSections.cpp
1164

The implementation should be moved to Arch/RISCV.cpp to avoid redundant virtual functions in TargetInfo.

lld/ELF/SyntheticSections.h
31

This is inefficient. If DenseMap<CachedStringRef, ...> works, prefer it.

We have the RISC-V attributes section, you should use that (and --(no-)relax) to determine whether or not to use Zcmt relaxation, not invent some new flag, especially since compiler drivers can't use it unless they're sure the linker supports that new flag (think Clang + binutils as the common case where it's hard to know for sure; at least with LLD the likelihood is it's not older than the Clang version, especially in the embedded space) or you require users to opt-in at link time to the relaxation.

jrtc27 added inline comments.Oct 1 2022, 11:46 AM
lld/ELF/Arch/RISCV.cpp
377

This seems wrong

813

This seems like poor design, you shouldn't be using R_RISCV_JAL once it's no longer a JAL instruction there

lld/ELF/SyntheticSections.cpp
1213

When moving this to RISCV.cpp please use extractBits

lld/ELF/Writer.cpp
1637

Why isn't this done when we read in the relocations the first time?

2859

Huh? This doesn't make any sense? You're just writing the jump table section on top of every SHF_ALLOC output section? Make it act like every single other synthetic section and end up in an output section.

VincentWu updated this revision to Diff 487031.Jan 6 2023, 7:52 PM
VincentWu marked 12 inline comments as done.

Refactoring the implementation of lld from psABI.
rebase & address comments

VincentWu added inline comments.Jan 6 2023, 7:54 PM
lld/ELF/Arch/RISCV.cpp
813

Use R_RISCV_32 or R_RISCV_64 instead

lld/ELF/Writer.cpp
1637

IMHO, we can calculate the gain value directly if we scan them when all relaxation have been done.

VincentWu updated this revision to Diff 510671.Apr 3 2023, 7:59 PM

Reduce the time complexity of the algorithm by storing symbol pointers instead of symbol names

Try to compile *Vim* with zcmt opt, the code size of result is about 2.5M. get 0.1M (3.8%) code size decrease when it compare to the code size of *Vim* without zcmt (2.6M).

VincentWu updated this revision to Diff 516050.Apr 22 2023, 3:15 AM

extend the size of InputSection

extend the size of InputSection

Why is this change? InputSection may have millions of instances for a large link. Increasing a word may increase the peak memory usage by 1%. We should strive to avoid it.

Revert commit "extend sizeof InputSection"

jrtc27 added inline comments.Apr 24 2023, 12:40 AM
lld/ELF/Arch/RISCV.cpp
324

These are extremely dodgy

692

Blank line doesn't match style

696

This naming doesn't fit

908

"Entrys" is not an English word

941

We call finalizeContents more than once?

lld/ELF/SyntheticSections.cpp
42

?

kito-cheng added inline comments.May 24 2023, 12:13 AM
lld/ELF/Arch/RISCV.cpp
324
evandro removed a subscriber: evandro.May 24 2023, 1:47 PM
VincentWu updated this revision to Diff 534880.Jun 27 2023, 2:29 AM
VincentWu marked 6 inline comments as done.

address comments

VincentWu marked an inline comment as not done.Jun 27 2023, 2:30 AM
lld/ELF/Arch/RISCV.cpp
941

This method will be called at the end of the last round of relaxation. the changed in Writer<ELFT>::finalizeAddressDependentContent() might be set as true, which will cause the linker to do an additional relaxation. the code here is just try to avoiding it being called again in an additional relaxation.

jrtc27 added inline comments.Aug 29 2023, 2:14 PM
lld/ELF/Arch/RISCV.cpp
324

Uh, what is this diff now? That's not what Kito said to do, and doesn't make it less dodgy. All you've done is given it a name, but that doesn't change the fact that the whole approach is dodgy.

VincentWu updated this revision to Diff 556100.Sep 6 2023, 5:56 PM

use INTERNAL_ relocation type for Zcmt

VincentWu marked an inline comment as done.Sep 6 2023, 5:56 PM

Hi all,

I think I've tried to address all the comments in this revision so far, if I missed something, please remind me again.

This revision hasn't received new commit for a while. So this ping is ask if there is any further comment for this revision?
I'll try to make changes as soon as possible so that we can move forward with the revision.

Is this patch currently meeting some blocks? If there is anything I can do to help, please let me know.

Uh, what is this diff now? That's not what Kito said to do, and doesn't make it less dodgy. All you've done is given it a name, but that doesn't change the fact that the whole approach is dodgy.

@jrtc27 Hope I get what you mean correctly this time, does it still dodgy?

@MaskRay Dose anything I can do for this patch? or do you have any thoughts on how this patch could be improved?

This hasn't received any attention in a bit over a month. Has it been moved to GitHub or are the reviewers just busy? It would be great to make progress with Zcmt.

This hasn't received any attention in a bit over a month. Has it been moved to GitHub or are the reviewers just busy? It would be great to make progress with Zcmt.

no, this patch not been moved to github, it will be reviewed here.
I am trying to find more evidence make this patch looks more reliable

I am only starting my involvement with RISC-V and thereby have much less experience than most of the other reviewers. Especially when it comes to lld. So if any of the more experienced reviewers disagree with any of my suggestions, please favour their suggestions over mine.

I for one certainly would like to see support for Zcmt land in lld sooner rather than later, so I have provided a number of comments that I am hoping can at least help advance the discussion even if they do not need to be incorporated.

Finally, it is not clear to me where the JVT is set (i.e. csrw jvt, <reg with base address>). Is that set in some library or the crt files? I imagine that is documented somewhere, but I think it might be useful to add a link to that documentation as a comment to the entry point for this relaxation (relaxTableJump()).

lld/ELF/Arch/RISCV.cpp
891

If a function returns an index, I think it should be called getIndex() rather than getEntry().

896–905

Can this not just be something like:

// Find this symbol in the ordered list of entries if it exists.
assert(maxSize >= entriesList.size() &&
       "Finalized vector of entries exceeds maximum");
auto idx = std::find_if(
    entriesList.begin(), entriesList.end(),
    [symbol](llvm::detail::DenseMapPair<const Symbol *, int> &e) {
      return e.first == symbol;
    });
if (idx == entriesList.end())
  return entriesList.size();
return idx - entriesList.begin();
923

I think it is useful to not name the opposite of what they mean (even though I do understand that gain is wrt. the goal of code size reduction). It is probably to change all references to gain to codeSizeReduction or csReduction if that's to long.

949

I think it would be useful to document this. Perhaps something like:

// The total reduction in code size is J - JA + JTS - JAE.
// Where:
// J = number of bytes saved for all the cm.jt instructions emitted
// JA = number of bytes saved for all the cm.jalt instructions emitted
// JTS = size of the part of the table for cm.jt jumps (i.e. 32 x worsize)
// JAE = number of entries emitted for the cm.jalt jumps x wordsize

Plus it seems like a much more readable implementation would be something like:

int sizeIncrease = getSize();
for (auto entry : finalizedCMJTEntries) {
  if (sizeIncrease < 0)
    break;
  sizeIncrease -= entry.second;
}
for (auto entry : finalizedCMJALTEntries) {
  if (sizeIncrease < 0)
    break;
  sizeIncrease -= entry.second;
}

In fact, it might be useful to just implement a function getSizeReduction() similar to getSize() that would simply return the amount of code size reduction for all the table entries.

968
/// Sort the map in decreasing order of the amount of code reduction provided
/// by the entries. Drop any entries that can't fit in the map from the tail
/// end since they provide less code reduction. Drop any entries that cause
/// an increase in code size (i.e. the reduction from instruction conversion
/// does not cover the code size gain from adding a table entry).
988

// Drop any items that have a negative effect (i.e. increase code size).

1016

Is this needed if there are no entries in finalizedCMJALTEntries?

1037–1038

What is this for? How can there be a symbol in the vector that has a zero "gain"? Wouldn't that just mean that it doesn't have any calls? Seems more like this should be an assert.

lld/ELF/Options.td
320

I find the name and description somewhat unnatural for what this does. I think something like this would feel more natural:

riscv-tbljmp
"Enable conversion of call instructions to table jump instruction from the Zcmt extension for frequently called functions (RISC-V only)"

Also, is an option perhaps too coarse-grained? Is it feasible that a user would want to use table jumps within some objects but not in others? Namely, if the user compiles some objects with no Zcmt, shouldn't the linker respect that?

lld/ELF/Target.h
37

I think it would be clearer if this had entry or target in the name since it is writing a table entry (a target address).

lld/ELF/Writer.cpp
1674

I wonder if it would be possible and/or advantageous to produce the jump table and convert the instructions before we begin the process of relaxation. I am not familiar enough with the process here so this may not make sense, but this is how I see it and why I'm suggesting this...

  1. This adds another source of volatility with relaxation, possibly making it more difficult to reach a fixed point
  2. If we compile with -ffunction-sections, the linker has the ability to lay out the sections that contain functions for which we will use the jump table as far away from the rest of text as possible. This will keep the remaining functions closer and make them more likely to be possible to relax. Of course, the functions for which we're emitting jump table entries will themselves have calls, so an optimal layout is not feasible to compute, but it is likely useful to sort such functions late.
lld/test/ELF/riscv-tbljal-call.s
2

I think it would be useful to add checks for the layout of the jump table including the padding.

VincentWu updated this revision to Diff 558175.Mon, Nov 27, 6:58 PM
VincentWu marked 10 inline comments as done.

address comments

Finally, it is not clear to me where the JVT is set (i.e. csrw jvt, <reg with base address>). Is that set in some library or the crt files? I imagine that is documented somewhere, but I think it might be useful to add a link to that documentation as a comment to the entry point for this relaxation (relaxTableJump()).

as psabi has mentioned https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-elf.adoc

This relaxation requires programs to initialize the jvt CSR with the address of the __jvt_base$ symbol before executing table jump instructions. It is recommended to initialize jvt CSR immediately after global pointer initialization.

lld/ELF/Options.td
320

Also, is an option perhaps too coarse-grained? Is it feasible that a user would want to use table jumps within some objects but not in others? Namely, if the user compiles some objects with no Zcmt, shouldn't the linker respect that?

Yes, it is coarse-grained a bit. But this extension is intended for embedded systems, (It is incompatible with applications-class processors.). Thus I think it might be ok if we just do it like that.

lld/ELF/Writer.cpp
1674

IMHO, I don't think it's possible to produce jump table before the linker relaxation. Because we need to calculate the reduction gained when jumping using TableJump instruction. But it will be changed by Compressed Tail Call Relaxation and Compressed Function Call Relaxation during linker relaxation.

As mentioned in psABI:"Compressed Tail Call Relaxation and Compressed Function Call Relaxation are always prefered during relaxation, since table jump relaxation has no extra size saving over these two relaxations and might bring a performance overhead."

If we produce the jump table before the linker relaxation, the code size reduction may be reduced to a small amount and even lead to a negative optimization.

VincentWu updated this revision to Diff 558216.Fri, Dec 8, 2:44 AM

add riscvTableJumpSection only when SizeReduction > 0

Hi all,
I tried linking several common applications and libraries using this patch to compare Zcmt's contribution to reducing Code Szie.

The results can be found in the Sheet or in the image below.

Besides that, I found a problem that needs to be discussed.

In TableJumpSection::finalizeContents(), the linker will abort the tbljal optimization and empty the Jump table if tbljal would cause a negative optimization (i.e., the size reduction caused by the Table jump Inst is less than the size increase caused by making the Jump Table).

However, the linker still adds the symbol __jvt_base$ to .symtab . This results in a small increase in program size.

VincentWu added inline comments.Fri, Dec 8, 2:51 AM
lld/ELF/Writer.cpp
1697–1698

the linker still adds the symbol __jvt_base$ to .symtab . This results in a small increase in program size.

Thus, I tried to delay adding the symbol __jvt_base$ here and remove the symbol __jvt_base$ when the Jump Table is not empty. Something like that

But it will crash with error placeholder symbol reached writer at lld/ELF/Symbols.cpp:154. Does anyone can point out anything else should I do to add symbol delay?