This is an archive of the discontinued LLVM Phabricator instance.

Introduce and use a new section type for the bb_addr_map section.
ClosedPublic

Authored by rahmanl on Sep 24 2020, 12:38 AM.

Details

Summary

This patch lets the bb_addr_map (renamed to __llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help parsers, dumpers and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

Diff Detail

Event Timeline

rahmanl created this revision.Sep 24 2020, 12:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2020, 12:38 AM
rahmanl published this revision for review.Sep 24 2020, 12:44 AM
rahmanl added a reviewer: jhenderson.
rahmanl edited the summary of this revision. (Show Details)
rahmanl edited subscribers, added: MaskRay, tmsriram, shenhan and 2 others; removed: hiraditya.

Mostly looks okay. I think there is no testing for the assembly parser being able to parse the new section type?

I'm also keen to hear what others think (e.g. @MaskRay).

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1025–1026

Please make sure to run clang-format on all the bits of code you've modified.

rahmanl updated this revision to Diff 294371.Sep 25 2020, 11:19 AM
  • Add new test for MCParser.
  • Clang-format.
rahmanl marked an inline comment as done.Sep 25 2020, 11:21 AM

Thanks @jhenderson, turned out I did need a one-line change for asmparser to work.

Mostly looks okay. I think there is no testing for the assembly parser being able to parse the new section type?

I'm also keen to hear what others think (e.g. @MaskRay).

This patch lets the bb_addr_map (renamed to llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help linker and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

This description implies that some special processing is needed in the linker. (As you can see, when llvm_call_graph_profile and llvm_dependent_libraries were proposed, they had clear patches on the linker side.)

llvm/test/MC/AsmParser/section_types.s
12 ↗(On Diff #294371)

@note is covered by other tests.

Keeping just llvm specific section types here should be fine.

Mostly looks okay. I think there is no testing for the assembly parser being able to parse the new section type?

I'm also keen to hear what others think (e.g. @MaskRay).

This patch lets the bb_addr_map (renamed to llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help linker and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

This description implies that some special processing is needed in the linker. (As you can see, when llvm_call_graph_profile and llvm_dependent_libraries were proposed, they had clear patches on the linker side.)

Maybe pivot the statement to say "dumpers and other tools"? Dumpers will need to know the section type for simpler processing of the code, even if the linker has no special handling currently.

rahmanl edited the summary of this revision. (Show Details)Sep 28 2020, 11:00 AM
rahmanl updated this revision to Diff 294765.Sep 28 2020, 11:08 AM
  • Remove non-llvm section types from the test.
  • Rebase.
rahmanl updated this revision to Diff 294766.Sep 28 2020, 11:12 AM
rahmanl marked an inline comment as done.
  • Rename test to llvm_section_types.

Thanks @MaskRay and @jhenderson.
Changed the description and the test according to your comments.

Mostly looks okay. I think there is no testing for the assembly parser being able to parse the new section type?

I'm also keen to hear what others think (e.g. @MaskRay).

This patch lets the bb_addr_map (renamed to llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help linker and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

This description implies that some special processing is needed in the linker. (As you can see, when llvm_call_graph_profile and llvm_dependent_libraries were proposed, they had clear patches on the linker side.)

Maybe pivot the statement to say "dumpers and other tools"? Dumpers will need to know the section type for simpler processing of the code, even if the linker has no special handling currently.

xray_instr_map and xray_fn_idx are also an LLVM specific section types but they do not get their own section types. They use a dedicated dumper (llvm-xray), instead of llvm-readobj.

If the eventual behavior is to let LLD interpret .bb_addr_map differently (like .deplibs and .llvm_sympart*) differently (not "Rules for Linking Unrecognized Sections" on http://www.sco.com/developers/gabi/latest/ch4.sheader.html), having a dedicated section type seems fine. Though, I'd really like to know whether auto-defined __start_bb_addr_map and __stop_bb_addr_map are needed. If yes, the section name should not start with a period. And I'd like to know how the section is differently interpreted. xray_instr_map and xray_fn_idx only use generic ELF features.

rahmanl edited the summary of this revision. (Show Details)Sep 28 2020, 12:01 PM

Mostly looks okay. I think there is no testing for the assembly parser being able to parse the new section type?

I'm also keen to hear what others think (e.g. @MaskRay).

This patch lets the bb_addr_map (renamed to llvm_bb_addr_map) section use a special section type (SHT_LLVM_BB_ADDR_MAP) instead of SHT_PROGBITS. This would help linker and other tools to use the sh_type ELF field to identify this section rather than relying on string comparison on the section name.

This description implies that some special processing is needed in the linker. (As you can see, when llvm_call_graph_profile and llvm_dependent_libraries were proposed, they had clear patches on the linker side.)

Maybe pivot the statement to say "dumpers and other tools"? Dumpers will need to know the section type for simpler processing of the code, even if the linker has no special handling currently.

xray_instr_map and xray_fn_idx are also an LLVM specific section types but they do not get their own section types. They use a dedicated dumper (llvm-xray), instead of llvm-readobj.

If the eventual behavior is to let LLD interpret .bb_addr_map differently (like .deplibs and .llvm_sympart*) differently (not "Rules for Linking Unrecognized Sections" on http://www.sco.com/developers/gabi/latest/ch4.sheader.html), having a dedicated section type seems fine. Though, I'd really like to know whether auto-defined __start_bb_addr_map and __stop_bb_addr_map are needed. If yes, the section name should not start with a period. And I'd like to know how the section is differently interpreted. xray_instr_map and xray_fn_idx only use generic ELF features.

No, there are no plans to treat this section differently in LLD. However, we will have an external tool parsing this section. Also we don't need start and stop symbols.

I noticed that there are other LLVM sections which use __llvm_* as a prefix for their section name, instead of .llvm_* or .llvm.* (see e.g. __llvm_prf_cnts). What do you think about using that style for the name? It's not necessary, and it looks like we have a bit of inconsistency going on here, but it's something worth considering.

I noticed that there are other LLVM sections which use __llvm_* as a prefix for their section name, instead of .llvm_* or .llvm.* (see e.g. __llvm_prf_cnts). What do you think about using that style for the name? It's not necessary, and it looks like we have a bit of inconsistency going on here, but it's something worth considering.

That's possible too, but just to verify, do we still want to add the new section type?

Personally, I think we should still have a new section type - the section has a well-defined structure, which in turn means it can be parsed and dumped. We shouldn't need to use section names to distinguish it from other sections to determine which section to parse or dump.

rahmanl updated this revision to Diff 295666.Oct 1 2020, 1:39 PM
  • Rename section to __llvm_bb_addr_map.
Herald added a reviewer: andreadb. · View Herald Transcript
Herald added a reviewer: herhut. · View Herald Transcript
Herald added a reviewer: rriddle. · View Herald Transcript
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a reviewer: ftynse. · View Herald Transcript
Herald added a reviewer: silvas. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript
rahmanl edited the summary of this revision. (Show Details)Oct 1 2020, 1:39 PM
rahmanl updated this revision to Diff 295668.Oct 1 2020, 1:40 PM

Diff reupload.

rahmanl removed projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project, Restricted Project.
rahmanl removed subscribers: arsenm, dschuff, dylanmckay and 58 others.

@MaskRay Do you think this is fine now?

I noticed that there are other LLVM sections which use __llvm_* as a prefix for their section name, instead of .llvm_* or .llvm.* (see e.g. __llvm_prf_cnts). What do you think about using that style for the name? It's not necessary, and it looks like we have a bit of inconsistency going on here, but it's something worth considering.

I am not sure it will be a good idea. The start/stop symbols of __llvm_prf_cnts is __start___llvm_prf_cnts and __stop___llvm_prf_cnts. The 3 _ look strange.

If the optimization eventually will be ported to Mach-O: strlen("__llvm_prf_cnts") <= 16 so the section name can be encoded in Mach-O. __llvm_bb_addr_map is not.

rahmanl updated this revision to Diff 296366.Oct 5 2020, 11:38 PM

Revert back to using ".llvm_bb_addr_map" for the section name.

I noticed that there are other LLVM sections which use __llvm_* as a prefix for their section name, instead of .llvm_* or .llvm.* (see e.g. __llvm_prf_cnts). What do you think about using that style for the name? It's not necessary, and it looks like we have a bit of inconsistency going on here, but it's something worth considering.

I am not sure it will be a good idea. The start/stop symbols of __llvm_prf_cnts is __start___llvm_prf_cnts and __stop___llvm_prf_cnts. The 3 _ look strange.

Right. We won't be using the __start and __stop symbols.

If the optimization eventually will be ported to Mach-O: strlen("__llvm_prf_cnts") <= 16 so the section name can be encoded in Mach-O. __llvm_bb_addr_map is not.

Good point. Actually this shows the importance of using the section type as opposed to the section name. If necessary, we can always change the section name since we make sure we never rely on the actual name.

jhenderson accepted this revision.Oct 7 2020, 3:03 AM
jhenderson added a subscriber: grimar.

LGTM, but please wait for @MaskRay to give any final thoughts.

Also subscribing @grimar who is involved heavily in the ELF binutils/LLD too, in case he's got any comments.

This revision is now accepted and ready to land.Oct 7 2020, 3:03 AM
grimar added a comment.Oct 7 2020, 3:17 AM

I think you also need to update the documentation: llvm\docs\Extensions.rst (https://llvm.org/docs/Extensions.html)

llvm/test/MC/AsmParser/llvm_section_types.s
3

Perhaps just i386-pc-linux, as there is nothing GNU specific here.

4
16

Add an empty line to separate the code and checks?

rahmanl updated this revision to Diff 296834.Oct 7 2020, 5:55 PM
  • Add entry in docs/Extensions.rst.
  • Address @grimar's comments.
rahmanl marked 3 inline comments as done.Oct 7 2020, 5:58 PM

Thanks @grimar.
I added the entry in the Extensions doc. I am not sure if my formatting is correct. I appreciate if you could please take a look.

rahmanl updated this revision to Diff 296838.Oct 7 2020, 6:00 PM
  • Format Extensions.rst.
grimar added a comment.Oct 8 2020, 2:18 AM

Thanks @grimar.
I added the entry in the Extensions doc. I am not sure if my formatting is correct. I appreciate if you could please take a look.

I am not an expert in this area too, but it looks consistent with the rest of the file, so I think it is fine.
To recheck myself I've used "Online Sphinx editor" (https://livesphinx.herokuapp.com/) and its output looks fine to me too.
I've inlined a comment about a single concern/nit I have though.

llvm/docs/Extensions.rst
411

Does #### has some special meaning? If not, I'd just put '#', that is the way other comments
are marked in the file.

rahmanl updated this revision to Diff 297019.Oct 8 2020, 11:08 AM
  • Remove extra '#'s from Extensions.rst.
rahmanl marked an inline comment as done.Oct 8 2020, 11:11 AM

Thanks everyone for the review. I'll go ahead and commit this and start working on D88717.

This revision was landed with ongoing or failed builds.Oct 8 2020, 11:13 AM
This revision was automatically updated to reflect the committed changes.