Page MenuHomePhabricator

[CodeView] Emit S_OBJNAME record
ClosedPublic

Authored by aganea on Feb 6 2018, 7:33 PM.

Details

Summary

We've had a longstanding incompatibility with MSVC where we don't emit a couple of CodeView records that normally go in each file. This patch aims to address the first of these records - the S_OBJNAME symbol. This symbol is very simple. It just contains the full path to the object file that the debug info is being emitted to.

When building from clang, the approach taken here is to use the target file name, as decided by the CompilerInstance::createDefaultOutputFile. It is then passed down to the backend through MCTargetOptions, to make it available to CodeViewDebug. The same approach is taken when using llc or LTO/ThinLTO (through LLD or llvm-lto2). When emitting to stdout (path is '-'), we emit the S_OBJNAME record with an empty name.

The objective of this patch is to support live code patching tools like Recode or Live++. They are different from Microsoft Edit & Continue, which we don't plan to support. Several changes are needed in LLVM to support these tools:

  1. Emit S_OBJNAME (this patch)
  2. Emit full cc1 cmd-line in LF_BUILDINFO, see D80833.
  3. Emit two-byte nops, see D81301.
  4. Emit the 'hotpatch' flag in S_COMPILE3, see D116511.

Diff Detail

Unit TestsFailed

TimeTest
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
340 mslinux > HWAddressSanitizer-x86_64.TestCases/Linux::reuse-threads.cpp
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions -mllvm -hwasan-instrument-stack=0 /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/Linux/reuse-threads.cpp -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Linux/Output/reuse-threads.cpp.tmp && env HWASAN_OPTIONS=disable_allocator_tagging=1:random_tags=0:verbose_threads=1 /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Linux/Output/reuse-threads.cpp.tmp 2>&1 | FileCheck /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/Linux/reuse-threads.cpp
210 mswindows > lld.ELF/invalid::symtab-sh-info.s
Script: -- : 'RUN: at line 4'; c:\ws\w64\llvm-project\premerge-checks\build\bin\yaml2obj.exe --docnum=1 C:\ws\w64\llvm-project\premerge-checks\lld\test\ELF\invalid\symtab-sh-info.s -o C:\ws\w64\llvm-project\premerge-checks\build\tools\lld\test\ELF\invalid\Output\symtab-sh-info.s.tmp.o

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Modified the tests as suggested by @aprantl

The only thing that's missing is a test for the correct hashing of the new field in unittests/IR/MetadataTest.cpp.

To clarify my last comment: No need to check for the correct hashing, since DICompileUnits are always distinct, but extending DICompileUnitTest would be the right thing to do.

rnk added a comment.Oct 29 2018, 5:45 PM

I was reviewing the records that msvc emits that clang does not, and I noticed this again.

Here's a thought. Why are we putting this on the compile unit in the first place? Why not use a global named metadata node? Most of the complexity in this patch is from going through DICompileUnit, when we can just claim a global named metadata node for the output object file. It could probably even point to a DIFile so we can preserve some information about whether the path is relative.

In D43002#1279796, @rnk wrote:

I was reviewing the records that msvc emits that clang does not, and I noticed this again.

Here's a thought. Why are we putting this on the compile unit in the first place? Why not use a global named metadata node? Most of the complexity in this patch is from going through DICompileUnit, when we can just claim a global named metadata node for the output object file. It could probably even point to a DIFile so we can preserve some information about whether the path is relative.

If this is meant to name the final object file that's emitted - perhaps it shouldn't be in metadata at all. The Split-DWARF DWO file name is similar: it must name the actual file that was emitted - and isn't related to the original modules that may've been generated. This data is passed down through an MCOption, if I recall correctly.

aganea added a subscriber: aganea.Oct 30 2018, 5:15 AM

@zturner Do you mind if we took over this patch and finish it? This prevents using Live++ with Clang.

Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2020, 5:58 AM

Hi, I just jumped on this project. I'm currently trying to get S_OBJECTNAME symbol in order to make Live++ work.
I was looking at the feed and I don't understand what you mean @rnk when you say "Why not use a global named metadata node?". Could you please give more clarification on what you mean here, it would help a lot.
Also, @dblaikie if the info shouldn't be in metadata, where do you think it should be ?
Thanks

rnk added a comment.Mar 10 2020, 10:27 AM

Thinking about it again, maybe it should be an MCOption like David suggested. S_OBJNAME should probably refer to the final location of the native object file, which is only known to whoever sets up MCOptions. For example, with "fat" LTO, the linker is the one responsible for setting up the output file.

Named metadata nodes are things like !llvm.ident = {...} that you see at global scope in .ll files, but I don't think it's relevant.

As far as this patch is concerned, I don't think we should put this thing on the compile unit. The whole idea of compile units is to preserve the separation between compilation units through LTO, but we can't really do that in codeview anyway.

Hello, I as requested placed the outputfile name in the MCTargetOptions.h. However, I encountered an issue. @rnk

I didn’t find any way to go back to MCTargetOptions.h from the CompilerInvocation.cpp. I understand it would make sense that there is one somewhere, but I can’t find it. Do any of you know how I’m supposed to access the MCTargetOption? Should I create my own way there?
Sorry to waste your time and maybe the answer is simple to answer but I’m fairly new to this code so there is some aspects of the code I dont fully understand.

aganea commandeered this revision.May 28 2020, 12:00 PM
aganea added a reviewer: zturner.
aganea edited the summary of this revision. (Show Details)May 28 2020, 12:19 PM
aganea added reviewers: MaskRay, reames.
aganea added a subscriber: lantictac.
aganea updated this revision to Diff 266989.May 28 2020, 12:27 PM

I'm taking over this patch, as discussed offline with Zachary.

I've re-implemented the patch to use the target/final file name of the .OBJ, as suggested.

The bulk of the changes is mostly to pass around the file name, down to the back end.

First question:

Since split dwarf has to do some similar things can we not use the same support? This seems to be a lot of changes for this.

Second question:

and if we can't use the same support can we make split dwarf use this? Having two separate methods for passing everything around is a bit painful.

aganea updated this revision to Diff 267111.EditedMay 28 2020, 7:34 PM

Simplified. Now using CodeGenOptions to pass the object filename around.

aganea added a comment.EditedMay 28 2020, 7:39 PM

First question:

Since split dwarf has to do some similar things can we not use the same support? This seems to be a lot of changes for this.

Second question:

and if we can't use the same support can we make split dwarf use this? Having two separate methods for passing everything around is a bit painful.

To answer both questions, I tried to converge but the uses are slightly different. In one case (DWARF) the -split-dwarf-file/-split-dwarf-output flags are passed down almost blindly to the back-end, whereas for COFF there's more logic to determine where the file should be emitted (see CompilerInstance::createOutputFile). We always need full path names in S_OBJNAME, whereas (it seems) split DWARF can handle relative paths (for example see llvm-mc -split-dwarf-file). I think the purpose is also different: the .dwo is for immediate debugger consumption, whereas the path in S_OBJNAME seems to be used more like a global identifier.

Please let me know if you feel this could be done differently.

reames resigned from this revision.May 29 2020, 11:49 AM
aganea updated this revision to Diff 304364.Nov 10 2020, 5:54 PM
aganea edited the summary of this revision. (Show Details)
aganea edited reviewers, added: mstorsjo; removed: reames, espindola.

Simplify.
Added coverage when using clang-cl since this is a MSVC-specific feature.
When -fdebug-compilation-dir=. is used, relative paths are now emitted in the S_OBJNAME record.

aganea retitled this revision from Emit S_OBJNAME symbol in CodeView to [CodeView] Emit S_OBJNAME record.Dec 10 2020, 8:03 AM
aganea added a reviewer: gratianlup.
rnk added a comment.Dec 10 2020, 3:15 PM

I think the biggest concern here is what to do about /Fa / -save-temps. If we do nothing, the S_OBJNAME record will change if you run the same compilation twice with and without those flags. Generally, we strive to ensure that the directly emitted object file is identical to one rinsed through textual assembly, but that doesn't always work. I'm not sure if /Fa works reliably since we switched to Intel style assembly syntax either.

clang/include/clang/Basic/CodeGenOptions.h
212

Let's bikeshed the name a bit. This thing is the -o//Fo value, plus some processing. It can be symbolic, as in -. It could be the name of a bitcode file with -flto. It could be the name of an assembly file if you do clang -S --target=x86_64-windows-msvc -g t.cpp -o t.s. It could be the name of an ELF file if you try hard. I think in any of these cases where the user directly emits something that isn't a COFF object, it's OK to use that name for the S_OBJNAME record.

What it is, is the name of the compiler's output file, as we would like it to appear in debug info. With that in mind, here are some ideas:

  • CodeViewObjectName: very directly referring to S_OBJNAME
  • ObjectFilename: very general
  • ObjectFilenameForDebug: generalizing over cv/dwarf
  • OutputFilenameForDebug: generalizing over assembly, bc, and obj

I think I like ObjectFilenameForDebug best. DebugObjectFilename seems like a no-go, since that sounds like the dwo file name.


This brings me to the case of -save-temps / /Fa. In these cases, the compile action is broken into pieces, and the assembler is invoked in a follow-up process. Does that mean the driver needs to pass an extra flag along to the -cc1 action? That would be a bummer.

clang/lib/Frontend/CompilerInstance.cpp
717

I think saving the output file name during output file creation seems like an unexpected side effect for this function. Can we handle it earlier, maybe near dwo file name handling?

clang/test/CodeGenCXX/debug-info-objname.cpp
4

I checked, this is also consistent with clang's regular /Z7 output for source filenames.

20–21

And this is the thing we rely on, so it should all work.

llvm/include/llvm/MC/MCTargetOptions.h
63

This should probably be in llvm/include/llvm/Target/TargetOptions.h. I think MCTargetOptions is intended to control assembler behavior, but this option is read while producing assembly in CodeGen, not during assembling.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
752

This isn't unescaping them, it's just canonicalizing double slashes to one slash, right? Would llvm::sys::native suffice?

aganea marked 8 inline comments as done.Jan 13 2021, 11:46 AM
aganea added inline comments.
clang/include/clang/Basic/CodeGenOptions.h
212

I think we should fix -save-temps and /Fa, thanks for raising that! Would you mind if I added a new cc1 flag for that purpose? Something like -target-file-name.
When using -save-temps, each cc1 command doesn't know about the final filename, except for the last cc1 command. We would need to provide that name somehow when passing -S.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
752

llvm::sys::path::native() doesn't remove consecutive (back)slashes. I think @zturner's main intent was when debugging in Visual Studio with arguments from LIT tests, they sometimes contain double backslashes.

rnk added inline comments.Jan 13 2021, 6:36 PM
clang/include/clang/Basic/CodeGenOptions.h
212

Yes, I think we'll need a new cc1 flag. I'd avoid "target" in the flag name, since it makes me thing of target as in platform, OS, ISA, etc. Maybe just -object-file-name? It's very boring, and reasonably specific.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
752

I see. I think "unescape" shouldn't be in the name, this isn't about escape characters, it's about cleaning up or canonicalizing the path.

@rnk @aganea Ping... Just wanted to see what was happening with this change, and if there is anything I can do to help get it merged in.

aganea updated this revision to Diff 374884.Sep 24 2021, 9:50 AM
aganea marked 5 inline comments as done.
aganea edited reviewers, added: akhuang; removed: zturner, gratianlup.
aganea edited subscribers, added: zturner, gratianlup; removed: aganea.

As suggested by @rnk

aganea added inline comments.Sep 24 2021, 9:50 AM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
752

I removed this function. Now using sys::path::remove_dots which removes the consecutive slashes.

Ping! Any further comments? Thanks in advance for your time.

rnk accepted this revision.Dec 16 2021, 12:57 PM

lgtm

Sorry for the delay, I got sick last week and read through email newest to oldest.

This revision is now accepted and ready to land.Dec 16 2021, 12:57 PM
This revision was landed with ongoing or failed builds.Dec 21 2021, 6:27 AM
This revision was automatically updated to reflect the committed changes.

Some test nits:) I don't know CodeView :(

clang/test/CodeGenCXX/debug-info-objname.cpp
2

The whole file appears to be CRLF where almost all other tests use LF in llvm-project.

3

%T is deprecated in lit (https://llvm.org/docs/CommandGuide/lit.html#substitutions): it is easy to cause multi-threading file reuse problems because lit runs tests parallelly.

I personally prefer RUN: rm -rf %t && mkdir %t && cd %t if you need to cd.

My personal comment style for non-RUN non-CHECK lines is /// - it stands out in many editors and may help possible FileCheck/lit's future direction to catch stale CHECK prefixes.

31

Prefer --input-file or input redirection to cat xxx | yyy

37

{{[0-9]+}} -> [[#]]

Some test nits:) I don't know CodeView :(

Fixed, thanks @MaskRay !

aganea edited the summary of this revision. (Show Details)Jan 16 2022, 6:21 AM