This is an archive of the discontinued LLVM Phabricator instance.

Fix Bug 30978 by emitting cv file checksums.
ClosedPublic

Authored by ecbeckmann on Aug 25 2017, 11:37 AM.

Details

Summary

The checksums had already been placed in the IR, this patch allows
MCCodeView to actually write it out to an MCStreamer.

Diff Detail

Repository
rL LLVM

Event Timeline

ecbeckmann created this revision.Aug 25 2017, 11:37 AM
ecbeckmann updated this revision to Diff 114116.Sep 6 2017, 7:00 PM

Some more testing changes.

zturner edited edge metadata.Sep 6 2017, 9:27 PM

Try to be more careful so as not to submit testing code / comments etc in uploaded patches. Reasons like this are why I prefer using a debugger instead of printf debugging :)

llvm/include/llvm/IR/DebugInfoMetadata.h
476–481 ↗(On Diff #114116)

If you're going to make it an enum class, then it doesn't need the CSK prefix in the name, since you have to write ChecksumKind anyway. I wouldn't suggest churning the code to change the names everywhere it's used, so instead maybe just leave this as a regular enum. It's already scoped to DIFile anyway, so there's no risk of namespace pollution.

llvm/include/llvm/MC/MCCodeView.h
297–300 ↗(On Diff #114116)

Any reason this isn't just:

struct FileInfo {
  StringRef Name;
  StringRef Checksum;
  ChecksumKind Kind;
};

SmallVector<FileInfo> Files;

having a StringMap seems unnecessary if we're already iterating a linear container anyway

llvm/lib/DebugInfo/CodeView/DebugChecksumsSubsection.cpp
53 ↗(On Diff #114116)

Delete

llvm/lib/DebugInfo/CodeView/DebugStringTableSubsection.cpp
32 ↗(On Diff #114116)

Delete

38 ↗(On Diff #114116)

Delete

44 ↗(On Diff #114116)

Delete

llvm/lib/MC/MCCodeView.cpp
188–200 ↗(On Diff #114116)

What is this? A bunch of code commented out, and then emitting a bunch of magic numbers?

llvm/tools/llvm-readobj/COFFDumper.cpp
927 ↗(On Diff #114116)

Delete

929 ↗(On Diff #114116)

Delete

1088 ↗(On Diff #114116)

Delete

1090 ↗(On Diff #114116)

Delete

rnk added inline comments.Sep 7 2017, 9:51 AM
llvm/lib/MC/MCCodeView.cpp
418–420 ↗(On Diff #114116)

We shouldn't assume all DIFiles have MD5 checksums, we'll need some way to know the filechecksum table offset.

We should probably implement this FIXME:

// FIXME: We should store the string table offset of the filename, rather than
// the filename itself for efficiency.
Filename = addToStringTable(Filename);
Filenames[Idx] = Filename;

We should do that by making FIlenames a vector of two integers: the offset of the file in the file checksum table, and the offset of the filename in the string table.

All the places where we do 8 * (FileId - 1) we would rewrite to use Filenames[FileId - 1].ChecksumTableOffset.

ecbeckmann changed the visibility from "Public (No Login Required)" to "ecbeckmann (Eric Beckmann)".Sep 7 2017, 10:51 AM
ecbeckmann changed the edit policy from "All Users" to "ecbeckmann (Eric Beckmann)".
ecbeckmann removed reviewers: zturner, rnk.
ecbeckmann removed a subscriber: hiraditya.

Use MCFixup

ecbeckmann updated this revision to Diff 114469.Sep 8 2017, 7:00 PM

Everything works finally.

ecbeckmann added inline comments.Sep 8 2017, 7:25 PM
llvm/include/llvm/MC/MCCodeView.h
285 ↗(On Diff #114469)

remove

299 ↗(On Diff #114469)

Just one comment says:

Array storing file information. Each tuple contains the string table offset, a symbol representing the file table offset, and a set flag.

304 ↗(On Diff #114469)

remove

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
700 ↗(On Diff #114469)

remove this comment

702 ↗(On Diff #114469)

remove junk here too

llvm/lib/DebugInfo/CodeView/DebugChecksumsSubsection.cpp
53 ↗(On Diff #114469)

remove

llvm/lib/DebugInfo/CodeView/DebugStringTableSubsection.cpp
32 ↗(On Diff #114469)

remove

39 ↗(On Diff #114469)

remove

45 ↗(On Diff #114469)

remove

ecbeckmann added inline comments.Sep 8 2017, 7:25 PM
llvm/lib/MC/MCCodeView.cpp
49 ↗(On Diff #114469)

remove

63 ↗(On Diff #114469)

remove comment

137 ↗(On Diff #114469)

combine this all into one line

138 ↗(On Diff #114469)

Don't reassign to S and just directly use Ret.first

155 ↗(On Diff #114469)

add a comment explaining this function

156 ↗(On Diff #114469)

remove

162 ↗(On Diff #114469)

remove spaces here

168 ↗(On Diff #114469)

remove these

174 ↗(On Diff #114469)

+=4 for the string table offset

177 ↗(On Diff #114469)

+1 for size, +1 for kind, +2 to align (maybe just replace that with alignTo for clarity)

180 ↗(On Diff #114469)

remove logging, more comments to explain offset

216 ↗(On Diff #114469)

add comment explaining that, since the checksum section has been emitted, we can assume the offsets have been finalized so we can assign them.

227 ↗(On Diff #114469)

update this comment

231 ↗(On Diff #114469)

remove these

241 ↗(On Diff #114469)

yes remove this

253 ↗(On Diff #114469)

add comment to explain, how it needs to use a MCFixup provided by EmitValueImpl

265 ↗(On Diff #114469)

remove

268 ↗(On Diff #114469)

create a temporary symbol to hold the offset once it is calculated

304 ↗(On Diff #114469)

remove

319 ↗(On Diff #114469)

a

458 ↗(On Diff #114469)

r

504 ↗(On Diff #114469)

change this comment

llvm/lib/MC/MCObjectStreamer.cpp
144 ↗(On Diff #114469)

r

llvm/lib/MC/MCParser/AsmParser.cpp
3455 ↗(On Diff #114469)

should be lower case for consistency

llvm/lib/MC/WinCOFFObjectWriter.cpp
725 ↗(On Diff #114469)

r

llvm/tools/llvm-readobj/COFFDumper.cpp
910 ↗(On Diff #114469)

r these

1020 ↗(On Diff #114469)

r

1106 ↗(On Diff #114469)

not needed

Make this entire thing easier to follow and understand.

llvm/lib/MC/MCCodeView.cpp
60 ↗(On Diff #114469)

This is confusing to read, make it more intuitive

Fortunately this seems to work whether generating assembly or object

Why do we have to add placeholders for the checksum offsets? Aren't they just added in order, one after another?

Actually, the implementation of addFile hasn't assumed it is done in order, and you can actually add random fileNo's whenever you want. Therefore, we shouldn't assume they are always done in order, and we should allow for the case where we don't yet know the checksum offset.

Add checksum offset directive.

Restructrue, create real symbol not temp.

Remove errant errors.

ecbeckmann changed the visibility from "ecbeckmann (Eric Beckmann)" to "Public (No Login Required)".Sep 13 2017, 6:45 PM
ecbeckmann marked 2 inline comments as done.
ecbeckmann added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
476–481 ↗(On Diff #114116)

Unfortunately I need the actual enum values to correspond exactly to the checksum types flag integer values actually present in the codeview section, hence the strongly typed enum class. This is important because I will now be emitting the ChecksumKind as assembly directives as well as directly into the binary.

llvm/include/llvm/MC/MCCodeView.h
297–300 ↗(On Diff #114116)

Yes, combined all this info into one vector. It's still useful to have a StringMap because we want to check if a filename is already used before making a new entry.

ecbeckmann changed the visibility from "Public (No Login Required)" to "ecbeckmann (Eric Beckmann)".Sep 13 2017, 6:45 PM
ecbeckmann marked 2 inline comments as done.

Reformat

ecbeckmann changed the visibility from "ecbeckmann (Eric Beckmann)" to "Public (No Login Required)".Sep 14 2017, 1:48 PM
ecbeckmann marked 2 inline comments as done.
ecbeckmann added inline comments.
llvm/lib/MC/MCCodeView.cpp
188–200 ↗(On Diff #114116)

It was all debugging code that was unintended for review. Please disregard. Sorry again for not making this patch private to begin with.

418–420 ↗(On Diff #114116)

Filenames renamed to Files, which is now a vector of a massive structure containing all offset and checksum information.

ecbeckmann marked an inline comment as done.
ecbeckmann marked 8 inline comments as done.Sep 14 2017, 2:15 PM
ecbeckmann added inline comments.Sep 14 2017, 2:26 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
688 ↗(On Diff #115274)

Update this FIXME to say that we do provide checksum file info, and what VS uses this for.

rnk added inline comments.Sep 14 2017, 4:12 PM
llvm/lib/MC/MCCodeView.cpp
169 ↗(On Diff #115274)

I don't think you want to do this, this leaks the internal MCSymbol (file1, etc) to the assembler. Nothing in the assembly defines file1, so the assembly isn't very readable.

llvm/test/DebugInfo/COFF/inlining.ll
52 ↗(On Diff #115274)

What I had in mind was that we'd add a directive like .cv_filechecksum_offset <fileid> and MCCodeView.cpp would internally implement it without leaking any labels to the caller. We could use a label difference of two symbols to implement the offset computation, but any symbols shouldn't leak out to the assembler.

probinson added inline comments.
llvm/include/llvm/IR/DebugInfoMetadata.h
476–481 ↗(On Diff #114116)

If you need the enums to have specific values, e.g. because those values end up in the object file (hard to tell here but I think that's what you are saying), you should set the enum values explicitly. This documents the exact values you need (even if they happen to be sequential at the moment). Making it an enum class doesn't get you that.

llvm/lib/IR/DebugInfoMetadata.cpp
357 ↗(On Diff #115274)

If you require exact correspondence between an enum and some array of strings, we usually do that with a .def file and macros. See for example Dwarf.def. As it is, the association between this array and the ChecksumKind values is implied rather than explicit. Safer to be explicit.

rnk added inline comments.Sep 14 2017, 5:27 PM
llvm/lib/MC/MCCodeView.cpp
68 ↗(On Diff #115274)

Use MCContext::createTempSymbol to make sure these don't show up in the .obj symbol table.

ecbeckmann marked 5 inline comments as done.

Don't leak symbol into assembler, remove enum class, use createTempSymbol.

ecbeckmann added inline comments.Sep 14 2017, 7:36 PM
llvm/include/llvm/IR/DebugInfoMetadata.h
476–481 ↗(On Diff #114116)

Remove enum class to explicitly define values.

llvm/lib/IR/DebugInfoMetadata.cpp
357 ↗(On Diff #115274)

Okay this should probably be addressed in another patch as this patch did not introduce the array.

llvm/lib/MC/MCCodeView.cpp
169 ↗(On Diff #115274)

Instead of emitting the symbol, we will emit the .cv_filechecksumoffset <fileno> directive.

Some formatting fixes.

Removed some last unnecessary changes.

A few minor commentary points; nothing huge. I've been looking at this mainly with the thought of having to do something similar for DWARF 5 eventually, but it seems likely there won't be a lot to leverage as CodeView and DWARF appear to associate the hashes with files pretty differently. Oh well.

llvm/include/llvm/IR/DebugInfoMetadata.h
476–481 ↗(On Diff #114116)

And maybe a comment that these end up in the object file, so some well-meaning person doesn't come a long later and remove the "unnecessary" explicit values.

llvm/include/llvm/MC/MCStreamer.h
732 ↗(On Diff #115352)

As long as you're changing the comment anyway, please remove the \brief.

778 ↗(On Diff #115352)

Remove \brief.

ecbeckmann added inline comments.Sep 15 2017, 10:56 AM
llvm/test/DebugInfo/COFF/inlining.ll
52 ↗(On Diff #115274)

implemented as such

rnk accepted this revision.Sep 15 2017, 11:08 AM

Thanks, looks good!

llvm/include/llvm/MC/MCCodeView.h
302–303 ↗(On Diff #115352)

Generally, if the comment describing the variable doesn't fit on one line, consider moving it before the variable. Then we won't end up with this awkard split after the *.

This revision is now accepted and ready to land.Sep 15 2017, 11:08 AM
ecbeckmann marked 3 inline comments as done.

Changed some comments.

ecbeckmann marked an inline comment as done.

Updated comment formatting.

This revision was automatically updated to reflect the committed changes.