This is an archive of the discontinued LLVM Phabricator instance.

[llvm-debuginfo-analyzer] 09 - CodeView Reader
ClosedPublic

Authored by CarlosAlbertoEnciso on May 17 2022, 6:50 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
probinson added inline comments.Sep 22 2022, 1:42 PM
llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
192

No need for local variable TheFilename .

272

No need to construct TheFilename explicitly.

llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
839

I think recorded makes it sound like this function puts them into CUInlineeLines.

843

Isn't a lambda usually defined with auto?

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
57

I don't find any uses of ABSOLUTE_OFFSET which means it and the (bad!) unscoped global (bad!) PointerToRawData can be eliminated.

59

It looks like this is almost exactly getSymbolKindName from SymbolDumper.cpp. I don't see an easy way to make use of that function, though.

Return type should be a StringRef, no need to make a copy of the string literal.

68

Might need an llvm_unreachable here to avoid warnings about falling off the end of the function without a return value.

132

RelocSym might be null (checked above).

392

?

400

No need to construct TheFilename explicitly.

412

SmallString has a direct method for getting a C string, much more efficient than going through std::string (no copying).

422

If we have to check expectedInfo anyway, let's not assume getPDBInfoStream succeeds.

427
467

SmallString has a more efficient direct conversion to C string.

474
514
665

PointerToRawData is not used anywhere.

697

Might as well turn this off unconditionally.

777
929

In this context I think ExePath is the class member? which is already a std::string.

944

I think ExePath is already a std::string.

1019

"use the and it is marked as such." Could you rephrase?

1037

Please put braces around this, for consistency with the matching if.

1043

Please put braces around this, for consistency with the matching if.

1170
1175
1226

Is this an overload called from generic code? I don't see it used anywhere in this patch.

Comments through the end of llvm/lib.

My goodness this is a large patch. I don't really know anything about CodeView so the reader/visitor modules didn't get as much attention as they probably deserve.
Will try to finish tomorrow.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
226
259

FirstNonNamespace will be set to 0 on the first iteration of the loop, which cannot be a zero-trip loop because we know Components.size() is nonzero. So this initialization is dead.

267

I think this loop could be replaced with something along the lines of

LVStringRefs::iterator Iter = std::find_if(Components.begin(), Components.end(), 
    [](StringRef Name) {
        return IdentifiedNamespaces.find(Name) == IdentifiedNamespaces.end();
    });
LVStringRefs::size_type FirstNonNamespace = std::distance(Components.begin(), Iter);

find_if returns an iterator into Components for the first non-namespace, and then std::distance gets the index.

314

Reader and Visitor are initialized by the constructor, so remove = nullptr here.

403–406
441
482
687

CurrentElement appears to be used in only one method, so it can be made local to that method.
It would be best to find some other home for the other three globals here.

766
1068

Otherwise it reads as "de-franges" to me :)

1095

etc. I'll stop marking each one.

1400

Usually new is assumed to return non-null, why check it just in this one place?

2257
2323
3375

Unless format() understands StringRef directly? I'm not sure.

3404

Isn't this for loop (which does a break on the first iteration) equivalent to

if (!Locations->empty())
  ParentLowPC = Locations->begin()->getLowerAddress();
3407
3496

break not needed here

Update patch due to changes in previous patches.

The lit tests look familiar, I didn't examine them very closely.
Completed this review pass.

llvm/unittests/DebugInfo/LogicalView/CodeViewReaderTest.cpp
37–39
84–85

You can't continue if Ranges is null; should be able to continue if size() is wrong.

124–125
150–151
190–191
216–217
256–257
rnk added a comment.Sep 26 2022, 4:40 PM

This looks really cool, but I cannot claim to have reviewed it. Let me ask around for other reviewers.

Updated patch due to changes in previous patches.

This looks really cool, but I cannot claim to have reviewed it. Let me ask around for other reviewers.

@rnk Thanks.

Updated patch due to changes in previous patches.

zequanwu added inline comments.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
3394

It looks like it doesn't handle following case correctly. After parse line 5, it will create line starting at 0x20 with line offset being 3, which should be 2. It's from https://reviews.llvm.org/D123151#3431153. You may want to take a look on https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/PDB/Native/NativeInlineSiteSymbol.cpp#L101.

0602      line 1 (+1)
0315      code 0x15 (+0x15)
0B2B      code 0x20 (+0xB) line 2 (+1)
0602      line 3 (+1)
0311      code 0x31 (+0x11)
0B29      code 0x3A (+0x9) line 4 (+1)
0409      code end 0x43 (+0x9)
zequanwu added inline comments.Oct 7 2022, 5:38 PM
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1194

Why is Range set for S_DEFRANGE_FRAMEPOINTER_REL and S_DEFRANGE_REGISTER_REL but not set for S_DEFRANGE_REGISTER, S_DEFRANGE_SUBFIELD_REGISTER?

1623

Clang generated global ctor and dtor names containing the substrings: dynamic initializer for and dynamic atexit destructor for: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGDebugInfo.cpp#L2254-L2259

1901

Might worth checking Size != 0, I have seen that element size being 0, might be missing in pdb.

3394

NVM, ignore the comment. It's correct.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
63

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVReader.cpp
183

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVScope.cpp
1795

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVType.cpp
333

Changed.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
152

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
839

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
392

Changed.

427

Changed.

777

Changed.

1170

Changed.

1175

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
403–406

Changed.

1068

Changed.

1095

Changed this one and other ocurrences.

2257

Changed.

2323

Changed.

3407

Changed.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
116

Removed the explicit initialization.

134

Removed the explicit initialization.

140

Removed the explicit initialization.

149

Removed the explicit initialization.

240

Removed the explicit initialization.

249

Removed the explicit initialization.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
314

Removed the explicit initialization.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
192

Nice suggestion. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
272

Nice suggestion. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
400

Nice suggestion. Removed TheFilename.

412

Nice suggestion. Removed TheFilename.

467

Nice suggestion. Removed the TheFilename.

474

Nice suggestion. Removed TheFilename.

929

Yes you are correct. Removed TheFilename.

944

Yes you are correct. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
441

Nice suggestion. Removed TheComponent.

482

Nice suggestion. Removed TheScopeName.

3375

From my understanding format needs C-type string.
Nice suggestion. Removed TheName.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
155

Good point.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
126

The globals

LVElement *CurrentElement;
LVScope *CurrentScope;
LVSymbol *CurrentSymbol;
LVType *CurrentType;

are used by the SymbolVisitor and LogicalVisitor. We can have several SymbolVisitor instances but only one LogicalVisitor instance which is created by the CodeViewReader.

Moved those globals to the LogicalVisitor as public members.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
132

You are correct. Removed the explicit call.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
57

Removed ABSOLUTE_OFFSET, PointerToRawData and the comments. The idea behind them was to calculate an unique offset for the logical symbols and types (similar to the DWARF DIE offset). This limitation is already recorded for future work.

665

Removed.

1037

Added braces.

1043

Added braces.

1226

It is an overload method in the LVReader

class LVReader {
  ...
  virtual std::string getRegisterName(LVSmall Opcode, uint64_t Operands[2]) {
    ...
  }
  ...
};

It is called in

LVOperation::getOperandsDWARFInfo() {
  ...
  getReader().getRegisterName(...);
  ...
}
std::string LVOperation::getOperandsCodeViewInfo() {
  ...
  getReader().getRegisterName(...);
  ...
}

They are called when print the debug locations.

void LVLocationSymbol::printExtra(raw_ostream &OS, bool Full) const {
  ...
             << (CodeViewLocation ? Operation->getOperandsCodeViewInfo()
                                  : Operation->getOperandsDWARFInfo());
  ...
}
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
687

The globals

LVElement *CurrentElement;
LVScope *CurrentScope;
LVSymbol *CurrentSymbol;
LVType *CurrentType;

are used by the SymbolVisitor and LogicalVisitor. We can have several SymbolVisitor instances but only one LogicalVisitor instance which is created by the CodeViewReader.

Moved those globals to the LogicalVisitor as public members.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
37

I will go with you on this one: aesthetics

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
177

Added TODO: and put it in the list for future work.

llvm/lib/DebugInfo/LogicalView/Core/LVElement.cpp
245

This method is called for all logical elements regardless of the debug info format. It generates a name for unnamed elements.
Removed the comment as it makes no sense.

llvm/lib/DebugInfo/LogicalView/Core/LVReader.cpp
184

You are correct. Already added to list for future work. Added TODO:.

// TODO: The current CodeView Reader implementation does not have support
// for multiple compile units. Until we have a proper offset calculation,
// check only in the current compile unit.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
514

Good point. Added your suggested comment.

1019

Rephrased as:

// The CodeView compile unit containing the global symbols does not
// have a name; generate one using its parent name (object filename)
// follow by the '_global' string.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
766

Despite that CurrentElement it is used only in this method, keep it in the LogicalVisitor.

llvm/unittests/DebugInfo/LogicalView/CodeViewReaderTest.cpp
37–39

ASSERT_NE can be used only inside functions that don't return a value.

See https://google.github.io/googletest/advanced.html#assertion-placement

The one constraint is that assertions that generate a fatal failure (FAIL* and ASSERT_*) can only be used in void-returning functions.

84–85

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

124–125

Changed to

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x10u);
150–151

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

190–191

Changed to:

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x0eu);
216–217

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

256–257

Changed to:

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x0eu);
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
422

Good suggestion. Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
226

Changed.

3404

Good finding. Replaced with

if (const LVLocations *Locations = Parent->getRanges()) {
  if (!Locations->empty())
    ParentLowPC = (*Locations->begin())->getLowerAddress();
}
3496

Removed.

llvm/lib/DebugInfo/LogicalView/Core/LVScope.cpp
2051

TraverseScope is recursive.

std::function<void(LVScope * Parent)> TraverseScope = [&](LVScope *Parent) {
  ...
  if (const LVScopes *Scopes = Parent->getScopes())
    for (LVScope *Scope : *Scopes) {
      Scope->setInnerComponent();
      TraverseScope(Scope);
    }
  ...
};

It traverses the parent scopes.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
53

You are correct. Changed to cast.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
59

You are correct. I have add it to the list of future work.
Easy allow to access to getSymbolKindName and formatRegisterId.
Changed the return type to StringRef.

68

Added

llvm_unreachable("Unknown SymbolKind::Kind");
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
132

Good finding.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
259

Good finding. Removed the initialization.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1194

Good point. I missed the LocalVariableAddrRange information.

Added for both S_DEFRANGE_REGISTER, S_DEFRANGE_SUBFIELD_REGISTER range information.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
3394

Thanks.

llvm/lib/DebugInfo/LogicalView/Core/LVSupport.cpp
112

That is a good suggestion. Eliminated InTemplate and CharsSeen.

warning: variable 'CharsSeen' set but not used [-Wunused-but-set-variable]

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1400

That is leftovers from another change. Removed the non-null check.

1623

Thanks for the information. Adding an extra check for dynamic atexit destructor for.

// We don't have a way to see if the symbol is compiler generated. Use
// the linkage name, to detect `scalar deleting destructor' functions.
std::string DemangledSymbol = demangle(std::string(LinkageName));
if (DemangledSymbol.find("scalar deleting dtor") != std::string::npos) {
  Function->setIsArtificial();
} else {
  // Clang generates global ctor and dtor names containing the substrings:
  // 'dynamic initializer for' and 'dynamic atexit destructor for'.
  if (DemangledSymbol.find("dynamic atexit destructor for") !=
      std::string::npos)
    Function->setIsArtificial();
}
1901

Size is already checked in the if condition.

Updated patch addressing reviews from @probinson, @zequanwu.

Still there are couple of questions that need to be answered, that do not require changes in code.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
267

Nice suggestion. I tried the change but it seems that FirstNonNamespace is always +1 in relation to the value calculated by the for loop.

These version passes all internal/external tests:

LVStringRefs::iterator Iter =
    std::find_if(Components.begin(), Components.end(), [&](StringRef Name) {
      return IdentifiedNamespaces.find(Name) == IdentifiedNamespaces.end();
    });
LVStringRefs::size_type FirstNonNamespace =
    std::distance(Components.begin(), Iter) - 1;
llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
843

As the lambda is used recursively, the use of auto will raise:
error: use of ‘FindInlinedScopes’ before deduction of ‘auto’

auto FindInlinedScopes =
    [&](LVScope *Parent) {
          ...
          FindInlinedScopes(Scope);  <-- Error
    };

Basically the name FindInlinedScopes is not accessible within the lambda.
By using std::function<void(LVScope * Parent)> FindInlinedScopes beforehand, we are able to reference it inside the lambda.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
697

Changed:

if (SubType & SubsectionIgnoreFlag)
  SubType &= ~SubsectionIgnoreFlag;

to

// Process the subsection as normal even if the ignore bit is set.
SubType &= ~SubsectionIgnoreFlag;
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
222

This is a very good observation.

Few points:

Changing to using LookupSet = std::unordered_set<StringRef>;

compilation fails with:
'std::_Uhash_compare<_Kty,_Hasher,_Keyeq>::_Uhash_compare(const std::_Uhash_compare<_Kty,_Hasher,_Keyeq> &)': attempting to reference a deleted function

Changing to using LookupSet = std::unordered_set<std::string>;

Requires changes in API signatures for functions used in `LVNamespaceDeduction` that are expecting `StringRef` returned by `getInnerComponent` and `getAllLexicalIndexes`.

I am adding this optimization to the list of items to be addressed in later patches.

Updated patch to address reviews from @probinson, @psamolysov.

CarlosAlbertoEnciso added a subscriber: thakis.

Updated patch that includes:

  • @thakis feedback: alphabetical order in llvm/unittests/DebugInfo/LogicalView/CMakeLists.txt
  • Due to sanitizer issues (memory leak) in 08-elf-reader, fixed a similar case: The TypeIndex returned by 'getUDT()' must point to an already created logical element.
CarlosAlbertoEnciso edited the summary of this revision. (Show Details)Nov 14 2022, 4:03 AM

To all reviewers,

A new patch has been uploaded that changes the memory management model to use smart pointers.

Patch sequence:

Thanks

To all reviewers,

Any additional questions on this current patch: 09-codeview-reader?

Thanks

CarlosAlbertoEnciso edited the summary of this revision. (Show Details)

This patch uses the new SpecificBumpPtrAllocator functionality to manage the creation of logical elements.

Updated patch to reflect changes from 08a-memory-management.

To all reviewers,

Any additional questions on this current patch: 09-codeview-reader?

Thanks

This revision was landed with ongoing or failed builds.Feb 27 2023, 1:25 AM
This revision was automatically updated to reflect the committed changes.