This is an archive of the discontinued LLVM Phabricator instance.

[llvm-debuginfo-analyzer] 09 - CodeView Reader
ClosedPublic

Authored by CarlosAlbertoEnciso on May 17 2022, 6:50 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
probinson added inline comments.Sep 22 2022, 1:42 PM
llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
185

No need for local variable TheFilename .

265

No need to construct TheFilename explicitly.

llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
863

I think recorded makes it sound like this function puts them into CUInlineeLines.

867

Isn't a lambda usually defined with auto?

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
58

I don't find any uses of ABSOLUTE_OFFSET which means it and the (bad!) unscoped global (bad!) PointerToRawData can be eliminated.

60

It looks like this is almost exactly getSymbolKindName from SymbolDumper.cpp. I don't see an easy way to make use of that function, though.

Return type should be a StringRef, no need to make a copy of the string literal.

69

Might need an llvm_unreachable here to avoid warnings about falling off the end of the function without a return value.

133

RelocSym might be null (checked above).

393

?

401

No need to construct TheFilename explicitly.

413

SmallString has a direct method for getting a C string, much more efficient than going through std::string (no copying).

423

If we have to check expectedInfo anyway, let's not assume getPDBInfoStream succeeds.

428
468

SmallString has a more efficient direct conversion to C string.

475
515
666

PointerToRawData is not used anywhere.

698

Might as well turn this off unconditionally.

778
930

In this context I think ExePath is the class member? which is already a std::string.

945

I think ExePath is already a std::string.

1020

"use the and it is marked as such." Could you rephrase?

1038

Please put braces around this, for consistency with the matching if.

1044

Please put braces around this, for consistency with the matching if.

1171
1176
1227

Is this an overload called from generic code? I don't see it used anywhere in this patch.

Comments through the end of llvm/lib.

My goodness this is a large patch. I don't really know anything about CodeView so the reader/visitor modules didn't get as much attention as they probably deserve.
Will try to finish tomorrow.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
227
260

FirstNonNamespace will be set to 0 on the first iteration of the loop, which cannot be a zero-trip loop because we know Components.size() is nonzero. So this initialization is dead.

268

I think this loop could be replaced with something along the lines of

LVStringRefs::iterator Iter = std::find_if(Components.begin(), Components.end(), 
    [](StringRef Name) {
        return IdentifiedNamespaces.find(Name) == IdentifiedNamespaces.end();
    });
LVStringRefs::size_type FirstNonNamespace = std::distance(Components.begin(), Iter);

find_if returns an iterator into Components for the first non-namespace, and then std::distance gets the index.

315

Reader and Visitor are initialized by the constructor, so remove = nullptr here.

404–407
442
483
688

CurrentElement appears to be used in only one method, so it can be made local to that method.
It would be best to find some other home for the other three globals here.

767
1069

Otherwise it reads as "de-franges" to me :)

1096

etc. I'll stop marking each one.

1401

Usually new is assumed to return non-null, why check it just in this one place?

2258
2324
3376

Unless format() understands StringRef directly? I'm not sure.

3405

Isn't this for loop (which does a break on the first iteration) equivalent to

if (!Locations->empty())
  ParentLowPC = Locations->begin()->getLowerAddress();
3408
3497

break not needed here

Update patch due to changes in previous patches.

The lit tests look familiar, I didn't examine them very closely.
Completed this review pass.

llvm/unittests/DebugInfo/LogicalView/CodeViewReaderTest.cpp
38–40
85–86

You can't continue if Ranges is null; should be able to continue if size() is wrong.

125–126
151–152
191–192
217–218
257–258
rnk added a comment.Sep 26 2022, 4:40 PM

This looks really cool, but I cannot claim to have reviewed it. Let me ask around for other reviewers.

Updated patch due to changes in previous patches.

This looks really cool, but I cannot claim to have reviewed it. Let me ask around for other reviewers.

@rnk Thanks.

Updated patch due to changes in previous patches.

zequanwu added inline comments.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
3394

It looks like it doesn't handle following case correctly. After parse line 5, it will create line starting at 0x20 with line offset being 3, which should be 2. It's from https://reviews.llvm.org/D123151#3431153. You may want to take a look on https://github.com/llvm/llvm-project/blob/main/llvm/lib/DebugInfo/PDB/Native/NativeInlineSiteSymbol.cpp#L101.

0602      line 1 (+1)
0315      code 0x15 (+0x15)
0B2B      code 0x20 (+0xB) line 2 (+1)
0602      line 3 (+1)
0311      code 0x31 (+0x11)
0B29      code 0x3A (+0x9) line 4 (+1)
0409      code end 0x43 (+0x9)
zequanwu added inline comments.Oct 7 2022, 5:38 PM
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1194

Why is Range set for S_DEFRANGE_FRAMEPOINTER_REL and S_DEFRANGE_REGISTER_REL but not set for S_DEFRANGE_REGISTER, S_DEFRANGE_SUBFIELD_REGISTER?

1623

Clang generated global ctor and dtor names containing the substrings: dynamic initializer for and dynamic atexit destructor for: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGDebugInfo.cpp#L2254-L2259

1901

Might worth checking Size != 0, I have seen that element size being 0, might be missing in pdb.

3394

NVM, ignore the comment. It's correct.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
64

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVReader.cpp
186

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVScope.cpp
1791

Changed.

llvm/lib/DebugInfo/LogicalView/Core/LVType.cpp
329

Changed.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
145

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
863

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
393

Changed.

428

Changed.

778

Changed.

1171

Changed.

1176

Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
404–407

Changed.

1069

Changed.

1096

Changed this one and other ocurrences.

2258

Changed.

2324

Changed.

3408

Changed.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
117

Removed the explicit initialization.

135

Removed the explicit initialization.

141

Removed the explicit initialization.

150

Removed the explicit initialization.

241

Removed the explicit initialization.

250

Removed the explicit initialization.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
315

Removed the explicit initialization.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
185

Nice suggestion. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
265

Nice suggestion. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
401

Nice suggestion. Removed TheFilename.

413

Nice suggestion. Removed TheFilename.

468

Nice suggestion. Removed the TheFilename.

475

Nice suggestion. Removed TheFilename.

930

Yes you are correct. Removed TheFilename.

945

Yes you are correct. Removed TheFilename.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
442

Nice suggestion. Removed TheComponent.

483

Nice suggestion. Removed TheScopeName.

3376

From my understanding format needs C-type string.
Nice suggestion. Removed TheName.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
156

Good point.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.h
127

The globals

LVElement *CurrentElement;
LVScope *CurrentScope;
LVSymbol *CurrentSymbol;
LVType *CurrentType;

are used by the SymbolVisitor and LogicalVisitor. We can have several SymbolVisitor instances but only one LogicalVisitor instance which is created by the CodeViewReader.

Moved those globals to the LogicalVisitor as public members.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
125

You are correct. Removed the explicit call.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
58

Removed ABSOLUTE_OFFSET, PointerToRawData and the comments. The idea behind them was to calculate an unique offset for the logical symbols and types (similar to the DWARF DIE offset). This limitation is already recorded for future work.

666

Removed.

1038

Added braces.

1044

Added braces.

1227

It is an overload method in the LVReader

class LVReader {
  ...
  virtual std::string getRegisterName(LVSmall Opcode, uint64_t Operands[2]) {
    ...
  }
  ...
};

It is called in

LVOperation::getOperandsDWARFInfo() {
  ...
  getReader().getRegisterName(...);
  ...
}
std::string LVOperation::getOperandsCodeViewInfo() {
  ...
  getReader().getRegisterName(...);
  ...
}

They are called when print the debug locations.

void LVLocationSymbol::printExtra(raw_ostream &OS, bool Full) const {
  ...
             << (CodeViewLocation ? Operation->getOperandsCodeViewInfo()
                                  : Operation->getOperandsDWARFInfo());
  ...
}
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
688

The globals

LVElement *CurrentElement;
LVScope *CurrentScope;
LVSymbol *CurrentSymbol;
LVType *CurrentType;

are used by the SymbolVisitor and LogicalVisitor. We can have several SymbolVisitor instances but only one LogicalVisitor instance which is created by the CodeViewReader.

Moved those globals to the LogicalVisitor as public members.

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
38

I will go with you on this one: aesthetics

llvm/include/llvm/DebugInfo/LogicalView/Readers/LVCodeViewReader.h
178

Added TODO: and put it in the list for future work.

llvm/lib/DebugInfo/LogicalView/Core/LVElement.cpp
245

This method is called for all logical elements regardless of the debug info format. It generates a name for unnamed elements.
Removed the comment as it makes no sense.

llvm/lib/DebugInfo/LogicalView/Core/LVReader.cpp
187

You are correct. Already added to list for future work. Added TODO:.

// TODO: The current CodeView Reader implementation does not have support
// for multiple compile units. Until we have a proper offset calculation,
// check only in the current compile unit.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
515

Good point. Added your suggested comment.

1020

Rephrased as:

// The CodeView compile unit containing the global symbols does not
// have a name; generate one using its parent name (object filename)
// follow by the '_global' string.
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
767

Despite that CurrentElement it is used only in this method, keep it in the LogicalVisitor.

llvm/unittests/DebugInfo/LogicalView/CodeViewReaderTest.cpp
38–40

ASSERT_NE can be used only inside functions that don't return a value.

See https://google.github.io/googletest/advanced.html#assertion-placement

The one constraint is that assertions that generate a fatal failure (FAIL* and ASSERT_*) can only be used in void-returning functions.

85–86

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

125–126

Changed to

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x10u);
151–152

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

191–192

Changed to:

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x0eu);
217–218

Changed to

ASSERT_NE(Ranges, nullptr);
ASSERT_EQ(Ranges->size(), 1u);

If Ranges is empty,

LVLocations::const_iterator IterLocation = Ranges->begin();
LVLocation *Location = (*IterLocation);

It will crash trying to access Location (Invalid iterator).

257–258

Changed to:

ASSERT_NE(Lines, nullptr);
EXPECT_EQ(Lines->size(), 0x0eu);
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
423

Good suggestion. Changed.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
227

Changed.

3405

Good finding. Replaced with

if (const LVLocations *Locations = Parent->getRanges()) {
  if (!Locations->empty())
    ParentLowPC = (*Locations->begin())->getLowerAddress();
}
3497

Removed.

llvm/lib/DebugInfo/LogicalView/Core/LVScope.cpp
2047

TraverseScope is recursive.

std::function<void(LVScope * Parent)> TraverseScope = [&](LVScope *Parent) {
  ...
  if (const LVScopes *Scopes = Parent->getScopes())
    for (LVScope *Scope : *Scopes) {
      Scope->setInnerComponent();
      TraverseScope(Scope);
    }
  ...
};

It traverses the parent scopes.

llvm/lib/DebugInfo/LogicalView/LVReaderHandler.cpp
47

You are correct. Changed to cast.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
60

You are correct. I have add it to the list of future work.
Easy allow to access to getSymbolKindName and formatRegisterId.
Changed the return type to StringRef.

69

Added

llvm_unreachable("Unknown SymbolKind::Kind");
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
133

Good finding.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
260

Good finding. Removed the initialization.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1194

Good point. I missed the LocalVariableAddrRange information.

Added for both S_DEFRANGE_REGISTER, S_DEFRANGE_SUBFIELD_REGISTER range information.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
3394

Thanks.

llvm/lib/DebugInfo/LogicalView/Core/LVSupport.cpp
118

That is a good suggestion. Eliminated InTemplate and CharsSeen.

warning: variable 'CharsSeen' set but not used [-Wunused-but-set-variable]

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
1401

That is leftovers from another change. Removed the non-null check.

1623

Thanks for the information. Adding an extra check for dynamic atexit destructor for.

// We don't have a way to see if the symbol is compiler generated. Use
// the linkage name, to detect `scalar deleting destructor' functions.
std::string DemangledSymbol = demangle(std::string(LinkageName));
if (DemangledSymbol.find("scalar deleting dtor") != std::string::npos) {
  Function->setIsArtificial();
} else {
  // Clang generates global ctor and dtor names containing the substrings:
  // 'dynamic initializer for' and 'dynamic atexit destructor for'.
  if (DemangledSymbol.find("dynamic atexit destructor for") !=
      std::string::npos)
    Function->setIsArtificial();
}
1901

Size is already checked in the if condition.

Updated patch addressing reviews from @probinson, @zequanwu.

Still there are couple of questions that need to be answered, that do not require changes in code.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
268

Nice suggestion. I tried the change but it seems that FirstNonNamespace is always +1 in relation to the value calculated by the for loop.

These version passes all internal/external tests:

LVStringRefs::iterator Iter =
    std::find_if(Components.begin(), Components.end(), [&](StringRef Name) {
      return IdentifiedNamespaces.find(Name) == IdentifiedNamespaces.end();
    });
LVStringRefs::size_type FirstNonNamespace =
    std::distance(Components.begin(), Iter) - 1;
llvm/lib/DebugInfo/LogicalView/Readers/LVBinaryReader.cpp
867

As the lambda is used recursively, the use of auto will raise:
error: use of ‘FindInlinedScopes’ before deduction of ‘auto’

auto FindInlinedScopes =
    [&](LVScope *Parent) {
          ...
          FindInlinedScopes(Scope);  <-- Error
    };

Basically the name FindInlinedScopes is not accessible within the lambda.
By using std::function<void(LVScope * Parent)> FindInlinedScopes beforehand, we are able to reference it inside the lambda.

llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewReader.cpp
698

Changed:

if (SubType & SubsectionIgnoreFlag)
  SubType &= ~SubsectionIgnoreFlag;

to

// Process the subsection as normal even if the ignore bit is set.
SubType &= ~SubsectionIgnoreFlag;
llvm/lib/DebugInfo/LogicalView/Readers/LVCodeViewVisitor.cpp
222

This is a very good observation.

Few points:

Changing to using LookupSet = std::unordered_set<StringRef>;

compilation fails with:
'std::_Uhash_compare<_Kty,_Hasher,_Keyeq>::_Uhash_compare(const std::_Uhash_compare<_Kty,_Hasher,_Keyeq> &)': attempting to reference a deleted function

Changing to using LookupSet = std::unordered_set<std::string>;

Requires changes in API signatures for functions used in `LVNamespaceDeduction` that are expecting `StringRef` returned by `getInnerComponent` and `getAllLexicalIndexes`.

I am adding this optimization to the list of items to be addressed in later patches.

Updated patch to address reviews from @probinson, @psamolysov.

CarlosAlbertoEnciso added a subscriber: thakis.

Updated patch that includes:

  • @thakis feedback: alphabetical order in llvm/unittests/DebugInfo/LogicalView/CMakeLists.txt
  • Due to sanitizer issues (memory leak) in 08-elf-reader, fixed a similar case: The TypeIndex returned by 'getUDT()' must point to an already created logical element.
CarlosAlbertoEnciso edited the summary of this revision. (Show Details)Nov 14 2022, 4:03 AM

To all reviewers,

A new patch has been uploaded that changes the memory management model to use smart pointers.

Patch sequence:

Thanks

To all reviewers,

Any additional questions on this current patch: 09-codeview-reader?

Thanks

CarlosAlbertoEnciso edited the summary of this revision. (Show Details)

This patch uses the new SpecificBumpPtrAllocator functionality to manage the creation of logical elements.

Updated patch to reflect changes from 08a-memory-management.

To all reviewers,

Any additional questions on this current patch: 09-codeview-reader?

Thanks

This revision was landed with ongoing or failed builds.Feb 27 2023, 1:25 AM
This revision was automatically updated to reflect the committed changes.