This is an archive of the discontinued LLVM Phabricator instance.

[Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef.
ClosedPublic

Authored by avl on Jun 2 2022, 7:54 AM.

Details

Summary

This review is extracted from D96035.

This patch adds possibility to keep not only DwarfStringPoolEntry, but also
pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry.
String map keeps string data and corresponding DwarfStringPoolEntry
info. Not all string map entries may be included into the result,
and then not all string entries should have DwarfStringPoolEntry
info. Currently StringMap keeps DwarfStringPoolEntry for all entries.
It leads to extra memory usage. This patch allows to keep
DwarfStringPoolEntry info only for entries which really need it.

Diff Detail

Event Timeline

avl created this revision.Jun 2 2022, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 7:54 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
avl requested review of this revision.Jun 2 2022, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 7:54 AM
avl edited the summary of this revision. (Show Details)Jun 2 2022, 7:59 AM

Might be worth splitting up further - removing the indexed support in a separate patch?

Add possibility to keep not only DwarfStringPoolEntry, but also pointer to it.

Looks like this functionality is added, but untested - it'd be good to include some test coverage for it, somehow? Maybe worth adding a unit test?

(general feedback: avoid else-after-return, please check over the change for a few insntances of that)

avl added a comment.Jun 2 2022, 1:47 PM

Might be worth splitting up further - removing the indexed support in a separate patch?

Ok.

Add possibility to keep not only DwarfStringPoolEntry, but also pointer to it.

Looks like this functionality is added, but untested - it'd be good to include some test coverage for it, somehow? Maybe worth adding a unit test?

I thought it(test) will come with D96035. Will add it for this review as requested.

(general feedback: avoid else-after-return, please check over the change for a few insntances of that)

avl updated this revision to Diff 434454.Jun 6 2022, 6:23 AM

rebased. added unit test.

avl retitled this revision from [Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntry. to [Debuginfo][DWARF][NFC] Refactor DwarfStringPoolEntryRef..Jun 6 2022, 6:24 AM
avl edited the summary of this revision. (Show Details)
avl added a subscriber: debug-info.

The code looks mechanically OK, but I'm not quite following the patch description/motivation. Perhaps it could be rephrased?

This patch adds possibility to keep not only DwarfStringPoolEntry, but also pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry. String map keeps string data and corresponding DwarfStringPoolEntry info.

I /think/ I'm following this part, at least. Might be worth pointing to specific members/classes - which string map in particular are you referring to here?

Not all string map entries may be included into the result, and then not all string entries should have DwarfStringPoolEntry info.

OK, I sort of follow - this is for strings in input .o files that, due to some DWARF dead-stripping, are no longer used in the output? If that's the case, then I'd expect that the values in the map should become unique_ptrs, perhaps? (or raw pointers, if they can be allocated by a pool allocator of some kind). But I'm not sure why that would change how the *Ref class works - oh, because some users of this class don't want/need to change their map? (can you point to the different maps/use cases?)

Then maybe this class should become templated rather than using a union?

llvm/include/llvm/CodeGen/DwarfStringPoolEntry.h
23–26

What's the motivation for this change?

90–91

Remove else-after-return

98–99
avl added a comment.Jun 7 2022, 4:54 AM

The code looks mechanically OK, but I'm not quite following the patch description/motivation. Perhaps it could be rephrased?

This patch adds possibility to keep not only DwarfStringPoolEntry, but also pointer to it. The DwarfStringPoolEntryRef keeps reference to the string map entry. String map keeps string data and corresponding DwarfStringPoolEntry info.

I /think/ I'm following this part, at least. Might be worth pointing to specific members/classes - which string map in particular are you referring to here?

Not all string map entries may be included into the result, and then not all string entries should have DwarfStringPoolEntry info.

OK, I sort of follow - this is for strings in input .o files that, due to some DWARF dead-stripping, are no longer used in the output? If that's the case, then I'd expect that the values in the map should become unique_ptrs, perhaps? (or raw pointers, if they can be allocated by a pool allocator of some kind).

rephrased description:

This patch adds possibility to DwarfStringPoolEntryRef to keep not only DwarfStringPoolEntry, but also pointer to it.
The DwarfStringPoolEntryRef keeps reference to the string map entry.
String map keeps string data and corresponding DwarfStringPoolEntry info:

NonRelocatableStringpool.h:

class NonRelocatableStringpool {
  using MapTy = StringMap<DwarfStringPoolEntry, BumpPtrAllocator>;  
}

The DwarfStringPoolEntryRef can reference string map entries of NonRelocatableStringpool.
Not all string map entries may be included into the result, and then not all string entries
should have DwarfStringPoolEntry info. Examples:

  1. NonRelocatableStringpool.h:
class NonRelocatableStringpool {
  DwarfStringPoolEntryRef getEntry(StringRef S);  <<< add string for emission(use DwarfStringPoolEntry)
  StringRef internString(StringRef S); <<< add string not for emission(i.e. may not use DwarfStringPoolEntry)
}
  1. DWARFLinkerDeclContext.h:
/// String pool keeping real path bodies.
NonRelocatableStringpool StringPool;  <<< does not use DwarfStringPoolEntry
  1. D96035:

It uses single string pool for all strings. Not all strings would be put into the final result(
some strings are internal names, some strings would be translated and then should not be put into the result
, some strings relate to garbage collected data and then should not be put into the final result).

But I'm not sure why that would change how the *Ref class works - oh, because some users of this class don't want/need to change their map? (can you point to the different maps/use cases?)

Right. There are users of DwarfStringPoolEntryRef and this patch tries to pass newly organized data to them.

In the D96035 I do not use NonRelocatableStringpool since it could not be used in multi-thread mode.
Instead, I use hash map from this review https://reviews.llvm.org/D125979:

llvm/include/llvm/DWARFLinkerNext/StringPool.h:

ConcurrentHashTable<StringRef, StringMapEntry<DwarfStringPoolEntry *>, BumpPtrAllocator>

Some of the strings from ConcurrentHashTable should be passed to AccelTable:

template <typename DataT> class AccelTable : public AccelTableBase {
public:
  template <typename... Types>
  void addName(DwarfStringPoolEntryRef Name, Types &&... Args);  <<<<<<<
};

So, DwarfStringPoolEntryRef should be able to point to the StringMapEntry<DwarfStringPoolEntry *>.

Then maybe this class should become templated rather than using a union?

If we make DwarfStringPoolEntryRef to be templated then we need to add more template parameters to the
AccelTable class(and all other users of DwarfStringPoolEntryRef):

template <typename DataT, **typename EntryT**> class AccelTable : public AccelTableBase {
public:
  template <typename... Types>
  void addName(DwarfStringPoolEntryRef**<EntryT>** Name, Types &&... Args);
};

Would it be OK?

avl added inline comments.Jun 7 2022, 4:55 AM
llvm/include/llvm/CodeGen/DwarfStringPoolEntry.h
23–26

To have fields initialized(so that they do not contain garbage).

Thanks for all the context - I think this is nuanced enough I'll have to leave it to some of the Apple folks with more context here. (@aprantl @JDevlieghere @dexonsmith )

Mechanically this looks fine, but I think the documentation could be improved. Could you add something to the comment of DwarfStringPoolEntryRef that explains what the two cases in the PointerUnion and their use-cases are?

llvm/include/llvm/CodeGen/DwarfStringPoolEntry.h
57

the formatting here is weird, it's not a doxygen comment, but uses \p...
But more importantly, it's not clear whether this describes a property of DwarfStringPoolEntryRef, a bug, or something users need to be aware of. Can you make the comment longer?

avl updated this revision to Diff 435149.Jun 8 2022, 7:13 AM

addressed comments: make doc comments to be more detailed.

JDevlieghere added inline comments.Jun 8 2022, 5:46 PM
llvm/include/llvm/CodeGen/DwarfStringPoolEntry.h
52

Instead of a union, would it make sense to template the DwarfStringPoolEntryRef in the type it holds? And have a typedef for an OwningDwarfStringPoolEntryRef which is a DwarfStringPoolEntryRef<DwarfStringPoolEntry> and a NonOwningDwarfStringPoolEntryRef which would be a DwarfStringPoolEntryRef<DwarfStringPoolEntry>?

I missed the original discussion about templating the class. It seems like that issue can be solved with the same solution we have form something like SmallVector where we pass the generic base class SmallVectorImpl around. I think we should be able to do something similar here?

avl added a comment.Jun 9 2022, 12:46 PM

I missed the original discussion about templating the class. It seems like that issue can be solved with the same solution we have form something like SmallVector where we pass the generic base class SmallVectorImpl around. I think we should be able to do something similar here?

I.e. something like this:

template <typename T>
DwarfStringPoolEntryRef (
  MCSymbol *getSymbol() const;
  uint64_t getOffset() const;
  unsigned getIndex() const;
  StringRef getString() const;
  const DwarfStringPoolEntry &getEntry() const;
)


OwningDwarfStringPoolEntryRef : public DwarfStringPoolEntryRef<DwarfStringPoolEntry> (
)

NonOwningDwarfStringPoolEntryRef : public DwarfStringPoolEntryRef<DwarfStringPoolEntry*> (
)


template <typename DataT> class AccelTable : public AccelTableBase {
public:
  template <typename EntryT, typename... Types>
  void addName(DwarfStringPoolEntryRef<EntryT>& Name, Types &&... Args);
};

template<typename T>
class DIEString {
  DwarfStringPoolEntryRef<T>& S;

}

?

avl added a comment.Jun 13 2022, 12:06 AM

@JDevlieghere Jonas, Could you take a look at above example, please? Did I correctly understand your suggestion?

I missed the original discussion about templating the class. It seems like that issue can be solved with the same solution we have form something like SmallVector where we pass the generic base class SmallVectorImpl around. I think we should be able to do something similar here?

I.e. something like this:

template <typename T>
DwarfStringPoolEntryRef (
  MCSymbol *getSymbol() const;
  uint64_t getOffset() const;
  unsigned getIndex() const;
  StringRef getString() const;
  const DwarfStringPoolEntry &getEntry() const;
)


OwningDwarfStringPoolEntryRef : public DwarfStringPoolEntryRef<DwarfStringPoolEntry> (
)

NonOwningDwarfStringPoolEntryRef : public DwarfStringPoolEntryRef<DwarfStringPoolEntry*> (
)


template <typename DataT> class AccelTable : public AccelTableBase {
public:
  template <typename EntryT, typename... Types>
  void addName(DwarfStringPoolEntryRef<EntryT>& Name, Types &&... Args);
};

template<typename T>
class DIEString {
  DwarfStringPoolEntryRef<T>& S;

}

?

Almost, what I had in mind was this:

DwarfStringPoolEntry {
  ...
}

DwarfStringPoolEntryRefImpl {
  // Everything that *doesn't* depend on whether it's backed by a DwarfStringPoolEntry or DwarfStringPoolEntry*. This should be able to cover your entire public interface. 
}

template<typename T>
DwarfStringPoolEntryRef : public DwarfStringPoolEntryRefImpl {
  // Everything that *does* depend on the type of T. 
}

// The (Non)OwningDwarfStringPoolEntryRef are now just typedefs.
typedef OwningDwarfStringPoolEntryRef = DwarfStringPoolEntryRef<DwarfStringPoolEntry*>;
typedef NonOwningDwarfStringPoolEntryRef = DwarfStringPoolEntryRef<DwarfStringPoolEntry*>;

// The accelerator table can remain non-templated by only interacting with the DwarfStringPoolEntryRefImpl:

template class AccelTable : public AccelTableBase {
public:
  template <typename EntryT, typename... Types>
  void addName(DwarfStringPoolEntryRefImpl& Name, Types &&... Args);
};
avl added a comment.Jun 14 2022, 9:59 AM

Almost, what I had in mind was this:

DwarfStringPoolEntry {
  ...
}

DwarfStringPoolEntryRefImpl {
  // Everything that *doesn't* depend on whether it's backed by a DwarfStringPoolEntry or DwarfStringPoolEntry*. This should be able to cover your entire public interface. 
}

template<typename T>
DwarfStringPoolEntryRef : public DwarfStringPoolEntryRefImpl {
  // Everything that *does* depend on the type of T. 
}

// The (Non)OwningDwarfStringPoolEntryRef are now just typedefs.
typedef OwningDwarfStringPoolEntryRef = DwarfStringPoolEntryRef<DwarfStringPoolEntry*>;
typedef NonOwningDwarfStringPoolEntryRef = DwarfStringPoolEntryRef<DwarfStringPoolEntry*>;

// The accelerator table can remain non-templated by only interacting with the DwarfStringPoolEntryRefImpl:

template class AccelTable : public AccelTableBase {
public:
  template <typename EntryT, typename... Types>
  void addName(DwarfStringPoolEntryRefImpl& Name, Types &&... Args);
};

yep, thank you.

It looks like this solution does not cover one original property of DwarfStringPoolEntryRef.
Current DwarfStringPoolEntryRef assumed to be passed by value.
That means that no neccessary to have extra storage to keep it:

CodeGen/AccelTable.h

struct HashData {
  DwarfStringPoolEntryRef Name;
  ...
}

CodeGen/DIE.h

class DIEString {
  DwarfStringPoolEntryRef S;
}

If we will pass it by reference:

CodeGen/AccelTable.h

struct HashData {
  DwarfStringPoolEntryRefImpl& Name;
  ...
}

CodeGen/DIE.h

class DIEString {
  DwarfStringPoolEntryRefImpl& S;
}

Then we need to add some containers and do more copies for OwningDwarfStringPoolEntryRef or NonOwningDwarfStringPoolEntryRef:

template class AccelTable : public AccelTableBase {
public:
  template <typename EntryT, typename... Types>
  void addName(DwarfStringPoolEntryRefImpl& Name, Types &&... Args);
};

The addName should call special copy for the Name(which would copy real object referenced by DwarfStringPoolEntryRefImpl) and store it in an additional container.

class DIEString {
  DwarfStringPoolEntryRef& S;
}

The DIEString should be changed somehow like this:

class DIEString {
  std::unique_ptr<DwarfStringPoolEntryRef> S;
}

Would it be OK?

Ah that's a good point. Yep the DIEString approach looks good.

avl added a comment.Jun 15 2022, 3:52 AM

Ah that's a good point. Yep the DIEString approach looks good.

I would suggest to use solution from this patch instead of suggestion with DwarfStringPoolEntryRefImpl.

The drawbacks of suggestion with DwarfStringPoolEntryRefImpl:

  1. It leads to extra memory usage and fragmentation since many additional objects would be allocated.
  2. It requires to change many places where DwarfStringPoolEntryRef is currently used.
class NonRelocatableStringpool {
  std::unique_ptr<DwarfStringPoolEntryRefImpl> getEntry(StringRef S);
}

class DIEString {
  std::unique_ptr<DwarfStringPoolEntryRefImpl> S;
}

class AccelTableBase {
public:
  /// Represents a group of entries with identical name (and hence, hash value).
  struct HashData {
    std::unique_ptr<DwarfStringPoolEntryRefImpl> Name;
 }
}
...
  1. It introduces new performance penalties, because std::unique_ptr de-referencing and virtual function call would be added while accessing DwarfStringPoolEntryRef methods.

The solution from this patch does not introduce new memory allocations, It does not change usage points of DwarfStringPoolEntryRef, It adds only "MapEntry.is<ByValStringEntryPtr>" check while accessing DwarfStringPoolEntryRef methods.

What do you think?

JDevlieghere accepted this revision.Jun 24 2022, 12:33 PM

Thanks for the ping. Fair enough. This LGTM.

This revision is now accepted and ready to land.Jun 24 2022, 12:33 PM
avl added a comment.Jun 24 2022, 1:07 PM

@JDevlieghere Thank you for the review!

This revision was landed with ongoing or failed builds.Jun 28 2022, 2:24 PM
This revision was automatically updated to reflect the committed changes.
vitalybuka reopened this revision.Jun 29 2022, 5:52 PM
vitalybuka added a subscriber: vitalybuka.

Msan complains about this patch https://lab.llvm.org/buildbot/#/builders/74/builds/11664/steps/17/logs/stdio

[ RUN      ] DwarfStringPoolEntryRefTest.TestShortEntry
Uninitialized bytes in MemcmpInterceptorCommon at offset 20 inside [0x7ffc5d135098, 24)
==2198==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x562f6d41b1fc in __interceptor_bcmp /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:978:33
    #1 0x562f6d41b1fc in bcmp /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:973:1
    #2 0x562f6d55d54b in DwarfStringPoolEntryRefTest_TestShortEntry_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/CodeGen/DwarfStringPoolEntryRefTest.cpp:97:3
    #3 0x562f75fec7e4 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #4 0x562f75fec7e4 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2508:5
    #5 0x562f75ff14c4 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11
    #6 0x562f75ff3070 in testing::TestSuite::Run() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28
    #7 0x562f76025691 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44
    #8 0x562f76023dff in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #9 0x562f76023dff in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10
    #10 0x562f6d7918cf in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2472:46
    #11 0x562f6d7918cf in main /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/CodeGen/TargetOptionsTest.cpp:76:10
    #12 0x7f6d31eb409a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) (BuildId: eb6a5dd378d22b1e695984462a799cd4c81cdc22)
    #13 0x562f6d3c3d69 in _start (/b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm_build_msan_track_origins/unittests/CodeGen/CodeGenTests+0x2f80d69)
  Uninitialized value was created by an allocation of 'DwarfEntry2' in the stack frame of function '_ZN47DwarfStringPoolEntryRefTest_TestShortEntry_Test8TestBodyEv'
    #0 0x562f6d558fc0 in DwarfStringPoolEntryRefTest_TestShortEntry_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/llvm/unittests/CodeGen/DwarfStringPoolEntryRefTest.cpp:57
SUMMARY: MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-bootstrap-msan/build/llvm-project/compiler-rt/lib/msan/../sanitizer_common/sanitizer_common_interceptors.inc:978:33 in __interceptor_bcmp
Exiting
This revision is now accepted and ready to land.Jun 29 2022, 5:52 PM