This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Store optional DIFile::Source as pointer
ClosedPublic

Authored by Hahnfeld on Nov 24 2022, 4:46 AM.

Details

Summary

getCanonicalMDString() also returns a nullptr for empty strings, which tripped over the getSource() method. Solve the ambiguity of None versus an Optional containing a nullptr by simply storing a pointer.

Diff Detail

Event Timeline

Hahnfeld created this revision.Nov 24 2022, 4:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 4:46 AM
Hahnfeld requested review of this revision.Nov 24 2022, 4:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2022, 4:46 AM

Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.

llvm/include/llvm/IR/DebugInfoMetadata.h
655–658

How is this different from the previous code? Optional::operator* would have an assert in it, right? So the previous code would've been asserting in the same places/ways, I think?

Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.

Please see https://reviews.llvm.org/D137152 for context. I think what's happening is that we (the Cling interpreter in ROOT) create an empty source file while injecting some code during startup. It may be possible to fix that, but LLVM shouldn't crash from it.

llvm/include/llvm/IR/DebugInfoMetadata.h
655–658

Yes, there is an assert in OptionalStorage::value, but it only checks that there is a value. nullptr is a valid value inside a Optional<MDString *>, but the expectation of this class is that Source is either None or contains a non-null pointer.

(Side remark: It may be possible to change Source to just a MDString * and make nullptr mean that there is no source. That's equally fine with me, but requires changing the signature of getRawSource. Let me know if you prefer that.)

Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.

Please see https://reviews.llvm.org/D137152 for context. I think what's happening is that we (the Cling interpreter in ROOT) create an empty source file while injecting some code during startup. It may be possible to fix that, but LLVM shouldn't crash from it.

hmm, more context might be nice - like where's the Cling code that's doing something different than the Clang code when it's creating DIFiles?

I think it might be more suitable to say that DIFile shouldn't be given a non-None Optional that refers to an empty string? & it's up to the caller (Cling) to fix that - we could add an assertion to that effect in the getImpl?

Or perhaps there's an intent to support present-but-zero-length source, in which case this change isn't suitable either, I guess? (since this change collapses the empty and None cases together)

If None and empty are meant to be identical, maybe removing the Optional layer entirely and using the empty string to communicate not-present would be suitable? I'm not sure.

Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.

Please see https://reviews.llvm.org/D137152 for context. I think what's happening is that we (the Cling interpreter in ROOT) create an empty source file while injecting some code during startup. It may be possible to fix that, but LLVM shouldn't crash from it.

hmm, more context might be nice - like where's the Cling code that's doing something different than the Clang code when it's creating DIFiles?

I think it's actually Clang code that just passes on the result of getSource: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L426 which is not protected against empty strings: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L376-L388

I think it might be more suitable to say that DIFile shouldn't be given a non-None Optional that refers to an empty string? & it's up to the caller (Cling) to fix that - we could add an assertion to that effect in the getImpl?

Or perhaps there's an intent to support present-but-zero-length source, in which case this change isn't suitable either, I guess? (since this change collapses the empty and None cases together)

If None and empty are meant to be identical, maybe removing the Optional layer entirely and using the empty string to communicate not-present would be suitable? I'm not sure.

Well yes, these are the other three possibilities we have. I'm fine with any and went with the least intrusive, but I can implement any of them if they are more suitable from a DebugInfo point of view (I'm not an expert there).

Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.

Please see https://reviews.llvm.org/D137152 for context. I think what's happening is that we (the Cling interpreter in ROOT) create an empty source file while injecting some code during startup. It may be possible to fix that, but LLVM shouldn't crash from it.

hmm, more context might be nice - like where's the Cling code that's doing something different than the Clang code when it's creating DIFiles?

I think it's actually Clang code that just passes on the result of getSource: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L426 which is not protected against empty strings: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L376-L388

That seems a Cling-novelty though, right? A zero-length file isn't valid in C++ (you have to have a newline at the end) and I guess if you give clang a truly zero-length file it doesn't generate DWARF at least, or doesn't do any code generation at all? So maybe Cling should do something similar to whatever clang does with a zero-length file? (or does clang hit the same problem if you give it a zero-length file?)

That seems a Cling-novelty though, right? A zero-length file isn't valid in C++ (you have to have a newline at the end) and I guess if you give clang a truly zero-length file it doesn't generate DWARF at least, or doesn't do any code generation at all? So maybe Cling should do something similar to whatever clang does with a zero-length file? (or does clang hit the same problem if you give it a zero-length file?)

So yes, it is an oversight (?) in Cling that we can easily work around: https://github.com/root-project/root/pull/11803 However, I would still argue that LLVM should not crash in this case and just "do the right thing"? Which of the approaches we take, I don't really have a strong opinion on.

A zero-length file isn't valid in C++ (you have to have a newline at the end)

LLVM is used for many languages, and they might not all have that requirement. Also, I just tried running clang -c -g on a zero-length file; it didn't issue any diagnostics at all, and produced an object file (although it did not have any debug-info sections).

A zero-length file isn't valid in C++ (you have to have a newline at the end)

LLVM is used for many languages, and they might not all have that requirement. Also, I just tried running clang -c -g on a zero-length file; it didn't issue any diagnostics at all, and produced an object file (although it did not have any debug-info sections).

Sure enough. Does look like the Source parameter was intended to be handled differently than other strings (like Name) - other strings are passed in as StringRef, and empty is the null/not-present state, but Source was specifically wrapped in Optional in/out presumably to express a distinction between empty and not present. @scott.linder - do you recall if/how intentional this was? If it is as it looks, then

Old/currently committed code:
When constructing the DIFile and the Source is:
None -> None
Empty -> non-None null pointer
Non-empty -> non-None non-null pointer
When accessing the source:
None -> None
Empty -> crash
Non-empty -> non-None non-null pointer

The new behavior of the current patch:
Empty -> None
Access:
None -> None

So this patch collapses empty with not present - and if that's what we wanted to do, the way to do that would be to remove the Optional entirely - and store Source the same way we store other strings in MDOperands, and pass it around as a StringRef with empty as none and skip the Optional indirection.

If we're keeping the distinction between empty and not-present, then we need to choose which representation we use - we could use non-None null pointer to represent empty, or we could use non-None-non-null to a zero-length MDString. I guess the former is easier and only requires a fix in the accessor, to map non-None-null to Optional<StringRef>("") basically? So maybe do that?

A zero-length file isn't valid in C++ (you have to have a newline at the end)

LLVM is used for many languages, and they might not all have that requirement. Also, I just tried running clang -c -g on a zero-length file; it didn't issue any diagnostics at all, and produced an object file (although it did not have any debug-info sections).

Sure enough. Does look like the Source parameter was intended to be handled differently than other strings (like Name) - other strings are passed in as StringRef, and empty is the null/not-present state, but Source was specifically wrapped in Optional in/out presumably to express a distinction between empty and not present. @scott.linder - do you recall if/how intentional this was? If it is as it looks, then

Yes, as I recall the intention was to explicitly encode the fact that source may not be present, and that this is distinct from an empty source file. Some changes to the corresponding Dwarf extension proposal mean this may or may not end up being representable in what becomes the final Dwarf equivalent, but I don't think that needs to dictate how we represent it here.

The overloading of pointers to LLVMContext-allocated types like MDString to mean either "definitely present, the canonical type for this is just a pointer" or "actually an Optional<MDString> where nullptr==None" is pretty common in the codebase, so I'm still happy if we drop the Optional and simply add a comment documenting that nullptr==None.

Old/currently committed code:
When constructing the DIFile and the Source is:
None -> None
Empty -> non-None null pointer
Non-empty -> non-None non-null pointer
When accessing the source:
None -> None
Empty -> crash
Non-empty -> non-None non-null pointer

The new behavior of the current patch:
Empty -> None
Access:
None -> None

So this patch collapses empty with not present - and if that's what we wanted to do, the way to do that would be to remove the Optional entirely - and store Source the same way we store other strings in MDOperands, and pass it around as a StringRef with empty as none and skip the Optional indirection.

If we're keeping the distinction between empty and not-present, then we need to choose which representation we use - we could use non-None null pointer to represent empty, or we could use non-None-non-null to a zero-length MDString. I guess the former is easier and only requires a fix in the accessor, to map non-None-null to Optional<StringRef>("") basically? So maybe do that?

Hahnfeld updated this revision to Diff 479258.Dec 1 2022, 5:36 AM
Hahnfeld retitled this revision from [DebugInfo] Protect DIFile::Source against empty string to [DebugInfo] Store optional DIFile::Source as pointer.
Hahnfeld edited the summary of this revision. (Show Details)
Hahnfeld added a reviewer: scott.linder.

Thanks for the discussion. I updated the patch to remove the Optional wrapping a MDString *, PTAL.

dblaikie added inline comments.Dec 1 2022, 1:25 PM
llvm/include/llvm/IR/DebugInfoMetadata.h
619

I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.

And the test case can be updated to demonstrate that the difference between empty and null is preserved.

I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.

Ok, I understood @scott.linder's comment to say that we can collapse the two cases. Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...

I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.

Ok, I understood @scott.linder's comment to say that we can collapse the two cases. Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...

Sorry for the confusion! I think we should retain the distinction, I only meant to say I'm OK with a different representation. That is, I am OK with using a nullptr to signal "No Source Available", as distinct from the empty string.

I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.

Ok, I understood @scott.linder's comment to say that we can collapse the two cases.

@scott.linder mind clarifying what you had in mind? Did you mean to suggest that we should remove support for zero length source files as distinct from absent source files (eg: MDString * is null for empty or zero-length source, and non-null+non-empty otherwise?) or to drop the Optional but still render 3 cases distinctly (null for not present, non-null but zero-length for zero-length source, and non-null/non-zero-length otherwise)?

Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...

Because, I think, all the other cases don't need to differentiate not-present from empty, but this case does, maybe. So I think that it could be different in that way.

I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.

Ok, I understood @scott.linder's comment to say that we can collapse the two cases. Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...

Sorry for the confusion! I think we should retain the distinction, I only meant to say I'm OK with a different representation. That is, I am OK with using a nullptr to signal "No Source Available", as distinct from the empty string.

Ah, thanks for clarifying (& sorry for my delayed response that came after your reply - I end up with tabs open for far too long sometimesl

Hahnfeld updated this revision to Diff 480022.Dec 5 2022, 1:59 AM

I updated the patch to use MDString::get directly and represent the empty file by a non-nullptr. I'm not sure yet if there are any bad implications by not having a "canonical" MDString...

Hahnfeld marked an inline comment as done.Dec 5 2022, 1:59 AM

FWIW if we keep the std::optional and just deal with the possibility of a nullptr inside, all we seem to need is

patch
diff --git a/llvm/include/llvm/IR/DebugInfoMetadata.h b/llvm/include/llvm/IR/DebugInfoMetadata.h
index c85c21ddd9f1..2a757fb74674 100644
--- a/llvm/include/llvm/IR/DebugInfoMetadata.h
+++ b/llvm/include/llvm/IR/DebugInfoMetadata.h
@@ -654,8 +654,9 @@ public:
     return StringRefChecksum;
   }
   std::optional<StringRef> getSource() const {
-    return Source ? std::optional<StringRef>((*Source)->getString())
-                  : std::nullopt;
+    if (!Source)
+      return std::nullopt;
+    return std::optional<StringRef>(*Source ? (*Source)->getString() : "");
   }
 
   MDString *getRawFilename() const { return getOperandAs<MDString>(0); }

and the same tests as added here will pass.

dblaikie accepted this revision.Dec 5 2022, 2:15 PM

Looks good - thanks!

This revision is now accepted and ready to land.Dec 5 2022, 2:15 PM
This revision was landed with ongoing or failed builds.Dec 8 2022, 12:59 AM
This revision was automatically updated to reflect the committed changes.