getCanonicalMDString() also returns a nullptr for empty strings, which tripped over the getSource() method. Solve the ambiguity of None versus an Optional containing a nullptr by simply storing a pointer.
Details
Diff Detail
Event Timeline
Where did this issue come up in practice? Change seems reasonable, but wouldn't mind having more context.
llvm/include/llvm/IR/DebugInfoMetadata.h | ||
---|---|---|
656–657 | How is this different from the previous code? Optional::operator* would have an assert in it, right? So the previous code would've been asserting in the same places/ways, I think? |
Please see https://reviews.llvm.org/D137152 for context. I think what's happening is that we (the Cling interpreter in ROOT) create an empty source file while injecting some code during startup. It may be possible to fix that, but LLVM shouldn't crash from it.
llvm/include/llvm/IR/DebugInfoMetadata.h | ||
---|---|---|
656–657 | Yes, there is an assert in OptionalStorage::value, but it only checks that there is a value. nullptr is a valid value inside a Optional<MDString *>, but the expectation of this class is that Source is either None or contains a non-null pointer. (Side remark: It may be possible to change Source to just a MDString * and make nullptr mean that there is no source. That's equally fine with me, but requires changing the signature of getRawSource. Let me know if you prefer that.) |
hmm, more context might be nice - like where's the Cling code that's doing something different than the Clang code when it's creating DIFiles?
I think it might be more suitable to say that DIFile shouldn't be given a non-None Optional that refers to an empty string? & it's up to the caller (Cling) to fix that - we could add an assertion to that effect in the getImpl?
Or perhaps there's an intent to support present-but-zero-length source, in which case this change isn't suitable either, I guess? (since this change collapses the empty and None cases together)
If None and empty are meant to be identical, maybe removing the Optional layer entirely and using the empty string to communicate not-present would be suitable? I'm not sure.
I think it's actually Clang code that just passes on the result of getSource: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L426 which is not protected against empty strings: https://github.com/llvm/llvm-project/blob/fb2f3b30b2ab2494aedf54940f80258e073f0486/clang/lib/CodeGen/CGDebugInfo.cpp#L376-L388
I think it might be more suitable to say that DIFile shouldn't be given a non-None Optional that refers to an empty string? & it's up to the caller (Cling) to fix that - we could add an assertion to that effect in the getImpl?
Or perhaps there's an intent to support present-but-zero-length source, in which case this change isn't suitable either, I guess? (since this change collapses the empty and None cases together)
If None and empty are meant to be identical, maybe removing the Optional layer entirely and using the empty string to communicate not-present would be suitable? I'm not sure.
Well yes, these are the other three possibilities we have. I'm fine with any and went with the least intrusive, but I can implement any of them if they are more suitable from a DebugInfo point of view (I'm not an expert there).
That seems a Cling-novelty though, right? A zero-length file isn't valid in C++ (you have to have a newline at the end) and I guess if you give clang a truly zero-length file it doesn't generate DWARF at least, or doesn't do any code generation at all? So maybe Cling should do something similar to whatever clang does with a zero-length file? (or does clang hit the same problem if you give it a zero-length file?)
So yes, it is an oversight (?) in Cling that we can easily work around: https://github.com/root-project/root/pull/11803 However, I would still argue that LLVM should not crash in this case and just "do the right thing"? Which of the approaches we take, I don't really have a strong opinion on.
A zero-length file isn't valid in C++ (you have to have a newline at the end)
LLVM is used for many languages, and they might not all have that requirement. Also, I just tried running clang -c -g on a zero-length file; it didn't issue any diagnostics at all, and produced an object file (although it did not have any debug-info sections).
Sure enough. Does look like the Source parameter was intended to be handled differently than other strings (like Name) - other strings are passed in as StringRef, and empty is the null/not-present state, but Source was specifically wrapped in Optional in/out presumably to express a distinction between empty and not present. @scott.linder - do you recall if/how intentional this was? If it is as it looks, then
Old/currently committed code:
When constructing the DIFile and the Source is:
None -> None
Empty -> non-None null pointer
Non-empty -> non-None non-null pointer
When accessing the source:
None -> None
Empty -> crash
Non-empty -> non-None non-null pointer
The new behavior of the current patch:
Empty -> None
Access:
None -> None
So this patch collapses empty with not present - and if that's what we wanted to do, the way to do that would be to remove the Optional entirely - and store Source the same way we store other strings in MDOperands, and pass it around as a StringRef with empty as none and skip the Optional indirection.
If we're keeping the distinction between empty and not-present, then we need to choose which representation we use - we could use non-None null pointer to represent empty, or we could use non-None-non-null to a zero-length MDString. I guess the former is easier and only requires a fix in the accessor, to map non-None-null to Optional<StringRef>("") basically? So maybe do that?
Yes, as I recall the intention was to explicitly encode the fact that source may not be present, and that this is distinct from an empty source file. Some changes to the corresponding Dwarf extension proposal mean this may or may not end up being representable in what becomes the final Dwarf equivalent, but I don't think that needs to dictate how we represent it here.
The overloading of pointers to LLVMContext-allocated types like MDString to mean either "definitely present, the canonical type for this is just a pointer" or "actually an Optional<MDString> where nullptr==None" is pretty common in the codebase, so I'm still happy if we drop the Optional and simply add a comment documenting that nullptr==None.
Old/currently committed code:
When constructing the DIFile and the Source is:
None -> None
Empty -> non-None null pointer
Non-empty -> non-None non-null pointer
When accessing the source:
None -> None
Empty -> crash
Non-empty -> non-None non-null pointerThe new behavior of the current patch:
Empty -> None
Access:
None -> NoneSo this patch collapses empty with not present - and if that's what we wanted to do, the way to do that would be to remove the Optional entirely - and store Source the same way we store other strings in MDOperands, and pass it around as a StringRef with empty as none and skip the Optional indirection.
If we're keeping the distinction between empty and not-present, then we need to choose which representation we use - we could use non-None null pointer to represent empty, or we could use non-None-non-null to a zero-length MDString. I guess the former is easier and only requires a fix in the accessor, to map non-None-null to Optional<StringRef>("") basically? So maybe do that?
Thanks for the discussion. I updated the patch to remove the Optional wrapping a MDString *, PTAL.
llvm/include/llvm/IR/DebugInfoMetadata.h | ||
---|---|---|
619 | I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString. And the test case can be updated to demonstrate that the difference between empty and null is preserved. |
I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.
Ok, I understood @scott.linder's comment to say that we can collapse the two cases. Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...
Sorry for the confusion! I think we should retain the distinction, I only meant to say I'm OK with a different representation. That is, I am OK with using a nullptr to signal "No Source Available", as distinct from the empty string.
@scott.linder mind clarifying what you had in mind? Did you mean to suggest that we should remove support for zero length source files as distinct from absent source files (eg: MDString * is null for empty or zero-length source, and non-null+non-empty otherwise?) or to drop the Optional but still render 3 cases distinctly (null for not present, non-null but zero-length for zero-length source, and non-null/non-zero-length otherwise)?
Maybe I got this wrong, could you clarify if we need to differentiate the two cases? If so, I'm not actually sure it makes sense to have a non-nullptr for the empty string, this would be a first as well, all other cases in this file are using getCanonicalMDString...
Because, I think, all the other cases don't need to differentiate not-present from empty, but this case does, maybe. So I think that it could be different in that way.
Ah, thanks for clarifying (& sorry for my delayed response that came after your reply - I end up with tabs open for far too long sometimesl
I updated the patch to use MDString::get directly and represent the empty file by a non-nullptr. I'm not sure yet if there are any bad implications by not having a "canonical" MDString...
FWIW if we keep the std::optional and just deal with the possibility of a nullptr inside, all we seem to need is
patch diff --git a/llvm/include/llvm/IR/DebugInfoMetadata.h b/llvm/include/llvm/IR/DebugInfoMetadata.h index c85c21ddd9f1..2a757fb74674 100644 --- a/llvm/include/llvm/IR/DebugInfoMetadata.h +++ b/llvm/include/llvm/IR/DebugInfoMetadata.h @@ -654,8 +654,9 @@ public: return StringRefChecksum; } std::optional<StringRef> getSource() const { - return Source ? std::optional<StringRef>((*Source)->getString()) - : std::nullopt; + if (!Source) + return std::nullopt; + return std::optional<StringRef>(*Source ? (*Source)->getString() : ""); } MDString *getRawFilename() const { return getOperandAs<MDString>(0); }
and the same tests as added here will pass.
I think we can't use getCanonicalMDString here, because it collapses the empty case into the null case, when they were intended to be differentiated. So this'll need to be manually expanded to the MDString::get(Context, S) without the empty->nullptr part. Then empty -> empty MDString, None -> null, non-empty -> non-empty MDString.
And the test case can be updated to demonstrate that the difference between empty and null is preserved.