This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Also recognize DWARF UTF base types using their size
ClosedPublic

Authored by Geod24 on May 7 2020, 2:31 AM.

Details

Summary

The D programming language has 'char', 'wchar', and 'dchar' as base types,
which are defined as UTF-8, UTF-16, and UTF-32, respectively.

It also has type constructors (e.g. 'const' and 'immutable'),
that leads to D compilers emitting DW_TAG_base_type with DW_ATE_UTF
and name 'char', 'immutable(wchar)', 'const(char)', etc...

Before this patch, DW_ATE_UTF would only recognize types that
followed the C/C++ naming, and emit an error message for the rest, e.g.:

error: need to add support for DW_TAG_base_type 'immutable(char)'
encoded with DW_ATE = 0x10, bit_size = 8

The code was changed to check the byte size first,
then fall back to the old name-based check.

Diff Detail

Event Timeline

Geod24 created this revision.May 7 2020, 2:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2020, 2:31 AM
Geod24 added a comment.May 7 2020, 2:34 AM

I wasn't sure what would be the best place / how to add an unit test for this. Could a reviewer provide some pointers ?

labath added a subscriber: labath.May 7 2020, 2:59 AM

Thanks for the patch.

I wasn't sure what would be the best place / how to add an unit test for this. Could a reviewer provide some pointers ?

I think the simplest approach would be to create some assembly representing a compile unit which declares a couple of global variables of this type and then run "lldb -o "target variable var1 var2..." over the compiled object. You can look at some of the files in test/Shell/SymbolFile/DWARF (e.g. DW_OP_piece-struct.s) for inspiration. Using llvm IR instead of assembly would work too, assuming you can represent this thing in IR..

Geod24 updated this revision to Diff 262818.May 7 2020, 11:47 PM

Added Shell test

A shell test was added using LLVM IR.
The test has to run the generated binary, as there is currently a (likely) bug
that leads to types being printed differently before and after 'run' is issued.

The running aspect is somewhat unfortunate, as it forces the input to be compatible with the host system. I wouldn't be surprised if the test broke somewhere (in fact, I would be surprised if it didn't), because ir already hardcodes a lot of assumptions about the host system.

Given that the fix really is about how types are parsed, it should be sufficient to run something like "type lookup" on these types and verifying that. Right now I get

struct string {
    unsigned long length;
    void *ptr;
}

for type lookup string. I'm guessing that after your patch the void * will change into something else. Alternatively, you could just check for the existing formatting of "target variable utf8", but leave a note that this does not check the precise formatting of the variable, just its utf8-ness. Then we can just update the expectation when the pretty printer bug is fixed.

Geod24 updated this revision to Diff 263401.May 12 2020, 5:10 AM

Changed the test from running to 'type lookup string'

Due to bug https://bugs.llvm.org/show_bug.cgi?id=45856 the test was originally runnable,
but a non-runnable alternative was suggested during review.

labath accepted this revision.May 12 2020, 6:21 AM

Looks great now, just a small tweak to the test case.

lldb/test/Shell/SymbolFile/DWARF/DW_TAG_basic_type_DW_ATE_UTF_nonC.ll
20

Now that we're not running the binary, you can remove the main function, and replace %clang_host with %clang --target original-ldc2-target -c. Without that, we might as well have run the binary, because if it compiles for the host, chances are it would have run correctly too...

This revision is now accepted and ready to land.May 12 2020, 6:21 AM
Geod24 updated this revision to Diff 263442.May 12 2020, 8:13 AM

Remove un-needed main & use x86_64-pc-gnu as target instead of host

Since we're not running the program anymore, we no longer need a main.
The comment, build step, and IR of the test were updated accordingly.

Geod24 updated this revision to Diff 263462.May 12 2020, 10:03 AM

Remove a trailing whitespace in TypeSystemClang.cpp

The pre-merge checks are apparently failing, and from what I can gather from the logs,
it seems to be because of a whitespace issue:

[2020-05-12T16:19:00.309Z] + clang-tidy-diff -p1 -quiet
[2020-05-12T16:19:00.309Z] + sed '/^[[:space:]]*$/d'
[2020-05-12T16:19:03.616Z] 972 warnings and 1 error generated.
[2020-05-12T16:19:03.616Z] Error while processing [...]/lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp.

Since there was a trailing whitespace in that file, I added an extra commit to remove it.

Geod24 updated this revision to Diff 263463.May 12 2020, 10:08 AM

Add missing commit

See previous diff.

Geod24 updated this revision to Diff 263477.May 12 2020, 11:05 AM

Merge whitespace-removing commit into the main one

Since I can't seem to get arc to comply.

Yeah, harbormaster is a bit temperamental these days. I wouldn't worry too much about it reporting problems clearly unrelated to your patch.

Btw, do you have commit access, or need someone to commit this for you?

Good to know, thanks! It's also my first time using arc, so I had to poke around a bit. In any case I didn't seem to include that whitespace fix even after squashing the commits together. If this is fine as is, then great.

No I don't have commit access.

labath closed this revision.May 13 2020, 3:58 AM

Committed as e16111ce2f. I've also added checks for the dstring/wstring types.

Thanks!
I suppose this won't be in the next patch release (10.0.1) but instead in the next major ?