This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Add support for UTF-8 unicode formatting
ClosedPublic

Authored by ljmf00 on Oct 26 2021, 11:02 AM.

Details

Summary

This patch adds missing formatting for UTF-8 unicode.

Cross-referencing https://reviews.llvm.org/D66447

Diff Detail

Event Timeline

ljmf00 created this revision.Oct 26 2021, 11:02 AM
ljmf00 requested review of this revision.Oct 26 2021, 11:02 AM

I can reproduce this with LDC (D LLVM-based compiler), but I don't know a way to do it with clang. Probably I need to manually write a .ll file for this. Can someone guide me on adding tests for this?

ljmf00 added a subscriber: Geod24.Oct 26 2021, 12:29 PM

I will also create an issue and cross-reference this.

labath added a subscriber: labath.Oct 26 2021, 11:37 PM

What exactly do you need the test to do? Is printing a global variable (char8_t_ish foo[] = "my string") sufficient?
If so, you can take a look at the tests in test/Shell/SymbolFile/DWARF for inspiration. There are .s and .ll tests there.
The pattern would roughly be:

# RUN: llvm-mc (or llc) -o %t %s
# RUN: %lldb %t -o "target variable foo" | FileCheck %s

# CHECK: (char8_t_ish[]) foo = "my string"

your code here

What exactly do you need the test to do? Is printing a global variable (char8_t_ish foo[] = "my string") sufficient?
If so, you can take a look at the tests in test/Shell/SymbolFile/DWARF for inspiration. There are .s and .ll tests there.
The pattern would roughly be:

# RUN: llvm-mc (or llc) -o %t %s
# RUN: %lldb %t -o "target variable foo" | FileCheck %s

# CHECK: (char8_t_ish[]) foo = "my string"

your code here

Yeah, it is a bit more complicated than that although, due to https://bugs.llvm.org/show_bug.cgi?id=45856 . I'm going to take a look at those tests and write some.

What exactly do you need the test to do? Is printing a global variable (char8_t_ish foo[] = "my string") sufficient?
If so, you can take a look at the tests in test/Shell/SymbolFile/DWARF for inspiration. There are .s and .ll tests there.
The pattern would roughly be:

# RUN: llvm-mc (or llc) -o %t %s
# RUN: %lldb %t -o "target variable foo" | FileCheck %s

# CHECK: (char8_t_ish[]) foo = "my string"

your code here

I was writing the tests locally, although I can't reproduce the failure with a simple type. I can get the error when using a custom LLDB formatter, however. See https://github.com/ljmf00/lldb-d/blob/main/lldb/source/Plugins/Language/D/DTypeUtils.cpp#L82 . I should also rename this patch to something less misleading, since it is not the type name the issue, I just got this right due to fallback on invalid encoding on my side.

That said, do you know an easier way to test this?

ljmf00 updated this revision to Diff 395777.Dec 21 2021, 5:31 PM
ljmf00 retitled this revision from [lldb] Add support for custom char8_t types with different name to [lldb] Add support for UTF-8 unicode formatting.
ljmf00 edited the summary of this revision. (Show Details)

@labath I found the missing part, DumpTypeValue depends on lldb::Format and eFormatUnicode8 case should be added there too. I added tests and also changed the patch message. Can you re-review?

labath accepted this revision.Dec 22 2021, 3:38 AM

This looks fine. Thanks for tracking this down.

This revision is now accepted and ready to land.Dec 22 2021, 3:38 AM
This revision was automatically updated to reflect the committed changes.