This is an archive of the discontinued LLVM Phabricator instance.

[Clang][CodeGen][ObjC]: Fix non-bridged CoreFoundation builds on ELF targets that use `-fconstant-cfstrings`.
ClosedPublic

Authored by kristina on Sep 21 2018, 2:32 AM.

Details

Summary

Attempt to unbreak behavior caused by D44491 resulting in inability to build standalone/unbridged CoreFoundation on an ELF target with -fconstant-cfstrings as the cast<> expression would fail when attempting to use the builtin since the faux-ObjectiveC class backing CFSTR constant strings is an llvm::Constant being cast to an llvm::GlobalValue. This only affects ELF targets, PE/COFF targets will still follow the same path so this should not affect them and takes an approach more in-line with how Apple toolchains behave.

I will add a lit test for Clang codegen shortly after I figure out how to get it to work correctly, though with said patch I'm able to build standalone CoreFoundation with constant CFStrings without it being bridged against Foundation or Swift runtimes (as far as I know that was previously only supported on Windows).

Trying to make a similar test to cfstring-windows.c but still trying to get the hang of FileCheck.

********************
Testing Time: 10.56s
********************
Failing Tests (1):
    Clang :: CodeGen/cfstring-linux-unbridged.c

Diff Detail

Repository
rL LLVM

Event Timeline

kristina created this revision.Sep 21 2018, 2:32 AM

Does appear to work properly:

cfstring:00000000002AC640 off_2AC640      dq offset __CFConstantStringClassReference
cfstring:00000000002AC640                                         ; DATA XREF: skipDTD+19D↑o
cfstring:00000000002AC640                                         ; skipDTD+2F7↑o
cfstring:00000000002AC648                 dq offset stru_7B8.st_size
cfstring:00000000002AC650                 dq offset dword_0
cfstring:00000000002AC658                 dq offset qword_28+4

Though it seems it's best off if the section for it were renamed to something like .text.cfstring as it otherwise results in a rather bizarre ELF file layout. I should probably mark the section as constant when building it as well:

[18] cfstring          PROGBITS        00000000002a9580 2a9580 007740 00  WA  0   0  8
compnerd accepted this revision.Sep 21 2018, 11:38 AM

This looks fine to me, but this definitely should have an accompanying test.

As to the other items you mention, the current section naming actually is important as it enables the CFString structures to be located (the linker will create __start_cfstring and __stop_cfstring synthetic symbols to mark the beginning and ending of the section), allowing the runtime to get to the section (the section is meant to be an array of objects). Additionally, the section is not marked as constant because the runtime can actually change the contents (CFConstantString is *not* constant as you would think, the underlying value can be changed as can the isa pointer).

lib/CodeGen/CodeGenModule.cpp
4101 ↗(On Diff #166414)

CGV doesn't really follow the expectation (GV was short for GlobalValue, this is a Constant so C seems more appropriate).

4108 ↗(On Diff #166414)

I think that the comment isn't particularly helpful - it basically is directing you to look at the history of the file.

This revision is now accepted and ready to land.Sep 21 2018, 11:38 AM
rjmccall added inline comments.Sep 21 2018, 12:23 PM
lib/CodeGen/CodeGenModule.cpp
4108 ↗(On Diff #166414)

I think we should just skip this step, even on COFF, if it's not a GlobalValue.

Probably the most correct move would be to only apply this logic if the IR declaration was synthesized. Or maybe even just change this code to look for a global variable called __CFConstantStringClassReference in the first place and then emit a reference to it if found, and only otherwise try to build the synthetic variable. But just bailing out early if something weird is going on is also a legitimate step.

Would you be okay with me renaming cfstring to .cfstring for ELF so it's in line with ELF section naming conventions (I'm not entirely sure if that could cause issues with ObjC stuff)? And yes I think it's best to avoid that code-path altogether if it turns out to be a constant as that's likely from the declaration of the type, I'll write a FileCheck test in a moment and check that I can apply the same logic to ELF aside from the DLL stuff as CoreFoundation needs to export the symbol somehow while preserving the implicit extern attribute for every other TU that comes from the builtin that CFSTR expands to. Is what I'm suggesting making sense? If not, let me know, I may be misunderstanding what's happening here.

I'll add the test and another revision in a second, maybe that will clear it up a bit.

lib/CodeGen/CodeGenModule.cpp
4108 ↗(On Diff #166414)

Doesn't this code practically do this since creating a runtime global is a no-op if it already exists within the module, however there is potentially one module which can have it which is CFString.c (assuming no bridging here of any kind) while it makes sense for the rest to refer to it as an implicit extern (in which case I need to add an ELF code path to do exactly the same.

Would you be okay with me renaming cfstring to .cfstring for ELF so it's in line with ELF section naming conventions (I'm not entirely sure if that could cause issues with ObjC stuff)? And yes I think it's best to avoid that code-path altogether if it turns out to be a constant as that's likely from the declaration of the type, I'll write a FileCheck test in a moment and check that I can apply the same logic to ELF aside from the DLL stuff as CoreFoundation needs to export the symbol somehow while preserving the implicit extern attribute for every other TU that comes from the builtin that CFSTR expands to. Is what I'm suggesting making sense? If not, let me know, I may be misunderstanding what's happening here.

Following the ELF conventions seems like the right thing to do, assuming it doesn't cause compatibility problems. David Chisnall might know best here.

lib/CodeGen/CodeGenModule.cpp
4108 ↗(On Diff #166414)

Notice that this logic runs regardless of whether we actually created the global, though; we're potentially overwriting stuff that was decided based on content from the actual declaration.

Would you be okay with me renaming cfstring to .cfstring for ELF so it's in line with ELF section naming conventions (I'm not entirely sure if that could cause issues with ObjC stuff)? And yes I think it's best to avoid that code-path altogether if it turns out to be a constant as that's likely from the declaration of the type, I'll write a FileCheck test in a moment and check that I can apply the same logic to ELF aside from the DLL stuff as CoreFoundation needs to export the symbol somehow while preserving the implicit extern attribute for every other TU that comes from the builtin that CFSTR expands to. Is what I'm suggesting making sense? If not, let me know, I may be misunderstanding what's happening here.

Following the ELF conventions seems like the right thing to do, assuming it doesn't cause compatibility problems. David Chisnall might know best here.

We would lose the __start_ and __end_ markers (which @compnerd mentioned) if the section was renamed to .cfstring, because the linker only generates those markers for sections whose names are valid C identifiers, so I don't think that rename would be valid. Here's how LLD does it, for example: https://reviews.llvm.org/diffusion/L/browse/lld/trunk/ELF/Writer.cpp;342772$1764-1776

kristina updated this revision to Diff 166576.Sep 21 2018, 4:35 PM

Does this look about right? I'm not sure what's up with DSO local attribute, it doesn't seem to apply, I think this test makes sense though. ninja check-clang-codegen succeeded this time:

Testing Time: 12.77s
  Expected Passes    : 1242
  Expected Failures  : 2
  Unsupported Tests  : 120

Bitcode looks like this (for the 3rd case, which is a fairly odd one since it's a variable sized array without extern):

/q/org.llvm.caches/llvm-8.0/4132/bin/clang -cc1 -internal-isystem /q/org.llvm.caches/llvm-8.0/4132/lib/clang/8.0.4132/include -nostdsysteminc -triple x86_64-elf -S -emit-llvm /SourceCache/llvm-trunk-8.0.4132/tools/clang/test/CodeGen/cfstring-elf.c -o -

; ModuleID = '/SourceCache/llvm-trunk-8.0.4132/tools/clang/test/CodeGen/cfstring-elf.c'
source_filename = "/SourceCache/llvm-trunk-8.0.4132/tools/clang/test/CodeGen/cfstring-elf.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-unknown-elf"

%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }
%struct.__CFString = type opaque

@.str = private unnamed_addr constant [7 x i8] c"string\00", section ".rodata", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* bitcast ([1 x i64]* @__CFConstantStringClassReference to [0 x i32]*), i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i32 0, i32 0), i64 6 }, section "cfstring", align 8
@string = constant %struct.__CFString* bitcast (%struct.__NSConstantString_tag* @_unnamed_cfstring_ to %struct.__CFString*), align 8
@__CFConstantStringClassReference = common global [1 x i64] zeroinitializer, align 8

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"Kristina's toolchain (8.0.4132+assert+modules+thinlto+tests+debug) clang version 8.0.4132 (https://llvm.googlesource.com/clang 8c1a8e6be4042ef6e930f19162f5e308ac56a38e) (https://llvm.googlesource.com/llvm bf40f1cb4e37e0046f94428f6332255ccf00e440) (based on LLVM 8.0.4132svn)"}
kristina marked 2 inline comments as done.EditedSep 21 2018, 4:48 PM

Binary layout also looks sane (compiled with -fPIC -faddrsig -Wl,--icf=safe), just digging it through it in a dissasembler:

[/q/src/clwn]$ llvm-readobj -elf-output-style=GNU -sections /q/org.llvm.caches/clownschool/clwn-sysroot/System/Library/Frameworks/CoreFoundation.framework/libCoreFoundation.so
There are 36 section headers, starting at offset 0x3f57d8:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .dynsym           DYNSYM          0000000000000200 000200 00fd08 18   A  5   1  8
  [ 2] .gnu.version      VERSYM          000000000000ff08 00ff08 001516 02   A  1   0  2
  [ 3] .gnu.version_r    VERNEED         0000000000011420 011420 0001a0 00   A  5   5  4
  [ 4] .gnu.hash         GNU_HASH        00000000000115c0 0115c0 003bf8 00   A  1   0  8
  [ 5] .dynstr           STRTAB          00000000000151b8 0151b8 011677 00   A  0   0  1
  [ 6] .rela.dyn         RELA            0000000000026830 026830 01f860 18   A  1   0  8
  [ 7] .rela.plt         RELA            0000000000046090 046090 006408 18   A  1   0  8
  [ 8] .rodata           PROGBITS        000000000004c4a0 04c4a0 00fdd0 00 AMS  0   0 32
  [ 9] .gcc_except_table PROGBITS        000000000005c270 05c270 0000cc 00   A  0   0  4
  [10] .eh_frame_hdr     PROGBITS        000000000005c33c 05c33c 005934 00   A  0   0  4
  [11] .eh_frame         PROGBITS        0000000000061c70 061c70 017fac 00   A  0   0  8
  [12] .text             PROGBITS        000000000007a000 07a000 228e1f 00  AX  0   0 16
  [13] .init             PROGBITS        00000000002a2e20 2a2e20 00001a 00  AX  0   0  4
  [14] .fini             PROGBITS        00000000002a2e3c 2a2e3c 000009 00  AX  0   0  4
  [15] .plt              PROGBITS        00000000002a2e50 2a2e50 0042c0 00  AX  0   0 16
  [16] .data             PROGBITS        00000000002a8000 2a8000 001580 00  WA  0   0 16
  [17] .tm_clone_table   PROGBITS        00000000002a9580 2a9580 000000 00  WA  0   0  8
  [18] cfstring          PROGBITS        00000000002a9580 2a9580 007740 00  WA  0   0  8
  [19] .got.plt          PROGBITS        00000000002b0cc0 2b0cc0 002170 00  WA  0   0  8
  [20] .fini_array       FINI_ARRAY      00000000002b3000 2b3000 000008 00  WA  0   0  8
  [21] .init_array       INIT_ARRAY      00000000002b3008 2b3008 000010 00  WA  0   0  8
  [22] .data.rel.ro      PROGBITS        00000000002b3020 2b3020 007a18 00  WA  0   0 16
  [23] .dynamic          DYNAMIC         00000000002baa38 2baa38 000200 10  WA  5   0  8
  [24] .got              PROGBITS        00000000002bac38 2bac38 000718 00  WA  0   0  8
  [25] .bss              NOBITS          00000000002bc000 2bb350 0050a8 00  WA  0   0 16
  [26] .comment          PROGBITS        0000000000000000 2bb350 0002b3 01  MS  0   0  1
  [27] .debug_str        PROGBITS        0000000000000000 2bb603 00b6a0 01  MS  0   0  1
  [28] .debug_abbrev     PROGBITS        0000000000000000 2c6ca3 00162d 00      0   0  1
  [29] .debug_info       PROGBITS        0000000000000000 2c82d0 0261bc 00      0   0  1
  [30] .debug_macinfo    PROGBITS        0000000000000000 2ee48c 000053 00      0   0  1
  [31] .debug_line       PROGBITS        0000000000000000 2ee4df 0cb0f5 00      0   0  1
  [32] .debug_ranges     PROGBITS        0000000000000000 3b95d4 000030 00      0   0  1
  [33] .symtab           SYMTAB          0000000000000000 3b9608 01d370 18     35 2288  8
  [34] .shstrtab         STRTAB          0000000000000000 3d6978 000165 00      0   0  1
  [35] .strtab           STRTAB          0000000000000000 3d6add 01ecf4 00      0   0  1

Code style is a bit odd in one place but even with 2 space indentation sometimes 80 column rule is difficult to maneuver around. I'm curious about what common symbols in ELF are, never quite figured it out but in tests they seem to fit in where dso_local previously was when using declspecs, not sure if this is normal or if this is going to affect LTO.

kristina marked 3 inline comments as done.EditedSep 22 2018, 4:05 PM

Want to see what @rnk has to say about this before landing it since he wrote the original code and if my understanding of common vs dso_local is accurate or not since I don't have much experience with the Windows specific metadata, so I'll likely land it on Monday if he's happy with it. I kept cfstring as is, without the dot as per comment and now ELF and PE/COFF share the same paths sans the DLLImport/DLLExport part.

Also moved actual string literals to .rodata for ELF similar to Mach-O so they can be ICF'd. I don't think distinguishing them from UTF16 will make much of a difference and UTF16 is not really used much on ELF targets anyway.

Would you be okay with me renaming cfstring to .cfstring for ELF so it's in line with ELF section naming conventions (I'm not entirely sure if that could cause issues with ObjC stuff)? And yes I think it's best to avoid that code-path altogether if it turns out to be a constant as that's likely from the declaration of the type, I'll write a FileCheck test in a moment and check that I can apply the same logic to ELF aside from the DLL stuff as CoreFoundation needs to export the symbol somehow while preserving the implicit extern attribute for every other TU that comes from the builtin that CFSTR expands to. Is what I'm suggesting making sense? If not, let me know, I may be misunderstanding what's happening here.

Following the ELF conventions seems like the right thing to do, assuming it doesn't cause compatibility problems. David Chisnall might know best here.

I don't believe it will have any compatibility issues. We expose builtins for creating CF- and NSString objects from C code. For Apple targets, these are equivalent. For targets that don't care about Objective-C interop, the CF version is sensible because it doesn't introduce any dependencies on an Objective-C runtime. For code targeting a non-Apple runtime and wanting CF / Foundation interop, the CFSTR macro expands to the NS version.

As others have pointed out, the section must be a valid C identifier for the magic __start_ and __stop_ symbols to be created. I don't really like this limitation and it would be nice if we could improve lld (and encourage binutils to do the same) so that __start_.cfstring and __stop_.cfstring could be used to create symbols pointing to the start and end of a section because it's useful to name things in such a way that they can't conflict with any valid C identifier. With PE/COFF, we have a mechanism for more precise control and can give sections any names that we want (and also arrange for start and end symbols for subsections).

I can respect wanting to change the rules on ELF, but it sounds like we do need to stick with the current section names. Absent an ABI break to stop looking for the existing section names, CF will misbehave if they exist even if we change the compiler output, so effectively those identifiers are claimed in a way that cannot be incrementally changed.

Also, it's probably correct for CF on non-Darwin OSes to stick to non-system namespaces anyway.

I can respect wanting to change the rules on ELF, but it sounds like we do need to stick with the current section names. Absent an ABI break to stop looking for the existing section names, CF will misbehave if they exist even if we change the compiler output, so effectively those identifiers are claimed in a way that cannot be incrementally changed.

Also, it's probably correct for CF on non-Darwin OSes to stick to non-system namespaces anyway.

I can respect wanting to change the rules on ELF, but it sounds like we do need to stick with the current section names. Absent an ABI break to stop looking for the existing section names, CF will misbehave if they exist even if we change the compiler output, so effectively those identifiers are claimed in a way that cannot be incrementally changed.

Also, it's probably correct for CF on non-Darwin OSes to stick to non-system namespaces anyway.

Yes makes sense, I kept it as cfstring, should be in that revision.

Alright, I guess I'll land it if there's no objections to it, cfstring is staying as is, the PE/COFF codepath is not affected in terms of functionality and test is fine I think, hopefully x86_64-elf will not yield different results on different machines in terms of alignment etc. That should allow building CF on x86_64 Linux without any additional runtime and fix a crash, as well as make the check explicit instead of relying on a blind cast<>.

kristina closed this revision.Sep 24 2018, 7:19 AM

Closed by rL342883 (rC342883). Manually closing it as Phabricator is slightly broken, hopefully it makes the links when it catches up.

kristina reopened this revision.EditedSep 24 2018, 11:27 AM

Cascade of build failures stemming from GV->setSection(".rodata");, reverted the commit, it seems that CFString.c is causing all those issues despite passing when ran locally. Not sure if it's actually causing a regression or if the test needs some bigger updates for ELF.

This revision is now accepted and ready to land.Sep 24 2018, 11:27 AM

Turns out test didn't execute as there are two tests named CFString.c and cfstring.c which usually shouldn't matter unless LLVM source checkout is done into a case insensitive filesystem in which case the failing test is skipped, this seems like a bug, going to rename CFString.c to CFString3.c and revise it to adjust backing stores for actual literals to quote .rodata like in the other test. Pretty silly issue and why test suite ran just fine locally but failed when ran on case-sensitive buildbots.

kristina updated this revision to Diff 166996.Sep 25 2018, 3:00 PM

Split up the tests for ELF into tests 1). Checking whether CF can be built and linked correctly 2). Asm section tests originally from CFStrings.c as they had additional ELF-related cruft that should have been a separate test. Renamed CFStrings.c to cfstring3.c for consistency, which now tests ELF/COFF/MachO in a consistent fashion versus having a hunk of out of place ELF-specific tests.

This revision was automatically updated to reflect the committed changes.
cfe/trunk/lib/CodeGen/CodeGenModule.cpp