This is an archive of the discontinued LLVM Phabricator instance.

{tablegen] Emit string literals instead of char arrays
ClosedPublic

Authored by ldrumm on Jan 20 2020, 6:58 AM.
Tokens
"Love" token, awarded by mstojanovic.

Details

Summary

[tablegen] Emit string literals instead of char arrays

This changes the generated (Instr|Asm|Reg|Regclass)Name tables from this
form:

extern const char HexagonInstrNameData[] = {
  /* 0 */ 'G', '_', 'F', 'L', 'O', 'G', '1', '0', 0,
  /* 9 */ 'E', 'N', 'D', 'L', 'O', 'O', 'P', '0', 0,
  /* 18 */ 'V', '6', '_', 'v', 'd', 'd', '0', 0,
  /* 26 */ 'P', 'S', '_', 'v', 'd', 'd', '0', 0,
  [...]
};

...to this:

extern const char HexagonInstrNameData[] = {
  /* 0 */ "G_FLOG10\0"
  /* 9 */ "ENDLOOP0\0"
  /* 18 */ "V6_vdd0\0"
  /* 26 */ "PS_vdd0\0"
  [...]
};

This should make debugging and exploration a lot easier for mortals,
while providing a significant compile-time reduction for common compilers.

To avoid issues with low implementation limits, this is disabled by
default for visual studio or when cross-compiling.

To force output one way or the other, pass
--long-string-literals=<bool> to tablegen

Diff Detail

Event Timeline

ldrumm created this revision.Jan 20 2020, 6:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2020, 6:58 AM
Herald added a subscriber: mgorny. · View Herald Transcript
mstorsjo added inline comments.Jan 20 2020, 9:37 AM
utils/TableGen/CMakeLists.txt
64 ↗(On Diff #239119)

I'm afraid this isn't sufficient; I normally first do a fully native build, then when crosscompiling, I point it at the already built fully native tblgen.

Also likewise if I don't have an already built tblgen when crosscompiling, cmake is invoked a second time recursively for building these tools, and I'd guess CMAKE_CROSSCOMPILING isn't set there, as it's a normal native build for those tools.

I think the one safe way of handling it would be to add a runtime option to tblgen for making msvc compatible output, and set that when generating tables, if the target compiler is msvc. Then it should work the same with any combination of build configurations. I don't know the tblgen internals if this is doable or what it looks like wrt the tblgen invocations in cmake though...

ldrumm updated this revision to Diff 239162.Jan 20 2020, 10:05 AM
ldrumm marked an inline comment as done.

Ensure elaborate cross-compilation setups are supported

ldrumm updated this revision to Diff 239163.Jan 20 2020, 10:06 AM
rnk accepted this revision.Jan 20 2020, 1:05 PM

Looks good to me, but please let @mstorsjo confirm that this will work before landing. Thanks!

This revision is now accepted and ready to land.Jan 20 2020, 1:05 PM
mstorsjo added inline comments.Jan 20 2020, 1:18 PM
cmake/modules/TableGen.cmake
68 ↗(On Diff #239163)

I think this needs to go inside a if (MSVC) clause instead, as the generator often can be Ninja even though you're building with MSVC.

ldrumm updated this revision to Diff 239272.Jan 21 2020, 3:37 AM

Change the cmake flag check to if(MSVC) rather than if(CMAKE_GENERATOR MATCHES "Visual Studio"). CMAKE_GENERATOR is not the compiler

First char of the title: { -> [.

ldrumm marked an inline comment as done.Jan 21 2020, 8:34 AM

First char of the title: { -> [.

Thanks. Phabricator doesn't seem to track the git commit message, but I've fixed this is on my tree

cmake/modules/TableGen.cmake
68 ↗(On Diff #239163)

fixed

utils/TableGen/CMakeLists.txt
64 ↗(On Diff #239119)

The logic to do that already includes workarounds for visual studio.
I've added them in the same place:

diff
diff --git a/cmake/modules/TableGen.cmake b/cmake/modules/TableGen.cmake
index a1c28870c14..4d888696ec7 100644
--- a/cmake/modules/TableGen.cmake
+++ b/cmake/modules/TableGen.cmake
@@ -63,8 +63,13 @@ function(tablegen project ofn)
     # behavior. Since it doesn't do restat optimizations anyway, just don't
     # pass --write-if-changed there.
     set(tblgen_change_flag)
+    # Visual Studio has problems with string literals greater than 65534 bytes
+    # in length. Ensure tablgen works around this when the generator is VS
+    set(tblgen_long_string_literals_flag "--long-string-literals=0")
   else()
     set(tblgen_change_flag "--write-if-changed")
+    # otherwise use the configuration default
+    set(tblgen_long_string_literals_flag)
   endif()
 
   # We need both _TABLEGEN_TARGET and _TABLEGEN_EXE in the  DEPENDS list
@@ -81,6 +86,7 @@ function(tablegen project ofn)
     ${LLVM_TABLEGEN_FLAGS}
     ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
     ${tblgen_change_flag}
+    ${tblgen_long_string_literals_flag}
     ${additional_cmdline}
     # The file in LLVM_TARGET_DEFINITIONS may be not in the current
     # directory and local_tds may not contain it, so we must
mstorsjo added inline comments.Jan 21 2020, 10:54 AM
utils/TableGen/CMakeLists.txt
64 ↗(On Diff #239272)

There's no need to check the crosscompiling flag any longer. If the compiler that will compile the generated code is msvc (be it cross or not), we need the legacy mode, but if crosscompiling with something else, we can generate the more efficient code.

I would suggest dropping this bit of the change altogether (and the tablegen-config.h file). The parameter when invoking the tool should always know precisely whether it's needed or not.

ldrumm updated this revision to Diff 239390.Jan 21 2020, 11:30 AM

Remove compile-time detection. Make this a runtime only flag

Awesome, this looks fine with me. I haven't tested it, but it looks sensible to me. I haven't looked at the implementation itself though, but @rnk already approved an earlier version wrt that.

This revision was automatically updated to reflect the committed changes.

Hello @ldrumm,

Sorry, I had to revert this commit in e464b31c1565204e3be114d043bcbf4de61fe2e9, because it broke some Windows builds:
http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/13870

Understood.
It looks like tablegen command-line options are dotted over llvm/lib/Tablegen and utils/Tablegen and the clang tablegen binary just links against the llvm/lib/Tablegen code, but as I only built llvm, not clang the tests all passed for me. Given this option can only affect the llvm-tablegen utility (it's use is in a private header of the llvm-tablegen source tree), it seems like we need to detect this with cmake. I'll update the patch.

There is also the issue of warnings about the size of the generated string. This would cause -Werror builds to fail.

In file included from /w/src/llvm.org/llvm/lib/Target/X86/MCTargetDesc/X86MCTargetDesc.cpp:44:
/w/bld/org/lib/Target/X86/X86GenInstrInfo.inc:32952:11: warning: string literal of length 209447 exceeds maximum length 65536 that C++ compilers are required to support [-Woverlength-strings]
  /* 0 */ "G_FLOG10\0"
          ^~~~~~~~~~~~
ldrumm updated this revision to Diff 239925.EditedJan 23 2020, 9:35 AM
  • only pass the --long-string-literals flags to *LLVM* tablegen (fixes clang-tablegen problem)
  • suppress overlenth-string-literals warning seen in the wild: this behaviour is intended and shouldn't be warned against
rnk added inline comments.Jan 23 2020, 11:11 AM
llvm/cmake/modules/HandleLLVMOptions.cmake
652 ↗(On Diff #239925)

Instead of this, I would suggest doing this in the generated source:

#ifdef __GNUC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
...
#ifdef __GNUC
#pragma GCC diagnostic pop
#endif

That way, we still benefit from this warning in places where we aren't explicitly taking steps to avoid the problem.

rnk added inline comments.Jan 23 2020, 11:39 AM
llvm/cmake/modules/TableGen.cmake
66

This works for now, but I just recently I wanted to use SequenceToOffsetTable in clang, but I used StringToOffsetTable.h instead. So far our such tables are small enough not to cause problems, but eventually we probably want to move this thing to the TableGen library, etc, etc. Anyway, this is not a blocking concern.
f63d7637387995765e9ece0e10fe1b5a4f0612b5

ldrumm marked an inline comment as done.Jan 24 2020, 4:00 AM
ldrumm added inline comments.
llvm/cmake/modules/HandleLLVMOptions.cmake
652 ↗(On Diff #239925)

While limiting the surface of the warning might be nice, I'm not convinced by the value proposition of adding code to emit this in every tablegen backend.
I could add emit_boilerplate_macros_prologue/epilogue() functions to SequenceToTableOffset that are called at the beginning and end of every tablegen backend, but again I don't think it's worth the extra complexity vs silencing a basically pointless warning on a compiler that has never had this limitatation

ldrumm marked an inline comment as done.Jan 24 2020, 4:01 AM
ldrumm added inline comments.
llvm/cmake/modules/TableGen.cmake
66

Sounds good. I'm happy to revisit this.

rnk added inline comments.Jan 24 2020, 1:08 PM
llvm/cmake/modules/HandleLLVMOptions.cmake
652 ↗(On Diff #239925)

I guess I was hoping it would be enough to splat that pragma out from emit(). If we can't place the pragma inside the array with long strings, one could update the API of emit to take the type and name of the array so we can get the pragma in the right place without too much duplication. All the callers use the same code pattern anyway, if you ignore the (IMO unnecessary) std::array usage:
https://reviews.llvm.org/P8187

ldrumm updated this revision to Diff 240579.Jan 27 2020, 7:39 AM
ldrumm marked an inline comment as done.

Now emit long-string disabling GCC diagnostic pragmas around string definitions to silence gcc's -Woverlength-strings on a case-by-case basis

ldrumm reopened this revision.Jan 27 2020, 7:39 AM
ldrumm added inline comments.
llvm/cmake/modules/HandleLLVMOptions.cmake
652 ↗(On Diff #239925)

Okay. I understand. We can't have the pragma *inside* the initializer (hard error), so we have to take the declaration name from the user code and emit that after the pragma a'la:

#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Woverlength-strings"
#endif
extern const char AArch64RegStrings[] = {
  /* 0 */ "B10\0"
 [...]
  /* 2397 */ "NZCV\0"
};
#ifdef __GNUC__
#pragma GCC diagnostic pop
#endif

and the api for this now looks like this in use:

c++
  // Emit the string table.
  RegStrings.layout();
  RegStrings.emitStringLiteralDef(
      OS, Twine("extern const char ") + TargetName + "RegStrings[]");
This revision is now accepted and ready to land.Jan 27 2020, 7:39 AM
rnk accepted this revision.Jan 27 2020, 10:06 AM

Thanks!

This revision was automatically updated to reflect the committed changes.