Page MenuHomePhabricator

[llvm] Let Intrinsics.h depend only on the relevant part of Intrinsics.gen (NFC, part of the fix for bug#11951)
Needs ReviewPublic

Authored by leon on May 27 2015, 2:41 PM.

Details

Summary

This is my first patch for LLVM, so I selected reviewers based on the information of git blame as well as those on the CC list of the related bug. I am new to cmake so I will appreciate if experts review my changes in the cmake files carefully.

Motivation for this change

Intrinsics.h needs from the TableGen'erated file Intrinsics.gen only the list of Intrinsics ID enumerators, however it used to #include all of Intrinsics.gen. The latter by now is over 3MiB, while its fragment that is relevant to Intrinsics.h is less than 350KiB.

Solution:

Intrinsics.gen is post-processed into a new file Intrinsics.ids. Post-processing is performed by a new utility extract_ifdef_block. I explored the following alternative solutions before selecting this one as the most optimal one:

  1. Make TableGen produce Intrinsics.ids directly (e.g. through a new mode -gen-intrinsics-ids) Cons:
    • As a continuation of the work on bug#11951 I will need to extract an extra couple of sections from Intrinsics.gen into files of their own. Adding more options to TableGen would unnecessarily pollute its API.
  1. Use the standard preprocessor. Cons:
    • Issues in CMake with making it work on different platforms
    • The standard preprocessor implicitly includes compiler support header files (e.g. /usr/include/stdc-predef.h for gcc) and pollutes the output with the following lines:
      1. 1 "include/llvm/IR/Intrinsics.gen"
      2. 1 "<command-line>"
      3. 1 "/usr/include/stdc-predef.h" 1 3 4
      4. 1 "<command-line>" 2
      5. 1 "include/llvm/IR/Intrinsics.gen"
      6. 19 "include/llvm/IR/Intrinsics.gen"

        Moreover, if the preprocessor is instructed to preserve the comments (so that the intrinsics enum ids are accompanied by the intrinsics names) then the comments from the compiler support files show up too. All this garbage may be a little confusing if anyone needs to look into Intrinsics.ids.

extract_ifdef_block is a rudimentary pre-processor that filters from the input the mentioned #ifdef/#endif block. There are actually two implementations (if the LLVM build system allowed using python or perl during the build proper step, I would create just one version in one of those languages):

  1. utils/Misc/extract_ifdef_block.sh: bash version, serving the autoconf build
  2. utils/Misc/extract_ifdef_block.cpp: produces a binary executable target for the CMake build

Diff Detail

Repository
rL LLVM

Event Timeline

leon updated this revision to Diff 26629.May 27 2015, 2:41 PM
leon retitled this revision from to [llvm] Avoid #including Intrinsics.gen in Intrinsics.h (NFC, part of bug#11951).
leon updated this object.
leon edited the test plan for this revision. (Show Details)
leon set the repository for this revision to rL LLVM.
leon retitled this revision from [llvm] Avoid #including Intrinsics.gen in Intrinsics.h (NFC, part of bug#11951) to [llvm] Avoid #including Intrinsics.gen in Intrinsics.h (NFC, part of fix for bug#11951).May 27 2015, 2:44 PM
leon added a subscriber: Unknown Object (MLST).
leon updated this revision to Diff 26705.May 28 2015, 12:05 PM
leon retitled this revision from [llvm] Avoid #including Intrinsics.gen in Intrinsics.h (NFC, part of fix for bug#11951) to [llvm] Let Intrinsics.h depend only on the relevant part of Intrinsics.gen (NFC, part of the fix for bug#11951).
leon updated this object.

Fixed the autoconf build.

rnk added a subscriber: rnk.May 28 2015, 1:49 PM

I agree we should do this, but why don't we just change tblgen to output multiple intrinsics files?

leon added a comment.May 28 2015, 8:52 PM

The current use model of tblgen is to write a single output file
specified through the -o option. The file is opened by the
llvm::TableGenMain() function and reaches the emitting code as
raw_ostream&, which prevents the emitting code from writing additional
output files with their names derived from the name of the primary
output file.

rnk added a comment.May 29 2015, 9:38 AM
In D10073#180808, @leon wrote:

The current use model of tblgen is to write a single output file
specified through the -o option. The file is opened by the
llvm::TableGenMain() function and reaches the emitting code as
raw_ostream&, which prevents the emitting code from writing additional
output files with their names derived from the name of the primary
output file.

True, but you can still run tablegen multiple times with separate generators, and this naturally leverages our other tablegen optimization which only updates the .inc file if it would've changed. I expect this optimization is worth more than the cost of reparsing the .td files.

leon added a comment.May 29 2015, 12:01 PM

True, but you can still run tablegen multiple times with separate generators,

I have considered that possibility (and mentioned it at
http://reviews.llvm.org/D10073), but in my humble opinion enhancing
tblgen's interface with additional named generators for finer-grained
control over intrinsics generation wouldn't be the right decision.

rnk added a comment.May 29 2015, 1:40 PM
In D10073#181161, @leon wrote:

I have considered that possibility (and mentioned it at
http://reviews.llvm.org/D10073), but in my humble opinion enhancing
tblgen's interface with additional named generators for finer-grained
control over intrinsics generation wouldn't be the right decision.

Consider that every target essentially starts from Arch.td and generates many different kinds of tables (assembler maps, disassembler maps, SDAG state machines, etc). This is a pretty common pattern already, so it seems better to be consistent and not add more tools for source code generation than we already have. The CMake logic for controlling them is ugly and hard to get right across multiple generators, but we already have it wrapped up and abstracted for tablegen.

espindola edited reviewers, added: espindola; removed: rafael.Mar 15 2018, 11:02 AM