This is an archive of the discontinued LLVM Phabricator instance.

[C++20][Modules][HU 4/5] Handle pre-processed header units.
Closed · Public

Authored by iains on Mar 7 2022, 2:28 AM.

Details

Summary

We wish to support emitting a pre-processed output for an importable
header unit, that can be consumed to produce the same header units as
the original source.

This means that we need to find the original filename used to produce
the pre-processed output, so that it can be assigned as the module
name. The filename is peeked from the first line of the pre-processed
source when the action sets up the files.
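A minimal sketch of the peeking step is below. It assumes the first line of the pre-processed source is a GNU-style linemarker such as `# 1 "foo.h"` (the form Clang and GCC typically emit at the top of `-E` output) or a standard `#line 1 "foo.h"` directive. `peekOriginalFileName` is a hypothetical name used only for illustration; this is not the code in the patch, which does the peeking when the frontend action sets up its input files.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (illustration only, not Clang's implementation):
// given the first line of a pre-processed file, e.g.  # 1 "foo.h"
// or  #line 1 "foo.h", return the original filename if one is present.
std::optional<std::string> peekOriginalFileName(std::string_view line) {
  std::size_t i = 0;
  auto skipSpaces = [&] {
    while (i < line.size() && (line[i] == ' ' || line[i] == '\t')) ++i;
  };

  skipSpaces();
  if (i >= line.size() || line[i] != '#') return std::nullopt;
  ++i;
  skipSpaces();

  // Accept the standard "#line" spelling as well as the GNU "# N" form.
  if (line.substr(i, 4) == "line") {
    i += 4;
    skipSpaces();
  }

  // A line number must follow.
  std::size_t digitsStart = i;
  while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i])))
    ++i;
  if (i == digitsStart) return std::nullopt;
  skipSpaces();

  // Then a double-quoted filename; \" and \\ escapes are unescaped.
  if (i >= line.size() || line[i] != '"') return std::nullopt;
  ++i;
  std::string name;
  while (i < line.size()) {
    char c = line[i++];
    if (c == '"') return name;                       // closing quote: done
    if (c == '\\' && i < line.size()) c = line[i++];  // take escaped char
    name += c;
  }
  return std::nullopt;  // unterminated string literal
}
```

Trailing linemarker flags (for example `# 1 "foo.h" 1 3`) are ignored, since only the filename is needed to name the header unit.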


Event Timeline

iains created this revision. Mar 7 2022, 2:28 AM
Herald added a project: Restricted Project. Mar 7 2022, 2:28 AM
iains published this revision for review. Mar 7 2022, 4:39 AM
Herald added a project: Restricted Project. Mar 7 2022, 4:39 AM
Herald added a subscriber: cfe-commits.
iains updated this revision to Diff 413612. Mar 7 2022, 1:17 PM

rebased

iains updated this revision to Diff 414891. Mar 12 2022, 3:21 PM

rebased.

It lacks tests. This is not NFC, right?


I am a little bit confused about the intuition. Couldn't we just import the pre-processed header? What's the problem? Could you elaborate on this?

iains added a comment. Mar 15 2022, 2:14 PM

It lacks tests. This is not NFC, right?

Right (there are tests in the next patch, which introduces the mechanism for producing the pre-processed output),
but I will find a suitable one for this.


I am a little bit confused about the intuition. Couldn't we just import the pre-processed header? What's the problem? Could you elaborate on this?

We cannot import a pre-processed file; the preprocessed file is still source code, not a CMI.

In addition, the pre-processor output for a header unit requires further pre-processing on read.
This is because a header unit actually preserves some of the pre-processor information [macro definitions] (which would normally be discarded after translation phase 4).

Header units are identifiable by a name (in common with GCC, we make the name == the path by which the header is specified) -- that name is intentionally not a legal named module name (neither is it, in general, a legal identifier).

I am a little bit confused about the intuition. Couldn't we just import the pre-processed header? What's the problem? Could you elaborate on this?

We cannot import a pre-processed file; the preprocessed file is still source code, not a CMI.

But if I understand correctly, we are able to import a pre-processed file as a header unit. Couldn't we?

For example:

// cc -E foo.h > foo.preprocessed
import "foo.preprocessed";
...

Is this not acceptable?

In addition, the pre-processor output for a header unit requires further pre-processing on read.
This is because a header unit actually preserves some of the pre-processor information [macro definitions] (which would normally be discarded after translation phase 4).

Header units are identifiable by a name (in common with GCC, we make the name == the path by which the header is specified) -- that name is intentionally not a legal named module name (neither is it, in general, a legal identifier).

So this doesn't resolve my confusion. I guess there are some workflows in which we need to import a pre-processed header as a header unit (and need to find the location of the original file). Otherwise, in what circumstances do we need to import a pre-processed header as a header unit?

iains added a comment. Mar 16 2022, 1:36 AM


So this doesn't resolve my confusion. I guess there are some workflows in which we need to import a pre-processed header as a header unit (and need to find the location of the original file). Otherwise, in what circumstances do we need to import a pre-processed header as a header unit?

A pre-processed header unit can only be used to build that header unit, just as a pre-processed C++ source file is used to build an object for the original source file.

In this code, we are not considering importing the HU into another module - that can only be done once the HU is built as a CMI.

The pre-processed output has two immediate uses I could see:
(1) traditionally, it can be helpful in debugging and problem reporting [since the included headers are flattened]
(2) because we will have processed the directives, it might be useful for scanners that want to determine remaining dependencies.
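To make use (2) concrete, here is a heuristic sketch of such a scanner. It assumes GNU-style linemarkers (`# N "file" flags`) mark each flattened header, and that `import` declarations survive pre-processing as ordinary lines ending in `;`. `scanPreprocessed` and `ScanResult` are hypothetical names; this is not a Clang API and not how any real dependency scanner is implemented.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type for an illustrative scanner.
struct ScanResult {
  std::set<std::string> flattenedFiles;  // files named by linemarkers
  std::vector<std::string> imports;      // import declarations still present
};

// Trim spaces, tabs and carriage returns from both ends.
static std::string trimWs(const std::string &s) {
  const char *ws = " \t\r";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return {};
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Heuristic only: escaped quotes in filenames and multi-line import
// declarations are not handled.
ScanResult scanPreprocessed(const std::string &text) {
  ScanResult result;
  std::istringstream in(text);
  std::string raw;
  while (std::getline(in, raw)) {
    std::string line = trimWs(raw);

    // Linemarker:  # <line> "<file>" [flags]
    if (line.size() > 2 && line.rfind("# ", 0) == 0 &&
        std::isdigit(static_cast<unsigned char>(line[2]))) {
      std::size_t open = line.find('"');
      std::size_t close =
          open == std::string::npos ? open : line.find('"', open + 1);
      if (close != std::string::npos) {
        std::string file = line.substr(open + 1, close - open - 1);
        // Skip pseudo-files such as <built-in> and <command line>.
        if (!file.empty() && file.front() != '<')
          result.flattenedFiles.insert(file);
      }
      continue;
    }

    // Remaining dependencies: [export] import <name>;
    if (line.rfind("export ", 0) == 0) line = trimWs(line.substr(7));
    if (line.rfind("import ", 0) == 0) {
      std::size_t semi = line.find(';');
      if (semi != std::string::npos)
        result.imports.push_back(trimWs(line.substr(7, semi - 7)));
    }
  }
  return result;
}
```

For example, scanning output that contains `# 1 "foo.h" 1` and `import "bar.h";` would report `foo.h` as already flattened and `"bar.h"` as a remaining dependency.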

OK, I'm fine with this if it includes tests.

iains updated this revision to Diff 416847. Mar 21 2022, 1:35 AM

rebased, added test.

Yeah, GCC's preprocessor has peeking code to deal with this (more generally than modules).

urnathan accepted this revision. Mar 22 2022, 10:31 AM
This revision is now accepted and ready to land. Mar 22 2022, 10:31 AM
iains updated this revision to Diff 418082. Mar 24 2022, 4:42 PM

rebased

This revision was automatically updated to reflect the committed changes.