Page MenuHomePhabricator

Make parseBitcodeFile use a named StructType, if it exists and matches.
Needs ReviewPublic

Authored by arnt on Apr 12 2019, 7:23 AM.



Until now, parseBitcodeFile() would use an already-existing StructType if
it EITHER had the right structure and no name, OR the right name and no
structure, but not if it had both the right name and the right structure.
This change makes parseBitcodeFile() check the struct and use it if

It still creates a new StructType with a new name if the existing type and
what it needs to read differs in structure.

It also add a new function to retrieve a named StructType by name. The new
function is a lookup-only function; it doesn't change anything at all.

Diff Detail

Event Timeline

arnt created this revision.Apr 12 2019, 7:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2019, 7:23 AM
arnt added a comment.Apr 12 2019, 7:26 AM

I added lebedev.ri and the last two people to touch the file as reviewers, I hope that's not too much of an imposition.

dnsampaio resigned from this revision.Apr 12 2019, 8:19 AM

Sorry, I don't believe I ever touched these files before.

lebedev.ri added inline comments.Apr 12 2019, 8:28 AM

Pass by non-const value




I'm not sure this is clang-formatted

  return nullptr;



sporadic newline

arnt updated this revision to Diff 194907.Apr 12 2019, 9:37 AM
arnt marked an inline comment as done.

more LLVM-y style, and better style too. Modern.

arnt marked an inline comment as done and an inline comment as not done.Apr 12 2019, 9:42 AM
arnt added a subscriber: dnsampaio.

@dnsampaio Sorry about that. I looked at my shell history now, and I think I added you because I got a wildcard wrong and picked the most recent two committers for the wrong set of files.


Oh, that's neat. I hadn't noticed that being done elsewhere, but I like it better. I'll push as soon as the unit tests have run.

(not all inline remarks were addressed)


This probably deserves a /// doxygen comment


here and elsewhere - not from clang-format
(you can just setup a git pre-commit hook so all your commits will have correct formatting)

arnt marked 4 inline comments as done.Apr 15 2019, 7:13 AM

I didn't mean to suggest that I'd done all of them, I just ran out of working time on Friday. I attended to the rest now and will push a new revision as soon as the tests have run.



There could be a lot more of them in general. Doxygen sucks.


I added the emacs integration. I'll try the git commit hook before my next pull request, thanks for the suggestion.

arnt updated this revision to Diff 195180.Apr 15 2019, 7:39 AM
arnt marked 2 inline comments as done.

Attended to the rest of Lebedev.ri's comments, plus clang-format

For reference, there is a trivial way to correctly format all the changed code:



arnt marked an inline comment as done.Apr 21 2019, 12:46 PM


FWIW I noticed that the author of the git-commit hook suggested using an emacs-lisp clang-format wrapper, and decided to try that first, until I grow unhappy.


Yes, OK, will fix, although YAGNI and so on.

arnt updated this revision to Diff 196206.Apr 23 2019, 3:53 AM

This revision uses more appropriate gtest matchers.

arnt added a comment.May 2 2019, 4:40 AM

This patch overlooks a case and should not be merged for now.

Specifically, forward references in the bitcode file cause BitcodeReader::createIdentifiedStructType(LLVMContext &) to create a type, even though the intended type might already exist in the LLVMContext. I'm not sure how to create a test that exercises this clearly and cleanly, or whether this can really be solved. Will have a coffee and a think.

JFYI, my eventual purpose is to do things like reading a class's superclass's Module into the same LLVMContext as the class's own Module.

arnt updated this revision to Diff 198472.May 7 2019, 7:55 AM

This modifies the patch to handle forward type references, and adds
relevant unit testing.

Unfortunately this is a rather large change.

arnt updated this revision to Diff 198473.May 7 2019, 8:01 AM

Whitespace changes only.

It appears that the emacs clang-format stuff is too demanding for me; I'll
try to the git precommit hook instead and see if that agrees better with

lebedev.ri resigned from this revision.Jun 21 2019, 10:57 AM
lebedev.ri added a reviewer: t.p.northover.

This patch seems to make sense but as it is all but guaranteed for LLVM patches, it is stalling and is starting to bitrot.
If you're still interested in it i'd maybe recommend posting to llvm-dev, or llvm-weekly review-wanted section

arnt updated this revision to Diff 206864.Jun 27 2019, 7:34 AM

Rebased on top of today's LLVM.

No functional changes, I just had to fix a couple of merge conflicts.

I'll continue rebasing this (I really need it myself) and I'd be happy to
see it reviewed and merged. But if noone else cares about the
functionality, then I don't think I ought to pester anyone for reviews...
so I won't, at least not in the near future.

I think the clang-format should be restricted to the actual diff. Doing the whole file makes this change really hard to read, and disrupts the blame for the future. git-clang-format in the Clang repo does the right thing by default I think.

arnt added a comment.Jun 27 2019, 8:05 AM

Sorry about that; I'll revisit. But I won't have time to do this today. FWIW the patch looked right (ie. only my lines were touched by clang-format) immediately before I ran arc diff.


arnt updated this revision to Diff 206922.Jun 27 2019, 1:39 PM

This should avoid clang-formatting the entire files. Should.

Awfully sorry about that.

t.p.northover added inline comments.Jul 2 2019, 2:32 AM

Really? I thought loading multiple bitcode files into the same LLVMContext was fine and conflicting types were automatically renamed.

Am I misreading or will this turn that situation into an error?

Sorry for the double reply, Phab ate my comment.


This looks like it owns the SmallVector so it would be a lot simpler to make it a simple value type and std::move when emplacing into the std::vector. Reduce the size of the diff substantially too, I think.

arnt marked 2 inline comments as done.Jul 24 2019, 4:39 AM

I have to type this in order to publish the comments above?


Sure, I can do that... if this is going to be merged at all. But I have a feeling that this isn't going to be merged anywway, for substantive reasons. It rocks an important boat and only I have a reason for wanting it. So I'll wait with this.


The test for "conflicting type" was not terribly sensible. It assumed that if all modules contained opaque references to the same opaque type, then the the same name meant the same type. However, if one module had a defined type and the others had opaque types, then it was a naming conflict, which was silently resolved by renaming.

The case that blew up for me involved loading the superclasses for a class into the same context. Suppose that C inherits B, which inherits A, and each has its own type and module. A's module contains one or more A-related struct types, B's and C's module contain opaque references. B's module contains, etc. Loading B and C into the same context would break, because C's references to B would be renamed, while those to A would be preserved. I struggled to find a rationale for this.

I chose to make it an error because there are cases where renaming isn't safe, and I can't see a way to detect it. Suppose modules D and E both contain defined struct types T, and module F contains an opaque T. What is intended? The existing code would behave differently depending on load order. If you load only D and E, renaming is safe. But when loading the second of D/E, the reader doesn't know whether F will appear.

t.p.northover added inline comments.Aug 2 2019, 5:53 AM

Loading B and C into the same context would break, because C's references to B would be renamed, while those to A would be preserved. I struggled to find a rationale for this.

It sounds like it'd result in weird names and inconsistent usage of each type, but not fundamentally change the semantics of the IR. I believe this is how LTO actually works.

Suppose modules D and E both contain defined struct types T, and module F contains an opaque T. What is intended?

D and E would continue to use a type structurally equivalent to their version of T, but one of them would have to give up the name (and get something like %T.1).

I don't believe there are any constraints on what happens to F: it could use yet another (still) opaque %T.2 or either of the versions from D/E. Because the type was opaque it can't be doing any operations that actually care.