This is an archive of the discontinued LLVM Phabricator instance.

[clang] Use i64 for the !srcloc metadata on asm IR nodes.
ClosedPublic

Authored by simon_tatham on Jul 6 2021, 9:26 AM.

Details

Summary

This is part of a patch series working towards the ability to make
SourceLocation into a 64-bit type to handle larger translation units.

!srcloc is generated in clang codegen, and pulled back out by llvm
functions like AsmPrinter::emitInlineAsm that need to report errors in
the inline asm. From there it goes to LLVMContext::emitError, is
stored in DiagnosticInfoInlineAsm, and ends up back in clang, at
BackendConsumer::InlineAsmDiagHandler(), which reconstitutes a true
clang::SourceLocation from the integer cookie.

Throughout this code path, it's now 64-bit rather than 32, which means
that if SourceLocation is expanded to a 64-bit type, this error report
won't lose half of the data.

The compiler will tolerate both of i32 and i64 !srcloc metadata in
input IR without faulting. Test added in llvm/MC. (The semantic
accuracy of the metadata is another matter, but I don't know of any
situation where that matters: if you're reading an IR file written by
a previous run of clang, you don't have the SourceManager that can
relate those source locations back to the original source files.)

Original version of the patch by Mikhail Maltsev.

Diff Detail

Event Timeline

simon_tatham created this revision.Jul 6 2021, 9:26 AM
simon_tatham requested review of this revision.Jul 6 2021, 9:26 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 6 2021, 9:26 AM
miyuki added a subscriber: miyuki.Jul 6 2021, 10:11 AM

Looks sensible to me, I don't think slightly expanding the size of the metadata and the diagnostic will be an issue.

Does this need an upgrade for old bitcode that has serialized 32-bit types? If not, can you explain why not?

@dexonsmith I don't think they need upgraded. Most of the places I can see referencing !srcloc are copying it around and will preserve the i32 type. Cases which actually read the value are already reading it as 64 bit via getZExtValue and have been updated here, e.g. MachineInstr::emitError

@dexonsmith I don't think they need upgraded. Most of the places I can see referencing !srcloc are copying it around and will preserve the i32 type. Cases which actually read the value are already reading it as 64 bit via getZExtValue and have been updated here, e.g. MachineInstr::emitError

Seems reasonable to me. I'd suggest documenting in the textual IR testcases for !srcloc (I assume there are existing ones for consumers, such as AsmPrinter::emitInlineAsm or something?) that i32 should continue to work, even though emission will be i64 now, and that if that changes a bitcode upgrade should be added. And maybe you can/should add examples to the test that use i64.

I looked into this yesterday, and realised that I don't actually know what the use case is for emitting !srcloc metadata in an IR file.

It's more or less ignored by llc, as far as I can see: if there's a late-breaking error in the inline asm string, you just get a "note: !srcloc = <nnn>" alongside the main error.

And if the IR file is read back in by a second invocation of clang, then the frontend callback does get the !srcloc back from the IR, but it can't relate that to the original source code, because the new clang's SourceManager doesn't know anything about the C source files that the IR was made from. In the example test I just did by hand, the error message cites a location in the IR file that corresponds to the byte position of the error in the original C, which is more or less useless.

So I think this mechanism is only actually useful when the whole compilation is happening within a single invocation of clang, so that the !srcloc finds its way back to a SourceManager that still has all the actual source code in mind. As soon as IR files are exported and re-imported, the best we can hope for is "no crash".

But I can add a couple of tests along those lines for i32 and i64 versions of !srcloc, if that's helpful.

simon_tatham edited the summary of this revision. (Show Details)

Added an i64 !srcloc to the only existing test of them I could find.

dexonsmith accepted this revision.Jul 20 2021, 2:53 PM

LGTM (one comment in the test), although it'd be good to get someone more involved in lib/CodeGen to take a quick look / sign off (ideally someone that knows the use case for !srcloc...).

llvm/test/MC/ARM/inline-asm-srcloc.ll
19–25

Please add a comment somewhere saying something along the lines of "This tests that both i32 and i64 work, as a proxy for testing that there won't be a crash if old bitcode contains the wrong type" (or anything that will prevent someone from updating the i32s to i64s as a future cleanup, losing us the test coverage)

This revision is now accepted and ready to land.Jul 20 2021, 2:53 PM

although it'd be good to get someone more involved in lib/CodeGen to take a quick look / sign off (ideally someone that knows the use case for !srcloc...).

Any idea who that might be? Looking through the logs, @lattner seems to have made all the commits that set up !srcloc in the first place, and nobody seems to have modified the mechanism much since then.

although it'd be good to get someone more involved in lib/CodeGen to take a quick look / sign off (ideally someone that knows the use case for !srcloc...).

Any idea who that might be? Looking through the logs, @lattner seems to have made all the commits that set up !srcloc in the first place, and nobody seems to have modified the mechanism much since then.

I wonder if this is even reachable anymore, given that the one testcase is inline asm (I believe the frontend catches inline asm errors now) -- I mean, CGStmt still attaches these, but maybe the backend never fires a diagnostic in practice because the frontend error checking is good enough to avoid emitting IR that's broken. (Could be useful for other frontends, maybe, though...)

Anyway, I'm comfortable signing off; feel free to commit without finding someone else (remember to add a comment to the testcase).

The usecase for this is to get good diagnostics out of the integrated assembler. Try doing something like: asm("bogus");, GCC would produce an error pointing to the generated .s file. Clang should point to the C file (including macros expanded etc).

> cat f.c

void x() {
  asm("bogus");
}
> clang f.c
f.c:3:7: error: invalid instruction mnemonic 'bogus'
  asm("bogus");
      ^
<inline asm>:1:2: note: instantiated into assembly here
        bogus
        ^~~~~
1 error generated.

@lattner, thanks for the help. In this case, the real question is whether there's any use case for !srcloc that involves writing it out into a bitcode or IR file and then having a separate instance of clang load it back in again.

I think that no such case can possibly give a useful mapping back to the original source, because IR doesn't serialise the SourceManager that knows how to turn source locations back into a useful error message.

This revision was landed with ongoing or failed builds.Jul 22 2021, 2:25 AM
This revision was automatically updated to reflect the committed changes.
simon_tatham marked an inline comment as done.

(But I've pushed this patch anyway, because I'm reasonably confident of the answer; if it turns out that there's some case along those lines that I should have taken care of, I'll fix or revert.)

@lattner, thanks for the help. In this case, the real question is whether there's any use case for !srcloc that involves writing it out into a bitcode or IR file and then having a separate instance of clang load it back in again.

I think that no such case can possibly give a useful mapping back to the original source, because IR doesn't serialise the SourceManager that knows how to turn source locations back into a useful error message.

Right, the only use case is with the entire compiler integrated together, for exactly the reason you point out.