This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen]: don't treat structures returned in registers as memory inputs
ClosedPublic

Authored by glider on Jul 24 2019, 11:17 AM.

Details

Summary

The "=r" output constraint for a structure variable passed to inline asm
shouldn't be converted to "=*r", as this changes the asm directive
semantics and prevents DSE optimizations.
Instead, preserve the constraints and return such structures as integers
of corresponding size, which are converted back to structures when
storing the result.

Fixes PR42672.

Diff Detail

Repository
rL LLVM

Event Timeline

glider created this revision.Jul 24 2019, 11:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 11:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
glider added a subscriber: vitalybuka.
efriedma added inline comments.
clang/lib/CodeGen/CGStmt.cpp
2287 ↗(On Diff #211568)

Not "=="?

2325 ↗(On Diff #211568)

Will this work if the struct is an unusual size, like sizeof(struct s) == 3 or sizeof(struct s) == 32? (3 is unlikely to show up in real code, but 32 could correspond to a vector register.)

clang/test/CodeGen/PR42672.c
2 ↗(On Diff #211568)

We usually try to avoid tests that run "-O2"; it's simpler to just check the unoptimized clang output directly.

nickdesaulniers added inline comments.
clang/lib/CodeGen/CGStmt.cpp
1989 ↗(On Diff #211568)

Are we able to use something other than std::vector<bool> here from ADT?
http://llvm.org/docs/ProgrammersManual.html#bit-storage-containers-bitvector-sparsebitvector

glider updated this revision to Diff 211919.Jul 26 2019, 5:35 AM
glider marked 2 inline comments as done.

Fixed comments from Eli and Nick, added tests for unusual struct sizes

glider marked 2 inline comments as done.Jul 26 2019, 5:37 AM
glider added inline comments.
clang/lib/CodeGen/CGStmt.cpp
1989 ↗(On Diff #211568)

I don't think std::vector<bool> is a bottleneck here, but since it might be deprecated someday let's just not use it.

2287 ↗(On Diff #211568)

Turns out ResultRegDests can be also extended by addReturnRegisterOutputs above (line 2122), so no.
I've added a comment to clarify that.

2325 ↗(On Diff #211568)

For a 3-byte struct this works as follows: a 3-byte structure is allocated on the stack with alignment 2, then the assembly returns a 4-byte integer that's written to that 3-byte structure.
This conforms to what GCC does, and I think it's legal to let the assembly write more than a structure size here, as the corresponding register size is bigger (i.e. it's a bug in the C code, not Clang).

For a 32-bit struct we crash now, whereas GCC reports an "impossible constraint in ‘asm’"
It's interesting that the crash happens in the frontend, i.e. it's somehow knows the maximum possible size for the register operand.
We can check that getContext().getIntTypeForBitwidth(Size, /*Signed*/ false) is a nonnull type to prevent the crashes.

clang/test/CodeGen/PR42672.c
50 ↗(On Diff #211919)

The acceptable size actually depends on the target platform. Not sure we can test for all of them, and we'll probably need to restrict this test to e.g. x86

glider updated this revision to Diff 211920.Jul 26 2019, 5:42 AM

Make big_struct() test triple-specific

efriedma added inline comments.Jul 26 2019, 11:47 AM
clang/test/CodeGen/PR42672.c
3 ↗(On Diff #211920)

Might as well just make all these tests use an x86 triple; I don't really want to speculate how an unknown target will react to any specific size.

40 ↗(On Diff #211920)

This isn't a three-byte struct; it's a four-byte struct where one of the bytes is only used for padding.

50 ↗(On Diff #211919)

The interesting case for a 32-byte struct would actually be something like "=x"(str)... which currently passes the clang frontend, since 32 is a legal size for that constraint (although it eventually fails in the backend).

glider marked 3 inline comments as done.Jul 29 2019, 7:07 AM
glider added inline comments.
clang/test/CodeGen/PR42672.c
40 ↗(On Diff #211920)

You're right. Making this structure packed yields an error, which conforms to GCC behavior.
I've added a test for that as well.

50 ↗(On Diff #211919)

Changing "=r" to "=X" indeed makes this particular test pass (there's nothing to fail in the backend, as we don't actually generate instructions that write to memory)
I'm however unsure adding a test for "=X" makes any difference, as our patch is irrelevant to this constraint.

glider updated this revision to Diff 212164.Jul 29 2019, 7:09 AM

Addressed Eli's comments, added a test for a packed struct

glider updated this revision to Diff 212293.Jul 30 2019, 1:38 AM

Test for =X somehow sneaked in - drop it.

+srhines
Heads up: this patch reduces the size of the following Android kernel functions:

_raw_spin_lock and _raw_spin_lock_bh: 132 bytes->68 bytes
_raw_spin_lock_irq: 136 bytes->72 bytes
_raw_spin_lock_irqsave: 144 bytes->80 bytes
_raw_spin_trylock: 156 bytes->112 bytes
_raw_spin_trylock_bh: 136 bytes->92 bytes
glider added a comment.Aug 5 2019, 4:02 AM

Eli, any other comments?

For the "=x" thing I was talking about, try the following testcase:

void a() { struct S { unsigned x[4]; } z; asm volatile("%0":"=x"(z)); }
void a2() { struct S { unsigned x[8]; } z; asm volatile("%0":"=x"(z)); }

clang trunk gives "error: couldn't allocate output register for constraint 'x'" in the backend for both functions. gcc prints "%xmm0" for the first, and rejects the second; not exactly sure why it's rejecting the second, though. It would be nice if both worked, although I guess it's okay if we print a reasonable error message. Please add a testcase, at least.

clang/test/CodeGen/x86_64-PR42672.c
3 ↗(On Diff #212293)

You can prefix a command in a RUN line with "not" if you expect it to return a non-zero error code. So you can write not %clang_cc1 [...] 2>&1 | FileCheck %s or something like that.

glider marked an inline comment as done.Aug 7 2019, 8:49 AM

For the "=x" thing I was talking about, try the following testcase:

void a() { struct S { unsigned x[4]; } z; asm volatile("%0":"=x"(z)); }
void a2() { struct S { unsigned x[8]; } z; asm volatile("%0":"=x"(z)); }

clang trunk gives "error: couldn't allocate output register for constraint 'x'" in the backend for both functions. gcc prints "%xmm0" for the first, and rejects the second; not exactly sure why it's rejecting the second, though. It would be nice if both worked, although I guess it's okay if we print a reasonable error message. Please add a testcase, at least.

Note that this is invalid assembly, as it emits a register name instead of an assembly instruction. I don't think it should work at all.
If we replace "%0" with a "nop", Clang will reject the second test, but will accept the first one.

glider updated this revision to Diff 213920.Aug 7 2019, 8:50 AM

Added test cases for =x, replaced grep with not

This revision is now accepted and ready to land.Aug 12 2019, 2:35 PM
glider updated this revision to Diff 217831.Aug 29 2019, 4:11 AM

Rebased the patch

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2019, 4:20 AM

Landed r370335, thank you!