This is an archive of the discontinued LLVM Phabricator instance.

[XCOFF][AIX] Handle TOC entries that could not be reached by positive range in small code model
ClosedPublic

Authored by jasonliu on Aug 31 2020, 8:36 AM.

Details

Summary

In small code model, AIX assembler could not deal with labels that could not be reached within the [-0x8000, 0x8000) range from TOC base.
So when generating the assembly, we would need to help the assembler by subtracting an offset from the label to keep the actual value within [-0x8000, 0x8000).

Diff Detail

Event Timeline

jasonliu created this revision.Aug 31 2020, 8:36 AM
jasonliu requested review of this revision.Aug 31 2020, 8:36 AM
jasonliu edited the summary of this revision. (Show Details)Aug 31 2020, 8:44 AM
jasonliu edited the summary of this revision. (Show Details)
Xiangling_L added inline comments.Sep 2 2020, 2:08 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
603

Can we add an assertion here to make sure EntryDistanceFromTOCBase - PositiveTOCRange do not go beyond -0x8000?

llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
39 ↗(On Diff #288953)

Since toc entry is 8 bytes under 64bit mode, does that mean we can only have no more than 8192 entries?

jasonliu added inline comments.Sep 2 2020, 2:30 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
603

EntryDistanceFromTOCBase - PositiveTOCRange is always going to be a positive number. So not sure why we would need an assertion for it to be >= 0x8000?

llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
39 ↗(On Diff #288953)

The compiler and assembler could still generate more than 8192 entries. But if you need to link it, you would need -bbigtoc for the link to be successful. Basically, if there are more than 8192 entries, we would need to rely on linker to generate branch out code to make everything work.

jasonliu added inline comments.Sep 2 2020, 2:38 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
603

Sorry, I mean >= -0x8000

Xiangling_L added inline comments.Sep 3 2020, 8:22 AM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
588

The assertion here seems a bit unnecessary, since lookUpOrCreateTOCEntry will always return us a TOCEntry.

593

s/need/needs

597

s/enties/entries;
s/could represented/could be represented or you can remove "could".

603

Sorry for the confusion, I meant Expr -TOCRange * Multiply. And I think your reply below has clarified that there is no such assertion needed.

llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
33 ↗(On Diff #288953)

nit: trailing whitespace.

jasonliu updated this revision to Diff 289743.Sep 3 2020, 9:11 AM
jasonliu marked 4 inline comments as done.

Address comments.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
588

I put the assertion here as a defensive mechanism in case the code get move around in the future and someone calls this function before lookUpOrCreateTOCEntry.

llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
1 ↗(On Diff #288953)

Noting that most tests that use the test file itself as a Python script are named with a py file extension (e.g., llvm/test/MC/COFF/bigobj.py).

jasonliu added inline comments.Sep 3 2020, 10:59 AM
llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
1 ↗(On Diff #288953)

Yeah... I noticed that.
But for some reason, when I initially use .py as file extension, it results in lit not running that test when running a folder of tests (It's fine if I only run that test) . Could be that I need to tweak some lit config file to make it not skip over? But I didn't get to the bottom of it, and used '.test' as workaround.

llvm/test/CodeGen/PowerPC/aix-overflow-toc.test
1 ↗(On Diff #288953)

Something like llvm/test/CodeGen/SystemZ/Large/lit.local.cfg would could be placed into llvm/test/CodeGen/PowerPC/. Indeed, llvm/test/MC/COFF/bigobj.py doesn't seem to run.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
590

Is this an int or is this more a ptrdiff_t?

592

This could be constexpr?

595

Since the TOC entries are multiples of 4 or 8, both the < and <= results are the same; however, the <= version matches the semantics better.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
598–607

I'd like to see a "cleanly derived" version of this for general offsets from the TOC-base virtual address. Also, this part should be pulled into another lambda (taking an input expression and associated offset) that generally handles adjustment of TOC relocation expressions.

For example, the result should be 0 when using an offset value from 0 to 32768 (inclusive) and an offset of 98303 should produce 1 instead of 2.

Part of my rationale is that it is a "code smell" that the current calculation does not work in general. Another factor is that TOC-related accesses at non-zero offsets from the beginning of the entry are meaningful. The TD storage mapping class is used to store data within the TOC (as opposed to storing a pointer to the data in the TOC). We should be able to reuse the calculation for the general-offset case when we need it.

It's probably easier to start with the comment:
We're not just keeping the offset to be within some range. We are causing the modified notional offset from the TOC base (to be encoded into the instruction's D or DS field) to be the signed 16-bit truncation of the original notional offset from the TOC base. This is consistent with the treatment used both by XL C/C++ and by AIX ld -r.

Solving for the adjustment value to subtract:
(int16_t)original_offset = original_offset - adjustment
(int16_t)original_offset + adjustment = original_offset
adjustment = original_offset - (int16_t)original_offset

But we're dealing with C++14 code, so (with the hopes that SignExtend32 is sufficiently well written):
adjustment = original_offset - llvm::SignExtend32<16>(original_offset)

jasonliu updated this revision to Diff 291029.Sep 10 2020, 10:49 AM
jasonliu marked 4 inline comments as done.

Address review comments.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
598–607

Thanks. Please let me know if my new revision does not address your concern.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
585–589

Adjust the comment to fit the context.

595

How would this relate to XCOFFObjectWriter::recordRelocation?

I notice that there's a line there:

FixedValue = SectionMap[SymASec]->Address - TOCCsects.front().Address;

Does the new expression formed here mean that we should account for it through Target.getConstant()?

609–614

Since the adjustment is now factored out, the comment doesn't quite fit. Indeed, the code would read fine without a comment if the condition was reversed to make it obvious when special handling is needed.

llvm/test/CodeGen/PowerPC/aix-overflow-toc.py
4

Indentation does not match with line 5.

jasonliu added inline comments.Sep 10 2020, 1:05 PM
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
595

From the testing I have done, there seems to be no particular action need to be taken there (except removing that report_fatal_error in there), as the value we put into the instruction could already wrap around naturally when it goes out of the range.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
595

I think we should add the Target.getConstant() because it could show up for using the TD storage mapping class or maybe for integrated assembly. The report_fatal_error enforces a constraint that the no wraparound is necessary at that stage. As in, we may want the object-writing path to require input that would also work on the assembly-writing path.

jasonliu updated this revision to Diff 291095.Sep 10 2020, 3:43 PM
jasonliu marked 2 inline comments as done.

Address comments.

jasonliu marked an inline comment as done.Sep 10 2020, 3:43 PM

LGTM with minor comments.

llvm/lib/MC/XCOFFObjectWriter.cpp
433

Update the comment to match the code change.

llvm/test/CodeGen/PowerPC/aix-overflow-toc.py
64

Minor nit: Indentation.

This revision is now accepted and ready to land.Sep 10 2020, 4:30 PM
fhahn added a subscriber: fhahn.Sep 15 2020, 1:11 AM

It appears this patch added a test that is failing with expensive checks on GreenDragon: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/17604/console. It would be great if you could take a look and revert if it takes longer to fix.

It appears this patch added a test that is failing with expensive checks on GreenDragon: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/17604/console. It would be great if you could take a look and revert if it takes longer to fix.

It says:

Timeout: Reached timeout of 600 seconds

We believe that the test is necessarily large. Has the associated bot encountered similar issues before? What was the recommended fix to retain the test in more appropriate environments?

We believe that the test is necessarily large. Has the associated bot encountered similar issues before? What was the recommended fix to retain the test in more appropriate environments?

Perhaps the test case should be restricted to some key platforms where cross-compilation to AIX is relevant? Perhaps Linux (PPC is already required).

jasonliu added a comment.EditedSep 15 2020, 7:16 AM

We believe that the test is necessarily large. Has the associated bot encountered similar issues before? What was the recommended fix to retain the test in more appropriate environments?

Perhaps the test case should be restricted to some key platforms where cross-compilation to AIX is relevant? Perhaps Linux (PPC is already required).

I could add
# REQUIRES: aix || linux
to the test if that helps.

fhahn added a comment.Sep 16 2020, 7:30 AM

We believe that the test is necessarily large. Has the associated bot encountered similar issues before? What was the recommended fix to retain the test in more appropriate environments?

Perhaps the test case should be restricted to some key platforms where cross-compilation to AIX is relevant? Perhaps Linux (PPC is already required).

I could add
# REQUIRES: aix || linux
to the test if that helps.

The problem here is with expensive checks, which makes it take too long I think. It works fine on Darwin bots without expensive checks. If it is not possible to reduce the size of the test, then we need to disable it at least for the configuration used for http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive. I am not sure if there's another way to force a big gap in PPC assembly.