This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Adjust strd/ldrd codegen alignment requirements
ClosedPublic

Authored by momo5502 on Jun 26 2023, 11:59 AM.

Details

Summary

In change https://reviews.llvm.org/D152790, it was discovered that the alignment requirement calculation for LDRD/STRD codegen was suboptimal and the calculation for volatile loads and stores was adjusted.

This change here adopts the calculation for the remaining non-volatile occurances.

Diff Detail

Event Timeline

momo5502 created this revision.Jun 26 2023, 11:59 AM
momo5502 requested review of this revision.Jun 26 2023, 11:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 11:59 AM
efriedma accepted this revision.Jun 26 2023, 12:28 PM
efriedma added a subscriber: efriedma.

LGTM

I suspect we're actually missing a check somewhere for LDRD formation post-RA. But this looks like an improvement.

(I'll commit for you.)

This revision is now accepted and ready to land.Jun 26 2023, 12:28 PM

LGTM

I suspect we're actually missing a check somewhere for LDRD formation post-RA. But this looks like an improvement.

(I'll commit for you.)

Thank you very much. Maybe I have time to check for other occurances tomorrow.

This revision was landed with ongoing or failed builds.Jul 2 2023, 2:25 PM
This revision was automatically updated to reflect the committed changes.

I've reverted this due to a failure on all of Linaro's 2 stage 32 bit Arm bots, both v7 and v8 (though both run on v8 hardware).

For example:
https://lab.llvm.org/buildbot/#/builders/178/builds/5128
https://lab.llvm.org/buildbot/#/builders/187/builds/11994
https://lab.llvm.org/buildbot/#/builders/182/builds/6724

            7: error: Simplified template DW_AT_name could not be reconstituted: 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  original: f3<unsigned char, (unsigned char)'\x00'> 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  reconstituted: f3<unsigned char, (unsigned char)'\x7f'>

I haven't dug deeper yet but if that is one byte misplaced, it could make some sense.

Do you have access to a machine where you can reproduce this? A 32 bit container running on 64 bit hardware will also work.

If you do not, I can do the investigation.

I've reverted this due to a failure on all of Linaro's 2 stage 32 bit Arm bots, both v7 and v8 (though both run on v8 hardware).

For example:
https://lab.llvm.org/buildbot/#/builders/178/builds/5128
https://lab.llvm.org/buildbot/#/builders/187/builds/11994
https://lab.llvm.org/buildbot/#/builders/182/builds/6724

            7: error: Simplified template DW_AT_name could not be reconstituted: 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  original: f3<unsigned char, (unsigned char)'\x00'> 
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  reconstituted: f3<unsigned char, (unsigned char)'\x7f'>

I haven't dug deeper yet but if that is one byte misplaced, it could make some sense.

Do you have access to a machine where you can reproduce this? A 32 bit container running on 64 bit hardware will also work.

If you do not, I can do the investigation.

I think I do have access to a machine where I can reproduce this, however I am currently very busy and won't have time to investigate probably until end of this week.
If you have time to look at it, I would be very thankful. Otherwise I can investigate this as soon as I have time.

Good to know, I'll look into it this week then. Let you know later what the status is.

So far I have narrowed it down to something in DWARFDebugFrame.cpp.o in libLLVMDebugInfoDWARF.a. You can confirm this by using ar x to get the .o from a working build, then patching it into the broken build with ar r (interestingly, this object seems to have fewer stdrd/ldrd after this change).

From there I am not sure how to narrow it down. If I could set an strd/ldrd alignment per function that would help but as far as I can see strict alignment applies per object. One could split the file into two and gradually move functions from one to the other I suppose.

And I should say that I don't think the alignment here is wrong. If it were we would have an exception not a test failure. More likely that we are now emitting ldrd/strd in a new situation and there is a bug elsewhere in handling that situation.

One could split the file into two and gradually move functions from one to the other I suppose.

opt-bisect-limit?

@DavidSpickett do you have instructions for me on how to setup everything to reproduce the problem.
So I can support you with the investigations.

I recommend following what the armv8-a builder does: https://lab.llvm.org/buildbot/#/builders/178/builds/5155. If you click on the "cmake stage..." steps it will show you the commands used. You can cut out some of the enabled projects and/or targets if you want to save some time.

We are running on a 64 bit Arm machine with a 32 bit docker container.

In case the hardware is a factor (I hope not) it is a Ampere Mt. Jade with 160x Neoverse-N1 (this information can also be found in the "worker" tab of the buildbot web UI).

$ arch
armv8l
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.6 LTS
Release:        20.04
Codename:       focal

If you really want the exact container you can use https://git.linaro.org/ci/dockerfiles.git/tree/focal-armhf-tcwg-base/focal-armhf-tcwg-llvmbot/Dockerfile but if you do, change the entrypoint. As by default it's looking to launch a buildbot agent.

You're probably fine using the official Ubuntu Focal container and installing things as needed.

At the time of the build the host compiler was clang 15.0.0, using the release https://github.com/llvm/llvm-project/releases/download/llvmorg-15.0.0/clang+llvm-15.0.0-armv7a-linux-gnueabihf.tar.xz. For unrelated reasons, I'm about to update the bots to use 16.0.4, but given that this is a stage 2 failure it shouldn't make a difference.

It was useful for me to build 2 copies, one with this change and one without. That was how I was able to narrow it to llvm-dwarfdump because using the dwarfdump from the working copy passed the test.

To run only the failing test:

./bin/llvm-lit ../llvm-project/llvm/test/tools/llvm-dwarfdump/X86/simplified-template-names.s -a

I managed to reproduce this and isolate the function responsible for the triggering the bug. It's llvm::format_object<long long>::snprint(char*, unsigned int) const. I haven't dug deeper yet, but that's at least some progress :D

Great. @luporl is on bots for us this week, so they can help you if you need more info.

momo5502 added a comment.EditedJul 10 2023, 10:41 AM

Here is the IR code of that specific function. The first load has an align of 4, which used to be excluded by the load optimization before this commit.

; Function Attrs: mustprogress nounwind uwtable(sync)
define linkonce_odr hidden noundef i32 @_ZNK4llvm13format_objectIJxEE7snprintEPcj(ptr noundef nonnull align 8 dereferenceable(16) %this, ptr noundef %Buffer, i32 noundef %BufferSize) unnamed_addr #6 comdat align 2 {
entry:
  %Fmt.i = getelementptr inbounds %"class.llvm::format_object_base", ptr %this, i32 0, i32 1
  %0 = load ptr, ptr %Fmt.i, align 4, !tbaa !63
  %Vals.i = getelementptr inbounds %"class.llvm::format_object.75", ptr %this, i32 0, i32 1
  %1 = load i64, ptr %Vals.i, align 8, !tbaa !44
  %call2.i = tail call i32 (ptr, i32, ptr, ...) @snprintf(ptr noundef %Buffer, i32 noundef %BufferSize, ptr noundef %0, i64 noundef %1) #26
  ret i32 %call2.i
}

The diff of lowering that load with the optimization (on the left) and without the optimization from this commit (on the right) is attached.

That code diff doesn't look like it should have any effect. My best guess is that there's an uninitialized variable somewhere else, and the push/pop makes it have a different value.

I think I figured it out:

There are 4 arguments to snprintf here: Buffer, BufferSize, Format, Value. Value being the long long to be formatted as unsigned char.
It seems that the calling convention snprintf expects is all values in registers, so r0, r1, r2, r3. Meaning it expects the Value to be in r3.

However, the code that is generated in this function prepares the arguments like this: r0, r1, stack, stack, so the last two arguments are on the stack. I assume this is what we generate if the function is a vararg.
That seems to be an ABI incompatibility. I'm not sure what the correct calling convention is, but it seems like we generate what is mentioned here: https://github.com/NationalSecurityAgency/ghidra/issues/3927

A bit of context as to why it used to work:
The value to be formatted is passed to the function is the value of [r0+#12]. It happens to be 0.
The correct code should move that into r3 for the call to snprintf, however, it is instead pushed onto the stack. By chance, r3 was overwritten with a different 0 value in the old case, so it was still 'correct'. With my change active, it is now being overwriten by the third argument passed to our function, r2, which happens to be 0x7F.

I'm not an ARM expert and I'm again not sure what the right calling convention for varargs is, but that is for sure the cause of this issue.

It seems that the calling convention snprintf expects is all values in registers, so r0, r1, r2, r3. Meaning it expects the Value to be in r3.

A "long long" can't fit into r3.

The normal calling convention rules (https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#parameter-passing) require skipping r3 and storing the "long long" value on the stack. So snprintf should ignore the value in r3 unless we're passing in a bad format string. (Which is possible, I guess... it looks like the "llvm::format" function isn't marked with a format attribute, so the format string isn't checked anywhere.)

It seems that the calling convention snprintf expects is all values in registers, so r0, r1, r2, r3. Meaning it expects the Value to be in r3.

A "long long" can't fit into r3.

The normal calling convention rules (https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#parameter-passing) require skipping r3 and storing the "long long" value on the stack. So snprintf should ignore the value in r3 unless we're passing in a bad format string. (Which is possible, I guess... it looks like the "llvm::format" function isn't marked with a format attribute, so the format string isn't checked anywhere.)

Oh you are absolutely right, I overlooked that :D

The first 3 arguments are passed in r0, r1, r2 and only the value is passed on the stack as two 4 byte blocks.
snprintf seems to expect them in r3 and r4 then.

As the format string truncates the value, only the lower byte is relevant, which is probably the reason I overlooked it.

The way you describe it sounds like a bug in snprintf then, doesn't it?
It seems that gcc also puts them on the stack: https://godbolt.org/z/h1oY1o3vc

I used the Dockerfile @DavidSpickett linked. Does it contain a custom variant of the C-runtime? So could it be an issue only with that specific version of snprintf?

snprintf(buf, bufsize, "%d", 1LL) has undefined behavior.

momo5502 added a comment.EditedJul 11 2023, 12:20 PM

snprintf(buf, bufsize, "%d", 1LL) has undefined behavior.

Jup. that's it.

All these formats should be something along the lines of %llx not %x: https://github.com/llvm/llvm-project/blob/89fab98e884f05076bbd420d95b5de3596f5452c/llvm/lib/DebugInfo/DWARF/DWARFTypePrinter.cpp#L427
Changing that fixes the issue.

Thanks for your help :D I will check if there are more occurances and prepare a proper fix tomorrow.

I thought about static_cast'ing Val instead of changing the format string. In case the underlying type of Val changes in the future, one would otherwise have to adapt the format string as well.
Or is there a way to emit a compiler warning if the format string mismatches? You mentioned marking llvm::format with a format attribute, @eli.friedman? Does that trigger a warning?

llvm::format currently doesn't have a format attribute, but we might be able to add it; see D112579.

Or you can switch the code to llvm::formatv, which doesn't have this kind of issue.