This is an archive of the discontinued LLVM Phabricator instance.

[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup
ClosedPublic

Authored by peter.smith on Mar 27 2018, 6:33 AM.

Details

Summary

On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation.

This addresses part of PR36542 where each function has the cpu and target features it was compiled with in its attributes, these features included support for Thumb2, whereas the default cpu passed to the lto plugin did not. As the decision on whether to relax was using the default cpu, a branch, used for a tail call by a Thumb2 function did not get relaxed to a wide branch.

I don't think that there is an alternative to making the relaxation decision based on per function information as if we use some kind of aggregate MCSubtargetInfo determined from a combination of all the Functions and the module we run into two self contradictory use-cases

  • I want to do runtime selection of a specific function, we should use the least capable target feature set for a per module relaxation decision.
  • I want to run on a specific target architecture, we should use the target features of that arch/cpu for a per module relaxation decision.

As we can't tell which of these use-cases I think we should make the decision per function.

Diff Detail

Event Timeline

peter.smith created this revision.Mar 27 2018, 6:33 AM
fhahn added a subscriber: fhahn.Mar 27 2018, 8:13 AM

Thanks Peter! I think using the function attributes here makes a lot of sense.

Currently, ARMAsmBackend still has a STI member variable, which gets initialized when constructing the backend. At most places, the current STI passed in, but at some places (e.g. adjustFixupValue, line 592) still use the member variable. Are there use-cases where the initial STI contains the correct features, but the current STI does not? I can't think of any off the top of my head and to avoid confusion over which STI is used, it might make sense to get rid of the member variable?

Thanks for pointing that out. I should check each use of the member to see if the information can be used incorrectly. I think it would be good if we can remove the MCSubtargetInfo, I have my suspicions that it might be used from Assembly and for BuildAttributes generation (on Arm), again some more checking is needed.

I think ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup also need to use a passed-in STI rather than one stored on the ARMAsmBackend.

I think ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup also need to use a passed-in STI rather than one stored on the ARMAsmBackend.

Agreed. I'm reasonably certain the cached one should just be removed.

nhaehnle removed a subscriber: nhaehnle.Apr 2 2018, 12:26 AM

Thanks for the comments and apologies for the delayed response (Easter break). The patch here is somewhat fortunate in that the MCRelaxableFragment already has a reference to the MCSubtargetInfo. The other instances ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup may be called on other Fragments that don't have the MCSubtargetInfo member such as MCAlignFragment and MCDataFragment. I'll do some investigation to see what the best options are.

I've done a bit of thinking about how best to get the necessary information to ARMAsmBackend::writeNopData and ARMAsmBackend::applyFixup. I made an attempt at passing MCSubtargetInfo through the necessary layers so that STI could be removed. I think that this is impractical for writeNopData, although possible for applyFixup. What I propose to do:

  • The writeNopData() currently checks to see if the Architectural NOP instruction is supported, based on the MCSubtargetInfo in ARMAsmBackend. I think that an EmitAssemblerFlag that is used in the same way as the flag to switch from Thumb to Arm can toggle the support for the Architectural NOP without having to do large amounts of source code changes, most of them outside the ARM backend.
  • All the fixups that test the MCSubTargetInfo for target features are for instruction fixups, these are covered by the MCRelaxableFragment type that already has STI embedded in it. Rather than adding STI to the Data Fragments with relaxations I propose to pass a pointer to STI that will be null for Data Fragments.

Rather than adding STI to the Data Fragments with relaxations I propose to pass a pointer to STI that will be null for Data Fragments.

This seems fine.

I think that an EmitAssemblerFlag that is used in the same way as the flag to switch from Thumb to Arm can toggle the support for the Architectural NOP without having to do large amounts of source code changes, most of them outside the ARM backend.

This is really hacky... the fragments in question should really have an STI instead. Plus there are actually separate features; whether NOP is supported in ARM mode (which is what hasNOP queries), and whether NOP is supported in Thumb mode (i.e. whether Thumb2 is supported). Plus other architectures have similar problems (e.g. x86 changes the generated NOPs based on the target). If you're not going to fix it properly, please just leave it as-is for now, with a FIXME.

I think that an EmitAssemblerFlag that is used in the same way as the flag to switch from Thumb to Arm can toggle the support for the Architectural NOP without having to do large amounts of source code changes, most of them outside the ARM backend.

This is really hacky... the fragments in question should really have an STI instead. Plus there are actually separate features; whether NOP is supported in ARM mode (which is what hasNOP queries), and whether NOP is supported in Thumb mode (i.e. whether Thumb2 is supported). Plus other architectures have similar problems (e.g. x86 changes the generated NOPs based on the target). If you're not going to fix it properly, please just leave it as-is for now, with a FIXME.

Ok; I'd like to fix the problem as it would allow us to get rid of the STI in the ARM/X86 AsmBackend class. I think it would be better off as a separate patch as I don't have a good answer for how to achieve this right now. The main difficulty is that Nops can be written as padding for some fragments such as MCAlignFragment created by EmitCodeAlignment(). There are places where EmitCodeAlignment() is called where some calculation is needed to work out what the STI should be, for example the Constant Island emission, or the STI is not easily available such as parseDirectiveAlign(). These calculations might also need other MCFragments to record the STI so we can walk backwards to find the last STI. I think that these are soluble problems but I'm still thinking that I'm missing something simpler. I'd be very grateful if you have any ideas or suggestions?

For the nop thing, fundamentally, I don't think there's any simpler approach. We need to annotate each MCAlignFragment created by MCObjectStreamer::EmitCodeAlignment with the correct STI, so we can figure out the correct nops later. We currently throw away that information, and there isn't any way to recover it later.

Consider something like the following; there are two alignment fragments, which each need a different STI.

.arch armv6
foo:
mov r1, r0
nop
.p2align 4
bx lr
.arch armv7-a
bar:
mov r1, r0
nop
.p2align 4
bx lr

I just tried binutils as for ARM, and it has an amusing bug: the nops used for all alignment directives in a file is based on the last ".arch" directive in that file.

peter.smith retitled this revision from [MC] Pass MCSubtargetInfo through to fixupNeedsRelaxation to [MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup.
peter.smith edited the summary of this revision. (Show Details)

I've updated the diff, description and title to include passing through MCSubtargetInfo to applyFixup() as well. I think that it is worth combining these changes into one as it is possible to introduce a code generation fault with just fixupNeedsRelaxation() receiving the STI.

I found out that it is possible for MCDataFragments to contain instructions with fixups so I've moved up the MCSubtargetInfo STI member from MCRelaxedFragment. To make sure that an STI is present when hasInstructions() is true, I've modified setHasInstructions to take an STI. This flushed out a few streamers that were not setting setHasInstructions() for MCDataFragments that contained instructions with fixups.

I note that the per fragment use of STI in the ARMAsmBackend::applyFixup() seems at best an optimisation and at worst redundant. As I understand it, each use is just skipping the out of range error message check, which we would never see for those subtargets because the instruction would have previously been relaxed. I can remove the tests that check STI->getFeatureBits() and all the tests still pass, I could be missing something though. I've added an extra test case that will silently generate incorrect code (skips an out of range immediate check) if STI is passed to fixupNeedsRelaxation() but not applyFixup().

I've not covered writeNopData() in this patch as I think that it would be best covered by a separate patch or patches.

efriedma added inline comments.Apr 9 2018, 2:48 PM
include/llvm/MC/MCAsmBackend.h
83

This comment needs to be updated?

lib/MC/MCELFStreamer.cpp
90

Is it possible for these two subtargets to be different?

Thanks for the comments. I think your second one brings up a couple of wider problems that I'll need to address:

  • getOrCreateDataFragment() needs to start a new fragment when the original contains instructions and the new one has a different Subtarget. Otherwise the later Subtarget will overwrite the first, I don't think that this makes any practical difference for the MCAsmBackend functions changed in this patch, however it will do for the WriteNops().
  • For instruction bundling in cases where the previous fragment must be used then it must either have the same Subtarget, or we permit the later Subtarget to overwrite the first.

I'm going to be away from the office for the next 7 days for a couple of conferences (including Euro LLVM) so I'll get back to this then.

I've updated the diff to account for instruction bundling. When an instruction must be put in the same fragment for the purposes of bundling we now check that the MCSubtargetInfo for the fragment is consistent. I've added some tests to check that we detect an attempt to change subtarget within a bundle.

My understanding is that instruction bundling was implemented for Arm and X86 NaCl targets https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox . Given that NaCl on Arm is only supported on Arm V7 ARM only I think it is very unlikely that anyone has attempted to do this in production. On X86 I think that switching between 32-bit/64-bit mode within a bundle is also highly unlikely.

I'm not sure I completely follow the bundle-related changes; adding Petr Hosek as a reviewer.

Otherwise LGTM, but I'd feel more comfortable if this got a second review.

Thanks for the review. I'll wait a bit to see if there is a second opinion, I too would value more input on instruction bundling as I've not used it myself. I've completed a set of patches that will pass through the subtarget to writeNops() and remove STI from the X86 and ARM AsmBackends, these have some further reorganisation of the position of instruction bundling inside the MCFragment hierarchy. I'll add these as dependent reviews to this one and link them in a comment.

I've added reviews for the support of writeNops(). My apologies for the size of the changes, the first three make sure the STI is recorded in the fragment so that we can retrieve it when calling writeNops. The final one passes STI through to writeNops and removes STI from the X86 and ARM AsmBackends.
1 [MC] Move bundling and MCSubtargetInfo to MCEncodedFragment [NFC] D45959
2 [MC] Add MCSubtargetInfo to MCPaddingFragment [NFC] D45960
3 [MC] Add MCSubtargetInfo to MCAlignFragment [NFC] D45961
4 [MC] Use local MCSubtargetInfo in writeNops D45962

I've not had any more comments this week on the approach to instruction bundling. Would anyone have any objections to committing this next week as this will fix pr36542, and will at least not make the writeNops case any worse?

@phosek Are you planning to look at this?

I've rebased the patch against current trunk. The littleEndian parameter was removed from applyFixup, NFC for the patch.

Taking a quick glance through this I wonder if adding the subtarget to the fragment is necessary since we've got it at encodeInstruction time? I haven't done a thorough look, but I'm concerned we're missing something here.

Thoughts? Also one similar sort of inline comment.

lib/Target/ARM/MCTargetDesc/ARMAsmBackend.h
22–25

I think we should just rely on a passed in STI wherever necessary and only store module/object level caches here. Also applies to isThumbMode below :)

Thoughts?

Thanks very much for the comments. Although the STI is available at the time the instruction is encoded; due to relaxation being a separate pass resolved after all instructions have been emitted we don't have the STI to pass to applyFixups() as all we have is a list of fragments. For example the MCRelaxableFragment has to store the STI as it might need to re-encode the instruction during relaxation. Thinking about some possible alternatives:

  • Make another pass over the input during relaxation (and at other times we need the STI, such as emitting NOPS in alignment padding), so that we know what STI is in effect at the time. I think that this would be a much larger change, in effect making the assembler 2 pass.
  • Maintain some kind of mapping table between fragment and STI. I think that this is logically equivalent to storing the STI in the fragment though.
  • Store the STI per MCInst, this seems wasteful and I don't think it would work for writeNops() as these come from alignment and padding fragments.

In short I've not been able to come up with anything better. If I have missed something and there is a better alternative I'd be happy to try implementing it.

lib/Target/ARM/MCTargetDesc/ARMAsmBackend.h
22–25

I've got some follow up patches that do this for STI; the other use is in writeNops() . Unfortunately the patches needed to do this are substantially larger and more complicated than this one. Given that writeNops() matters less for correctness than this one I thought it better to do in separate patches. I've not yet looked at isThumbMode but my intuition is that it should also be possible to do as a follow up patch.

Repeating the comment from earlier in the history:

I've added reviews for the support of writeNops(). My apologies for the size of the changes, the first three make sure the STI is recorded in the fragment so that we can retrieve it when calling writeNops. The final one passes STI through to writeNops and removes STI from the X86 and ARM AsmBackends.
1 [MC] Move bundling and MCSubtargetInfo to MCEncodedFragment [NFC] D45959
2 [MC] Add MCSubtargetInfo to MCPaddingFragment [NFC] D45960
3 [MC] Add MCSubtargetInfo to MCAlignFragment [NFC] D45961
4 [MC] Use local MCSubtargetInfo in writeNops D45962

Ping. Is anyone willing to approve or has any thoughts on a better design?

If I can summarise so far:

  • Each MachineFunction can have its own Subtarget.
  • On targets such as Arm, and I think also Mips, it is possible to change the Subtarget on the fly in assembler with a .arch directive.
  • On Arm, relaxation and fixup calculation decisions can be affected by the Subtarget. PR36542 shows a real example with LTO where a default CPU of ARM7TDMI (Architecture v4t) was passed to MC, yet the MachineFunctions were generated using a much more recent Armv7A, this caused a fixup error as the code-generator was expecting the 2-byte Thumb1 Branch to be relaxed to a Thumb2 4-byte branch by MC.
  • A legitimate use case of .arch is to do runtime selection of architecture so it is not appropriate to just pick either the lowest or highest Subtarget that we find in the Module.
  • At the time we do relaxation and fixups (after layout) we don't know what the current active Subtarget is so we must have a way of recovering the Subtarget for relaxations and fixups.
  • This patch extends the example of MCRelaxableFragment and stores the Subtarget in the MCDataFragment.
  • There is at most one Subtarget per MCDataFragment so we create a new MCDataFragment when the Subtarget changes. This causes some complications with the AlignedBundling and RelaxAll used by NativeClient as all the instructions in a Bundle must be within the same fragment. This patch gives an error message if the Subtarget is changed within a bundle (possible in assembly only). A possible more complex alternative is to accept multiple Subtargets per Fragment but given the restrictions on target of NativeClient (ARM v7A instructions only) and deprecation of NativeClient it is extremely unlikely that this case will need to be handled.
  • On both x86 and Arm the writeNops() function is also affected but is not handled by this patch due to the size of the change. There are a series of follow up patches (see earlier comments) for that.

At this stage I can't think of a logical alternative to making a mapping between Fragment and Subtarget. At present the Subtarget is stored in the fragment itself, it could be possible to maintain an external mapping table but I'm not sure whether that would be an improvement as it would need to be threaded through to where it is needed.

echristo accepted this revision.Jun 6 2018, 1:27 AM

Right now I don't have a better design without quite a bit more in depth look. That said, this gets us through a correctness issue and can be revisited in the future so let's go with it.

Thanks for the patience.

-eric

This revision is now accepted and ready to land.Jun 6 2018, 1:27 AM
This revision was automatically updated to reflect the committed changes.