Page MenuHomePhabricator

CodeGen: Don't lazily construct MachineFunctionInfo
Needs ReviewPublic

Authored by arsenm on May 19 2020, 2:39 PM.

Details

Summary

This fixes what I consider to be an API flaw I've tripped over
multiple times. The point this is constructed isn't well defined, so
depending on where this is first called, you can conclude different
information based on the MachineFunction. For example, the AMDGPU
implementation inspected the MachineFrameInfo on construction for the
stack objects and if the frame has calls. This kind of worked in
SelectionDAG which visited all allocas up front, but broke in
GlobalISel which hasn't visited any of the IR when arguments are
lowered.

I've run into similar problems before with the MIR parser and trying
to make use of other MachineFunction fields, so I think it's best to
just categorically disallow dependency on the MachineFunction state in
the constructor and to always construct this at the same time as the
MachineFunction itself.

A missing feature I still could use is a way to access an custom
analysis pass on the IR here.

I'll update the rest of the targets after initial comments

Diff Detail

Event Timeline

arsenm created this revision.May 19 2020, 2:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2020, 2:39 PM
ychen added a subscriber: ychen.May 19 2020, 5:24 PM

I'm not comfortable accepting this as I'm not familiar with the code, but it LGTM. The pattern of constructing the MachineFunctionInfo lazily on the first call to getInfo seems very non-obvious; it isn't implied by the name and it isn't described in the doc comment. As you note it is also particularly dangerous if it depends on values which are mutable over the life of the MachineFunction.

Is there no reasonable default MFI we can create?

Is there no reasonable default MFI we can create?

The point of it is to carry target specific information, so I don't think anything would make sense to be there by default.

I think there are a bunch of APIs that have this issue, so I think the general direction is good. I don't expect this to have any memory/compile-time impact, since probably even the smallest input will create the MachineFunctionInfo anyway. Was this the initial intention of having it build lazily?

So you should probably go ahead, I doubt anyone has any objections after all this time.

Nit: maybe commit the changes from AMDGPUMachineFunction.cpp separately

kparzysz added a comment.EditedJun 22 2020, 1:54 PM

A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the createMachineFunctionInfo returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members:

virtual MachineFunctionInfo *
createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                          const TargetSubtargetInfo *STI) const {
  static MachineFunctionInfo default;
  return &default;
}

Edit: fix indentation in the code

A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the createMachineFunctionInfo returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members:

virtual MachineFunctionInfo *
createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                          const TargetSubtargetInfo *STI) const {
  static MachineFunctionInfo default;
  return &default;
}

Edit: fix indentation in the code

Alternatively, make it pure virtual or make it llvm_unreachable?

arsenm updated this revision to Diff 274620.Jun 30 2020, 2:46 PM
arsenm retitled this revision from WIP: CodeGen: Don't lazily construct MachineFunctionInfo to CodeGen: Don't lazily construct MachineFunctionInfo.

Updated rest of the targets

A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the createMachineFunctionInfo returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members:

virtual MachineFunctionInfo *
createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                          const TargetSubtargetInfo *STI) const {
  static MachineFunctionInfo default;
  return &default;
}

Edit: fix indentation in the code

I did try to do this, but it didn't work out. It introduces a dependence on CodeGen from Target. I don't think anywhere else uses this pattern to avoid overrides, and just returns null. I don't think this is really an issue in practice, since this is only ever called from target code

A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the createMachineFunctionInfo returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members:

virtual MachineFunctionInfo *
createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                          const TargetSubtargetInfo *STI) const {
  static MachineFunctionInfo default;
  return &default;
}

Edit: fix indentation in the code

Alternatively, make it pure virtual or make it llvm_unreachable?

This is ok with me.

A "create" function shouldn't return a null pointer. I think there should be a default MFI that has nothing in it, and it would be what the createMachineFunctionInfo returns a pointer to in the absence of overrides. It could even be statically allocated since is has no members:

virtual MachineFunctionInfo *
createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                          const TargetSubtargetInfo *STI) const {
  static MachineFunctionInfo default;
  return &default;
}

Edit: fix indentation in the code

Alternatively, make it pure virtual or make it llvm_unreachable?

This is ok with me.

As I mentioned in a previous comment, I think this ended up not working well. The dummy implementation I believe introduces build layer violations, and the pure virtual added more boilerplate to a couple of targets. Since this is only ever accessed from target code, I don't think there's much danger in using null here

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

What state in the MF is valid for the MFI to rely on at that well-defined point? If the MF is mutable, isn't the MFI potentially stale the moment the MF is mutated?

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

You can't really do anything with the empty MachineFunction on construction. At the construction point, there isn't any information to access and it would always be a mistake to inspect it or make any construction decisions based on it (it would also be bad for the constructor to modify the MachineFunction).

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

You can't really do anything with the empty MachineFunction on construction. At the construction point, there isn't any information to access and it would always be a mistake to inspect it or make any construction decisions based on it (it would also be bad for the constructor to modify the MachineFunction).

MFI is mutable, and so referring back to the MachineFunction as part of the mutation is potentially something reasonable to do. So the point isn't to use data from the MachineFunction during construction of the MFI, but rather to allow the "child" object to have a backlink to its "parent".

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

You can't really do anything with the empty MachineFunction on construction. At the construction point, there isn't any information to access and it would always be a mistake to inspect it or make any construction decisions based on it (it would also be bad for the constructor to modify the MachineFunction).

MFI is mutable, and so referring back to the MachineFunction as part of the mutation is potentially something reasonable to do. So the point isn't to use data from the MachineFunction during construction of the MFI, but rather to allow the "child" object to have a backlink to its "parent".

As in, the bug is for an MFI implementation to "cache" properties of the MF (e.g. AMDGPU taking a snapshot of MF properties in the MFI constructor), because the MF is mutable and there is no mechanism to invalidate the cache?

Do you really need to change the signature of MachineFunctionInfo::create? That seems orthogonal to the problem you're trying to solve, or perhaps a redundant solution.

The problem is: due to lazy construction and being able to refer to the MachineFunction, the state of the MachineFunctionInfo can have surprising accidental differences depending on when exactly it was created. There are two possible solutions:

  1. Prevent the MachineFunctionInfo from accessing the MachineFunction.
  2. Always create the MachineFunctionInfo in the same place.

This change does both, but either solution should be sufficient, and I personally find #1 counter-intuitive: it seems to me that the additional target-specific info about an object should be allowed to access that object.

You can't really do anything with the empty MachineFunction on construction. At the construction point, there isn't any information to access and it would always be a mistake to inspect it or make any construction decisions based on it (it would also be bad for the constructor to modify the MachineFunction).

MFI is mutable, and so referring back to the MachineFunction as part of the mutation is potentially something reasonable to do. So the point isn't to use data from the MachineFunction during construction of the MFI, but rather to allow the "child" object to have a backlink to its "parent".

But you can just as well pass in the machine function in these cases. I don't think the minor convenience of not passing in the function is worth the footgun this gives you