Page MenuHomePhabricator

[globalisel][legalizer] Attempt to write down the minimal legalization rules
ClosedPublic

Authored by dsanders on May 24 2019, 1:33 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

dsanders created this revision.May 24 2019, 1:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 24 2019, 1:33 PM
arsenm added inline comments.May 24 2019, 1:45 PM
llvm/docs/GlobalISel.rst
456–457 ↗(On Diff #201323)

I don't follow this. How do you know they can be lowered to target instructions?

464–465 ↗(On Diff #201323)

This ignores the existence of non-integral pointers

arsenm added inline comments.May 24 2019, 1:47 PM
llvm/docs/GlobalISel.rst
464–465 ↗(On Diff #201323)

This should also probably say "selected" to copy, since IIRC COPY is not allowed to convert the type with generic virtual registers

paquette added inline comments.May 24 2019, 2:27 PM
llvm/docs/GlobalISel.rst
405 ↗(On Diff #201323)

"GMIR" is used elsewhere in the document, probably best to use that instead of "gMIR"

408–413 ↗(On Diff #201323)

Is it possible to use the fancy definition list thing here, or use some formatting to separate Producer Type Set/Consumer Type Set from the rest of the paragraph? (e.g. bold or italics)

It's not terribly, long, but if I was skimming for the definitions, I think it would help.

420 ↗(On Diff #201323)

four

426 ↗(On Diff #201323)

Since the point of the section is to get the minimum rule set across I think this flow makes a bit more sense:

  1. G_ANYEXT and G_TRUNC must have legalization rules
  2. Introduce the other instructions
  3. Explanations for why you don't have mandatory rules for the other instructions

I think it would be good to have the mandatory instructions in a list rather than a paragraph too, for the sake of skimmability.

450 ↗(On Diff #201323)

"LLVM IR" is more idiomatic, I think

dsanders marked 3 inline comments as done.May 24 2019, 3:51 PM
dsanders added inline comments.
llvm/docs/GlobalISel.rst
426 ↗(On Diff #201323)

That makes sense. I'll re-work it to that

456–457 ↗(On Diff #201323)

I'm trying to convey that even if the target doesn't use scalarize() to eliminate the vector, there's multiple choices on how to handle G_BUILD_VECTOR and no requirement to pick a specific method. In particular there's no requirement for the end point of legalization to be a .legal*() rule that marks the G_BUILD_VECTOR legal. An alternative is .custom*() that replaces it with a sequence of target instructions. The overall effect is that there aren't any legal G_BUILD_VECTOR's but vectors are still supported.

I guess something like this would be clearer:

There are no legality requirements for G_BUILD_VECTOR, or G_BUILD_VECTOR_TRUNC since these can be scalarized, lowered to other instructions, or lowered to target instructions.
464–465 ↗(On Diff #201323)

This ignores the existence of non-integral pointers

How are those represented in gMIR? I haven't had to deal with them

This should also probably say "selected" to copy, since IIRC COPY is not allowed to convert the type with generic virtual registers

That's a good point, it needs to be constrained to a register class otherwise the types have to match.

dsanders updated this revision to Diff 201379.May 24 2019, 7:20 PM
dsanders marked 7 inline comments as done.

Update to account for review comments so far except for the non-integral
pointers one for which I need more information.

Producer/Consumer type set and minimum rules for scalars are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case.
Could we define Producer/Consumer type set in terms of available sizes in register banks?
That is Producer/Consumer type set for G_ANYEXT, G_SEXT, G_ZEXT, G_TRUNC corresponds to available sizes in GPR register bank (integer instructions).
That is Producer/Consumer type set for G_FPEXT, G_FPTRUNC corresponds to available sizes in FPR register bank (floating point instructions).

This is the problem with GMIR in general since type (integer/float) is not available in low level type, but in most cases it is possible to figure out the type from opcode (float opcodes start with G_F , integer start with G_).
There are also ambiguous opcodes that are used for both integer and floating point instructions assuming they are available for same sizes (G_PHI, G_LOAD, G_STORE, G_IMPLICIT_DEF).

llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

Here we are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case.
e.g. "-mtriple=mipsel-linux-gnu" "-mtriple=i386-linux-gnu -mattr=+sse2"(http://lists.llvm.org/pipermail/llvm-dev/2017-July/114930.html) have i32, float/double available but i64 is not available.
G_FPTRUNC is legal for {s32, s64} but G_TRUNC is not (for -mtriple=mipsel-linux-gnu I would say that it is unsupported since there are no register classes that hold less then 32 bits, and would expect artifact combiner to deal with G_TRUNC).

Hi Daniel,

Thanks for writing this down. I think the description is accurate (modulo the bit cast stuff) of what we do today.
That said, the vision (that we don't implement currently) for the legalizer was actually much simpler:

  • The target needs to provide a way to legalize exts from producer/consumer sets.
  • The target needs to support loading and storing from producer/consumer sets.

From there, supporting truncates, merge/unmerge, build vector and so on is a matter of issuing the right loads and stores. That's not very efficient thus we encourage targets to do what you wrote down, but the idea was that it is more an optimization than anything else.

Cheers,
-Quentin

llvm/docs/GlobalISel.rst
455 ↗(On Diff #201379)

Just a remark.
Bitcasts can actually be tricky to lower (from langref: The conversion is done as if the value had been stored to memory and read back as type ty2) because it can shuffle bits around.
Therefore we should add that if the bistcast for the dst and src type is not a no-op, then the target should know how to expand it.

arsenm added inline comments.May 27 2019, 5:50 PM
llvm/docs/GlobalISel.rst
464–465 ↗(On Diff #201323)

They're the same as any other, it's just a field in the datalayout. The target defines these, so some set of operations on them must be legal. G_PTRTOINT/G_INTTOPTR is not allowed for them

dsanders marked 2 inline comments as done.Jun 17 2019, 1:26 PM

Producer/Consumer type set and minimum rules for scalars are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case.

Scalars are a bag of bits that do not have an associated floating-point/integer interpretation. That interpretation is provided by the operation

Could we define Producer/Consumer type set in terms of available sizes in register banks?
That is Producer/Consumer type set for G_ANYEXT, G_SEXT, G_ZEXT, G_TRUNC corresponds to available sizes in GPR register bank (integer instructions).
That is Producer/Consumer type set for G_FPEXT, G_FPTRUNC corresponds to available sizes in FPR register bank (floating point instructions).

No, because register banks aren't a concept that legalization operates on. They're mostly introduced by the Register Bank Allocator

This is the problem with GMIR in general since type (integer/float) is not available in low level type, but in most cases it is possible to figure out the type from opcode (float opcodes start with G_F , integer start with G_).

You shouldn't be trying to figure out the data format of a scalar in GMIR as you'll get contradictory results (e.g. G_ADD feeding into G_FADD). The operations impart an interpretation on the bits in a register w.r.t that operation but the values in registers are just abstract bundles of bits.

There are also ambiguous opcodes that are used for both integer and floating point instructions assuming they are available for same sizes (G_PHI, G_LOAD, G_STORE, G_IMPLICIT_DEF).

They're not really ambiguous, the data format just has no meaning to them. G_PHI/G_LOAD/G_STORE are just moving bundles of bits around without interpreting them. G_IMPLICIT_DEF just invents a bundle of bits with unspecified values.

Hi Daniel,

Thanks for writing this down. I think the description is accurate (modulo the bit cast stuff) of what we do today.
That said, the vision (that we don't implement currently) for the legalizer was actually much simpler:

  • The target needs to provide a way to legalize exts from producer/consumer sets.
  • The target needs to support loading and storing from producer/consumer sets.

    From there, supporting truncates, merge/unmerge, build vector and so on is a matter of issuing the right loads and stores. That's not very efficient thus we encourage targets to do what you wrote down, but the idea was that it is more an optimization than anything else.

Those rules work when there's a load/store for every type but they run into unable-to-legalize if that's not the case. For example, suppose you have 8, 16, and 32-bit load/store but you also have 64-bit add. The s64 would need to narrowScalar to be usable with the s32 load/stores but that's not possible if narrowScalar needs an s64 store to narrow it. You'd need either G_TRUNC + G_LSHR (the G_LSHR can be a libcall) or G_UNMERGE_VALUES (which itself can legalize to G_TRUNC+G_LSHR) to legalize that case.

llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

Here we are assuming that set of available integer scalars is the same as set of all available floating point scalars. This does not have to be the case.
e.g. "-mtriple=mipsel-linux-gnu" "-mtriple=i386-linux-gnu -mattr=+sse2"(http://lists.llvm.org/pipermail/llvm-dev/2017-July/114930.html) have i32,
float/double available but i64 is not available.

No, the interpretation of the bits as being integer or floating point is irrelevant to this. An s64 containing a floating point value is still required to be truncatable with G_TRUNC. For example, this is necessary to feed it into an s32 store instruction. Similarly, an s32 containing half of a floating point value is still required to be any-extendable with G_ANYEXT in order to compose it with the other half to construct a full value.

G_FPTRUNC is legal for {s32, s64} but G_TRUNC is not (for -mtriple=mipsel-linux-gnu I would say that it is unsupported since there are no register > classes that hold less then 32 bits, and would expect artifact combiner to deal with G_TRUNC).

G_TRUNC must be legal from s64 to s32 for a target to be guaranteed a means to store the s64 to memory (MIPS would implement it using mfc1). You're unlikely to actually use that path since MIPS you also have legal operations that make the guaranteed path redundant (e.g. sdc1)

455 ↗(On Diff #201379)

Yep, I should clarify that. The intent was that because some targets can cope without it there's no requirement in the minimal set. I'll update the wording on this shortly

dsanders updated this revision to Diff 205166.Jun 17 2019, 1:27 PM

Clarify the legality of G_BITCAST where non-trivial bitcasts appear.

Those rules work when there's a load/store for every type but they run into unable-to-legalize if that's not the case.

Indeed, and that was the intent. For the case that are not true, like in the example you describe, the target has to support those cases with custom expansion or other. In other words, what you describing get into specifics we didn't want to cover initially in the generic code.

llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

OK, let's add rules defined here to ones that already exist.
MIPS will then often have to select sequences of move instructions because Legalizer did not perform desired combine/narrow scalar because type information was not available.

Where could we perform desired legalization/combine ? Assume that we are able to find out what needs to be combined/legalized with narrow scalar.
Maybe target custom postLegalizer Legalizer pass or target custom postRegBankSelect Legalizer pass. It would narrow scalar i64 G_LOAD but would not touch f64 G_LOAD.
Or to let targets to put together a InstList/ArtifactList during RegBankSelect and forward these lists to legalize (function that would hold main do/while loop from Legalizer.cpp) at the end of regBankSelect, this way we don't have to process all instructions just ones that need to be re-legalized. Here we need different LegalizerInfo object than one used in Legalizer pass.

dsanders marked an inline comment as done.Jun 18 2019, 11:29 AM
dsanders added inline comments.
llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

You seem to be folding the job of RegBankSelect (or a custom pass) into the legalizer.

Using your G_LOAD example, the availability of ldc1 makes s64 G_LOAD legal because it's possible to handle it any s64 G_LOAD that occurs. It might not be the most efficient thing to do in a given context but efficiency is not the legalizers job. The legalizer is merely tasked with eliminating things that are (or might be) impossible to handle in subsequent passes. As a side note, the artifact combiner doesn't really belong in the legalizer but exists as a concession to prevent the legalizer blowing up the size of the GMIR too much. It's intentionally limited to combining cases that the legalizer often emits around widenScalar/narrowScalar and we don't want the scope of the artifact combiner to increase because it shouldn't really be there at all.

Binding the s64 G_LOAD to a register bank in the most efficient way is the job of the RegBankAllocator. Note that this isn't purely a matter of assigning a register bank to a vreg, it can also change the instructions.
You have two main implementation choices in RegBankAllocator and the choice will depend on the cost of the alternatives. The simplest option is to bind it to the only register bank capable of doing the s64 G_LOAD (FPRB on MIPS). This will encourage the RegBankAllocator to bind the data dependent operations to FPRB and emit a copy when it has to use GPRB (or doing so would require fewer copies). This isn't the ideal option though as it forces all 64-bit loads through the FPU and the mfhc1/mfc1/ldc1 costs more than lw/lw.
The better option is to give the RegBankAllocator a choice on how to bind s64 G_LOAD so that if it chooses GPRB it costs two loads plus any cross bank move cost caused by data dependencies and if it chooses FPRB it costs one load plus any cross bank move cost. This gives it the freedom to make the right decision for the context:

  • If you're just copying data from one memory location to another then (assuming s64 G_STORE is handled similarly) then it will bind to FPRB and use ldc1/sdc1 to copy the data which beats lw/sw/lw/sw.
  • If you're doing floating point operations on the data then it will generally bind to FPRB and use something like ldc1/fadd.d which beats lw/lw/mthc1/mtc1/fadd.d
  • If you're doing bit twiddling on floating point data (e.g. fneg vs xor 0x800000000000000, or fabs vs and 0x7fffffffffffffff) then it will have a choice. It will most likely pick FPRB and ldc1/fneg.d rather than lw/lw/xor 0x80000000 if there's other data-dependent floating point operations nearby but the costs are fairly close so the decision can go with GPRB in the right context
  • If you're doing integer operations on the data then it will generally bind to GPRB and use something like lw/lw/setlt/add/add/add (for a narrowScalar'd s64 G_ADD).

In summary, trying to legalize i64 G_LOAD and f64 G_LOAD differently is bad because it's not the type of the data that determines what the most efficient instructions are and whichever decision you make you're locking the code generator out of selecting the best code. It's really the operations you perform on the data that are important as in some cases, i64 is best handled with floating point instructions and in others f64 is best handled with integer instructions.

llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

Thanks for the explanation.
Following up comment above, let's assume that we figured out that most efficient solution is to modify instruction (e.g. turn s64 G_LOAD into two s32 G_LOADs). What I meant by legalize is that changing instruction in e.g. `applyMappingImpl' is equivalent to applying some of legalization / artifact combine rules. This way we duplicate some of the code from Legalizer and legalize/combine some instructions manually. If e.g. G_LOAD needs to be turned into lw/lw/G_MERGE there is a G_UNMERGE somewhere waiting, dealing with them is equivalent to narrowScalar for G_LOAD + MERGE/UNMERGE combine.
It looks nicer to collect all instructions that have to be "relegalized" and forward them to legalizer with RegBankLegalizerInfo (this one has narrowScalar for G_LOAD from s64 to s32, the one from Legalizer has G_LOAD legal for both s64 and s32 because of Minimum Rule Set). Is this job for RegBankSelect or Target custom Pass (Legalizer2) ?

Hi @Petar.Avramovic ,

I second Daniel's comment, this is RegBankSelect's job to do this choice and the Legalizer shouldn't need to know about f64 vs. s64.

This way we duplicate some of the code from Legalizer and legalize/combine some instructions manually.

The way it was thought was that RegBankSelect should be able to call any Legalizer helper or whatever way you want to share the code between the legalizer and RegBankSelect.

Cheers,
-Quentin

dsanders marked an inline comment as done.Jun 20 2019, 10:38 AM
dsanders added inline comments.
llvm/docs/GlobalISel.rst
422 ↗(On Diff #201379)

As Quentin mentioned, the LegalizerHelper is available to all passes to enable the sharing of that code with any pass that needs it and there's no requirement that its usage is driven by consulting a LegalizerInfo. So after deciding to bind to GPRB, your code would just call narrowScalar(MI, 0, s32) to break it into two G_LOAD's. I'm not sure off-hand if the artifact combiner code is available in that interface at the moment but it should be so we can add it if it's not available today.

Ping. I think we reached a conclusion on this but I lack an LGTM :-)

Petar.Avramovic added inline comments.Thu, Aug 8, 3:16 AM
llvm/docs/GlobalISel.rst
429 ↗(On Diff #205166)

Do G_LOAD and G_STORE deserve a mention here? They might have some complications depending on values in MemDesc, but should be legal for all inputs from producer and consumer type set in combination with some values in MemDesc.

This revision is now accepted and ready to land.Thu, Aug 8, 3:16 AM
dsanders marked an inline comment as done.Thu, Aug 8, 10:51 AM
dsanders added inline comments.
llvm/docs/GlobalISel.rst
429 ↗(On Diff #205166)

There's no minimal requirement for G_LOAD/G_STORE aside from being able to load/store _somehow_. You need at least one legal G_LOAD and G_STORE but there's no common subset that all targets can agree on.

For example, while it's very common for targets to support all their producer/consumer types it's still possible to have targets that have s64 in their producer type set but don't support any s64 loads/stores. One case I know of is MIPS32 with the DSP ASE has 64-bit results from madd but doesn't have any 64-bit load/store instructions. I've also worked on a proprietary target in the past that had a restriction like this.

Focusing more on the MemDesc side of things, it's very common for targets to support 1-byte accesses but it's still possible to not support that so long as at least one size is supported (you have to do the smaller sizes as a read-modify-write operation). I don't have an example to hand on this but there have been occasional threads on llvm-dev asking about a target like this. It's also possible to not support atomics (of course, you can't use language/library features that depend on it in that case)

This revision was automatically updated to reflect the committed changes.