This is an archive of the discontinued LLVM Phabricator instance.

[SPIRV 6/6] Add the module analysis pass and the simplest tests
ClosedPublic

Authored by iliya-diyachkov on Dec 31 2021, 2:58 PM.

Details

Summary

This patch adds one SPIRV analysis pass and extends AsmPrinter. It is essential for minimum SPIR-V output. Also it adds several simplest tests to show that the target basically works.

Diff Detail

Event Timeline

iliya-diyachkov requested review of this revision.Dec 31 2021, 2:58 PM
Herald added a project: Restricted Project. · View Herald Transcript

You don't seem to have tests checking if the encoding of the instructions are correct when printing or parsing, leaving the Printer/Parser completely untested.

I'm guessing you don't have any arguments or return values because you haven't yet implemented prologue/epilogue and frame lowering.

Without that, it's hard to really test anything, but also makes the two passes a bit pointless, because you're not testing them either.

llvm/lib/Target/SPIRV/SPIRVBlockLabeler.cpp
11 ↗(On Diff #396830)

Isn't this guaranteed by IR passes and the verifier?

What would create basic blocks that don't have a suitable terminator that still passes the verify?

llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

That's a bit scary. Is that what created BBs without terminators?

llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
154

I don't see where this is called in this code...

I think all the CHECKs we have in the added tests do check instruction Printer (we don't use Parser) since all the lines in asm output (OpName, OpType***, OpFunction, OpLabel, OpFunctionParameter etc) are instructions in terms of SPIR-V and Printer prints them as normal instructions. As you can see, some of them have arguments that are also checked for correctness.

The actual reason for adding the only simplest tests is not the missed frame lowering. SPIR-V uses high-level type information that should be maintained after IR translator and other GlobalISel passes. The necessary passes and utility code have not been added yet, since the simplest tests can be passed without them. However, if we want to have a test that contains something substantial, these functionality (>1k lines) should be added. But I'm not sure we really need that in this series of patches.

By the way, slightly extending the second pass I can add a test that has a simple argument usage with value returning.

The 2 passes in this patch are actually checked by the added tests because the tests check the instructions added during the passes (e.g. OpLabel/OpFunctionEnd in the first pass and OpSource/OpName in the second one).

llvm/lib/Target/SPIRV/SPIRVBlockLabeler.cpp
11 ↗(On Diff #396830)

It looks like an outdated description. In any case, I can't immediately find a test that would require additional terminators at the time of the pass invocation.

The main purpose of the pass is to insert OpLabel instructions and to put the corresponding registers to OpBranchConditional, OpBranch, and OpPhi instructions. Also OpFunctionEnd (last instruction in functions) is inserted here.

These instructions are required to pass simple tests with defined functions.

llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

Actually the pass creates a dummy function that represents the global context of the module containing types, constants, global variables and other SPIR-V specific global information. This dummy function has a linear CFG, and in fact its BBs do not contain terminators, however, they can be inserted or a single BB can be used here (because all these BBs are created just for the convenience of the code motion from real functions to this dummy one).

This pass works just before the code emission and does not break any other passes. It is expected that the verifier will not work after it, since registers are renamed according to requirements for SPIR-V code emission, and it will be wrong for the verifier.

llvm/lib/Target/SPIRV/SPIRVTargetMachine.cpp
154

It's overrided standard LLVM pass which is called immediately before machine code is emitted (llvm/include/llvm/CodeGen/TargetPassConfig.h:443).

MaskRay added inline comments.Jan 3 2022, 4:09 PM
llvm/test/CodeGen/SPIRV/optnone.ll
6 ↗(On Diff #396830)

Without a positive test in the same file, a negative pattern can easily go stale without being noticed.

Consider split-file to place highly related tests together.

17 ↗(On Diff #396830)

delete unneeded attributes

27 ↗(On Diff #396830)

incorrect version.

delete unneeded metadata

arsenm added inline comments.Jan 4 2022, 11:21 AM
llvm/lib/Target/SPIRV/SPIRVBlockLabeler.cpp
102 ↗(On Diff #396830)

Why does this need to be a codegen pass? Can't you handle this during the encoding/emission?

104 ↗(On Diff #396830)

Using MachineIRBuilder outside of a globalisel pass is weird

llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

I think having the verifier fail at any point is a problem. Why does this have to be a dedicated pass, and not just part of the AsmPrinter?

232 ↗(On Diff #396830)

MI print includes the newline already

llvm/test/CodeGen/SPIRV/metadata-opencl12.ll
1 ↗(On Diff #396830)

Probably should make global-isel the target default so you don't have to add the flag to every test

llvm/lib/Target/SPIRV/SPIRVBlockLabeler.cpp
102 ↗(On Diff #396830)

Each OpLabel defines a virtual register which is used in terminator instructions. I'm not sure that creating OpLabels, then finding and patching correspondent terminators is an appropriate task for the encoding/emission. Moreover the next pass (SPIRVGlobalTypesAndRegNumPass) should rename all virtual registers globally (i.e. each register in a module gets an unique number). As a result OpLabels and terminators (since they have registers) should be processed before SPIRVGlobalTypesAndRegNumPass.

104 ↗(On Diff #396830)

Yes, we need to use BuildMI like in addPreEmitPass2's passes of other targets.

llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

Currently only a part of this pass is presented in the review. The entire pass works at module level and performs following things:

  • create a dummy function with some instructions;
  • move some other instructions from regular functions to that dummy function;
  • rename all registers globally, each new register name is unique in a whole module.

I don't think that operating at global level (copying instructions from one function to another and global naming of registers) can be implemented using the AsmPrinter (please correct me if I'm wrong). However we can try to move some functionality to AsmPrinter and make MIR consistent after this pass, which solves the verifier issue.

llvm/test/CodeGen/SPIRV/metadata-opencl12.ll
1 ↗(On Diff #396830)

Yes, I agree.

llvm/test/CodeGen/SPIRV/optnone.ll
6 ↗(On Diff #396830)

Actually this is a positive test with the confusing checker name CHECK-SPIRV-NO-EXT. It was copied form SPIRV Translator test base as many of our LIT tests (sometimes with simplification). I suppose it is not a good candidate for this series, so I'll find another one instead.

The issues noted by Fangrui Song and Matt Arsenault have been eliminated.

iliya-diyachkov marked 7 inline comments as done.Jan 22 2022, 2:34 PM

This change makes LIT tests passed on LLVM which is built with -DLLVM_ENABLE_EXPENSIVE_CHECKS=ON -DLLVM_ENABLE_ABI_BREAKING_CHECKS=ON.

Correct the code format.

I'm still not convinced that these passes are the right way to go, but I won't block if others think it has merit, even if just temporary, until the proper implementation emerges.

We don't need these patches to be perfect, just good enough to generate some instructions and tell us that the direction the target is going is in sync with what we think should be.

If this is something that can be changed later, I'd be fine with it.

@arsenm @MaskRay

I'm still not convinced that these passes are the right way to go, but I won't block if others think it has merit, even if just temporary, until the proper implementation emerges.

We don't need these patches to be perfect, just good enough to generate some instructions and tell us that the direction the target is going is in sync with what we think should be.

If this is something that can be changed later, I'd be fine with it.

@arsenm @MaskRay

@rengolin what would be considered as a proper implementation? Currently the only reasonable way this can be modified is by moving the hoisting and deduplication logic to the printing/emission stage, and it's difficult to see another alternative due to the gap between SPIR-V and MIR in this matter - obviously MIR is not able to represent arbitrary instructions not belonging to a MachineFunction (and I don't even think it should be). So it seems the activity performed by these passes should be moved to a level lower than MIR, but for now it looks like that's the only major enhancement that could be done here.

@rengolin what would be considered as a proper implementation?

I don't think I have quite understood what the problem is, tbh.

If this is just some global value numbering, why is it being done so late (ie, not in IR)?

If this is a SPIR-V hardware requirement, that needs a specific guaranteed shape that cannot rely on the state of IR->MIR conversions (especially the generic parts), then there may not be another way out.

But I'll leave for others to express their opinions, as I'm not familiar with the problem.

zuban32 added a comment.EditedFeb 2 2022, 9:49 AM

@rengolin what would be considered as a proper implementation?

I don't think I have quite understood what the problem is, tbh.

If this is just some global value numbering, why is it being done so late (ie, not in IR)?

If this is a SPIR-V hardware requirement, that needs a specific guaranteed shape that cannot rely on the state of IR->MIR conversions (especially the generic parts), then there may not be another way out.

But I'll leave for others to express their opinions, as I'm not familiar with the problem.

Then let me describe the problem in more details, I'd like us all to be on the same page with this.

This patch adds 2 passes SPIRVGlobalTypesAndRegNumPass and SPIRVBlockLabelerPass.

The latter seems to be a minor problem - the reason for it to exist is SPIRV binary format requiring global numbering of all the defined values, i.e. if there're 2 functions each containing only 1 instruction then these instructions can't be both numbered %0.

The former one attempts to solve a bigger problem. Basically the SPIRV binary is structured as following:

...
types, constants
...
functions definitions

Let me provide a short example in SPIR-V:

%0 = OpTypeInt 32
%1 = OpTypeInt 64
%2 = OpConstant %1, 5

%3 = OpFunction %1 ...
%4 = OpFunctionParameter %0
%5 = OpFunctionParameter %1
%6 = OpUConvert %1 %4
%7 = OpIAdd %1 %6 %5
%8 = OpIMul %1 %7 %2
OpReturnValue %8
OpFunctionEnd

%9 = OpFunction %1 ...
%10 = OpFunctionParameter %0
%11 = OpFunctionParameter %1
%12 = OpUConvert %1 %10
%13 = OpISub %1 %12 %11
%14 = OpIMul%1 %13 %2
OpReturnValue %14
OpFunctionEnd

The LLVM IR counterparts for the functions above are the following:

define i64 @foo(i32 %a, i64 %b)
  %0 = i64 zext i32 %a to i64
  %1 = i64 add %0, %b
  %2 = i64 mul %1, 5
  ret %2

define i64 @bar(i32 %a, i64 %b)
  %0 = i64 zext i32 %a to i64
  %1 = i64 sub %0, %b
  %2 = i64 mul %1, 5
  ret %2

In the example above you can see that OpType*, OpConstant are out of any function's scope, they're global to the whole module, and this can't be natively expressed in MIR. Because of that in our approach each MachineFunction has its own copies of type decls and constants (and other stuff like that) during the whole translation process, but eventually they need to be deduplicated and 'globalized'. That can be done either on MIR level or during binary emission, but it has done be done anyway.

Hope this helps

Then let me describe the problem in more details, I'd like us all to be on the same page with this.

Awesome! This is much clearer to me, thanks!

The latter seems to be a minor problem - the reason for it to exist is SPIRV binary format requiring global numbering of all the defined values, i.e. if there're 2 functions each containing only 1 instruction then these instructions can't be both numbered %0.

Do you have a maximum ID? If not, than in theory, you just need a static global auto-increment ID (like a get_next_id() on some target emission class) and off you go emitting globally unique IDs.

The former one attempts to solve a bigger problem. Basically the SPIRV binary is structured as following:

...
types, constants
...
functions definitions

I see, so your "global" function contain the type and constant declarations, that the functions use in their own declarations.

So, you're not "moving" anything, just creating a placeholder for those definitions and indexing them.

in our approach each MachineFunction has its own copies of type decls and constants (and other stuff like that) during the whole translation process, but eventually they need to be deduplicated and 'globalized'. That can be done either on MIR level or during binary emission, but it has done be done anyway.

IIUC, they're in each function because you generate for each function you pass through, so:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
func2
  %0 = OpTypeFloat 32
  %1 = OpTypeFloat 64
  ...

and then you need to get all types of the same "type" and call a single ID.

So, these passes seem *necessary* if you want a lean and tidy representation, but not necessary if you can cope with duplicate definitions while still keeping unique IDs.

For example, using the auto-increment proposal above, we could have:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
func2
  %2 = OpTypeFloat 32
  %3 = OpTypeInt 64
  ...

Note two definitions of Int 64 but with two different IDs. Would that be a binary format error for (native?) types? Would that be an error for constants, too?

If each type can only be defined once, you could have a list of all possible types and defined them in the beginning (in your "global" function), from %0 to %N. Then, while emitting each function, you just use those IDs and you don't need to deduplicate.

Of course, this doesn't work with constants, but I'm guessing it's more likely that, if there is a rule of single-declaration for types, there may not be for constants (otherwise compilers would suffer regardless of representation).

In that sense, you wouldn't need either pass, just initialisation of the known types, declaration of the global function with them, and reuse in each function, using the global auto-increment for all the IDs.

Does that fit with the spec?

iliya-diyachkov added a comment.EditedFeb 2 2022, 1:51 PM

Do you have a maximum ID? If not, than in theory, you just need a static global auto-increment ID (like a get_next_id() on some target emission class) and off you go emitting globally unique IDs.

Actually we need to output Bound (not implemented in this series) so all IDs should satisfy "0 < id < Bound". I need to note that according to the SPIR-V specification "Bound should be small, smaller is better, with all IDs in a module being densely packed and near 0".

I would say that we use a numbering similar to described get_next_id() but inside this pass. We store its results in a map and substitute them during emission. I suppose numbering on emission with get_next_id() would also use Register - ID mapping to give same IDs to the same registers.

Note two definitions of Int 64 but with two different IDs. Would that be a binary format error for (native?) types? Would that be an error for constants, too?

Yes for types. From the specification: "Two different type IDs form, by definition, two different types. It is invalid to declare multiple nonaggregate, non-pointer type IDs having the same opcode and operands..."

As I know constant duplication is not forbidden, but such duplication looks like a bad solution since the Bound of IDs is recommended to be small. Also our reference tool (SPIRV-LLVM-Translator) does not duplicate constants as possible, and we would like to be conformant with it so the translator users will get similar results with the SPIRV LLVM backend (on transition to this tool).

If each type can only be defined once, you could have a list of all possible types and defined them in the beginning (in your "global" function), from %0 to %N. Then, while emitting each function, you just use those IDs and you don't need to deduplicate.

"a list of all possible types" would be very long because it includes not only simple types but also pointer types, composite types, builtin types... And please remember about densely packed IDs: types also have IDs and we don't want to have gaps in ID space due to type IDs unused in a module.

In addition to types and constants, we need to put global variables, external function declaration, annotations, debug info and some other instructions to that "global" function. All this is presented in the form of instructions with their register operands. These instructions should be only in the global function with no duplication in regular functions (however, there are a few exceptions)!

All these instructions are generated in regular functions at first, and we need a pass (running on module) to copy them to the "global" function. Also we avoid duplication of all of the instructions where possible (as SPIRV-LLVM-Translator does too).

In the last implementation we copy required instructions to the "global" function in SPIRVGlobalTypesAndRegNum pass. We keep in the regular functions the instructions (that were copied to the "global" function) till emission but do not emit them (while the instructions' copies in the "global" function are emitted). Also we number all registers globally avoiding duplication. These global register numbers are stored in a "register alias" map. On emission we output IDs from the map instead of the old local register numbers.

In this way MIR is still consistent after the pass (unlike we had in the earlier implementation) and verifier does not output errors. If it's desirable I can provide MIR/SPIR-V dumps demonstrating how it works on specific tests.

zuban32 added a comment.EditedFeb 2 2022, 1:52 PM

Then let me describe the problem in more details, I'd like us all to be on the same page with this.

Awesome! This is much clearer to me, thanks!

The latter seems to be a minor problem - the reason for it to exist is SPIRV binary format requiring global numbering of all the defined values, i.e. if there're 2 functions each containing only 1 instruction then these instructions can't be both numbered %0.

Do you have a maximum ID? If not, than in theory, you just need a static global auto-increment ID (like a get_next_id() on some target emission class) and off you go emitting globally unique IDs.

The former one attempts to solve a bigger problem. Basically the SPIRV binary is structured as following:

...
types, constants
...
functions definitions

I see, so your "global" function contain the type and constant declarations, that the functions use in their own declarations.

So, you're not "moving" anything, just creating a placeholder for those definitions and indexing them.

in our approach each MachineFunction has its own copies of type decls and constants (and other stuff like that) during the whole translation process, but eventually they need to be deduplicated and 'globalized'. That can be done either on MIR level or during binary emission, but it has done be done anyway.

IIUC, they're in each function because you generate for each function you pass through, so:

You're right, we want our MIR to pass the verification on all the stages. But in SPIR-V every instruction producing a value has its resulting type as an operand so types should be already defined. And since we can't emit MIR instructions out of a function scope (and can't use machine operands of instructions from another functions each MF has its own copies of the types/consts/etc used within its body. Since in the final binary those types/consts/etc should be in the global scope we create some placeholder MF for this purpose, and since these items may be duplicated in different functions so we'd like to deduplicate them during the hoisting.

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
func2
  %0 = OpTypeFloat 32
  %1 = OpTypeFloat 64
  ...

and then you need to get all types of the same "type" and call a single ID.

So, these passes seem *necessary* if you want a lean and tidy representation, but not necessary if you can cope with duplicate definitions while still keeping unique IDs.

For example, using the auto-increment proposal above, we could have:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
func2
  %2 = OpTypeFloat 32
  %3 = OpTypeInt 64
  ...

Note two definitions of Int 64 but with two different IDs. Would that be a binary format error for (native?) types? Would that be an error for constants, too?

If each type can only be defined once, you could have a list of all possible types and defined them in the beginning (in your "global" function), from %0 to %N. Then, while emitting each function, you just use those IDs and you don't need to deduplicate.

Of course, this doesn't work with constants, but I'm guessing it's more likely that, if there is a rule of single-declaration for types, there may not be for constants (otherwise compilers would suffer regardless of representation).

In that sense, you wouldn't need either pass, just initialisation of the known types, declaration of the global function with them, and reuse in each function, using the global auto-increment for all the IDs.

Does that fit with the spec?

Well, in fact the space allows multiple definitions of both types (my bad, types are not allowed as Illiya mentioned above) and constants. But that may lead to a huge binary size increase and such a suboptimal codegen quality will become an obstacle to convincing people use our backend instead of the existing translator.

I didn't mean the duplicated types/constants would survive into codegen, but I get your meaning, it would have to be commoned up at some point.

So, what about a singleton emitter for types and constants?

Instead of just get_next_id() you could have a get_type(...) and get_constant(...) that returns a cached MIR node if it has created a similar one already, or creates a new one, caches and returns it.

In this way, there's only ever one copy of each and they're only generated once, with no need to deduplicate.

The only issue with this implementation is that types, constants, functions, instructions will all have intertwined IDs, but never refer to larger IDs, by design.

When emitting "int foo()", if int hadn't been emitted before, it'd get the next id (say, %23) and then foo would get %24 and would refer to %23 as its type.

Eventually, all types used in the program would have been emitted already and be brought up from the cache every time.

This way, there's no need to renumber or common up and the space is still optimal.

What am I missing?

I didn't mean the duplicated types/constants would survive into codegen, but I get your meaning, it would have to be commoned up at some point.

So, what about a singleton emitter for types and constants?

Instead of just get_next_id() you could have a get_type(...) and get_constant(...) that returns a cached MIR node if it has created a similar one already, or creates a new one, caches and returns it.

In this way, there's only ever one copy of each and they're only generated once, with no need to deduplicate.

The only issue with this implementation is that types, constants, functions, instructions will all have intertwined IDs, but never refer to larger IDs, by design.

When emitting "int foo()", if int hadn't been emitted before, it'd get the next id (say, %23) and then foo would get %24 and would refer to %23 as its type.

Eventually, all types used in the program would have been emitted already and be brought up from the cache every time.

This way, there's no need to renumber or common up and the space is still optimal.

What am I missing?

Which stage are you talking about here: InstPrinter, any post-InstructionSelection pass, etc? On gMIR level I don't understand how that'd work if we define VReg %1 in func f1 and then use it in both f1 and f2.

Which stage are you talking about here: InstPrinter, any post-InstructionSelection pass, etc? On gMIR level I don't understand how that'd work if we define VReg %1 in func f1 and then use it in both f1 and f2.

Whatever level you were planning to do the de-duplication. My point is just to replace de-duplication by enforcement (as post-generation pass) to de-duplication by construction (a singleton generator).

Though, I am not super familiar with MIR's representation, as that was introduced after I stopped working on back-ends directly, so don't take my word on that.

Perhaps @arsenm can chime in?

So, what about a singleton emitter for types and constants?

Instead of just get_next_id() you could have a get_type(...) and get_constant(...) that returns a cached MIR node if it has created a similar one already, or creates a new one, caches and returns it.

In this way, there's only ever one copy of each and they're only generated once, with no need to deduplicate.

I agree that this is a natural desire to use some caching in this situation. But I join Alexander in the question how the proposed caching would work at module (cross-function) level? As I understand we can define %type1 register in func1 and refer to the register inside func1, but it's illegal to refer %type1 in func2 without a local definition of %type1 (let me remind that in SPIR-V the type definition is a regular MIR instruction which produces %type register).

In fact we do map types, constants, global vars and external function decls but at local function level (however I did not include this implementation to this series, replacing this with simple type/const instruction copying in extractInstructionsWithGlobalRegsToMetablockForMBB for now, in the next patches it will be changed to the map using). In llvm/lib/Target/SPIRV/SPIRVGlobalRegistry.h you can find the interface getOrCreateSPIRVType, which returns either existing OpType*** instruction or create new one. Then in GlobalTypesAndRegNumPass we walk the map and create only required type/const/var/fdecl instructions in the "global" function.
Anyway in this approach we need to leave the instructions' copies in regular functions to pass the verifier (in LLVM build with -DLLVM_ENABLE_EXPENSIVE_CHECKS=on -DLLVM_ENABLE_ABI_BREAKING_CHECKS=on).

Taking into account the implemented mapping now we keep duplicates at function level to make MIR consistent. For example:

After generation:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

After GlobalTypesAndRegNumPass:

global_func:
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
...
func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

After emitting:

global_func:
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
...
func1
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  ...
  %7 = OpIMul %1 %5 %6
  ...

And as far as I understand your suggestion this de-duplicated variant will break the verifier (after generation):

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  // (no definition of %1)
  ... 
  %4 = OpIMul %1 %2 %3
  ...
rengolin added a comment.EditedFeb 3 2022, 4:25 AM

And as far as I understand your suggestion this de-duplicated variant will break the verifier (after generation):

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  // (no definition of %1)
  ... 
  %4 = OpIMul %1 %2 %3
  ...

No, I'm suggestion you emit on every function (as is functionally correct) but reuse the same ID, and keep that same ID on global_function.

The singleton is just caching the VReg numbers, not the actual MNodes.

Code emission will be the same as what you have today after that.

I agree that this is a natural desire to use some caching in this situation. But I join Alexander in the question how the proposed caching would work at module (cross-function) level? As I understand we can define %type1 register in func1 and refer to the register inside func1, but it's illegal to refer %type1 in func2 without a local definition of %type1 (let me remind that in SPIR-V the type definition is a regular MIR instruction which produces %type register).

By just re-declaring each used type/constant in each subsequent function with the same VReg ID.

In your example:

After generation:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

After GlobalTypesAndRegNumPass:

global_func:
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
...
func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

To have the same MIR in your second example, straight from generation, you have to follow these steps:

  • First time you see, generate a new type (OpTypeInt 32) with a new auto-increment ID, cache the ID for the original type, emit on global_func and *also* on the function you're generating.
  • Every other time you see, get the cached VReg ID and re-emit (a new MNode) using that same ID.

In your example:

  • Inside func1 you generate a new type (OpTypeInt 32) and call it %0, cache %0 for i32, emit on global_func and *also* on func1.
  • Inside func2 you get the cached version (of i32 as %0) and generate it *again* only on func2 (on a new Node, using %0). global_func already have it's the same VReg number on all previous functions.

If you use a global auto-increment next_id(), there's no way VReg IDs for operations in any function will clash with previously defined types and constants (in previous functions).

The VReg numbers on the later functions will be (numerically) large, but the number of different IDs in each function will still be the same, so code generation should be equally efficient.

And as far as I understand your suggestion this de-duplicated variant will break the verifier (after generation):

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  // (no definition of %1)
  ... 
  %4 = OpIMul %1 %2 %3
  ...

No, I'm suggestion you emit on every function (as is functionally correct) but reuse the same ID, and keep that same ID on global_function.

The singleton is just caching the VReg numbers, not the actual MNodes.

Code emission will be the same as what you have today after that.

I agree that this is a natural desire to use some caching in this situation. But I join Alexander in the question how the proposed caching would work at module (cross-function) level? As I understand we can define %type1 register in func1 and refer to the register inside func1, but it's illegal to refer %type1 in func2 without a local definition of %type1 (let me remind that in SPIR-V the type definition is a regular MIR instruction which produces %type register).

By just re-declaring each used type/constant in each subsequent function with the same VReg ID.

In your example:

After generation:

func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

After GlobalTypesAndRegNumPass:

global_func:
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
...
func1
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIAdd %1 %2 %3
  ...
func2
  %0 = OpTypeInt 32
  %1 = OpTypeInt 64
  ...
  %4 = OpIMul %1 %2 %3
  ...

To have the same MIR in your second example, straight from generation, you have to follow these steps:

  • First time you see, generate a new type (OpTypeInt 32) with a new auto-increment ID, cache the ID for the original type, emit on global_func and *also* on the function you're generating.
  • Every other time you see, get the cached VReg ID and re-emit (a new MNode) using that same ID.

In your example:

  • Inside func1 you generate a new type (OpTypeInt 32) and call it %0, cache %0 for i32, emit on global_func and *also* on func1.
  • Inside func2 you get the cached version (of i32 as %0) and generate it *again* only on func2 (on a new Node, using %0). global_func already have it's the same VReg number on all previous functions.

If you use a global auto-increment next_id(), there's no way VReg IDs for operations in any function will clash with previously defined types and constants (in previous functions).

The VReg numbers on the later functions will be (numerically) large, but the number of different IDs in each function will still be the same, so code generation should be equally efficient.

Not sure how we can control the VReg IDs emitted for the types - it seems it'd require customizing IRTranslator which is not allowed by the current GlobalISel design. Am I missing something?

Not sure how we can control the VReg IDs emitted for the types - it seems it'd require customizing IRTranslator which is not allowed by the current GlobalISel design. Am I missing something?

As I said, I haven't worked with MIR directly (it was after my time on back-ends), so very likely I am the one missing context.

I don't mind this as it is on an experimental target, but it would be nice to not have to do that in the long run (though, it wouldn't be tragic either).

If @arsenm and @MaskRay are happy with this patch, I'm happy too. I'll let them approve this one.

If you use a global auto-increment next_id(), there's no way VReg IDs for operations in any function will clash with previously defined types and constants (in previous functions).

The VReg numbers on the later functions will be (numerically) large, but the number of different IDs in each function will still be the same, so code generation should be equally efficient.

Thanks for the explanation, now I see the whole idea and I like it in general. The following example illustrates its work.

After func1 translation:

global_func
  %0 = OpTypeConstGVarFDecl1

func1
  %0 = OpTypeConstGVarFDecl1
  ...
  %3 = OpIAdd %0 %1 %2
  ...
  %N = ...

After func2 translation:

global_func
  %0 = OpTypeConstGVarFDecl1
  %N+1 = OpTypeConstGVarFDecl2

func2
  %0 = OpTypeConstGVarFDecl1
  %N+1 = OpTypeConstGVarFDecl2
  ...
  %N+4 = OpIMul %0 %N+2 %N+3
  %N+7 = OpIAdd %N+1 %N+5 %N+6
  ...
  %M = ...

And so on... However I see several issues:

  1. We probably need to operate with global_func and the current function (func1 or func2 in the example) simultaneously when we generate a new type. I'm not sure is it possible in a MachineFunctionPass.

Perhaps we would introduce a new ModulePass which generates most of type/const/gvar/fdecl instrs. But some of the instrs can be generated in CallLowering and InstructionSelector which are not module-level, so we could not modify both global_func and the current function.

  1. As Aleksander wrote some virtual registers in regular functions are generated by IRTranslator and we don't directly control their numbers (normally it starts from %0).

We could use some weird (or "non-statutory") approaches for setting first NumVirtRegs in each next regular function so it will generate %N+1, %N+2 as in example. I have just implemented such one without changes in target-independent code. But since this is a non-standard thing, we cannot be sure that it will work always and reliably.

Either we could make a new method in IRTranslator that allows to start NumVirtRegs from desirable number, but it will be a target independent feature and its justification (it is unlikely that it will ever be needed for other targets), review and commit may take a noticeable time.

  1. This may require a noticeable rework of the current implementation (I don't mean this patch series, but the entire backend that is in Khronos repository, because all changes we do in this patch series are implemented in the main code base at first with checking for no degradation on >200 LITs and OpenCL CTS).

So if we don't have obvious solutions for the 1st and 2nd issues, I would add this suggestion to the TODO list (we'll take it in mind during further code revisions) and go forward with the existing GlobalTypesAndRegNumPass.

So if we don't have obvious solutions for the 1st and 2nd issues, I would add this suggestion to the TODO list (we'll take it in mind during further code revisions) and go forward with the existing GlobalTypesAndRegNumPass.

Agreed. With that in mind, this looks good to me.

I'll let the other reviewers have the final approval, just to make sure we all agree.

Thanks!

iliya-diyachkov retitled this revision from [SPIRV 6/6] Add 2 essential passes and the simplest tests to [SPIRV 6/6] Add one essential pass and the simplest tests.
iliya-diyachkov edited the summary of this revision. (Show Details)

SPIRVBlockLabeler pass was removed and its functionality migrated to SPIRVAsmPrinter. The change was inspired by Renato's suggestion about get_next_id().

arsenm added inline comments.Feb 7 2022, 3:44 PM
llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

This sounds like a function of emission to me. I don't see why you need a dummy MIR function or move anything for the purpose of having a global value. You could just emit the global values directly where they need to go as part of the emission of the individual operation.

The AsmPrinter is a module pass, so it's possible to operate globally

arsenm added inline comments.Feb 7 2022, 3:55 PM
llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

Basically the dummy function feels like an awkward hack of SPIR-V's constraints into MIR. It would be more natural to handle the structural differences in representation when actually producing the final SPIR-V

zuban32 added inline comments.Feb 8 2022, 7:18 AM
llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

The AsmPrinter is a module pass, so it's possible to operate globally

But it seems currently AsmPrinter is a MachineFunctionPass

zuban32 added inline comments.Feb 9 2022, 4:35 PM
llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

Ok, now I see, AsmPrinter still has emit* methods operating on the whole module (but it's still a MachineFunctionPass, isn't there some design flaw?)

arsenm added inline comments.Feb 9 2022, 5:09 PM
llvm/lib/Target/SPIRV/SPIRVGlobalTypesAndRegNumPass.cpp
23 ↗(On Diff #396830)

The final module necessarily requires emitting module level directives of some kind. This is fine as long as you aren't relying on changes in other functions when emitting another

iliya-diyachkov added a comment.EditedFeb 14 2022, 1:32 PM

This sounds like a function of emission to me. I don't see why you need a dummy MIR function or move anything for the purpose of having a global value. You could just emit the global values directly where they need to go as part of the emission of the individual operation.

The AsmPrinter is a module pass, so it's possible to operate globally
Basically the dummy function feels like an awkward hack of SPIR-V's constraints into MIR. It would be more natural to handle the structural differences in representation when actually producing the final SPIR-V

Thanks for your explanation, Matt. I agree that a dummy "global" function looks like a kind of hack, and working with the function is not elegant and not very convenient.

Believing that AsmPrinter can be used as a module pass, I have implemented a prototype of moved GlobalTypesAndRegNum functionality to AsmPrinter with partial success. And I have found this issue.

If GlobalTypesAndRegNum pass is removed (it was a global pass and the last pass before AsmPrinter), we have following sequence of passes/calls for a program with N functions:

SPIRVAsmPrinter::doInitialization 
    AsmPrinter::emitStartOfAsmFile

func1:
SPIRVGISelPass1 ... SPIRVGISelPassLast
AsmPrinter::runOnMachineFunction
    SPIRVAsmPrinter::emitFunctionHeader
        emit func1's instructions by SPIRVAsmPrinter::emitInstruction
***
funcN:
SPIRVGISelPass1...SPIRVGISelPassLast
AsmPrinter::runOnMachineFunction
    SPIRVAsmPrinter::emitFunctionHeader
        emit funcN's instructions by SPIRVAsmPrinter::emitInstruction

SPIRVAsmPrinter::emitEndOfAsmFile

I.e. at the moment of AsmPrinter::runOnMachineFunction invocation for func1 we have not whole MIR and cannot number registers globally and output global instruction at the beginning of the module but func1's instructions should be emitted.

In this pass sequence I don't see a point where all functions' MIR is available in AsmPrinter and the first func1 is not emitted yet. Therefore global register numbering and the global instruction output cannot be applied correctly (on the entire MIR).

On other hand if I leave an empty global pass as a last of post GISel passes, the sequence is different:

SPIRVAsmPrinter::doInitialization
    AsmPrinter::emitStartOfAsmFile

func1:
SPIRVGISelPass1 ... SPIRVGISelPassLast
***
funcN:
SPIRVGISelPass1 ... SPIRVGISelPassLast

SPIRVDummyGlobalPass

func1:
AsmPrinter::runOnMachineFunction
    SPIRVAsmPrinter::emitFunctionHeader
        emit func1's instructions by SPIRVAsmPrinter::emitInstruction
***
funcN:
AsmPrinter::runOnMachineFunction
    SPIRVAsmPrinter::emitFunctionHeader
        emit funcN's instructions by SPIRVAsmPrinter::emitInstruction

SPIRVAsmPrinter::emitEndOfAsmFile

Now I can run the functionality ported from GlobalTypesAndRegNum on the first invocation of AsmPrinter::runOnMachineFunction and get correct register numbering and global instruction output. Using this way the backend has passed all our LIT tests.

@arsenm, when you suggested moving GlobalTypesAndRegNum' operations to AsmPrinter, did you implied that we need to insert SPIRVDummyGlobalPass after all SPIRVGISel passes to have whole MIR at the first run of AsmPrinter? Or is there another approach (or some pass configuration) to achieve the correct processing of the whole MIR in AsmPrinter?

@arsenm, when you suggested moving GlobalTypesAndRegNum' operations to AsmPrinter, did you implied that we need to insert SPIRVDummyGlobalPass after all SPIRVGISel passes to have whole MIR at the first run of AsmPrinter? Or is there another approach (or some pass configuration) to achieve the correct processing of the whole MIR in AsmPrinter?

[I'm not Matt but] to me it seems like a hack which is not much better that the one we used to have with the MIR pass. I suppose the passes ordering would be the same if we just made the AsmPrinter a module pass instead of adding a dummy one, not sure about the potential consequences in the other targets though. @arsenm what do you think?

iliya-diyachkov added a comment.EditedFeb 14 2022, 2:50 PM

@arsenm, when you suggested moving GlobalTypesAndRegNum' operations to AsmPrinter, did you implied that we need to insert SPIRVDummyGlobalPass after all SPIRVGISel passes to have whole MIR at the first run of AsmPrinter? Or is there another approach (or some pass configuration) to achieve the correct processing of the whole MIR in AsmPrinter?

[I'm not Matt but] to me it seems like a hack which is not much better that the one we used to have with the MIR pass. I suppose the passes ordering would be the same if we just made the AsmPrinter a module pass instead of adding a dummy one, not sure about the potential consequences in the other targets though. @arsenm what do you think?

Thanks for your opinion, Aleksandr. I think this would be a smaller hack than we have now. I'll share the rough implementation in Khronos soon, so (since you know the full implementation of SPIRVGlobalTypesAndRegNumPass.cpp) you can appreciate the efficiency of this approach. In short, using it, we don't need to

  1. create the dummy "global" function,
  2. copy global MIs from real functions to the "global" one,
  3. make MIR consistent in the "global" and real functions after 1 and 2.

Also the implementation becomes shorter and looks simpler and more natural.

What about making AsmPrinter a module pass, it's a big target-independent change, I'm not sure that it can be justified only by the fact that the SPIRV experimental backend can get some advantage.

@arsenm, when you suggested moving GlobalTypesAndRegNum' operations to AsmPrinter, did you implied that we need to insert SPIRVDummyGlobalPass after all SPIRVGISel passes to have whole MIR at the first run of AsmPrinter? Or is there another approach (or some pass configuration) to achieve the correct processing of the whole MIR in AsmPrinter?

If you can emit this global information at the end of the printed module instead of the start, that would be easiest. If you really have to emit it at the top, you may be able to get away with something similar to what AMDGPU does for reporting total register counts used in the presence of indirect function calls. AMDGPUAsmPrinter depends on a module analysis, AMDGPUResourceUsageAnalysis. This effectively forces all functions to be codegened before final emission (which does have an associated compile time / memory cost)

@arsenm, when you suggested moving GlobalTypesAndRegNum' operations to AsmPrinter, did you implied that we need to insert SPIRVDummyGlobalPass after all SPIRVGISel passes to have whole MIR at the first run of AsmPrinter? Or is there another approach (or some pass configuration) to achieve the correct processing of the whole MIR in AsmPrinter?

If you can emit this global information at the end of the printed module instead of the start, that would be easiest. If you really have to emit it at the top, you may be able to get away with something similar to what AMDGPU does for reporting total register counts used in the presence of indirect function calls. AMDGPUAsmPrinter depends on a module analysis, AMDGPUResourceUsageAnalysis. This effectively forces all functions to be codegened before final emission (which does have an associated compile time / memory cost)

Thanks for the advice, Matt! The SPIR-V specification requires the info at the beginning of the module.

So we need to implement SPIRVModuleAnalysis, make SPIRVAsmPrinter to be depended on it and use the analysis results in the first invocation of AsmPrinter::runOnMachineFunction to output global instructions before all real functions.

In general the approach works fine, but if we have no functions at module, SPIRVAsmPrinter::runOnMachineFunction() is not called and we have to output global instructions later, during AsmPrinter::doFinalization(). At this point, calling getAnalysis<SPIRVModuleAnalysis>() fails because Resolver (in Pass::getAnalysisID(AnalysisID PI)) cannot find SPIRVModuleAnalysis.

The issue can be solved by storing the analysis results in a static member of SPIRVModuleAnalysis. In such implementation, the results are available both in SPIRVAsmPrinter::runOnMachineFunction() and in SPIRVAsmPrinter::doFinalization() and we don't even need to call getAnalysis<SPIRVModuleAnalysis>() in SPIRVAsmPrinter.

I think this small trick is acceptable for the experimental backend if there is no other way to get the analysis results in doFinalization().

iliya-diyachkov retitled this revision from [SPIRV 6/6] Add one essential pass and the simplest tests to [SPIRV 6/6] Add the module analysis pass and the simplest tests.
iliya-diyachkov edited the summary of this revision. (Show Details)

The functionality of SPIRVGlobalTypesAndRegNum pass was moved to SPIRVModuleAnalysis pass and extended AsmPrinter.

The formatting issue is fixed.

I think all noticed issues are fixed in this patch, as well as in others in the series. The patches were rebased to the latest LLVM version (0712c575b90a7eb508bf43d15c38c1c0b0d69695, Fri Feb 18). Also the regular and special builds (suggested by Fengrui Song in the first patch) were passed without new fails in LIT tests.

@arsenm @MaskRay could you please review the updated patches again.

Move MS static array inside struct ModuleAnalysisInfo.

Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2022, 4:10 PM

@arsenm @MaskRay could you please review the patches again?

As I wrote earlier, all issues were fixed in all patches of the series. The day before yesterday I rebased all patches to the latest LLVM version (17a68065c378, Sun Mar 6) and checked the regular and special builds (suggested in the first patch).

Fixed typos in comments.

Update after the last changes in the previous patches.

Update after the last changes in the 3th and 4th patches.

iliya-diyachkov updated this revision to Diff 415659.EditedMar 15 2022, 6:40 PM

Update after the last change in the 5th patch.

The format issue is fixed.

Gentle ping for final review and acceptance :)

Hi @arsenm, could you please take a look at the 4th, 5th, and 6th patch and let us know whether your suggestions were resolved? We would like to kindly ask for "LGTM".

Gentle ping.

@arsenm @MaskRay could you please continue or finish the review of the 4th, 5th and 6th patches? As I wrote earlier, all the issues were fixed in all patches of the series.

I updated LLVM to today's revision 73777b4c35a3 and retested all recommended builds. All of them are passed.

The patch has been updated after the changes in 4th and 5th ones. Also, "unsigned int"s were replaced with "unsigned"s, and missed dots were inserted into comments, as Fangrui recommended in the 4th patch.

Update after the change in the 3rd patch.

arsenm added inline comments.Mar 28 2022, 1:43 PM
llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
43

Might as well use SPIRVSubtarget?

49

getSubtarget<SPIRVSubtarget>() instead of static_cast

121

Can't have global constructors

139

Typo correspnding

140

static?

164

If this isn't checked by the verifier (which would be preferable), should be report_fatal_error

llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
162

I don't think this can fail at this point? If not, should early continue or assert and reduce indentation

201

Ditto

llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
43

You probably mean removing STI and using OutContext.getSubtargetInfo().

49

As I see we need to have either MachineFunction or Function to call getSubtarget<SPIRVSubtarget>().
However in SPIRVAsmPrinter ST is required even if there are no functions in the module. I don't see any other way than to use static_cast as some other targets do (e.g. in NVPTXAsmPrinter::emitStartOfAsmFile or in AVRAsmPrinter::emitStartOfAsmFile). If you know, please advise.

121

I'll move BBNumToRegMap into struct ModuleAnalysisInfo.

140

getOrCreateMBBRegister() was not static since it was called in SPIRVMCInstLower.cpp (outside of this module). I'll make getOrCreateMBBRegister() a member of struct ModuleAnalysisInfo as well as BBNumToRegMap.

164

I believe that it is not checked. I'll add the appropriate TODO and change llvm_unreachable to report_fatal_error.

llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
162

This can fail if F is a declaration. I'll add early continue.

iliya-diyachkov marked 8 inline comments as done.Mar 28 2022, 6:33 PM

The patch was updated after the last changes in the 5th one. Also ModuleSectionType and ModuleAnalysisInfo were placed in the SPIRV namespace.

The changes in SPIRVUtils from the 6th patch were moved to the 5th patch. The last case of "unsigned int" was replaced with "unsigned".

@arsenm, @MaskRay, I think all the issues have been fixed. Could you please take the last look at the patches?

This one still has some unactioned comments...

This one still has some unactioned comments...

@rengolin, do you mean that some issues pointed out by Matt have not been fixed? I thought everything was done, except for the suggestion to use getSubtarget<SPIRVSubtarget>() instead of static_cast in SPIRVAsmPrinter constructor. I explained that it was not clear how to efficiently apply this suggestion to our case.

Apologies, it seems I was looking into a very old version of the review!

This looks good to me. @arsenm did that answer your comment?

arsenm added inline comments.Apr 7 2022, 6:34 AM
llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
49

Oh, this points out a deeper issue that you're using the global subtarget. Ideally you're always querying the subtarget from the context function

arsenm added a comment.Apr 7 2022, 6:35 AM

Apologies, it seems I was looking into a very old version of the review!

This looks good to me. @arsenm did that answer your comment?

Yes, but it points to another issue (one that SPIRV specifically probably doesn't care about)

iliya-diyachkov added inline comments.Apr 7 2022, 8:01 AM
llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
49

I understand this, but in some cases we have a module without functions, and we still need to use Subtarget.

We could get Subtarget using static_cast when initializing AsmPrinter. Then if there are MachineFunctions in the module, we will get Subtarget using MF.getSubtarget<SPIRVSubtarget>() in emitFunctionHeader() for each MachineFunction.
Does this approach make sense?

arsenm added inline comments.Apr 7 2022, 8:02 AM
llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
49

Yes, each function should query the subtarget

iliya-diyachkov added inline comments.Apr 7 2022, 8:05 AM
llvm/lib/Target/SPIRV/SPIRVAsmPrinter.cpp
49

Ok, I'll implement it later today.

iliya-diyachkov updated this revision to Diff 421347.EditedApr 7 2022, 3:02 PM

Query ST for each function (in SPIRVAsmPrinter).

iliya-diyachkov marked 2 inline comments as done.Apr 7 2022, 3:13 PM

Updated after the last changes in the 4th and 5th patches.

Ping! We kindly ask for "LGTM" :)

Gentle ping. @arsenm please take a look at the final version of this patch.

arsenm accepted this revision.Apr 18 2022, 5:36 PM

LGTM with nit

llvm/lib/Target/SPIRV/SPIRVModuleAnalysis.cpp
210–216

Could use another continue and indentation reduction

This revision is now accepted and ready to land.Apr 18 2022, 5:36 PM

The indentation issue has been fixed.

iliya-diyachkov marked an inline comment as done.Apr 19 2022, 2:34 AM

Thank you for finishing the review of this series, Matt!

This revision was landed with ongoing or failed builds.Apr 19 2022, 4:28 PM
This revision was automatically updated to reflect the committed changes.