This is an archive of the discontinued LLVM Phabricator instance.

[X86][mem-fold] Speed up test by not re-generating the .inc file
ClosedPublic

Authored by skan on Apr 7 2023, 9:34 PM.

Diff Detail

Event Timeline

skan created this revision.Apr 7 2023, 9:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 9:34 PM
skan requested review of this revision.Apr 7 2023, 9:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 9:34 PM

Could you describe in more detail the purpose/value of this test? Judging by what I can see here it looks like it's a golden file test that verifies a copy of a file matches the current one - this doesn't seem valuable/seems like busy-work to keep an extra copy of a file around?

skan added a comment.EditedApr 7 2023, 11:46 PM

Could you describe in more detail the purpose/value of this test? Judging by what I can see here it looks like it's a golden file test that verifies a copy of a file matches the current one - this doesn't seem valuable/seems like busy-work to keep an extra copy of a file around?

Sure, let me explain in more detail. X86GenFoldTables.inc consists of tables for memory folding and is used in some optimizations like RA, LICM, which is generated by a tblgen backend "-gen-x86-fold-tables". The .inc file is not in source code of LLVM and is produced in the build process. x86-fold-tables.inc is the only source file that contains the tables, so there is no “extra copy" here.

We define some common rules in X86FoldTablesEmitter.cpp for the tblgen backend and these rules apply to most of existing and future instructions. But you know, there are always exceptions and a future instruction may not comply with these rules, causing some entries in X86GenFoldTables.inc to be incorrect. We need a way a to detect it, this is the first reason this test exists.

Before D147527 and we had this test, we expected developers to generate the .inc file in their local workspace and update hand-written tables biweekly or monthly. Unfortunately, people may forget it and tend to not fix the rules in X86FoldTablesEmitter.cpp when they were broken. The longer we forget, the bigger the diff we need to review. The same wrong entries appeared in front of the reviewer again and again b/c no one fixed the broken rules.

If the developer can update the table or correct the broken rules each time a small set of instructions are added, the problem can be solved. This is the second reason this test exists.

pengfei accepted this revision.Apr 10 2023, 6:26 PM

LGTM.

llvm/test/TableGen/x86-fold-tables.td
6

Maybe change name to x86-fold-tables.test given it is a pure compare test now?

This revision is now accepted and ready to land.Apr 10 2023, 6:26 PM
skan updated this revision to Diff 512309.Apr 10 2023, 7:05 PM

Address review comments: rename x86-fold-tables.td to x86-fold-tables.test

thakis added a subscriber: thakis.Apr 11 2023, 5:14 AM

In the GN build, the .inc file is at a different path.

As far as I can tell, nothing in this discussion mentions how time this saves – is it a lot?

I suppose we can add a dedicated %var for the path to the .inc line?

(Another point: It's generally possible to build LLVM on one machine and copy just built binaries and the test/ folder and then run tests on another machine. This change makes that harder since you now also need to copy over that .inc file.)

On my system, running llvm-tblgen -gen-x86-fold-tables -asmwriternum=1 llvm/lib/Target/X86/X86.td -I llvm/lib/Target/X86 -I llvm/include -o tmp takes 1 second. That doesn't seem to justify this change. Is it much slower on your end?

(Just to be clear, the GN build is unsupported and it's not a problem that that breaks – I just need to find some way to make it work again :) But the build/test split concern is universal.)

I reverted this in cbe4499d6b88b1bd6e4c30096cd33d4218fc0a84 for now pending discussion. I'll reland it for you if this is a huge time safe for you though :)

skan added a comment.Apr 11 2023, 5:54 AM

In the GN build, the .inc file is at a different path.

As far as I can tell, nothing in this discussion mentions how time this saves – is it a lot?

I suppose we can add a dedicated %var for the path to the .inc line?

(Another point: It's generally possible to build LLVM on one machine and copy just built binaries and the test/ folder and then run tests on another machine. This change makes that harder since you now also need to copy over that .inc file.)

I'm surprised that the path does not work for GN build. For curiosity, could I know where is it for GN build? I didn't realize anyone have the requirement of building on one machine and testing on another...

On my system, running llvm-tblgen -gen-x86-fold-tables -asmwriternum=1 llvm/lib/Target/X86/X86.td -I llvm/lib/Target/X86 -I llvm/include -o tmp takes 14 seconds for DEBUG build, and after this change it takes 0 second. So it saves 14 seconds.

Is using LLVM_OPTIMIZED_TABLEGEN an option for you?

(To answer your question, the path as-is expands to llvm-project/out/gn/lib/Target/X86/X86GenFoldTables.inc but the file is at out/gn/gen/llvm/lib/Target/X86/X86GenFoldTables.inc. But I'd argue that tests shouldn't make assumptions about the layout of stuff in the build dir – managing that is lit's job. So if we must do this, I think we should add a substitution for this – llvm_gen_dir maybe, but even then you'd hardcode the layout of stuff below that dir. Maybe just llvm_x86_gen_fold_tables_inc_path. See llvm/test/lit.site.cfg.py.in.)

skan added a comment.Apr 11 2023, 6:38 AM

Is using LLVM_OPTIMIZED_TABLEGEN an option for you?

@thakis Yes, it's an option to me for most cases, thanks. Even 14 seconds is acceptable for me. I created this patch just for a better speed not for a concern. I'm fine with your revert. BTW, should we add back the REQUIRES: x86-registered-target in the test? I don't think users care about the test when x86 is not the build targets.

@thakis Yes, it's an option to me for most cases, thanks. Even 14 seconds is acceptable for me. I created this patch just for a better speed not for a concern. I'm fine with your revert. BTW, should we add back the REQUIRES: x86-registered-target in the test? I don't think users care about the test when x86 is not the build targets.

Great, thanks!

If the requires isn't needed, I wouldn't add it. I think it's nice if as many tests as possible run independent of build settings. That makes it easier to debug test failures locally.

Could you describe in more detail the purpose/value of this test? Judging by what I can see here it looks like it's a golden file test that verifies a copy of a file matches the current one - this doesn't seem valuable/seems like busy-work to keep an extra copy of a file around?

Sure, let me explain in more detail. X86GenFoldTables.inc consists of tables for memory folding and is used in some optimizations like RA, LICM, which is generated by a tblgen backend "-gen-x86-fold-tables". The .inc file is not in source code of LLVM and is produced in the build process. x86-fold-tables.inc is the only source file that contains the tables, so there is no “extra copy" here.

We define some common rules in X86FoldTablesEmitter.cpp for the tblgen backend and these rules apply to most of existing and future instructions. But you know, there are always exceptions and a future instruction may not comply with these rules, causing some entries in X86GenFoldTables.inc to be incorrect. We need a way a to detect it, this is the first reason this test exists.

Before D147527 and we had this test, we expected developers to generate the .inc file in their local workspace and update hand-written tables biweekly or monthly. Unfortunately, people may forget it and tend to not fix the rules in X86FoldTablesEmitter.cpp when they were broken. The longer we forget, the bigger the diff we need to review. The same wrong entries appeared in front of the reviewer again and again b/c no one fixed the broken rules.

If the developer can update the table or correct the broken rules each time a small set of instructions are added, the problem can be solved. This is the second reason this test exists.

I still don't really follow. This is a golden file test that boils down to "if this file changes, go check the other file/make updates to it" by checking in the built-result into x86-fold-tables.inc and comparing against the newly built result...

This doesn't seem suitable as a test in general, even without this quirky "test that reads a file from the build directory/depends on the build directory layout".

Could you point to some examples of problems this test (or the previous version, or the problems that arose in the absence of this test) helps identify? Perhaps there are other ways to address them.

skan added a comment.EditedApr 19 2023, 2:14 AM

Could you describe in more detail the purpose/value of this test? Judging by what I can see here it looks like it's a golden file test that verifies a copy of a file matches the current one - this doesn't seem valuable/seems like busy-work to keep an extra copy of a file around?

Sure, let me explain in more detail. X86GenFoldTables.inc consists of tables for memory folding and is used in some optimizations like RA, LICM, which is generated by a tblgen backend "-gen-x86-fold-tables". The .inc file is not in source code of LLVM and is produced in the build process. x86-fold-tables.inc is the only source file that contains the tables, so there is no “extra copy" here.

We define some common rules in X86FoldTablesEmitter.cpp for the tblgen backend and these rules apply to most of existing and future instructions. But you know, there are always exceptions and a future instruction may not comply with these rules, causing some entries in X86GenFoldTables.inc to be incorrect. We need a way a to detect it, this is the first reason this test exists.

Before D147527 and we had this test, we expected developers to generate the .inc file in their local workspace and update hand-written tables biweekly or monthly. Unfortunately, people may forget it and tend to not fix the rules in X86FoldTablesEmitter.cpp when they were broken. The longer we forget, the bigger the diff we need to review. The same wrong entries appeared in front of the reviewer again and again b/c no one fixed the broken rules.

If the developer can update the table or correct the broken rules each time a small set of instructions are added, the problem can be solved. This is the second reason this test exists.

I still don't really follow. This is a golden file test that boils down to "if this file changes, go check the other file/make updates to it" by checking in the built-result into x86-fold-tables.inc and comparing against the newly built result...

This doesn't seem suitable as a test in general, even without this quirky "test that reads a file from the build directory/depends on the build directory layout".

Could you point to some examples of problems this test (or the previous version, or the problems that arose in the absence of this test) helps identify? Perhaps there are other ways to address them.

This patch was already reverted. The build directory layout is not an issue anymore and I think we only need to discuss about why we need the "golden" file test.

First, I think it's suitable as a test in general. Tablgen generates the tables for compiler here and then compiler optimizes based on the tables. Excluding the tables, the optimization is well tested. So what we need to check is the tables. From this perspective, the test checks the functionality of the compiler.
Second, there are some examples where this test could have helped identify. (I searched them by git log --follow f3d9abf1f87c~ -- llvm/lib/Target/X86/X86MemFoldTables.inc, git log --follow 4f4b2161ec3c~ -- llvm/lib/Target/X86/X86InstrFoldTables.cpp )

This patch was already reverted. The build directory layout is not an issue anymore and I think we only need to discuss about why we need the "golden" file test.

I think the test is problematic in either form - either reaching into the source tree from the test tree, or reaching into the build tree from the test tree. But, yes, also I disagree with the substance of the test as well, regardless of the means it uses to achieve that.

First, I think it's suitable as a test in general. Tablgen generates the tables for compiler here and then compiler optimizes based on the tables. Excluding the tables, the optimization is well tested. So what we need to check is the tables.

I generally disagree there - we should test the behavior of the compiler. So we should test how the behavior changes when the table changes.

If it's so boring (like we could test the features the table can describe, then all the tests that the features are used in the right way would be totally repetitive) as to be not worth testing that behavior - but then I don't see how testing the result of tblgen helps either.

Is it really non-obvious what the table does? (ie: when someone makes a change to the table, do they often make the wrong change to the table? And seeing the result of tblgen they would realize they'd made a mistake?)

From this perspective, the test checks the functionality of the compiler.
Second, there are some examples where this test could have helped identify. (I searched them by git log --follow f3d9abf1f87c~ -- llvm/lib/Target/X86/X86MemFoldTables.inc, git log --follow 4f4b2161ec3c~ -- llvm/lib/Target/X86/X86InstrFoldTables.cpp )

Could you explain the process in each case, in terms of how the test would've helped avoid the bug being introduced?

https://reviews.llvm.org/D147527#4249000 - @craig.topper - if you're around, perhaps you've got a take on this that'd help me wrap my head around it?

This patch was already reverted. The build directory layout is not an issue anymore and I think we only need to discuss about why we need the "golden" file test.

I think the test is problematic in either form - either reaching into the source tree from the test tree, or reaching into the build tree from the test tree. But, yes, also I disagree with the substance of the test as well, regardless of the means it uses to achieve that.

First, I think it's suitable as a test in general. Tablgen generates the tables for compiler here and then compiler optimizes based on the tables. Excluding the tables, the optimization is well tested. So what we need to check is the tables.

I generally disagree there - we should test the behavior of the compiler. So we should test how the behavior changes when the table changes.

If it's so boring (like we could test the features the table can describe, then all the tests that the features are used in the right way would be totally repetitive) as to be not worth testing that behavior - but then I don't see how testing the result of tblgen helps either.

Is it really non-obvious what the table does? (ie: when someone makes a change to the table, do they often make the wrong change to the table? And seeing the result of tblgen they would realize they'd made a mistake?)

From this perspective, the test checks the functionality of the compiler.
Second, there are some examples where this test could have helped identify. (I searched them by git log --follow f3d9abf1f87c~ -- llvm/lib/Target/X86/X86MemFoldTables.inc, git log --follow 4f4b2161ec3c~ -- llvm/lib/Target/X86/X86InstrFoldTables.cpp )

Could you explain the process in each case, in terms of how the test would've helped avoid the bug being introduced?

https://reviews.llvm.org/D147527#4249000 - @craig.topper - if you're around, perhaps you've got a take on this that'd help me wrap my head around it?

The majority of X86 instructions have a register only form and a form that has a memory address operand to load from. Or sometimes store to. Each of the two form is represented separately in the X86:: opcode space and is a separate record in tablegen.

The tables here pair up the register only form and the memory form of the instructions into a single row in the table. This information is used to convert between the two forms. For example, to fold a stack reload created by register allocation into its using instruction. Or to unfold a load so it can be hoisted by MachineLICM.

There is no direct connection between the two instruction records in the tablegen definitions. So the tablegen emitter has to compare the opcodes, format field, and other encoding information from the tablegen records to find the matching instructions to pair up.

Unfortunately, just because two instructions are encoded in very similar way does not mean they are really two different forms of the same operation. This has gotten worse as the opcode map has gotten more crowed. Intel and AMD have gone back and filled in new instructions near the opposite form of an older instruction. It would not be legal to convert between these. There are rules in the tablegen emitter or tablegen records to block some of these false cases. It may be possible to detect this by doing some string matching of the names if we came up with a reliable way to extract the instruction name from the form suffix we had to describe the operands. For example ADD32rr and ADD32rm are paired up and the "rm" and "rr" suffix indicate one is memory and one is register.

If two instructions are the same instruction, it doesn't mean it is always safe to fold or unfold. Sometimes only one direction is ok or sometimes neither directions is ok. The tablegen emitter has rules for the known blocking reasons, but new blocking reasons could be created by future instructions.

As I understand it, the idea here is that every time a new instruction is added this test will fail and the person adding a new instruction will have to audit the newly generated table entries for correctness. The test won't fail if the newly added instructions hit one of the existing black list rules. Usually the new entries will be fine and they can be copied over to the golden file and be done. Less often a new restriction will need to be added to prevent tablegen from generating an incorrect entry in the table. Relatively few people are adding new instructions so most developers will never see this test fail.

Previously this audit happened only occasionally when someone manually ran the tablegen emitter and checked for entries that weren't in the golden file. This test makes this audit occur when the instructions get added instead of some unknown point in the future.

We don't know how to make a future proof tablegen emitter so we want to have a human audit in the future. Unfortunately, the human audit can also be wrong, but hopefully other humans will also audit as part of the code review when adding the new instructions.

This patch was already reverted. The build directory layout is not an issue anymore and I think we only need to discuss about why we need the "golden" file test.

I think the test is problematic in either form - either reaching into the source tree from the test tree, or reaching into the build tree from the test tree. But, yes, also I disagree with the substance of the test as well, regardless of the means it uses to achieve that.

First, I think it's suitable as a test in general. Tablgen generates the tables for compiler here and then compiler optimizes based on the tables. Excluding the tables, the optimization is well tested. So what we need to check is the tables.

I generally disagree there - we should test the behavior of the compiler. So we should test how the behavior changes when the table changes.

If it's so boring (like we could test the features the table can describe, then all the tests that the features are used in the right way would be totally repetitive) as to be not worth testing that behavior - but then I don't see how testing the result of tblgen helps either.

Is it really non-obvious what the table does? (ie: when someone makes a change to the table, do they often make the wrong change to the table? And seeing the result of tblgen they would realize they'd made a mistake?)

From this perspective, the test checks the functionality of the compiler.
Second, there are some examples where this test could have helped identify. (I searched them by git log --follow f3d9abf1f87c~ -- llvm/lib/Target/X86/X86MemFoldTables.inc, git log --follow 4f4b2161ec3c~ -- llvm/lib/Target/X86/X86InstrFoldTables.cpp )

Could you explain the process in each case, in terms of how the test would've helped avoid the bug being introduced?

https://reviews.llvm.org/D147527#4249000 - @craig.topper - if you're around, perhaps you've got a take on this that'd help me wrap my head around it?

The majority of X86 instructions have a register only form and a form that has a memory address operand to load from. Or sometimes store to. Each of the two form is represented separately in the X86:: opcode space and is a separate record in tablegen.

The tables here pair up the register only form and the memory form of the instructions into a single row in the table. This information is used to convert between the two forms. For example, to fold a stack reload created by register allocation into its using instruction. Or to unfold a load so it can be hoisted by MachineLICM.

There is no direct connection between the two instruction records in the tablegen definitions. So the tablegen emitter has to compare the opcodes, format field, and other encoding information from the tablegen records to find the matching instructions to pair up.

Unfortunately, just because two instructions are encoded in very similar way does not mean they are really two different forms of the same operation. This has gotten worse as the opcode map has gotten more crowed. Intel and AMD have gone back and filled in new instructions near the opposite form of an older instruction. It would not be legal to convert between these. There are rules in the tablegen emitter or tablegen records to block some of these false cases. It may be possible to detect this by doing some string matching of the names if we came up with a reliable way to extract the instruction name from the form suffix we had to describe the operands. For example ADD32rr and ADD32rm are paired up and the "rm" and "rr" suffix indicate one is memory and one is register.

If two instructions are the same instruction, it doesn't mean it is always safe to fold or unfold. Sometimes only one direction is ok or sometimes neither directions is ok. The tablegen emitter has rules for the known blocking reasons, but new blocking reasons could be created by future instructions.

As I understand it, the idea here is that every time a new instruction is added this test will fail and the person adding a new instruction will have to audit the newly generated table entries for correctness. The test won't fail if the newly added instructions hit one of the existing black list rules. Usually the new entries will be fine and they can be copied over to the golden file and be done. Less often a new restriction will need to be added to prevent tablegen from generating an incorrect entry in the table. Relatively few people are adding new instructions so most developers will never see this test fail.

Previously this audit happened only occasionally when someone manually ran the tablegen emitter and checked for entries that weren't in the golden file. This test makes this audit occur when the instructions get added instead of some unknown point in the future.

We don't know how to make a future proof tablegen emitter so we want to have a human audit in the future. Unfortunately, the human audit can also be wrong, but hopefully other humans will also audit as part of the code review when adding the new instructions.

Thanks for the context - I don't suppose it's worthwhile to make these associations explicit (you mentioned " There is no direct connection between the two instruction records in the tablegen definitions.") so there weren't these unreliable heuristics to contend with?

In any case - the test is still a bit of a layering violation - it's not a test of tblgen itself, I think (lib/TableGen only depends on Support, not on Target, etc)? This test is really a test of the X86.td file, by the sounds of it (well, maybe that and the TableGen backend here - which I guess has all the special case rules encoded?) - perhaps the test should live in Target/X86 instead? (though I don't think it'd fix Google's internal layering issues - of having a lit test reading a source file - but would at least feel more right to me - in terms of library dependency graph, at least)