...vs a test for each simd operation
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Bitcode/simd_ops/simd_ops.cpp | ||
---|---|---|
83 ↗ | (On Diff #49392) | &out has type buffer*. out_value is the one that's double*. |
Bitcode/simd_ops/simd_ops.cpp | ||
---|---|---|
83 ↗ | (On Diff #49392) | Oh yeah, thanks, that what I missed! |
Bitcode/simd_ops/CMakeLists.txt | ||
---|---|---|
6 ↗ | (On Diff #49392) | Generally tools like llvm-link need cmake configuration options so you can switch between different versions of them in different paths. |
Bitcode/simd_ops/from_bc_to_s.sh | ||
25 ↗ | (On Diff #49392) | There should also be a way to configure the PATH/name that is used for llvm-dis and llc and again I would commend to use clang. In this specific case I'd also recommend to use clang to compile the files as that is a proper driver meant for production as opposed to llc which is only meant to create llvm unit tests. |
CMakeLists.txt | ||
107 ↗ | (On Diff #49392) | We need some sort of configuration option here, because as far as I understand it you the tests won't work universally. We must check some things like llvm utilities being present and the target being an X86_64 with support for AVX2? |
Bitcode/simd_ops/CMakeLists.txt | ||
---|---|---|
6 ↗ | (On Diff #49392) | Thanks, I will update this. |
Bitcode/simd_ops/from_bc_to_s.sh | ||
25 ↗ | (On Diff #49392) | Will update with PATH as a first step and look into replacing with clang. |
CMakeLists.txt | ||
107 ↗ | (On Diff #49392) | Yes, you're right. I need to update this, especially since these tests will come with versions for different archs. |
- Remove use of llc, using clang
- Updated optimized bitcode tests
- Include unoptimized bitcode tests
- Individual testing of each operation
- Add random seed option for test reproducibility
- Error checking codegeneration by relaxed error margin
- Error checking optimizations by comparing optimized vs unoptimized versions.
- Tests run only on x86 w/ AVX
Bitcode/simd_ops/CMakeLists.txt | ||
---|---|---|
54 ↗ | (On Diff #50345) | Is there a better way to do this? |
Bitcode/simd_ops/simd_ops.reference_output | ||
1 ↗ | (On Diff #50345) | Does it make sense to have this reference file? Test is already checking against "reference" version; the only check needed is the error code. Is there a better alternative? |
Bitcode/CMakeLists.txt | ||
---|---|---|
1 ↗ | (On Diff #50345) | This can be simplified to if(ARCH STREQUAL "x86") |
4–9 ↗ | (On Diff #50345) | Please don't commit commented code. |
Bitcode/simd_ops/CMakeLists.txt | ||
2–5 ↗ | (On Diff #50345) | Remove the space after the '(' braces. |
7–13 ↗ | (On Diff #50345) | Please don't commit commented code. |
15–48 ↗ | (On Diff #50345) | This is a lot of code duplication. We should rather add a variant of test_suite_add_executable() that does not automatically call llvm_add_test(). You can probably put something like your llvm_add_test_common_ref into the generic cmake code as well; Maybe call it llvm_write_test_file() and let llvm_add_test() use the common code? |
52–53 ↗ | (On Diff #50345) | There's loops, custom macros and a GLOB call in cmake, that should be enough to construct something like that in a few lines rather than generating a gigantic file. |
60 ↗ | (On Diff #50345) | commented code... |
64–67 ↗ | (On Diff #50345) | commented code... |
Bitcode/simd_ops/custom_test.sh | ||
1–35 ↗ | (On Diff #50345) | I wonder if all this filtering and file creation is really necessary. Would it be possible to create the .bc files to all export the same function name? And then just different link lines. In spirit like: linking "simd_ops_driver.o and test1.bc" to get test1, linking "simd_ops_driver.o and test2.bc" to get test2, ... |
Bitcode/simd_ops/simd_ops.reference_output | ||
1 ↗ | (On Diff #50345) | The *.reference_output is a convention which can be pretty annoying. However it is only used automatically if the test did not call llvm_test_verify() and llvm_test_run() in the cmake files. For newly created cmake modules we should not work with .reference_output files IMO. lit should already abort if the RUN: part exited with exit code != 0 (I fixed a bug related to that a few days ago) so you should be able to get away with just a llvm_test_run() and no llvm_test_verify() if you do not require additional verification steps. If you require verification steps, I'd also recommend to set "config.traditional_output = False" in a lit.local.cfg (see External/SPEC/CMakeLists.txt for an example) to avoid the pointless "exit XX" line at the end. |
Current update:
- Updated bitcode files to use the same function name
- Simplify CMakeLists accordingly (use llvm_multisource)
- Remove check against reference file. Tests check against reference bitcode and return error code if results differ.
- General header/script cleanup
Previous update:
- Remove use of llc, using clang; updated tests; individual testing.
- Remove use of llc, using clang; updated tests; individual testing.
- Updated cflags/link flags (from SingleMultiSource.cmake)
- Stable patch, including unoptimized bitcode tests
- Error checking 1. codegeneration by error margin and 2. optimizations by comparing optimized vs unoptimized versions.
- Simplify CMakeLists
- Simplified CMakeLists
- Update all tests to the same function name. Cleanup all.
Thanks for the update. Just to let you know that I plan on looking at this more closely next week.
Great! I hope I addressed all feedback so far. Current patch is pretty clean.
Github is updated as well.
Thanks for working on this, the latest version is considerably cleaner and looks good to me (but please wait for joker.eph before pushing)
Long term we should look into splitting the test-suite up into different repositories with a common test driver/cmake build core. But as we do not have that today this should go into the normal test-suite repository as proposed here.
- Name-change test functions and restructure in preparation for adding arm tests
- Add Regression folder and test for bug 26953
This is the final change to the current patch (minus future suggestions). I needed to make some name changes and minor restructure to enable easy adding of tests for different arch. These additional tests will be added shortly after this patch lands.
I'm not planning to land the current patch until getting approval from both MatzeB and joker.eph.
Marking comment in CMakeLists.txt as Done. Issue addressed in inner CMakeLists and main driver.
Pinging for review.
Apologize I wanted to get this done this weekend, but I ended up being too busy with other urgent stuff, will get to this ASAP.
I wanted to look more in details, but I am under tight deadlines this month (and my computer died last week, which didn't help my productivity...) I still gave a quick look.
Here are some comments:
- Looking one of the "unoptimized" bitcode file, I see variables named like %.lcssa.lcssa. The optimizer inserts such name, so it is suspicious to see these in the output of a frontend. Can you explain how did you generate the opt/unopt?
- I didn't expect to have the optimized bitcode files here in the repository, I was expecting to see *only* the unoptimized one. It would then be compiled with or without optimizations.
- The *output* buffer is marked as readonly in the bitcode, unless I missed something. This sounds wrong...
- It would be nice to have a README describing the structure of the bitcode files (especially the role of the 3 functions: unopt_test_op_argv, unopt_test_op, and __unopt_test_op).
- AFAICT, the "filters" are still responsible to compute two versions of the code (vectorized and scalar) and produce the "diff" (sad or whatever). Is this correct? I was expecting the different versions of the filters to be exposed at this level in the "driver" code, and the diff to be computed in the driver as well.
This is great feedback, much appreciated! Answering each point:
- The reason for that is that I used O3 and O1. There was a Halide bug that prevented getting O0, which just got fixed, so I regenerated the unoptimized bitcode files. I will replace them in the next update to the patch, but I want to address the points below as well.
- I agree with you but I want to do this right. The optimization pipeline used by Halide is essentially O3 + always inline. Let me look into how to reproduce the same optimization pipeline in cmake and verify with opt it is doing the same thing as the internal one embedded in Halide. If you have suggestions on the best way to do this, I'd be happy to hear them.
- Yes. The identifiers are actually added by the optimization pass because the buffer_t* is not modified (the readonly does not appear in the O0 bitcode files). The host field inside, is the actual output and the one being modified.
- This is equivalent to a README on how Halide generates code. Those methods are "standard" for any function definition. In these tests it just happens to be named "unopt_test_op". I will add a small description in the current directory, but it will have to move to a "root" Halide directory once more tests are pushed in. I think this is a good thing to have in general to help understand those who may end up debugging failures in these tests.
- Yes, that's correct. The two versions of the filters are both in Halide. Having both vector and scalar would be equivalent to having 2 bitcode files for each filter. To me this seems to complicate the test suite without much benefit. If you see a great improvement in splitting each test into 2 let me know and we can discuss it further. I believe the check in the driver for a value smaller than some epsilon (i.e. an equivalence between vector and scalar) for both optimized and unoptimized, plus a result equivalence between the optimized and unoptimized (i.e. the same epsilon) should be sufficient right now.
- Replaced unoptimized bitcode files with raw .bc generated by Halide
- Comparing unoptimized tests with optimized tests from opt.
Notes about the latest updates:
- I replaced unoptimized files with the raw bitcode as generated by Halide before any optimizations
- The tests compare the raw bitcode vs "opt -O3". Note this does not give the same result as the internal Halide optimization pipeline. I'm looking into this, but it is not critical from LLVM's point of view and having these tests in sooner rather than later is preferred.
- Item pending: README on Halide code generation pattern (functions unopt_test_op_argv, unopt_test_op, and __unopt_test_op)
To clarify, these simd_op tests are the only ones that exhibit this pattern.
I currently have two more things pipelined to upstream:
- Extension to the simd_op folder to include similar tests for arm and aarch64
- Benchmarks that do image processing and test against an image reference output.
The simd_ops tests are meant to be unit tests. I'm not planning to include these in benchmarking and the CMakelists shows this. Actually, the structure I have in mind is moving "simd_ops" to "UnitTests" and adding the benchmarks I'm working on into "Benchmarks", the idea being to replicate the folder structure in SingleSource, MultiSource etc.
Again, the benchmarks do not have this pattern.
More to the point:
The results are not initialized to zero. All buffers are filled with random values (the output buffers as well). There is always the possibility of that random value ending up being 0, but it's not a repeatable behavior; i.e. if the result is incorrect - non zero - the test will more than likely fail. It's also not the case that we end up with 0 being stored after constant folding; tests are still complex enough for the compiler to infer the equivalence of the two filters (loops).
The testing logic will essentially remain in the test itself even after splitting the filters. It will be only be moved to the driver: call both filters and check difference is eps. I intend these tests to continue to run on random data, so having a reference output is not an option.
Having said all that, I'm open to splitting the filters if the additional cost of having double the number of bitcode files is acceptable.
This will double the current number of x86 tests (348) and also the incoming ones for arm and aarch64 (>1300 each; >5.5k in total after the split). The size of each file also increased compared to the initial tests because they are now unoptimized (2.5k lines each vs 0.5k lines after -O3), and it won't be reduced by a lot after the split because of the redundant generated declarations.
If this sounds worth doing to you, I'll start working on splitting the filters.
Bitcode/simd_ops/simd_ops.cpp | ||
---|---|---|
66 ↗ | (On Diff #54943) | These initializations seem to leak and are dead, or I missed where they are used. |
Bitcode/simd_ops/simd_ops.cpp | ||
---|---|---|
66 ↗ | (On Diff #54943) | Can it be totally removed and replace all uses of out_value[i] with`out[i].host` instead? |
Bitcode/simd_ops/simd_ops.cpp | ||
---|---|---|
67 ↗ | (On Diff #55323) | Yes, of course. I re-added it w/o the leak in the latest revision for cleaner-looking code when checking correctness, i.e. not having to do the cast every time. I don't have a strong preference either way. |
Might I ask you to land this when you get a chance? I don't have permissions to do so.
Apparently Phabricator does not play well with the binary files (bitcode), I can't figure how to land that. If you can run git format-patch (or similar) and email me the patch I'll commit it.
Update: Landing this failed due to the script still using opt.
I removed the script and updated the driver.
I updated the driver to check for the existence of the call "__builtin_cpu_supports".
I also updated the driver and tests to test on smaller inputs with large padding to account of Halide's out pf bounds accesses on stencils.
Updated patch for easy pull: https://github.com/alinas/Bitcode.
I have this error at build time:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ -O3 -DNDEBUG [...] Bitcode/simd_ops/CMakeFiles/simd_ops_test_op_pavgw_80.dir/simd_ops.cpp.o Bitcode/simd_ops/CMakeFiles/simd_ops_test_op_pavgw_80.dir/simd_op_check_runtime.bc.o Bitcode/simd_ops/CMakeFiles/simd_ops_test_op_pavgw_80.dir/x86_tests/test_op_pavgw_80.bc.o Bitcode/simd_ops/CMakeFiles/simd_ops_test_op_pavgw_80.dir/x86_scalar_tests/scalar_test_op_pavgw_80.bc.o -o Bitcode/simd_ops/simd_ops_test_op_pavgw_80 -Wl,-no_pie -lpthread -ldl
Undefined symbols for architecture x86_64:
"___cpu_model", referenced from: _main in simd_ops.cpp.o
Updating simd_op tests to latest Halide. This increases the number of tests from 348 to 402.
Updated Halide runtime and main driver as well.
Following the patch to compiler_rt the tests should run ok on OSX.
Tested on Linux and I'm in the process of testing on OSX.
If there is no additional feedback, I'm planning to land this on Monday morning.
I found the Bitcode tests failing with (older?) apple clang versions:
error: Unknown attribute kind (52) (Producer: 'LLVM3.9.0svn' Reader: 'LLVM APPLE_1_800.0.25_0')
I guess this is mostly a question for @mehdi_amini: My understanding is that we do not have enough versioning information in the bitcode fails to detect compatibility? Would it be enough to check a small .bc file during cmake configuration time would that be enough to ensure all further tests work? Or should we make the bitcode tests opt-in with a cmake option?
@alina: you need to use a released version of LLVM to produce the bitcode you want to archive in general. We don't provide any guarantee on the bitcode generated by a development version.
I guess this is mostly a question for @mehdi_amini: My understanding is that we do not have enough versioning information in the bitcode fails to detect compatibility? Would it be enough to check a small .bc file during cmake configuration time would that be enough to ensure all further tests work? Or should we make the bitcode tests opt-in with a cmake option?
There is no silver bullet, for instance if we change the representation of attributes and your small .bc does not have them it will be processed fine, but another .bc that has the attribute won't be read properly.
Ah I guess we provide forward compatibility? So picking a baseline like clang version >= 3.8 and checking for that would work?
@asbirlea: Could you make the Bitcode tests opt-in until we have a reliable way to detect whether the provided clang works?
Something along the lines of
set(TEST_SUITE_ENABLE_BITCODE_TESTS "False" CACHE BOOL "Enable bitcode tests") if(TEST_SUITE_ENABLE_BITCODE_TESTS) # ... endif()
in Bitcode/CMakeLists.txt.
Well we can simply disable the tests by default if apple clang is detected or pick a different baseline version. The important thing is not having an experience where the test-suite fails out of the box.
Somehow it comes down to how much we tie a test-suite version to a specific version of clang: having an "opt-out" instead of "opt-in" could make sense as well: bots that are testing LLVM bleeding edge would test this out-of-the box. Downstream user would have to update their cmake configuration.
Both points are valid, so let me try to find some middle ground.
I created the patch with the cmake option to disable the Bitcode tests, but they are still enabled by default right now. Ideally these would always match the latest release. I'll look into updating with the 3.9 branch so they match the latest release not the development version. Does that sound reasonable for now?
Also, is there another way to check for Apple llvm other than defined(APPLE) and should there be an option to disable the tests in that case?
defined(__APPLE__) is about testing the platform, not the compiler itself.
We still want the OS X bot to run these test with the current trunk.
The issue is for every downstream users (every company that have a private branch forked out of a previous version of LLVM) that won't be able to test this. My suggestion is to let them add the option to their configuration to exclude these tests.
I am pretty sure cmake already differentiates the differentiates apple clang from "normal" clang variants, I can take a look later on how to write a proper test for it.
Interesting, didn't know that.
I can take a look later on how to write a proper test for it.
But we would still want the "apple" compiler to test bitcode produced by "old" open-source version and I don't see a good way to express that?
But we would still want the "apple" compiler to test bitcode produced by "old" open-source version and I don't see a good way to express that?
I thought the issue is that these bitcode tests were too "new" and "older" version of the compiler will not be compatible. Testing older open-source version would be covered if the versions are backward compatible with each other and in the case of the "apple" compiler with the LLVM version the compiler was forked from; guessing this info can be made available?
(Please correct me if I got this whole thing wrong.)
I am pretty sure cmake already differentiates the differentiates apple clang from "normal" clang variants, I can take a look later on how to write a proper test for it.
That would be great, I have no idea how to check that (or that it was possible).