Thanks for adding this Martin!
I'd really like to see a simple test running from the Clang side as well! Like calling llvm-rc with a file that requires preprocessing:
That sounds like a good idea, as there's indeed a gap in testing at that spot right now.
What would be the most appropriate place under clang/test for it?
Maybe under clang\test\Preprocessor ?
Let me know if I should request this review from someone else. It is important for scalability, as this change, in combination with my follow-up change (https://reviews.llvm.org/D100771) eliminate the need to load all .dwo files in the most common debugging scenarios.
there are still pre-merge failures. you may need update your test
I'm other projects, we include an _ods_common.py trampoline in the project specific dialects directory: https://github.com/llvm/circt/blob/main/lib/Bindings/Python/circt/dialects/_ods_common.py
This looks basically good to me.
Just a sec: we solved this on the circt side - need to look at how.
We want this to support the segment load/store intrinsics defined here https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md These return 2 to 8 vectors that have been loaded into consecutive registers. I believe SVE has similar instructions. I believe SVE represents these using types wider than their normal scalable vector types and relies on the type legalizer to split them up in the backend. This works for SVE because there is only one known minimum size for all scalable vector types so the type legalizer will always split down to that minimum type.
Thanks for providing the context!
For RISC-V vectors we already use 7 different sizes of scalable vectors to represent the ability of our instructions to operate on 2, 4, or 8 registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers. The segment load/store instructions add an extra dimension where they can produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for examples. Following the SVE strategy would give us ambiguous types for the type legalizer.
How does that look in terms of IR? Is the number of registers somehow represented in the (LLVM IR) vector type? Or are the types the same, but the compiler generates different code depending on what mode is set? For SVE we know we can split the vector because <vscale x 8 x i32> is twice the size of <vscale x 4 x i32>, regardless of the value for vscale. Indeed we know SVE vectors area multiple of 128bits, and therefore that <vscale x 4 x i32> is legal. In order to make any assumptions about splitting/legalization, the compiler will need to know which types are legal, and so would expect the compiler to know the mode (2, 4 ,8) for RVV when generating the code, and therefore have similar knowledge about which types are legal and how the vectors are represented/split into registers. How does that lead to ambiguous types?
To solve this we would like to use a struct for the segment load/stores to separate them in IR. Since clang needs an address for every variable and needs to be able to load/store them we need to support load/store/alloca.
These (C/C++-level) intrinsics are probably implemented using target-specific intrinsics or perhaps a common LLVM IR intrinsic like masked.load, which should be able to take/return a struct with scalable members after D94142. If so, it should be possible to handle this in Clang by emitting extractvalue instructions and storing each member individually. That would avoid any changes to LLVM IR. Is that something you've considered?
Does anyone have any remaining concerns or a desire to take another pass through this patch? If not, then I'll commit this later this week on Aaron's LGTM.
In the previous version of the patch, the actual Constant values used by the ConstantAsMetadata arguments to DIArgList were not being enumerated until the function that used them was incorporated, which resulted in incorrect bitcode output (all constant values in the bitcode are expected to appear at the module level).
Rebased to not depend on other cl
I don't think we should have unused pseudo instructions since they should all have lit tests. So if they can't be tested soon, they should be added when they are needed.
Add comment about the Other ID type, making clear that currently there are VPUsers in VPBlockBase, but in the future Other should only be used for live-outs.
Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?
Update test formatting