- User Since
- Jan 26 2016, 2:17 AM (169 w, 1 d)
Removed the change in findOptimalMemOpLowering, because that belongs to D59766 which I have just updated.
Changed the assert into an early return from function findOptimalMemOpLowering.
We might run into this assert when we actually start using this in D59787.
Mon, Apr 15
- LLVM subtarget features to make all the new instructions conditional on
- CPU and FPU names for use on armclang's command line, with default FPUs set so that "armv8.1-m.main+fp" and "armv8.1-m.main+fp.dp" will select the right FPU features
- architecture extension names "mve" and "mve.fp"
- ABI build attribute support for v8.1-M (a new value for Tag_CPU_arch) and MVE (a new actual tag)
Fri, Apr 5
Run code-size benchmark CSiBe, triggered an assert in function FindOptimalMemOpLowering, and fixed that.
Wed, Apr 3
Nice one, thanks for fixing this!
It appears the .cpu, .arch and .arch_extension directives can only enable features defined in ExtensionMap in the AArch64AsmParser.cpp and ignores them otherwise. ExtensionMap seems incomplete, in the patch you mentioned only tlb-rmi, pan-rwv and ccpp features were added.
Tue, Apr 2
FYI: a new ACLE version has been published, please find it here: https://developer.arm.com/architectures/system-architectures/software-standards/acle
This change looks also fine to me.
From a quick grep in the test directory, looks like we don't have many tests for the .arch_extension directive. I.e., I see only one, for the simd directive. I think that confirms my suspicion that we tend to forget to add tests/support for the .arch_extension directive when we upstream new architecture support. So, sorry to be a pain here, but while you're add it, my request would be to add tests for some other architecture extensions that we are missing. For example, rasv8_4 might be an interesting one because it has an underscore in it. Looks like the refactoring in D54633 introduced new extensions that we could test.
Mon, Apr 1
Hi Florian, just to clarify my previous message. I would be more than happy to change it, but was only curious if just passing function attributes was too restrictive somehow. Let me know what you think. Would you mind having a quick look at D59787 too? :-) Cheers.
Fri, Mar 29
- cleaned up the diff,
- tweaked the cost for generating a library call.
Ah, oops, that was a experiment in my local branch. I will get rid of this, and upload a new diff soon. So, thanks for spotting this, you haven't missed anything!
Thu, Mar 28
Ignoring for a moment how I would like to use this, I think this is a nice self-contained improvement (or perhaps even bug fix) so that getIntstructionCost/getIntrinsicCost returns something an awful lot more reasonable for memcpy's. I think the dependent refactoring patches are general improvements too. So, a friendly ping, any opinions on this?
Tue, Mar 26
Hi Florian, thanks for taking a look!
Mar 25 2019
Restored the comments.
Summarising, these are the patch sets that we need, i.e. the patches that show the whole flow:
Mar 23 2019
LGTM, just a question inline.
Mar 22 2019
Mar 20 2019
Mar 19 2019
Thanks for looking Sam!
I will wait a day with committing this, just in case there are more comments.
Mar 14 2019
Oops, also forgot about BE.
Looks like a sensible workaround to me.
Thanks, this looks good to me.
Mar 13 2019
In my previous comment, I commented on a possible default cost calculation for a memcpy, which approximated the cost by calculating the number of load/stores pairs. I probably could see this working as a default estimation, because I think when this would be used, the result will be that memcpy's are considered expensive relative to other instructions, which is probably correct. However, I appreciate that this is an approximation, and that there many things are missing here, like source/destination alignment, available load/store instruction, etc., which really shows this is target dependent decision. So, I've changed getMemcpyCost to just simply return TCC_Expensive, so that target can override this method and implement their decision making there, like is done for many/most functions here in TTI.
Looks very reasonable to me
Mar 12 2019
Mainly nits, one question about the tests.
The transform is more powerful working the other way: it could catch cases that were't originally written as "memcpy" in the source code. Not sure that means we shouldn't do something in SROA, though. Ideally, I think we want the "canonical" form to be the promoted version, and transform it > to the compact version as part of lowering. But I'm not sure how much work it would be to make the reverse transform reliable, and I don't want to block an incremental improvement.
MemCpyOpt doesn't actually merge memcpys at the moment, I think. It does merge memsets, and it has a lot of memcpy-related transforms, but that one in particular is missing. But it's the right sort of logic, at least...
Mar 11 2019
I was sort of thinking you could write the reverse transform in MemCpyOpt, since it does a similar sorts of analysis, but maybe we'd have to do it later, or patch SROA anyway, to avoid two transforms fighting each other.
Mar 8 2019
Just for your information, how I would like to use this patch: I've uploaded work-in-progress patch D59129. This is using getInstructionCost to decide if alloca's need to be lowered. The missing piece is the memcpy instruction cost decision making that I stripped out (the intial plumbing) from this patch, that I will start working on now.
Thanks for the reviews!
- removed the memcpy business from this patch.
- and we do have more const pointers now.
Mar 7 2019
Mar 6 2019
Mar 5 2019
Committed in rL355134
Committed in rL355050
Mar 1 2019
Looks like a good fix to me
Looks like a straightforward fix to me.
Feb 25 2019
LGTM, with one nit inline.
Feb 21 2019
Feb 19 2019
The ACLE has been updated and a new version with change included will be released soon.
Feb 17 2019
I am discussing this with our GCC team as we would like both Clang/GCC implementation to be the same. But you're right that _f16 looks like to be the more consistent choice. I will let you know as soon I know more.
Feb 15 2019
Feb 14 2019
Feb 13 2019
LGTM, with one nit inline.
Perhaps wait a day in case someone else has comments.
Feb 11 2019
Thanks Eli, also for your thoughts on this:
Feb 8 2019
Looks okay to me, with one nit inline.
Jan 31 2019
Jan 30 2019
It probably should be shouldExpandShift or something?
Thanks for reviewing!
Maybe put the tests into test/CodeGen/ARM/subtarget-no-movt.ll? We have some similar tests there from https://reviews.llvm.org/D23090 .