Change Lexer to use offsets into the buffer instead of direct pointers, so that Lexer stays functional even if the buffer address is swapped in the middle of lexing.
Incremental input (via clang-repl, cling, etc.) adds code line by line, growing the TU. One of the last pieces that needs to support growing is the source code buffer. The challenge is that growing the buffer can, in practice, change its address. Since Lexer holds direct pointers into the buffer, every such pointer would need to be updated once the buffer is swapped, including all the trivial local variables, which is very hard to do without sacrificing the readability of the code.
This change solves the issue. Since code is only ever appended at the back of the buffer, existing offsets stay constant no matter how many times the buffer grows, and every access through the new buffer remains valid. This does add a number of indirections through BufferStart, but the impact on actual compile time turned out to be negligible. The only visible performance trend seems to be a 0.5% to 0.7% increase in instruction count.
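To illustrate the idea (this is a toy sketch, not the code in this patch): the hypothetical `OffsetLexer` below keeps only an offset and re-reads the buffer start on every access, so it keeps working after the underlying `std::string` reallocates. The names `OffsetLexer`, `lexToken`, and `lexWhileGrowing` are made up for the example, and whitespace-delimited tokens stand in for real lexing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy lexer (not clang's Lexer). It stores its position as an
// offset from the buffer start instead of as a raw pointer.
class OffsetLexer {
public:
  explicit OffsetLexer(const std::string &Buf) : Buf(Buf) {}

  // Returns the next space-delimited token, or an empty string at the end.
  std::string lexToken() {
    // Re-read the start address on every call: it may have changed since the
    // previous call if the buffer grew and was reallocated.
    const char *BufferStart = Buf.data();
    const std::size_t End = Buf.size();

    while (BufferOffset < End && BufferStart[BufferOffset] == ' ')
      ++BufferOffset;
    const std::size_t TokStart = BufferOffset;
    while (BufferOffset < End && BufferStart[BufferOffset] != ' ')
      ++BufferOffset;

    assert(TokStart <= BufferOffset && BufferOffset <= End);
    // Copy the token out so the result does not alias the buffer.
    return std::string(BufferStart + TokStart, BufferOffset - TokStart);
  }

private:
  const std::string &Buf;       // Owned by the caller; may grow at the back.
  std::size_t BufferOffset = 0; // Stays valid across reallocation.
};

// Lexes two tokens, grows the buffer enough to make a reallocation likely,
// then continues lexing from the same offset.
std::vector<std::string> lexWhileGrowing() {
  std::string Source = "int x";
  OffsetLexer L(Source);

  std::vector<std::string> Tokens;
  Tokens.push_back(L.lexToken());
  Tokens.push_back(L.lexToken());

  // Only append at the back, as incremental input does.
  Source += " = 42 ;";
  Source.append(4096, ' ');
  Source += "return";

  for (std::string Tok = L.lexToken(); !Tok.empty(); Tok = L.lexToken())
    Tokens.push_back(Tok);
  return Tokens;
}
```

A version that cached a `const char *` cursor would be left with a dangling pointer after the append; here the only cost is the extra read of the buffer start, which corresponds to the added BufferStart indirection mentioned above.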
The Debian failure is due to a clang-format issue that is unrelated to this change.
Should that use SourceLocation::UIntTy?
Looking at the comments in SourceManager, I think there was an attempt at supporting files larger than 2GB, but I don't think it got anywhere.
Nevertheless, using SourceLocation::UIntTy would arguably be more consistent.
It does seem to be a huge undertaking to change it though, I'm not sure it would be worth it at all. There would be far bigger issues with ridiculously large source files anyway.