The scanner assumed that ~ really meant ^, which may be helpful when
coming from C. But ~ is not a valid Go token, and pretending that it
should be ^ can lead to confusing error messages. Better to be upfront
about it and complain about the invalid character in the first place.
This was code "inherited" from the original yacc parser which was
derived from a C compiler. It's 10 years later and we can probably
assume that people are less confused about C and Go.
Fixes #23587.
Change-Id: I8d8f9b55b0dff009b75c1530d729bf9092c5aea6
Reviewed-on: https://go-review.googlesource.com/94160
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
R=go1.11
In order to collect comments in the AST and for error testing purposes,
the scanner needs to not only recognize and skip comments, but also be
able to report them if so desired. This change adds a mode flag to the
scanner's init function which controls the scanner behavior around
comments.
In the common case where comments are not needed, there must be no
significant overhead. Thus, comments are reported via a handler upcall
rather than being returned as a _Comment token (which the parser would
have to filter out with every scanner.next() call).
Because the handlers for error messages, directives, and comments all
look the same (they take a position and text), and because directives
look like comments, and errors never start with a '/', this change
simplifies the scanner's init call to only take one (error) handler
instead of 2 or 3 different handlers with identical signatures. It is
trivial in the handler to determine if we have an error, directive,
or general comment.
Finally, because directives are comments, when reporting directives
the full comment text is returned now rather than just the directive
text. This simplifies the implementation and makes the scanner API
more regular. Furthermore, it provides important information about
the comment style used by a directive, which may matter eventually
when we fully implement /*line file:line:col*/ directives.
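
As a minimal sketch of how a single handler might tell the three cases
apart: the helper name classify, the kind type, and the exact directive
prefixes checked here ("line " and "go:") are assumptions made for
illustration, not the compiler's actual code. It relies only on the rule
stated above that errors never start with a '/'.

```go
package main

import (
	"fmt"
	"strings"
)

// kind says what the single handler was called with.
type kind int

const (
	kindError kind = iota
	kindDirective
	kindComment
)

// classify is a hypothetical helper. It inspects the text passed to the
// handler: anything not starting with '/' is an error message; otherwise
// the text is a full comment (including its "//" or "/*" opener), which
// is a directive if the comment body starts with an assumed prefix.
func classify(msg string) kind {
	if !strings.HasPrefix(msg, "/") {
		return kindError // errors never start with '/'
	}
	if len(msg) >= 2 {
		body := msg[2:] // skip the "//" or "/*" opener
		if strings.HasPrefix(body, "line ") || strings.HasPrefix(body, "go:") {
			return kindDirective
		}
	}
	return kindComment
}

func main() {
	names := [...]string{"error", "directive", "comment"}
	for _, msg := range []string{
		"invalid character U+007E '~'",
		"//line foo.go:10",
		"/*line foo.go:10:2*/",
		"// just a comment",
	} {
		fmt.Printf("%-10s %q\n", names[classify(msg)], msg)
	}
}
```

Because the full comment text is passed along, the handler can also see
whether a directive used the // or the /* */ style.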
Change-Id: I2adbfcebecd615e4237ed3a832b6ceb9518bf09c
Reviewed-on: https://go-review.googlesource.com/88215
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
R=go1.11
This implements parsing of /*line file:line*/ and /*line file:line:col*/
directives and also extends the optional column format to regular //line
directives, per #22662.
For a line directive to be recognized, its comment text must start with
the prefix "line " which is followed by one of the following:
    :line
    :line:col
    filename:line
    filename:line:col
with at least one : present. The line and col values must be unsigned
decimal integers; everything before is considered part of the filename.
Valid line directives are:
    //line :123
    //line :123:8
    //line foo.go:123
    //line C:foo.go:123      (filename is "C:foo.go")
    //line C:foo.go:123:8    (filename is "C:foo.go")
    /*line ::123*/           (filename is ":")
No matter the comment format, at the moment all directives act as if
they were in //line comments, and column information is ignored.
To be addressed in subsequent CLs.
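
The directive format described above can be sketched as follows. This is
an illustrative parser under stated assumptions, not the code in this CL:
the function name parseLineDirective is hypothetical, it expects the
comment text with the // or /* */ delimiters already removed, it reports
a missing column as 0, and it does not apply any further validity checks
on the numbers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLineDirective is a hypothetical helper. It scans from the right:
// the last ':'-separated field must be an unsigned decimal number. If the
// field before it is also a number, the two are line and col; otherwise
// the last field is the line. Everything before is the filename.
func parseLineDirective(text string) (filename string, line, col uint, ok bool) {
	if !strings.HasPrefix(text, "line ") {
		return "", 0, 0, false
	}
	rest := text[len("line "):]

	i := strings.LastIndexByte(rest, ':')
	if i < 0 {
		return "", 0, 0, false // at least one ':' must be present
	}
	n, err := strconv.ParseUint(rest[i+1:], 10, 32)
	if err != nil {
		return "", 0, 0, false
	}
	rest = rest[:i]

	// If the preceding field is numeric too, n was the column.
	if j := strings.LastIndexByte(rest, ':'); j >= 0 {
		if m, err := strconv.ParseUint(rest[j+1:], 10, 32); err == nil {
			return rest[:j], uint(m), uint(n), true
		}
	}
	return rest, uint(n), 0, true
}

func main() {
	for _, text := range []string{
		"line :123",
		"line :123:8",
		"line foo.go:123",
		"line C:foo.go:123",
		"line C:foo.go:123:8",
		"line ::123",
	} {
		f, l, c, ok := parseLineDirective(text)
		fmt.Printf("%-22q file=%q line=%d col=%d ok=%t\n", text, f, l, c, ok)
	}
}
```

Scanning from the right is what lets a filename such as "C:foo.go" or
":" contain colons of its own.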
For #22662.
Change-Id: I1a2dc54bacc94bc6cdedc5229ee13278971f314e
Reviewed-on: https://go-review.googlesource.com/86037
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
We could leave it alone and fix line offset (column) numbers when
reporting errors, but that is likely to cause confusion (internal
numbers don't match reported numbers). Instead, switch to default
numbering starting at 1.
For package syntax-internal use only, introduce constants defining
the line and column bases, and use them throughout the code and its
tests. It is possible to change these constants and package syntax
will continue to work. But changing them is going to break any client
that makes explicit assumptions about line and column numbers (which
is "all of them").
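
A minimal sketch of the idea, assuming constant names linebase and
colbase and a hypothetical helper posAt (neither is confirmed by this
message): position computations refer to the constants rather than to
literal 1s, so the bases are defined in exactly one place.

```go
package main

import "fmt"

// Assumed names for the package-internal bases described above.
const (
	linebase = 1 // number of the first line
	colbase  = 1 // number of the first column
)

// posAt is a hypothetical helper that returns the line and column of the
// byte at the given offset in src. Columns are counted in bytes.
func posAt(src string, offset int) (line, col int) {
	line, col = linebase, colbase
	for i := 0; i < offset && i < len(src); i++ {
		if src[i] == '\n' {
			line++
			col = colbase
		} else {
			col++
		}
	}
	return line, col
}

func main() {
	src := "package p\nvar x int\n"
	line, col := posAt(src, 14) // offset of the 'x'
	fmt.Printf("%d:%d\n", line, col)
}
```

Tests written against such a helper can likewise be phrased relative to
linebase and colbase instead of hard-coded numbers.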
Change-Id: Ia3d136a8ec8d9372ed9c05ca47d3dff222cf030e
Reviewed-on: https://go-review.googlesource.com/37996
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Towards better syntax error messages: With this change, the parser knows whether
a semicolon was an actual ';' in the source, or whether it was an automatically
inserted semicolon as a result of a '\n' or EOF. Using this information in error
messages makes them more understandable.
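
To illustrate how this information could feed into a message, here is a
small sketch. The type semiOrigin, its constants, and the message
wording are assumptions for illustration only, not the parser's actual
representation.

```go
package main

import "fmt"

// semiOrigin records where a semicolon token came from (assumed type).
type semiOrigin int

const (
	semiExplicit semiOrigin = iota // an actual ';' in the source
	semiNewline                    // inserted because of a '\n'
	semiEOF                        // inserted at end of file
)

// describe names the semicolon the way a user would see it in the source.
func describe(o semiOrigin) string {
	switch o {
	case semiNewline:
		return "newline"
	case semiEOF:
		return "EOF"
	}
	return "semicolon"
}

// unexpected builds a hypothetical syntax error message.
func unexpected(o semiOrigin, want string) string {
	return "unexpected " + describe(o) + ", expecting " + want
}

func main() {
	fmt.Println(unexpected(semiNewline, "comma or }"))
	fmt.Println(unexpected(semiExplicit, "expression"))
}
```

A user who never typed a ';' is then told about the newline or EOF that
triggered the insertion rather than about a semicolon they cannot find.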
For #17328.
Change-Id: I8cd9accee8681b62569d0ecef922d38682b401eb
Reviewed-on: https://go-review.googlesource.com/36636
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
- make the scanner unconditionally gc compatible
- consistently use "invalid" instead of "illegal" in errors
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33896/.
Change-Id: I4c4253e7392f3311b0d838bbe503576c9469b203
Reviewed-on: https://go-review.googlesource.com/34237
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
- use syntax.Pos in syntax.Error (rather than line, col)
- use syntax.Pos in syntax.PragmaHandler (rather than just line)
- update uses
- better documentation in various places
Also:
- make Pos methods use Pos receiver (rather than *Pos)
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33891/.
With minor adjustments to noder.go to make merge compile.
Change-Id: I5507cea6c2be46a7677087c1aeb69382d31033eb
Reviewed-on: https://go-review.googlesource.com/34236
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33873/.
- simplify error handling in source.go
  (move handling of first error into parser, where it belongs)
- clean up error handling in scanner.go
- move pragma and position base handling from scanner
  to parser where it belongs
- have separate error methods in parser to avoid confusion
  with handlers from scanner.go and source.go
- (source.go) and (scanner.go, source.go, tokens.go)
  may be stand-alone packages if so desired, which means
  these files are now less entangled and easier to maintain
Change-Id: I81510fc7ef943b78eaa49092c0eab2075a05878c
Reviewed-on: https://go-review.googlesource.com/34235
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33764/.
Minor adjustment in noder.go to make merge compile again.
Change-Id: Ib5029b52b59944f207b0f2438c8a5aa576eb25b8
Reviewed-on: https://go-review.googlesource.com/34233
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This reverts commit 32db3f2756.
Reason: Decision to back out current alias implementation.
For #16339.
Change-Id: Ib05e3d96041d8347e49cae292f66bec791a1fdc8
Reviewed-on: https://go-review.googlesource.com/32825
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Permits parsing of alias declarations with -newparser
const/type/var/func T => p.T
but the compiler will reject it with an error. For now this
also accepts
type T = p.T
so we can experiment with a type-alias only scenario.
- renamed _Arrow token to _Larrow (<-)
- introduced _Rarrow token (=>)
- introduced AliasDecl node
- extended scanner to accept _Rarrow
- extended parser and printer to handle alias declarations
Change-Id: I0170d10a87df8255db9186d466b6fd405228c38e
Reviewed-on: https://go-review.googlesource.com/29355
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Instead of saving all pragmas and processing them after parsing is
finished, process them immediately during scanning like the current
lexer does.
This is a bit unfortunate because it means we can't use
syntax.ParseFile to concurrently parse files yet, but it fixes how we
report syntax errors in the presence of //line pragmas.
While here, add a bunch more gcCompat entries to syntax/parser.go to
get "go build -toolexec='toolstash -cmp' std cmd" passing. There are
still a few remaining cases triggered only when building unit tests, but
this seems like a nice checkpoint.
Change-Id: Iaf3bbcf2849857a460496f31eea228e0c585ce13
Reviewed-on: https://go-review.googlesource.com/28226
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
This makes a bunch of changes to package syntax to tweak line numbers
for AST nodes. For example, short variable declaration statements are
now associated with the location of the ":=" token, and function calls
are associated with the location of the final ")" token. These help
satisfy many unit tests that assume the old parser's behavior.
Because many of these changes are questionable, they're guarded behind
a new "gcCompat" const to make them easy to identify and revisit in
the future.
A handful of remaining tests are too difficult to make behave
identically. These have been updated to execute with -newparser=0 and
now carry comments explaining why they need to be fixed.
all.bash now passes with both the old and new parsers.
Change-Id: Iab834b71ca8698d39269f261eb5c92a0d55a3bf4
Reviewed-on: https://go-review.googlesource.com/27199
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
Syntax tree nodes, scanner, parser, basic printers.
Builds syntax trees for entire Go std lib at a rate of ~1.8M lines/s
in warmed up state (MacMini, 2.3 GHz Intel Core i7, 8GB RAM):
    $ go test -run StdLib -fast
    parsed 1074617 lines (2832 files) in 579.66364ms (1853863 lines/s)
    allocated 282.212Mb (486.854Mb/s)
    PASS
Change-Id: Ie26d9a7bf4e5ff07457aedfcc9b89f0eba72ae3f
Reviewed-on: https://go-review.googlesource.com/27195
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>