Commit graph

19 commits

Author SHA1 Message Date
Robert Griesemer
33eb0633e1 cmd/compile/internal/syntax: don't assume (operator) ~ means operator ^
The scanner assumed that ~ really meant ^, which may be helpful when
coming from C. But ~ is not a valid Go token, and pretending that it
should be ^ can lead to confusing error messages. Better to be upfront
about it and complain about the invalid character in the first place.

This was code "inherited" from the original yacc parser which was
derived from a C compiler. It's 10 years later and we can probably
assume that people are less confused about C and Go.

Fixes #23587.

Change-Id: I8d8f9b55b0dff009b75c1530d729bf9092c5aea6
Reviewed-on: https://go-review.googlesource.com/94160
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-15 16:41:24 +00:00
Robert Griesemer
b890688986 cmd/compile/internal/syntax: implement comment reporting in scanner
R=go1.11

In order to collect comments in the AST and for error testing purposes,
the scanner needs to not only recognize and skip comments, but also be
able to report them if so desired. This change adds a mode flag to the
scanner's init function which controls the scanner behavior around
comments.

In the common case where comments are not needed, there must be no
significant overhead. Thus, comments are reported via a handler upcall
rather than being returned as a _Comment token (which the parser would
have to filter out with every scanner.next() call).

Because the handlers for error messages, directives, and comments all
look the same (they take a position and text), and because directives
look like comments, and errors never start with a '/', this change
simplifies the scanner's init call to only take one (error) handler
instead of 2 or 3 different handlers with identical signature. It is
trivial in the handler to determine if we have an error, directive,
or general comment.

Finally, because directives are comments, when reporting directives
the full comment text is returned now rather than just the directive
text. This simplifies the implementation and makes the scanner API
more regular. Furthermore, it provides important information about
the comment style used by a directive, which may matter eventually
when we fully implement /*line file:line:col*/ directives.

Change-Id: I2adbfcebecd615e4237ed3a832b6ceb9518bf09c
Reviewed-on: https://go-review.googlesource.com/88215
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-12 22:57:57 +00:00
Robert Griesemer
ac45cb9aa0 cmd/compile/internal/syntax: permit /*line file:line:col*/ directives
R=go1.11

This implements parsing of /*line file:line*/ and /*line file:line:col*/
directives and also extends the optional column format to regular //line
directives, per #22662.

For a line directive to be recognized, its comment text must start with
the prefix "line " which is followed by one of the following:

:line
:line:col
filename:line
filename:line:col

with at least one : present. The line and col values must be unsigned
decimal integers; everything before is considered part of the filename.

Valid line directives are:

//line :123
//line :123:8
//line foo.go:123
//line C:foo.go:123	(filename is "C:foo.go")
//line C:foo.go:123:8	(filename is "C:foo.go")
/*line ::123*/		(filename is ":")

No matter the comment format, at the moment all directives act as if
they were in //line comments, and column information is ignored.
To be addressed in subsequent CLs.

For #22662.

Change-Id: I1a2dc54bacc94bc6cdedc5229ee13278971f314e
Reviewed-on: https://go-review.googlesource.com/86037
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-02-12 22:57:52 +00:00
Robert Griesemer
1f9f0ea32b cmd/compile/internal/syntax: start line offset (column) numbers at 1
We could leave it alone and fix line offset (column) numbers when
reporting errors, but that is likely to cause confusion (internal
numbers don't match reported numbers). Instead, switch to default
numbering starting at 1.

For package syntax-internal use only, introduced constants defining
the line and column bases, and use them throughout the code and its
tests. It is possible to change these constants and package syntax
will continue to work. But changing them is going to break any client
that makes explicit assumptions about line and column numbers (which
is "all of them").

Change-Id: Ia3d136a8ec8d9372ed9c05ca47d3dff222cf030e
Reviewed-on: https://go-review.googlesource.com/37996
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-09 23:28:48 +00:00
Robert Griesemer
1693e7b6f2 cmd/compile/internal/syntax: better errors and recovery for invalid character literals
Fixes #15611.

Change-Id: I352b145026466cafef8cf87addafbd30716bda24
Reviewed-on: https://go-review.googlesource.com/37138
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-16 21:46:43 +00:00
Robert Griesemer
d390283ff4 cmd/compile/internal/syntax: compiler directives must start at beginning of line
- ignore them, if they don't.
- added tests

Fixes #18393.

Change-Id: I13f87b81ac6b9138ab5031bb3dd6bebc4c548156
Reviewed-on: https://go-review.googlesource.com/37020
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-15 06:49:21 +00:00
Robert Griesemer
9799622f09 cmd/compile/internal/syntax: differentiate between ';' and '\n' in syntax errors
Towards better syntax error messages: With this change, the parser knows whether
a semicolon was an actual ';' in the source, or whether it was an automatically
inserted semicolon as result of a '\n' or EOF. Using this information in error
messages makes them more understandable.

For #17328.

Change-Id: I8cd9accee8681b62569d0ecef922d38682b401eb
Reviewed-on: https://go-review.googlesource.com/36636
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-02-09 01:45:17 +00:00
Robert Griesemer
4b8895e2dd [dev.inline] cmd/compile/internal/syntax: remove gcCompat uses in scanner
- make the scanner unconditionally gc compatible
- consistently use "invalid" instead "illegal" in errors

Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33896/.

Change-Id: I4c4253e7392f3311b0d838bbe503576c9469b203
Reviewed-on: https://go-review.googlesource.com/34237
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 01:35:23 +00:00
Robert Griesemer
3d5df64b3f [dev.inline] cmd/compile/internal/syntax: use syntax.Pos for all external positions
- use syntax.Pos in syntax.Error (rather than line, col)
- use syntax.Pos in syntax.PragmaHandler (rather than just line)
- update uses
- better documentation in various places

Also:
- make Pos methods use Pos receiver (rather than *Pos)

Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33891/.
With minor adjustments to noder.go to make merge compile.

Change-Id: I5507cea6c2be46a7677087c1aeb69382d31033eb
Reviewed-on: https://go-review.googlesource.com/34236
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 01:35:14 +00:00
Robert Griesemer
54ef0447fe [dev.inline] cmd/compile/internal/syntax: clean up error and pragma handling
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33873/.

- simplify error handling in source.go
  (move handling of first error into parser, where it belongs)

- clean up error handling in scanner.go

- move pragma and position base handling from scanner
  to parser where it belongs

- have separate error methods in parser to avoid confusion
  with handlers from scanner.go and source.go

- (source.go) and (scanner.go, source.go, tokens.go)
  may be stand-alone packages if so desired, which means
  these files are now less entangled and easier to maintain

Change-Id: I81510fc7ef943b78eaa49092c0eab2075a05878c
Reviewed-on: https://go-review.googlesource.com/34235
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09 01:35:03 +00:00
Robert Griesemer
e97c8a592f [dev.inline] cmd/compile/internal/syntax: simplified position code
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33805/.

Change-Id: I859d9bd5f2256ca78f7b24b330290f7ae600854d
Reviewed-on: https://go-review.googlesource.com/34234
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09 01:35:00 +00:00
Robert Griesemer
32bf2829a1 [dev.inline] cmd/compile/internal/syntax: process //line pragmas in scanner
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33764/.

Minor adjustment in noder.go to make merge compile again.

Change-Id: Ib5029b52b59944f207b0f2438c8a5aa576eb25b8
Reviewed-on: https://go-review.googlesource.com/34233
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-09 00:43:04 +00:00
Robert Griesemer
8d20b25779 [dev.inline] cmd/compile/internal/syntax: introduce general position info for nodes
Reviewed in and cherry-picked from https://go-review.googlesource.com/#/c/33758/.
Minor adjustments in noder.go to fix merge.

Change-Id: Ibe429e327c7f8554f8ac205c61ce3738013aed98
Reviewed-on: https://go-review.googlesource.com/34231
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 00:42:42 +00:00
Robert Griesemer
429edcff10 Revert "cmd/compile/internal/syntax: support for alias declarations"
This reverts commit 32db3f2756.

Reason: Decision to back out current alias implementation.

For #16339.

Change-Id: Ib05e3d96041d8347e49cae292f66bec791a1fdc8
Reviewed-on: https://go-review.googlesource.com/32825
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-11-05 00:46:39 +00:00
Robert Griesemer
32db3f2756 cmd/compile/internal/syntax: support for alias declarations
Permits parsing of alias declarations with -newparser

	const/type/var/func T => p.T

but the compiler will reject it with an error. For now this
also accepts

	type T = p.T

so we can experiment with a type-alias only scenario.

- renamed _Arrow token to _Larrow (<-)
- introduced _Rarrow token (=>)
- introduced AliasDecl node
- extended scanner to accept _Rarrow
- extended parser and printer to handle alias declarations

Change-Id: I0170d10a87df8255db9186d466b6fd405228c38e
Reviewed-on: https://go-review.googlesource.com/29355
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-09-16 20:58:20 +00:00
Matthew Dempsky
cfc0d6d884 cmd/compile/internal/syntax: remove strbyteseql
cmd/compile already optimizes "string(b) == s" to avoid allocating a
temporary string.

Change-Id: I4244fbeae8d350261494135c357f9a6e2ab7ace3
Reviewed-on: https://go-review.googlesource.com/28931
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-09 19:25:56 +00:00
Matthew Dempsky
ee161e8591 cmd/compile: handle pragmas immediately with -newparser=1
Instead of saving all pragmas and processing them after parsing is
finished, process them immediately during scanning like the current
lexer does.

This is a bit unfortunate because it means we can't use
syntax.ParseFile to concurrently parse files yet, but it fixes how we
report syntax errors in the presence of //line pragmas.

While here, add a bunch more gcCompat entries to syntax/parser.go to
get "go build -toolexec='toolstash -cmp' std cmd" passing. There are
still a few remaining cases only triggered building unit tests, but
this seems like a nice checkpoint.

Change-Id: Iaf3bbcf2849857a460496f31eea228e0c585ce13
Reviewed-on: https://go-review.googlesource.com/28226
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2016-08-31 19:49:53 +00:00
Matthew Dempsky
70544c91ff cmd/compile/internal/syntax: match old parser errors and line numbers
This makes a bunch of changes to package syntax to tweak line numbers
for AST nodes. For example, short variable declaration statements are
now associated with the location of the ":=" token, and function calls
are associated with the location of the final ")" token. These help
satisfy many unit tests that assume the old parser's behavior.

Because many of these changes are questionable, they're guarded behind
a new "gcCompat" const to make them easy to identify and revisit in
the future.

A handful of remaining tests are too difficult to make behave
identically. These have been updated to execute with -newparser=0 and
comments explaining why they need to be fixed.

all.bash now passes with both the old and new parsers.

Change-Id: Iab834b71ca8698d39269f261eb5c92a0d55a3bf4
Reviewed-on: https://go-review.googlesource.com/27199
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
2016-08-19 01:10:21 +00:00
Robert Griesemer
c8683ff797 cmd/compile/internal/syntax: fast Go syntax trees, initial commit.
Syntax tree nodes, scanner, parser, basic printers.

Builds syntax trees for entire Go std lib at a rate of ~1.8M lines/s
in warmed up state (MacMini, 2.3 GHz Intel Core i7, 8GB RAM):

$ go test -run StdLib -fast
parsed 1074617 lines (2832 files) in 579.66364ms (1853863 lines/s)
allocated 282.212Mb (486.854Mb/s)
PASS

Change-Id: Ie26d9a7bf4e5ff07457aedfcc9b89f0eba72ae3f
Reviewed-on: https://go-review.googlesource.com/27195
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
2016-08-18 21:33:38 +00:00