Commit graph

103 commits

Author SHA1 Message Date
Russ Cox
80c98fa901 runtime/trace: record event sequence numbers explicitly
Nearly all the flaky failures we've seen in trace tests have been
due to the use of time stamps to determine relative event ordering.
This is tricky for many reasons, including:
 - different cores might not have exactly synchronized clocks
 - VMs are worse than real hardware
 - non-x86 chips have different timer resolution than x86 chips
 - on fast systems two events can end up with the same time stamp

Stop trying to make time reliable. It's clearly not going to be for Go 1.5.
Instead, record an explicit event sequence number for ordering.
Using our own counter solves all of the above problems.

The trace still contains time stamps, of course. The sequence number
is just used for ordering.

Should alleviate #10554 somewhat. Then tickDiv can be chosen to
be a useful time unit instead of having to be exact for ordering.

Separating ordering and time stamps lets the trace parser diagnose
systems where the time stamp order and actual order do not match
for one reason or another. This CL adds that check to the end of
trace.Parse, after all other sequence order-based checking.
If that error is found, we skip the test instead of failing it.
Putting the check in trace.Parse means that cmd/trace will pick
up the same check, refusing to display a trace where the time stamps
do not match actual ordering.
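
The ordering rule is easy to state in code. A minimal sketch of sorting
by sequence number and then sanity-checking the time stamps afterwards
(the event fields here are illustrative, not the real trace format):

    package main

    import (
        "fmt"
        "sort"
    )

    // Illustrative event shape; real trace events carry more fields.
    type event struct {
        seq int64 // explicit sequence number, authoritative for ordering
        ts  int64 // time stamp, kept for display only
    }

    func main() {
        evs := []event{{seq: 2, ts: 100}, {seq: 1, ts: 100}, {seq: 3, ts: 99}}

        // Order strictly by sequence number; equal or skewed time
        // stamps no longer matter for correctness.
        sort.Slice(evs, func(i, j int) bool { return evs[i].seq < evs[j].seq })

        // Post-check in the spirit of trace.Parse: flag time stamps
        // that run backwards relative to the true (sequence) order.
        for i := 1; i < len(evs); i++ {
            if evs[i].ts < evs[i-1].ts {
                fmt.Printf("time stamps out of order at seq %d\n", evs[i].seq)
            }
        }
    }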

Using net/http's BenchmarkClientServerParallel4 on various CPU counts,
not tracing vs tracing:

name                      old time/op    new time/op    delta
ClientServerParallel4       50.4µs ± 4%    80.2µs ± 4%  +59.06%        (p=0.000 n=10+10)
ClientServerParallel4-2     33.1µs ± 7%    57.8µs ± 5%  +74.53%        (p=0.000 n=10+10)
ClientServerParallel4-4     18.5µs ± 4%    32.6µs ± 3%  +75.77%        (p=0.000 n=10+10)
ClientServerParallel4-6     12.9µs ± 5%    24.4µs ± 2%  +89.33%        (p=0.000 n=10+10)
ClientServerParallel4-8     11.4µs ± 6%    21.0µs ± 3%  +83.40%        (p=0.000 n=10+10)
ClientServerParallel4-12    14.4µs ± 4%    23.8µs ± 4%  +65.67%        (p=0.000 n=10+10)

Fixes #10512.

Change-Id: I173eecf8191e86feefd728a5aad25bf1bc094b12
Reviewed-on: https://go-review.googlesource.com/12579
Reviewed-by: Austin Clements <austin@google.com>
2015-07-29 22:32:14 +00:00
Austin Clements
87f97c73d3 runtime: avoid race between SIGPROF traceback and stack barriers
The following sequence of events can lead to the runtime attempting an
out-of-bounds access on a stack barrier slice:

1. A SIGPROF comes in on a thread while the G on that thread is in
   _Gsyscall. The sigprof handler calls gentraceback, which saves a
   local copy of the G's stkbar slice. Currently the G has no stack
   barriers, so this slice is empty.

2. On another thread, the GC concurrently scans the stack of the
   goroutine being profiled (it considers it stopped because it's in
   _Gsyscall) and installs stack barriers.

3. Back on the sigprof thread, gentraceback comes across a stack
   barrier in the stack and attempts to look it up in its (zero
   length) copy of G's old stkbar slice, which causes an out-of-bounds
   access.

This commit fixes this by adding a simple cas spin to synchronize the
SIGPROF handler with stack barrier insertion.
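
The synchronization itself is a one-word cas spin; a sketch of its
shape (not the runtime's actual stkbar locking):

    package main

    import (
        "runtime"
        "sync/atomic"
    )

    // stackLock sketches a cas spin lock: 0 = unlocked, 1 = locked.
    type stackLock struct{ state uint32 }

    func (l *stackLock) lock() {
        for !atomic.CompareAndSwapUint32(&l.state, 0, 1) {
            runtime.Gosched() // spin until the other side is done
        }
    }

    func (l *stackLock) unlock() { atomic.StoreUint32(&l.state, 0) }

    func main() {
        var l stackLock
        l.lock()   // e.g. gentraceback about to read the stkbar slice
        l.unlock() // released before barrier insertion may proceed
    }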

In general I would prefer that this synchronization be done through
the G status, since that's how stack scans are otherwise synchronized,
but adding a new lock is a much smaller change and G statuses are full
of subtlety.

Fixes #11863.

Change-Id: Ie89614a6238bb9c6a5b1190499b0b48ec759eaf7
Reviewed-on: https://go-review.googlesource.com/12748
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-29 19:31:46 +00:00
Austin Clements
e42413cecc runtime: fix saved PC/SP after safe-point function in syscall
Running a safe-point function on syscall entry uses systemstack() and
hence clobbers g.sched.pc and g.sched.sp. Fix this by re-saving them
after the systemstack, just like in the other uses of systemstack in
reentersyscall.

Change-Id: I47868a53eba24d81919fda56ef6bbcf72f1f922e
Reviewed-on: https://go-review.googlesource.com/12125
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:16 +00:00
Austin Clements
edfc979725 runtime: run safe-point function before entering _Psyscall
Currently, we run a P's safe-point function immediately after entering
_Psyscall state. This is unsafe, since as soon as we put the P in
_Psyscall, we no longer control the P and another M may claim it.
We'll still run the safe-point function only once (because doing so
races on an atomic), but the P may no longer be at a safe-point when
we do so.

In particular, this means that the use of forEachP to dispose all P's
gcw caches is unsafe. A P may enter a syscall, run the safe-point
function, and dispose the P's gcw cache concurrently with another M
claiming the P and attempting to use its gcw cache. If this happens,
we may empty the gcw's workbuf after putting it on
work.{full,partial}, or add pointers to it after putting it in
work.empty. This will cause an assertion failure when we later pop the
workbuf from the list and its object count is inconsistent with the
list we got it from.

Fix this by running the safe-point function just before putting the P
in _Psyscall.
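
The fix is purely about ordering: do the safe-point work while the P is
still privately owned, and only then publish the state that lets
another M claim it. A schematic with illustrative names:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    const (
        pRunning uint32 = iota
        pSyscall
    )

    type proc struct {
        status    uint32
        safePoint func(*proc) // pending safe-point function, if any
    }

    // entersyscall sketch: run the safe-point function *before* the
    // status change; once status is pSyscall another M may take the P.
    func entersyscall(p *proc) {
        if fn := p.safePoint; fn != nil {
            fn(p) // still safe: we own the P
        }
        atomic.StoreUint32(&p.status, pSyscall) // P is now up for grabs
    }

    func main() {
        p := &proc{safePoint: func(*proc) { fmt.Println("disposed gcw cache") }}
        entersyscall(p)
    }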

Related to #11640. This probably fixes this issue, but while I'm able
to show that we can enter a bad safe-point state as a result of this,
I can't reproduce that specific failure.

Change-Id: I6989c8ca7ef2a4a941ae1931e9a0748cbbb59434
Reviewed-on: https://go-review.googlesource.com/12124
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-15 21:09:07 +00:00
Brad Fitzpatrick
2ae77376f7 all: link to https instead of http
The one in misc/makerelease/makerelease.go is particularly bad and
probably warrants rotating our keys.

I didn't update old weekly notes, and reverted some changes involving
test code for now, since we're late in the Go 1.5 freeze. Otherwise,
the rest are all auto-generated changes, and all manually reviewed.

Change-Id: Ia2753576ab5d64826a167d259f48a2f50508792d
Reviewed-on: https://go-review.googlesource.com/12048
Reviewed-by: Rob Pike <r@golang.org>
2015-07-11 14:36:33 +00:00
Austin Clements
4b2774f5ea runtime: make sysmon-triggered GC concurrent
sysmon triggers a GC if there has been no GC for two minutes.
Currently, this is a STW GC. There is no reason for this to be STW, so
make it concurrent.

Fixes #10261.

Change-Id: I92f3ac37272d5c2a31480ff1fa897ebad08775a9
Reviewed-on: https://go-review.googlesource.com/11955
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-07-09 05:53:21 +00:00
Russ Cox
2028077899 runtime: randomize scheduling in -race mode
Basic randomization of goroutine scheduling for -race mode.
It is probably possible to do much better (there's a paper linked
in the issue that I haven't read, for example), but this suffices
to introduce at least some unpredictability into the scheduling order.
The goal here is to have _something_ for Go 1.5, so that we don't
start hitting more of these scheduling order-dependent bugs
if we change the scheduler order again in Go 1.6.
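
The flavor of the change, reduced to a toy: under a race-mode flag,
perturb the order in which runnable work is picked (the real change
hooks into the scheduler, not a slice):

    package main

    import (
        "fmt"
        "math/rand"
    )

    func main() {
        runq := []string{"g1", "g2", "g3", "g4"}

        // Under a hypothetical race-mode flag, shuffle the run order so
        // tests cannot come to depend on one particular scheduling.
        raceMode := true
        if raceMode {
            rand.Shuffle(len(runq), func(i, j int) {
                runq[i], runq[j] = runq[j], runq[i]
            })
        }
        fmt.Println(runq)
    }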

For #11372.

Change-Id: Idf1154123fbd5b7a1ee4d339e93f97635cc2bacb
Reviewed-on: https://go-review.googlesource.com/11795
Reviewed-by: Austin Clements <austin@google.com>
2015-07-07 21:27:38 +00:00
Austin Clements
840965f8d7 runtime: always clear stack barriers on G exit
Currently the runtime fails to clear a G's stack barriers in gfput if
the G's stack allocation is _FixedStack bytes. This causes the runtime
to panic if the following sequence of events happens:

1) The runtime installs stack barriers on a G.

2) The G exits by calling runtime.Goexit. Since this does not
   necessarily return through the stack barriers installed on the G,
   there may still be untriggered stack barriers left on the G's stack,
   recorded in g.stkbar.

3) The runtime calls gfput to add the exiting G to the free pool. If
   the G's stack allocation is _FixedStack bytes, we fail to clear
   g.stkbar.

4) A new G starts and allocates the G that was just added to the free
   pool.

5) The new G begins to execute and overwrites the stack slots that had
   stack barriers in them.

6) The garbage collector enters mark termination, attempts to remove
   stack barriers from the new G, and finds that they've been
   overwritten.

Fix this by clearing the stack barriers in gfput in the case where it
reuses the stack.
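
The bug class is general: when an object goes back on a free list,
per-use bookkeeping has to be reset on every path, including the one
that keeps the old allocation. A stripped-down sketch:

    package main

    import "fmt"

    type g struct {
        stack  []byte
        stkbar []uintptr // recorded stack barriers; must not survive reuse
    }

    var gfree []*g

    // gfput sketch: even on the fast path that keeps the existing stack
    // allocation, the stack-barrier records are cleared before pooling.
    func gfput(gp *g) {
        gp.stkbar = gp.stkbar[:0] // the reset the original code missed
        gfree = append(gfree, gp)
    }

    func main() {
        gp := &g{stack: make([]byte, 2048), stkbar: []uintptr{0xdead}}
        gfput(gp)
        fmt.Println(len(gfree[0].stkbar)) // 0: no stale barriers left
    }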

Fixes #11256.

Change-Id: I377c44258900e6bcc2d4b3451845814a8eeb2bcf
Reviewed-on: https://go-review.googlesource.com/11461
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-29 15:02:30 +00:00
Alex Brainman
9d968cb47b runtime: rename cgocall_errno and asmcgocall_errno into cgocall and asmcgocall
Change-Id: I5917bea8bb35b0e725dcc56a68f3a70137cfc180
Reviewed-on: https://go-review.googlesource.com/9387
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-19 01:47:11 +00:00
Dmitry Vyukov
e72f5f67a1 runtime: fix tracing of syscallexit
There were two issues.
1. Delayed EvGoSysExit could have been emitted during TraceStart,
while it had not yet emitted EvGoInSyscall.
2. Delayed EvGoSysExit could have been emitted during next tracing session.

Fixes #10476
Fixes #11262

Change-Id: Iab68eb31cf38eb6eb6eee427f49c5ca0865a8c64
Reviewed-on: https://go-review.googlesource.com/9132
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-18 13:59:55 +00:00
Alex Brainman
2858b73843 runtime: remove cgocall and asmcgocall
In preparation for renaming cgocall_errno into cgocall and
asmcgocall_errno into asmcgocall in the following CL.
rsc requested CL 9387 to be split into two parts. This is the first part.

Change-Id: I7434f0e4b44dd37017540695834bfcb1eebf0b2f
Reviewed-on: https://go-review.googlesource.com/11166
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-06-18 04:42:53 +00:00
Russ Cox
cfa3eda587 runtime: fix race in scanvalid assertion
Change-Id: I389b2e10fe667eaa55f87b71b1e004994694d4a3
Reviewed-on: https://go-review.googlesource.com/11173
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 20:12:37 +00:00
Russ Cox
3c60e6e8cf runtime: fix races in stack scan
This fixes a hang during runtime.TestTraceStress.
It also fixes double-scan of stacks, which leads to
stack barrier installation failures.

Both of these have shown up as flaky failures on the dashboard.

Fixes #10941.

Change-Id: Ia2a5991ce2c9f43ba06ae1c7032f7c898dc990e0
Reviewed-on: https://go-review.googlesource.com/11089
Reviewed-by: Austin Clements <austin@google.com>
2015-06-17 17:56:26 +00:00
Ainar Garipov
7f9f70e5b6 all: fix misprints in comments
These were found by grepping the comments from the go code and feeding
the output to aspell.

Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395
Reviewed-on: https://go-review.googlesource.com/10941
Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-06-11 14:18:57 +00:00
Russ Cox
3ffcbb633e runtime: default GOMAXPROCS to NumCPU(), not 1
See golang.org/s/go15gomaxprocs for details.

Change-Id: I8de5df34fa01d31d78f0194ec78a2474c281243c
Reviewed-on: https://go-review.googlesource.com/10668
Reviewed-by: Rob Pike <r@golang.org>
2015-06-05 04:38:04 +00:00
Austin Clements
3f6e69aca5 runtime: steal space for stack barrier tracking from stack
The stack barrier code will need a bookkeeping structure to keep track
of the overwritten return PCs. This commit introduces and allocates
this structure, but does not yet use the structure.

We don't want to allocate space for this structure during garbage
collection, so this commit allocates it along with the allocation of
the corresponding stack. However, we can't do a regular allocation in
newstack because mallocgc may itself grow the stack (which would lead
to a recursive allocation). Hence, this commit makes the bookkeeping
structure part of the stack allocation itself by stealing the
necessary space from the top of the stack allocation. Since the size
of this bookkeeping structure is logarithmic in the size of the stack,
this has minimal impact on stack behavior.
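
Why the cost is small: barriers sit at exponentially spaced frames, so
the stolen bookkeeping needs only about log2(stack size) entries. Rough
arithmetic, with an assumed record size:

    package main

    import (
        "fmt"
        "math/bits"
    )

    func main() {
        const stackSize = 32 << 10 // a 32 KB stack allocation
        const entrySize = 16       // assumed bytes per barrier record

        // One record per power-of-two band of the stack.
        records := bits.Len(uint(stackSize))
        stolen := records * entrySize

        fmt.Printf("%d records, %d bytes stolen of %d (%.2f%%)\n",
            records, stolen, stackSize, 100*float64(stolen)/stackSize)
    }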

Change-Id: Ia14408be06aafa9ca4867f4e70bddb3fe0e96665
Reviewed-on: https://go-review.googlesource.com/10313
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:57 +00:00
Austin Clements
e610c25df0 runtime: decouple stack bounds and stack allocation size
Currently the runtime assumes that the allocation for the stack is
exactly [stack.lo, stack.hi). We're about to steal a small part of
this allocation for per-stack GC metadata. To prepare for this, this
commit adds a field to the G for the allocated size of the stack.
With this change, stack.lo and stack.hi continue to act as the true
bounds on the stack, but are no longer also used as the bounds on the
stack allocation.

(I also tried this the other way around, where stack.lo and stack.hi
remained the allocation bounds and I introduced a new top of stack.
However, there are far more places that assume stack.hi is the true
top of the stack than there are places that assume it's the top of the
allocation.)

Change-Id: Ifa9d956753be53d286d09cbc73d47fb34a18c0c6
Reviewed-on: https://go-review.googlesource.com/10312
Reviewed-by: Russ Cox <rsc@golang.org>
2015-06-02 19:57:50 +00:00
Elias Naur
84cfba17c2 runtime: don't always unblock all signals
Ian proposed an improved way of handling signals masks in Go, motivated
by a problem where the Android java runtime expects certain signals to
be blocked for all JVM threads. Discussion here

https://groups.google.com/forum/#!topic/golang-dev/_TSCkQHJt6g

Ian's text is used in the following:

A Go program always needs to have the synchronous signals enabled.
These are the signals for which _SigPanic is set in sigtable, namely
SIGSEGV, SIGBUS, SIGFPE.

A Go program that uses the os/signal package, and calls signal.Notify,
needs to have at least one thread which is not blocking that signal,
but it doesn't matter much which one.

Unix programs do not change signal mask across execve.  They inherit
signal masks across fork.  The shell uses this fact to some extent;
for example, the job control signals (SIGTTIN, SIGTTOU, SIGTSTP) are
blocked for commands run due to backquote quoting or $().

Our current position on signal masks was not thought out.  We wandered
into it step by step, e.g., http://golang.org/cl/7323067.

This CL does the following:

Introduce a new platform hook, msigsave, that saves the signal mask of
the current thread to m.sigsave.

Call msigsave from needm and newm.

In minit, set up the signal mask from m.sigsave and unblock the
essential synchronous signals, and SIGILL, SIGTRAP, SIGPROF, SIGSTKFLT
(for systems that have it).

In unminit, restore the signal mask from m.sigsave.

The first time that os/signal.Notify is called, start a new thread whose
only purpose is to update its signal mask to make sure signals for
signal.Notify are unblocked on at least one thread.

The effect on Go programs will be that if they are invoked with some
non-synchronous signals blocked, those signals will normally be
ignored.  Previously, those signals would mostly be ignored.  A change
in behaviour will occur for programs started with any of these signals
blocked, if they receive the signal: SIGHUP, SIGINT, SIGQUIT, SIGABRT,
SIGTERM.  Previously those signals would always cause a crash (unless
using the os/signal package); with this change, they will be ignored
if the program is started with the signal blocked (and does not use
the os/signal package).
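
From user code the contract is unchanged: a program that wants, say,
SIGINT or SIGTERM delivered still just calls signal.Notify, and that
call is what guarantees at least one thread leaves those signals
unblocked:

    package main

    import (
        "fmt"
        "os"
        "os/signal"
        "syscall"
    )

    func main() {
        c := make(chan os.Signal, 1)
        // Registering interest ensures the signals are deliverable even
        // if the process was started with them blocked.
        signal.Notify(c, os.Interrupt, syscall.SIGTERM)

        fmt.Println("waiting for a signal...")
        fmt.Println("got:", <-c)
    }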

./all.bash completes successfully on linux/amd64.

OpenBSD is missing the implementation.

Change-Id: I188098ba7eb85eae4c14861269cc466f2aa40e8c
Reviewed-on: https://go-review.googlesource.com/10173
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2015-05-22 20:24:08 +00:00
Rick Hudson
197aa9e64d runtime: remove unused quiesce code
This is dead code. If you want to quiesce the system the
preferred way is to use forEachP(func(*p){}).

Change-Id: Ic7677a5dd55e3639b99e78ddeb2c71dd1dd091fa
Reviewed-on: https://go-review.googlesource.com/10267
Reviewed-by: Austin Clements <austin@google.com>
2015-05-20 17:56:44 +00:00
Rick Hudson
913db7685e runtime: run background mark helpers only if work is available
Prior to this CL whenever the GC marking was enabled and
a P was looking for work we supplied a G to help
the GC do its marking tasks. Once this G finished all
the marking available it would release the P to find another
available G. In the case where there was no work, the P would drop
into findrunnable, which would execute the mark helper G; the helper
would immediately return, and the P would drop into findrunnable again,
repeating the process. Since the P was always given a G to run, it never
blocked. This CL first checks whether the GC mark helper G has available
work and, if not, the P immediately falls through to its blocking logic.

Fixes #10901

Change-Id: I94ac9646866ba64b7892af358888bc9950de23b5
Reviewed-on: https://go-review.googlesource.com/10189
Reviewed-by: Austin Clements <austin@google.com>
2015-05-19 15:57:50 +00:00
Austin Clements
f0dd002895 runtime: use separate count and note for forEachP
Currently, forEachP reuses the stopwait and stopnote fields from
stopTheWorld to track how many Ps have not responded to the safe-point
request and to sleep until all Ps have responded.

It was assumed this was safe because both stopTheWorld and forEachP
must occur under worldsema and hence stopwait and stopnote cannot
be used for both purposes simultaneously and callers could always
determine the appropriate use based on sched.gcwaiting (which is only
set by stopTheWorld). However, this is not the case, since it's
possible for there to be a window between when an M observes that
gcwaiting is set and when it checks stopwait during which stopwait
could have changed meanings. When this happens, the M decrements
stopwait and may wakeup stopnote, but does not otherwise participate
in the forEachP protocol. As a result, stopwait is decremented too
many times, so it may reach zero before all Ps have run the safe-point
function, causing forEachP to wake up early. It will then either
observe that some P has not run the safe-point function and panic with
"P did not run fn", or the remaining P (or Ps) will run the safe-point
function before it wakes up and it will observe that stopwait is
negative and panic with "not stopped".

Fix this problem by giving forEachP its own safePointWait and
safePointNote fields.

One known sequence of events that can cause this race is as
follows. It involves three actors:

G1 is running on M1 on P1. P1 has an empty run queue.

G2/M2 is in a blocked syscall and has lost its P. (The details of this
don't matter, it just needs to be in a position where it needs to grab
an idle P.)

GC just started on G3/M3/P3. (These aren't very involved, they just
have to be separate from the other G's, M's, and P's.)

1. GC calls stopTheWorld(), which sets sched.gcwaiting to 1.

Now G1/M1 begins to enter a syscall:

2. G1/M1 invokes reentersyscall, which sets the P1's status to
   _Psyscall.

3. G1/M1's reentersyscall observes gcwaiting != 0 and calls
   entersyscall_gcwait.

4. G1/M1's entersyscall_gcwait blocks acquiring sched.lock.

Back on GC:

5. stopTheWorld cas's P1's status to _Pgcstop, does other stuff, and
   returns.

6. GC does stuff and then calls startTheWorld().

7. startTheWorld() calls procresize(), which sets P1's status to
   _Pidle and puts P1 on the idle list.

Now G2/M2 returns from its syscall and takes over P1:

8. G2/M2 returns from its blocked syscall and gets P1 from the idle
   list.

9. G2/M2 acquires P1, which sets P1's status to _Prunning.

10. G2/M2 starts a new syscall and invokes reentersyscall, which sets
    P1's status to _Psyscall.

Back on G1/M1:

11. G1/M1 finally acquires sched.lock in entersyscall_gcwait.

At this point, G1/M1 still thinks it's running on P1. P1's status is
_Psyscall, which is consistent with what G1/M1 is doing, but it's
_Psyscall because *G2/M2* put it into _Psyscall, not G1/M1. This is
basically an ABA race on P1's status.

Because forEachP currently shares stopwait with stopTheWorld, G1/M1's
entersyscall_gcwait observes the non-zero stopwait set by forEachP,
but mistakes it for a stopTheWorld. It cas's P1's status from
_Psyscall (set by G2/M2) to _Pgcstop and proceeds to decrement
stopwait one more time than forEachP was expecting.

Fixes #10618. (See the issue for details on why the above race is safe
when forEachP is not involved.)

Prior to this commit, the command
  stress ./runtime.test -test.run TestFutexsleep\|TestGoroutineProfile
would reliably fail after a few hundred runs. With this commit, it
ran for over 2 million runs and never crashed.

Change-Id: I9a91ea20035b34b6e5f07ef135b144115f281f30
Reviewed-on: https://go-review.googlesource.com/10157
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:47 +00:00
Austin Clements
277acca286 runtime: hold worldsema while starting the world
Currently, startTheWorld releases worldsema before starting the
world. Since startTheWorld can change gomaxprocs after allowing Ps to
run, this means that gomaxprocs can change while another P holds
worldsema.

Unfortunately, the garbage collector and forEachP assume that holding
worldsema protects against changes in gomaxprocs (which it *almost*
does). In particular, this is causing somewhat frequent "P did not run
fn" crashes in forEachP in the runtime tests because gomaxprocs is
changing between the several loops that forEachP does over all the Ps.

Fix this by only releasing worldsema after the world is started.

This relates to issue #10618. forEachP still fails under stress
testing, but much less frequently.

Change-Id: I085d627b70cca9ebe9af28fe73b9872f1bb224ff
Reviewed-on: https://go-review.googlesource.com/10156
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:37 +00:00
Austin Clements
9c44a41dd5 runtime: disallow preemption during startTheWorld
Currently, startTheWorld clears preemptoff for the current M before
starting the world. A few callers increment m.locks around
startTheWorld, presumably to prevent preemption any time during
starting the world. This is almost certainly pointless (none of the
other callers do this), but there's no harm in making startTheWorld
keep preemption disabled until it's all done, which definitely lets us
drop these m.locks manipulations.

Change-Id: I8a93658abd0c72276c9bafa3d2c7848a65b4691a
Reviewed-on: https://go-review.googlesource.com/10155
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:31 +00:00
Austin Clements
a1da255aa0 runtime: factor stoptheworld/starttheworld pattern
There are several steps to stopping and starting the world and
currently they're open-coded in several places. The garbage collector
is the only thing that needs to stop and start the world in a
non-trivial pattern. Replace all other uses with calls to higher-level
functions that implement the entire pattern necessary to stop and
start the world.

This is a pure refactoring and should not change any code semantics.
In the following commits, we'll make changes that are easier to do
with this abstraction in place.

This commit renames the old starttheworld to startTheWorldWithSema.
This is a slight misnomer right now because the callers release
worldsema just before calling this. However, a later commit will swap
these and I don't want to think of another name in the meantime.

Change-Id: I5dc97f87b44fb98963c49c777d7053653974c911
Reviewed-on: https://go-review.googlesource.com/10154
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-18 14:55:25 +00:00
Austin Clements
a0fc306023 runtime: eliminate runqvictims and a copy from runqsteal
Currently, runqsteal steals Gs from another P into an intermediate
buffer and then copies those Gs into the current P's run queue. This
intermediate buffer itself was moved from the stack to the P in commit
c4fe503 to eliminate the cost of zeroing it on every steal.

This commit follows up c4fe503 by stealing directly into the current
P's run queue, which eliminates the copy and the need for the
intermediate buffer. The update to the tail pointer is only committed
once the entire steal operation has succeeded, so the semantics of
stealing do not change.

Change-Id: Icdd7a0eb82668980bf42c0154b51eef6419fdd51
Reviewed-on: https://go-review.googlesource.com/9998
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-17 01:08:42 +00:00
Rick Hudson
c4fe503119 runtime: reduce thrashing of gs between ps
One important use case is a pipeline computation that passes values
from one Goroutine to the next and then exits or is placed in a
wait state. If GOMAXPROCS > 1 a Goroutine running on P1 will enable
another Goroutine and then immediately make P1 available to execute
it. We need to prevent other Ps from stealing the G that P1 is about
to execute. Otherwise the Gs can thrash between Ps causing unneeded
synchronization and slowing down throughput.

Fix this by changing the stealing logic so that when a P attempts to
steal the only G on some other P's run queue, it will pause
momentarily to allow the victim P to schedule the G.

As part of optimizing stealing we also use a per-P victim queue to
move stolen Gs. This eliminates the zeroing of a stack-local victim
queue, which turned out to be expensive.

This CL is a necessary but not sufficient prerequisite to changing
the default value of GOMAXPROCS to something > 1 which is another
CL/discussion.

For highly serialized programs, such as GoroutineRing below this can
make a large difference. For larger and more parallel programs such
as the x/benchmarks there is no noticeable detriment.

~/work/code/src/rsc.io/benchstat/benchstat old.txt new.txt
name                old mean              new mean              delta
GoroutineRing       30.2µs × (0.98,1.01)  30.1µs × (0.97,1.04)     ~    (p=0.941)
GoroutineRing-2      113µs × (0.91,1.07)    30µs × (0.98,1.03)  -73.17% (p=0.004)
GoroutineRing-4      144µs × (0.98,1.02)    32µs × (0.98,1.01)  -77.69% (p=0.000)
GoroutineRingBuf    32.7µs × (0.97,1.03)  32.5µs × (0.97,1.02)     ~    (p=0.795)
GoroutineRingBuf-2   120µs × (0.92,1.08)    33µs × (1.00,1.00)  -72.48% (p=0.004)
GoroutineRingBuf-4   138µs × (0.92,1.06)    33µs × (1.00,1.00)  -76.21% (p=0.003)

The bench benchmarks show little impact.
name          old       new
garbage   7032879   7011696
httpold     25509     25301
splayold  1022073   1019499
jsonold  28230624  28081433

Change-Id: I228c48fed8d85c9bbef16a7edc53ab7898506f50
Reviewed-on: https://go-review.googlesource.com/9872
Reviewed-by: Austin Clements <austin@google.com>
2015-05-13 12:55:24 +00:00
Austin Clements
350fd548b3 runtime: don't run runq tests on the system stack
Running these tests on the system stack is problematic because they
allocate Ps, which are large enough to overflow the system stack if
they are stack-allocated. It used to be necessary to run these tests
on the system stack because they were written in C, but since this is
no longer the case, we can fix this problem by simply not running the
tests on the system stack.

This also means we no longer need the hack in one of these tests that
forces the allocated Ps to escape to the heap, so eliminate that as
well.

Change-Id: I9064f5f8fd7f7b446ff39a22a70b172cfcb2dc57
Reviewed-on: https://go-review.googlesource.com/9923
Reviewed-by: Rick Hudson <rlh@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-05-12 19:58:08 +00:00
Russ Cox
1635ab7dfe runtime: remove wbshadow mode
The write barrier shadow heap was very useful for
developing the write barriers initially, but it's no longer used,
clunky, and dragging the rest of the implementation down.

The gccheckmark mode will find bugs due to missed barriers
when they result in missed marks; wbshadow mode found the
missed barriers more aggressively, but it required an entire
separate copy of the heap. The gccheckmark mode requires
no extra memory, making it more useful in practice.

Compared to previous CL:
name                   old mean              new mean              delta
BinaryTree17            5.91s × (0.96,1.06)   5.72s × (0.97,1.03)  -3.12% (p=0.000)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.91% (p=0.000)
FmtFprintfEmpty        89.0ns × (0.93,1.10)  86.6ns × (0.96,1.11)    ~    (p=0.077)
FmtFprintfString        298ns × (0.98,1.06)   283ns × (0.99,1.04)  -4.90% (p=0.000)
FmtFprintfInt           286ns × (0.98,1.03)   283ns × (0.98,1.04)  -1.09% (p=0.032)
FmtFprintfIntInt        498ns × (0.97,1.06)   480ns × (0.99,1.02)  -3.65% (p=0.000)
FmtFprintfPrefixedInt   408ns × (0.98,1.02)   396ns × (0.99,1.01)  -3.00% (p=0.000)
FmtFprintfFloat         587ns × (0.98,1.01)   562ns × (0.99,1.01)  -4.34% (p=0.000)
FmtManyArgs            1.94µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -2.85% (p=0.000)
GobDecode              15.8ms × (0.98,1.03)  15.7ms × (0.99,1.02)    ~    (p=0.251)
GobEncode              12.0ms × (0.96,1.09)  11.8ms × (0.98,1.03)  -1.87% (p=0.024)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.688)
Gunzip                  143ms × (1.00,1.01)   143ms × (1.00,1.01)    ~    (p=0.203)
HTTPClientServer       90.3µs × (0.98,1.01)  89.1µs × (0.99,1.02)  -1.30% (p=0.000)
JSONEncode             31.6ms × (0.99,1.01)  31.7ms × (0.98,1.02)    ~    (p=0.219)
JSONDecode              107ms × (1.00,1.01)   111ms × (0.99,1.01)  +3.58% (p=0.000)
Mandelbrot200          6.03ms × (1.00,1.01)  6.01ms × (1.00,1.00)    ~    (p=0.077)
GoParse                6.53ms × (0.99,1.03)  6.54ms × (0.99,1.02)    ~    (p=0.585)
RegexpMatchEasy0_32     161ns × (1.00,1.01)   161ns × (0.98,1.05)    ~    (p=0.948)
RegexpMatchEasy0_1K     541ns × (0.99,1.01)   559ns × (0.98,1.01)  +3.32% (p=0.000)
RegexpMatchEasy1_32     138ns × (1.00,1.00)   137ns × (0.99,1.01)  -0.55% (p=0.001)
RegexpMatchEasy1_1K     887ns × (0.99,1.01)   878ns × (0.99,1.01)  -0.98% (p=0.000)
RegexpMatchMedium_32    253ns × (0.99,1.01)   252ns × (0.99,1.01)  -0.39% (p=0.001)
RegexpMatchMedium_1K   72.8µs × (1.00,1.00)  72.7µs × (1.00,1.00)    ~    (p=0.485)
RegexpMatchHard_32     3.85µs × (1.00,1.01)  3.85µs × (1.00,1.01)    ~    (p=0.283)
RegexpMatchHard_1K      117µs × (1.00,1.01)   117µs × (1.00,1.00)    ~    (p=0.175)
Revcomp                 922ms × (0.97,1.08)   903ms × (0.98,1.05)  -2.15% (p=0.021)
Template                126ms × (0.99,1.01)   126ms × (0.99,1.01)    ~    (p=0.943)
TimeParse               628ns × (0.99,1.01)   634ns × (0.99,1.01)  +0.92% (p=0.000)
TimeFormat              668ns × (0.99,1.01)   698ns × (0.98,1.03)  +4.53% (p=0.000)

It's nice that the microbenchmarks are the ones helped the most,
because those were the ones hurt the most by the conversion from
4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that
process to (compared to CL 9706 patch set 1):

name                   old mean              new mean              delta
BinaryTree17            5.87s × (0.94,1.09)   5.72s × (0.97,1.03)  -2.57% (p=0.011)
Fannkuch11              4.32s × (1.00,1.00)   4.36s × (1.00,1.00)  +0.87% (p=0.000)
FmtFprintfEmpty        89.1ns × (0.95,1.16)  86.6ns × (0.96,1.11)    ~    (p=0.090)
FmtFprintfString        283ns × (0.98,1.02)   283ns × (0.99,1.04)    ~    (p=0.681)
FmtFprintfInt           284ns × (0.98,1.04)   283ns × (0.98,1.04)    ~    (p=0.620)
FmtFprintfIntInt        486ns × (0.98,1.03)   480ns × (0.99,1.02)  -1.27% (p=0.002)
FmtFprintfPrefixedInt   400ns × (0.99,1.02)   396ns × (0.99,1.01)  -0.84% (p=0.001)
FmtFprintfFloat         566ns × (0.99,1.01)   562ns × (0.99,1.01)  -0.80% (p=0.000)
FmtManyArgs            1.91µs × (0.99,1.02)  1.89µs × (0.99,1.01)  -1.10% (p=0.000)
GobDecode              15.5ms × (0.98,1.05)  15.7ms × (0.99,1.02)  +1.55% (p=0.005)
GobEncode              11.9ms × (0.97,1.03)  11.8ms × (0.98,1.03)  -0.97% (p=0.048)
Gzip                    648ms × (0.99,1.01)   647ms × (0.99,1.01)    ~    (p=0.627)
Gunzip                  143ms × (1.00,1.00)   143ms × (1.00,1.01)    ~    (p=0.482)
HTTPClientServer       89.2µs × (0.99,1.02)  89.1µs × (0.99,1.02)    ~    (p=0.740)
JSONEncode             32.3ms × (0.97,1.06)  31.7ms × (0.98,1.02)  -1.95% (p=0.002)
JSONDecode              106ms × (0.99,1.01)   111ms × (0.99,1.01)  +4.22% (p=0.000)
Mandelbrot200          6.02ms × (1.00,1.00)  6.01ms × (1.00,1.00)    ~    (p=0.417)
GoParse                6.57ms × (0.97,1.06)  6.54ms × (0.99,1.02)    ~    (p=0.404)
RegexpMatchEasy0_32     162ns × (1.00,1.00)   161ns × (0.98,1.05)    ~    (p=0.088)
RegexpMatchEasy0_1K     561ns × (0.99,1.02)   559ns × (0.98,1.01)  -0.47% (p=0.034)
RegexpMatchEasy1_32     145ns × (0.95,1.04)   137ns × (0.99,1.01)  -5.56% (p=0.000)
RegexpMatchEasy1_1K     864ns × (0.99,1.04)   878ns × (0.99,1.01)  +1.57% (p=0.000)
RegexpMatchMedium_32    255ns × (0.99,1.04)   252ns × (0.99,1.01)  -1.43% (p=0.001)
RegexpMatchMedium_1K   73.9µs × (0.98,1.04)  72.7µs × (1.00,1.00)  -1.55% (p=0.004)
RegexpMatchHard_32     3.92µs × (0.98,1.04)  3.85µs × (1.00,1.01)  -1.80% (p=0.003)
RegexpMatchHard_1K      120µs × (0.98,1.04)   117µs × (1.00,1.00)  -2.13% (p=0.001)
Revcomp                 936ms × (0.95,1.08)   903ms × (0.98,1.05)  -3.58% (p=0.002)
Template                130ms × (0.98,1.04)   126ms × (0.99,1.01)  -2.98% (p=0.000)
TimeParse               638ns × (0.98,1.05)   634ns × (0.99,1.01)    ~    (p=0.198)
TimeFormat              674ns × (0.99,1.01)   698ns × (0.98,1.03)  +3.69% (p=0.000)

Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c
Reviewed-on: https://go-review.googlesource.com/9706
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-05-11 14:55:11 +00:00
Daniel Morsing
db6f88a84b runtime: enable profiling on g0
Since we now have stack information for code running on the
systemstack, we can traceback over it. To make cpu profiles useful,
add a case in gentraceback to jump over systemstack switches.

Fixes #10609.

Change-Id: I21f47fcc802c07c5d4a1ada56374314e388a6dc7
Reviewed-on: https://go-review.googlesource.com/9506
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-11 08:44:30 +00:00
Michael Hudson-Doyle
fa896733b5 runtime: check consistency of all module data objects
Current code just checks the consistency (that the functab is correctly
sorted by PC, etc) of the moduledata object that the runtime belongs to.
Change to check all of them.
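
The invariant is simple once phrased per module: every module's
function table must be sorted by entry PC. A sketch over an
illustrative stand-in for moduledata:

    package main

    import (
        "fmt"
        "sort"
    )

    type module struct {
        name    string
        funcPCs []uintptr // entry PCs, required to be sorted
    }

    func checkModules(mods []*module) error {
        for _, m := range mods {
            ok := sort.SliceIsSorted(m.funcPCs, func(i, j int) bool {
                return m.funcPCs[i] < m.funcPCs[j]
            })
            if !ok {
                return fmt.Errorf("module %s: functab not sorted by PC", m.name)
            }
        }
        return nil
    }

    func main() {
        mods := []*module{
            {name: "main", funcPCs: []uintptr{0x1000, 0x1400, 0x2000}},
            {name: "extra", funcPCs: []uintptr{0x9000, 0x8000}}, // out of order
        }
        fmt.Println(checkModules(mods))
    }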

Change-Id: I544a44c5de7445fff87d3cdb4840ff04c5e2bf75
Reviewed-on: https://go-review.googlesource.com/9773
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-05-07 15:06:08 +00:00
Keith Randall
5a828cfcde runtime: let freezetheworld work even when gomaxprocs=1
Freezetheworld still has stuff to do when gomaxprocs=1.
In particular, signals can come in on other Ms (like the GC M, say)
and the single user M is still running.

Fixes #10546

Change-Id: I2f07f17d1c81e93cf905df2cb087112d436ca7e7
Reviewed-on: https://go-review.googlesource.com/9551
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-05-05 15:11:10 +00:00
Keith Randall
a55b131393 cmd/dist, runtime: Make stack guard larger for non-optimized builds
Kind of a hack, but makes the non-optimized builds pass.

Fixes #10079

Change-Id: I26f41c546867f8f3f16d953dc043e784768f2aff
Reviewed-on: https://go-review.googlesource.com/9552
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 15:41:55 +00:00
David Chase
7fbb1b36c3 cmd/internal/gc: improve flow of input params to output params
This includes the following information in the per-function summary:

outK = paramJ   encoded in outK bits for paramJ
outK = *paramJ  encoded in outK bits for paramJ
heap = paramJ   EscHeap
heap = *paramJ  EscContentEscapes

Note that (currently) if the address of a parameter is taken and
returned, necessarily a heap allocation occurred to contain that
reference, and the heap can never refer to stack, therefore the
parameter and everything downstream from it escapes to the heap.
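
For a concrete feel of "out = param" versus "heap = param", consider
ordinary code like the following; building with -gcflags=-m reports
which arguments leak to the result or to the heap (exact wording varies
by compiler version):

    package main

    type T struct{ x int }

    // identity: the parameter flows to the result (out = param); whether
    // the argument escapes then depends on what the caller does with it.
    func identity(p *T) *T { return p }

    var sink *T

    // keep: the parameter flows to a global (heap = param), so whatever
    // it points to must be heap allocated.
    func keep(p *T) { sink = p }

    func main() {
        a := T{x: 1}
        _ = identity(&a) // result unused: &a can stay on the stack
        b := T{x: 2}
        keep(&b) // retained globally: b escapes to the heap
    }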

The per-function summary information now has a tuneable number of bits
(2 is probably noticeably better than 1, 3 is likely overkill, but it
is now easy to check and the -m debugging output includes information
that allows you to figure out if more would be better.)

A new test was added to check pointer flow through struct-typed and
*struct-typed parameters and returns; some of these are sensitive to
the number of summary bits, and ought to yield better results with a
more competent escape analysis algorithm.  Another new test checks
(some) correctness with array parameters, results, and operations.

The old analysis inferred a piece of plan9 runtime was non-escaping by
counteracting overconservative analysis with buggy analysis; with the
bug fixed, the result was too conservative (and it's not easy to fix
in this framework) so the source code was tweaked to get the desired
result.  A test was added against the discovered bug.

The escape analysis was further improved splitting the "level" into
3 parts, one tracking the conventional "level" and the other two
computing the highest-level-suffix-from-copy, which is used to
generally model the cancelling effect of indirection applied to
address-of.

With the improved escape analysis enabled, it was necessary to
modify one of the runtime tests because it now attempts to allocate
too much on the (small, fixed-size) G0 (system) stack and this
failed the test.

Compiling src/std after touching src/runtime/*.go with -m logging
turned on shows 420 fewer heap allocation sites (10538 vs 10968).

Profiling allocations in src/html/template with
for i in {1..5} ;
  do go tool 6g -memprofile=mastx.${i}.prof  -memprofilerate=1 *.go;
  go tool pprof -alloc_objects -text  mastx.${i}.prof ;
done

showed a 15% reduction in allocations performed by the compiler.

Update #3753
Update #4720
Fixes #10466

Change-Id: I0fd97d5f5ac527b45f49e2218d158a6e89951432
Reviewed-on: https://go-review.googlesource.com/8202
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-05-01 13:47:20 +00:00
Austin Clements
33e0f3d853 runtime: fix some out of date comments and typos
Change-Id: I061057414c722c5a0f03c709528afc8554114db6
Reviewed-on: https://go-review.googlesource.com/9367
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 20:08:38 +00:00
Austin Clements
1b01910c06 runtime: rename gcController.findRunnable to findRunnableGCWorker
This avoids confusion with the main findrunnable in the scheduler.

Change-Id: I8cf40657557a8610a2fe5a2f74598518256ca7f0
Reviewed-on: https://go-review.googlesource.com/9305
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:42 +00:00
Austin Clements
bb6320535d runtime: replace STW for enabling write barriers with ragged barrier
Currently, we use a full stop-the-world around enabling write
barriers. This is to ensure that all Gs have enabled write barriers
before any blackening occurs (either in gcBgMarkWorker() or in
gcAssistAlloc()).

However, there's no need to bring the whole world to a synchronous
stop to ensure this. This change replaces the STW with a ragged
barrier that ensures each P has individually observed that write
barriers should be enabled before GC performs any blackening.

Change-Id: If2f129a6a55bd8bdd4308067af2b739f3fb41955
Reviewed-on: https://go-review.googlesource.com/8207
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:37 +00:00
Austin Clements
57afa76471 runtime: add ragged global barrier function
This adds forEachP, which performs a general-purpose ragged global
barrier. forEachP takes a callback and invokes it for every P at a GC
safe point.

Ps that are idle or in a syscall are considered to be at a continuous
safe point. forEachP ensures that these Ps do not change state by
forcing all syscall Ps into idle and holding the sched.lock.

To ensure that Ps do not enter syscall or idle without running the
safe-point function, this adds checks for a pending callback every
place there is currently a gcwaiting check.

We'll use forEachP to replace the STW around enabling the write
barrier and to replace the current asynchronous per-M wbuf cache with
a cooperatively managed per-P gcWork cache.
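
A ragged barrier in miniature: the initiator counts the workers that
still owe a visit, each one runs the callback at its own next safe
point and decrements the count, and the initiator sleeps until the
count drains. A sketch using goroutines and a channel in place of Ps
and the runtime's note:

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // forEachWorker asks every worker to run fn at its next safe point
    // and returns only once all of them have done so.
    func forEachWorker(workers []chan func(), fn func(id int)) {
        remaining := int32(len(workers))
        done := make(chan struct{})
        for i, w := range workers {
            i := i
            w <- func() { // runs at the worker's own safe point
                fn(i)
                if atomic.AddInt32(&remaining, -1) == 0 {
                    close(done)
                }
            }
        }
        <-done // the "note" the initiator sleeps on
    }

    func main() {
        workers := make([]chan func(), 3)
        for i := range workers {
            ch := make(chan func(), 1)
            workers[i] = ch
            go func() { (<-ch)() }() // worker reduced to one safe point
        }
        forEachWorker(workers, func(id int) {
            fmt.Println("safe-point fn ran on worker", id)
        })
    }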

Change-Id: Ie944f8ce1fead7c79bf271d2f42fcd61a41bb3cc
Reviewed-on: https://go-review.googlesource.com/8206
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-27 19:26:33 +00:00
Austin Clements
b0b1a66052 runtime: reset spinning in mspinning if work was ready()ed
This fixes a bug where the runtime ready()s a goroutine while setting
up a new M that's initially marked as spinning, causing the scheduler
to later panic when it finds work in the run queue of a P associated
with a spinning M. Specifically, the sequence of events that can lead
to this is:

1) sysmon calls handoffp to hand off a P stolen from a syscall.

2) handoffp sees no pending work on the P, so it calls startm with
   spinning set.

3) startm calls newm, which in turn calls allocm to allocate a new M.

4) allocm "borrows" the P we're handing off in order to do allocation
   and performs this allocation.

5) This allocation may assist the garbage collector, and this assist
   may detect the end of concurrent mark and ready() the main GC
   goroutine to signal this.

6) This ready()ing puts the GC goroutine on the run queue of the
   borrowed P.

7) newm starts the OS thread, which runs mstart and subsequently
   mstart1, which marks the M spinning because startm was called with
   spinning set.

8) mstart1 enters the scheduler, which panics because there's work on
   the run queue, but the M is marked spinning.

To fix this, before marking the M spinning in step 7, add a check to
see if work has been added to the P's run queue. If this is the case,
undo the spinning instead.

Fixes #10573.

Change-Id: I4670495ae00582144a55ce88c45ae71de597cfa5
Reviewed-on: https://go-review.googlesource.com/9332
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-27 12:49:54 +00:00
Austin Clements
2a46f55b35 runtime: panic when idling a P with runnable Gs
This adds a check that we never put a P on the idle list when it has
work on its local run queue.

Change-Id: Ifcfab750de60c335148a7f513d4eef17be03b6a7
Reviewed-on: https://go-review.googlesource.com/9324
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
2015-04-27 12:49:49 +00:00
Austin Clements
0e6a6c510f runtime: simplify process for starting GC goroutine
Currently, when allocation reaches the GC trigger, the runtime uses
readyExecute to start the GC goroutine immediately rather than wait
for the scheduler to get around to the GC goroutine while the mutator
continues to grow the heap.

Now that the scheduler runs the most recently readied goroutine when a
goroutine yields its time slice, this rigmarole is no longer
necessary. The runtime can simply ready the GC goroutine and yield
from the readying goroutine.

Change-Id: I3b4ebadd2a72a923b1389f7598f82973dd5c8710
Reviewed-on: https://go-review.googlesource.com/9292
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2015-04-24 15:13:05 +00:00
Austin Clements
e870f06c3f runtime: yield time slice to most recently readied G
Currently, when the runtime ready()s a G, it adds it to the end of the
current P's run queue and continues running. If there are many other
things in the run queue, this can result in a significant delay before
the ready()d G actually runs and can hurt fairness when other Gs in
the run queue are CPU hogs. For example, if there are three Gs sharing
a P, one of which is a CPU hog that never voluntarily gives up the P
and the other two of which are doing small amounts of work and
communicating back and forth on an unbuffered channel, the two
communicating Gs will get very little CPU time.

Change this so that when G1 ready()s G2 and then blocks, the scheduler
immediately hands off the remainder of G1's time slice to G2. In the
above example, the two communicating Gs will now act as a unit and
together get half of the CPU time, while the CPU hog gets the other
half of the CPU time.

This fixes the problem demonstrated by the ping-pong benchmark added
in the previous commit:

benchmark                old ns/op     new ns/op     delta
BenchmarkPingPongHog     684287        825           -99.88%
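
The shape of that ping-pong benchmark, roughly (not the exact code in
the repo): two goroutines bounce a token on an unbuffered channel while
a third spins, all sharing one P:

    package main

    import (
        "fmt"
        "runtime"
        "time"
    )

    func main() {
        runtime.GOMAXPROCS(1)

        stop := make(chan struct{})
        go func() { // the CPU hog sharing the single P
            for {
                select {
                case <-stop:
                    return
                default:
                }
            }
        }()

        ping, pong := make(chan int), make(chan int)
        go func() {
            for v := range ping {
                pong <- v + 1
            }
        }()

        start := time.Now()
        const rounds = 1000
        for i := 0; i < rounds; i++ {
            ping <- i
            <-pong
        }
        close(stop)
        fmt.Println("per round:", time.Since(start)/rounds)
    }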

On the x/benchmarks suite, this change improves the performance of
garbage by ~6% (for GOMAXPROCS=1 and 4), and json by 28% and 36% for
GOMAXPROCS=1 and 4. It has negligible effect on heap size.

This has no effect on the go1 benchmark suite since those benchmarks
are mostly single-threaded.

Change-Id: I858a08eaa78f702ea98a5fac99d28a4ac91d339f
Reviewed-on: https://go-review.googlesource.com/9289
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:52 +00:00
Austin Clements
e5e52f4f2c runtime: factor checking if P run queue is empty
There are a variety of places where we check if a P's run queue is
empty. This test is about to get slightly more complicated, so factor
it out into a new function, runqempty. This function is inlinable, so
this has no effect on performance.
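
The factored-out test, in spirit; the field names mirror the runtime's
ring-buffer run queue, but this is only a sketch and omits details
added later (such as the runnext slot):

    package main

    import "fmt"

    // p sketches the scheduler's per-P run queue: a fixed ring buffer
    // indexed by monotonically increasing head and tail counters.
    type p struct {
        runqhead uint32
        runqtail uint32
        runq     [256]uintptr
    }

    // runqempty reports whether pp has nothing in its local run queue.
    // Keeping it this small is what lets the compiler inline it.
    func runqempty(pp *p) bool {
        return pp.runqhead == pp.runqtail
    }

    func main() {
        pp := &p{}
        fmt.Println(runqempty(pp)) // true
        pp.runq[pp.runqtail%uint32(len(pp.runq))] = 1
        pp.runqtail++
        fmt.Println(runqempty(pp)) // false
    }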

Change-Id: If4a0b01ffbd004937de90d8d686f6ded4aad2c6b
Reviewed-on: https://go-review.googlesource.com/9287
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-24 15:12:42 +00:00
Austin Clements
e0c3d85f08 runtime: fix background marking at 25% utilization
Currently, in accordance with the GC pacing proposal, we schedule
background marking with a goal of achieving 25% utilization *total*
between mutator assists and background marking. This is stricter than
was set out in the Go 1.5 proposal, which suggests that the garbage
collector can use 25% just for itself and anything the mutator does to
help out is on top of that. It also has several technical
drawbacks. Because mutator assist time is constantly changing and we
can't have instantaneous information on background marking time, it
effectively requires hitting a moving target based on out-of-date
information. This works out in the long run, but works poorly for
short GC cycles and on short time scales. Also, this requires
time-multiplexing all Ps between the mutator and background GC since
the goal utilization of background GC constantly fluctuates. This
results in a complicated scheduling algorithm, poor affinity, and
extra overheads from context switching.

This change modifies the way we schedule and run background marking so
that background marking always consumes 25% of GOMAXPROCS and mutator
assist is in addition to this. This enables a much more robust
scheduling algorithm where we pre-determine the number of Ps we should
dedicate to background marking as well as the utilization goal for a
single floating "remainder" mark worker.
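
The arithmetic behind "dedicated plus one fractional worker": with a
25% goal, any GOMAXPROCS that is not a multiple of four leaves a
remainder for a single partially-utilized worker to absorb. A worked
example with illustrative names (the real pacer rounds a little
differently):

    package main

    import "fmt"

    func main() {
        const goal = 0.25 // background marking's share of the CPU

        for _, procs := range []int{1, 2, 4, 6, 8} {
            target := goal * float64(procs)
            dedicated := int(target)                  // whole Ps devoted to marking
            fractional := target - float64(dedicated) // goal for the floating worker
            fmt.Printf("GOMAXPROCS=%d: %d dedicated, fractional goal %.2f\n",
                procs, dedicated, fractional)
        }
    }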

Change-Id: I187fa4c03ab6fe78012a84d95975167299eb9168
Reviewed-on: https://go-review.googlesource.com/9013
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:50 +00:00
Austin Clements
8d03acce54 runtime: multi-threaded, utilization-scheduled background mark
Currently, the concurrent mark phase is performed by the main GC
goroutine. Prior to the previous commit enabling preemption, this
caused marking to always consume 1/GOMAXPROCS of the available CPU
time. If GOMAXPROCS=1, this meant background GC would consume 100% of
the CPU (effectively a STW). If GOMAXPROCS>4, background GC would use
less than the goal of 25%. If GOMAXPROCS=4, background GC would use
the goal 25%, but if the mutator wasn't using the remaining 75%,
background marking wouldn't take advantage of the idle time. Enabling
preemption in the previous commit made GC miss CPU targets in
completely different ways, but set us up to bring everything back in
line.

This change replaces the fixed GC goroutine with per-P background mark
goroutines. Once started, these goroutines don't go in the standard
run queues; instead, they are scheduled specially such that the time
spent in mutator assists and the background mark goroutines totals 25%
of the CPU time available to the program. Furthermore, this lets
background marking take advantage of idle Ps, which significantly
boosts GC performance for applications that under-utilize the CPU.

This requires also changing how time is reported for gctrace, so this
change splits the concurrent mark CPU time into assist/background/idle
scanning.

This also requires increasing the size of the StackRecord slice used
in a GoroutineProfile test.

Change-Id: I0936ff907d2cee6cb687a208f2df47e8988e3157
Reviewed-on: https://go-review.googlesource.com/8850
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-21 15:35:32 +00:00
Russ Cox
181e26b9fa runtime: replace func-based write barrier skipping with type-based
This CL revises CL 7504 to use explicitly uintptr types for the
struct fields that are going to be updated sometimes without
write barriers. The result is that the fields are now updated *always*
without write barriers.

This approach has two important properties:

1) Now the GC never looks at the field, so if the missing reference
could cause a problem, it will do so all the time, not just when the
write barrier is missed at just the right moment.

2) Now a write barrier never happens for the field, avoiding the
(correct) detection of inconsistent write barriers when GODEBUG=wbshadow=1.
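
The technique in isolation: declare the word as uintptr so the GC never
treats it as a pointer and no write barrier is emitted for stores to
it; that is only safe when something else keeps the object alive. A
sketch (unsafe is used purely for illustration):

    package main

    import (
        "fmt"
        "unsafe"
    )

    type buf struct{ n int }

    type holder struct {
        owner *buf    // pointer-typed: scanned by the GC, write barrier applies
        cache uintptr // uintptr-typed: invisible to the GC, no write barrier
    }

    func main() {
        b := &buf{n: 42}
        h := &holder{}
        h.owner = b                          // this reference keeps b alive
        h.cache = uintptr(unsafe.Pointer(b)) // shadow word, safe only while owner does
        fmt.Println(h.owner.n, h.cache != 0)
    }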

Change-Id: Iebd3962c727c0046495cc08914a8dc0808460e0e
Reviewed-on: https://go-review.googlesource.com/9019
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-04-20 20:20:09 +00:00
Ian Lance Taylor
725aa3451a runtime: no deadlock error if buildmode=c-archive or c-shared
Change-Id: I4ee6dac32bd3759aabdfdc92b235282785fbcca9
Reviewed-on: https://go-review.googlesource.com/9083
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-04-20 17:31:44 +00:00
Austin Clements
c1c667542c runtime: fix dangling pointer in readyExecute
readyExecute passes a closure to mcall that captures an argument to
readyExecute. Since mcall is marked noescape, this closure lives on
the stack of the calling goroutine. However, the closure puts the
calling goroutine on the run queue (and switches to a new
goroutine). If the calling goroutine gets scheduled before the mcall
returns, this stack-allocated closure will become invalid while it's
still executing. One consequence of this we've observed is that the
captured gp variable can get overwritten before the call to
execute(gp), causing execute(gp) to segfault.

Fix this by passing the currently captured gp variable through a field
in the calling goroutine's g struct so that the func is no longer a
closure.

To prevent problems like this in the future, this change also removes
the go:noescape annotation from mcall. Due to a compiler bug, this
will currently cause a func closure passed to mcall to be implicitly
allocated rather than refusing the implicit allocation. However, this
is okay because there are no other closures passed to mcall right now
and the compiler bug will be fixed shortly.

Fixes #10428.

Change-Id: I49b48b85de5643323b89e9eaa4df63854e968c32
Reviewed-on: https://go-review.googlesource.com/8866
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-17 17:59:14 +00:00
Austin Clements
a23a341e10 runtime: make time slice a const
A G will be preempted if it runs for 10ms without blocking. Currently
this constant is hard-coded in retake. Move it to a global const.
We'll use the time slice length in scheduling background GC.

Change-Id: I79a979948af2fad3afe5df9d4af4062f166554b7
Reviewed-on: https://go-review.googlesource.com/8838
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-14 22:06:32 +00:00
Austin Clements
4b956ae317 runtime: start concurrent GC promptly when we reach its trigger
Currently, when allocation reaches the concurrent GC trigger size, we
start the concurrent collector by ready'ing its G. This simply puts it
on the end of the P's run queue, which means we may not actually start
GC for some time as the current G continues to run and then the P
drains other Gs already on its run queue. Since the mutator can
continue to allocate, the heap can potentially be much larger than we
intended by the time GC actually starts. Furthermore, how much larger
is difficult to predict since it depends on the scheduler.

Fix this by preempting the current G and switching directly to the
concurrent GC G as soon as we reach the trigger heap size.

On the garbage benchmark from the benchmarks subrepo with
GOMAXPROCS=4, this reduces the time from triggering the GC to the
beginning of sweep termination by 10 to 30 milliseconds, which reduces
allocation after the trigger by up to 10MB (a large fraction of the
64MB live heap the benchmark tries to maintain).

One other known source of delay before we "really" start GC is the
sweep finalization performed before sweep termination. This has
similar negative effects on heap size and predictability, but is an
orthogonal problem. This change adds a TODO for this.

Change-Id: I8bae98cb43685c1bf353ff55868e4647e3743c47
Reviewed-on: https://go-review.googlesource.com/8513
Reviewed-by: Rick Hudson <rlh@golang.org>
2015-04-10 18:22:52 +00:00
Austin Clements
7c37249639 runtime: make test for freezetheworld more precise
exitsyscallfast checks for freezetheworld, but does so only by
checking if stopwait is positive. This can also happen during
stoptheworld, which is harmless, but confusing. Shortly, it will be
important that we get to the p.status cas even if stopwait is set.

Hence, make this test more specific so it only triggers with
freezetheworld and not other uses of stopwait.

Change-Id: Ibb722cd8360c3ed5a9654482519e3ceb87a8274d
Reviewed-on: https://go-review.googlesource.com/8205
Reviewed-by: Russ Cox <rsc@golang.org>
2015-04-10 18:02:55 +00:00