Commit graph

9 commits

Author SHA1 Message Date
Matthew Dempsky
a8a85281ca runtime: fix documented alignment of 32KiB and 64KiB size classes
As Cherry pointed out on golang.org/cl/299909, the page allocator
doesn't guarantee any alignment for multi-page allocations, so object
alignments are thus implicitly capped at page alignment.

Change-Id: I6f5df27f269b095cde54056f876fe4240f69c5c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/301292
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
2021-03-13 04:53:32 +00:00
Matthew Dempsky
4662029264 runtime: simplify divmagic for span calculations
It's both simpler and faster to just unconditionally do two 32-bit
multiplies rather than a bunch of branching to try to avoid them.
This is safe thanks to the tight bounds derived in [1] and verified
during mksizeclasses.go.

Benchstat results below for compilebench benchmarks on my P920. See
also [2] for micro benchmarks comparing the new functions against the
originals (as well as several basic attempts at optimizing them).

name                      old time/op       new time/op       delta
Template                        295ms ± 3%        290ms ± 1%  -1.95%  (p=0.000 n=20+20)
Unicode                         113ms ± 3%        110ms ± 2%  -2.32%  (p=0.000 n=21+17)
GoTypes                         1.78s ± 1%        1.76s ± 1%  -1.23%  (p=0.000 n=21+20)
Compiler                        119ms ± 2%        117ms ± 4%  -1.53%  (p=0.007 n=20+20)
SSA                             14.3s ± 1%        13.8s ± 1%  -3.12%  (p=0.000 n=17+20)
Flate                           173ms ± 2%        170ms ± 1%  -1.64%  (p=0.000 n=20+19)
GoParser                        278ms ± 2%        273ms ± 2%  -1.92%  (p=0.000 n=20+19)
Reflect                         686ms ± 3%        671ms ± 3%  -2.18%  (p=0.000 n=19+20)
Tar                             255ms ± 2%        248ms ± 2%  -2.90%  (p=0.000 n=20+20)
XML                             335ms ± 3%        327ms ± 2%  -2.34%  (p=0.000 n=20+20)
LinkCompiler                    799ms ± 1%        799ms ± 1%    ~     (p=0.925 n=20+20)
ExternalLinkCompiler            1.90s ± 1%        1.90s ± 0%    ~     (p=0.327 n=20+20)
LinkWithoutDebugCompiler        385ms ± 1%        386ms ± 1%    ~     (p=0.251 n=18+20)
[Geo mean]                      512ms             504ms       -1.61%

name                      old user-time/op  new user-time/op  delta
Template                        286ms ± 4%        282ms ± 4%  -1.42%  (p=0.025 n=21+20)
Unicode                         104ms ± 9%        102ms ±14%    ~     (p=0.294 n=21+20)
GoTypes                         1.75s ± 3%        1.72s ± 2%  -1.36%  (p=0.000 n=21+20)
Compiler                        109ms ±11%        108ms ± 8%    ~     (p=0.187 n=21+19)
SSA                             14.0s ± 1%        13.5s ± 2%  -3.25%  (p=0.000 n=16+20)
Flate                           166ms ± 4%        164ms ± 4%  -1.34%  (p=0.032 n=19+19)
GoParser                        268ms ± 4%        263ms ± 4%  -1.71%  (p=0.011 n=18+20)
Reflect                         666ms ± 3%        654ms ± 4%  -1.77%  (p=0.002 n=18+20)
Tar                             245ms ± 5%        236ms ± 6%  -3.34%  (p=0.000 n=20+20)
XML                             320ms ± 4%        314ms ± 3%  -2.01%  (p=0.001 n=19+18)
LinkCompiler                    744ms ± 4%        747ms ± 3%    ~     (p=0.627 n=20+19)
ExternalLinkCompiler            1.71s ± 3%        1.72s ± 2%    ~     (p=0.149 n=20+20)
LinkWithoutDebugCompiler        345ms ± 6%        342ms ± 8%    ~     (p=0.355 n=20+20)
[Geo mean]                      484ms             477ms       -1.50%

[1] Daniel Lemire, Owen Kaser, Nathan Kurz. 2019. "Faster Remainder by
Direct Computation: Applications to Compilers and Software Libraries."
https://arxiv.org/abs/1902.01961

[2] https://github.com/mdempsky/benchdivmagic

Change-Id: Ie4d214e7a908b0d979c878f2d404bd56bdf374f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/300994
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-03-12 18:02:59 +00:00
Matthew Dempsky
735647d92e runtime: add alignment info to sizeclasses.go comments
I was curious about the minimum possible alignment for each size class
and the minimum size to guarantee any particular alignment (e.g., to
know at what class size you can start assuming heap bits are byte- or
word-aligned).

Change-Id: I205b750286e8914986533c4f60712c420c3e63e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/299909
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
2021-03-12 17:41:56 +00:00
Martin Möhrmann
14c7caae50 runtime: add 24 byte allocation size class
This CL introduces a 24 byte allocation size class which
fits 3 pointers on 64 bit and 6 pointers on 32 bit architectures.

Notably this new size class fits a slice header on 64 bit
architectures exactly while previously a 32 byte size class
would have been used for allocating a slice header on the heap.

The main complexity added with this CL is that heapBitsSetType
needs to handle objects that aren't 16-byte aligned but contain
more than a single pointer on 64-bit architectures.

Due to having a non 16 byte aligned size class on 32 bit a
h.shift of 2 is now possible which means a heap bitmap byte might
only be partially written. Due to this already having been
possible on 64 bit before the heap bitmap code only needed
minor adjustments for 32 bit doublecheck code paths.

Note that this CL changes the slice capacity allocated by append
for slice growth to a target capacity of 17 to 24 bytes.

On 64 bit architectures the capacity of the slice returned by
append([]byte{}, make([]byte, 24)...)) is 32 bytes before and
24 bytes after this CL. Depending on allocation patterns of the
specific Go program this can increase the number of total
alloctions as subsequent appends to the slice can trigger slice
growth earlier than before. On the other side if the slice is
never appended to again above its capacity this will lower heap
usage by 8 bytes.

This CL changes the set of size classes reported in the
runtime.MemStats.BySize array due to it being limited to a
total of 61 size classes. The new 24 byte size class is now
included and the 20480 byte size class is not included anymore.

Fixes #8885

name                      old time/op       new time/op       delta
Template                        196ms ± 3%        194ms ± 2%    ~     (p=0.247 n=10+10)
Unicode                        85.6ms ±16%       88.1ms ± 1%    ~     (p=0.165 n=10+10)
GoTypes                         673ms ± 2%        668ms ± 2%    ~     (p=0.258 n=9+9)
Compiler                        3.14s ± 6%        3.08s ± 1%    ~     (p=0.243 n=10+9)
SSA                             6.82s ± 1%        6.76s ± 1%  -0.87%  (p=0.006 n=9+10)
Flate                           128ms ± 7%        127ms ± 3%    ~     (p=0.739 n=10+10)
GoParser                        154ms ± 3%        153ms ± 4%    ~     (p=0.730 n=9+9)
Reflect                         404ms ± 1%        412ms ± 4%  +1.99%  (p=0.022 n=9+10)
Tar                             172ms ± 4%        170ms ± 4%    ~     (p=0.065 n=10+9)
XML                             231ms ± 4%        230ms ± 3%    ~     (p=0.912 n=10+10)
LinkCompiler                    341ms ± 1%        339ms ± 1%    ~     (p=0.243 n=9+10)
ExternalLinkCompiler            1.72s ± 1%        1.72s ± 1%    ~     (p=0.661 n=9+10)
LinkWithoutDebugCompiler        221ms ± 2%        221ms ± 2%    ~     (p=0.529 n=10+10)
StdCmd                          18.4s ± 3%        18.2s ± 1%    ~     (p=0.515 n=10+8)

name                      old user-time/op  new user-time/op  delta
Template                        238ms ± 4%        243ms ± 6%    ~     (p=0.661 n=9+10)
Unicode                         116ms ± 6%        113ms ± 3%  -3.37%  (p=0.035 n=9+10)
GoTypes                         854ms ± 2%        848ms ± 2%    ~     (p=0.604 n=9+10)
Compiler                        4.10s ± 1%        4.11s ± 1%    ~     (p=0.481 n=8+9)
SSA                             9.49s ± 1%        9.41s ± 1%  -0.92%  (p=0.001 n=9+10)
Flate                           149ms ± 6%        151ms ± 7%    ~     (p=0.481 n=10+10)
GoParser                        189ms ± 2%        190ms ± 2%    ~     (p=0.497 n=9+10)
Reflect                         511ms ± 2%        508ms ± 2%    ~     (p=0.211 n=9+10)
Tar                             215ms ± 4%        212ms ± 3%    ~     (p=0.105 n=10+10)
XML                             288ms ± 2%        288ms ± 2%    ~     (p=0.971 n=10+10)
LinkCompiler                    559ms ± 4%        557ms ± 1%    ~     (p=0.968 n=9+10)
ExternalLinkCompiler            1.78s ± 1%        1.77s ± 1%    ~     (p=0.055 n=8+10)
LinkWithoutDebugCompiler        245ms ± 3%        245ms ± 2%    ~     (p=0.684 n=10+10)

name                      old alloc/op      new alloc/op      delta
Template                       34.8MB ± 0%       34.4MB ± 0%  -0.95%  (p=0.000 n=9+10)
Unicode                        28.6MB ± 0%       28.3MB ± 0%  -0.95%  (p=0.000 n=10+10)
GoTypes                         115MB ± 0%        114MB ± 0%  -1.02%  (p=0.000 n=10+9)
Compiler                        554MB ± 0%        549MB ± 0%  -0.86%  (p=0.000 n=9+10)
SSA                            1.28GB ± 0%       1.27GB ± 0%  -0.83%  (p=0.000 n=10+10)
Flate                          21.8MB ± 0%       21.6MB ± 0%  -0.87%  (p=0.000 n=8+10)
GoParser                       26.7MB ± 0%       26.4MB ± 0%  -0.97%  (p=0.000 n=10+9)
Reflect                        75.0MB ± 0%       74.1MB ± 0%  -1.18%  (p=0.000 n=10+10)
Tar                            32.6MB ± 0%       32.3MB ± 0%  -0.94%  (p=0.000 n=10+7)
XML                            41.5MB ± 0%       41.2MB ± 0%  -0.90%  (p=0.000 n=10+8)
LinkCompiler                    105MB ± 0%        104MB ± 0%  -0.94%  (p=0.000 n=10+10)
ExternalLinkCompiler            153MB ± 0%        152MB ± 0%  -0.69%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler       63.7MB ± 0%       63.6MB ± 0%  -0.13%  (p=0.000 n=10+10)

name                      old allocs/op     new allocs/op     delta
Template                         336k ± 0%         336k ± 0%  +0.02%  (p=0.002 n=10+10)
Unicode                          332k ± 0%         332k ± 0%    ~     (p=0.447 n=10+10)
GoTypes                         1.16M ± 0%        1.16M ± 0%  +0.01%  (p=0.001 n=10+10)
Compiler                        4.92M ± 0%        4.92M ± 0%  +0.01%  (p=0.000 n=10+10)
SSA                             11.9M ± 0%        11.9M ± 0%  +0.02%  (p=0.000 n=9+10)
Flate                            214k ± 0%         214k ± 0%  +0.02%  (p=0.032 n=10+8)
GoParser                         270k ± 0%         270k ± 0%  +0.02%  (p=0.004 n=10+9)
Reflect                          877k ± 0%         877k ± 0%  +0.01%  (p=0.000 n=10+10)
Tar                              313k ± 0%         313k ± 0%    ~     (p=0.075 n=9+10)
XML                              387k ± 0%         387k ± 0%  +0.02%  (p=0.007 n=10+10)
LinkCompiler                     455k ± 0%         456k ± 0%  +0.08%  (p=0.000 n=10+9)
ExternalLinkCompiler             670k ± 0%         671k ± 0%  +0.06%  (p=0.000 n=10+10)
LinkWithoutDebugCompiler         113k ± 0%         113k ± 0%    ~     (p=0.149 n=10+10)

name                      old maxRSS/op     new maxRSS/op     delta
Template                        34.1M ± 1%        34.1M ± 1%    ~     (p=0.853 n=10+10)
Unicode                         35.1M ± 1%        34.6M ± 1%  -1.43%  (p=0.000 n=10+10)
GoTypes                         72.8M ± 3%        73.3M ± 2%    ~     (p=0.724 n=10+10)
Compiler                         288M ± 3%         295M ± 4%    ~     (p=0.393 n=10+10)
SSA                              630M ± 1%         622M ± 1%  -1.18%  (p=0.001 n=10+10)
Flate                           26.0M ± 1%        26.2M ± 2%    ~     (p=0.493 n=10+10)
GoParser                        28.6M ± 1%        28.5M ± 2%    ~     (p=0.256 n=10+10)
Reflect                         55.5M ± 2%        55.4M ± 1%    ~     (p=0.436 n=10+10)
Tar                             33.0M ± 1%        32.8M ± 2%    ~     (p=0.075 n=10+10)
XML                             38.7M ± 1%        39.0M ± 1%    ~     (p=0.053 n=9+10)
LinkCompiler                     164M ± 1%         164M ± 1%  -0.27%  (p=0.029 n=10+10)
ExternalLinkCompiler             174M ± 0%         173M ± 0%  -0.33%  (p=0.002 n=9+10)
LinkWithoutDebugCompiler         137M ± 0%         136M ± 2%    ~     (p=0.825 n=9+10)

Change-Id: I9ecf2a10024513abef8fbfbe519e44e0b29b6167
Reviewed-on: https://go-review.googlesource.com/c/go/+/242258
Trust: Martin Möhrmann <moehrmann@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-09-14 19:21:56 +00:00
Martin Möhrmann
6ca51f7897 runtime: replace division by span element size by multiply and shifts
Divisions are generally slow. The compiler can optimize a division
to use a sequence of faster multiplies and shifts (magic constants)
if the divisor is not know at compile time.

The value of the span element size in mcentral.grow is not known at
compile time but magic constants to compute n / span.elementsize
are already stored in class_to_divmagic and mspan.
They however need to be adjusted to work for
(0 <= n <= span.npages * pagesize) instead of
(0 <= n <  span.npages * pagesize).

Change-Id: Ieea59f1c94525a88d012f2557d43691967900deb
Reviewed-on: https://go-review.googlesource.com/c/go/+/148057
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-18 14:05:54 +00:00
Austin Clements
a6ae01a64a runtime: add "max waste" column to size class table comment
This computes the maximum possible waste in a size class due to both
internal and external fragmentation as a percent of the span size.
This parallels the reasoning about overhead in the comment at the top
of mksizeclasses.go and confirms that comment's assertion that (except
for the few smallest size classes), none of the size classes have
worst-case internal and external fragmentation simultaneously.

Change-Id: Idb66fe6c241d56f33d391831d4cd5a626955562b
Reviewed-on: https://go-review.googlesource.com/49370
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-08-10 21:45:01 +00:00
Brad Fitzpatrick
6914b0e3e3 runtime, unicode: use consistent banner for generated code
Per golang.org/s/generatedcode

Updates #nnn

Change-Id: Ia7513ef6bd26c20b62b57b29f7770684a315d389
Reviewed-on: https://go-review.googlesource.com/45470
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-06-13 05:33:40 +00:00
Austin Clements
ffedff7e50 runtime: add table of size classes in a comment
Change-Id: I52fae67c9aeceaa23e70f2ef0468745b354f8c75
Reviewed-on: https://go-review.googlesource.com/34932
Reviewed-by: Minux Ma <minux@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-08 00:01:30 +00:00
Keith Randall
7ba36f4adb runtime: compute size classes statically
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.

Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.

The static data is <1KB.

The actual size classes are not changed by this CL.

Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2016-10-30 03:48:49 +00:00