A method is more in keeping with the rest of the Writer API and
incidentally allows the comment error to be reported earlier.
Fixes#22737.
Change-Id: I1eee2103a0720c76d0c394ccd6541e6219996dc0
Reviewed-on: https://go-review.googlesource.com/79415
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The replacement rune is a valid rune and can appear as itself in valid UTF8
(it encodes as three bytes). To check for invalid UTF8 it is necessary to
look for utf8.DecodeRune returning the replacement rune and size==1.
Change-Id: I169be8d1fe61605c921ac13cc2fde94f80f3463c
Reviewed-on: https://go-review.googlesource.com/78126
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The NonUTF8 field provides users with a way to explictly tell the
ZIP writer to avoid setting the UTF-8 flag.
This is necessary because many readers:
1) (Still) do not support UTF-8
2) And use the local system encoding instead
Thus, even though character encodings other than CP-437 and UTF-8
are not officially supported by the ZIP specification, pragmatically
the world has permitted use of them.
When a non-standard encoding is used, it is the user's responsibility
to ensure that the target system is expecting the encoding used
(e.g., producing a ZIP file you know is used on a Chinese version of Windows).
We adjust the detectUTF8 function to account for Shift-JIS and EUC-KR
not being identical to ASCII for two characters.
We don't need an API for users to explicitly specify that they are encoding
with UTF-8 since all single byte characters are compatible with all other
common encodings (Windows-1256, Windows-1252, Windows-1251, Windows-1250,
IEC-8859, EUC-KR, KOI8-R, Latin-1, Shift-JIS, GB-2312, GBK) except for
the non-printable characters and the backslash character (all of which
are invalid characters in a path name anyways).
Fixes#10741
Change-Id: I9004542d1d522c9137973f1b6e2b623fa54dfd66
Reviewed-on: https://go-review.googlesource.com/75592
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The ModifiedTime and ModifiedDate fields are not expressive enough
for many of the time extensions that have since been added to ZIP,
nor are they easy to access since they in a legacy MS-DOS format,
and must be set and retrieved via the SetModTime and ModTime methods.
Instead, we add new field Modified of time.Time type that contains
all of the previous information and more.
Support for extended timestamps have been attempted before, but the
change was reverted because it provided no ability for the user to
specify the timezone of the legacy MS-DOS fields.
Technically the old API did not either, but users were manually offsetting
the timestamp to achieve the same effect.
The Writer now writes the legacy timestamps according to the timezone
of the FileHeader.Modified field. When the Modified field is set via
the SetModTime method, it is in UTC, which preserves the old behavior.
The Reader attempts to determine the timezone if both the legacy
and extended timestamps are present since it can compute the delta
between the two values.
Since Modified is a superset of the information in ModifiedTime and ModifiedDate,
we mark ModifiedTime, ModifiedDate, ModTime, and SetModTime as deprecated.
Fixes#18359
Change-Id: I29c6bc0a62908095d02740df3e6902f50d3152f1
Reviewed-on: https://go-review.googlesource.com/74970
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
CL 39570 added support for automatically setting flag bit 11 to
indicate that the filename and comment fields are encoded in UTF-8,
which is (conventionally) the encoding using for most Go strings.
However, the detection added is too lose for two reasons:
* We need to ensure both fields are at least possibly UTF-8.
That is, if any field is definitely not UTF-8, then we can't set the bit.
* The utf8.ValidRune returns true for utf8.RuneError, which iterating
over a Go string automatically returns for invalid UTF-8.
Thus, we manually check for that value.
Updates #22367
Updates #10741
Change-Id: Ie8aae388432e546e44c6bebd06a00434373ca99e
Reviewed-on: https://go-review.googlesource.com/72791
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change added support "end of central directory record comemnt" to the Writer.
There is a new exported field Writer.Comment in this change.
If invalid size of comment was set, Close returns error without closing resources.
Fixes#21634
Change-Id: Ifb62bc6c7f81b9257ac83eb570ad9915de727f8c
Reviewed-on: https://go-review.googlesource.com/59310
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
See: https://pkware.cachefly.net/webdocs/APPNOTE/APPNOTE-6.3.0.TXT
Document says:
> If general purpose bit 11 is set, the filename and comment must support The
> Unicode Standard, Version 4.1.0 or greater using the character encoding form
> defined by the UTF-8 storage specification.
Since Go encode the filename to UTF-8, general purpose bit 11 should be set.
Change-Id: Ica4af02b4dc695e9a5c015ae360e70171efb6ee3
Reviewed-on: https://go-review.googlesource.com/39570
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This change reverts the following CLs:
CL/18274: handle mtime in NTFS/UNIX/ExtendedTS extra fields
CL/30811: only use Extended Timestamp on non-zero MS-DOS timestamps
We are reverting support for extended timestamps since the support was not
not complete. CL/18274 added full support for reading extended timestamp fields
and minimal support for writing them. CL/18274 is incomplete because it made
no changes to the FileHeader struct, so timezone information was lost when
reading and/or writing.
While CL/18274 was a step in the right direction, we should provide full
support for high precision timestamps in both the reader and writer.
This will probably require that we add a new field of type time.Time.
The complete fix is too involved to add in the time remaining for Go 1.8
and will be completed in Go 1.9.
Updates #10242
Updates #17403
Updates #18359Fixes#18378
Change-Id: Icf6d028047f69379f7979a29bfcb319a02f4783e
Reviewed-on: https://go-review.googlesource.com/34651
Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Before this CL:
$ go test -bench=CompressedZipGarbage -count=5 -run=NONE archive/zip
BenchmarkCompressedZipGarbage-8 50 20677087 ns/op 42973 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 100 20584764 ns/op 24294 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 50 20859221 ns/op 42973 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 100 20901176 ns/op 24294 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 50 21282409 ns/op 42973 B/op 47 allocs/op
The B/op number is effectively meaningless. There
is a surprisingly large one-time cost that gets
divided by the number of iterations that your
machine can get through in a second.
This CL discards the first run, which helps.
It is not a panacea. Running with -benchtime=10s
will allow the sync.Pool to be emptied,
which brings the problem back.
However, since there are more iterations to divide
the cost through, it’s not quite as bad,
and running with a high benchtime is rare.
This CL changes the meaning of the B/op number,
which is unfortunate, since it won’t have the
same order of magnitude as previous Go versions.
But it wasn’t really comparable before anyway,
since it didn’t have any reliable meaning at all.
After this CL:
$ go test -bench=CompressedZipGarbage -count=5 -run=NONE archive/zip
BenchmarkCompressedZipGarbage-8 100 20881890 ns/op 5616 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 50 20622757 ns/op 5616 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 50 20628193 ns/op 5616 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 100 20756612 ns/op 5616 B/op 47 allocs/op
BenchmarkCompressedZipGarbage-8 100 20639774 ns/op 5616 B/op 47 allocs/op
Change-Id: Iedee04f39328974c7fa272a6113d423e7ffce50f
Reviewed-on: https://go-review.googlesource.com/22585
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
When appending zip data to existing data such as a binary file the
zip headers must use the correct offset. NewWriterWithOptions
allows creating a Writer that uses the provided offset in the zip
headers.
Fixes#8669
Change-Id: I6ec64f1e816cc57b6fc8bb9e8a0918e586fc56b0
Reviewed-on: https://go-review.googlesource.com/2978
Reviewed-by: Andrew Gerrand <adg@golang.org>