[インデックス 11779] ファイルの概要

このコミットは、Go言語の標準ライブラリであるcompress/flate、compress/gzip、compress/zlibパッケージにおけるAPIの一貫性を向上させ、エラーハンドリングを改善することを目的としています。特に、NewWriterXxx系の関数の戻り値の型を統一し、gzipパッケージ内の型名をflateやzlibと整合させるためのリネームが行われました。また、GZIPメタデータに関するコメントの明確化や、テストコードのリファクタリングも含まれています。

コミット

commit cc9ed447d0afd1f5ff32bf30e094624cb704549b
Author: Nigel Tao <nigeltao@golang.org>
Date:   Fri Feb 10 18:49:19 2012 +1100

    compress: make flate, gzip and zlib's NewWriterXxx functions all return
    (*Writer, error) if they take a compression level, and *Writer otherwise.
    Rename gzip's Compressor and Decompressor to Writer and Reader, similar to
    flate and zlib.
    
    Clarify commentary when writing gzip metadata that is not representable
    as Latin-1, and fix io.EOF comment bug.
    
    Also refactor gzip_test to be more straightforward.
    
    Fixes #2839.
    
    R=rsc, r, rsc, bradfitz
    CC=golang-dev
    https://golang.org/cl/5639057

GitHub上でのコミットページへのリンク

https://github.com/golang/go/commit/cc9ed447d0afd1f5ff32bf30e094624cb704549b

元コミット内容

compressパッケージ群において、flate、gzip、zlibのNewWriterXxx関数が、圧縮レベルを引数に取る場合は(*Writer, error)を返し、そうでない場合は*Writerを返すように統一されました。また、gzipパッケージのCompressorとDecompressor型が、flateやzlibに合わせてWriterとReaderにリネームされました。

さらに、GZIPメタデータ（特にコメントやファイル名）がLatin-1で表現できない場合の挙動に関するコメントが明確化され、io.EOFに関するコメントのバグが修正されました。gzip_testのリファクタリングも行われ、より直接的なテストコードになりました。

この変更は、Issue #2839を修正するものです。

変更の背景

Go言語の初期バージョン（Go 1リリース前）において、標準ライブラリの圧縮関連パッケージ（compress/flate, compress/gzip, compress/zlib）のAPIには、いくつかの不整合や改善の余地がありました。

APIの一貫性の欠如: NewWriterXxx系の関数は、圧縮レベルの指定の有無によって戻り値の型が異なっていました。例えば、ある関数は*Writerを返し、別の関数は(*Writer, error)を返すといった状況です。Go言語の設計思想では、エラーを明示的に返すことが推奨されており、APIの一貫性は開発者がライブラリをより容易に利用するために重要です。特に、圧縮レベルの指定はエラーが発生しうる操作であるため、エラーを戻り値として返すのがGoの慣習に沿っています。
型名の不統一: compress/gzipパッケージでは、圧縮器がCompressor、解凍器がDecompressorという型名を使用していました。これに対し、compress/flateやcompress/zlibではそれぞれWriterとReaderというより一般的なI/Oインターフェースに沿った型名が使われていました。この不統一は、開発者が複数の圧縮パッケージを扱う際に混乱を招く可能性がありました。Goの標準ライブラリは、一貫性のある命名規則とAPIデザインを重視しています。
GZIPメタデータのエンコーディング問題: GZIPファイルフォーマットでは、ヘッダ内のコメントやファイル名フィールドにLatin-1エンコーディングが使用されることが規定されています。しかし、Goの文字列はUTF-8であるため、UTF-8文字列をLatin-1として書き込む際に、非Latin-1文字（例えば日本語や一部の特殊文字）が含まれていると、予期せぬ挙動やデータ破損が発生する可能性がありました。このコミットでは、この点に関するドキュメントの明確化と、非Latin-1文字が含まれる場合の適切なエラーハンドリングが求められていました。
テストコードの改善: 既存のテストコードが複雑であったり、読みにくかったりする部分があり、保守性や理解度を向上させるためのリファクタリングが必要でした。

これらの問題に対処するため、本コミットではAPIの統一、型名のリネーム、GZIPメタデータに関する修正、およびテストコードの改善が行われました。特に、Issue #2839はcompress/gzipのNewWriter関数におけるエラーハンドリングの欠如を指摘しており、このコミットによってその問題が解決されました。

前提知識の解説

このコミットの変更内容を理解するためには、以下のGo言語および圧縮技術に関する基本的な知識が必要です。

Go言語の圧縮ライブラリ

Go言語の標準ライブラリには、様々な圧縮アルゴリズムを扱うパッケージが用意されています。

compress/flate: DEFLATEアルゴリズムを実装しています。これは、ZlibやGzipの基盤となる圧縮アルゴリズムです。主に、生のDEFLATEストリームを扱う際に使用されます。
compress/gzip: GZIPファイルフォーマットを実装しています。DEFLATE圧縮データに加えて、ファイル名、コメント、タイムスタンプなどのメタデータや、CRC32チェックサム、元のデータサイズなどの情報を含むヘッダとフッタを追加します。通常、単一のファイルを圧縮・解凍する際に使用されます。
compress/zlib: Zlibデータフォーマットを実装しています。DEFLATE圧縮データに加えて、Adler-32チェックサムを含むヘッダとフッタを追加します。通常、メモリ内のデータを圧縮・解凍する際や、HTTP圧縮などで使用されます。

これらのパッケージは、それぞれ異なるフォーマットを扱いますが、内部的にはDEFLATEアルゴリズムを利用しています。

`io.Writer`と`io.Reader`インターフェース

Go言語では、I/O操作はio.Writerとio.Readerという二つの基本的なインターフェースによって抽象化されています。

io.Writer: Write(p []byte) (n int, err error)メソッドを持つインターフェースです。データを書き込む操作を抽象化します。例えば、ファイル、ネットワーク接続、バイトバッファなど、様々な出力先にデータを書き込むことができます。
io.Reader: Read(p []byte) (n int, err error)メソッドを持つインターフェースです。データを読み込む操作を抽象化します。ファイル、ネットワーク接続、バイトバッファなど、様々な入力元からデータを読み込むことができます。

圧縮ライブラリのNewWriterXxx関数は、通常io.Writerを引数に取り、圧縮されたデータをそのio.Writerに書き込みます。同様に、NewReader関数はio.Readerを引数に取り、圧縮されたデータがそのio.Readerから読み込まれます。

圧縮レベル

DEFLATEアルゴリズムに基づく圧縮では、圧縮の度合いを調整するための「圧縮レベル」を指定できます。Goのcompress/flate、compress/gzip、compress/zlibパッケージでは、以下の定数が定義されています。

DefaultCompression: デフォルトの圧縮レベル。通常、速度と圧縮率のバランスが良い設定です。
NoCompression: 圧縮を行いません。データはそのまま出力されますが、DEFLATE/Zlib/Gzipのフォーマットは維持されます。
BestSpeed: 最速の圧縮レベル。圧縮率は低くなりますが、処理速度が最優先されます。
BestCompression: 最高の圧縮率。処理速度は遅くなりますが、可能な限りデータを小さくします。

これらのレベルは通常、1から9の整数値で表現され、BestSpeedが1、BestCompressionが9に対応します。

エラーハンドリング

Go言語では、関数がエラーを返す場合、慣習として戻り値の最後の要素としてerror型の値を返します。エラーが発生しなかった場合はnilを返します。このコミットでは、NewWriterXxx関数が圧縮レベルを引数に取る場合に、無効なレベルが指定された際にエラーを返すように変更されました。これは、Goのエラーハンドリングの慣習に沿った改善です。

GZIPヘッダとLatin-1エンコーディング

GZIPファイルフォーマット（RFC 1952）では、ヘッダにファイル名（Name）とコメント（Comment）のフィールドが含まれます。これらのフィールドは、NUL終端されたISO 8859-1 (Latin-1) 文字列として格納されることが規定されています。

Latin-1 (ISO 8859-1): 西ヨーロッパ言語で広く使われる8ビット文字エンコーディングです。256種類の文字を表現できます。
UTF-8: Unicode文字を可変長でエンコードする方式です。世界中のほとんどの文字を表現できます。Go言語の文字列は内部的にUTF-8で扱われます。

このエンコーディングの違いが問題となります。GoのUTF-8文字列にLatin-1で表現できない文字（例えば、日本語の文字や、Latin-1の範囲外のUnicode文字）が含まれている場合、GZIPヘッダにそのまま書き込むと文字化けやエラーの原因となります。このコミットでは、この問題に対する注意喚起と、非Latin-1文字が含まれる場合の適切なエラー処理が導入されました。

`io.EOF`

io.EOFは、Go言語のioパッケージで定義されているエラー変数です。これは、入力の終わりに達したことを示すために、io.ReaderのReadメソッドが返す特別なエラーです。通常、Readメソッドは読み込んだバイト数とnilエラーを返しますが、入力の終わりに達した場合は、読み込んだバイト数（0の場合もある）とio.EOFを返します。このコミットでは、io.EOFに関するコメントの誤りが修正されました。

技術的詳細

このコミットは、Go言語の圧縮ライブラリのAPI設計と内部実装に複数の重要な変更を加えています。

`NewWriterXxx`関数の戻り値の統一

以前のバージョンでは、compress/flate、compress/gzip、compress/zlibパッケージのNewWriter系の関数は、圧縮レベルを引数に取るかどうかで戻り値の型が異なっていました。

変更前:
- 圧縮レベルを引数に取らない場合: *Writer
- 圧縮レベルを引数に取る場合: *Writer (エラーを返さない)
変更後:
- 圧縮レベルを引数に取らない場合: *Writer (変更なし)
- 圧縮レベルを引数に取る場合: (*Writer, error)

この変更の主な理由は、Go言語のエラーハンドリングの慣習に合わせるためです。圧縮レベルはユーザーが指定する値であり、無効な値が指定される可能性があります。このような場合にエラーを明示的に返すことで、呼び出し元がエラーを適切に処理できるようになります。例えば、flate.NewWriterやzlib.NewWriterLevelは、無効な圧縮レベルが指定された場合にerrorを返すようになりました。

gzip.NewWriterLevelも同様に(*Writer, error)を返すように変更されましたが、gzip.NewWriter（デフォルト圧縮レベルを使用）は引き続き*Writerを返します。これは、デフォルトレベルではエラーが発生しないという前提に基づいています。

この変更により、APIの一貫性が向上し、開発者は圧縮レベルを指定する際に常にエラーの可能性を考慮し、適切なエラー処理を記述することが求められるようになりました。

`gzip.Compressor`と`gzip.Decompressor`のリネーム

compress/gzipパッケージでは、圧縮器の型がCompressor、解凍器の型がDecompressorと命名されていました。これに対し、compress/flateとcompress/zlibでは、それぞれWriterとReaderという型名が使用されていました。

変更前: gzip.Compressor, gzip.Decompressor
変更後: gzip.Writer, gzip.Reader

このリネームは、Goの標準ライブラリ全体での命名規則の一貫性を確立することを目的としています。io.Writerやio.Readerインターフェースを実装する型に対して、その役割を直接的に示すWriterやReaderという名前を使用することは、Goのイディオムに沿っています。これにより、開発者は異なる圧縮パッケージを扱う際に、より直感的に型名を理解し、コードの可読性が向上します。

この変更に伴い、gzipパッケージを使用している既存のコードは、型名をCompressorからWriterへ、DecompressorからReaderへ手動で更新する必要があります。コミットメッセージやdoc/go1.htmlにもその旨が記載されています。

GZIPメタデータ（コメント、ファイル名）のLatin-1エンコーディングに関する明確化と修正

GZIPファイルフォーマットの仕様（RFC 1952）では、ヘッダ内のCommentとNameフィールドはNUL終端されたISO 8859-1 (Latin-1) 文字列として扱われます。しかし、Goの文字列はUTF-8エンコーディングです。

このコミットでは、gzip.Writer.writeStringメソッド（以前はCompressor.writeString）が、UTF-8文字列をLatin-1として書き込む際の挙動を明確化しました。具体的には、入力文字列にNUL文字（\x00）やLatin-1で表現できない文字（> 0xff）が含まれている場合、エラーを返すようになりました。これにより、不正なデータがGZIPヘッダに書き込まれることを防ぎ、開発者にエンコーディングの制約を意識させるよう促します。

以前は、非Latin-1文字が含まれていてもエラーにならず、文字化けや予期せぬ挙動を引き起こす可能性がありました。この変更により、GZIPヘッダのメタデータが仕様に準拠し、互換性が向上します。

`io.EOF`コメントバグの修正

compress/gzip/gunzip.go内のDecompressor.Read（変更後はReader.Read）のコメントに、io.EOFの扱いに関する誤りがありました。

変更前: Clients should treat data returned by Read as tentative until they receive the successful (zero length, nil error) Read marking the end of the data.
変更後: Clients should treat data returned by Read as tentative until they receive the io.EOF marking the end of the data.

Goのio.Readerインターフェースの慣習では、入力の終わりに達したことを示すのはio.EOFエラーです。Readが0バイトを読み込み、nilエラーを返すことは、通常、それ以上読み込むデータがないことを意味しますが、これはio.EOFとは異なります。この修正により、gzip.ReaderのReadメソッドの挙動がGoの標準的なio.Readerのセマンティクスと一致することが明確化されました。

`gzip_test`のリファクタリング

src/pkg/compress/gzip/gzip_test.goのテストコードが大幅にリファクタリングされました。以前はpipeヘルパー関数を使用してパイプ経由で圧縮・解凍のテストを行っていましたが、新しいテストではbytes.Bufferを使用して直接メモリ内で圧縮・解凍を行うようになりました。

この変更により、テストコードがよりシンプルで直接的になり、理解しやすくなりました。特に、TestRoundTrip関数は、Writerでデータを圧縮し、Readerで解凍するという一連の処理を、より明確なステップで記述しています。これにより、テストの意図が伝わりやすくなり、将来的なメンテナンスが容易になります。

コアとなるコードの変更箇所

このコミットにおける主要なコード変更は、以下のファイルに集中しています。

src/pkg/compress/flate/deflate.go: NewWriterおよびNewWriterDict関数のシグネチャが変更され、(*Writer, error)を返すようになりました。

--- a/src/pkg/compress/flate/deflate.go
+++ b/src/pkg/compress/flate/deflate.go
@@ -408,17 +408,22 @@ func (d *compressor) close() error {
 	return d.w.err
 }
 
-// NewWriter returns a new Writer compressing
-// data at the given level.  Following zlib, levels
-// range from 1 (BestSpeed) to 9 (BestCompression);\n-// higher levels typically run slower but compress more.\n-// Level 0 (NoCompression) does not attempt any\n-// compression; it only adds the necessary DEFLATE framing.\n    -func NewWriter(w io.Writer, level int) *Writer {
+// NewWriter returns a new Writer compressing data at the given level.
+// Following zlib, levels range from 1 (BestSpeed) to 9 (BestCompression);
+// higher levels typically run slower but compress more. Level 0
+// (NoCompression) does not attempt any compression; it only adds the
+// necessary DEFLATE framing. Level -1 (DefaultCompression) uses the default
+// compression level.
+//
+// If level is in the range [-1, 9] then the error returned will be nil.
+// Otherwise the error returned will be non-nil.
+func NewWriter(w io.Writer, level int) (*Writer, error) {
 	const logWindowSize = logMaxOffsetSize
 	var dw Writer
-	dw.d.init(w, level)
-	return &dw
+	if err := dw.d.init(w, level); err != nil {
+		return nil, err
+	}
+	return &dw, nil
 }
 
 // NewWriterDict is like NewWriter but initializes the new
@@ -427,13 +432,16 @@ func NewWriter(w io.Writer, level int) *Writer {
 // any compressed output.  The compressed data written to w
 // can only be decompressed by a Reader initialized with the
 // same dictionary.
-func NewWriterDict(w io.Writer, level int, dict []byte) *Writer {
+func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error) {
 	dw := &dictWriter{w, false}
-	zw := NewWriter(dw, level)
+	zw, err := NewWriter(dw, level)
+	if err != nil {
+		return nil, err
+	}
 	zw.Write(dict)
 	zw.Flush()
 	dw.enabled = true
-	return zw
+	return zw, err
 }
 
 type dictWriter struct {

src/pkg/compress/gzip/gunzip.go: Decompressor型がReaderにリネームされ、関連するコメントも更新されました。

--- a/src/pkg/compress/gzip/gunzip.go
+++ b/src/pkg/compress/gzip/gunzip.go
@@ -16,9 +16,6 @@ import (
  	"time"
  )
  
-// BUG(nigeltao): Comments and Names don't properly map UTF-8 character codes outside of
-// the 0x00-0x7f range to ISO 8859-1 (Latin-1).
-
 const (
  	gzipID1     = 0x1f
  	gzipID2     = 0x8b
@@ -41,21 +38,21 @@ var ErrHeader = errors.New("invalid gzip header")
 var ErrChecksum = errors.New("gzip checksum error")
  
 // The gzip file stores a header giving metadata about the compressed file.
-// That header is exposed as the fields of the Compressor and Decompressor structs.
+// That header is exposed as the fields of the Writer and Reader structs.
 type Header struct {
  	Comment string    // comment
  	Extra   []byte    // "extra data"
  	ModTime time.Time // modification time
  	Name    string    // file name
  	OS      byte      // operating system type
 }
  
-// An Decompressor is an io.Reader that can be read to retrieve
+// A Reader is an io.Reader that can be read to retrieve
  // uncompressed data from a gzip-format compressed file.
  //
  // In general, a gzip file can be a concatenation of gzip files,
-// each with its own header.  Reads from the Decompressor
+// each with its own header.  Reads from the Reader
  // return the concatenation of the uncompressed data of each.
-// Only the first header is recorded in the Decompressor fields.
+// Only the first header is recorded in the Reader fields.
  //
  // Gzip files store a length and checksum of the uncompressed data.
-// The Decompressor will return a ErrChecksum when Read
+// The Reader will return a ErrChecksum when Read
  // reaches the end of the uncompressed data if it does not
  // have the expected length or checksum.  Clients should treat data
-// returned by Read as tentative until they receive the successful
-// (zero length, nil error) Read marking the end of the data.
-type Decompressor struct {
+// returned by Read as tentative until they receive the io.EOF
+// marking the end of the data.
+type Reader struct {
  	Header
  	r            flate.Reader
  	decompressor io.ReadCloser
@@ -75,11 +72,11 @@ type Decompressor struct {
  	err          error
  }
  
-// NewReader creates a new Decompressor reading the given reader.
+// NewReader creates a new Reader reading the given reader.
  // The implementation buffers input and may read more data than necessary from r.
-// It is the caller's responsibility to call Close on the Decompressor when done.
-func NewReader(r io.Reader) (*Decompressor, error) {
-	z := new(Decompressor)
+// It is the caller's responsibility to call Close on the Reader when done.
+func NewReader(r io.Reader) (*Reader, error) {
+	z := new(Reader)
  	z.r = makeReader(r)
  	z.digest = crc32.NewIEEE()
  	if err := z.readHeader(true); err != nil {
@@ -93,7 +90,7 @@ func get4(p []byte) uint32 {
  	return uint32(p[0]) | uint32(p[1])<<8 | uint32(p[2])<<16 | uint32(p[3])<<24
  }
  
-func (z *Decompressor) readString() (string, error) {
+func (z *Reader) readString() (string, error) {
  	var err error
  	needconv := false
  	for i := 0; ; i++ {
@@ -122,7 +119,7 @@ func (z *Decompressor) readString() (string, error) {
  	panic("not reached")
  }
  
-func (z *Decompressor) read2() (uint32, error) {
+func (z *Reader) read2() (uint32, error) {
  	_, err := io.ReadFull(z.r, z.buf[0:2])
  	if err != nil {
  		return 0, err
@@ -130,7 +127,7 @@ func (z *Decompressor) read2() (uint32, error) {
  	return uint32(z.buf[0]) | uint32(z.buf[1])<<8, nil
  }
  
-func (z *Decompressor) readHeader(save bool) error {
+func (z *Reader) readHeader(save bool) error {
  	_, err := io.ReadFull(z.r, z.buf[0:10])
  	if err != nil {
  		return err
@@ -196,7 +193,7 @@ func (z *Decompressor) readHeader(save bool) error {
  	return nil
  }
  
-func (z *Decompressor) Read(p []byte) (n int, err error) {
+func (z *Reader) Read(p []byte) (n int, err error) {
  	if z.err != nil {
  		return 0, z.err
  	}
@@ -236,5 +233,5 @@ func (z *Decompressor) Read(p []byte) (n int, err error) {
  	return z.Read(p)
  }
  
-// Calling Close does not close the wrapped io.Reader originally passed to NewReader.
-func (z *Decompressor) Close() error { return z.decompressor.Close() }\n+// Close closes the Reader. It does not close the underlying io.Reader.
+func (z *Reader) Close() error { return z.decompressor.Close() }

src/pkg/compress/gzip/gzip.go: Compressor型がWriterにリネームされ、NewWriterおよびNewWriterLevel関数のシグネチャが変更されました。また、writeStringメソッドのコメントとロジックが更新され、Latin-1エンコーディングの制約が明確化されました。

--- a/src/pkg/compress/gzip/gzip.go
+++ b/src/pkg/compress/gzip/gzip.go
@@ -7,6 +7,7 @@ package gzip
 import (
  	"compress/flate"
  	"errors"
+	"fmt"
  	"hash"
  	"hash/crc32"
  	"io"
@@ -21,9 +22,9 @@ const (
  	DefaultCompression = flate.DefaultCompression
  )
  
-// A Compressor is an io.WriteCloser that satisfies writes by compressing data written
+// A Writer is an io.WriteCloser that satisfies writes by compressing data written
  // to its wrapped io.Writer.
-type Compressor struct {
+type Writer struct {
  	Header
  	w          io.Writer
  	level      int
@@ -35,25 +36,40 @@ type Compressor struct {
  	err        error
  }
  
-// NewWriter calls NewWriterLevel with the default compression level.\n    -func NewWriter(w io.Writer) (*Compressor, error) {
-	return NewWriterLevel(w, DefaultCompression)
+// NewWriter creates a new Writer that satisfies writes by compressing data
+// written to w.
+//
+// It is the caller's responsibility to call Close on the WriteCloser when done.
+// Writes may be buffered and not flushed until Close.
+//
+// Callers that wish to set the fields in Writer.Header must do so before
+// the first call to Write or Close. The Comment and Name header fields are
+// UTF-8 strings in Go, but the underlying format requires NUL-terminated ISO
+// 8859-1 (Latin-1). NUL or non-Latin-1 runes in those strings will lead to an
+// error on Write.
+func NewWriter(w io.Writer) *Writer {
+	z, _ := NewWriterLevel(w, DefaultCompression)
+	return z
 }
  
-// NewWriterLevel creates a new Compressor writing to the given writer.
-// Writes may be buffered and not flushed until Close.
-// Callers that wish to set the fields in Compressor.Header must
-// do so before the first call to Write or Close.
-// It is the caller's responsibility to call Close on the WriteCloser when done.
-// level is the compression level, which can be DefaultCompression, NoCompression,
-// or any integer value between BestSpeed and BestCompression (inclusive).
-func NewWriterLevel(w io.Writer, level int) (*Compressor, error) {
-	z := new(Compressor)
-	z.OS = 255 // unknown
-	z.w = w
-	z.level = level
-	z.digest = crc32.NewIEEE()
-	return z, nil
+// NewWriterLevel is like NewWriter but specifies the compression level instead
+// of assuming DefaultCompression.
+//
+// The compression level can be DefaultCompression, NoCompression, or any
+// integer value between BestSpeed and BestCompression inclusive. The error
+// returned will be nil if the level is valid.
+func NewWriterLevel(w io.Writer, level int) (*Writer, error) {
+	if level < DefaultCompression || level > BestCompression {
+		return nil, fmt.Errorf("gzip: invalid compression level: %d", level)
+	}
+	return &Writer{
+		Header: Header{
+			OS: 255, // unknown
+		},
+		w:      w,
+		level:  level,
+		digest: crc32.NewIEEE(),
+	}, nil
 }
  
 // GZIP (RFC 1952) is little-endian, unlike ZLIB (RFC 1950).
@@ -70,7 +86,7 @@ func put4(p []byte, v uint32) {
 }
  
 // writeBytes writes a length-prefixed byte slice to z.w.
-func (z *Compressor) writeBytes(b []byte) error {
+func (z *Writer) writeBytes(b []byte) error {
  	if len(b) > 0xffff {
  		return errors.New("gzip.Write: Extra data is too large")
  	}
@@ -83,10 +99,10 @@ func (z *Compressor) writeBytes(b []byte) error {
  	return err
  }
  
-// writeString writes a string (in ISO 8859-1 (Latin-1) format) to z.w.
-func (z *Compressor) writeString(s string) error {
-	// GZIP (RFC 1952) specifies that strings are NUL-terminated ISO 8859-1 (Latin-1).
-	var err error
+// writeString writes a UTF-8 string s in GZIP's format to z.w.
+// GZIP (RFC 1952) specifies that strings are NUL-terminated ISO 8859-1 (Latin-1).
+func (z *Writer) writeString(s string) (err error) {
+	// GZIP stores Latin-1 strings; error if non-Latin-1; convert if non-ASCII.
  	needconv := false
  	for _, v := range s {
  		if v == 0 || v > 0xff {
@@ -114,7 +130,7 @@ func (z *Compressor) writeString(s string) error {
  	return err
  }
  
-func (z *Compressor) Write(p []byte) (int, error) {
+func (z *Writer) Write(p []byte) (int, error) {
  	if z.err != nil {
  		return 0, z.err
  	}
@@ -165,7 +181,7 @@ func (z *Compressor) Write(p []byte) (int, error) {
  			return n, z.err
  		}
  	}
-	z.compressor = flate.NewWriter(z.w, z.level)
+	z.compressor, _ = flate.NewWriter(z.w, z.level)
  }
  z.size += uint32(len(p))
  z.digest.Write(p)
@@ -173,8 +189,8 @@ func (z *Compressor) Write(p []byte) (int, error) {
  	return n, z.err
  }
  
-// Calling Close does not close the wrapped io.Writer originally passed to NewWriter.
-func (z *Compressor) Close() error {
+// Close closes the Writer. It does not close the underlying io.Writer.
+func (z *Writer) Close() error {
  	if z.err != nil {
  		return z.err
  	}

src/pkg/compress/zlib/writer.go: NewWriter, NewWriterLevel, NewWriterDict関数のシグネチャが変更され、(*Writer, error)を返すようになりました。また、内部でwriteHeaderメソッドが導入され、ヘッダ書き込み時のエラーハンドリングが改善されました。

--- a/src/pkg/compress/zlib/writer.go
+++ b/src/pkg/compress/zlib/writer.go
@@ -6,7 +6,7 @@ package zlib
  
 import (
  	"compress/flate"
-	"errors"
+	"fmt"
  	"hash"
  	"hash/adler32"
  	"io"
@@ -24,30 +24,55 @@ const (
 // A Writer takes data written to it and writes the compressed
 // form of that data to an underlying writer (see NewWriter).
 type Writer struct {
-	w          io.Writer
-	compressor *flate.Writer
-	digest     hash.Hash32
-	err        error
-	scratch    [4]byte
+	w           io.Writer
+	level       int
+	dict        []byte
+	compressor  *flate.Writer
+	digest      hash.Hash32
+	err         error
+	scratch     [4]byte
+	wroteHeader bool
 }
  
-// NewWriter calls NewWriterLevel with the default compression level.\n    -func NewWriter(w io.Writer) (*Writer, error) {
-	return NewWriterLevel(w, DefaultCompression)
+// NewWriter creates a new Writer that satisfies writes by compressing data
+// written to w.
+//
+// It is the caller's responsibility to call Close on the WriteCloser when done.
+// Writes may be buffered and not flushed until Close.
+func NewWriter(w io.Writer) *Writer {
+	z, _ := NewWriterLevelDict(w, DefaultCompression, nil)
+	return z
 }
  
-// NewWriterLevel calls NewWriterDict with no dictionary.\n    -func NewWriterLevel(w io.Writer, level int) (*Writer, error) {
-	return NewWriterDict(w, level, nil)
+// NewWriterLevel is like NewWriter but specifies the compression level instead
+// of assuming DefaultCompression.
+//
+// The compression level can be DefaultCompression, NoCompression, or any
+// integer value between BestSpeed and BestCompression inclusive. The error
+// returned will be nil if the level is valid.
+func NewWriterLevel(w io.Writer, level int) (*Writer, error) {
+	return NewWriterLevelDict(w, level, nil)
 }
  
-// NewWriterDict creates a new io.WriteCloser that satisfies writes by compressing data written to w.\n    -// It is the caller's responsibility to call Close on the WriteCloser when done.\n    -// level is the compression level, which can be DefaultCompression, NoCompression,\n    -// or any integer value between BestSpeed and BestCompression (inclusive).\n    -// dict is the preset dictionary to compress with, or nil to use no dictionary.\n    -func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error) {
-	z := new(Writer)
+// NewWriterLevelDict is like NewWriterLevel but specifies a dictionary to
+// compress with.
+//
+// The dictionary may be nil. If not, its contents should not be modified until
+// the Writer is closed.
+func NewWriterLevelDict(w io.Writer, level int, dict []byte) (*Writer, error) {
+	if level < DefaultCompression || level > BestCompression {
+		return nil, fmt.Errorf("zlib: invalid compression level: %d", level)
+	}
+	return &Writer{
+		w:     w,
+		level: level,
+		dict:  dict,
+	}, nil
+}
+
+// writeHeader writes the ZLIB header.
+func (z *Writer) writeHeader() (err error) {
+	z.wroteHeader = true
  	// ZLIB has a two-byte header (as documented in RFC 1950).
  	// The first four bits is the CINFO (compression info), which is 7 for the default deflate window size.
  	// The next four bits is the CM (compression method), which is 8 for deflate.
@@ -56,7 +81,7 @@ func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error) {
  	// 0=fastest, 1=fast, 2=default, 3=best.
  	// The next bit, FDICT, is set if a dictionary is given.
  	// The final five FCHECK bits form a mod-31 checksum.
-	switch level {
+	switch z.level {
  	case 0, 1:
  		z.scratch[1] = 0 << 6
  	case 2, 3, 4, 5:
@@ -66,35 +91,38 @@ func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error) {
  	case 7, 8, 9:
  		z.scratch[1] = 3 << 6
  	default:
-		return nil, errors.New("level out of range")
+		panic("unreachable")
  	}
-	if dict != nil {
+	if z.dict != nil {
  		z.scratch[1] |= 1 << 5
  	}
  	z.scratch[1] += uint8(31 - (uint16(z.scratch[0])<<8+uint16(z.scratch[1]))%31)
-	_, err := w.Write(z.scratch[0:2])
-	if err != nil {
-		return nil, err
+	if _, err = z.w.Write(z.scratch[0:2]); err != nil {
+		return err
  	}
-	if dict != nil {
+	if z.dict != nil {
  		// The next four bytes are the Adler-32 checksum of the dictionary.
-		checksum := adler32.Checksum(dict)
+		checksum := adler32.Checksum(z.dict)
  		z.scratch[0] = uint8(checksum >> 24)
  		z.scratch[1] = uint8(checksum >> 16)
  		z.scratch[2] = uint8(checksum >> 8)
  		z.scratch[3] = uint8(checksum >> 0)
-		_, err = w.Write(z.scratch[0:4])
-		if err != nil {
-			return nil, err
+		if _, err = z.w.Write(z.scratch[0:4]); err != nil {
+			return err
  		}
  	}
-	z.w = w
-	z.compressor = flate.NewWriterDict(w, level, dict)
+	z.compressor, err = flate.NewWriterDict(z.w, z.level, z.dict)
+	if err != nil {
+		return err
+	}
  	z.digest = adler32.New()
-	return z, nil
+	return nil
 }
  
 func (z *Writer) Write(p []byte) (n int, err error) {
+	if !z.wroteHeader {
+		z.err = z.writeHeader()
+	}
  	if z.err != nil {
  		return 0, z.err
  	}
@@ -112,6 +140,9 @@ func (z *Writer) Write(p []byte) (n int, err error) {
  
 // Flush flushes the underlying compressor.
 func (z *Writer) Flush() error {
+	if !z.wroteHeader {
+		z.err = z.writeHeader()
+	}
  	if z.err != nil {
  		return z.err
  	}
@@ -121,6 +152,9 @@ func (z *Writer) Flush() error {
  
 // Calling Close does not close the wrapped io.Writer originally passed to NewWriter.
 func (z *Writer) Close() error {
+	if !z.wroteHeader {
+		z.err = z.writeHeader()
+	}
  	if z.err != nil {
  		return z.err
  	}

src/pkg/archive/zip/writer.go: flate.NewWriterの呼び出しが、エラーをチェックするように変更されました。

--- a/src/pkg/archive/zip/writer.go
+++ b/src/pkg/archive/zip/writer.go
@@ -127,7 +127,11 @@ func (w *Writer) CreateHeader(fh *FileHeader) (io.Writer, error) {
  	case Store:
  		fw.comp = nopCloser{fw.compCount}
  	case Deflate:
-		fw.comp = flate.NewWriter(fw.compCount, 5)
+		var err error
+		fw.comp, err = flate.NewWriter(fw.compCount, 5)
+		if err != nil {
+			return nil, err
+		}
  	default:
  		return nil, ErrAlgorithm
  	}

src/pkg/compress/flate/deflate_test.go, src/pkg/compress/gzip/gzip_test.go, src/pkg/compress/zlib/writer_test.go: 各パッケージのテストファイルが、API変更に合わせて更新され、特にgzip_test.goは大幅にリファクタリングされました。
doc/go1.html, doc/go1.tmpl: Go 1のドキュメントに、compress/flate, compress/gzip, compress/zlibパッケージのNewWriterXxx関数の変更と、gzipパッケージの型名変更に関する記述が追加されました。

コアとなるコードの解説

`flate.NewWriter`のシグネチャ変更 (`src/pkg/compress/flate/deflate.go`)

変更前:

func NewWriter(w io.Writer, level int) *Writer

変更後:

func NewWriter(w io.Writer, level int) (*Writer, error)

この変更により、NewWriter関数は圧縮レベルが無効な場合にエラーを返すようになりました。例えば、levelが-1から9の範囲外である場合、nilではないerrorが返されます。これにより、呼び出し元は圧縮器の初期化が成功したかどうかを確実にチェックできるようになります。

同様に、NewWriterDictも(*Writer, error)を返すように変更され、内部でNewWriterが返すエラーを伝播するようになりました。

`gzip.Decompressor`から`gzip.Reader`へのリネーム (`src/pkg/compress/gzip/gunzip.go`)

変更前:

type Decompressor struct { ... }
func NewReader(r io.Reader) (*Decompressor, error) { ... }
func (z *Decompressor) Read(p []byte) (n int, err error) { ... }

変更後:

type Reader struct { ... }
func NewReader(r io.Reader) (*Reader, error) { ... }
func (z *Reader) Read(p []byte) (n int, err error) { ... }

Decompressorという型名がReaderに変更されました。これは、io.Readerインターフェースを実装する型に対して、その役割をより直接的に示す命名規則に統一するためです。これにより、compress/flateやcompress/zlibとの一貫性が保たれ、GoのI/Oストリームの概念とより密接に結びつきます。

また、io.EOFに関するコメントの修正も行われました。以前のコメントは、Readが0バイトを読み込みnilエラーを返すことをEOFのマーカーとしていましたが、Goの慣習ではio.EOFエラー自体が入力の終わりを示します。この修正により、ドキュメントがGoの標準的なI/Oセマンティクスと一致するようになりました。

`gzip.Compressor`から`gzip.Writer`へのリネームと`NewWriter`のシグネチャ変更 (`src/pkg/compress/gzip/gzip.go`)

変更前:

type Compressor struct { ... }
func NewWriter(w io.Writer) (*Compressor, error) // NewWriterLevelを呼び出す
func NewWriterLevel(w io.Writer, level int) (*Compressor, error)

変更後:

type Writer struct { ... }
func NewWriter(w io.Writer) *Writer // NewWriterLevelを呼び出すが、エラーは内部で処理
func NewWriterLevel(w io.Writer, level int) (*Writer, error)

Compressorという型名がWriterに変更されました。これもflateやzlibとの一貫性を保つための変更です。

NewWriter関数は、デフォルトの圧縮レベルを使用するため、エラーを返さない*Writerを返すようになりました。一方、NewWriterLevelは、無効な圧縮レベルが指定された場合にfmt.Errorfを使用してエラーを返すようになりました。これにより、圧縮レベルの検証がAPIレベルで行われるようになり、より堅牢なコードになります。

`gzip.Writer.writeString`の変更 (`src/pkg/compress/gzip/gzip.go`)

変更前:

func (z *Compressor) writeString(s string) error {
    // GZIP (RFC 1952) specifies that strings are NUL-terminated ISO 8859-1 (Latin-1).
    var err error
    // ... (Latin-1変換ロジック)
    return err
}

変更後:

func (z *Writer) writeString(s string) (err error) {
    // GZIP stores Latin-1 strings; error if non-Latin-1; convert if non-ASCII.
    // ... (Latin-1変換ロジックとエラーチェック)
    if v == 0 || v > 0xff { // NUL文字またはLatin-1範囲外の文字
        return fmt.Errorf("gzip: invalid header string: non-Latin-1 character or NUL: %q", s)
    }
    // ...
}

writeStringメソッドは、GZIPヘッダのCommentやNameフィールドに文字列を書き込む際に使用されます。この変更により、入力文字列sにNUL文字（\x00）やLatin-1で表現できない文字（Unicodeコードポイントが0xffを超える文字）が含まれている場合、明示的にエラーを返すようになりました。これにより、GZIP仕様に準拠しない不正なメタデータが書き込まれることを防ぎます。

`zlib.NewWriterXxx`関数のシグネチャ変更 (`src/pkg/compress/zlib/writer.go`)

変更前:

func NewWriter(w io.Writer) (*Writer, error)
func NewWriterLevel(w io.Writer, level int) (*Writer, error)
func NewWriterDict(w io.Writer, level int, dict []byte) (*Writer, error)

変更後:

func NewWriter(w io.Writer) *Writer
func NewWriterLevel(w io.Writer, level int) (*Writer, error)
func NewWriterLevelDict(w io.Writer, level int, dict []byte) (*Writer, error)

zlibパッケージでも、NewWriterLevelとNewWriterLevelDictが圧縮レベルの検証を行い、無効なレベルの場合にエラーを返すようになりました。NewWriterはデフォルトレベルを使用するため、エラーを返さない*Writerを返します。

また、Writer構造体にwroteHeaderフィールドが追加され、writeHeaderメソッドが導入されました。これにより、Zlibヘッダの書き込みが一度だけ行われることが保証され、ヘッダ書き込み時のエラーも適切に処理されるようになりました。

`archive/zip/writer.go`における`flate.NewWriter`の呼び出し変更

変更前:

fw.comp = flate.NewWriter(fw.compCount, 5)

変更後:

var err error
fw.comp, err = flate.NewWriter(fw.compCount, 5)
if err != nil {
    return nil, err
}

flate.NewWriterがエラーを返すようになったため、archive/zipパッケージ内のwriter.goでも、その戻り値のエラーをチェックし、適切に処理するように変更されました。これは、APIの変更が他の依存するパッケージに波及する典型的な例です。

テストコードのリファクタリング (`src/pkg/compress/gzip/gzip_test.go`)

gzip_test.goでは、pipeヘルパー関数が削除され、bytes.Bufferを直接使用するよりシンプルなテストパターンに置き換えられました。

変更前 (例: TestEmpty):

func TestEmpty(t *testing.T) {
    pipe(t,
        func(compressor *Compressor) {},
        func(decompressor *Decompressor) {
            b, err := ioutil.ReadAll(decompressor)
            if err != nil {
                t.Fatalf("%v", err)
            }
            if len(b) != 0 {
                t.Fatalf("did not read an empty slice")
            }
        })
}

変更後 (例: TestEmpty):

func TestEmpty(t *testing.T) {
    buf := new(bytes.Buffer)

    if err := NewWriter(buf).Close(); err != nil {
        t.Fatalf("Writer.Close: %v", err)
    }

    r, err := NewReader(buf)
    if err != nil {
        t.Fatalf("NewReader: %v", err)
    }
    b, err := ioutil.ReadAll(r)
    if err != nil {
        t.Fatalf("ReadAll: %v", err)
    }
    if len(b) != 0 {
        t.Fatalf("got %d bytes, want 0", len(b))
    }
    if err := r.Close(); err != nil {
        t.Fatalf("Reader.Close: %v", err)
    }
}

このリファクタリングにより、テストのフローがより明確になり、パイプの複雑さを回避することで、テストのデバッグや理解が容易になりました。

参考にした情報源リンク

Go Issue #2839: compress/gzip: API - GitHub: https://github.com/golang/go/issues/2839
RFC 1952 - GZIP file format specification: https://www.rfc-editor.org/rfc/rfc1952
RFC 1950 - ZLIB Compressed Data Format Specification: https://www.rfc-editor.org/rfc/rfc1950
RFC 1951 - DEFLATE Compressed Data Format Specification: https://www.rfc-editor.org/rfc/rfc1951
ISO/IEC 8859-1 (Latin-1) - Wikipedia: https://ja.wikipedia.org/wiki/ISO/IEC_8859-1
UTF-8 - Wikipedia: https://ja.wikipedia.org/wiki/UTF-8
Go言語のioパッケージドキュメント: https://pkg.go.dev/io
Go言語のcompress/flateパッケージドキュメント: https://pkg.go.dev/compress/flate
Go言語のcompress/gzipパッケージドキュメント: https://pkg.go.dev/compress/gzip
Go言語のcompress/zlibパッケージドキュメント: https://pkg.go.dev/compress/zlib

comemo

[インデックス 11779] ファイルの概要

コミット

GitHub上でのコミットページへのリンク

元コミット内容

変更の背景

前提知識の解説

Go言語の圧縮ライブラリ

`io.Writer`と`io.Reader`インターフェース

圧縮レベル

エラーハンドリング

GZIPヘッダとLatin-1エンコーディング

`io.EOF`

技術的詳細

`NewWriterXxx`関数の戻り値の統一

`gzip.Compressor`と`gzip.Decompressor`のリネーム

GZIPメタデータ（コメント、ファイル名）のLatin-1エンコーディングに関する明確化と修正

`io.EOF`コメントバグの修正

`gzip_test`のリファクタリング

関連するIssue #2839の修正

コアとなるコードの変更箇所

コアとなるコードの解説

`flate.NewWriter`のシグネチャ変更 (`src/pkg/compress/flate/deflate.go`)

`gzip.Decompressor`から`gzip.Reader`へのリネーム (`src/pkg/compress/gzip/gunzip.go`)

`gzip.Compressor`から`gzip.Writer`へのリネームと`NewWriter`のシグネチャ変更 (`src/pkg/compress/gzip/gzip.go`)

`gzip.Writer.writeString`の変更 (`src/pkg/compress/gzip/gzip.go`)

`zlib.NewWriterXxx`関数のシグネチャ変更 (`src/pkg/compress/zlib/writer.go`)

`archive/zip/writer.go`における`flate.NewWriter`の呼び出し変更

テストコードのリファクタリング (`src/pkg/compress/gzip/gzip_test.go`)

関連リンク

参考にした情報源リンク

Keyboard shortcuts

comemo