[インデックス 14924] ファイルの概要

コミット

commit 1d6eb2e9fae957ccdc4ea83b965aa41313f7d4bb
Author: Rémy Oudompheng <oudomphe@phare.normalesup.org>
Date:   Fri Jan 18 22:40:32 2013 +0100

    cmd/gc: fix handling of struct padding in hash/eq.
    
    The test case of issue 4585 was not passing due to
    miscalculation of memequal args, and the previous fix
    does not handle padding at the end of a struct.
    
    Handling of padding at end of structs also fixes the case
    of [n]T where T is such a padded struct.
    
    Fixes #4585.
    (again)
    
    R=golang-dev, rsc
    CC=golang-dev
    https://golang.org/cl/7133059

GitHub上でのコミットページへのリンク

https://github.com/golang/go/commit/1d6eb2e9fae957ccdc4ea83b965aa41313f7d4bb

元コミット内容

cmd/gc: fix handling of struct padding in hash/eq.

The test case of issue 4585 was not passing due to
miscalculation of memequal args, and the previous fix
does not handle padding at the end of a struct.

Handling of padding at end of structs also fixes the case
of [n]T where T is such a padded struct.

Fixes #4585.
(again)

R=golang-dev, rsc
CC=golang-dev
https://golang.org/cl/7133059

変更の背景

このコミットは、Goコンパイラ（cmd/gc）における構造体（struct）のパディング（padding）が、ハッシュ計算（hash）と等価性比較（eq）に与える影響を修正することを目的としています。具体的には、Go言語のIssue 4585に関連する問題の再修正です。

元の問題は、構造体内に存在するパディングバイトが、memequal（メモリ比較）やmemhash（メモリハッシュ）の計算時に正しく扱われないことにありました。特に、構造体の末尾にパディングが存在する場合や、パディングを持つ構造体の配列（[n]T）の場合に問題が発生していました。

Go言語では、構造体のフィールドはメモリ上でアライメント（alignment）の制約を満たすように配置されます。これにより、フィールド間に未使用の領域（パディング）が挿入されることがあります。ハッシュ計算や等価性比較を行う際には、これらのパディングバイトを適切に無視する必要があります。なぜなら、パディングバイトの内容は未定義であり、異なる値を持つ可能性があるため、これらを比較対象に含めると、論理的に等しいはずの構造体が等しくないと判断されたり、異なるハッシュ値が生成されたりするからです。

以前の修正では、このパディングの問題が完全に解決されておらず、特に構造体の末尾のパディングや、パディングを持つ構造体の配列のケースが考慮されていませんでした。このコミットは、これらの残された問題を解決し、構造体のハッシュと等価性比較が常に正しく行われるようにするためのものです。

前提知識の解説

Goコンパイラ (cmd/gc): Go言語の公式コンパイラです。Goのソースコードを機械語に変換する役割を担います。構造体のメモリレイアウトの決定、ハッシュ関数や等価性比較関数の生成なども行います。
構造体 (Struct): 異なる型のフィールドをまとめた複合データ型です。
構造体パディング (Struct Padding): メモリのアライメント要件を満たすために、構造体のフィールド間にコンパイラによって挿入される未使用のバイトのことです。例えば、32ビットのフィールドの後に64ビットのフィールドが続く場合、64ビットフィールドが8バイト境界に配置されるように、32ビットフィールドの後に4バイトのパディングが挿入されることがあります。
memequal: メモリ領域の内容が等しいかどうかを比較する関数です。Goコンパイラが構造体の等価性比較のために生成するコードの一部として使用されます。
memhash: メモリ領域の内容からハッシュ値を計算する関数です。Goコンパイラが構造体のハッシュ計算のために生成するコードの一部として使用されます。マップのキーとして構造体を使用する場合などに必要となります。
Type 構造体: Goコンパイラの内部で型情報を表現するために使用されるデータ構造です。フィールドのオフセット（width）、サイズ（type->width）、次のフィールドへのポインタ（down）などの情報を含みます。
TFIELD: Type 構造体の etype フィールドが示す、型が構造体のフィールドであることを示す定数です。
AMEM: algtype1 関数が返す値の一つで、型がメモリ比較/ハッシュに適していることを示します。
isblanksym: シンボルがブランク識別子（_）であるかどうかを判定する関数です。ブランク識別子で宣言されたフィールドは、ハッシュや等価性比較の対象から除外されます。

技術的詳細

Goコンパイラは、構造体のハッシュ計算と等価性比較のために、特殊なコードを生成します。これは、構造体内の各フィールドの型に基づいて、適切なハッシュ関数や比較関数を呼び出すことで実現されます。このプロセスにおいて、構造体パディングの扱いは非常に重要です。

このコミットの核心は、src/cmd/gc/subr.c 内の ispaddedfield 関数のロジック変更と、genhash および geneq 関数における memhash/memequal のサイズ計算の修正にあります。

ispaddedfield 関数の改善:
- この関数は、与えられた構造体フィールドの後にパディングが存在するかどうかを判定します。
- 変更前は、フィールドのサイズと次のフィールドのオフセットを比較してパディングを検出していました。しかし、これは構造体の最後のフィールドの後に存在するパディングを考慮していませんでした。
- 変更後、ispaddedfield は total という新しい引数を受け取るようになりました。これは、エンクロージング構造体（そのフィールドを含む構造体）の全体のサイズを示します。
- これにより、フィールドが構造体の最後のフィールドである場合（t->down == T）、そのフィールドのサイズとエンクロージング構造体の合計サイズを比較することで、末尾のパディングを正確に検出できるようになりました。
genhash および geneq における memhash/memequal のサイズ計算の修正:
- genhash は構造体のハッシュ関数を生成し、geneq は構造体の等価性比較関数を生成します。
- これらの関数は、構造体内の連続したメモリ領域（パディングを含まない）に対して memhash や memequal を適用することで効率化を図っています。
- 変更前は、memhash/memequal に渡すサイズを計算する際に、次のフィールドのオフセットと現在のフィールドのオフセットの差分を使用していました。しかし、これはパディングが正しく考慮されない場合に誤ったサイズを計算する可能性がありました。
- 変更後、offend という新しい変数が導入されました。これは、現在のメモリ領域の終端オフセットを正確に追跡します。
- size = offend - first->width という計算式により、memhash/memequal に渡されるサイズが、パディングを適切に除外した、連続するメモリ領域の実際のサイズを正確に反映するようになりました。これにより、パディングバイトがハッシュ計算や等価性比較に影響を与えることがなくなりました。

これらの変更により、構造体のメモリレイアウトにおけるパディングの有無にかかわらず、Goコンパイラが生成するハッシュ関数と等価性比較関数が、常に論理的に正しい結果を返すことが保証されます。これは、特にマップのキーとして構造体を使用する場合や、構造体の比較を行う場合に、予期せぬ動作を防ぐ上で非常に重要です。

コアとなるコードの変更箇所

`src/cmd/gc/subr.c`

ispaddedfield 関数のシグネチャが変更され、vlong total 引数が追加されました。

--- a/src/cmd/gc/subr.c
+++ b/src/cmd/gc/subr.c
@@ -504,12 +504,17 @@ nod(int op, Node *nleft, Node *nright)
 	return n;
 }

+// ispaddedfield returns whether the given field
+// is followed by padding. For the case where t is
+// the last field, total gives the size of the enclosing struct.
 static int
-ispaddedfield(Type *t)
+ispaddedfield(Type *t, vlong total)
 {
 	if(t->etype != TFIELD)
 		fatal("ispaddedfield called non-field %T", t);
-	return t->down != T && t->width + t->type->width != t->down->width;
+	if(t->down == T)
+		return t->width + t->type->width != total;
+	return t->width + t->type->width != t->down->width;
 }

 int
@@ -591,7 +596,7 @@ algtype1(Type *t, Type **bad)
 		for(t1=t->type; t1!=T; t1=t1->down) {
 			// Blank fields and padding must be ignored,
 			// so need special compare.
-			if(isblanksym(t1->sym) || ispaddedfield(t1)) {
+			if(isblanksym(t1->sym) || ispaddedfield(t1, t->width)) {
 				ret = -1;
 				continue;
 			}
@@ -2619,7 +2624,7 @@ genhash(Sym *sym, Type *t)
 	Node *hashel;
 	Type *first, *t1;
 	int old_safemode;
-	int64 size, mul;
+	int64 size, mul, offend;

 	if(debug['r'])
 		print("genhash %S %T\n", sym, t);
@@ -2705,24 +2710,21 @@ genhash(Sym *sym, Type *t)
 		// Walk the struct using memhash for runs of AMEM
 		// and calling specific hash functions for the others.
 		first = T;
+		offend = 0;
 		for(t1=t->type;; t1=t1->down) {
 			if(t1 != T && algtype1(t1->type, nil) == AMEM && !isblanksym(t1->sym)) {
+				offend = t1->width + t1->type->width;
 				if(first == T)
 					first = t1;
 				// If it's a memory field but it's padded, stop here.
-				if(ispaddedfield(t1))
+				if(ispaddedfield(t1, t->width))
 					t1 = t1->down;
 				else
 					continue;
 			}
 			// Run memhash for fields up to this one.
 			if(first != T) {
-				if(first->down == t1)
-					size = first->type->width;
-				else if(t1 == T)
-					size = t->width - first->width;  // first->width is offset
-				else
-					size = t1->width - first->width;  // both are offsets
+				size = offend - first->width; // first->width is offset
 				hashel = hashmem(first->type);
 				// hashel(h, size, &p.first)
 				call = nod(OCALL, hashel, N);
@@ -2856,6 +2858,7 @@ geneq(Sym *sym, Type *t)
 	Type *t1, *first;
 	int old_safemode;
 	int64 size;
+	int64 offend;

 	if(debug['r'])
 		print("geneq %S %T\n", sym, t);
@@ -2929,12 +2932,14 @@ geneq(Sym *sym, Type *t)
 		// and calling specific equality tests for the others.
 		// Skip blank-named fields.
 		first = T;
+		offend = 0;
 		for(t1=t->type;; t1=t1->down) {
 			if(t1 != T && algtype1(t1->type, nil) == AMEM && !isblanksym(t1->sym)) {
+				offend = t1->width + t1->type->width;
 				if(first == T)
 					first = t1;
 				// If it's a memory field but it's padded, stop here.
-				if(ispaddedfield(t1))
+				if(ispaddedfield(t1, t->width))
 					t1 = t1->down;
 				else
 					continue;
@@ -2952,10 +2957,7 @@ geneq(Sym *sym, Type *t)
 					\tfn->nbody = list(fn->nbody, eqfield(np, nq, newname(first->sym), neq));
 				} else {
 					// More than two fields: use memequal.
-					if(t1 == T)
-						size = t->width - first->width;  // first->width is offset
-					else
-						size = t1->width - first->width;  // both are offsets
+					size = offend - first->width; // first->width is offset
 					fn->nbody = list(fn->nbody, eqmem(np, nq, newname(first->sym), size, neq));
 				}
 				first = T;

`test/fixedbugs/issue4585.go`

新しい構造体 V と W が追加され、それぞれ異なるパディングのシナリオをテストします。
- V: 最初のフィールドにはパディングがないが、途中のフィールドにパディングがあるケース。
- W: 構造体の末尾にパディングがあるケース。
test4() と test5() 関数が追加され、それぞれ V と W 構造体に対する等価性比較とハッシュ計算のテストを行います。これらのテストは、memequal と memhash がパディングを正しく無視していることを検証します。
```
// V has padding but not on the first field.
type V struct {
	A1, A2, A3 int32
	B          int16
	C          int32
}

// W has padding at the end.
type W struct {
	A1, A2, A3 int32
	B          int32
	C          int8
}
```

コアとなるコードの解説

`ispaddedfield` 関数の変更

変更前:

static int
ispaddedfield(Type *t)
{
	if(t->etype != TFIELD)
		fatal("ispaddedfield called non-field %T", t);
	return t->down != T && t->width + t->type->width != t->down->width;
}

この関数は、現在のフィールド t の後にパディングがあるかどうかをチェックします。t->width はフィールドのオフセット、t->type->width はフィールド自体のサイズ、t->down->width は次のフィールドのオフセットです。もし t->width + t->type->width（現在のフィールドの終端オフセット）が t->down->width（次のフィールドの開始オフセット）と異なる場合、その間にパディングが存在すると判断していました。しかし、このロジックでは、t->down == T（つまり、現在のフィールドが構造体の最後のフィールドである場合）のパディングを検出できませんでした。

変更後:

static int
ispaddedfield(Type *t, vlong total)
{
	if(t->etype != TFIELD)
		fatal("ispaddedfield called non-field %T", t);
	if(t->down == T)
		return t->width + t->type->width != total;
	return t->width + t->type->width != t->down->width;
}

新しい total 引数は、エンクロージング構造体全体のサイズを表します。

もし t->down == T であれば、現在のフィールドは構造体の最後のフィールドです。この場合、t->width + t->type->width（最後のフィールドの終端オフセット）が total（構造体全体のサイズ）と異なる場合、構造体の末尾にパディングが存在すると判断します。
それ以外の場合（途中のフィールドの場合）は、以前と同様に次のフィールドのオフセットと比較します。この変更により、構造体の末尾のパディングも正確に検出できるようになりました。

`genhash` および `geneq` 関数の変更

genhash (ハッシュ生成) と geneq (等価性比較生成) の両関数で、memhash や memequal に渡すサイズを計算するロジックが修正されました。

変更前 (genhash の例):

				if(first->down == t1)
					size = first->type->width;
				else if(t1 == T)
					size = t->width - first->width;  // first->width is offset
				else
					size = t1->width - first->width;  // both are offsets

このロジックは、連続するメモリ領域のサイズを計算しようとしていますが、パディングの存在によって誤ったサイズを計算する可能性がありました。特に、t1 == T のケース（構造体の最後まで到達した場合）では、t->width（構造体全体のサイズ）から first->width（最初のフィールドのオフセット）を引いていましたが、これは構造体全体のサイズにパディングが含まれるため、正確なデータ部分のサイズを反映していませんでした。

変更後 (genhash の例):

				offend = t1->width + t1->type->width; // 新しく追加された行
				// ...
				size = offend - first->width; // first->width is offset

新しい offend 変数は、現在の連続するメモリ領域の終端オフセットを正確に追跡します。t1->width + t1->type->width は、t1 が指すフィールドの終端オフセットです。この offend から first->width（連続するメモリ領域の開始オフセット）を引くことで、パディングを含まない純粋なデータ部分のサイズを正確に計算できるようになりました。これにより、memhash や memequal が常に正しいサイズのメモリ領域に対して操作を行うことが保証されます。

これらの変更は、Go言語の型システムにおける構造体のメモリレイアウトと、それに対するハッシュおよび等価性比較の正確性を確保するために不可欠です。

参考にした情報源リンク

Go言語のIssueトラッカー
Go言語のソースコード
Go言語のドキュメント (構造体、メモリレイアウト、アライメントに関する情報)
Go言語のコンパイラに関する技術記事や解説

comemo