Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/gopls: out-of-bounds slice panic in bug in frob.(*reader).bytes #71244

Open
adonovan opened this issue Jan 13, 2025 · 6 comments
Open
Assignees
Labels
BugReport Issues describing a possible bug in the Go implementation. gopls/telemetry-wins gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Milestone

Comments

@adonovan
Copy link
Member

#!stacks
"runtime.goPanicSliceAcap" && "frob.(*reader).bytes"

Issue created by stacks.

func (r *reader) bytes(n int) []byte {
	v := r.data[:n]
	r.data = r.data[n:] // <--- panic
	return v
}

This stack MZjt-Q was reported by telemetry:

golang.org/x/tools/gopls@v0.17.1 go1.23.3 linux/amd64 vscode (1)
@adonovan adonovan added gopls Issues related to the Go language server, gopls. gopls/telemetry-wins NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository. labels Jan 13, 2025
@gopherbot gopherbot added this to the Unreleased milestone Jan 13, 2025
@adonovan
Copy link
Member Author

If the line number is accurate, the r.data[n:] operation is implicated, but it cannot fail if the r.data[:n] operation on the previous line succeeded. So this makes me doubt the line number.

Looking at the disassembly, the check whose failure calls panicSliceACap is the very first thing the function does after its prolog:

golang.org/x/tools/gopls/internal/util/frob.(*reader).bytes STEXT size=176 args=0x10 locals=0x18 funcid=0x0 align=0x0
	0x0000 00000 	L0390	TEXT	golang.org/x/tools/gopls/internal/util/frob.(*reader).bytes(SB), ABIInternal, $32-16
	0x0000 00000 	L0390	MOVD	16(g), R16
	0x0004 00004 	L0390	PCDATA	$0, $-2
	0x0004 00004 	L0390	CMP	R16, RSP
	0x0008 00008 	L0390	BLS	148
	0x000c 00012 	L0390	PCDATA	$0, $-1
	0x000c 00012 	L0390	MOVD.W	R30, -32(RSP)
	0x0010 00016 	L0390	MOVD	R29, -8(RSP)
	0x0014 00020 	L0390	SUB	$8, RSP, R29
	0x0018 00024 	L0390	FUNCDATA	$0, gclocals·2NSbawKySWs0upw55xaGlw==(SB)
	0x0018 00024 	L0390	FUNCDATA	$1, gclocals·ISb46fRPFoZ9pIfykFK/kQ==(SB)
	0x0018 00024 	L0390	FUNCDATA	$5, golang.org/x/tools/gopls/internal/util/frob.(*reader).bytes.arginfo1(SB)
	0x0018 00024 	L0390	FUNCDATA	$6, golang.org/x/tools/gopls/internal/util/frob.(*reader).bytes.argliveinfo(SB)
	0x0018 00024 	L0390	PCDATA	$3, $1
	0x0018 00024 	L0391	MOVD	16(R0), R2   ; r2 = r.data.cap
	0x001c 00028 	L0391	CMP	R2, R1     ; n < r.data.cap
	0x0020 00032 	L0391	BHI	140   ;  if n >= cap(r.data) ...
...
	0x008c 00140 	L0391	CALL	runtime.panicSliceAcap(SB)

This implicates the first slice operation---and means the line number is indeed off by one.

In any case, the frob decoder is clearly being fed an illegal short message that doesn't even contain the 4-byte magic number.

@adonovan
Copy link
Member Author

adonovan commented Jan 13, 2025

This looks like a consequence of execution continuing after the failed bug.Reportf in typerefData:

// typerefData retrieves encoded typeref data from the filecache, or computes it on
// a cache miss.
func (s *Snapshot) typerefData(ctx context.Context, id PackageID, imports map[ImportPath]*metadata.Package, cgfs []file.Handle) ([]byte, error) {
	key := typerefsKey(id, imports, cgfs)
	if data, err := filecache.Get(typerefsKind, key); err == nil {
		return data, nil
	} else if err != filecache.ErrNotFound {
		bug.Reportf("internal error reading typerefs data: %v", err) /// <-- should return an error
	}
...

The bug.Reportf should be turned into return bug.Errorf.

[Edit: nope, the code here looks fine. An unexpected error is just a cache miss. The problem must come either from Get returning a tiny or empty file and no error.]

@adonovan adonovan self-assigned this Jan 13, 2025
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/642435 mentions this issue: gopls/internal/cache: Snapshot.typerefData: add missing error return

@gabyhelp gabyhelp added the BugReport Issues describing a possible bug in the Go implementation. label Jan 13, 2025
@adonovan
Copy link
Member Author

This is a really weird one. What we know:

  • Snapshot.typerefData returns an empty or short (<4) data and no error.
  • This must have come from filecache.Get, or from typerefs.Encode.
  • typerefs.Encode cannot return a string that doesn't have the magic number as a prefix, so it cannot be that.
  • filecache.Get can succeed either from the LRU cache, or from the complete sequence of I/O steps. The LRU cache doesn't seem to have any opportunity to truncate good data; therefore the I/O sequence must have returned a short/empty file.
  • Get's I/O sequence fails if the digest of the file content doesn't match the file name; so the short/empty file must have been written as such. Faulty storage hardware or external user meddling (e.g. deletion, renaming, edits, or truncation) could not cause this situation.

The only explanations I can think of are:

  • a hardware fault corrupted the slice after loading it. Seems very unlikely.
  • someone went to some trouble to intentionally creating a corresponding index/CAS file pair. Who would do that?

What am I missing?

gopherbot pushed a commit to golang/tools that referenced this issue Jan 13, 2025
Also, add justification for fall through.

Updates golang/go#71244

Change-Id: I781d015a9d4659815588e95dea92eb350388b925
Reviewed-on: https://go-review.googlesource.com/c/tools/+/642435
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@findleyr
Copy link
Member

findleyr commented Jan 13, 2025

Skimming filecache.gc, it doesn't look like we care to delete the casfile before the content file. Therefore, doesn't it seem possible that we could read the content file concurrent to a deletion?

EDIT: err, I suppose the order of deletion doesn't really matter: the fact is we could try to read a content file concurrent with another process GCing that file.

EDIT2: nevermind, we'd fail the hash comparison, of course. OK, I agree that I don't understand this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BugReport Issues describing a possible bug in the Go implementation. gopls/telemetry-wins gopls Issues related to the Go language server, gopls. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

4 participants