-
-
Notifications
You must be signed in to change notification settings - Fork 337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds initial support for external attachments using the Filesystem or MinIO #1320
base: main
Are you sure you want to change the base?
Conversation
Avoids returning `undefined`, which simplifies a lot of the parent logic.
Helps document and prevent programmer errors by adding a utility function to turn it into a docId.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a batch of superficial comments from a skim. Feels like it is coming along pretty well!
app/server/lib/ActiveDoc.ts
Outdated
@@ -2822,17 +2841,21 @@ export class ActiveDoc extends EventEmitter { | |||
return this._dataEngine; | |||
} | |||
|
|||
private async _getDocumentSettings(): Promise<DocumentSettings | undefined> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no settings, should that be an error? In what circumstances might that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see the method gets used in a migration. That answers that. Might still make sense to throw an error, but accept that error as a non-erroneous possibility in the migration. Feels like in all other circs it should be an error? No big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can switch this to throwing an error, then wrap the code below in _makeEngine
in a try-catch block to handle this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'd be fine
app/server/lib/AttachmentStore.ts
Outdated
* Gets the correct pool id for a given document, given the document's id and trunk id. | ||
* Avoids other areas of the codebase having to understand how documents are mapped to pools. | ||
* This is a key security point - documents which share a pool can access each others' attachments. | ||
* Therefore documents with different security domains (e.g from different teams) need to be sure not to share |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good now to talk about our strategy, for snapshots and forks to have the same pool id and share attachments?
app/server/lib/AttachmentStore.ts
Outdated
* This is a general-purpose interface that should abstract over many different storage providers, | ||
* so shouldn't have methods which rely on one the features of one specific provider. | ||
* | ||
* This is distinct from `ExternalStorage`, in that it's more generic and doesn't support versioning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm "more generic" feels wrong here, since an interface mentioning docPoolIds: DocPoolId
vs key: string
feels much more specific in some ways. Not many things in the world have DocPoolIds. Maybe drop that text?
app/server/lib/AttachmentStore.ts
Outdated
export class MinIOAttachmentStore implements IAttachmentStore { | ||
constructor( | ||
public id: string, | ||
private _storage: MinIOExternalStorage, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the reason this is using MinIOExternalStorage specifically and won't work with S3ExternalStorage or AzureExternalStorage is new Stream methods, correct? It would still help I think to anticipate that developers will add those methods for those types of storage. It is totally fine just to implement for MinIO here, since this is core, but ideally the follow up work needed would be adding the methods and making small updates, and not doing a significant refactor. Trying to bear in mind for reminder of review to watch for specific dependencies on MinIO creeping in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read rest of PR. Feels fine, no MinIO creep 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% - I can't remember if I left the "Make an interface for streamable external storage" in a TODO or not somewhere, but it's planned, so we can turn "MinIOAttachmentStore" into "ExternalStorageAttachmentStore"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now - any external storage should be usable, if wrapped in ExternalStorageAttachmentStore
, and made to implement the streaming interface
} | ||
|
||
public async getFileData(fileIdent: string): Promise<Buffer | null> { | ||
this._log.debug({ fileIdent }, "retrieving file data"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned this logging might be a bit much. When a document is opened, it loads attachment previews. That results in 1 or 2 log lines for every attachment in a document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might indeed be a bit much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed 1 of the log lines - the other one is quite useful for debugging as it mentions the attachment pool, and which storage (document / external) the file is going into.
# Conflicts: # app/server/generateInitialDocSql.ts # app/server/lib/ActiveDoc.ts # app/server/lib/DocApi.ts # app/server/lib/FlexServer.ts # app/server/lib/coreCreator.ts # test/server/lib/HostedStorageManager.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is copying of documents with external attachments addressed? Might have missed it. If not, would want to be sure we don't miss that in follow-up.
app/server/lib/ActiveDoc.ts
Outdated
@@ -636,6 +648,7 @@ export class ActiveDoc extends EventEmitter { | |||
useExisting?: boolean, // If set, document can be created as an overlay on | |||
// an existing sqlite file. | |||
}): Promise<ActiveDoc> { | |||
console.log(`OPENOPENOPENOPENOPENOPENOPENOPEN ${this.docName} ANYTHING ELSE`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stray debugging message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, this and others. Thanks for catching this 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan on removing it last minute? Just checking why it is still present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, likely an oversight on my part when doing the other bits of your feedback. Glad this wasn't ticked off at least!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a fairly complex setup, but given the problem, I think you have convinced me that all of this complexity is required. I could not find many opportunities for simplification.
app/server/lib/AttachmentStore.ts
Outdated
// Compatible with Document entity for ease of use | ||
export interface AttachmentStoreDocInfo { | ||
id: string; | ||
// This should NOT be an optional: the programmer should make an explicit choice for this value to prevent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why make it optional if it shouldn't be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's inherently a nullable value - a document might not have trunkId. But it's not "optional" in the sense of typescript's "optional".
If I make it an optional (i.e typescripts ?
syntax), code like this won't error:
getDocPoolIdFromDocInfo({ id: _someDoc.id })
The above is incorrect - and would result in attachments ending up in the wrong place (the fork's attachment file pool, instead of the original document's file pool).
This is correct:
getDocPoolIdFromDocInfo({ id: _someDoc.id, trunkId: _someDoc.trunkId })
So by omitting ?
and using string | null | undefined
instead, it's hopefully catching a class of errors early.
That's what this comment is trying (and failing?) to explain. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, boy, "optional" vs "an optional", subtle reading thing there.
Maybe rephrase, "We explicitly make this a union type instead of making the attribute optional because the programmer must make a conscious choice to mark it as null or undefined, not merely omit it."
/** | ||
* Gets the correct pool id for a given document, given the document's id and trunk id. | ||
* | ||
* Attachments are stored in a "Document Pool", which is used to manage the attachments' lifecycle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
120 is what we've got configured in .eslint.js. Happy to go lower personally, but is it worth discussing at a Grist level? We could lower the max length for comments only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't be much of a problem beyond diffs. If you can easily warp comments at a lower character count, nice. If you can't, don't worry about it and let's move on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated IDE with settings for this for the future, probably a bit too big to go through and update all comments right now :)
app/server/lib/AttachmentStore.ts
Outdated
private _storage: StreamingExternalStorage, | ||
private _prefixParts: string[] | ||
) { | ||
if (_storage.removeAllWithPrefix == undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you really want to allow the possibility of null == undefined
in this comparison?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! Honestly, a simple !_storage.removeAllWithPrefix
would do. removeAllWithPrefix
needs to be a valid function for this to work.
Would !_storage.removeAllWithPrefix
be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. I kind of consider the ==
operator to be "legacy". At least checking for falsey is clearer in intent.
This PR is still being worked on, but open to initial review.
Context
When attachments are uploaded to Grist, they're stored in an SQLite table within the document.
The result is that large files (multi-GB) substantially increase the size of the SQLite file, and cause performance problems (for example, with snapshots).
Proposed solution
The current design / full scope of changes can be seen in this Google document..
This PR contains a minimal proof of concept for external attachments, with the following features:
No UI exists for configuring anything related to attachment storage at this time. To use a store, the API or an SQLite client needs using to set the
idOfDefaultAttachmentStore
document setting to the ID of the desired attachment store. Attachment store IDs are logged to the console.Currently only a single store of each backend type (minio, filesystem) can exist, as they're automatically configured using the existing setting.
Related issues
#1021
Review guidance / code overview
Three significant components are added in this PR, with the rest of the PR being generally glue around them:
AttachmentFileManager
(in "AttachmentFileManager.ts") handles attachment fetching / saving logic on behalf ofActiveDoc
IAttachmentStore
is an instantiated store, that handles saving / loading attachments from a specific provider.AttachmentStoreProvider
is used to resolve store IDs to stores, and is the main way of providing other components (e.g AttachmentFileManager) with access to store instances.The available types of attachment store are defined in
coreCreator.ts
, under the name "attachmentStoreBackends".Additionally:
Migrations / schema changes
_gristsys_files
has been added, which adds a column to track which store a document is in.idOfDefaultAttachmentStore
has been added to theDocumentSettings
JSON blob.Both of these should be non-breaking on older installations, as both should be ignored.
Has this been tested?
Behaviour has been manually tested in the following ways:
There are currently no unit tests. These should be added before merging, but the goal.
Recommended local testing setup
MinIO
"idOfDefaultAttachmentStore": "store id"
to_grist_DocInfo
->documentSettings
Known issues