From 54677ce2a5411e46aafeb8d81a7daac5753f5e13 Mon Sep 17 00:00:00 2001 From: Philip Durbin Date: Tue, 19 Nov 2024 08:49:54 -0500 Subject: [PATCH 1/5] file detection using first few bytes disabled on direct upload --- doc/sphinx-guides/source/api/native-api.rst | 1 + doc/sphinx-guides/source/developers/big-data-support.rst | 1 + 2 files changed, 2 insertions(+) diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index b464b6df393..4d6d6c28a49 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -3617,6 +3617,7 @@ The fully expanded example above (without environment variables) looks like this Currently the following methods are used to detect file types: - The file type detected by the browser (or sent via API). +- Custom code that reads the first few bytes. As explained at :ref:`s3-direct-upload-features-disabled`, this code is disabled during direct upload to S3. However, this code is active when the "redetect" API is used. - JHOVE: https://jhove.openpreservation.org - The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. - The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``. diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index 759dd40413b..dad0af5ae87 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -44,6 +44,7 @@ Features that are Disabled if S3 Direct Upload is Enabled The following features are disabled when S3 direct upload is enabled. - Unzipping of zip files. (See :ref:`compressed-files`.) +- Detection of file type based on custom code that reads the first few bytes. (See :ref:`redetect-file-type`.) - Extraction of metadata from FITS files. (See :ref:`fits`.) - Creation of NcML auxiliary files (See :ref:`netcdf-and-hdf5`.) - Extraction of a geospatial bounding box from NetCDF and HDF5 files (see :ref:`netcdf-and-hdf5`) unless :ref:`dataverse.netcdf.geo-extract-s3-direct-upload` is set to true. From 89b8a1ccb46d553d0b31af55e6fb3e03c58cbabb Mon Sep 17 00:00:00 2001 From: landreev Date: Tue, 19 Nov 2024 10:02:27 -0500 Subject: [PATCH 2/5] Update doc/sphinx-guides/source/api/native-api.rst Co-authored-by: Philip Durbin --- doc/sphinx-guides/source/api/native-api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 4d6d6c28a49..929dac592e0 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -3617,7 +3617,7 @@ The fully expanded example above (without environment variables) looks like this Currently the following methods are used to detect file types: - The file type detected by the browser (or sent via API). -- Custom code that reads the first few bytes. As explained at :ref:`s3-direct-upload-features-disabled`, this code is disabled during direct upload to S3. However, this code is active when the "redetect" API is used. +- Custom code that reads the first few bytes. As explained at :ref:`s3-direct-upload-features-disabled`, this method of file type detection is not utilized during direct upload to S3, since by nature of direct upload Dataverse never sees the contents of the file. However, this code is utilized when the "redetect" API is used. - JHOVE: https://jhove.openpreservation.org - The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. - The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``. From be0979d8047d5e1aaa2191830b332233a42d91fd Mon Sep 17 00:00:00 2001 From: landreev Date: Tue, 19 Nov 2024 10:02:33 -0500 Subject: [PATCH 3/5] Update doc/sphinx-guides/source/api/native-api.rst Co-authored-by: Philip Durbin --- doc/sphinx-guides/source/api/native-api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 929dac592e0..442e4862dab 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -3618,7 +3618,7 @@ Currently the following methods are used to detect file types: - The file type detected by the browser (or sent via API). - Custom code that reads the first few bytes. As explained at :ref:`s3-direct-upload-features-disabled`, this method of file type detection is not utilized during direct upload to S3, since by nature of direct upload Dataverse never sees the contents of the file. However, this code is utilized when the "redetect" API is used. -- JHOVE: https://jhove.openpreservation.org +- JHOVE: https://jhove.openpreservation.org . Note that the same applies about direct upload to S3 and the "redirect" API. - The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. - The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``. From 813063e5454623c0cc7b5a32d76b57346ca01d5d Mon Sep 17 00:00:00 2001 From: landreev Date: Tue, 19 Nov 2024 10:02:45 -0500 Subject: [PATCH 4/5] Update doc/sphinx-guides/source/developers/big-data-support.rst Co-authored-by: Philip Durbin --- doc/sphinx-guides/source/developers/big-data-support.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sphinx-guides/source/developers/big-data-support.rst b/doc/sphinx-guides/source/developers/big-data-support.rst index dad0af5ae87..f3d98fae0bf 100644 --- a/doc/sphinx-guides/source/developers/big-data-support.rst +++ b/doc/sphinx-guides/source/developers/big-data-support.rst @@ -44,7 +44,7 @@ Features that are Disabled if S3 Direct Upload is Enabled The following features are disabled when S3 direct upload is enabled. - Unzipping of zip files. (See :ref:`compressed-files`.) -- Detection of file type based on custom code that reads the first few bytes. (See :ref:`redetect-file-type`.) +- Detection of file type based on JHOVE and custom code that reads the first few bytes. (See :ref:`redetect-file-type`.) - Extraction of metadata from FITS files. (See :ref:`fits`.) - Creation of NcML auxiliary files (See :ref:`netcdf-and-hdf5`.) - Extraction of a geospatial bounding box from NetCDF and HDF5 files (see :ref:`netcdf-and-hdf5`) unless :ref:`dataverse.netcdf.geo-extract-s3-direct-upload` is set to true. From e5f841597dcd4be4150c969da40d70777a3ed773 Mon Sep 17 00:00:00 2001 From: Philip Durbin Date: Tue, 19 Nov 2024 10:04:48 -0500 Subject: [PATCH 5/5] Update doc/sphinx-guides/source/api/native-api.rst --- doc/sphinx-guides/source/api/native-api.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/sphinx-guides/source/api/native-api.rst b/doc/sphinx-guides/source/api/native-api.rst index 442e4862dab..691aa94c834 100644 --- a/doc/sphinx-guides/source/api/native-api.rst +++ b/doc/sphinx-guides/source/api/native-api.rst @@ -3618,7 +3618,7 @@ Currently the following methods are used to detect file types: - The file type detected by the browser (or sent via API). - Custom code that reads the first few bytes. As explained at :ref:`s3-direct-upload-features-disabled`, this method of file type detection is not utilized during direct upload to S3, since by nature of direct upload Dataverse never sees the contents of the file. However, this code is utilized when the "redetect" API is used. -- JHOVE: https://jhove.openpreservation.org . Note that the same applies about direct upload to S3 and the "redirect" API. +- JHOVE: https://jhove.openpreservation.org . Note that the same applies about direct upload to S3 and the "redetect" API. - The file extension (e.g. ".ipybn") is used, defined in a file called ``MimeTypeDetectionByFileExtension.properties``. - The file name (e.g. "Dockerfile") is used, defined in a file called ``MimeTypeDetectionByFileName.properties``.