This repository contains code samples that will show you how to analyze document and image files stored in S3 using the following techniques:
- Using Large Language Models (LLMs) hosted on Amazon Bedrock.
- Using Amazon Bedrock Data Automation.
Both of these approaches show you how to generate metadata based on the content of the files and store it as key-value pairs (tags) in an Amazon DynamoDB table, with a reference to the files in S3.
Amazon S3 is a popular object storage service on AWS. You can store any type of file as an object in an S3 bucket. Although you can organize files of a specific type or context under a specific directory structure (path) in S3, it is useful to add metadata to the files, such as a content description, owner, and context, so you can easily retrieve the file that you are looking for. There are two ways to do this:
Option 1: Use the user-defined metadata feature in S3
When uploading an object to an S3 bucket, you can optionally assign user-defined metadata as key-value pairs to the object. This metadata is stored along with the object, but it cannot be added later to an existing object: the only way to modify object metadata is to make a copy of the object and set the metadata on the copy.
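Here is a minimal sketch of both steps with boto3; the bucket name, key, and tag values are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload an object with user-defined metadata (stored as x-amz-meta-* headers).
with open("assets/Document_1.pdf", "rb") as f:
    s3.put_object(
        Bucket="my-input-bucket",  # placeholder bucket name
        Key="documents/Document_1.pdf",
        Body=f,
        Metadata={"description": "Sample invoice", "owner": "finance-team"},
    )

# Changing metadata on an existing object means copying it over itself with REPLACE.
s3.copy_object(
    Bucket="my-input-bucket",
    Key="documents/Document_1.pdf",
    CopySource={"Bucket": "my-input-bucket", "Key": "documents/Document_1.pdf"},
    Metadata={"description": "Sample invoice", "owner": "finance-team", "context": "2024-Q1"},
    MetadataDirective="REPLACE",
)
```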
Option 2: Store the metadata in an external system with a reference to the object in S3
If you want to set metadata on an existing object in S3 without copying that object, or if you want to add to a metadata system that already exists, then it makes sense to store the metadata in an external system, such as an Amazon DynamoDB table. This option also applies if the data is stored outside S3 and needs to be tagged with metadata.
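A minimal sketch of this approach with boto3, assuming a DynamoDB table named file-metadata whose partition key is the S3 URI of the object:

```python
import boto3

# Assumed table: partition key "s3_uri" (string); the table name is a placeholder.
table = boto3.resource("dynamodb").Table("file-metadata")

# Store the tags as item attributes, keyed by a reference to the S3 object.
table.put_item(
    Item={
        "s3_uri": "s3://my-input-bucket/documents/Document_1.pdf",
        "description": "Sample invoice",
        "owner": "finance-team",
        "context": "2024-Q1",
    }
)
```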
In both of these options, if you do not know the metadata that describes the data stored in the object, you have to read the object, analyze its content, and generate the appropriate metadata. This is where AI can help.
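As a rough sketch of that idea (not the exact prompt, model, or parsing used in the notebooks), you can send the file bytes to an LLM through the Amazon Bedrock Converse API and ask it to propose tags:

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

with open("assets/Document_1.pdf", "rb") as f:
    document_bytes = f.read()

# Ask the model to propose metadata key-value pairs describing the document.
response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # any Converse-capable model
    messages=[
        {
            "role": "user",
            "content": [
                {"document": {"format": "pdf", "name": "Document 1", "source": {"bytes": document_bytes}}},
                {"text": "Describe this document as JSON key-value pairs: description, owner, context."},
            ],
        }
    ],
)

print(response["output"]["message"]["content"][0]["text"])
```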
- Choose an AWS Account to use and make sure to create all resources in that Account.
- Identify an AWS Region that has Amazon Bedrock with Amazon Nova 1.0 or Anthropic Claude 3/3.5 or Meta Llama 3.2 models and Amazon Bedrock Data Automation (a quick way to check model availability with boto3 is sketched after this list).
- In that Region, create two new Amazon S3 buckets or use two existing buckets of your choice. These will serve as the input and output S3 buckets.
- In the S3 bucket that you designate as the input bucket, upload all the files from the assets folder.
- In that same Region, create a new Amazon SageMaker notebook instance with Amazon Linux 2, Jupyter Lab 4 (notebook-al2-v3) as the Platform Identifier.
- Clone this GitHub repo to that notebook instance.
- In that notebook instance, open the relevant Jupyter notebook by navigating to the Amazon SageMaker notebook instances console and clicking on the Open Jupyter link:
  - For using Large Language Models (LLMs) hosted on Amazon Bedrock, open file-tagger-with-bedrock-llms.ipynb.
  - For using Amazon Bedrock Data Automation, open file-tagger-with-bedrock-data-automation.ipynb.
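To confirm that your chosen Region offers the models listed above, one option (a sketch; it does not check Amazon Bedrock Data Automation availability) is to query the Bedrock control plane with boto3:

```python
import boto3

# List the foundation models available in the chosen Region (placeholder Region shown).
bedrock = boto3.client("bedrock", region_name="us-east-1")
model_ids = [m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]]

# Check for Amazon Nova, Anthropic Claude 3/3.5, and Meta Llama 3.2 model IDs.
for prefix in ("amazon.nova", "anthropic.claude-3", "meta.llama3-2"):
    print(prefix, "available:", any(prefix in model_id for model_id in model_ids))
```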
This repository contains:
- Two Jupyter Notebooks to get started.
- An assets folder with files that represent various types of documents and images that will be processed by the notebooks. Note: Of these, Document_2.pdf is not shared under the MIT-0 license.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.