GitHub - sryaco/aitools: AI tools used to test the William Elliot Griffis manuscript collection at Rutgers University Libraries


Aitools repository

AI tools tested on the William Elliot Griffis manuscript collection at Rutgers University Libraries

Sonia Yaco, Rutgers University, 2024

Locations:
Notebooks: Notebooks are in \notebooks
Photographs: A small number of photos that can be used for clustering and mapping are in \data and \data\photos

The full corpus of digitized Griffis Japan images used in testing (427 TIFF files, 10 GB) is available for download from Google Drive: https://drive.google.com/drive/folders/1U-NIDpXC5cUOzNW0fZ0H8mk9Q5PH3xgG?usp=drive_link

Program names with descriptions

Cosine_similarity.ipynb Compares two UTF-8 formatted texts and calculates their cosine similarity.
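For readers new to the measure, here is a minimal sketch of cosine similarity between two texts, using only the Python standard library. It assumes plain word-count vectors and a simple regex tokenizer; the notebook itself may tokenize or weight terms differently, and the names `tokenize`, `cosine_similarity`, and `read_utf8` are illustrative, not taken from the notebook.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity of the two texts' word-count vectors (0.0 to 1.0)."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def read_utf8(path):
    """Read a whole file as UTF-8 text."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

To compare two files, pass the results of `read_utf8` for each path to `cosine_similarity`. A score near 1 means the texts use words in similar proportions; 0 means they share no words.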

Image_cluster.ipynb Creates 4 groups of photos, grouped by content similarity. Prints 5 photos from each group.
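The grouping step can be sketched as follows. This README does not say which clustering algorithm the notebook uses, so the sketch assumes k-means and assumes each photo has already been reduced to a numeric feature vector. The function names and the deterministic "first k vectors" initialization are illustrative choices, not the notebook's.

```python
import math


def kmeans(vectors, k=4, iterations=20):
    """Group vectors into k clusters; returns one cluster index per vector.

    Assumption: k-means stands in for whatever the notebook actually uses.
    """
    # Deterministic start: use the first k vectors as initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sample_per_group(names, labels, per_group=5):
    """Return up to per_group file names for each cluster label."""
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(lab, []).append(name)
    return {lab: members[:per_group] for lab, members in groups.items()}
```

With `k=4` and `per_group=5` this mirrors the "4 groups, 5 photos each" output described above.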

Image_clustermatch.ipynb
• Image clustering: Creates 4 groups of photos, grouped by content similarity. Prints 7 photos from each group on screen and to a PNG file.
• Image matching: Matches one image against all others, selecting the 5 closest. No reprocessing of the corpus is needed, so it can be re-run quickly by changing the file name of the photo to match each time. The original picture is displayed, followed by the 5 matches.

Image_match.ipynb Provides the top 5 most similar images to one selected image, based on VGG16 pattern similarity.
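The matching step can be illustrated with a short sketch. It assumes a feature vector has already been extracted for every image (for example, by a pretrained network) and stored in a dictionary keyed by file name, and it assumes cosine similarity as the comparison measure. This README does not state which measure the notebook uses, and `top_matches` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_matches(query_name, features, n=5):
    """Rank every other image by similarity to the query; return the top n names.

    features: dict mapping file name -> precomputed feature vector (assumed).
    """
    query = features[query_name]
    scored = [
        (cosine(query, vec), name)
        for name, vec in features.items()
        if name != query_name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:n]]
```

Because the feature dictionary is computed once and reused, a new query only repeats the cheap ranking step, not the feature extraction.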

NER.ipynb Three routines:
• NER in alphabetical order by word: spaCy and NLTK create NER lists in an output file, sorted by word.
• NER in category order: spaCy and NLTK create NER lists in an output file, sorted by NER category.
• NER color-coded word visualizations, in two versions:
  o All NERs are shown color coded in context.
  o Just three filtered labels ('PERSON', 'ORG', 'GPE') are shown in original word order but without context.
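The ordering and filtering described in these routines can be sketched without running a model. The sketch assumes entities have already been extracted as `(text, label)` pairs (with spaCy, for instance, from `(ent.text, ent.label_)` for each `ent` in `doc.ents`). The function names and sample entities are hypothetical.

```python
# The three labels kept by the filtered visualization described above.
KEEP_LABELS = {"PERSON", "ORG", "GPE"}


def alpha_order(entities):
    """Sort (text, label) pairs alphabetically by entity text."""
    return sorted(entities, key=lambda e: e[0].lower())


def category_order(entities):
    """Sort (text, label) pairs by label, then alphabetically within a label."""
    return sorted(entities, key=lambda e: (e[1], e[0].lower()))


def filter_labels(entities, keep=KEEP_LABELS):
    """Keep only entities with the chosen labels, preserving original order."""
    return [e for e in entities if e[1] in keep]


# Hypothetical sample, in the order the entities appear in a text.
sample = [("Tokyo", "GPE"), ("Griffis", "PERSON"), ("1871", "DATE"), ("Rutgers", "ORG")]
```

Calling `filter_labels(sample)` drops the `DATE` entity and leaves the other three in their original word order.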

ngrams.ipynb Builds three lists of n-grams: from the first input file (diary), from the second input file (biography), and those common to both. Defaults to an n-gram length of 1.
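As an illustration of the three lists, here is a minimal standard-library sketch. It assumes a simple lowercase regex tokenizer and treats each list as a set of distinct n-grams; the notebook may tokenize, count, or order its output differently. The function names are illustrative.

```python
import re


def ngrams(text, n=1):
    """Return the set of distinct word n-grams in the text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def compare_ngrams(diary_text, biography_text, n=1):
    """Return three sorted lists: diary n-grams, biography n-grams, and shared n-grams."""
    diary = ngrams(diary_text, n)
    biography = ngrams(biography_text, n)
    return sorted(diary), sorted(biography), sorted(diary & biography)
```

With the default `n=1` the lists contain single words; raising `n` to 2 or 3 surfaces shared phrases rather than shared vocabulary.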

Sentiment_analysis.ipynb Produces numeric scores and a graph of sentiment by paragraph. Output goes to the screen, a text file, and a PNG file.
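The idea of per-paragraph scoring can be sketched as below. This README does not name the sentiment model the notebook uses, so the sketch substitutes a tiny, hypothetical word list purely for illustration; the word sets, function names, and scoring formula are all assumptions.

```python
import re

# Hypothetical mini-lexicon for illustration only; a real analysis would use
# a full sentiment lexicon or a trained model.
POSITIVE = {"good", "happy", "beautiful", "kind", "pleasant"}
NEGATIVE = {"bad", "sad", "ill", "cold", "tired"}


def split_paragraphs(text):
    """Split text into paragraphs on blank lines, dropping empty ones."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def paragraph_score(paragraph):
    """Score = (positive words - negative words) / total words, from -1 to 1."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    if not words:
        return 0.0
    hits = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return hits / len(words)


def score_text(text):
    """Return one sentiment score per paragraph, in document order."""
    return [paragraph_score(p) for p in split_paragraphs(text)]
```

The resulting list of scores, one per paragraph, is the kind of series that can be written to a text file or plotted as a line graph.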

