Add glossary to user manual docs #1091

Merged
merged 3 commits into from
Jan 6, 2025


fiver-watson
Contributor

@fiver-watson fiver-watson commented Dec 19, 2024

Modified the glossary developed for SFA for general use in the documentation. I mentioned this to @sallain before pushing, but a couple of notes on some of the decisions made here:

  • Markdown doesn't have native support for glossaries, and definition lists are only part of extended Markdown. I wasn't sure if MkDocs supports definition lists, and you can't hyperlink to a definition anyway. In the end, I decided to use level 3 headers for the terms, with the definitions added as text below.
  • I have added this to the User Manual, and not as a glossary for all manuals, because this is a glossary of highly-specific domain language, intended primarily for operators with familiarity in digital preservation concepts and terminology. Some definitions may not match exactly how terms are used in a developer's context, and I wanted to keep the focus on the domain. We can always change this decision later, or add different glossaries to the other manuals as needed.
  • I have also chosen NOT to go through all existing user manual pages to find glossary terms used in text and link them back to the glossary. Sara and I propose we not do this throughout unless there are specific instances where it seems useful or appropriate; doing so every time a domain term from the glossary appears would be a writing and maintenance nightmare. Also, since Markdown anchors are page-specific, you can't call them from anywhere: each link would first need to point to the relevant md file, and then to the anchor within that file. That means each link would be annoying to look up, and the links would be long and intrusive when trying to write or edit documentation. We can always make more specific mention of the glossary on the User Manual README / landing page, but in general, the idea is that if readers encounter a term they don't understand or would like clarity on how it is used in the Enduro docs, they can go find it in the glossary.
  • Similarly, in the glossary itself, I have bolded the first time a different glossary term is used in a given definition, rather than hyperlinking it. I mentioned this in the glossary's intro text - readers will get an indication when part of the definition uses another term they can look up or cross-reference. This was for consistency with the broader decision not to hyperlink every instance of a term throughout our documentation, and for ease of writing up this draft for review. Again, happy to change this based on feedback or in the future.
  • Finally, another thing basic Markdown doesn't support is footnotes. We had one footnote referencing the PREMIS Data Dictionary in our original version of the glossary, which I recreated here with headings. I originally used a lower level 4 heading for the Footnotes section at the bottom of the page, but the Readthedocs theme then nested the "Footnotes" header under the last glossary term in the sidebar navigation menu, so I moved it all up a level. We could possibly just remove this one footnote, but I thought I would leave it for review and discussion in this PR.
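
To make the anchor point above concrete, here is a minimal sketch, assuming heading anchors are generated by lowercasing the heading text, dropping punctuation, and replacing spaces with hyphens. This is a common convention, but the exact rules depend on the Markdown renderer and theme in use. The helper names `heading_slug` and `glossary_link` and the file name `glossary.md` are hypothetical, chosen only for illustration.

```python
import re
import unicodedata


def heading_slug(title: str) -> str:
    """Approximate the anchor a renderer might generate for a heading.

    Assumption: lowercase, strip punctuation, hyphenate whitespace.
    Real renderers may differ in the details.
    """
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s-]", "", ascii_only).strip().lower()
    return re.sub(r"[-\s]+", "-", cleaned)


def glossary_link(term: str, page: str = "glossary.md") -> str:
    """Build a cross-page link: file first, then the anchor within it.

    The default page name is a placeholder, not the real file path.
    """
    return f"[{term}]({page}#{heading_slug(term)})"


print(glossary_link("workflow engine"))
# [workflow engine](glossary.md#workflow-engine)
```

Under these assumptions, a link from any other page has to carry both the file and the anchor, whereas the short same-page form `[workflow engine](#workflow-engine)` only resolves inside the glossary itself.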

Looking forward to feedback!

@fiver-watson fiver-watson added the docs documentation update label Dec 19, 2024
@fiver-watson
Contributor Author

goddamnit, that was premature apparently. fixing now...

@fiver-watson
Contributor Author

Okay, ready to go!

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 54.69%. Comparing base (ee2a5c3) to head (3e8e40c).
Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1091   +/-   ##
=======================================
  Coverage   54.69%   54.69%           
=======================================
  Files         105      105           
  Lines        7696     7696           
=======================================
  Hits         4209     4209           
  Misses       3228     3228           
  Partials      259      259           


Collaborator

@sallain sallain left a comment

Great work!

format specifications, smaller file sizes, widespread adoption and support
(particularly in web browsers), high compression rates, etc. Access derivatives
are typically used in the creation of **Dissemination Information Packages
(DIPs)**.
Collaborator

Bullet formatting went a bit funny here

Contributor Author

Good catch, fixed


A preservation-relevant action that involves at least one **object** and/or
**agent**. Events are typically captured in the preservation **metadata** of a
**package** during a preservation work**flow using the
Collaborator

work**flow should be **workflow**

Contributor Author

thanks!

@fiver-watson fiver-watson requested a review from sallain January 6, 2025 17:24
@fiver-watson fiver-watson merged commit f5d5477 into main Jan 6, 2025
14 checks passed
@fiver-watson fiver-watson deleted the dev/add-glossary branch January 6, 2025 19:25
Collaborator

@djjuhasz djjuhasz left a comment

@fiver-watson I have requested many changes, but do not despair for they are all small changes. :)

@@ -0,0 +1,368 @@
# Glossary

**NOTE**: This glossary of terms outlines the domain-specific language used when
Collaborator

I don't think NOTE: is necessary as the text nicely explains the purpose of the Glossary.

Contributor Author

👍


## Agent

An actor (human, machine, or software) associated with one or more **events**
Collaborator

I thought the agent actively initiates an event, rather than just being associated with the event?

Contributor Author

This one I am less in agreement with. I was trying to avoid getting too detailed, but in the PREMIS Data Dictionary, for example, an Agent:

  • May hold or grant one or more Rights.
  • May carry out, authorize, or compel one or more Events.
  • May create or act upon one or more Objects through an Event or with respect to a Rights statement.

So, much like in archival descriptions, an actor is not always the creator.

## Archival Information Package (AIP)

A type of package derived from a **Processing Information Package (PIP)** that
is transformed during a preservation **workflow** into one or more AIPs,
Collaborator

This definition is circular. I think "...into one or more AIPs," should be cut from this paragraph. The detail that more than one AIP can be produced from a single PIP can be added to one of the subsequent paragraphs.

is transformed during a preservation **workflow** into one or more AIPs,
depending on the **preservation policies** defined in the **preservation engine**.

An AIP is the output of the **preservation engine**, consisting of one or more
Collaborator

Hmm, maybe this general definition of an AIP should be the first paragraph?

Contributor Author

ok, i have tried rewording the whole AIP definition to avoid tautology a bit more.


## Child workflow

Child workflows are a feature of the [workflow engine](#workflow-engine),
Collaborator

I think it would be better to start with a general definition of a child workflow, without limiting the definition to a feature of the workflow engine. Something like "A child workflow is an ancillary workflow that is spawned by a parent workflow." There are other ways to create and run a child workflow besides Temporal's child workflow mechanism.

Contributor Author

ok, i have taken your advice and tried rewording the definition a bit. That said, while i agree that it's not just a Temporal feature, this was part of the whole reason I reworked the Components docs to be more general. Workflows in general are going to be managed by some kind of "workflow engine" in Enduro, regardless of whether it's Cadence, Temporal, or something else. If we ever found a need and a way to run workflows and child workflows in the Enduro stack outside of the workflow engine, then I had thought that would be a big enough change to warrant updating the definition.

In any case, I removed the reference to the workflow engine from this definition.


## Intellectual entity

The [PREMIS](https://www.loc.gov/standards/premis/) 3 Data Dictionary defines an
Collaborator

The "3" looks so lonely outside the hyperlink. 🥺 Maybe "The PREMIS v3 Data Dictionary defines..."?

Contributor Author

👍


An **intellectual entity** describing a type of collection, composed of a set of
**files** and related **metadata**, assembled together for a particular purpose.
A package will typically contain one or more **objects**, zero or more
Collaborator

By the definition of "Object" above a package must always contain one or more objects.

Contributor Author

sure... but adding "must" and "may" gets a bit complex. I just removed the word "typically" - does that work?


## Preservation processing

A phase in a preservation
Collaborator

Short line.

Collaborator

Also should "preservation workflow" be bold?

Contributor Author

lol i missed a lot of terms that should have been bolded in that one. Fixed.

workflow describing all the tasks that occur after Ingest, when a PIP is sent to
the preservation engine for transformation into one or more AIPs. Prior to final
bagging, compression, and storage, any AIPs created during ingest may also
undergo any additional Post-ingest activities defined in the system. Examples of
Collaborator

"post-ingest" shouldn't be capitalized.

Contributor Author

👍


## Watched directory

A filesystem directory used that is configured with a small command-line utility
Collaborator

I think this can be shortened to "A filesystem directory that is monitored for changes (e.g. adding, deleting, or renaming a file), and where such a change may trigger one or more subsequent actions (e.g. preservation processing)."

Contributor Author

updated!

@djjuhasz
Collaborator

djjuhasz commented Jan 7, 2025

Oh, this was merged already. Never mind. :(

@fiver-watson
Contributor Author

Sorry @djjuhasz - wasn't sure if you wanted to get into the nitty gritty of domain stuff, so after Sara reviewed and approved, I merged it... but it's good feedback - I can use it in a new PR. Thanks for taking the time!

@fiver-watson
Contributor Author

Ok @djjuhasz I have replied to your comments above, but implemented the changes in a new PR - see #1097
