Specification: Contradicting statements in 1.2 text #381

Open
elichad opened this issue Dec 12, 2024 · 9 comments
Labels
question Up for discussion

@elichad

elichad commented Dec 12, 2024

Contradicting statements that I've found while proofreading the 1.2 spec. There may be more discovered as I continue...

  1. In structure.md we have both:
    L72

    The payload directory (and its child directory) contains files and directories that SHOULD be described within the RO-Crate Metadata File as Data Entities.

    L197

    Payload files may appear directly in the RO-Crate Root alongside the RO-Crate Metadata File, and/or appear in sub-directories of the RO-Crate Root. Each file and directory MAY be represented as Data Entities in the RO-Crate Metadata File.

    Should these both be MAY, or both be SHOULD?

  2. In metadata.md, L72 and L100

    Property references to other entities (e.g. author property to a Person entity) SHOULD use the { "@id": "..."} object form (see JSON-LD appendix).

    JSON-LD examples given on the [Schema.org] website might not be in flattened form; any nested entities in RO-Crate JSON-LD SHOULD be described as separate contextual entities in the flat @graph list.

    As the JSON-LD MUST be flattened, should these statements be upgraded to MUST as a consequence?

  3. root-data-entity.md, L151-153

    The root data entity's @id SHOULD be either ./ (indicating the directory containing ro-crate-metadata.json is the RO-Crate Root), or an absolute URI (indicating a detached RO-Crate).

    If the @id of the Root Data Entity is an absolute URI, an Attached RO-Crate MAY contain both data entities using relative URI references (relative to the RO-Crate Root), and Web-based Data Entities using absolute URIs but it is RECOMMENDED that data entities are referenced using absolute URIs.

    Contradiction (or just a mistype) in the second sentence? If it's an absolute URI, that indicates the crate is detached, but then it talks about what an attached crate may contain
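
For item 2, the difference between nested and flattened form can be sketched as below. This is only an illustration, not the spec's algorithm: the `flatten` helper and the `#alice` identifier are hypothetical, and nested objects without an `@id` (blank nodes) are left untouched.

```python
def flatten(entity, graph):
    """Append `entity` to `graph` in flat form and return an {"@id": ...} reference.

    Hypothetical helper: any nested object that carries an "@id" plus other
    properties is moved into `graph` as a separate entity.
    """
    flat = {}
    for key, value in entity.items():
        values = value if isinstance(value, list) else [value]
        out = []
        for v in values:
            # A nested entity is a dict with "@id" and at least one more key.
            if isinstance(v, dict) and "@id" in v and len(v) > 1:
                out.append(flatten(v, graph))
            else:
                out.append(v)
        flat[key] = out if isinstance(value, list) else out[0]
    graph.append(flat)
    return {"@id": entity["@id"]}


nested_root = {
    "@id": "./",
    "@type": "Dataset",
    "author": {"@id": "#alice", "@type": "Person", "name": "Alice"},
}
graph = []
flatten(nested_root, graph)
# graph now holds two entities; the root refers to the author by {"@id": "#alice"}.
```

If the JSON-LD MUST be flat, then the output of a step like this is the only valid shape, which is why the two SHOULD statements look inconsistent.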

@elichad elichad added this to the RO-Crate 1.2 milestone Dec 12, 2024
@ptsefton

ptsefton commented Dec 15, 2024

Issue 1 -- Should both be MAY

Issue 2 -- MUST be flattened

Issue 3 -- This may be a bigger issue. Having a URI for a crate ID does NOT mean it is detached. This statement is incorrect and this line should be removed: "A Detached RO-Crate can be identified by the root data entity having an @id different from ./ in the JSON." (structure.md line 85)

@simleo

simleo commented Dec 16, 2024

This may be a bigger issue. Having a URI for a crate ID does NOT mean it is detached. This statement is incorrect and this line should be removed: "A Detached RO-Crate can be identified by the root data entity having an @id different from ./ in the JSON." (structure.md line 85)

Then how can a detached RO-Crate be identified? All other data entities can be absolute URIs even in attached RO-Crates.

BTW, there is another place in the spec that says that if the root data entity's @id is an absolute URI then the crate is detached. In https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/data-entities.html:

If B’s Root Data Entity has an @id that is an absolute URI indicating a detached RO-Crate ...

@ptsefton

ptsefton commented Dec 16, 2024 via email

@stain

stain commented Dec 16, 2024

I think if the root @id is absolute, then a relative @id on anything else is very confusing, as that means you are in some kind of folder but that folder is not an RO-Crate. So the split is: "Attached", where the root is this folder and there may be data entities with relative paths (but also absolute ones), or "Detached", where the root is something else (probably from an API) and all data entities are absolute.
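
A minimal sketch of this split, under the reading in this comment only (later comments in the thread disagree with it). The helper names are hypothetical. It assumes that the metadata descriptor can be found through a `conformsTo` value starting with `https://w3id.org/ro/crate/`, and that data entities can be recognised by a `File` or `Dataset` type.

```python
from urllib.parse import urlparse

SPEC_PREFIX = "https://w3id.org/ro/crate/"


def as_list(value):
    return value if isinstance(value, list) else [value]


def is_absolute(uri):
    # Treat anything with a scheme (https:, arcp:, ...) as an absolute URI.
    return bool(urlparse(uri).scheme)


def root_id(graph):
    """Return the root data entity @id via the descriptor's `about` property."""
    for entity in graph:
        conforms = as_list(entity.get("conformsTo", []))
        if "about" in entity and any(
            c.get("@id", "").startswith(SPEC_PREFIX) for c in conforms
        ):
            return entity["about"]["@id"]
    raise ValueError("no RO-Crate Metadata Descriptor found")


def classify(graph):
    """Return ("attached" | "detached", relative data entity ids that look suspect)."""
    root = root_id(graph)
    if not is_absolute(root):
        return "attached", []
    suspect = [
        e["@id"]
        for e in graph
        if {"File", "Dataset"} & set(as_list(e.get("@type", [])))
        and e["@id"] != root
        and not is_absolute(e["@id"])
    ]
    return "detached", suspect
```

With an absolute root @id and a `File` entity whose @id is `data.csv`, `classify` reports the crate as detached and lists `data.csv` as the confusing case described above.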

We have already specified identifier and cite-as for an RO-Crate to declare its own global identifier https://www.researchobject.org/ro-crate/specification/1.2-DRAFT/root-data-entity.html#root-data-entity-identifier

@stain

stain commented Dec 16, 2024

The traditional Linked Data way would be to not distinguish absolute and relative URIs; rather, relative URIs all become absolute based on how you found them. But I don't think we want to go down the route of "Resolve ./ from the current URI you got the ro-crate-metadata.json from to see if it matches @id of the root"?

Because you can easily end up going from http://example.com/api/crate?id=132 to http://example.com/api/ro-crate.metadata.json, we were not tempted to follow that system before.
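
The resolution problem can be seen with standard RFC 3986 reference resolution, here using Python's `urllib.parse.urljoin`. The `resolved_root` helper name is hypothetical, and the second URL is an illustrative assumption, not one from the spec.

```python
from urllib.parse import urljoin


def resolved_root(metadata_url):
    """Resolve the relative root @id "./" against the URL the metadata came from."""
    return urljoin(metadata_url, "./")


# An API-style URL: the query string is dropped, so the crate id is lost.
print(resolved_root("http://example.com/api/crate?id=132"))
# prints "http://example.com/api/"

# A folder-style URL resolves to the containing folder, as one would hope.
print(resolved_root("http://example.com/crates/132/ro-crate-metadata.json"))
# prints "http://example.com/crates/132/"
```

In the first case every crate served by that API would resolve to the same root, which is the kind of outcome the comment above wants to avoid.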

@stain stain added the beginner Suitable for new RO-Crate community members label Dec 16, 2024
@elichad

elichad commented Dec 16, 2024

Our current description of Profile Crates is linked to this discussion about attached/detached crates and may need updating as well depending on the outcome.

The Profile Crate @id declared within its own RO-Crate Metadata Document SHOULD be an absolute URI, and the corresponding reference from its RO-Crate Metadata Descriptor updated accordingly.

The example on that page has a root data entity with an absolute URI as @id, and a data entity with a relative URI, but as @stain points out that doesn't make a lot of sense.

@ptsefton

First of all thanks @elichad for your work on this and @simleo for making sure we consider implementation.

@Stian -- re relative paths being confusing -- they are anyway; you have explained how this does not "just work". I think that in our use case (see below) at least it makes sense for attached crates to have URIs (URLs, arcpids) as @ids, as it makes it very clear that this thing has an 'official' @id -- which may match the same crate being served in detached mode.

When we moved to having the RO-Crate Metadata Descriptor from the original '@id' I proposed that this would allow for the Root Data Entity to have URIs for IDs, including URLs and non-resolvable IDs like arcp:, which we have adopted. In LDaCA we have crates in three 'modes':

  1. Stored in the repository on disk -- the @id of each crate is typically an arcpid (with a DOI embedded in it). The data entities within the crate are relative URIs / file paths. There are several reasons I think this is better than using an @id of './' -- if we did that we would then have to use some other way of noting the 'real' ID. This is complicated and may involve property values etc., and crucially different types of ID are represented differently -- this means lots more to write up in the spec, with an algorithm to find URLs, DOIs etc. and then work out which one is the main one. Anyway, these are attached crates.

  2. As a downloaded zip 'attached crate' -- (the same as 1)

  3. Over an API (detached) -- delivered as a single crate -- the API has a parameter to switch between delivering data entities with API URLs for @ids or the relative @ids that are needed to reconstitute a downloaded crate. This is not ideal, and is something we really should address for ro-crate v2 -- how to express both the path to use in Attached mode.

Regarding @simleo's question about how to tell Attached from Detached in an implementation -- I don't think it's a property of the Crate metadata at all, it is the context of use that makes a crate attached or detached.

How about we look at it like this:

  • If you have instantiated a crate (e.g. using ROCrate-py) by passing a directory, then you're in Attached mode. Included data entities may have URIs for IDs (in which case it is an implementation choice whether to attempt to check them), or local/relative ones. Local data entities with relative URIs (which the spec says are relative to the RO-Crate Metadata File) can be validated to see if they are present.

  • If you instantiate a crate by passing in an RO-Crate metadata document obtained from somewhere other than a folder, it's in Detached mode. Depending on the implementation of the API, or wherever you got the crate, relative URIs may or may not work -- something to think about improving for RO-Crate v2.
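
The Attached-mode check described in the first bullet could look roughly like the sketch below. It uses only the Python standard library, not the ro-crate-py API. The `missing_local_entities` helper is hypothetical, and it assumes that data entities are those typed `File` or `Dataset` and that relative @ids are URL-encoded paths relative to the folder holding `ro-crate-metadata.json`.

```python
import json
from pathlib import Path
from urllib.parse import unquote, urlparse


def missing_local_entities(crate_dir):
    """List relative data entity @ids that have no matching file or folder on disk."""
    root = Path(crate_dir)
    metadata = json.loads((root / "ro-crate-metadata.json").read_text(encoding="utf-8"))
    missing = []
    for entity in metadata["@graph"]:
        types = entity.get("@type", [])
        types = [types] if isinstance(types, str) else types
        entity_id = entity.get("@id", "")
        if not {"File", "Dataset"} & set(types):
            continue  # contextual entity, nothing to check on disk
        if urlparse(entity_id).scheme or entity_id.startswith("#"):
            continue  # absolute URI or local fragment: not a path in the crate
        if not (root / unquote(entity_id)).exists():
            missing.append(entity_id)
    return missing
```

In Detached mode the same metadata would simply skip this check, which matches the suggestion that presence checking is the main behavioural difference between the two modes.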

I think generally speaking the Python library started with a strong assumption that crates are Attached, while the JavaScript library has always treated crates as Detached in some sense (before we introduced that term), in that you always pass in a (JSON) object, and the library has no interaction with the file system. Maybe you could add a flag to ro-crate-py to specify the behaviour attached/detached? The only difference I can think of is checking that all local data entities are present on save or validate -- all other operations should be the same, or am I missing something?

In either case I think it's a good idea to let Attached crates have URIs for @ids as stated above.

@ptsefton ptsefton added question Up for discussion and removed beginner Suitable for new RO-Crate community members labels Dec 18, 2024
@elichad

elichad commented Dec 19, 2024

@ptsefton thanks for sharing those examples, that helps me understand what you're getting at.

Stian is on leave until the new year, so let's discuss this at the next community call on 9 Jan.

@ptsefton

I have attempted to clear this up as a holiday activity. See my proposed solution:

NOTE: This is not complete - I have not gone thru the whole spec to check for references, but have touched the sections on structure, root data entity and terminology for a start and to get some feedback on this general approach.

@simleo I have tried to add some hints about what libraries would need to do (I don't think it will be too hard for RO-Crate-py)

I am proposing that we define "Local RO-Crate Package" and "Detached RO-Crate Package" as the main terms but also recognize a third category, "Abstract RO-Crate", where no attempt is made to verify that resources are present -- e.g. for use in an online validator or preview generator that CAN'T see local files and would not want to be link checking detached URLs.

I also think that we should allow the use of full URIs as IDs for "Local RO-Crate Packages" as this makes it very clear what the preferred ID is -- and the ./ does not actually work with linked data software anyway.
