Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing key into update/remove process xslt for iso 19139 to fix issue with updating/deleting resources with same url #7431

Conversation

wangf1122
Copy link
Collaborator

The issue is in ISO19139. If the user upload just a single resource file, but decided to link it twice or multiple times for different purpose (i.e. English vs French).

image

But if the user attempt to delete either one of them
image

They both will be wiped out

image

The issue is onlinesrc-remove xsl only remove the resource by name and url. But name and url are duplicated and the transformation XSLT decides to remove them all.

This pull request will pass extra indexing measure from the UI component to its XSLT. In this case, it is used for ISO19139. But the extra parameter wont cause any harm to any other schemas.

@wangf1122 wangf1122 marked this pull request as ready for review October 12, 2023 18:11
@ianwallen
Copy link
Contributor

ianwallen commented Oct 12, 2023

Related to #7246 and #5173

Note that thumbnail-remove.xsl has the same issue.

@fxprunayre
Copy link
Member

Does it make sense to add 2 times the same URL with same name ? The key for update/remove was based on URL+name and in your case the name for French and English will be different so it should work fine ?

But checking the template maybe we should remove the template for WWW:DOWNLOAD-1.0-http--download matching geonet:*|gmd:onLine[normalize-space(gmd:CI_OnlineResource/gmd:linkage/gmd:URL) = $url and normalize-space(gmd:CI_OnlineResource/gmd:protocol/*) = 'WWW:DOWNLOAD-1.0-http--download'] ?

Using the position, not sure it will work if we have multiple transfer options.

@jahow
Copy link
Contributor

jahow commented Oct 13, 2023

Can't we just remove the online resource by simply giving its position in the whole metadata, instead of matching by url/name?

@josegar74
Copy link
Member

josegar74 commented Oct 13, 2023

@jahow that's the idea of the pull request, but as indicated by Francois at least in ISO19139 it will not work with multiple transfer options blocks with the current changes.

It will require a more complex xslt process to match the distribution section and loop over the different transfer options / online resources to copy all except the one identified by the global position.

Also requires to update the other metadata schemas like iso19115-3.2008.

@jahow
Copy link
Contributor

jahow commented Oct 13, 2023

Another option would be to include the protocol in the match (instead of a position), to make it even more unique. A concrete example could be WMS and WFS services accessed through the same url, and using the same layername/featuretype. The only way to differentiate them would be the protocol.

@wangf1122
Copy link
Collaborator Author

@jahow

The protocol was part of the url. If url is not enough to determine the uniqueness, I can't see any point to use protocol which will be same as we use url.

<id>
<xsl:value-of select="$url"/>
<xsl:value-of select="$uid"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, thank you for the improvement

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See feedback above, we need to account for more then one transfer section, so a compound key may be required ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps matching based on URL + a counter be enough? This id does not have to last very long ... only long enough for the remove to function.

…ert id back to the old method. This will ensure backward compatibility

Also update xslt for add and remove thumbnail/onlines recources
@ianwallen
Copy link
Contributor

There is an issue with replacing the $url with a randome uuid $uid. It will end up making the logic no longer compatible with all existing schemas. And changing all existing schemas to use the new id would be a large task. I recommend adding new variables.
I also personally don't like using a random uuid for an id as it does not have any linkage to the original metadata.

I have created a PR against this branch wangf1122#15
which is based on this branch https://github.com/ianwallen/core-geonetwork/tree/main.prevent.delete.multiple.online.resources.iso.19139.schema_

The change reverts the logic for modifying the id field so that the code can be backwards compatible with existing schemas.
I added 2 new field

  • HASH - which is better for comparing the field instead of using url or name as it can compare the full xml section.
  • IDX - index identifying the element position in the xml

With the idx it should be possible to go directly to the element however to ensure that we have the correct element, we can also compare the hash value.

Note that my PR also addressed thumbnails and it also addressed the updating the thumbnail and online data as it had the same issue as the delete. It was updating all records with the same url.

Please review and test the changes.

@ianwallen
Copy link
Contributor

Looks like the code in my PR wangf1122#15 is failing with the following

Error at xsl:when on line 161 column 97 of onlinesrc-add.xsl:
  XPST0017: XPath syntax error at char 452 on line 161 in {...:md5Hex(exslt:node-set(.)) ...}:
    Cannot find a matching 1-argument function named {java:org.fao.geonet.util.XslUtil}md5Hex()

Which seems to be caused by the following namespace not loading the java correclty since it works correctly when executing the application locally.

xmlns:util="java:org.fao.geonet.util.XslUtil"

I'm not sure how to modify the unit test so that it loads the class path correctly to locate the XslUtil class. Any suggestions?

@jodygarnett
Copy link
Contributor

That may be a sign it needs to be an integration test later in the build? How is the XslUtil setup when the application runs?

@ianwallen
Copy link
Contributor

ianwallen commented Oct 30, 2023

Looks like I have to pass a test dependency to the iso19139 schema to include gn_core. I'm surprised that was not needed before?

However it does not totally work

If I add the following to the iso19139 schema pom file then I can run the test.

    <dependency>
      <groupId>org.geonetwork-opensource</groupId>
      <artifactId>gn-core</artifactId>
      <version>${project.version}</version>
      <scope>test</scope>
    </dependency>
    <dependency>

However if fails to do a maven clean or build from the main pom file with the following error.

[INFO] Scanning for projects...
[ERROR] [ERROR] The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT'}' and 'Vertex{label='org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT'}' introduces to cycle in the graph org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT --> org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT --> org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT @ 
[ERROR] The projects in the reactor contain a cyclic reference: Edge between 'Vertex{label='org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT'}' and 'Vertex{label='org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT'}' introduces to cycle in the graph org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT --> org.geonetwork-opensource.schemas:gn-schema-iso19139:4.4.0-SNAPSHOT --> org.geonetwork-opensource:gn-core:4.4.0-SNAPSHOT -> [Help 1]

ianwallen and others added 3 commits October 30, 2023 09:05
…line.resources.iso.19139.schema_

Updated resource logic to use 2 new field called idx and hash
@ianwallen ianwallen added the bug label Nov 7, 2023
@josegar74
Copy link
Member

josegar74 commented Nov 8, 2023

@ianwallen the test are failing due to this error:

Error at xsl:when on line 164 column 99 of onlinesrc-add.xsl:
  XPST0017: XPath syntax error at char 462 on line 164 in {...:md5Hex(exslt:node-set(.)) ...}:
    Cannot find a matching 1-argument function named {java:org.fao.geonet.util.XslUtil}md5Hex()

I suspect the parameter value exslt:node-set(.) doesn't match the type of the method:

public static String md5Hex(String str) {
return org.apache.commons.codec.digest.DigestUtils.md5Hex(str);
}

@josegar74
Copy link
Member

@ianwallen found the problem, the iso19139 doesn't depend on gn-core so doesn't have access to XslUtil, but adding the dependency adds a cyclic dependency, as gn-core depends on gn-schema-iso19139.

We need to check about the reason for this dependency.

@fxprunayre fxprunayre modified the milestones: 4.4.1, 4.4.2 Nov 22, 2023
@ianwallen
Copy link
Contributor

@wangf1122

The blocker at the moment is attempting to call the md5Hex function as we cannot add the dependency

public static String md5Hex(String str) {
    return org.apache.commons.codec.digest.DigestUtils.md5Hex(str);
}

I wonder if you could try changing

xmlns:util="java:org.fao.geonet.util.XslUtil"

to

xmlns:util="java:org.apache.commons.codec.digest.DigestUtils"

This way it would use the apache library directly and we would not need the dependency. Or if we do need a dependency then it should not be an issue to add the org.apache.commons.codec.digest.DigestUtils dependency in iso 19139

I think if this works then it could unblock this issue.

@ianwallen ianwallen changed the title Passing the index into remove process xslt for iso 19139 Passing key into update/remove process xslt for iso 19139 to fix issue with updating/deleting resources with same url Feb 12, 2024
@wangf1122
Copy link
Collaborator Author

@josegar74 @ianwallen

I had fixed the issue which we also need to add hashing indexing mechanism to the search engine index.xsl because the new UI module is not pulling relation from the original extract-relations.xsl anymore. It's now pulling from the elastic search indexing directly.

I also need to remove/give up the thumbnail part because the thumbnail is deleting directly the backend store. It's not possible for us to remove single link at one time.

@josegar74
Copy link
Member

josegar74 commented Mar 5, 2024

@wangf1122 the changes for the schema should be done also for iso19115-3.2018.

What would happen if a schema is not updated with these changes?


I tested with iso19115-3.2018 and seem working also, not using the position / hash, but should be fine. Also saw that the xslt process in iso19139 is backwards compatible, so it should be fine. No need to update iso19115-3.2018.

@@ -33,7 +33,8 @@
xmlns:gco="http://www.isotc211.org/2005/gco"
xmlns:gmx="http://www.isotc211.org/2005/gmx"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:util="java:org.fao.geonet.util.XslUtil"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. I just realized this namespace is used. I have placed it back.

@josegar74 josegar74 added the index structure change Indicate that this work introduces an index change. label Mar 5, 2024
gmd:CI_OnlineResource/gmd:name/gco:CharacterString)
]">
<!-- Updating the gmd:onLine based on update parameters -->
<!-- Note: first part of the match needs to match the xsl:for-each select from extract-relations.xsl in order to get the position() to match -->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would write a comment explaining how it works, something like this:

If provided resourceHash and resourceIdx matches the online resource to update using that fields. When not provided (backwards compatibility) uses the updateKey to match the online resource to update.

That would facilitate the maintenance in the long term, as the condition is not that simple.

Please if you can also update the other xslt files with similar conditions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some extra comment explain what the resourceIdx and resourceHash for both add and remove xsl

@ianwallen
Copy link
Contributor

@josegar74

@wangf1122 the changes for the schema should be done also for iso19115-3.2018.

What would happen if a schema is not updated with these changes?

I tested with iso19115-3.2018 and seem working also, not using the position / hash, but should be fine. Also saw that the xslt process in iso19139 is backwards compatible, so it should be fine. No need to update iso19115-3.2018.

The logic should be backward compatible and should not affect any schema's that have not been updated.
I suggest that we apply the changes to iso19115-3.2018 as a separate PR. As We also need to apply a similar change to HNAP as well.

Copy link
Member

@josegar74 josegar74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wangf1122, I tested the changes and seem all fine.

@fxprunayre fxprunayre modified the milestones: 4.4.3, 4.4.4 Mar 13, 2024
@josegar74 josegar74 merged commit 7d7db43 into geonetwork:main Mar 21, 2024
6 checks passed
@@ -1105,6 +1107,8 @@
<atomfeed><xsl:value-of select="gmd:linkage/gmd:URL"/></atomfeed>
</xsl:if>
<link type="object">{
"hash": "<xsl:value-of select="digestUtils:md5Hex(string(exslt:node-set(normalize-space(.))))"/>",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on a new indexing module, I think we could simplify this to digestUtils:md5Hex(normalize-space(.)) which should produce the same hash or is there a particular case where we need exslt and the string function?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fxprunayre you are correct that this seems cleaner.

<!-- The unique identifier is marked with resourceIdx which is the position index and resourceHash which is hash code of the current node (combination of url, resource name, and description) -->
<xsl:template
match="*//gmd:MD_DigitalTransferOptions/gmd:onLine
[gmd:CI_OnlineResource[gmd:linkage/gmd:URL!=''] and ($resourceIdx = '' or position() = xs:integer($resourceIdx))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The position here is computed in the context of *//gmd:MD_DigitalTransferOptions/gmd:onLine and during indexing with the context of https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19139/src/main/plugin/iso19139/index-fields/index.xsl#L1125-L1126 so the resourceIdx is not the same. Removal does not work when we have more than one transferOptions block.

Maybe we should simplify the process with the hash only support?

  <xsl:template
    match="*//gmd:MD_DigitalTransferOptions/gmd:onLine[digestUtils:md5Hex(normalize-space(.)) = $resourceHash]"
    priority="2">
  </xsl:template>

This was also not applied to 115-3 https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/process/onlinesrc-remove.xsl

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fxprunayre
We have recently encountered issues where data was supplied by a third party which has all the resources in separate transferOptions block and it was noticed that it was not possible to edit (and possibly delete) data from these other blocks. So you are correct that there are issues.

Maybe we should simplify the process with the hash only support?

  • Do you mean ignore the position? The idea of also checking the position is the following

    • block1
    • block2
    • block3
    • block4 (Same as block 1 so same hash value)

    Then if someone click delete on block4 then it would delete block4 even with duplicate hash.
    If we only use the hash then it could delete block1 or block4.
    I know GN does not allow duplicate data to be entered however when getting data from third party, we don't really know what is being provided.
    So position/index was the original goal and the hash was added later as a double check to ensure it did not mess up on the position/index.

    We also have a request from users to be able to order record positions how they like. I think will be easier done if we keep position/index as we should be able to perform a swap based on position? Not really sure as we have not had time to look into this feature yet.

  • Do you mean ignore the url check? If so then unchanged metadata 101 schema will stop working if we remove the url check. This was kept for backward compatibility.

Copy link
Contributor

@ianwallen ianwallen Jul 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was also not applied to 115-3

Our intent was to add it but did not get around to it. Now that there are a couple bugs need to be fix. Probably once these are fixed then we can add support to 115-3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed it depends a bit on how people organize things in ISO distribution blocks or not.

From use cases I have here, I would say hash of (URL+name+protocol) should be unique but user can always create duplicates in different block. We can keep it backward compatible like it is now but if relying on the position, we need to compute a position using the same context (currently the context in indexing is different).

Will think about it, if any better options.

fxprunayre added a commit that referenced this pull request Jul 29, 2024
transfer options block are used.

Follow up of #7431
fxprunayre added a commit that referenced this pull request Jul 29, 2024
transfer options block are used.

Follow up of #7431
fxprunayre added a commit that referenced this pull request Aug 9, 2024
…sfer options block are used. (#8281)

* Standard / ISO19139 / Fix removal of online source when multiple
transfer options block are used.

Follow up of #7431

* Fix online resource update/delete so that it supports multiple gmd:MD_DigitalTransferOptions blocks.

---------

Co-authored-by: Ian Allen <ianwallen@hotmail.com>
ianwallen added a commit that referenced this pull request Aug 9, 2024
…sfer options block are used. (#8281)

* Standard / ISO19139 / Fix removal of online source when multiple
transfer options block are used.

Follow up of #7431

* Fix online resource update/delete so that it supports multiple gmd:MD_DigitalTransferOptions blocks.

---------

Co-authored-by: Ian Allen <ianwallen@hotmail.com>
ianwallen added a commit that referenced this pull request Aug 9, 2024
…sfer options block are used. (#8281) (#8297)

* Standard / ISO19139 / Fix removal of online source when multiple
transfer options block are used.

Follow up of #7431

* Fix online resource update/delete so that it supports multiple gmd:MD_DigitalTransferOptions blocks.

---------

Co-authored-by: François Prunayre <fx.prunayre@gmail.com>
cmangeat added a commit to geoadmin/geocat that referenced this pull request Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug index structure change Indicate that this work introduces an index change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants