Passing key into update/remove process xslt for iso 19139 to fix issue with updating/deleting resources with same url #7431
Conversation
Does it make sense to add the same URL twice with the same name? The key for update/remove was based on URL + name, and in your case the name for French and English will be different, so it should work fine? But checking the template, maybe we should remove the template. Using the position, I'm not sure it will work if we have multiple transfer options.
Can't we just remove the online resource by simply giving its position in the whole metadata, instead of matching by url/name?
@jahow that's the idea of the pull request, but as indicated by Francois, at least in ISO19139 it will not work with multiple transfer options blocks with the current changes. It would require a more complex xslt process to match the distribution section and loop over the different transfer options / online resources, copying all except the one identified by the global position. It also requires updating the other metadata schemas.
Another option would be to include the protocol in the match (instead of a position), to make it even more unique. A concrete example could be WMS and WFS services accessed through the same url and using the same layername/featuretype. The only way to differentiate them would be the protocol.
The protocol was part of the url. If the url is not enough to determine uniqueness, I can't see the point of using the protocol, which will be the same since we use the url.
<id>
<xsl:value-of select="$url"/>
<xsl:value-of select="$uid"/>
This makes sense, thank you for the improvement
See feedback above; we need to account for more than one transfer section, so a compound key may be required ...
Perhaps matching based on URL + a counter would be enough? This id does not have to last very long ... only long enough for the remove to function.
…ert id back to the old method. This will ensure backward compatibility. Also update xslt for add and remove thumbnail/online resources
There is an issue with replacing the $url with a random uuid $uid: it makes the logic no longer compatible with all existing schemas, and changing all existing schemas to use the new id would be a large task. I recommend adding new variables instead. I have created a PR against this branch, wangf1122#15. The change reverts the logic for modifying the id field so that the code remains backwards compatible with existing schemas.
With the idx it should be possible to go directly to the element; however, to ensure that we have the correct element, we can also compare the hash value. Note that my PR also addresses thumbnails, and it also addresses updating the thumbnail and online data, which had the same issue as the delete: it was updating all records with the same url. Please review and test the changes.
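The idx + hash idea described above can be sketched roughly as follows. This is an illustrative XSLT fragment, not the exact code from the PR; the parameter names follow the ones discussed later in this thread, and the digestUtils namespace binding is the one used in the indexing diff further down.

```xml
<!-- Illustrative sketch: resourceIdx jumps to the nth online resource,
     and resourceHash double-checks that it is the expected element
     before it is removed. -->
<xsl:param name="resourceIdx" select="''"/>
<xsl:param name="resourceHash" select="''"/>

<!-- Remove the element only when both the position and the hash agree:
     an empty template body drops the matched node from the output. -->
<xsl:template match="gmd:onLine[
    position() = xs:integer($resourceIdx)
    and digestUtils:md5Hex(normalize-space(.)) = $resourceHash]"
    priority="2"/>
```

The hash acts as a safety net: even if the position computed by the UI drifts out of sync with the document, a stale idx will not match the hash and nothing is deleted.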
Looks like the code in my PR wangf1122#15 is failing with the following error.
It seems to be caused by the following namespace not loading the Java class correctly, since it works correctly when executing the application locally.
I'm not sure how to modify the unit test so that it loads the class path correctly to locate the XslUtil class. Any suggestions?
That may be a sign it needs to be an integration test later in the build? How is XslUtil set up when the application runs?
Looks like I have to pass a test dependency to the iso19139 schema to include gn_core. I'm surprised that was not needed before. However, it does not totally work: if I add the following to the iso19139 schema pom file, then I can run the test.
However, it then fails to do a maven clean or build from the main pom file with the following error.
…line.resources.iso.19139.schema_ Updated resource logic to use two new fields called idx and hash
….multiple.online.resources.iso.19139.schema
@ianwallen the tests are failing due to this error:
I suspect the parameter value. See core-geonetwork/core/src/main/java/org/fao/geonet/util/XslUtil.java, lines 1308 to 1310 in 66a8ca8.
@ianwallen found the problem: the iso19139 schema doesn't depend on it. We need to check the reason for this dependency.
The blocker at the moment is attempting to call the md5Hex function, as we cannot add the dependency.
I wonder if you could try changing
to
This way it would use the Apache library directly and we would not need the dependency. Or, if we do need a dependency, then it should not be an issue to add the org.apache.commons.codec.digest.DigestUtils dependency in iso19139. I think if this works then it could unblock this issue.
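The exact from/to spans in the comment above were lost in extraction, so the following sketch reconstructs the likely intent as an assumption: binding the XSLT extension namespace straight to the Apache Commons Codec class instead of going through GeoNetwork's XslUtil wrapper. Both namespace URIs appear elsewhere in this thread; the surrounding xsl:value-of lines are illustrative.

```xml
<!-- Assumption: before the change, the hash goes through XslUtil -->
<!-- xmlns:util="java:org.fao.geonet.util.XslUtil" -->
<!-- <xsl:value-of select="util:md5Hex(normalize-space(.))"/> -->

<!-- Assumption: after the change, bind directly to Commons Codec,
     removing the need for a gn_core test dependency in the schema -->
<!-- xmlns:digestUtils="java:org.apache.commons.codec.digest.DigestUtils" -->
<!-- <xsl:value-of select="digestUtils:md5Hex(normalize-space(.))"/> -->
```

With Saxon's reflexive extension functions, the `java:` namespace maps straight to a class on the classpath, so calling the static DigestUtils.md5Hex(String) directly avoids pulling GeoNetwork core into the schema module just for one hash call.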
…ne.resources.iso.19139.schema
…xing as the backend file/store will get deleted.
I have fixed the issue; we also need to add the hashing/indexing mechanism to the search engine index.xsl, because the new UI module is not pulling relations from the original extract-relations.xsl anymore. It now pulls from the Elasticsearch index directly. I also need to remove/give up the thumbnail part, because the thumbnail delete removes the backend store directly; it's not possible for us to remove a single link at a time.
@wangf1122 the changes for the schema should also be done for … What would happen if a schema is not updated with these changes? I tested with …
@@ -33,7 +33,8 @@
xmlns:gco="http://www.isotc211.org/2005/gco"
xmlns:gmx="http://www.isotc211.org/2005/gmx"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:util="java:org.fao.geonet.util.XslUtil"
@wangf1122 this is required for https://github.com/geonetwork/core-geonetwork/pull/7431/files#diff-8ba6fa3b50cb72ae6c6e65d46bf59e1b4f7a4c2102df0c7f1e1c779ae08db967R134
I tested uploading a thumbnail and it's failing due to this.
Thanks for pointing it out. I just realized this namespace is used. I have placed it back.
gmd:CI_OnlineResource/gmd:name/gco:CharacterString)
]">
<!-- Updating the gmd:onLine based on update parameters -->
<!-- Note: first part of the match needs to match the xsl:for-each select from extract-relations.xsl in order to get the position() to match -->
I would write a comment explaining how it works, something like this:

If provided, `resourceHash` and `resourceIdx` match the online resource to update using those fields. When not provided (backwards compatibility), the `updateKey` is used to match the online resource to update.

That would facilitate maintenance in the long term, as the condition is not that simple.
Please also update the other xslt files with similar conditions.
I have added some extra comments explaining what resourceIdx and resourceHash are for, in both the add and remove xsl.
The logic should be backward compatible and should not affect any schemas that have not been updated.
Thanks @wangf1122, I tested the changes and all seems fine.
@@ -1105,6 +1107,8 @@
<atomfeed><xsl:value-of select="gmd:linkage/gmd:URL"/></atomfeed>
</xsl:if>
<link type="object">{
"hash": "<xsl:value-of select="digestUtils:md5Hex(string(exslt:node-set(normalize-space(.))))"/>",
Working on a new indexing module, I think we could simplify this to digestUtils:md5Hex(normalize-space(.)), which should produce the same hash. Or is there a particular case where we need exslt and the string function?
@fxprunayre you are correct that this seems cleaner.
<!-- The unique identifier is marked with resourceIdx which is the position index and resourceHash which is hash code of the current node (combination of url, resource name, and description) -->
<xsl:template
match="*//gmd:MD_DigitalTransferOptions/gmd:onLine
[gmd:CI_OnlineResource[gmd:linkage/gmd:URL!=''] and ($resourceIdx = '' or position() = xs:integer($resourceIdx))]
The position here is computed in the context of *//gmd:MD_DigitalTransferOptions/gmd:onLine, while during indexing it is computed in the context of https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19139/src/main/plugin/iso19139/index-fields/index.xsl#L1125-L1126, so the resourceIdx is not the same. Removal does not work when we have more than one transferOptions block.
Maybe we should simplify the process with hash-only support?
<xsl:template
match="*//gmd:MD_DigitalTransferOptions/gmd:onLine[digestUtils:md5Hex(normalize-space(.)) = $resourceHash]"
priority="2">
</xsl:template>
This was also not applied to 115-3 https://github.com/geonetwork/core-geonetwork/blob/main/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/process/onlinesrc-remove.xsl
@fxprunayre
We have recently encountered issues where data supplied by a third party has all the resources in separate transferOptions blocks, and it was noticed that it was not possible to edit (and possibly delete) data from these other blocks. So you are correct that there are issues.
> Maybe we should simplify the process with the hash only support?

Do you mean ignore the position? The idea of also checking the position is the following:

- block1
- block2
- block3
- block4 (same as block1, so same hash value)

Then if someone clicks delete on block4, it would delete block4 even with the duplicate hash. If we only use the hash, then it could delete either block1 or block4.

I know GN does not allow duplicate data to be entered; however, when getting data from a third party we don't really know what is being provided. So position/index was the original goal, and the hash was added later as a double check to ensure it did not mess up the position/index.

We also have a request from users to be able to order record positions how they like. I think that will be easier to do if we keep position/index, as we should be able to perform a swap based on position? Not really sure, as we have not had time to look into this feature yet.
Do you mean ignore the url check? If so, the unchanged metadata 101 schema will stop working if we remove the url check. This was kept for backward compatibility.
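The duplicate-hash scenario above can be illustrated with a hypothetical fragment like this. The urls and names are invented; the point is that block1 and block4 normalize to identical text, so md5Hex(normalize-space(.)) is the same for both, and only the position distinguishes them.

```xml
<!-- Hypothetical fragment: onLine blocks 1 and 4 are byte-identical,
     so their md5 hashes collide. Matching on hash alone could remove
     either one; resourceIdx = 4 pins the deletion to the fourth block. -->
<gmd:MD_DigitalTransferOptions>
  <gmd:onLine><!-- block1: url=http://example.org/wms  name=layerA --></gmd:onLine>
  <gmd:onLine><!-- block2: url=http://example.org/wfs  name=typeB  --></gmd:onLine>
  <gmd:onLine><!-- block3: url=http://example.org/doc.pdf          --></gmd:onLine>
  <gmd:onLine><!-- block4: identical to block1, so identical hash  --></gmd:onLine>
</gmd:MD_DigitalTransferOptions>
```

This is also why a hash-only match would have to either delete all colliding blocks or pick one arbitrarily, whereas position + hash deletes exactly the block the user clicked.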
> This was also not applied to 115-3

Our intent was to add it, but we did not get around to it. Now there are a couple of bugs that need to be fixed first; once these are fixed, we can add support for 115-3.
Yes, indeed, it depends a bit on how people organize things into ISO distribution blocks or not.
From the use cases I have here, I would say a hash of (URL + name + protocol) should be unique, but a user can always create duplicates in different blocks. We can keep it backward compatible as it is now, but if relying on the position, we need to compute the position using the same context (currently the context during indexing is different).
Will think about it, in case there are better options.
transfer options block are used. Follow up of #7431
…sfer options block are used. (#8281) * Standard / ISO19139 / Fix removal of online source when multiple transfer options block are used. Follow up of #7431 * Fix online resource update/delete so that it supports multiple gmd:MD_DigitalTransferOptions blocks. --------- Co-authored-by: Ian Allen <ianwallen@hotmail.com>
…sfer options block are used. (#8281) (#8297) * Standard / ISO19139 / Fix removal of online source when multiple transfer options block are used. Follow up of #7431 * Fix online resource update/delete so that it supports multiple gmd:MD_DigitalTransferOptions blocks. --------- Co-authored-by: François Prunayre <fx.prunayre@gmail.com>
The issue is in ISO19139. The user may upload just a single resource file but decide to link it twice or multiple times for different purposes (i.e. English vs French).
But if the user attempts to delete either one of them, both will be wiped out.
The issue is that the onlinesrc-remove xsl only removes the resource by name and url; since the name and url are duplicated, the transformation XSLT removes them all.
This pull request passes an extra indexing measure from the UI component to its XSLT. In this case it is used for ISO19139, but the extra parameter won't cause any harm to any other schemas.