-
There are several possible solutions to this problem. It arises from the different data models involved in creating the corpus and from the limitations and capabilities of each model. Like you did with the
In the graphANNIS data model, this could be achieved by adding the meta annotations to corpus/objects for each speaker and then adding
You could then search something like
The obvious problem with this approach is that no converter exists for directly converting data to the graphANNIS data model (e.g. in the GraphML format that can be read by the import CLI). In theory, it would be possible to use the Python API to construct such corpora on your own, but this would be a lot of work.
By using Pepper to convert the data to relANNIS, two data models are involved: Salt and the relANNIS model. RelANNIS has the concept of documents and texts. A document can consist of several texts, and it is possible to have pointing relations between annotation nodes of any text inside the same document. That could help in your case, but I assume you have some kind of timeline, which means the relANNIS exporter will create a single text with empty tokens for the timeline items. Also, in relANNIS only documents can have metadata, not texts. This means you can't use the same structure as outlined for graphANNIS above. I thought that it might be possible to solve this by adding node annotation labels with the speaker id and then doing an "identical value" (
There might be a solution that makes use of the graphANNIS data model but does not need a full conversion script in Python. Instead, you might convert and import the corpus with Pepper and the graphANNIS CLI. Then you could use the Python API to add some additional annotations and nodes as a post-processing step.
For every speaker you know (probably defined by the namespace of the document node annotations), you create a new graphANNIS node with the node type "speaker". It is important that this node type is neither "node" (an annotation node) nor "corpus" (a document). If it were "corpus", the ANNIS frontend would treat it as a document, and the frontend cannot handle an annotation node that belongs to several documents. Add all relevant metadata, like the age group, to the speaker node.
Then find all annotation nodes/Gloss/token that belong to that speaker and add a
This would allow you to search for
and
If you want to distribute the post-processed corpus, you can export the modified corpus as GraphML using the graphANNIS command line. Alternatively, you can use a script that imports the corpus into a temporary data folder, post-processes it and then exports the result as GraphML.
I know that it might be frustrating not to have a simple way of solving this with relANNIS and the Pepper ecosystem. We have some ideas for a graphANNIS-based Pepper that would avoid the limitations of the Salt and relANNIS data models, but this is more experimental and not an actual project with estimated deadlines.
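The post-processing idea described above (one "speaker" node per speaker, the speaker metadata attached to it, and an edge from each of the speaker's annotation nodes) can be sketched as a plan of update operations in plain Python. This is only a sketch under assumptions: the node naming scheme, the empty label namespace, and the edge component type and name (`"Pointing"`, `"speaker"`) are hypothetical choices for illustration, since the edge meant in the text above is not recoverable here. The tuple layout is meant to resemble the kind of calls a graph update API would take; it does not call graphANNIS itself.

```python
def plan_speaker_updates(corpus_name, speaker_meta, speaker_nodes,
                         edge_type="Pointing", edge_name="speaker"):
    """Return a list of planned update operations as plain tuples.

    speaker_meta:  {speaker: {metadata_name: value}}
    speaker_nodes: {speaker: [names of that speaker's annotation nodes]}
    edge_type / edge_name are hypothetical defaults, not taken from the thread.
    """
    ops = []
    for speaker, meta in speaker_meta.items():
        # Hypothetical naming scheme for the new speaker node.
        speaker_node = f"{corpus_name}/speakers#{speaker}"
        # The node type must be neither "node" nor "corpus".
        ops.append(("add_node", speaker_node, "speaker"))
        # Attach the speaker metadata as labels on the speaker node.
        for name, value in meta.items():
            ops.append(("add_node_label", speaker_node, "", name, value))
        # Link every annotation node of this speaker to the speaker node.
        for node in speaker_nodes.get(speaker, []):
            ops.append(("add_edge", node, speaker_node, "", edge_type, edge_name))
    return ops


# Made-up example data; node names and the corpus name are placeholders.
ops = plan_speaker_updates(
    "mycorpus",
    {"PersonA": {"AltersGruppe": "31-45"}},
    {"PersonA": ["mycorpus/doc1#gloss1"]},
)
for op in ops:
    print(op)
```

If the graphANNIS Python package is used, each tuple would then need to be replayed on its graph update object and applied to the corpus. The exact method names and argument order should be checked against the package documentation rather than taken from this sketch.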
-
I've been looking at doing this using the Python interface as you suggested, and I can read in the corpus, perform queries, add nodes and edges, and export the corpus without any problems. But in order to move some metadata from one node to another, I need to be able to access the values of the metadata, and I can't figure out how to do that. Here is the original metadata for one transcript
I can create a new node for Person, PersonB, and PersonMod, and I can find and link all the other appropriate token/Gloss etc. But then I need to move the appropriate metadata so I would end up with
but I can't figure out how to access and move those individual bits of data. I've tried doing it by hand for one transcript and it seems to give the correct query results. I'm sure I can figure out a way to copy the bits of metadata from elsewhere, but for the whole corpus it would be great to manage it all within the Python interface.
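Once the document's metadata values are available in Python, deciding which values belong to which speaker node is a matter of splitting on the namespace. The sketch below assumes the labels have already been read into a dict keyed by `namespace::name`, as in `PersonA::AltersGruppe`. How that dict is obtained from the Python interface is not shown, and the example values are made up.

```python
from collections import defaultdict


def split_metadata_by_speaker(doc_labels, speakers):
    """Split document-level labels into per-speaker metadata.

    doc_labels: {"namespace::name": value} (assumed key format)
    speakers:   namespaces that identify a speaker, e.g. {"PersonA", "PersonB"}
    Returns (per_speaker, remaining): labels to move, and labels to leave
    on the document node.
    """
    per_speaker = defaultdict(dict)
    remaining = {}
    for key, value in doc_labels.items():
        namespace, sep, name = key.partition("::")
        if sep and namespace in speakers:
            per_speaker[namespace][name] = value
        else:
            remaining[key] = value
    return dict(per_speaker), remaining


# Made-up example values for illustration only.
doc_labels = {
    "PersonA::AltersGruppe": "31-45",
    "PersonB::AltersGruppe": "46-60",
    "annis::doc": "transcript1",
}
per_speaker, remaining = split_metadata_by_speaker(doc_labels, {"PersonA", "PersonB"})
print(per_speaker)
print(remaining)
```

Each entry in `per_speaker` can then be turned into labels on the corresponding speaker node, and the same keys removed from the document node if they should not stay there.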
-
To be shown as metadata in the result view of the ANNIS 4 interface, the type must currently be
As long as there is a single path from each document to its root corpus, I think using the type
Also, even when the display does not work for nodes of type
-
In the corpus I am working with we have 2 participants in a signed dialogue, and they each have multiple annotation tiers. We use namespaces in the annotations e.g.
PersonA::Gloss
PersonA::Mundbild
PersonB::Gloss
PersonB::Mundbild
and also in some of the metadata e.g.
PersonA::AltersGruppe
I created the relANNIS corpus using Pepper and added links between tokens from different tiers belonging to the same participant, so the query
Gloss=/ZUG.*/ ->ident Mundbild=/.*zug.*/
will find Gloss - Mundbild pairs from both PersonA and PersonB but only where the Gloss and the Mundbild belong to the same participant.
Is there any way to do this for the metadata?
At the moment if I want to find tokens in the Gloss tier produced by participants in a certain age group I have to use the query
(PersonA:Gloss=/KANN.*/ @* PersonA:AltersGruppe="31-45") | (PersonB:Gloss=/KANN.*/ @* PersonB:AltersGruppe="31-45")
because
Gloss=/KANN.*/ @* AltersGruppe="31-45"
returns results where either one of the participants is in the age group, not just the one who signed the Gloss.
If I have missed that there is already a way to restrict queries to tokens or metadata which are all in the same namespace, please let me know!
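For reference, the per-participant workaround query above can be generated rather than typed by hand, which helps when there are more namespaces or several metadata fields. A minimal sketch in Python; the function name and parameters are made up for illustration, and it only builds the query string without running it.

```python
def build_speaker_query(speakers, anno_name, anno_pattern, meta_name, meta_value):
    """Build one AQL disjunction with a namespaced branch per speaker."""
    branches = [
        f'({ns}:{anno_name}=/{anno_pattern}/ @* {ns}:{meta_name}="{meta_value}")'
        for ns in speakers
    ]
    return " | ".join(branches)


query = build_speaker_query(
    ["PersonA", "PersonB"], "Gloss", "KANN.*", "AltersGruppe", "31-45"
)
print(query)
```

For the two participants this produces the same query string as the hand-written one above.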