adding language analyzers #8591

Merged

AntonEliatra
Contributor

Description

adding arabic language analyzer

Issues Resolved

Part of #1483 addressed in this PR.

Version

all

Checklist

  • By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developer Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

Thank you for submitting your PR. The PR states are In progress (or Draft) -> Tech review -> Doc review -> Editorial review -> Merged.

Before you submit your PR for doc review, make sure the content is technically accurate. If you need help finding a tech reviewer, tag a maintainer.

When you're ready for doc review, tag the assignee of this PR. The doc reviewer may push edits to the PR directly or leave comments and editorial suggestions for you to address (let us know in a comment if you have a preference). The doc reviewer will arrange for an editorial review.

kolchfa-aws and others added 8 commits October 21, 2024 11:26
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
…n analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
…lithuanian,norwegian and persion laguage analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
…rkish language analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
AntonEliatra marked this pull request as ready for review November 4, 2024 12:26
AntonEliatra changed the title from "adding arabic language analyzer" to "adding language analyzers" Nov 4, 2024
vagimeli added the 3 - Tech review, 4 - Doc review, and Content gap labels Nov 4, 2024
kolchfa-aws (Collaborator) left a comment

Thank you, @AntonEliatra! Please apply my changes, and we'll get this to editorial review.

_analyzers/language-analyzers/index.md (resolved)
nav_order: 100
parent: Analyzers
has_children: true
has_toc: false
Collaborator

I think it's not a bad idea to have the TOC here. It will list all analyzers at the bottom under Related articles.

Contributor (author)

done


#### Example request

The following query specifies the `french` language analyzer for the index `my-index`:
Collaborator

This request creates a text field with a `french` subfield configured with the `french` analyzer. I think we can keep this request but correct its description. Also, it would be nice to provide a plain example that sets an analyzer on the whole index.

Contributor (author)

I added an additional example.
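
For reference, the two request shapes discussed in this thread can be sketched as follows. This is a minimal illustration, not the text that was merged: the field name `my_field` is a hypothetical placeholder, and both dictionaries are meant as the JSON body of a `PUT /my-index` request.

```python
import json

# Body 1: a text field with a `french` subfield that uses the french analyzer.
# `my_field` is a hypothetical field name chosen for illustration.
subfield_body = {
    "mappings": {
        "properties": {
            "my_field": {
                "type": "text",
                "fields": {
                    "french": {"type": "text", "analyzer": "french"}
                },
            }
        }
    }
}

# Body 2: `french` set as the default analyzer for the whole index.
index_default_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "default": {"type": "french"}
            }
        }
    }
}

print(json.dumps(subfield_body, indent=2))
print(json.dumps(index_default_body, indent=2))
```

With the first body, only `my_field.french` is analyzed as French. With the second, every text field without an explicit analyzer falls back to `french`.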

_analyzers/language-analyzers/index.md (outdated, resolved)
_analyzers/language-analyzers/index.md (outdated, resolved)

## Stem exclusion

You can also use `stem_exclusion` with this language analyzer using the following command:
Collaborator

Suggested change
You can also use `stem_exclusion` with this language analyzer using the following command:
You can use `stem_exclusion` with this language analyzer using the following command:
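
The command itself is not quoted in this thread, so the following is only a sketch of what a `stem_exclusion` request body typically looks like. The analyzer name, the choice of `french`, and the excluded words are assumptions made for illustration.

```python
import json


def stem_exclusion_settings(language, excluded_words):
    """Build index settings that configure a built-in language analyzer
    with a list of words that should not be stemmed."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name, derived from the language.
                    f"stem_exclusion_{language}_analyzer": {
                        "type": language,
                        "stem_exclusion": list(excluded_words),
                    }
                }
            }
        }
    }


# Hypothetical example words; any terms that must keep their surface form work.
body = stem_exclusion_settings("french", ["autorité", "acceptation"])
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON would be sent as the body of an index creation request.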


## Stem exclusion

You can also use `stem_exclusion` with this language analyzer using the following command:
Collaborator

Suggested change
You can also use `stem_exclusion` with this language analyzer using the following command:
You can use `stem_exclusion` with this language analyzer using the following command:


## Stem exclusion

You can also use `stem_exclusion` with this language analyzer using the following command:
Collaborator

Suggested change
You can also use `stem_exclusion` with this language analyzer using the following command:
You can use `stem_exclusion` with this language analyzer using the following command:


## Stem exclusion

You can also use `stem_exclusion` with this language analyzer using the following command:
Collaborator

Suggested change
You can also use `stem_exclusion` with this language analyzer using the following command:
You can use `stem_exclusion` with this language analyzer using the following command:


## Arabic analyzer internals

The `arabic` analyzer is built using the following:
Collaborator

Global: please apply these changes to all files.
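
The component list for the `arabic` analyzer is cut off in the quoted diff. As a hedged sketch, the built-in analyzer is commonly described as a `standard` tokenizer followed by lowercase, decimal digit, stop, Arabic normalization, keyword marker, and stemmer filters. The rebuilt version below assumes that composition. The filter names, the analyzer name, and the protected keyword are illustrative.

```python
import json

# Assumed composition of the built-in `arabic` analyzer, rebuilt as a
# custom analyzer. Names like `arabic_stop` are illustrative labels.
arabic_custom_body = {
    "settings": {
        "analysis": {
            "filter": {
                "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
                "arabic_keywords": {
                    "type": "keyword_marker",
                    # Hypothetical word to protect from stemming.
                    "keywords": ["مثال"],
                },
                "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
            },
            "analyzer": {
                "rebuilt_arabic": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "decimal_digit",
                        "arabic_stop",
                        "arabic_normalization",
                        "arabic_keywords",
                        "arabic_stemmer",
                    ],
                }
            },
        }
    }
}

print(json.dumps(arabic_custom_body, ensure_ascii=False, indent=2))
```

The keyword marker has to come before the stemmer in the chain, otherwise protected words are stemmed before they can be marked.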

kolchfa-aws added the backport 2.18 label Nov 6, 2024
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
kolchfa-aws (Collaborator) left a comment

LGTM

natebower (Collaborator) left a comment

@kolchfa-aws @AntonEliatra Please see my comments and changes and let me know if you have any questions. Thanks!


## Custom Armenian analyzer

You can create custom Armenian analyzer using the following command:
Collaborator

Suggested change
You can create custom Armenian analyzer using the following command:
You can create a custom Armenian analyzer using the following command:
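
The command under review is not reproduced in this thread. A rough sketch of a custom Armenian analyzer follows, assuming the common lowercase, stop, keyword marker, stemmer chain. The helper function, the filter and analyzer names, and the empty keyword list are all hypothetical. Other languages may need extra normalization filters that this helper does not add.

```python
import json


def custom_language_analyzer(language, protected_words):
    """Build settings for a custom analyzer that chains lowercase, stop,
    keyword_marker, and stemmer filters for the given language."""
    stop = f"{language}_stop"
    keywords = f"{language}_keywords"
    stemmer = f"{language}_stemmer"
    return {
        "settings": {
            "analysis": {
                "filter": {
                    stop: {"type": "stop", "stopwords": f"_{language}_"},
                    keywords: {
                        "type": "keyword_marker",
                        "keywords": list(protected_words),
                    },
                    stemmer: {"type": "stemmer", "language": language},
                },
                "analyzer": {
                    f"{language}_custom": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", stop, keywords, stemmer],
                    }
                },
            }
        }
    }


# Empty list: no words are protected from stemming in this sketch.
armenian_body = custom_language_analyzer("armenian", [])
print(json.dumps(armenian_body, indent=2))
```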


## Custom Basque analyzer

You can create custom Basque analyzer using the following command:
Collaborator

Suggested change
You can create custom Basque analyzer using the following command:
You can create a custom Basque analyzer using the following command:


## Custom Bengali analyzer

You can create custom Bengali analyzer using the following command:
Collaborator

Suggested change
You can create custom Bengali analyzer using the following command:
You can create a custom Bengali analyzer using the following command:


## Custom Brazilian analyzer

You can create custom Brazilian analyzer using the following command:
Collaborator

Suggested change
You can create custom Brazilian analyzer using the following command:
You can create a custom Brazilian analyzer using the following command:


## Custom Bulgarian analyzer

You can create custom Bulgarian analyzer using the following command:
Collaborator

Suggested change
You can create custom Bulgarian analyzer using the following command:
You can create a custom Bulgarian analyzer using the following command:


- Tokenizer: `thai`

- Token Filters:
Collaborator

Suggested change
- Token Filters:
- Token filters:


## Custom Thai analyzer

You can create custom Thai analyzer using the following command:
Collaborator

Suggested change
You can create custom Thai analyzer using the following command:
You can create a custom Thai analyzer using the following command:
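
Thai differs from most of the other languages in this PR because, as the quoted diff notes, it uses the dedicated `thai` tokenizer rather than `standard`. A minimal sketch of a custom Thai analyzer is given below. The filter chain (lowercase, decimal digit, Thai stop words) and every name in it are assumptions, because the token filter list is not visible in this thread.

```python
import json

# Sketch of a custom Thai analyzer: `thai` tokenizer plus an assumed filter
# chain. `thai_stop` and `thai_custom` are illustrative names.
thai_body = {
    "settings": {
        "analysis": {
            "filter": {
                "thai_stop": {"type": "stop", "stopwords": "_thai_"}
            },
            "analyzer": {
                "thai_custom": {
                    "type": "custom",
                    "tokenizer": "thai",
                    "filter": ["lowercase", "decimal_digit", "thai_stop"],
                }
            },
        }
    }
}

print(json.dumps(thai_body, indent=2))
```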


- Tokenizer: `standard`

- Token Filters:
Collaborator

Suggested change
- Token Filters:
- Token filters:


## Custom Turkish analyzer

You can create custom Turkish analyzer using the following command:
Collaborator

Suggested change
You can create custom Turkish analyzer using the following command:
You can create a custom Turkish analyzer using the following command:
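
A custom Turkish analyzer needs language-aware lowercasing, because Turkish distinguishes dotted and dotless "i". The sketch below assumes a `standard` tokenizer with apostrophe, Turkish lowercase, stop, keyword marker, and stemmer filters. The filter names, the analyzer name, and the protected keyword are hypothetical, and the actual command from the PR is not shown here.

```python
import json

# Sketch of a custom Turkish analyzer. All custom names are illustrative.
turkish_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Lowercase filter configured with Turkish casing rules.
                "turkish_lowercase": {"type": "lowercase", "language": "turkish"},
                "turkish_stop": {"type": "stop", "stopwords": "_turkish_"},
                "turkish_keywords": {
                    "type": "keyword_marker",
                    # Hypothetical word to protect from stemming.
                    "keywords": ["örnek"],
                },
                "turkish_stemmer": {"type": "stemmer", "language": "turkish"},
            },
            "analyzer": {
                "turkish_custom": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "apostrophe",
                        "turkish_lowercase",
                        "turkish_stop",
                        "turkish_keywords",
                        "turkish_stemmer",
                    ],
                }
            },
        }
    }
}

print(json.dumps(turkish_body, ensure_ascii=False, indent=2))
```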

_analyzers/language-analyzers/index.md (outdated, resolved)
kolchfa-aws and others added 5 commits November 11, 2024 15:20
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
kolchfa-aws merged commit c29761c into opensearch-project:main Nov 14, 2024
5 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Nov 14, 2024
* adding arabic language analyzer

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* Add grandparent to arabic analyzer

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

* adding more details

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding armenian language analyzer

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding basque bengali and brazilian language analyzers

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding bulgarian catalan and cjk language analyzers

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding czech,danish,dutch,english,estonian,finnish,french and galician analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding german,greek,hindi,hungarian,indonesian,irish,italian,latvian,lithuanian,norwegian and persion laguage analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* adding portuguese,romanian,russian,sorani,spanish,swedish,thai and turkish language analyzer docs

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* Apply suggestions from code review

Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* updating as per pr review

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* fixing broken link

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>

* Update _analyzers/language-analyzers/index.md

Co-authored-by: Nathan Bower <nbower@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>

* Add redirect to index page

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Anton Rubin <anton.rubin@eliatra.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: AntonEliatra <anton.rubin@eliatra.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
(cherry picked from commit c29761c)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
github-actions bot pushed a commit that referenced this pull request Nov 14, 2024
epugh pushed a commit to o19s/documentation-website that referenced this pull request Nov 23, 2024
Signed-off-by: Eric Pugh <epugh@opensourceconnections.com>
Labels
3 - Tech review PR: Tech review in progress 4 - Doc review PR: Doc review in progress backport 2.18 PR: Backport label for 2.18 Content gap