forked from d-run/drun-docs
Merge pull request d-run#156 from yyyhhhh/corpus-new
Update part of the corpus documentation
Showing 25 changed files with 58 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,34 +1,39 @@ | ||
# File Import | ||
|
||
## Upload Data | ||
--- | ||
hide: | ||
- toc | ||
--- | ||
|
||
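1. Click the **┇** button next to the corpus | ||
# File Import | ||
|
||
2. Click **Corpus Import** and select the data to upload | ||
1. Click the **┇** button next to the corpus and select the **File Import** method. | ||
|
||
3. On the data import page, click **Upload Data** | ||
![upload-data01](./images/upload-data01.jpg) | ||
|
||
Select the files you want to upload. Supported formats currently include pdf, txt, docx, doc, csv, and xlsx. | ||
A single file should preferably not exceed 50 M, and at most 50 files can be uploaded. | ||
1. **Import Data**: Click **Upload File** and choose how file chunks are processed: standard processing or custom processing (that is, plugin processing; see the plugin integration section) | ||
|
||
4. After the files have been uploaded, you can review them under **File Upload Results** | ||
![upload-data02](./images/upload-data02.jpg) | ||
|
||
![upload-date](./images/upload-date.png) | ||
!!! note | ||
|
||
5. After the upload succeeds, click **Next** | ||
- Supported formats currently include pdf, txt, docx, doc, csv, xlsx, and others. A single file should preferably not exceed 50M, and at most 50 files can be uploaded. | ||
- Chunking rules for standard processing: | ||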
|
||
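```template | ||
1. PDF, TXT, DOC, and DOCX files support custom separators; | ||
2. If a separator is set and no chunk size is set, the document is split by the separator only; | ||
3. If no separator is set and a chunk size is set, the document is split by chunk size only; | ||
4. If both a separator and a chunk size are set, the document is split at separator matches within the chunk size. | ||
``` | ||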
|
||
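6. Choose the chunk processing type: standard processing or custom processing (that is, plugin processing; see the plugin integration section) | ||
1. **Chunk Preview**: Check whether the chunks are correct. If they are not, go back to the previous step and modify the chunking rules or the file content. | ||
|
||
7. After data vectorization, review the number of file chunks, the number of duplicate chunks, the number of chunks imported this time, and the vectorization status | ||
![upload-data03](./images/upload-data03.jpg) | ||
|
||
8. Once vectorization succeeds, click **Next** | ||
1. **Data Vectorization**: Review the number of file chunks, the number of duplicate chunks, the number of chunks imported this time, and the vectorization status. Once vectorization succeeds, click **Next** | ||
|
||
9. When the file status shows that file processing is complete, click **Close** | ||
![upload-data04](./images/upload-data04.jpg) | ||
|
||
### Standard Processing | ||
1. When the file status shows that file processing is complete, click **OK** | ||
|
||
- PDF, TXT, DOC, and DOCX files support custom separators | ||
- CSV and xlsx files are chunked one row at a time | ||
- If a separator is set and no chunk size is set, the document is split by the separator only | ||
- If no separator is set and a chunk size is set, the document is split by chunk size only | ||
- If both a separator and a chunk size are set, the document is split at separator matches within the chunk size | ||
![upload-data05](./images/upload-data05.jpg)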
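
The standard-processing chunking rules in this diff can be illustrated with a small sketch. This is not the product's implementation: the function name `split_text` and its parameters are hypothetical, and the behavior when both a separator and a chunk size are set is only one plausible reading of the rule (split on the separator, then pack the pieces into chunks no longer than the chunk size, hard-splitting any piece that is too long on its own).

```python
from typing import List, Optional


def split_text(text: str,
               separator: Optional[str] = None,
               chunk_size: Optional[int] = None) -> List[str]:
    """Hypothetical sketch of the standard-processing rules (assumed semantics)."""
    if chunk_size is not None and chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")

    # Neither option set: return the text as a single chunk.
    if not separator and chunk_size is None:
        return [text] if text else []

    # Separator only: split by the separator and nothing else.
    if separator and chunk_size is None:
        return [piece for piece in text.split(separator) if piece]

    # Chunk size only: cut fixed-size windows.
    if not separator:
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    # Both set (assumed reading): pack separator-delimited pieces into
    # chunks of at most chunk_size characters.
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        # A piece longer than chunk_size cannot fit anywhere; hard-split it.
        while len(piece) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:chunk_size])
            piece = piece[chunk_size:]
        if not piece:
            continue
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, `split_text("aaa.bbb.ccc", separator=".", chunk_size=8)` keeps `aaa.bbb` together because it fits within eight characters and starts a new chunk for `ccc`. The chunk preview step remains the way to confirm how the platform actually splits a given file.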
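
The upload limits stated in the note (the listed formats, a recommended 50M per file, and at most 50 files) can be checked locally before uploading. The sketch below is only an illustration: `check_upload` and the constants are hypothetical names, "50M" is assumed to mean 50 MiB, and because the note says "and others", a format missing from this list is reported as a warning rather than as an error.

```python
import os
from typing import Iterable, List, Tuple

# Formats named in the note; the platform may accept more.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".doc", ".csv", ".xlsx"}
RECOMMENDED_MAX_BYTES = 50 * 1024 * 1024  # assumption: "50M" means 50 MiB
MAX_FILES = 50


def check_upload(files: Iterable[Tuple[str, int]]) -> List[str]:
    """Return warnings for a batch of (file name, size in bytes) pairs."""
    batch = list(files)
    warnings: List[str] = []
    if len(batch) > MAX_FILES:
        warnings.append(f"{len(batch)} files selected; the limit is {MAX_FILES}")
    for name, size in batch:
        extension = os.path.splitext(name)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            warnings.append(f"{name}: format is not in the documented list")
        if size > RECOMMENDED_MAX_BYTES:
            warnings.append(f"{name}: larger than the recommended 50M")
    return warnings
```

An empty list means the batch stays within the documented limits; sizes can be gathered with `os.path.getsize` when working from files on disk.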