chore: automate converting dict xlsx to tsv
DGCK81LNN committed Sep 24, 2024
1 parent 75112e3 commit c9fe6a4
Showing 4 changed files with 37 additions and 4 deletions.
4 changes: 2 additions & 2 deletions package/README.md
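@@ -474,9 +474,9 @@ interface Transcriber {

 When a new character table is released:

-1. Save the character table in TSV format (no header row; first column hanzi, second column Shidinn chat letters) to <code>data/<var>6-digit date</var>.tsv</code>.
+1. Save the character table to <code>data/希顶字表<var>xxxxx</var>.xlsx</code>, then run <code>py tools/totsv.py data/希顶字表<var>xxxxx</var>.xlsx</code> to convert it to TSV.

-2. Run <code>ruby tools/sort.rb data/<var>xxxxxx</var>.tsv</code> to normalize the ordering of the table data (Shidinn spellings are sorted simply by Shidinn alphabet order).
+2. Run <code>ruby tools/sort.rb data/<var>xxxxxx</var>.tsv</code> to normalize the ordering of the table data (Shidinn spellings are sorted simply by Shidinn alphabet order; identical spellings are sorted by the hanzi's Unicode code point).

 3. In the `data` directory, run `node ../tools/append.mjs dict.tsv`. This step may remove the hanzi-to-Shidinn hints from some entries; the program prints a notice for each, and in the next step you may need to add them back in the appropriate place.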

Binary file added tools/datadiff.xltx
Binary file not shown.
7 changes: 5 additions & 2 deletions tools/sort.rb
@@ -3,7 +3,7 @@
 @table="!bpmwjqxynzDsrHNldtgkh45vF7BcfuaoeEAYL62T83V1i"

 def stn(x)
-  x.each_char.map { |l| "%02d" % @table.index(l) }.join("")
+  x.each_char.map { |l| @table.index(l) }
 end

 fn = ARGV[0]
@@ -13,5 +13,8 @@ def stn(x)
 end

 lines = File.readlines(fn)
-lines.sort_by! { |line| stn(line.chomp.split("\t")[1]) }
+lines.sort_by! do |line|
+  h, x = line.chomp.split("\t")
+  [stn(x), h]
+end
 File.write(fn, lines.join)
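
The new sort key in `tools/sort.rb` can be sketched in Python roughly as follows. This is only an illustration, not part of the commit: `TABLE` is copied from the script, while `sort_key` and the sample rows are hypothetical and do not come from the real character table. It assumes every line has the form hanzi, tab, spelling.

```python
# Alphabet order copied from tools/sort.rb.
TABLE = "!bpmwjqxynzDsrHNldtgkh45vF7BcfuaoeEAYL62T83V1i"


def sort_key(line):
    """Return (alphabet indices of the spelling, hanzi) for one TSV line."""
    hanzi, spelling = line.rstrip("\n").split("\t")[:2]
    # Primary key: position of each letter in the alphabet table.
    # Secondary key: the hanzi itself, compared by Unicode code point.
    return ([TABLE.index(ch) for ch in spelling], hanzi)


# Made-up rows for illustration only (not real dictionary entries).
rows = ["甲\tma\n", "乙\tma\n", "丙\tba\n", "丁\tmb\n"]
sorted_rows = sorted(rows, key=sort_key)
```

In this sketch `mb` sorts before `ma` because `b` precedes `a` in the table, and the two `ma` rows are ordered by hanzi code point, which mirrors the `[stn(x), h]` key in the second hunk.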
30 changes: 30 additions & 0 deletions tools/totsv.py
@@ -0,0 +1,30 @@
+#!/usr/bin/env python
+from datetime import datetime
+import openpyxl
+from os import path
+import re
+import sys
+
+if len(sys.argv) < 2:
+    print("Usage: python totsv.py <希顶字表.xlsx>")
+    sys.exit(2)
+
+xls_path = sys.argv[1]
+wb = openpyxl.load_workbook(xls_path)
+sheet = wb.worksheets[0]
+
+m = re.search(r"(?<!\d)2\d[01]\d[0-3]\d(?!\d)", xls_path)
+date = m.group(0) if m else datetime.now().strftime("%y%m%d")
+out_path = f"{path.dirname(xls_path) or path.curdir}{path.sep}{date}.tsv"
+
+headers = [cell.value for cell in sheet[1]]
+h_col = headers.index("汉字")
+x_col = headers.index("希顶")
+
+with open(out_path, 'wb') as f:
+    for row in sheet.iter_rows(min_row=2, values_only=True):
+        h, x = row[h_col], row[x_col]
+        if h and x:
+            f.write(f"{h}\t{x}\n".encode("utf-8"))
+
+print(f"{out_path} written")
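
To see which file names the date pattern in `totsv.py` picks up, the following sketch isolates that step. The regular expression is copied from the script; the helper names `find_date` and `date_stamp` and the sample paths are hypothetical and used only for illustration.

```python
import re
from datetime import datetime

# Pattern copied from tools/totsv.py: a standalone run of six digits
# shaped like YYMMDD, with the year starting with "2".
DATE_RE = re.compile(r"(?<!\d)2\d[01]\d[0-3]\d(?!\d)")


def find_date(xls_path):
    """Return the six-digit stamp found in the path, or None."""
    m = DATE_RE.search(xls_path)
    return m.group(0) if m else None


def date_stamp(xls_path):
    """Mirror the script's fallback: use today's date when nothing matches."""
    return find_date(xls_path) or datetime.now().strftime("%y%m%d")


six_digit = find_date("data/希顶字表240924.xlsx")
eight_digit = find_date("data/希顶字表20240924.xlsx")
loose = find_date("data/希顶字表241939.xlsx")
fallback = date_stamp("data/dict.xlsx")
```

Under these assumptions, a six-digit stamp is extracted as-is, while an eight-digit date is rejected by the digit lookarounds and falls back to the current date. The pattern only checks the shape of the digits, so an impossible date such as `241939` is also accepted.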
