This repository contains the resources developed for the following paper:
A. Dargahi Nobari and D. Rafiei. “Efficiently Transforming Tables for Joinability”.
See the paper (PDF) or the extended version for more information.
Python 3.7+ is sufficient to run the code; no additional libraries are required.
The source files are located in the src directory:
- join.py: A simple example of an end-to-end join with our approach. For advanced settings and detailed output on the transformations and the join process, use main.py and transformation_joiner.py instead.
- main.py: The main entry point for running our approach, or our implementation of auto-join, on a set of tables. It can be called with no command-line arguments to use the default values for all parameters. To set the method, parameters, and paths, pass a JSON config file as a command-line argument:
  python3 src/main.py -c config_sample.json
  A sample config file is provided in config_sample.json. The extracted transformations are stored in the output path defined in the config file, or in the default output directory. This file only generates the transformations; to use them for a table join, use transformation_joiner.py.
- transformation_joiner.py: Parses the transformations generated by main.py and uses them to join tables.
- dataset_generator.py: Generates synthetic datasets from the parameters given as command-line arguments. Use the help option to list the arguments and their details:
  python3 src/dataset_generator.py --help
- auto_fuzzy_join.py: Applies the join using the code provided by the Auto-FuzzyJoin authors. This is a baseline method and not part of our approach.
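
To give a feel for what joining tables through transformations means, here is a minimal sketch in plain Python. It is illustrative only: the unit names (literal, split_part, prefix), the functions apply_transformation and transform_join, and the sample data are all hypothetical and do not reflect the repository's actual API or its transformation file format. The assumed idea is that a transformation is a sequence of small string units whose outputs are concatenated, and that a source row joins a target row when its transformed value equals the target value.

```python
from typing import Callable, List, Tuple

# Hypothetical model: a transformation is a list of "units". Each unit maps
# the source string to one piece of the output; the pieces are concatenated.
Unit = Callable[[str], str]


def literal(text: str) -> Unit:
    """Unit that always emits a fixed string."""
    return lambda _value: text


def split_part(sep: str, index: int) -> Unit:
    """Unit that splits the input on `sep` and emits the piece at `index`."""
    def unit(value: str) -> str:
        parts = value.split(sep)
        # Emit an empty string when the requested piece does not exist.
        return parts[index] if -len(parts) <= index < len(parts) else ""
    return unit


def prefix(unit: Unit, length: int) -> Unit:
    """Unit that keeps only the first `length` characters of another unit."""
    return lambda value: unit(value)[:length]


def apply_transformation(units: List[Unit], value: str) -> str:
    """Concatenate the output of every unit applied to `value`."""
    return "".join(unit(value) for unit in units)


def transform_join(source: List[str], target: List[str],
                   units: List[Unit]) -> List[Tuple[str, str]]:
    """Equi-join: pair each source value with the target value it maps to."""
    target_values = set(target)
    pairs = []
    for value in source:
        transformed = apply_transformation(units, value)
        if transformed in target_values:
            pairs.append((value, transformed))
    return pairs


# Hypothetical data: "Last, First" in the source, "F. Last" in the target.
source_column = ["Smith, John", "Doe, Jane", "Lee, Ann"]
target_column = ["J. Smith", "J. Doe", "B. Brown"]

# First initial + ". " + last name.
name_units = [prefix(split_part(", ", 1), 1), literal(". "), split_part(", ", 0)]

matches = transform_join(source_column, target_column, name_units)
print(matches)  # [('Smith, John', 'J. Smith'), ('Doe, Jane', 'J. Doe')]
```

In this sketch the transformation is written by hand; in the repository, main.py is the file that extracts transformations, and transformation_joiner.py is the one that applies them.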
Please cite the paper if you use the code in this repository.
@INPROCEEDINGS{efficient.tr.2022,
author={Dargahi Nobari, Arash and Rafiei, Davood},
booktitle={2022 IEEE 38th International Conference on Data Engineering (ICDE)},
title={Efficiently Transforming Tables for Joinability},
year={2022},
volume={},
number={},
pages={1649-1661},
doi={10.1109/ICDE53745.2022.00169}
}