A framework to extract string-based transformations for data integration.

arashdn/table-string-transformer

Efficiently Transforming Tables for Joinability

This repository contains resources developed within the following paper:

A. Dargahi Nobari and D. Rafiei. “Efficiently Transforming Tables for Joinability”.

You may check the paper (PDF) or the extended version for more information.

Requirements

Python 3.7 or later is sufficient to run the code; no additional libraries are required.

Usage

The source files are located in the src directory:

  • join.py: A simple example of an end-to-end join with our approach. For advanced settings and detailed output on the transformations and the join process, use main.py and transformation_joiner.py instead.

  • main.py: The main entry point for running our approach, or our implementation of auto-join, on a set of tables. It can be called with no command-line arguments, in which case default values are used for all parameters.

    To set the method, parameters, and paths, pass a JSON config file as a command-line argument:

    python3 src/main.py -c config_sample.json

    A sample config file is provided in config_sample.json. The extracted transformations are stored in the output path defined in the config file, or in the default output directory if none is defined.

    main.py only generates the transformations. To apply them to a table join, use transformation_joiner.py.

  • transformation_joiner.py: Parses the transformations generated by main.py and uses them to join tables.

  • dataset_generator.py: Generates synthetic datasets from the parameters given as command-line arguments. Use the --help flag to list the arguments and their details:

    python3 src/dataset_generator.py --help

  • auto_fuzzy_join.py: Applies the join using the code provided by the Auto-FuzzyJoin authors. This is a baseline method and not part of our approach.
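The two-step workflow described above (first extract string transformations, then use them to join tables) can be illustrated with a small, self-contained sketch. This is not the repository's code and does not reflect its actual API or output format: all names here (`split_part`, `prefix`, `literal`, `apply_transformation`, `transform_join`) are hypothetical, and the transformation is written by hand rather than discovered automatically. It only shows, under those assumptions, how a transformation built from simple string units can make two differently formatted columns joinable.

```python
# Illustrative sketch only: every name below is hypothetical and none of it
# is taken from the repository's source files.
from typing import Callable, List, Tuple

# A unit maps the whole input string to one piece of the output string.
Unit = Callable[[str], str]


def split_part(sep: str, index: int) -> Unit:
    """Unit that splits the input on `sep` and keeps the stripped piece at `index`."""
    def unit(value: str) -> str:
        parts = value.split(sep)
        if -len(parts) <= index < len(parts):
            return parts[index].strip()
        return ""  # the requested piece does not exist for this input
    return unit


def prefix(inner: Unit, length: int) -> Unit:
    """Unit that keeps the first `length` characters of another unit's output."""
    return lambda value: inner(value)[:length]


def literal(text: str) -> Unit:
    """Unit that ignores the input and always emits `text`."""
    return lambda _value: text


def apply_transformation(units: List[Unit], value: str) -> str:
    """Concatenate the outputs of all units applied to `value`."""
    return "".join(unit(value) for unit in units)


def transform_join(
    source: List[str], target: List[str], units: List[Unit]
) -> List[Tuple[str, str]]:
    """Equi-join: pair each source value with the target value its transformed form equals."""
    target_values = set(target)
    pairs = []
    for value in source:
        transformed = apply_transformation(units, value)
        if transformed in target_values:
            pairs.append((value, transformed))
    return pairs


# Hand-written transformation for "Last, First" -> "F. Last".
units = [prefix(split_part(",", 1), 1), literal(". "), split_part(",", 0)]
source = ["Rafiei, Davood", "Dargahi Nobari, Arash", "Unmatched, Person"]
target = ["D. Rafiei", "A. Dargahi Nobari"]
matches = transform_join(source, target, units)
```

In this sketch, `matches` contains the two source rows whose transformed value appears in the target column, and the third row is left unjoined. In the repository, finding such transformations from example tables is the job of main.py, and applying them is the job of transformation_joiner.py.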

Citation

Please cite the paper if you use the code in this repository.

@INPROCEEDINGS{efficient.tr.2022,
  author={Dargahi Nobari, Arash and Rafiei, Davood},
  booktitle={2022 IEEE 38th International Conference on Data Engineering (ICDE)},
  title={Efficiently Transforming Tables for Joinability},
  year={2022},
  pages={1649-1661},
  doi={10.1109/ICDE53745.2022.00169}
}
