
raaidrizwan/alite-implementation

 
 


Getting Started

All code should be run through the Jupyter notebooks.

Environment Setup (done through VS Code on a Windows machine)

  1. Configure your Jupyter notebook to use Python 3.9.2
  2. Run the pip install code snippet in the test.ipynb file to install all of the necessary modules.
    a. Ensure the modules install successfully; if they do not, use the error message to resolve the problem.
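
Before running the install snippet, it can help to confirm that the notebook kernel is actually on the expected interpreter. The following is a minimal sketch, not part of the repository: `python_version_ok` is a hypothetical helper, and it assumes that matching the major and minor version (3.9) is sufficient.

```python
import sys

REQUIRED = (3, 9)  # major.minor from step 1 above (assumption: the patch level is not critical)


def python_version_ok(required=REQUIRED, actual=None):
    """Return True when the interpreter's major.minor version matches `required`."""
    actual = tuple(sys.version_info[:2]) if actual is None else tuple(actual[:2])
    return actual == tuple(required)


# Report the kernel's version so a mismatch is visible before installing packages.
status = "OK" if python_version_ok() else "expected 3.9.x"
print(f"Python {sys.version.split()[0]}: {status}")
```

If the check reports a mismatch, select a different kernel in VS Code before continuing.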

Running Testing Code

  1. Open test.ipynb
  2. Run the first code snippet to set up autoreload
  3. Run the second code snippet (the pip install one) to install packages if not done already
  4. Run the third code snippet to import the relevant custom libraries
  5. Run either the fourth or the fifth code snippet
    1. Fourth code snippet has a large dataset for real-life code execution (school_report)
    2. Fifth code snippet has a small dataset for quick code execution (500spend)
  6. After running the desired code snippet, output files will be generated showing the state of the table at each major step
    1. The final step produces the 5 - PostSubsumption... files.
    2. A CSV with the final table is provided along with a .txt file with general information on the table
  7. The console output in the notebook should provide the execution time
  8. The sixth and seventh code snippets graph information from individual runs
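
To illustrate the kind of per-step output described in step 6, here is a rough sketch of writing a table snapshot as a CSV plus a short .txt summary. It is not the repository's code: `write_snapshot`, the file naming, and the summary fields are all assumptions, and `None` is used to stand in for a null cell.

```python
import csv
import tempfile
from pathlib import Path


def write_snapshot(rows, columns, out_dir, step_name):
    """Write `rows` to <step_name>.csv and a short summary to <step_name>.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{step_name}.csv"
    txt_path = out_dir / f"{step_name}.txt"

    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)
        for row in rows:
            # None stands in for a null cell and is written as an empty field.
            writer.writerow(["" if value is None else value for value in row])

    null_cells = sum(value is None for row in rows for value in row)
    txt_path.write_text(
        f"rows: {len(rows)}\ncolumns: {len(columns)}\nnull cells: {null_cells}\n",
        encoding="utf-8",
    )
    return csv_path, txt_path


# Example usage with a throwaway directory and a hypothetical step name.
with tempfile.TemporaryDirectory() as tmp:
    csv_file, txt_file = write_snapshot(
        rows=[("a", 1, None), ("b", None, "x")],
        columns=["id", "count", "label"],
        out_dir=tmp,
        step_name="example_step",
    )
    print(txt_file.read_text(encoding="utf-8"))
```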

Running Benchmarking Code

  1. Open test_suite.ipynb
  2. Run the first code snippet to install packages if not already done
  3. Run the rest of the code snippets sequentially to obtain run statistics and graphs
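
As a sketch of what "run statistics" can look like, the helper below times a callable over several runs and summarizes the results. It is only an illustration using the standard library: `benchmark` and its returned keys are hypothetical names, and the actual notebook may collect different metrics.

```python
import statistics
import time


def benchmark(func, runs=5):
    """Call `func` `runs` times and return basic wall-clock statistics in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(times),
        "min": min(times),
        "max": max(times),
        # The sample standard deviation needs at least two measurements.
        "stdev": statistics.stdev(times) if runs > 1 else 0.0,
    }


# Example: time a small placeholder workload.
stats = benchmark(lambda: sum(range(100_000)), runs=5)
print({key: round(value, 6) for key, value in stats.items()})
```

Repeating each measurement and reporting the spread makes one-off slow runs easier to spot when comparing datasets.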

Implementation of Alite FD Algorithm

Original paper to be implemented - https://www.vldb.org/pvldb/vol16/p932-khatiwada.pdf

Original paper Github repository - https://github.com/northeastern-datalab/alite

Original authors' presentation - https://youtu.be/4c6SYCwQ7uc?si=qE36Hm70qaJAa8Hz
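
The PostSubsumption output files mentioned above suggest that the pipeline ends with a subsumption step. As a minimal sketch of that idea, using the standard definition rather than this repository's implementation: a tuple subsumes another when it agrees on every non-null value of the other and has strictly more non-null values. The function names are hypothetical, `None` represents a null, and the quadratic scan is for illustration only.

```python
def subsumes(t1, t2):
    """Return True if t1 subsumes t2 (same length, None marks a null)."""
    # t1 must match t2 wherever t2 has a value.
    agrees = all(b is None or a == b for a, b in zip(t1, t2))
    # t1 must carry strictly more information than t2.
    more = sum(v is not None for v in t1) > sum(v is not None for v in t2)
    return agrees and more


def remove_subsumed(tuples):
    """Drop exact duplicates and every tuple subsumed by another tuple."""
    unique = list(dict.fromkeys(tuples))  # keeps first-seen order
    return [t for t in unique if not any(subsumes(other, t) for other in unique)]


rows = [
    ("a", 1, None),
    ("a", 1, "x"),      # subsumes the row above
    ("b", None, None),
    ("c", 2, None),
]
print(remove_subsumed(rows))
```

Here `("a", 1, None)` is dropped because `("a", 1, "x")` contains all of its values plus one more; the other rows are kept because no tuple agrees with them on every non-null value.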

Etc.

Other Literature

Origin of Full Disjunction? - https://dl.acm.org/doi/10.1145/191839.191908

Other FD paper - https://dl.acm.org/doi/10.1145/237661.237717

IncrementalFD - https://www.sciencedirect.com/science/article/pii/S0022000006001449

ParaFD - https://www.sciencedirect.com/science/article/pii/S2214579618303137

BIComNLoj - https://dl.acm.org/doi/10.5555/1182635.1164191

Benchmark Data

Use data located here - https://drive.google.com/drive/folders/1yUgL8TjQievzp8zvmHLpa_ClNzc5mTmD
