Project Description: This project is a Python FastAPI application that exposes a small HTTP API. It provides endpoints for checking the application status and for extracting features from a dataset. Within a complete ML project, it handles one link in the chain: automatic feature extraction for a dataset.
Run with Docker Compose:
- Clone the repo: `git clone https://github.com/mordilos/feature_engineering.git`
- Navigate to the repo directory: `cd feature_engineering`
- Build and run the app with `compose.yml`: `docker-compose -f compose.yml up`
Run without Docker Compose:
- Clone the repository: `git clone https://github.com/mordilos/feature_engineering.git`
- Navigate to the project directory: `cd feature_engineering`

Then either install everything locally or build a Docker image.

Locally install everything:
- Create a new venv: `python3 -m venv venv_name`
- Activate the venv: `source venv_name/bin/activate`
- Install the dependencies: `pip install -r requirements.txt`
- Start the FastAPI server: `python src/main.py`
- Access the endpoints using a web browser or an API client.
- You can also run the tests in the test folder (`test_main.py`).
Use the Dockerfile:
- Build the image: `docker build -t feature_engineering_image .`
- Create and run the container: `docker run -d --name fe_container -p 8000:8000 feature_engineering_image`
Documentation endpoint:
- Description: Returns the built-in Swagger documentation.
- URL: `http://localhost:8000/docs`
- Method: `GET`
Root endpoint:
- Description: Returns all the available endpoints of the app.
- URL: `http://localhost:8000/`
- Method: `GET`
- Response: `{ "endpoints": [ "/", "/status", "/features_file", "/features_json" ] }`
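A minimal Python sketch of a client for the root endpoint, using only the standard library. It assumes the app is reachable at `http://localhost:8000`, as in the run commands above; the helper names `get_json`, `parse_endpoints`, and `list_endpoints` are hypothetical and not part of this project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: default host/port used in this README


def get_json(url: str, timeout: float = 5.0) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_endpoints(payload: dict) -> list[str]:
    """Extract the endpoint paths from the root endpoint's response."""
    endpoints = payload.get("endpoints", [])
    if not isinstance(endpoints, list):
        raise ValueError("expected 'endpoints' to be a list")
    return [str(path) for path in endpoints]


def list_endpoints(base_url: str = BASE_URL) -> list[str]:
    """Query GET / and return the advertised endpoint paths."""
    return parse_endpoints(get_json(base_url + "/"))
```

With the server running, `list_endpoints()` should return the paths shown in the response above. Parsing is kept separate from the network call so it can be checked without a live server.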
Status endpoint:
- Description: Returns the status of the application.
- URL: `http://localhost:8000/status`
- Method: `GET`
- Response: `{ "status": "UP" }`
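When the container is started in the background (for example with `docker run -d`), a client may need to wait until the app is ready before sending requests. The sketch below polls the status endpoint until it reports `"UP"`, assuming the default `http://localhost:8000` address. The function names and the retry and delay defaults are hypothetical choices, not part of this project; the status fetcher is passed in as a parameter so it can be replaced, e.g. in tests.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> dict:
    """Call GET /status and return the decoded JSON body."""
    with urllib.request.urlopen(base_url + "/status", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_up(
    fetch: Callable[[], dict] = fetch_status,
    retries: int = 10,
    delay: float = 1.0,
) -> bool:
    """Poll the status endpoint until it reports "UP" or the retries run out."""
    for attempt in range(retries):
        try:
            if fetch().get("status") == "UP":
                return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # server not accepting connections yet; try again
        if attempt < retries - 1:
            time.sleep(delay)  # no sleep after the final attempt
    return False
```

A typical use is `if not wait_until_up(): raise SystemExit("app did not start")` right after starting the container.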
Feature extraction from a file:
- Description: Automatic feature extraction for data given in a JSON file.
- URL: `http://localhost:8000/features_file`
- Method: `POST`
- Request body: `{ "file": "<path-to-json-file>", "feature_selection": ["keyword1,keyword2"] }`
  - `file` (required): the JSON file containing the user data (the CLI example in the usage steps uploads it as multipart form data).
  - `feature_selection` (optional): comma-separated methods used to filter the extracted features: `highly_null_features`, `single_value_features`, `highly_correlated_features`. You can pass zero to three of these values. They are based on https://featuretools.alteryx.com/en/stable/guides/feature_selection.html
- Response: `{ "feature_matrix": "<extracted-feature-matrix>" }`
  - `feature_matrix`: JSON representation of the extracted feature matrix.
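A minimal standard-library sketch of uploading a data file to this endpoint, mirroring the `file` and `feature_selection` form fields of the curl example in the usage steps. It assumes the endpoint accepts multipart form data at the URL documented above and returns JSON; the helper names are hypothetical, and the validation only checks the three method names listed in this README.

```python
import json
import os
import urllib.request
import uuid

# The three methods listed for `feature_selection` in this README.
ALLOWED_METHODS = {
    "highly_null_features",
    "single_value_features",
    "highly_correlated_features",
}


def build_feature_selection(methods: list[str]) -> str:
    """Validate the chosen methods and join them into one comma-separated value."""
    unknown = [m for m in methods if m not in ALLOWED_METHODS]
    if unknown:
        raise ValueError(f"unknown feature selection methods: {unknown}")
    return ",".join(methods)


def encode_multipart(
    fields: dict[str, str], file_field: str, filename: str, content: bytes
) -> tuple[bytes, str]:
    """Encode text fields plus one JSON file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/json",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",  # adds a trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def post_features_file(
    path: str, methods: list[str], base_url: str = "http://localhost:8000"
) -> dict:
    """Upload a JSON data file to /features_file, like the curl example."""
    with open(path, "rb") as f:
        content = f.read()
    fields = {"feature_selection": build_feature_selection(methods)} if methods else {}
    body, content_type = encode_multipart(fields, "file", os.path.basename(path), content)
    request = urllib.request.Request(
        base_url + "/features_file",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `post_features_file("cvas_data.json", ["highly_null_features", "single_value_features"])` would send the same kind of request as the curl command, with two of the three methods.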
Feature extraction from JSON data:
- Description: Automatic feature extraction for data given in JSON form.
- URL: `http://localhost:8000/features_json`
- Method: `POST`
- Request body: `{ "data": [ { "customer_ID": "string", "loans": [ { "customer_ID": "string", "loan_date": "string", "amount": "string", "fee": "string", "loan_status": "string", "term": "string", "annual_income": "string" } ] } ], "feature_selection": [ "string" ] }`
  - `data` (required): the customer records, each with its list of loans, in the JSON structure shown above.
  - `feature_selection` (optional): list of strings specifying methods used to filter the extracted features: `highly_null_features`, `single_value_features`, `highly_correlated_features`. You can pass zero to three of these values. They are based on https://featuretools.alteryx.com/en/stable/guides/feature_selection.html
- Response:
```json
{
  "feature_matrix": "<extracted-feature-matrix>"
}
```
`feature_matrix`: JSON representation of the extracted feature matrix.
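A minimal standard-library sketch of building and sending a request body for this endpoint. The sample record follows the schema above, but all of its values are made-up placeholder strings, not real data. The helper names are hypothetical, and the sketch assumes each selection method is sent as a separate list element, which the schema above (`[ "string" ]`) suggests but does not state.

```python
import json
import urllib.request
from typing import Optional

# Placeholder record following the request schema; all values are illustrative.
SAMPLE_RECORD = {
    "customer_ID": "C001",
    "loans": [
        {
            "customer_ID": "C001",
            "loan_date": "2023-01-15",
            "amount": "1000",
            "fee": "50",
            "loan_status": "paid",
            "term": "12",
            "annual_income": "40000",
        }
    ],
}


def build_features_json_payload(
    records: list[dict], methods: Optional[list[str]] = None
) -> dict:
    """Wrap customer records (and optional selection methods) in the request body."""
    payload: dict = {"data": records}
    if methods:
        # Assumption: each method is sent as a separate list element.
        payload["feature_selection"] = list(methods)
    return payload


def post_features_json(payload: dict, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload to /features_json and decode the JSON response."""
    request = urllib.request.Request(
        base_url + "/features_json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `post_features_json(build_features_json_payload([SAMPLE_RECORD]))` would return a dictionary whose `feature_matrix` key holds the extracted features.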
Usage:
- Start the FastAPI server.
- Open a web browser or an API client.
- Go to `http://localhost:8000/docs` for the built-in Swagger documentation. From there you can test all the other endpoints.
- Send a GET request to `http://localhost:8000/` to get all available endpoints.
- Send a GET request to `http://localhost:8000/status` to check the application status.
- Send a POST request to `http://localhost:8000/features_file` with the JSON data file and, optionally, the feature selection methods to apply.
  CLI example that uses all the feature selection methods: `curl -X POST 'http://localhost:8000/features_file' -H 'accept: application/json' -H 'Content-Type: multipart/form-data' -F 'file=@cvas_data.json;type=application/json' -F 'feature_selection=highly_null_features,single_value_features,highly_correlated_features'`
- Receive the extracted feature matrix in the response.