Cross validation of machine-learning models on Faculty platform.
At present, the package mainly offers a way to cross-validate models in parallel by means of Faculty jobs. This functionality is accessed through the class:
faculty_xval.validation.JobsCrossValidator
Additional information can be found in the example notebooks provided; please
see the section Try out the examples below.
The package supports keras and sklearn models. Whilst one can write custom
models that are compatible with faculty-xval, no guarantee is given that the
package handles these situations correctly, in particular because of issues
concerning the randomisation of weights.
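One plausible reading of the weight-randomisation concern is that each train-test split should start from a freshly initialised model, so that nothing learned on one fold leaks into the next. The sketch below illustrates that idea only; ToyModel, make_model and train_per_fold are hypothetical names and are not part of faculty-xval.

```python
import random


class ToyModel:
    """Hypothetical stand-in for a custom model; not part of faculty-xval."""

    def __init__(self, seed=None):
        # The weight is randomly initialised, as it would be in a neural network.
        self.weight = random.Random(seed).random()
        self.n_updates = 0

    def fit(self, xs, ys):
        # Crude one-pass update, just enough to make training stateful.
        for x, y in zip(xs, ys):
            self.weight += 0.1 * (y - self.weight * x) * x
            self.n_updates += 1
        return self


def train_per_fold(make_model, folds):
    """Train one freshly constructed model per fold.

    make_model is a zero-argument factory; calling it for every fold
    guarantees that no fitted state is carried over between folds.
    """
    return [make_model().fit(xs, ys) for xs, ys in folds]


folds = [([1.0, 2.0], [2.0, 4.0]), ([3.0], [6.0])]
models = train_per_fold(ToyModel, folds)
print([m.n_updates for m in models])
```

A custom model that cannot be rebuilt or re-randomised in this way would reuse weights across folds, which would make the cross-validation scores unreliable.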
Two sets of installation instructions are provided below:
- If you would like to simply use faculty-xval, please follow the User installation instructions.
- If you would like to develop faculty-xval further, please follow the Developer installation instructions.
User installation instructions
In your project on Faculty platform, create an environment named faculty_xval.
In the PYTHON section, select Python 3 and pip from the dropdown menus.
Then, type faculty-xval in the text box, and click on the ADD button.
The environment installs the package faculty-xval, and should be applied on
every server that you create; this includes both interactive servers and job
servers, as explained next.
Create a new job definition named cross_validation. In the COMMAND section,
paste the following:
faculty_xval_jobs_xval $in_paths
Then, add a PARAMETER with the name in_paths, and ensure that the
Make field mandatory box is checked.
Finally, under SERVER SETTINGS, add faculty_xval to the ENVIRONMENTS
section.
For cross-validation jobs that are computationally intensive, we recommend using
dedicated servers as opposed to running on shared infrastructure. To achieve
this, click on Large and GPU servers under SERVER RESOURCES, and select an
appropriate server type from the dropdown menu.
Remember to click SAVE when you are finished.
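To make the role of the in_paths parameter more concrete, the following sketch mimics shell-style substitution of $in_paths into the job command using Python's string.Template. It is an illustration only: how Faculty platform actually expands job parameters is an assumption here, and render_command and the example path are hypothetical.

```python
from string import Template

# The job command as entered in the COMMAND section.
COMMAND = "faculty_xval_jobs_xval $in_paths"


def render_command(in_paths):
    """Return the command with $in_paths replaced by the given value."""
    return Template(COMMAND).substitute(in_paths=in_paths)


# Hypothetical parameter value, chosen purely for illustration.
print(render_command("/project/example/in_paths"))

# Without a value the command cannot be completed, which is one way
# to see why the parameter is marked as mandatory.
try:
    Template(COMMAND).substitute()
except KeyError as error:
    print("missing parameter:", error)
```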
Developer installation instructions
Before beginning the installation process, pick an appropriate username, such as
foo. This does not necessarily need to match your Faculty platform username.
In the following instructions, your selected username will be referred to as
<USER_NAME>.
Create the folder /project/<USER_NAME>. Then, run the commands:
cd /project/<USER_NAME>
git clone https://github.com/facultyai/faculty-xval.git
Next, create an environment in your project named faculty_xval_<USER_NAME>.
In this environment, under SCRIPTS, paste the following code into the BASH
section, remembering to change the USER_NAME definition on the second line to
your selected <USER_NAME>:
# Remember to change username!
USER_NAME=<USER_NAME>
# Install faculty-xval from local repository.
pip install /project/$USER_NAME/faculty-xval/
# Turn USER_NAME into an environment variable.
echo "export USER_NAME=$USER_NAME" > /etc/faculty_environment.d/app.sh
if [[ -d /etc/service/jupyter ]] ; then
    sudo sv restart jupyter
fi
This environment should be applied on every server that you create; this includes both 'normal' interactive servers and job servers, as explained next.
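As a rough illustration of how the exported USER_NAME variable might be used from Python, for example in a notebook on a server with this environment applied, the sketch below rebuilds the path of the local clone. The helper local_repo_path is hypothetical and not part of faculty-xval.

```python
import os
from pathlib import PurePosixPath


def local_repo_path(environ=None):
    """Return /project/<USER_NAME>/faculty-xval, based on USER_NAME."""
    environ = os.environ if environ is None else environ
    user_name = environ.get("USER_NAME")
    if not user_name:
        raise RuntimeError(
            "USER_NAME is not set; has the environment been applied?"
        )
    return PurePosixPath("/project") / user_name / "faculty-xval"


# Passing a mapping explicitly avoids depending on the real environment.
print(local_repo_path({"USER_NAME": "foo"}))
```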
Next, create a new job definition named cross_validation_<USER_NAME>. In the
COMMAND section, paste the following:
faculty_xval_jobs_xval $in_paths
Then, add a PARAMETER with the name in_paths, and ensure that the
Make field mandatory box is checked.
Finally, under SERVER SETTINGS, add faculty_xval_<USER_NAME> to the
ENVIRONMENTS section.
For cross-validation jobs that are computationally intensive, we recommend using
dedicated servers as opposed to running in the cluster. To achieve this, click
on Large and GPU servers under SERVER RESOURCES, and select an appropriate
server type from the dropdown menu.
Remember to click SAVE when you are finished.
Try out the examples
Please clone this repository. Examples of cross validation with faculty-xval
for the different types of model are provided in the directories
examples/keras and examples/sklearn. Usage instructions are divided between
two notebooks:
- jobs_cross_validator_run.ipynb loads the data, instantiates the model, and starts a Faculty job that carries out the cross validation.
- jobs_cross_validator_analyse.ipynb gathers the results from the cross validation, reloads the target data, and calculates the model accuracy over multiple train-test splits.
Note that the example notebooks must be run in the order just defined.
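For readers unfamiliar with the quantity computed in the analysis step, here is a minimal, standard-library-only sketch of accuracy over multiple train-test splits. It does not use faculty-xval or reproduce the notebooks; the helper names and the majority-class "model" are assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, n_splits, seed=0):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test in enumerate(folds):
        train = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the targets."""
    return mean(1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred))


def majority_class(labels):
    """Most frequent label; ties are broken by the smallest label."""
    return max(sorted(set(labels)), key=labels.count)


def cross_validate(y, n_splits=5):
    """Score a majority-class baseline on every train-test split."""
    scores = []
    for train, test in k_fold_indices(len(y), n_splits):
        prediction = majority_class([y[i] for i in train])
        y_test = [y[i] for i in test]
        scores.append(accuracy(y_test, [prediction] * len(test)))
    return scores


targets = [0] * 8 + [1] * 2
scores = cross_validate(targets, n_splits=5)
print("per-split accuracy:", scores)
print("mean accuracy:", mean(scores))
```

Reporting the per-split scores alongside their mean gives a sense of how much the estimate varies from one split to the next.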
