SDAESampling
Steps to install:
**** To install liblinear locally: run make in the pyliblinear folder.
**** Add the folder pyliblinear/python to your PYTHONPATH.
**** In pyliblinear/python/linear.py, hard-code the path on line 19 to your pyliblinear/liblinear.so.1 file.
**** In DAESampling.py, hard-code your pyliblinear/python/ path on line 14.
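For example, the two edits might look like the following (the absolute paths are placeholders for your own checkout location, and the CDLL line assumes the wrapper loads the shared library with ctypes, as stock liblinear does):

# pyliblinear/python/linear.py, around line 19: load the shared library
# from an absolute path (placeholder path, substitute your own):
from ctypes import CDLL
liblinear = CDLL('/absolute/path/to/pyliblinear/liblinear.so.1')

# DAESampling.py, around line 14: make the liblinear wrapper importable
# (placeholder path, substitute your own):
import sys
sys.path.insert(0, '/absolute/path/to/pyliblinear/python/')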
You can run an experiment on a lab machine or on Condor.
The parameters of the experiment script are:
----path_data: path to the training data file.
For the small Amazon: amazon/all-domain-lab_unlab-#N.vec, where #N is the number of input dimensions (I have created 50 for fast testing, plus 5000, 10000, 25000, 50000).
For the big Amazon: fullamazon/pylibsvm-data/all-domain-train-featsz=#N.vec, where #N must be one of 50, 5000, 25000, 50000, 100000 (note that these sizes differ from the small set).
----path_data_test: path to the test data file.
For the small Amazon: amazon/pylibsvm-data/all-domain-test-#N.vec (same #N values as for small-Amazon training).
For the big Amazon: fullamazon/pylibsvm-data/in-domain-test-selected-featsz=#N.vec (same #N values as for big-Amazon training).
----seed: the random seed.
----zeros=0.8 (the default I use): the probability of masking an input to zero in the corruption noise (see the sketch after this list).
----ones=0.013 (for 5000 inputs) or 0.0035 (for 25000) (the defaults I use): the probability of switching a unit to one in the corruption noise.
----n_inp: number of input dimensions.
----n_hid: number of hidden units (I use 5000).
----lr: unsupervised learning rate.
----pattern: sampling pattern: 'inp' (ones in the input + random), 'inpnoise' (ones in the input + added noise ones + random), 'noise' (corrupted units + random), or None (no sampling). I only use 'inpnoise' and None (the latter for dense experiments); see the sketch after this list.
----ratio: ratio of reconstruction units to sample randomly.
----batchsize: training batch size (I use 10).
----batchsizeerr: batch size for error evaluation; 1600 for the small Amazon, 1139 for the big one.
----nepochs: maximum number of unsupervised training epochs.
----epochs: list of epochs at which to run test reconstruction and SVM evaluation.
----cost: 'CE' or 'MSE' ('CE' works best).
----act: 'rect' or 'sigmoid' ('rect' works best).
----scaling: False (no scaling in the cost) or a tuple of the form (weight on 1s, weight on 0s) to unbias the cost (see the sketch after this list).
----small: True if you use the small Amazon, False for the big one.
----regcoef: L1 regularization coefficient (at the moment I set it to 0).
----folds=5: number of folds for SVM training (max 5; see the cross-validation example at the end).
----dense_size: 7000 for the small Amazon, 2278 for the big one (only used for dense experiments).
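To make the corruption and sampling parameters concrete, here is a minimal NumPy sketch of one way zeros, ones, pattern, ratio, and scaling fit together. It is an illustration only, not the code in DAESampling.py; the function names and the exact semantics of each pattern are assumptions.

import numpy as np

rng = np.random.RandomState(1)  # plays the role of `seed`

def corrupt(x, zeros=0.8, ones=0.013):
    # Corruption noise: mask each unit to 0 with probability `zeros`,
    # and independently switch each unit to 1 with probability `ones`.
    keep = rng.binomial(1, 1.0 - zeros, size=x.shape)
    salt = rng.binomial(1, ones, size=x.shape)
    return np.maximum(x * keep, salt).astype(x.dtype)

def sampled_units(x, x_tilde, ratio=0.02, pattern='inpnoise'):
    # Choose the subset of reconstruction units the cost is computed on.
    n_inp = x.shape[0]
    idx = set(rng.choice(n_inp, int(ratio * n_inp), replace=False))  # random part
    if pattern in ('inp', 'inpnoise'):
        idx |= set(np.flatnonzero(x))        # ones in the input
    if pattern in ('inpnoise', 'noise'):
        idx |= set(np.flatnonzero(x_tilde))  # units active after corruption
    return np.sort(np.fromiter(idx, dtype=int))

def weighted_ce(x, x_hat, scaling):
    # Cross-entropy with separate weights on the 1s and the 0s, matching
    # the (weight on 1, weight on 0) form of `scaling`; scaling=False
    # falls back to the unweighted cost.
    w1, w0 = scaling if scaling else (1.0, 1.0)
    return -np.sum(w1 * x * np.log(x_hat) + w0 * (1 - x) * np.log(1 - x_hat))

x = rng.binomial(1, 0.01, 5000).astype('float32')  # sparse binary input
x_tilde = corrupt(x)
print(len(sampled_units(x, x_tilde)), 'of', x.shape[0], 'units sampled')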
Example of a fast test for a sparse experiment:
THEANO_FLAGS=device=cpu,floatX=float32 jobman cmdline DAESampling.SamplingsparseSDAEexp seed=1 path_data=amazon/all-domain-lab_unlab-50.vec ninputs=50 path_data_test=amazon/all-domain-test-50.vec zeros=0.8 ones=0.013 n_inp=50 n_hid=50 lr=0.01 pattern='inpnoise' ratio=0.02 scaling=False batchsize=10 batchsizeerr=1600 nepochs=10 epochs=[1,5,10] small=True regcoef=0. folds=2 act='rect' cost='CE' dense_size=7000
Example of a fast test for a dense experiment (GPU only):
THEANO_FLAGS=device=gpu,floatX=float32 jobman cmdline DAESampling.SamplingdenseSDAEexp seed=1 path_data=amazon/all-domain-lab_unlab-50.vec ninputs=50 path_data_test=amazon/all-domain-test-50.vec zeros=0.8 ones=0.013 n_inp=50 n_hid=50 lr=0.01 pattern='inpnoise' ratio=0.02 scaling=False batchsize=10 batchsizeerr=1600 nepochs=10 epochs=[1,5,10] small=True regcoef=0. folds=2 act='rect' cost='CE' dense_size=7000
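After unsupervised training, the script evaluates the learned features with an SVM trained through liblinear. A hypothetical minimal example of the kind of call involved, using liblinear's bundled Python interface (liblinearutil) on toy data; the feature-extraction step from the autoencoder is omitted:

import random
from liblinearutil import train

random.seed(1)
# Toy labels and sparse features ({feature_index: value} dicts); in the
# real pipeline these would be the learned hidden representations.
y = [random.choice([1, -1]) for _ in range(20)]
feats = [{j: random.random() for j in range(1, 6)} for _ in range(20)]

# 5-fold cross-validation, mirroring folds=5; with -v, train() returns
# the cross-validation accuracy instead of a model.
acc = train(y, feats, '-v 5 -q')
print('CV accuracy:', acc)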