I wrote an attention-based seq2seq model for neural machine translation. It can run on multiple GPUs (a single PC with multiple GPUs).
The data was downloaded from nlp.stanford.edu/projects/nmt/ and the model was trained on the small English-Vietnamese dataset. Pickled data is available if you want it.
- build_dict.py: preprocesses the dataset. It converts the input dataset into pickle files: strings are mapped to int32 ids and sentences are filtered by length (3~50). Adjust the file paths to your setup.
- config.py: model parameters.
- model_topbah.py: Bahdanau attention on the top layers of the encoder and decoder.
- train_vi.py: entry point for training. Set your own parameters at the beginning of this file. At line 37, set the GPU ids like gpus = "5,6,7" (no spaces in the string).
- gpuloader.py: data loader for multi-GPU training.
- dataloader.py: data loader that feeds data into tf.placeholder. Not used.
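The repo's own preprocessing lives in build_dict.py; the sketch below is not that script, just a minimal illustration of the steps described above (whitespace tokenization, string-to-int32 ids, length filter 3~50, pickled output). The file names, special tokens, and dictionary format are assumptions.

```python
# Minimal preprocessing sketch (NOT build_dict.py itself).
# Assumptions: one sentence per line, whitespace tokenization, placeholder file names.
import pickle
from collections import Counter

import numpy as np

def build_vocab(path, max_size=50000):
    """Count tokens and keep the most frequent ones; ids start after the specials."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.split())
    specials = ["<pad>", "<unk>", "<s>", "</s>"]
    words = specials + [w for w, _ in counts.most_common(max_size)]
    return {w: i for i, w in enumerate(words)}

def encode_corpus(path, word2id, min_len=3, max_len=50):
    """Convert each sentence to int32 ids, dropping sentences outside 3~50 tokens."""
    data = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if not (min_len <= len(tokens) <= max_len):
                continue
            ids = [word2id.get(t, word2id["<unk>"]) for t in tokens]
            data.append(np.asarray(ids, dtype=np.int32))
    return data

if __name__ == "__main__":
    vocab = build_vocab("train.en")              # adjust paths to your setup
    encoded = encode_corpus("train.en", vocab)
    with open("train.en.pkl", "wb") as f:        # pickle file consumed by the loaders
        pickle.dump({"vocab": vocab, "data": encoded}, f)
```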
- You can try Luong attention as well, but I did not get good results with it; it is not as easy to train. RMSProp and Adam need a small learning rate (around 0.001), while SGD needs a much larger one (around 1.0), and SGD is much harder to train (see the optimizer sketch after this list).
- Best result: output att=False, RMSProp (lr=0.001, start_decay=8000, decay factor 0.8), BLEU = 20.5% on tst2012.vi without beam search.
- The decode phase has not been tested.
- I spent a lot of time getting multi-GPU training to work: how to feed in the data, and how to compute the loss and gradients across GPUs (a tower-style sketch is shown below).
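A minimal sketch of the optimizer setup behind the best run above: RMSProp with lr=0.001 and a 0.8 decay that starts at step 8000. The decay interval (every 1000 steps here) and the use of tf.train.exponential_decay are my assumptions; check config.py / train_vi.py for the actual schedule.

```python
import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
start_decay, decay_every, decay_factor = 8000, 1000, 0.8  # interval of 1000 is an assumption

learning_rate = tf.train.exponential_decay(
    learning_rate=0.001,
    global_step=tf.maximum(global_step - start_decay, 0),  # no decay before step 8000
    decay_steps=decay_every,
    decay_rate=decay_factor,
    staircase=True)

# stand-in loss so the sketch builds on its own; the real loss comes from the model
w = tf.get_variable("w", initializer=1.0)
loss = tf.square(w)

optimizer = tf.train.RMSPropOptimizer(learning_rate)
# SGD instead needs a much larger starting lr (around 1.0) and is harder to tune:
# optimizer = tf.train.GradientDescentOptimizer(1.0)
train_op = optimizer.minimize(loss, global_step=global_step)
```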
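For the multi-GPU part, here is a rough sketch of the standard tower pattern (same idea as the CIFAR-10 multi-GPU tutorial linked below): each GPU builds its own copy of the model on its own batch slice, losses are computed per tower, and gradients are averaged before the update. `build_model_loss` is a toy stand-in, not the repo's model; the real logic lives in train_vi.py and gpuloader.py.

```python
import os
import tensorflow as tf

os.environ["CUDA_VISIBLE_DEVICES"] = "5,6,7"   # same idea as gpus = "5,6,7" in train_vi.py
num_gpus = 3
vocab_size, emb_dim = 10000, 128

def build_model_loss(src_ids, tgt_ids):
    """Stand-in for the real seq2seq model in model_topbah.py: an embedding plus a
    linear projection, just enough to show how a loss is built per tower."""
    emb = tf.get_variable("emb", [vocab_size, emb_dim])
    proj = tf.get_variable("proj", [emb_dim, vocab_size])
    enc = tf.reduce_mean(tf.nn.embedding_lookup(emb, src_ids), axis=1)
    logits = tf.matmul(enc, proj)
    return tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=tgt_ids[:, 0], logits=logits))   # toy target: first token only

def average_gradients(tower_grads):
    """Average the (grad, var) lists produced by each tower."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        # embedding grads are IndexedSlices; densifying them is fine for a sketch
        grads = [tf.convert_to_tensor(g) for g, _ in grads_and_vars]
        var = grads_and_vars[0][1]
        averaged.append((tf.reduce_mean(tf.stack(grads), axis=0), var))
    return averaged

optimizer = tf.train.RMSPropOptimizer(0.001)
tower_grads, tower_losses = [], []
for i in range(num_gpus):
    # each tower is fed its own slice of the batch (this is what gpuloader.py prepares)
    src = tf.placeholder(tf.int32, [None, None], name="src_%d" % i)
    tgt = tf.placeholder(tf.int32, [None, None], name="tgt_%d" % i)
    with tf.device("/gpu:%d" % i), tf.variable_scope("model", reuse=tf.AUTO_REUSE):
        loss = build_model_loss(src, tgt)       # variables are shared across towers
        tower_losses.append(loss)
        tower_grads.append(optimizer.compute_gradients(loss))

train_op = optimizer.apply_gradients(average_gradients(tower_grads))
total_loss = tf.add_n(tower_losses) / num_gpus
```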
My code, especially model.py and train.py, is not well organized; it may be cleaned up and commented if I have spare time.
I also want to switch to the tf.data.Dataset API, but I am not sure how to run validation every N training steps with it (one possible approach is sketched below).
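One way to get validation every N training steps with tf.data is a reinitializable iterator whose init ops switch between the train and validation datasets. The sketch below assumes that approach; the toy data and names like `validate_every` are placeholders.

```python
import numpy as np
import tensorflow as tf

def make_dataset(n):
    """Toy int32 'sentences' standing in for the pickled corpus."""
    x = np.random.randint(0, 100, size=(n, 10)).astype(np.int32)
    return tf.data.Dataset.from_tensor_slices(x).batch(32).repeat()

train_ds = make_dataset(1000)
valid_ds = make_dataset(100)

iterator = tf.data.Iterator.from_structure(train_ds.output_types, train_ds.output_shapes)
batch = iterator.get_next()          # feed this tensor into the model instead of tf.placeholder
train_init = iterator.make_initializer(train_ds)
valid_init = iterator.make_initializer(valid_ds)

loss = tf.reduce_mean(tf.cast(batch, tf.float32))  # stand-in for the model loss

validate_every = 1000                # "validation per train step" interval
with tf.Session() as sess:
    sess.run(train_init)
    for step in range(1, 10001):
        sess.run(loss)               # a real training step would run train_op here
        if step % validate_every == 0:
            sess.run(valid_init)     # switch the iterator to validation data
            print("step %d, val loss %.4f" % (step, sess.run(loss)))
            sess.run(train_init)     # switch back (restarts the training stream)
```

Note that re-running `train_init` restarts the training stream from the beginning; a feedable iterator (tf.data.Iterator.from_string_handle) avoids that if it matters.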
- https://github.com/JayParks/tf-seq2seq/blob/master/seq2seq_model.py # good for beginners
- https://github.com/tensorflow/nmt/tree/master/nmt
- https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py
- Effective Approaches to Attention-based Neural Machine Translation
- Neural Machine Translation by Jointly Learning to Align and Translate