Request for pretraining script

Appreciate your great work!

I would like to pretrain the model from scratch, including the tokenizer. However, run_mlm.py does not seem to use the BPE tokenizer. Could you please share the exact pretraining script you used, so that I can follow the same steps?
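To clarify what I mean by training the tokenizer, here is a toy sketch of the BPE merge-learning step in plain Python. It is my own illustration under simplifying assumptions (whitespace pre-tokenization, a `</w>` end-of-word marker, ties broken by first-seen pair). It is not your tokenizer pipeline and not the Hugging Face `tokenizers` implementation; the function names `learn_bpe` and `apply_bpe` are hypothetical.

```python
from collections import Counter


def _merge_pair(symbols: tuple, pair: tuple) -> tuple:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merges from whitespace-split text (toy version)."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # max() returns the first maximal pair in insertion order.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[_merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment a single word by applying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = _merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["low low low lower lowest"], num_merges=3)
    print(merges)
    print(apply_bpe("lower", merges))
```

A real setup would train on the full pretraining corpus with a much larger vocabulary and special tokens, which is exactly the part I would like to match to your script.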

Thank you very much!