Appreciate your great work!
I would like to pre-train the model from scratch, including the tokenizer. However, run_mlm.py does not seem to use the BPE tokenizer. Could you please share the exact pre-training script so that I can follow the same steps you used?
Thank you very much!