Avg time per step, inference on CPU: 0.002576863145828247 s
Avg time per step, inference on GPU: 0.003681471061706543 s

On GPU, 0.003681471061706543 s * 16000 steps ≈ 59 s. So how can we reproduce the result reported in the paper?
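
For reference, the extrapolation above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes the per-step times are in seconds and that the 16000 steps run strictly one after another. The names `total_time`, `CPU_TIME_PER_STEP`, `GPU_TIME_PER_STEP`, and `NUM_STEPS` are made up for illustration and do not come from any codebase.

```python
# Average per-step inference times quoted above (assumed to be in seconds).
CPU_TIME_PER_STEP = 0.002576863145828247
GPU_TIME_PER_STEP = 0.003681471061706543
NUM_STEPS = 16000  # step count used in the estimate above


def total_time(time_per_step: float, num_steps: int) -> float:
    """Extrapolate total time, assuming steps run sequentially at the average rate."""
    return time_per_step * num_steps


cpu_total = total_time(CPU_TIME_PER_STEP, NUM_STEPS)
gpu_total = total_time(GPU_TIME_PER_STEP, NUM_STEPS)

print(f"CPU: {cpu_total:.1f} s for {NUM_STEPS} steps")
print(f"GPU: {gpu_total:.1f} s for {NUM_STEPS} steps")
print(f"GPU/CPU per-step ratio: {GPU_TIME_PER_STEP / CPU_TIME_PER_STEP:.2f}")
```

With these numbers the GPU comes out slower per step than the CPU, so the totals are roughly 41 s on CPU versus 59 s on GPU for 16000 steps.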