Different transformer implementations vary in their details (e.g. the positional encoding, where the normalization and skip connections sit, whether multi-query attention (MQA) is used, etc.). Let's provide a standard implementation of the Gemma transformer architecture. The implementation could be verified by loading a Gemma weights file and evaluating the model with those weights.
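To make those variation points concrete, here is a minimal NumPy sketch of a Gemma-style decoder. It follows my understanding of the first Gemma release and is not the reference code: rotary positional encoding (RoPE) on queries and keys, pre-norm residual blocks using RMSNorm with a `(1 + scale)` weight, a gated-GELU (GeGLU) MLP, multi-query attention (one shared K/V head, as in the 2B model), input embeddings scaled by `sqrt(d_model)`, and an output projection tied to the embedding matrix. The toy sizes in `CFG`, the parameter dict layout, and the hypothetical `init_params` helper are illustrative assumptions and do not match any real checkpoint format, so loading actual Gemma weights would still require mapping the checkpoint's parameter names onto this layout.

```python
import numpy as np

# Hypothetical toy sizes; real Gemma checkpoints define their own (2B uses MQA: num_kv_heads=1).
CFG = dict(num_heads=4, num_kv_heads=1, head_dim=8)


def rms_norm(x, scale, eps=1e-6):
    # RMSNorm whose learned scale is stored as an offset from 1.
    var = np.mean(x * x, axis=-1, keepdims=True)
    return x / np.sqrt(var + eps) * (1.0 + scale)


def apply_rope(x, positions, base=10000.0):
    # Rotary positional encoding on x: [seq, heads, head_dim].
    # The first and second halves of head_dim are rotated as pairs.
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    angles = positions[:, None] * freqs[None, :]  # [seq, half]
    sin, cos = np.sin(angles)[:, None, :], np.cos(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)


def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention(x, p, num_heads, num_kv_heads, head_dim):
    seq = x.shape[0]
    pos = np.arange(seq, dtype=np.float64)
    q = apply_rope((x @ p["wq"]).reshape(seq, num_heads, head_dim), pos)
    k = apply_rope((x @ p["wk"]).reshape(seq, num_kv_heads, head_dim), pos)
    v = (x @ p["wv"]).reshape(seq, num_kv_heads, head_dim)
    # MQA (num_kv_heads == 1) or GQA: each K/V head is shared by a group of query heads.
    group = num_heads // num_kv_heads
    k, v = np.repeat(k, group, axis=1), np.repeat(v, group, axis=1)
    logits = np.einsum("qhd,khd->hqk", q, k) * head_dim**-0.5
    causal = np.tril(np.ones((seq, seq), dtype=bool))  # query q may attend to keys k <= q
    logits = np.where(causal[None], logits, -1e30)
    out = np.einsum("hqk,khd->qhd", softmax(logits), v)
    return out.reshape(seq, num_heads * head_dim) @ p["wo"]


def gelu_tanh(x):
    # tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, p):
    # Gated GELU (GeGLU) feed-forward layer.
    return (gelu_tanh(x @ p["w_gate"]) * (x @ p["w_up"])) @ p["w_down"]


def decoder_block(x, p, cfg):
    # Pre-norm residual: each sub-layer sees a normalized input; the skip adds the raw input.
    h = x + attention(rms_norm(x, p["attn_norm"]), p, **cfg)
    return h + mlp(rms_norm(h, p["mlp_norm"]), p)


def forward(tokens, params, cfg):
    embed = params["embed"]  # [vocab, d_model]
    x = embed[tokens] * np.sqrt(embed.shape[1])  # embeddings scaled by sqrt(d_model)
    for p in params["blocks"]:
        x = decoder_block(x, p, cfg)
    return rms_norm(x, params["final_norm"]) @ embed.T  # output projection tied to embeddings


def init_params(rng, vocab=32, d_model=16, hidden=32, num_layers=2, cfg=CFG):
    # Random weights standing in for a checkpoint (hypothetical layout, not Gemma's file format).
    h, kv, hd = cfg["num_heads"], cfg["num_kv_heads"], cfg["head_dim"]

    def n(*shape):
        return rng.normal(0.0, 0.02, shape)

    def block():
        return dict(
            wq=n(d_model, h * hd), wk=n(d_model, kv * hd), wv=n(d_model, kv * hd),
            wo=n(h * hd, d_model), w_gate=n(d_model, hidden), w_up=n(d_model, hidden),
            w_down=n(hidden, d_model), attn_norm=np.zeros(d_model), mlp_norm=np.zeros(d_model),
        )

    return dict(embed=n(vocab, d_model), blocks=[block() for _ in range(num_layers)],
                final_norm=np.zeros(d_model))


params = init_params(np.random.default_rng(0))
print(forward(np.array([1, 5, 7, 2]), params, CFG).shape)  # (4, 32)
```

Setting `num_kv_heads` equal to `num_heads` turns the same code path into standard multi-head attention, so one implementation can cover both variants. Checks such as causality (changing a later token must not change earlier logits) and RoPE preserving vector norms are cheap unit tests to run before comparing logits against a real Gemma checkpoint.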