Hi,
I was trying to download and use the AWS exemplar dataset, but I could not find num_tokens_seen.txt in the layer directories.
The relevant lines in the code are:
```python
num_tokens_seen = 0
for split in splits:
    layer_dir = self.get_layer_dir(layer, split)
    with open(os.path.join(layer_dir, "num_tokens_seen.txt"), "r") as f:
        num_tokens_seen += int(f.read())
```
Could you please confirm whether this file is included in the AWS dataset? If not, is there a recommended way to replace it for quantile calculations?
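In the meantime, one possible stopgap is to make the missing file explicit instead of letting `open` crash. This is only a sketch. `total_tokens_seen` and `fallback_tokens` are hypothetical names, not part of the repo, and the sketch assumes a per-split token count can be supplied by hand when the file is absent:

```python
import os
from typing import Iterable, Optional


def total_tokens_seen(layer_dirs: Iterable[str], fallback_tokens: Optional[int] = None) -> int:
    """Sum num_tokens_seen.txt over the given per-split layer directories.

    Hypothetical helper: if a directory has no num_tokens_seen.txt, add
    `fallback_tokens` (a count supplied by hand) for that split, or raise
    FileNotFoundError naming the missing path when no fallback is given.
    """
    total = 0
    for layer_dir in layer_dirs:
        path = os.path.join(layer_dir, "num_tokens_seen.txt")
        if os.path.exists(path):
            # Same read as the original code; int() ignores surrounding whitespace
            with open(path, "r") as f:
                total += int(f.read())
        elif fallback_tokens is not None:
            # Assumption: the caller knows the per-split token count
            total += fallback_tokens
        else:
            raise FileNotFoundError(f"missing {path}; pass fallback_tokens to override")
    return total
```

It would replace the loop above with something like `total_tokens_seen([self.get_layer_dir(layer, split) for split in splits])`. Raising by default seemed safer than silently computing quantiles from a wrong token count.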
Thanks!