Open
Description
I'm trying to do a smoke test for your eval suite but cannot get the script to run properly.
I've followed the setup instructions:
- run `make` in the root directory
- add `nl2bash` to `PYTHONPATH`
- run `make data` in `scripts`
From here, I try to confirm that the dev set evaluates well against itself. From `scripts`, I run:
./bash-run.sh --data bash --prediction_file ../data/bash/dev.cm.filtered --eval
This produces the following stdout and traceback:
Reading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash
Saving models to /workspace/sempar/nl2bash/encoder_decoder/../model/seq2seq
Loading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.cm.filtered
9985 data points read.
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.cm.filtered
782 data points read.
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.nl.filtered
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.cm.filtered
779 data points read.
(Auto) evaluating ../data/bash/dev.cm.filtered
782 predictions loaded from ../data/bash/dev.cm.filtered
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 378, in <module>
    tf.compat.v1.app.run()
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 36, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 301, in main
    eval(dataset, FLAGS.prediction_file)
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 176, in eval
    return eval_tools.automatic_eval(prediction_path, dataset, top_k=3, FLAGS=FLAGS, verbose=verbose)
  File "/workspace/sempar/nl2bash/eval/eval_tools.py", line 246, in automatic_eval
    "{} vs. {}".format(len(grouped_dataset), len(prediction_list)))
ValueError: ground truth and predictions length must be equal: 701 vs. 782
As you can see, it evaluates against a dataset with only 701 bash utterances, even though `data_utils.load_data` successfully read 782 data points from the dev set. Do you know why this is happening?
(If it helps, I'm using Python 3.7.11 on a fresh install of Ubuntu 18.04.6.)
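One check that might narrow this down. This is only a sketch, and it rests on an assumption I have not confirmed in the code: that `grouped_dataset` is built by collapsing dev examples that share the same natural-language description (the variable name hints at this, nothing more). The helpers `group_by_source` and `read_lines` below are hypothetical names of my own, not part of nl2bash.

```python
from collections import OrderedDict


def group_by_source(nl_lines, cm_lines):
    """Group target commands under their whitespace-stripped source description.

    Assumption: this mimics how an evaluator *might* group examples;
    it is not taken from eval_tools.py.
    """
    if len(nl_lines) != len(cm_lines):
        raise ValueError("source and target must have the same number of lines")
    groups = OrderedDict()
    for nl, cm in zip(nl_lines, cm_lines):
        # Identical descriptions end up in the same group.
        groups.setdefault(nl.strip(), []).append(cm.strip())
    return groups


def read_lines(path):
    """Read a text file into a list of lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()
```

Calling `group_by_source(read_lines("../data/bash/dev.nl.filtered"), read_lines("../data/bash/dev.cm.filtered"))` from `scripts` and comparing the number of groups with the number of lines would show whether duplicate descriptions account for the gap. If the group count comes out at 701, the evaluator probably expects one prediction entry per unique description rather than one per line of the dev file.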