Failure to run eval script #33

Description

@kyduff

I'm trying to run a smoke test of your eval suite, but I cannot get the script to run properly.

I've followed the setup instructions (exact commands reproduced below):

  1. run make in the root directory
  2. add nl2bash to PYTHONPATH
  3. run make data in scripts
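
Concretely, from a shell (the /workspace/sempar prefix is just where my checkout lives):

cd /workspace/sempar/nl2bash
make                                                        # step 1: build in the root directory
export PYTHONPATH=/workspace/sempar/nl2bash:$PYTHONPATH     # step 2: make the package importable
cd scripts
make data                                                   # step 3: build the filtered data splits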

From here I try to confirm that the dev set evaluates perfectly against itself: from scripts I run

./bash-run.sh --data bash --prediction_file ../data/bash/dev.cm.filtered --eval

This produces the following stdout and traceback:

Reading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash                                                          
Saving models to /workspace/sempar/nl2bash/encoder_decoder/../model/seq2seq                                                       
Loading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash                                                          
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.nl.filtered                                             
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.cm.filtered                                             
9985 data points read.                                                                                                            
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.nl.filtered                                               
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.cm.filtered                                               
782 data points read.                                                                                                             
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.nl.filtered                                              
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.cm.filtered                                              
779 data points read.                                                                                                             
(Auto) evaluating ../data/bash/dev.cm.filtered                                                                                    
782 predictions loaded from ../data/bash/dev.cm.filtered                                                                          
Traceback (most recent call last):                                                                                                
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main                                                      
    "__main__", mod_spec)                                                                                                         
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code                                                                 
    exec(code, run_globals)                                                                                                       
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 378, in <module>                                            
    tf.compat.v1.app.run()                                                                                                        
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 36, in run              
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)                                                          
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 312, in run                                   
    _run_main(main, args)                                                                                                         
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main                             
    sys.exit(main(argv))                                                                                                          
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 301, in main                               
    eval(dataset, FLAGS.prediction_file)                                                                                          
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 176, in eval                                                
    return eval_tools.automatic_eval(prediction_path, dataset, top_k=3, FLAGS=FLAGS, verbose=verbose)                             
  File "/workspace/sempar/nl2bash/eval/eval_tools.py", line 246, in automatic_eval                                                
    "{} vs. {}".format(len(grouped_dataset), len(prediction_list)))                                                               
ValueError: ground truth and predictions length must be equal: 701 vs. 782

You can see the evaluator is comparing against a grouped dataset with only 701 entries, even though data_utils.load_data successfully read 782 examples from the dev set. Do you know why this is happening?
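
My guess is that automatic_eval groups the ground truth by natural-language description (the traceback mentions grouped_dataset), so duplicate descriptions would collapse into single entries before the length check. A rough way I tried to sanity-check that hypothesis from scripts (just counting unique lines, which may not match the actual grouping logic exactly):

wc -l ../data/bash/dev.nl.filtered             # 782 description lines
sort -u ../data/bash/dev.nl.filtered | wc -l   # unique descriptions; I'd expect something close to 701 if this is the cause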

(If it helps, I'm on Python 3.7.11 running on a fresh install of Ubuntu 18.04.6.)
