Failure to run eval script #33

Description

@kyduff

I'm trying to run a smoke test of your eval suite, but I cannot get the script to run properly.

I've followed the setup instructions (exact commands reproduced below):

  1. run make in the root directory
  2. add nl2bash to PYTHONPATH
  3. run make data in scripts
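
Concretely, from a shell (the /workspace/sempar prefix is just where my checkout lives):

cd /workspace/sempar/nl2bash
make                                                        # step 1: build in the root directory
export PYTHONPATH=/workspace/sempar/nl2bash:$PYTHONPATH     # step 2: make the package importable
cd scripts
make data                                                   # step 3: build the filtered data splits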

From here I try to confirm that the dev set evaluates perfectly against itself: from scripts I run

./bash-run.sh --data bash --prediction_file ../data/bash/dev.cm.filtered --eval

This produces the following stdout and traceback:

Reading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash                                                          
Saving models to /workspace/sempar/nl2bash/encoder_decoder/../model/seq2seq                                                       
Loading data from /workspace/sempar/nl2bash/encoder_decoder/../data/bash                                                          
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.nl.filtered                                             
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/train.cm.filtered                                             
9985 data points read.                                                                                                            
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.nl.filtered                                               
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/dev.cm.filtered                                               
782 data points read.                                                                                                             
source file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.nl.filtered                                              
target file: /workspace/sempar/nl2bash/encoder_decoder/../data/bash/test.cm.filtered                                              
779 data points read.                                                                                                             
(Auto) evaluating ../data/bash/dev.cm.filtered                                                                                    
782 predictions loaded from ../data/bash/dev.cm.filtered                                                                          
Traceback (most recent call last):                                                                                                
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main                                                      
    "__main__", mod_spec)                                                                                                         
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code                                                                 
    exec(code, run_globals)                                                                                                       
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 378, in <module>                                            
    tf.compat.v1.app.run()                                                                                                        
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/tensorflow/python/platform/app.py", line 36, in run              
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)                                                          
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 312, in run                                   
    _run_main(main, args)                                                                                                         
  File "/workspace/sempar/sempar.env/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main                             
    sys.exit(main(argv))                                                                                                          
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 301, in main                               
    eval(dataset, FLAGS.prediction_file)                                                                                          
  File "/workspace/sempar/nl2bash/encoder_decoder/translate.py", line 176, in eval                                                
    return eval_tools.automatic_eval(prediction_path, dataset, top_k=3, FLAGS=FLAGS, verbose=verbose)                             
  File "/workspace/sempar/nl2bash/eval/eval_tools.py", line 246, in automatic_eval                                                
    "{} vs. {}".format(len(grouped_dataset), len(prediction_list)))                                                               
ValueError: ground truth and predictions length must be equal: 701 vs. 782

You can see the evaluator is comparing against a grouped dataset with only 701 entries, even though data_utils.load_data successfully read 782 examples from the dev set. Do you know why this is happening?
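
My guess is that automatic_eval groups the ground truth by natural-language description (the traceback mentions grouped_dataset), so duplicate descriptions would collapse into single entries before the length check. A rough way I tried to sanity-check that hypothesis from scripts (just counting unique lines, which may not match the actual grouping logic exactly):

wc -l ../data/bash/dev.nl.filtered             # 782 description lines
sort -u ../data/bash/dev.nl.filtered | wc -l   # unique descriptions; I'd expect something close to 701 if this is the cause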

(If it helps, I'm on Python 3.7.11 running on a fresh install of Ubuntu 18.04.6.)
