@shubhamagarwal92

  1. Updated the README and removed old links
  2. Removed hard-coded paths in extractor.lua
  3. Added a shell script

extractor.lua Outdated
cmd:option('-convens_paths1', 'conv1-ep10-94-73' , [[path to conv net files]])
cmd:option('-convens_paths2', 'conv2-ep10-95-71' , [[path to conv net files]])
cmd:option('-convens_paths3', 'conv3-ep10-94-71' , [[path to conv net files]])
cmd:option('-lstmens_paths1', 'lstm1-ep5-92-76' , [[path to conv net files]])
Owner

Change the hint; it should read "path to lstm files".

torch.manualSeed(opt.seed)
cutorch.manualSeed(opt.seed)
cutorch.setDevice(opt.gpuid)
device_id = cutorch.getDevice()
Owner

I am unsure about this change. Setting gpuid from input through opt.gpuid seems better.

Author

Actually, I was getting this error when I used opt.gpuid:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-2141/cutorch/init.c line=734 error=10 : invalid device ordinal
/home/sagarwal/torch/install/bin/luajit: /home/sagarwal/projects/d2t/d2t/data2text-1/extractor.lua:574: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-2141/cutorch/init.c:734
stack traceback:
        [C]: in function 'setDevice'
        /home/sagarwal/projects/d2t/d2t/data2text-1/extractor.lua:574: in function 'main'
        /home/sagarwal/projects/d2t/d2t/data2text-1/extractor.lua:677: in main chunk
        [C]: in function 'dofile'
        ...rwal/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406470
srun: error: gpu05: task 0: Exited with exit code 1

I referenced this issue here

Maybe the best solution would be to wrap the call in a try-except (pcall in Lua) instead?

Owner

Note that the -gpuid parameter is 1-indexed.
Setting it to a value such as 0 gives a similar error.
In a 4-GPU setup, the valid values are 1, 2, 3, and 4.
I normally run with -gpuid 1, and it works.
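
To make the 1-indexing explicit, a launcher script could check the id before calling th. The following is a minimal sketch, not part of this PR: validate_gpuid is a hypothetical helper, and the number of GPUs is assumed to be supplied by the caller rather than detected automatically.

```shell
# validate_gpuid GPUID NUM_GPUS  (hypothetical helper, not in the repository)
# Returns 0 when GPUID is a valid 1-indexed device id for NUM_GPUS devices,
# and 1 (with a message on stderr) otherwise.
validate_gpuid() {
  local gpuid="$1"
  local num_gpus="$2"

  # Reject empty or non-numeric input before doing integer comparisons.
  case "$gpuid" in
    ''|*[!0-9]*)
      echo "gpuid must be a positive integer, got '$gpuid'" >&2
      return 1
      ;;
  esac

  # Device ids start at 1, so 0 is out of range just like NUM_GPUS + 1.
  if [ "$gpuid" -lt 1 ] || [ "$gpuid" -gt "$num_gpus" ]; then
    echo "gpuid $gpuid is out of range; valid values are 1..$num_gpus" >&2
    return 1
  fi
  return 0
}
```

With four GPUs, `validate_gpuid 1 4` succeeds while `validate_gpuid 0 4` fails, so a script can stop with a readable message before reaching cutorch.setDevice.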

th $LUA_FILE \
-datafile $OUTPUT_H5 \
-preddata $MODEL_DIR/roto_stage2_$IDENTIFIER-beam5_gens.h5 \
-savefile $MODEL_DIR/roto_stage2_$IDENTIFIER-beam5_gens.h5-tuples.txt \
Owner

-savefile is not applicable with -just_eval.
