# Christina Wusinich Project #3
Conversation
README.md (outdated)
> 1. Transfer MEG, behavior, and MRI data to data processing folder on biowulf (this is just done with a BASH command)
> 2. Update demographics spreadsheet for use in analysis
These may not require manual intervention. The more you can automate, the more scalable and less error-prone your processing will be.
README.md (outdated)
> 1. Script name will be **01_megpreprocess1**, and I would like it to do the following:
>    - Change file name using CTF command (to make file names look like “subject#_MID.ds” so they are more uniform)
>    - Set markers for stimuli (cue markers) and output file (.txt file) of the times of the cue marks (this file will be used in the next step)
>    - I already have some of the CTF commands for this, but I would like to figure out how to give this script (and all of the following ones) a list of subject names to loop through so I can do this all at once.
snakemake may be a useful approach for this. Or, since you are working in human neuroimaging, nipype might be worth it. These tools for building and executing workflows are hard at the beginning, but they pay off in the long run. An added advantage of nipype is that many of the generic neuroimaging processing tools use it, so you will be able to debug/modify those pipelines more easily.
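Even without a workflow manager, the loop over a subject list can be sketched in plain Python. This is only a sketch under assumptions: `changeDs` is a placeholder for whichever CTF utility does the renaming, and the subject IDs and path layout are invented for illustration.

```python
import subprocess

# Hypothetical subject list; in practice this could be read from a file.
SUBJECTS = ["01", "02", "03"]

def build_rename_command(subject):
    """Return the argument list for renaming one subject's dataset.

    "changeDs" and the directory layout are placeholders, not the
    real CTF command or paths.
    """
    src = f"/data/meg/raw/{subject}_session.ds"
    dst = f"/data/meg/raw/subject{subject}_MID.ds"
    return ["changeDs", src, dst]

def run_all(subjects, dry_run=True):
    """Loop the command over every subject; dry_run just prints it."""
    for subject in subjects:
        cmd = build_rename_command(subject)
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)

run_all(SUBJECTS)
```

The same loop structure works for every later script: each one takes the subject list once and applies its step to each ID.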
Thank you so much for all of these responses and suggestions!!
README.md (outdated)
> ## Behavior processing
> 1. Script name will be **02_behprocess**, and it will do the following:
>    - Makes marker files for more MEG processing; these markers will designate win/loss/neutral cues to be marked in the MEG file and are output as three separate .txt files to the subject’s MEG directory
>    - Pulls columns from behavior data file and calculates mean RTs and accuracy by trial and subject and appends that to master behavior data sheet (I will probably work with this sheet in R unless I can do an ANOVA as another step)
Look into pandas for working with tabular data. It will make things a lot easier. And using csv files instead of plain text might be useful.
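As a minimal sketch of what pandas buys you here: the mean RT and accuracy per subject fall out of one `groupby` call. The column names (`subject`, `rt`, `correct`) and the toy values are assumptions, not taken from the actual behavior sheet.

```python
import pandas as pd

# Toy trial-level behavior data (column names are assumptions)
trials = pd.DataFrame({
    "subject": ["01", "01", "02", "02"],
    "rt":      [0.45, 0.55, 0.40, 0.60],
    "correct": [1,    0,    1,    1],
})

# Mean RT and accuracy per subject in a single groupby/agg call
summary = trials.groupby("subject").agg(
    mean_rt=("rt", "mean"),
    accuracy=("correct", "mean"),
).reset_index()

print(summary)
```

Appending to a master sheet is then `pd.concat([master, summary])` followed by `DataFrame.to_csv`, rather than hand-editing a spreadsheet.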
Python has advanced tools for statistical learning/analysis. statsmodels is probably what you need here, but as the sophistication of your analysis grows you could add in packages like lme, scikit-learn, and tensorflow. There are certainly times to use R, but if you can complete everything with Python it will help you a lot. Learning how to debug bash, Python, and R all at once is never a fun task.
README.md (outdated)
> 1. Script name will be **02_behprocess**, and it will do the following:
>    - Makes marker files for more MEG processing; these markers will designate win/loss/neutral cues to be marked in the MEG file and are output as three separate .txt files to the subject’s MEG directory
>    - Pulls columns from behavior data file and calculates mean RTs and accuracy by trial and subject and appends that to master behavior data sheet (I will probably work with this sheet in R unless I can do an ANOVA as another step)
>    - I have already made two separate Jupyter notebook files to do each of the above tasks, but I can’t figure out how to do it without finding and replacing the subject ID; obviously a loop would be a lot easier, so I would like to figure that out and collapse both notebooks into one script
I suspect a pandas dataframe and string formatting should be adequate. Your processing should likely be encapsulated in functions (and, in turn, in modules). Any function could take the subject ID as an argument to modify its behavior appropriately.
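The subject-ID-as-argument idea can be sketched like this; directory layout, filenames, and cue names are assumptions for illustration, and the point is that the find-and-replace in the notebooks becomes a parameter.

```python
from pathlib import Path

# Assumed root of the per-subject MEG directories
MEG_ROOT = Path("/data/meg")

def marker_file_path(subject_id, cue_type):
    """Where the marker .txt for one cue type should be written."""
    return MEG_ROOT / subject_id / f"{cue_type}_markers.txt"

def process_subject(subject_id):
    """Run every per-subject step; real processing calls would go here."""
    return [marker_file_path(subject_id, cue)
            for cue in ("win", "loss", "neutral")]

# Collapsing both notebooks into one script is then just a loop:
for subject in ["subject01", "subject02"]:
    print(process_subject(subject))
```

Once the steps live in functions inside a module, both the loop script and your tests can import them.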
README.md (outdated)
> - Copy talairached .nii file into a group folder for group analysis
> 2. Problems:
>    - This step uses a parameter file that I would like to be able to create and distribute among subject directories whenever we want to change parameters (e.g. the time window we’re looking at in the task)
>    - Each of the three SAM commands takes a bit of time, but I heard that using swarm in biowulf may be a good way to do this. I am not sure how it would work with a loop just yet, but it’s an aspiration to make this somewhat efficient if possible.
This is where nipype would shine. It is true that swarm would distribute this, but it is not very portable. And workflow managers do lots of intelligent management of the processing, so that when you make a change you only recompute the bits of the analysis that you need to.
README.md (outdated)
> ## Permissions
> 1. I want a script that will set all of the permissions correctly for my group for all of the output files above because umask in my bash profile does not seem to work for this.
> 2. The commands in this script may just be put at the end of each of the above ones depending on how this works out, or should it just be a bash script?
Bash is fine; Python would be better. There's not much point in rewriting lots of existing bash scripts in Python, but with that said, if you tie everything together with Python it will be easier to maintain and debug.
It occurred to me that some tools completely ignore umask. Perhaps that is your problem. Assuming you have written all your output to "/data/$USER/output_dir", you could make sure your output is well behaved regarding permissions by ending each swarm job with something like:

```bash
OUTPUT_DIR=/data/$USER/output_dir
GROUP=mygroup
# For every file you own, hand it to the shared group and grant group read/execute
find "$OUTPUT_DIR" -group "$USER" -exec chown :"$GROUP" {} \; -exec chmod g+rx {} \;
```
README.md (outdated)
> 2. The commands in this script may just be put at the end of each of the above ones depending on how this works out, or should it just be a bash script?
> # Additional Questions:
> - Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
You can certainly put this in your ~/.bashrc file (or ~/.bash_profile, depending on what OS you are on). For the class project you are developing a Python package, and a Python package (a pip-installed one) will often just assume the appropriate dependencies are installed on the system.
README.md (outdated)
> # Additional Questions:
> - Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
> - If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?
The subprocess package may be what you need here, or the workflow managers mentioned above.
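To answer the question directly: nothing special is needed at the start of each command. A minimal sketch with `subprocess.run`, using `echo` as a stand-in for a real CTF command:

```python
import subprocess

# Run an external command from Python. Replace the list below with the
# real CTF program and its arguments; "echo" is only a stand-in.
result = subprocess.run(
    ["echo", "stand-in for a CTF command"],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode output to str rather than bytes
    check=True,           # raise CalledProcessError if the command fails
)
print(result.stdout.strip())
```

Passing the program and its arguments as a list avoids shell-quoting headaches; `check=True` makes failures loud instead of silent.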
README.md (outdated)
> # Additional Questions:
> - Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
> - If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?
> - Can I make a giant script that will run all of these steps (01-05 and setting permissions) on a list of subjects?
It could be a giant script: one that provides a command-line interface to the user (use argparse) and then calls the appropriate functionality (imported from other Python modules in your package).
Or you might just have several smaller scripts if that is more convenient. Python entry points might be useful as you attempt to achieve this. Using the scripts keyword in setup.py may also be useful if you wish to install non-Python scripts as part of your package.
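A minimal sketch of that argparse entry point; the flag names are assumptions, and `run_pipeline` is a stand-in for the real steps 01-05.

```python
import argparse

def build_parser():
    """CLI for the pipeline: a list of subjects plus an optional step filter."""
    parser = argparse.ArgumentParser(
        description="Run the MEG/behavior pipeline over a list of subjects.")
    parser.add_argument("subjects", nargs="+", help="subject IDs to process")
    parser.add_argument("--steps", default="all",
                        help="which steps to run (placeholder option)")
    return parser

def run_pipeline(subjects, steps):
    """Stand-in for calling the real step functions from your modules."""
    for subject in subjects:
        print(f"would run steps {steps} for {subject}")

# Example invocation, as if called with: mypipeline 01 02 --steps all
args = build_parser().parse_args(["01", "02", "--steps", "all"])
run_pipeline(args.subjects, args.steps)
```

With an entry point in setup.py, this becomes a single installed command that the whole lab can run.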
README.md (outdated)
> - Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
> - If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?
> - Can I make a giant script that will run all of these steps (01-05 and setting permissions) on a list of subjects?
> - How can I set my default python version in biowulf without manually having to load 3.7 every time? Can I do that at the beginning of the first script or in my profile?
Yes. I have a conda environment that is set in my profile as the default; that way I am always using the Python in that environment.
module load can often be quite slow, so I create a bash alias to software within a conda environment. For example, alias git=/home/$USER/anaconda3/envs/misc_env/bin/git takes no time to load at login but gives me git from that environment when I want it.
Yes, it appears so.
Sounds like checking both the .o and .e files will help to debug this.
No need to worry about taking more than you have allocated; they don't let you. You only harm your own processing. It is an sbatch system, so you can likely use variables like the ones listed here. Try SLURM_CPUS_PER_TASK.
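Reading that Slurm variable from Python is one line; this sketch falls back to 1 CPU when the variable is unset (e.g. on a login node rather than inside a job).

```python
import os

# SLURM_CPUS_PER_TASK is exported inside sbatch/swarm jobs; default to 1
# when running outside Slurm so the script still works on a login node.
n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", "1"))
print(f"using {n_cpus} CPU(s)")
```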
Have a look at snakemake and nipype. They're worth it for the long run, just maybe not for this project. Other questions if there is time:
f-strings will make a lot of this easier to construct and more legible. For example... will generate "command -ax --another-flag -s 01 --session a". It is a lot more legible than adding strings together. Something to bear in mind in future. Also, add a setup.py and some tests...
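The elided f-string example above can be sketched like this; the variable names are assumptions, but the generated string is the one quoted in the comment.

```python
# Build a command string with an f-string instead of concatenation
subject = "01"
session = "a"
cmd = f"command -ax --another-flag -s {subject} --session {session}"
print(cmd)  # command -ax --another-flag -s 01 --session a
```

Compare with the concatenated form, `"command -ax --another-flag -s " + subject + " --session " + session`, which is harder to read and easy to get wrong around the spaces.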
Thank you for the f-strings suggestion! If I don't change it before the project deadline, I definitely will at some point moving forward.
… draft--plz don't look at it yet), added blank setup.py and blank test file
@leej3 If we don't have any functions (I just have scripts), what should the tests be? Should I make one of them a function just so it can have tests?
Yes, check the rubric. If you don't have functions it is hard to test, and your code will not be modular, so it will be difficult to understand/maintain/reuse.
Thanks! Will do.
…to scripts (all swarm making functions in one, all behavior data processing functions in one; still have one more script to make into a function)
@leej3 I'm still planning to add more tests today and clean some of the scripts up if there's time, but the one test I have in there now works, and so do all of my modules, as far as I can tell. I also tested installing the package on a different computer; that worked, and the test ran successfully. I'm wondering why it's not passing CircleCI; I read the details of the failure and looked at other people's pull requests that passed, but I'm still a little confused about what's wrong with mine. I'm guessing something isn't formatted or set up correctly, but I'm not sure what. If you have time, can you check to see what's going wrong?
…raneous printing tho
- Test files should start with "test" to be picked up by pytest, and you should run your tests using pytest.
- The creation of test data has been tidied into a function to make troubleshooting it a little easier. Ultimately, creating a test fixture would be ideal for this sort of function.
- Using Path objects from pathlib helps when working with lots of directories and filenames. It adds the appropriate path separator ("/" or "\") depending on your operating system, you no longer have to worry about trailing slashes, and it has lots of neat methods.
- If you wish to create variables throughout a .py file, the correct style is to capitalize the names.
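The first three points above can be combined into one small sketch: test-data creation lives in its own helper (a pytest fixture would be the next refinement), and all paths are `Path` objects. The directory and file names here are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def make_test_data(root):
    """Create a tiny fake subject directory.

    Tidying setup into one helper (or, better, a pytest fixture)
    makes test failures easier to troubleshoot.
    """
    subject = Path(root) / "subject01"
    subject.mkdir(parents=True)
    (subject / "markers.txt").write_text("win 1.0\nloss 2.5\n")
    return subject

# Path handles separators and trailing slashes for us on any OS
with tempfile.TemporaryDirectory() as tmp:
    subject = make_test_data(tmp)
    marker = subject / "markers.txt"
    created = marker.exists()
    name = marker.name

print(created, name)
```

In a real `test_*.py` file, pytest's built-in `tmp_path` fixture would replace the `TemporaryDirectory` boilerplate.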
Great work. I've submitted some suggested fixes to help you on your way. Quick summary:
some fixes
Thank you! It looks so much better now!
… functions; cleaned things up
Submitting now. Thank you for your help with this! I learned way more than I thought I would!
You did an awesome job on your project! You provided a truly impressive description of your project and tests. Great job using NumPy and pandas - I hope these become your best friends when using Python! I would suggest having more white space for improved readability. In addition, I would suggest checking whether the libraries/modules you imported are actually being used: I saw that you imported the csv module, but you ended up using pandas' CSV reader instead. Using pandas is completely fine, but you don't need to import csv in that case. Also, you should have a look into pathlib and argparse. Again, great job with your project! -Paul
Thank you @suhwanplee !! Those are great suggestions; I am definitely going to look into them. Take care! |