
Conversation

@cwusinich

No description provided.

README.md Outdated
Comment on lines 32 to 33
1. Transfer MEG, behavior, and MRI data to data processing folder on biowulf (this is just done with a BASH command)
2. Update demographics spreadsheet for use in analysis

These may not require manual intervention. The more you can automate, the more scalable and less error-prone your processing will be.

README.md Outdated
1. Script name will be **01_megpreprocess1**, and I would like it to do the following:
- Change file name using CTF command (to make file names look like “subject#_MID.ds” so they are more uniform)
- Set markers for stimuli (cue markers) and output file (.txt file) of the times of the cue marks (this file will be used in the next step)
- I already have some of the CTF commands for this, but I would like to figure out how to give this script (and all of the following ones) a list of subject names to loop through so I can do this all at once.

snakemake may be a useful approach for this. Or, since you are working in human neuroimaging, nipype might be worth it. These tools for building and executing workflows are hard at the beginning, but they pay off in the long run. An added advantage of nipype is that many of the generic neuroimaging processing tools use it, so you will be able to debug/modify those pipelines more easily.
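
Even before adopting a workflow manager, a plain loop over subject IDs gets you the batching you describe. A minimal sketch (the subject IDs, paths, and CTF command name below are placeholders, not your actual pipeline):

import subprocess

subjects = ["sub01", "sub02", "sub03"]          # hypothetical subject IDs
for subject in subjects:
    ds_path = f"/data/meg/{subject}_MID.ds"     # hypothetical directory layout
    # "someCTFcommand" stands in for whichever CTF tool you would run by hand
    subprocess.run(["someCTFcommand", ds_path], check=True)  # check=True stops on failure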

Author


Thank you so much for all of these responses and suggestions!!

README.md Outdated
## Behavior processing
1. Script name will be **02_behprocess**, and it will do the following:
- Makes marker files for more MEG processing; these markers will designate win/loss/neutral cues to be marked in the MEG file and are output as three separate .txt files to the subject’s MEG directory
- Pulls columns from behavior data file and calculates mean RTs and accuracy by trial and subject and appends that to master behavior data sheet (I will probably work with this sheet in R unless I can do an ANOVA as another step)

Look into pandas for working with tabular data. It will make things a lot easier. And using csv files instead of plain text files might be useful.
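
For example, a minimal pandas sketch of the RT/accuracy summary (the file and column names here are made up; adjust to your behavior files):

import pandas as pd

beh = pd.read_csv("sub01_behavior.csv")          # hypothetical per-subject file
summary = (beh.groupby("trial_type")             # hypothetical column names
              .agg(mean_rt=("rt", "mean"), accuracy=("correct", "mean"))
              .reset_index())
summary["subject"] = "sub01"

# append this subject's summary to the running master sheet
master = pd.read_csv("master_behavior.csv")
master = pd.concat([master, summary], ignore_index=True)
master.to_csv("master_behavior.csv", index=False)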


Python has advanced tools for statistical learning/analysis. statsmodels is probably what you need here, but as the sophistication of your analysis grows you could add in packages such as lme, scikit-learn, and tensorflow. There are certainly times to use R, but if you can complete everything in python it will help you a lot; learning how to debug bash, python, and R all at once is never a fun task.
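
For instance, a one-way ANOVA on the master sheet is only a few lines with statsmodels (file and column names are hypothetical):

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("master_behavior.csv")                  # hypothetical master sheet
model = smf.ols("mean_rt ~ C(trial_type)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))                   # prints the ANOVA table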

README.md Outdated
1. Script name will be **02_behprocess**, and it will do the following:
- Makes marker files for more MEG processing; these markers will designate win/loss/neutral cues to be marked in the MEG file and are output as three separate .txt files to the subject’s MEG directory
- Pulls columns from behavior data file and calculates mean RTs and accuracy by trial and subject and appends that to master behavior data sheet (I will probably work with this sheet in R unless I can do an ANOVA as another step)
- I have already made two separate jupyter notebook files to do each of the above tasks, but I can't figure out how to do it without finding and replacing the subject ID; obviously a loop would be a lot easier, so I would like to figure that out and collapse both notebooks into one script

I suspect a pandas dataframe plus string formatting should be adequate. Your processing should likely be encapsulated in functions (and, in turn, in modules). Any function could take the subject id as an argument to modify its behavior appropriately.
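
Something along these lines (the paths and column names are placeholders, not your actual data layout):

import pandas as pd

def make_marker_files(subject_id, meg_dir="/data/meg"):
    """Write win/loss/neutral cue-time files for one subject (hypothetical layout)."""
    beh = pd.read_csv(f"{meg_dir}/{subject_id}/{subject_id}_behavior.csv")
    for cue in ("win", "loss", "neutral"):
        times = beh.loc[beh["cue_type"] == cue, "onset"]
        times.to_csv(f"{meg_dir}/{subject_id}/{cue}_markers.txt",
                     index=False, header=False)

for subject in ["sub01", "sub02"]:
    make_marker_files(subject)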

README.md Outdated
- Copy talairached .nii file into a group folder for group analysis
2. Problems:
- This step uses a parameter file that I would like to be able to create and distribute among subject directories whenever we want to change parameters (e.g. the time window we’re looking at in the task)
- Each of the three SAM commands takes a bit of time, but I heard that using swarm in biowulf may be a good way to do this. I am not sure how it would work with a loop just yet, but it’s an aspiration to make this somewhat efficient if possible.

This is where nipype would shine. It is true that swarm would distribute this, but then it is not very portable. And workflow managers do lots of intelligent management of the processing, so that when you make a change you only recompute the bits of the analysis that you need to.
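
To give a flavour of it, a minimal nipype sketch that fans one step out across subjects (the run_sam function, the command it calls, and the paths are all placeholders):

from nipype import Node, Workflow
from nipype.interfaces.utility import IdentityInterface, Function

def run_sam(subject_id):
    """Hypothetical wrapper around one SAM command for a single subject."""
    import subprocess
    subprocess.run(["samcommand", f"/data/meg/{subject_id}_MID.ds"], check=True)
    return subject_id

subjects = Node(IdentityInterface(fields=["subject_id"]), name="subjects")
subjects.iterables = ("subject_id", ["sub01", "sub02"])   # expands into one run per subject

sam = Node(Function(input_names=["subject_id"],
                    output_names=["subject_id"],
                    function=run_sam),
           name="sam")

wf = Workflow(name="sam_pipeline", base_dir="/scratch/work")
wf.connect(subjects, "subject_id", sam, "subject_id")
wf.run(plugin="MultiProc")   # other plugins can hand jobs to the cluster scheduler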

README.md Outdated

## Permissions
1. I want a script that will set all of the permissions correctly for my group for all of the output files above because umask in my bash profile does not seem to work for this.
2. The commands in this script may just be put at the end of each of the above ones depending on how this works out, or should it just be a bash script?

Bash is fine; Python would be better. There's not too much point in rewriting lots of bash scripts in python, but with that said, if you tie everything together with python it will be easier to maintain and debug.


It occurred to me that some tools completely ignore umask. Perhaps that is your problem. Assuming you have written all your output to "/data/$USER/output_dir" you could make sure your output is well behaved regarding permissions by ending each swarm job with something like:

OUTPUT_DIR=/data/$USER/output_dir
GROUP=mygroup
find $OUTPUT_DIR -group $USER -exec chown :$GROUP {} \; -exec chmod g+rx {} \;

README.md Outdated
2. The commands in this script may just be put at the end of each of the above ones depending on how this works out, or should it just be a bash script?

# Additional Questions:
- Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?

You can certainly put this in your ~/.bashrc file (or ~/.bash_profile, depending on what OS you are on). For the class project you are developing a python package, and a python package (a pip-installed one) will often just assume the appropriate dependencies are installed on the system.

README.md Outdated

# Additional Questions:
- Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
- If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?

The subprocess package may be what you need here, or the above-mentioned workflow managers.
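
A hedged example of calling an external command from python with subprocess (the command name and flags are placeholders; the real CTF tools just need to be on your PATH, e.g. via module load, before python starts):

import subprocess

# the command name and arguments below are placeholders for a real CTF call
result = subprocess.run(
    ["someCTFcommand", "-marker", "cue", "sub01_MID.ds"],
    capture_output=True, text=True, check=True)
print(result.stdout)     # whatever the tool printed to stdout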

README.md Outdated
# Additional Questions:
- Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
- If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?
- Can I make a giant script that will run all of these steps (01-05 and setting permissions) on a list of subjects?

It could be one giant script: a script that provides a command-line interface to the user (use argparse) and then calls the appropriate functionality (imported from other python modules in your package).

Or you might just have several smaller scripts if that is more convenient. Python entry points might be useful as you try to achieve this. The scripts keyword in setup.py may also be useful if you do wish to install non-python scripts as part of your package.
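
A minimal sketch of the CLI layer (run_pipeline here is a stand-in for whatever functions you import from your package):

import argparse

def run_pipeline(subject, step="all"):
    """Placeholder for the real processing functions in your package."""
    print(f"would run step '{step}' for {subject}")

def main():
    parser = argparse.ArgumentParser(description="Run the MEG/behavior pipeline")
    parser.add_argument("subjects", nargs="+", help="one or more subject IDs")
    parser.add_argument("--step", default="all",
                        choices=["all", "meg", "beh", "sam"],
                        help="which stage of the pipeline to run")
    args = parser.parse_args()
    for subject in args.subjects:
        run_pipeline(subject, step=args.step)

if __name__ == "__main__":
    main()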

README.md Outdated
- Can I have a script do “module load afni” (and ctf, and R) so that I don’t have to?
- If the commands are from CTF, can they be in a python script? I had just been running them in the terminal, so would I have to add something special to the beginning of each one to include it in a python script?
- Can I make a giant script that will run all of these steps (01-05 and setting permissions) on a list of subjects?
- How can I set my default python version in biowulf without manually having to load 3.7 every time? Can I do that at the beginning of the first script or in my profile?

Yes. I have a conda environment that is set in my profile as the default, so I am always using the python in that environment.

module load can often be quite slow, so I often create a bash alias to software within a conda environment. For example, alias git=/home/$USER/anaconda3/envs/misc_env/bin/git takes no time to load at login but gives me git from that environment when I want it.

@cwusinich changed the title from "Christina Wusinich Project Outline" to "Christina Wusinich Project" on Apr 23, 2020
@leej3

leej3 commented Apr 23, 2020

> Am I doing git right?

Yes, it appears so.

> Is there a reason that something would work when I run a command individually vs. running as a swarm? Trouble-shooting ideas would be great because my whole project won’t work if I can’t figure this out…

Sounds like checking both the .o and .e files will help to debug this.

> Related to q#1: how do I figure out how many CPUs and threads to assign to a job? I’ve been doing a lot of googling and am still a bit confused. I’ve just been assigning a lot until the command runs successfully, but that seems kind of sloppy and maybe a disrespectful use of shared resources. And what about threads?

No need to worry about taking more than you have allocated; they don't let you, so you only harm your own processing. It is an sbatch system, so you can likely use variables like the ones listed here. Try SLURM_CPUS_PER_TASK.
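
For example, inside a job you can read the allocation from the environment and size your processing to match (the second variable is only set if memory was requested for the job):

import os

# inside an sbatch/swarm job, SLURM exports the allocation as environment variables
n_cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", 1))
mem_per_node = os.environ.get("SLURM_MEM_PER_NODE")   # may be unset depending on the request
print(f"allocated CPUs for this task: {n_cpus}, memory: {mem_per_node}")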

> I need some help conceptualizing how to bring multiple scripts together so that I can run them in order on a bunch of subjects at once while allocating enough memory for each so that this doesn’t take forever. Also I have a combination of python scripts I’ve written and some bash swarm files (which one of the python scripts generates), which is adding to some of my confusion. In Dr. Lee’s comments on my outline, he mentioned looking into pipelines, which I hope to do by next week but will probably still have some questions about organizing this and which one to pick.

Have a look at snakemake and nipype. They're worth it in the long run, just maybe not for this project.

> Other questions if there is time:
> Additionally, I have been encountering a lot of errors when swarming a bunch of subjects at once which sometimes seem to get lost in the shuffle (so I don’t realize it until a later step) or are very vague (i.e. it will just say “FAILED” and I have to then go figure out why); so it would be nice to figure out how to build in some more informative error messages and mechanisms that will stop later scripts in the chain if one of the earlier ones fails (also messages that tell me if things have run successfully--maybe some kind of report output?).
> Another higher level question is that the neuroimaging data I’m working with just seems to be kind of unpredictable (or error-inducing?), and I wonder if it’s even wise to combine too many steps into one, though maybe the error messages will help with that. Or maybe is there something I can do early in the process to make sure the datasets are ready to go through the pipeline?

@leej3

leej3 commented May 3, 2020

f-strings will make a lot of this easier to construct and more legible. For example...

subject = "01"
session = "a"
cmd = f"""
    command 
        -ax 
        --another-flag 
        -s {subject}
        --session {session}
    """
cmd = ' '.join(cmd.split())  # collapse the newlines and indentation into single spaces
print(cmd)

This will generate "command -ax --another-flag -s 01 --session a", which is a lot more legible than adding strings together. Something to bear in mind in future.

Also, add a setup.py and some tests...

@cwusinich
Author

cwusinich commented May 5, 2020

Thank you for the f-strings suggestion! If I don't change it for the project deadline, I definitely will at some point moving forward.

Christina Wusinich and others added 3 commits May 4, 2020 22:17
… draft--plz don't look at it yet), added blank setup.py and blank test file
@cwusinich
Author

@leej3 If we don't have any functions (I just have scripts), what should the tests be? Should I make one of them a function just so it can have tests?

@leej3

leej3 commented May 6, 2020 via email

@cwusinich
Author

Thanks! Will do.

Christina Wusinich and others added 6 commits May 6, 2020 19:51
@cwusinich
Author

@leej3 I'm still planning to add more tests today and clean some of the scripts up if there's time, but the one test I have in there now works, and so do all of my modules, as far as I can tell. I also tested installing the package on a different computer, and that worked, and the test ran successfully. I'm wondering why it's not passing CircleCI; I read the details of the failure and looked at other people's pull requests that passed, but I'm still a little confused about what's wrong with mine. I'm guessing something isn't formatted or set up correctly, but I'm not sure what. If you have time, can you check to see what's going wrong?

cwusinich and others added 3 commits May 7, 2020 03:23
test files should start with test to be picked up by pytest

You should run your tests using pytest

The creation of test data has been tidied into a function to
make troubleshooting it a little easier. Ultimately creating a
test fixture would be ideal for this sort of function.

Using Path objects from pathlib helps when working with lots of
directories and filenames. It adds the appropriate path separators
("/" or "\") depending on your operating system, you no longer have
to worry about trailing slashes, and it has lots of neat methods.

If you wish to create variables throughout a .py file, the correct style
is to capitalize the names
@leej3

leej3 commented May 7, 2020

Great work. I've submitted some suggested fixes to help you on your way.

Quick summary:

  • Encapsulating fake data creation is a good idea
  • Test modules must start with "test" for pytest to pick them up
  • Try to use pathlib.Path more. Just as f-strings change one's life with strings, Path objects make working with paths much easier. The one gotcha (which is becoming less frequent) is that some tools insist on a string as input instead of a Path object. The quick fix is my_path_as_a_string = str(my_path_object). A small sketch follows below.
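
To make the last two points concrete, a minimal test-module sketch (the file and column names are invented for illustration):

# test_behavior.py -- the file name starts with "test" so pytest collects it
from pathlib import Path
import pandas as pd

def make_fake_behavior(directory: Path) -> Path:
    """Write a tiny fake behavior file; the columns are made-up placeholders."""
    csv_path = directory / "sub01_behavior.csv"      # Path objects join with "/"
    pd.DataFrame({"cue_type": ["win", "loss"], "onset": [1.0, 2.5]}).to_csv(
        csv_path, index=False)
    return csv_path

def test_fake_behavior_roundtrip(tmp_path):
    # tmp_path is a built-in pytest fixture giving a fresh temporary directory as a Path
    csv_path = make_fake_behavior(tmp_path)
    assert csv_path.exists()
    assert list(pd.read_csv(csv_path).columns) == ["cue_type", "onset"]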

@cwusinich
Author

Thank you! It looks so much better now!

@cwusinich
Author

Submitting now. Thank you for your help with this! I learned way more than I thought I would!

@leesup

leesup commented May 11, 2020

You did an awesome job on your project! You provided a truly impressive description of your project and tests. Great job using NumPy and pandas; I hope these become your best friends when using Python! I would suggest having more white space for improved readability. In addition, I would suggest checking whether the libraries/modules you imported are actually being used: I saw that you imported the csv module, but you ended up using pandas to read the csv files instead. Using pandas is completely fine, but you don’t need to import csv in that case. Also, you should have a look at pathlib and argparse. Again, great job with your project!

-Paul

@cwusinich
Author

Thank you @suhwanplee !! Those are great suggestions; I am definitely going to look into them. Take care!

