Will update this as things change.
- test separate variance modeling in a structured way
- fix the image path for experiment alignment figures; it needs to be relative to the HTML file
- iteratively run the filters: check whether any experiments have all of their observations filtered out, remove those experiments, and run the filters again
- fix the `sort` FutureWarning raised by pd.concat:

      q:\anaconda\lib\site-packages\pandas\core\frame.py:6201: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
      of pandas will change to not sort by default.
      To accept the future behavior, pass 'sort=False'.
      To retain the current behavior and silence the warning, pass 'sort=True'.
        sort=sort)

- fix RI columns being wiped when concatenating
- quick-and-dirty pairwise correlation between experiments, to spot outliers and warn the user that they should be filtered out
- check the max(PEP) of each raw file, and warn the user if they input a raw file whose PEPs are all too low (nothing to boost)
- raw-file-specific retention length filtering
- rename output columns
- migrate from command-line options to a config file
- improve input file-type conversion
  - file type determines column names
  - move filtering blocks into separate functions; file type determines which functions are run
- make the package pip-installable
- violin plot of residual density by RT (RT on x-axis)
- heatmap of pairwise RT correlations
- diagnostic figures for the update portion
  - PEP vs. PEP.new scatterplot
  - fold-change increase in IDs as a function of PEP threshold
- validation figures
  - multiple peptides of the same protein should have the same intensity (measure the CV)
- generate HTML file to view figures
- add exceptions and start raising them
- create the entire output directory, including all subfolders
- parameter for defining column headers, as an alternative to specifying the file type
- fix experiment exclusion
- optionally save alignment parameters
- split up outputs the same way the inputs are split up
  - then remove the input_id column
  - remove the id column as well?
- add verbosity levels and actually enforce them in code
- additional parameters to select which output columns to include
  - default should be just pep_new; maybe add a "diagnostic" flag that includes the other columns?
- logging -> logger
- default retention length filter: max_retention_time / 60
- optimize experiment updating
- filter_decoys/contaminants -> include_decoys/contaminants
- add PEP_updated column
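Sketch for the "iteratively run the filters" item. This is a minimal illustration under stated assumptions. It assumes a long-format pandas DataFrame with hypothetical `exp_id`, `peptide`, and `pep` columns. It also assumes each filter is a function that returns a boolean keep-mask. It is not the project's actual filter code. The loop matters for filters that depend on which experiments are present, such as "peptide seen in at least 2 experiments". Dropping one experiment can push another below the threshold.

```python
import pandas as pd


def filter_until_stable(df, filters, exp_col="exp_id", max_iter=100):
    """Re-run `filters` until no more experiments drop out.

    `filters` is a list of callables that take a DataFrame and return a
    boolean Series (True = keep the row). Names here are hypothetical.
    """
    kept_exps = set(df[exp_col].unique())
    for _ in range(max_iter):
        # Only consider experiments that survived the previous round.
        sub = df[df[exp_col].isin(kept_exps)]
        keep = pd.Series(True, index=sub.index)
        for f in filters:
            keep &= f(sub)
        # Experiments that still have at least one observation left.
        surviving = set(sub.loc[keep, exp_col].unique())
        if surviving == kept_exps:
            return sub[keep]
        kept_exps = surviving
    raise RuntimeError("filters did not converge within max_iter rounds")


# Example filters with hypothetical thresholds.
def low_pep(d):
    return d["pep"] < 0.5


def seen_in_two_exps(d):
    # Number of distinct experiments each peptide appears in.
    return d.groupby("peptide")["exp_id"].transform("nunique") >= 2
```

In this sketch the filters are recomputed on the reduced data every round. That recomputation is what lets the cascade play out.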
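Sketch for the pd.concat `sort` warning item. pandas 0.23 emits that FutureWarning only when `sort` is left at its default. Passing either value explicitly silences it. The frames and the `extra` column below are made up for illustration.

```python
import pandas as pd

# Two frames whose non-concatenation axis (the columns) is not aligned:
# `b` lacks the hypothetical "extra" column.
a = pd.DataFrame({"sequence": ["PEPTIDE"], "pep": [0.01], "extra": [1.0]})
b = pd.DataFrame({"sequence": ["PEPTIDEK"], "pep": [0.20]})

# sort=False keeps the columns in order of first appearance (the "future"
# behavior); rows from `b` get NaN in the column they lack.
merged = pd.concat([a, b], sort=False, ignore_index=True)
print(merged)
```

Use `sort=True` instead if downstream code relies on alphabetically sorted columns.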
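Sketch for the "quick-and-dirty pairwise correlation between experiments" item. This is one possible approach, not the planned implementation. It correlates retention times of shared peptides between every pair of experiments. It then flags experiments whose median correlation with the others falls below a cutoff. The column names and the 0.9 cutoff are hypothetical.

```python
import pandas as pd


def flag_outlier_experiments(df, min_median_corr=0.9, exp_col="exp_id",
                             peptide_col="peptide", rt_col="retention_time"):
    """Return experiments whose RTs correlate poorly with the others."""
    # Peptides x experiments matrix of (median) retention times.
    wide = df.pivot_table(index=peptide_col, columns=exp_col,
                          values=rt_col, aggfunc="median")
    # Pearson correlation over the peptides each pair of experiments shares;
    # pairs with fewer than 3 shared peptides give NaN.
    corr = wide.corr(min_periods=3)
    # Median correlation of each experiment with every *other* experiment.
    median_corr = pd.Series({e: corr[e].drop(e).median() for e in corr.columns})
    # NaN medians (no usable overlap) compare False and are not flagged.
    return sorted(median_corr[median_corr < min_median_corr].index)
```

The same `corr` matrix is also what the RT-correlation heatmap item would plot.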
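Sketch for the "check the max(PEP) of each raw file" item. If even the worst-scoring PSM in a raw file already has a very low PEP, the update step has little to boost. The threshold and column names here are hypothetical placeholders.

```python
import warnings

import pandas as pd


def warn_low_pep_raw_files(df, min_max_pep=0.01, raw_col="raw_file", pep_col="pep"):
    """Warn about raw files whose highest PEP is below `min_max_pep`.

    Returns the flagged raw files. Threshold and columns are hypothetical.
    """
    max_pep = df.groupby(raw_col)[pep_col].max()
    low = max_pep[max_pep < min_max_pep]
    for raw_file, value in low.items():
        warnings.warn(
            f"raw file {raw_file!r} has max(PEP) = {value:.3g} "
            f"(< {min_max_pep}); there may be nothing to boost"
        )
    return list(low.index)
```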
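Sketch combining the raw-file-specific retention length filter with the proposed max_retention_time / 60 default. It assumes the filter drops observations whose retention length exceeds the limit. It also assumes the default limit is computed separately for each raw file. Both are assumptions, as are the column names.

```python
import pandas as pd


def filter_retention_length(df, max_length=None, raw_col="raw_file",
                            rt_col="retention_time", rl_col="retention_length"):
    """Drop observations whose retention length exceeds a limit.

    If `max_length` is None, the limit is max(retention_time) / 60,
    computed per raw file. Column names are hypothetical.
    """
    if max_length is None:
        # Per-raw-file limit, broadcast back to every row of that raw file.
        limit = df.groupby(raw_col)[rt_col].transform("max") / 60
    else:
        limit = max_length
    return df[df[rl_col] <= limit]
```

Passing an explicit `max_length` keeps a single global cutoff available alongside the per-file default.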
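Sketch for the validation item about measuring the CV of peptides from the same protein. This is a minimal version assuming hypothetical `protein`, `peptide`, and `intensity` columns. Each peptide is collapsed to its mean intensity, and the CV is then taken across peptides of each protein. Comparing these CVs between the original and updated identifications is one way the validation figure could be built. That comparison is not shown here.

```python
import pandas as pd


def protein_cv(df, protein_col="protein", peptide_col="peptide",
               intensity_col="intensity"):
    """Coefficient of variation of peptide intensities within each protein.

    Proteins with fewer than two peptides are dropped because their CV is
    undefined. Column names are hypothetical.
    """
    per_peptide = df.groupby([protein_col, peptide_col])[intensity_col].mean()
    by_protein = per_peptide.groupby(level=protein_col)
    # Sample standard deviation divided by the mean, per protein.
    cv = by_protein.std(ddof=1) / by_protein.mean()
    return cv[by_protein.size() >= 2]
```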
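Sketch for the verbosity-levels and "logging -> logger" items. It reads "logging -> logger" as switching from module-level `logging.info()` calls to a named logger; that reading is an assumption. The mapping from a verbosity count to a level is also hypothetical. The count could come from the planned config file rather than a command-line flag.

```python
import logging

# A named logger instead of calling logging.info() etc. on the root logger.
logger = logging.getLogger(__name__)

# Hypothetical mapping: 0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG.
VERBOSITY_LEVELS = [logging.WARNING, logging.INFO, logging.DEBUG]


def configure_logging(verbose):
    """Set the logger level from a verbosity count and return that level."""
    index = min(max(verbose, 0), len(VERBOSITY_LEVELS) - 1)
    level = VERBOSITY_LEVELS[index]
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    # Replace any previous handler so repeated calls don't duplicate output.
    logger.handlers[:] = [handler]
    logger.setLevel(level)
    return level


configure_logging(1)
logger.info("shown at verbosity 1")
logger.debug("hidden at verbosity 1")
```

Enforcing the levels then comes down to routing every message through `logger` at the appropriate level instead of printing.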
FUTURE VERSION
- move off of Stan
- optimize data selection by RT bin, experiment, and peptide: remove as much data as possible while retaining the same coverage