Getting started

1. Installation

a. Installing PORT

The latest version of PORT can be downloaded from https://github.com/itmat/Normalization/releases

Alternatively, you can clone the GitHub repository:

git clone https://github.com/itmat/Normalization.git

b. Installing sam2cov (optional)

You can use sam2cov to create coverage files and upload them to a Genome Browser.

Currently, sam2cov only supports reads aligned with RUM, STAR or GSNAP. sam2cov supports stranded data, but it assumes the reverse read is in the same orientation as the transcripts/genes (sense).

You can download sam2cov from https://github.com/khayer/sam2cov

Please make sure you have the latest version of sam2cov.

c. Installing samtools

You can download samtools from http://samtools.sourceforge.net/

2. Input requirements

a. Input files

i. Raw sequence reads

Raw sequence reads need to be provided as input in Fasta or Fastq format.
PORT expects one file per sample for single-end data and two files per sample for paired-end data.
Fasta/Fastq files can be gzipped.
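The read-file requirements above can be checked before a run. The following is a minimal Python sketch, not part of PORT. It assumes the format can be identified from the first record (">" for Fasta, "@" for Fastq) and that gzip compression can be detected from the two-byte gzip magic number. All function names are hypothetical helpers.

```python
import gzip
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(first_bytes: bytes) -> bool:
    """True if the leading bytes are the gzip magic number."""
    return first_bytes[:2] == GZIP_MAGIC


def format_from_first_line(line: str) -> str:
    """Fasta records start with '>', Fastq records with '@'."""
    if line.startswith(">"):
        return "fasta"
    if line.startswith("@"):
        return "fastq"
    raise ValueError(f"unrecognized read record start: {line[:20]!r}")


def sniff_read_file(path) -> str:
    """Return 'fasta' or 'fastq' for a plain or gzipped read file."""
    path = Path(path)
    with open(path, "rb") as fh:
        gz = is_gzipped(fh.read(2))
    opener = gzip.open if gz else open  # both accept (path, "rt")
    with opener(path, "rt") as fh:
        for line in fh:
            if line.strip():  # skip leading blank lines
                return format_from_first_line(line)
    raise ValueError(f"{path} contains no records")


def check_read_file_count(n_files: int, paired_end: bool) -> None:
    """One file per sample for single-end data, two for paired-end data."""
    expected = 2 if paired_end else 1
    if n_files != expected:
        raise ValueError(f"expected {expected} read file(s) per sample, found {n_files}")
```

For example, `sniff_read_file("Sample_1/reads_1.fq.gz")` would return `"fastq"` for a gzipped Fastq file. The path is hypothetical.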

ii. Alignment files

Alignment files need to be provided as input in SAM or BAM format.
Required tags: IH (or NH) and HI.
PORT expects one alignment file per sample.
Paired-End data: mated alignments need to be on adjacent lines, and read pairs need to have the same read ids.
- Aligner options to use for PORT compatibility:
  - STAR v2.5.1a or newer: use the "--outSAMunmapped Within KeepPairs" option.
    Use "--outSAMtype BAM Unsorted" for BAM output.
  - GSNAP 2015-12-31.v6 or newer: use the "-A sam", "--ordered" and "--add-paired-nomappers" options.
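The alignment requirements above (IH or NH plus HI tags, and for paired-end data, mates on adjacent lines with identical read ids) can be spot-checked on SAM text. The following is a minimal Python sketch, not PORT's own validation. It assumes uncompressed SAM lines and assumes that every record belongs to a pair in order (records 0 and 1, 2 and 3, and so on). `check_sam_lines` is a hypothetical helper.

```python
def parse_tags(fields):
    """Collect optional SAM tags (columns 12 onward) as {TAG: value} strings."""
    tags = {}
    for field in fields[11:]:  # the 11 mandatory SAM columns come first
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def check_sam_lines(lines, paired_end):
    """Return a list of problems found (empty if none).

    Checks IH/NH and HI tags on every record and, for paired-end data,
    that mates sit on adjacent lines with identical read ids.
    """
    records = [
        line.rstrip("\n").split("\t")
        for line in lines
        if line.strip() and not line.startswith("@")  # skip header lines
    ]
    problems = []
    for i, fields in enumerate(records):
        tags = parse_tags(fields)
        if "IH" not in tags and "NH" not in tags:
            problems.append(f"record {i}: missing IH/NH tag")
        if "HI" not in tags:
            problems.append(f"record {i}: missing HI tag")
    if paired_end:
        if len(records) % 2:
            problems.append("odd number of records for paired-end data")
        for i in range(0, len(records) - 1, 2):
            if records[i][0] != records[i + 1][0]:  # QNAME is column 1
                problems.append(f"records {i} and {i + 1}: read IDs differ")
    return problems
```

For BAM input, one option is to stream the records as SAM text with `samtools view -h` and pass those lines to the function.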

iii. Gene information file

A gene information file whose column headers end with the required suffixes needs to be provided.
- The required suffixes are: name, chrom, strand, txStart, txEnd, exonStarts, exonEnds, name2, geneSymbol, ensemblToGeneName.value.
Gene-level normalization requires an ENSEMBL gene information file.
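The required suffixes can be checked against the file's header line. The following is a minimal Python sketch under stated assumptions: the header is the first, tab-separated line (optionally starting with "#"), and each column is named `<prefix>.<suffix>`. The prefix `mm9.ensGene` used in the test is a hypothetical example. `missing_suffixes` is a hypothetical helper, not part of PORT.

```python
REQUIRED_SUFFIXES = [
    "name", "chrom", "strand", "txStart", "txEnd",
    "exonStarts", "exonEnds", "name2", "geneSymbol", "ensemblToGeneName.value",
]


def missing_suffixes(header_line: str) -> list:
    """Return the required suffixes that no header column ends with."""
    columns = [c.strip() for c in header_line.lstrip("#").rstrip("\n").split("\t")]

    def has(suffix):
        # Match a whole dotted component so that "name" does not match "name2".
        return any(c == suffix or c.endswith("." + suffix) for c in columns)

    return [s for s in REQUIRED_SUFFIXES if not has(s)]
```

Matching on a full dotted component rather than a bare `endswith` prevents a column such as `name2` from satisfying the `name` requirement by accident.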

Ensembl gene info files for mm9, mm10, hg19, hg38, dm3 and danRer7 are available in the Normalization/norm_scripts directory:

 mm9: /path/to/Normalization/norm_scripts/mm9_ensGenes.txt  
 mm10: /path/to/Normalization/norm_scripts/Mus_musculus.GRCm38.84.PORT_geneinfo.txt  
 hg19: /path/to/Normalization/norm_scripts/hg19_ensGenes.txt  
 hg38: /path/to/Normalization/norm_scripts/Homo_sapiens.GRCh38.84.PORT_geneinfo.txt  
 dm3: /path/to/Normalization/norm_scripts/dm3_ensGenes.txt  
 danRer7: /path/to/Normalization/norm_scripts/danRer7_ensGenes.txt

A script to convert an ENSEMBL GTF file to a gene information file is available in the Normalization/norm_scripts directory:
- /path/to/Normalization/norm_scripts/convert_gtf_to_PORT_geneinfo.transcripts.pl

iv. Genome fa/fai

Each description line (the header line that begins with ">") MUST begin with a chromosome name that matches the chromosome names in the gene information file.

Please check and modify the file appropriately before starting PORT.

You can generate the index file (*.fai) with samtools:
- samtools faidx <ref.fa>
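To check chromosome names before starting PORT, compare the names in the genome with those used in the gene information file. The following is a minimal Python sketch. It assumes the sequence name is the first whitespace-delimited token after ">" in the FASTA, or the first tab-separated column of the .fai index. How the gene-info chromosome list is extracted is left to the caller. All function names are hypothetical helpers.

```python
def fasta_chrom_names(lines):
    """Sequence names from FASTA description lines: the first token after '>'."""
    names = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def fai_chrom_names(lines):
    """Sequence names from a samtools .fai index: the first tab-separated column."""
    return [line.split("\t", 1)[0] for line in lines if line.strip()]


def unmatched_chroms(genome_names, gene_info_chroms):
    """Chromosomes used in the gene information file but absent from the genome."""
    return sorted(set(gene_info_chroms) - set(genome_names))
```

Both readers accept any iterable of lines, so an open file handle works (e.g. `with open("ref.fa") as fh: names = fasta_chrom_names(fh)`) without loading the whole genome into memory. A non-empty result from `unmatched_chroms`, such as `["chr1"]` when the FASTA uses `1`, signals that the headers need editing.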

b. Input directory structure

The input files need to be organized into a specific directory structure for PORT to run properly.

Give the STUDY directory a unique name.
Sample directories (Sample_1, Sample_2, etc.) can have any name.
Make sure the raw sequence reads and the alignment file (SAM/BAM) for each sample are in that sample's directory inside the READS folder.
All alignment files MUST have the same name across samples.

Example:

STUDY
└── READS
    ├── Sample_1
    │   ├── Unaligned reads
    │   └── Aligned.sam/bam
    ├── Sample_2
    │   ├── Unaligned reads
    │   └── Aligned.sam/bam
    ├── Sample_3
    │   ├── Unaligned reads
    │   └── Aligned.sam/bam
    └── Sample_4
        ├── Unaligned reads
        └── Aligned.sam/bam
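The layout above can be verified before submitting jobs. The following is a minimal Python sketch, not part of PORT. It assumes every subdirectory of STUDY/READS is a sample directory and takes the shared alignment file name (e.g. `Aligned.sam`) as a parameter. `check_study_layout` is a hypothetical helper.

```python
from pathlib import Path


def check_study_layout(study_dir, alignment_name):
    """Return (sample names, samples missing `alignment_name`) under STUDY/READS."""
    reads_dir = Path(study_dir) / "READS"
    if not reads_dir.is_dir():
        raise FileNotFoundError(f"no READS directory in {study_dir}")
    # Every subdirectory of READS is treated as a sample directory.
    samples = sorted(p for p in reads_dir.iterdir() if p.is_dir())
    names = [p.name for p in samples]
    # All samples must share the same alignment file name.
    missing = [p.name for p in samples if not (p / alignment_name).is_file()]
    return names, missing
```

An empty second list means every sample directory contains an alignment file with the same name, as PORT requires.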

c. Configuration file

Get the template_version.cfg file from /path/to/Normalization/ and follow the instructions in the config file.
- NORMALIZATION TYPE, DATA TYPE (stranded), CLUSTER INFO, GENE INFO, rRNA, FA and FAI, DATA VISUALIZATION and CLEANUP options need to be specified.
PORT is designed to be run on a compute cluster. It has been tested on SGE and LSF.
See about_cfg.md for details.

next: How to run PORT
