Usage
build_bedpe
build_bedpe builds pairs between elements in two bed files. Pairs can be constrained by a third bed file (usually TADs) or by user-defined minimum and maximum distances. Pairs are printed in bedpe format to standard out and can be used to query .hic files with query_bedpe. View the tutorial here.
Usage and Option Summary
build_bedpe -A path/to/bed1.bed -B path/to/bed2.bed -T path/to/TADfile.bed
(or):
build_bedpe -A path/to/bed1.bed -B path/to/bed2.bed -d 10000 -D 100000
Required
Short Option | Long Option | Description |
---|---|---|
-A | --bed_A | Path to the first bed file |
-B | --bed_B | Path to the second bed file |
Optional
Short Option | Long Option | Description |
---|---|---|
-T | --TAD | Path to the TAD file used to constrain pairings; pairs that extend outside a TAD are not made |
-d | --min_dist | Minimum distance between pairs used to drop results. Default 0 bp |
-D | --max_dist | Maximum distance between pairs used to drop results. Default 5 Mb |
-m | --preserve_meta | If bed file meta data columns should be preserved. Default FALSE |
-i | --get_trans | If pairs between different chromosomes should be made. If TRUE, will print trans pairs only. Default FALSE |
-f | --fraction | Control the number of possible trans pairs to be printed. Between 0-1. Default = 1. Only applicable if --get_trans is TRUE. |
-h | --help | Help message |
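The distance-constrained mode can be pictured with a short Python sketch. It is an illustration only, not the build_bedpe implementation: the function name `build_pairs`, the tuple layout, and the choice to measure distance between interval midpoints are all assumptions made for this example.

```python
from itertools import product


def build_pairs(bed_a, bed_b, min_dist=0, max_dist=5_000_000):
    """Pair every interval in bed_a with every interval in bed_b on the
    same chromosome, keeping pairs whose midpoint distance lies within
    [min_dist, max_dist]. Midpoint distance is an assumption."""
    pairs = []
    for (chr_a, s_a, e_a), (chr_b, s_b, e_b) in product(bed_a, bed_b):
        if chr_a != chr_b:
            continue  # cis pairs only, as with the default --get_trans FALSE
        dist = abs((s_a + e_a) // 2 - (s_b + e_b) // 2)
        if min_dist <= dist <= max_dist:
            # bedpe row: chr1 start1 end1 chr2 start2 end2
            pairs.append((chr_a, s_a, e_a, chr_b, s_b, e_b))
    return pairs


bed_a = [("chr1", 10_000, 11_000)]
bed_b = [
    ("chr1", 60_000, 61_000),
    ("chr1", 500_000, 501_000),
    ("chr2", 60_000, 61_000),
]
# Mirrors -d 10000 -D 100000 from the usage line above.
for row in build_pairs(bed_a, bed_b, min_dist=10_000, max_dist=100_000):
    print("\t".join(map(str, row)))
```

With these toy intervals only the pair 50 kb apart survives; the 490 kb pair exceeds the maximum and the chr2 interval is on a different chromosome.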
get_loops
get_loops obtains loops with scores > 1. Scores are calculated using inherent normalization and printed in bedpe format to standard out.
Usage and Option Summary
get_loops \
-A H3K27ac \
-G hg38 \
-R chr1:1000:5000 \
-r 5000
Required
Short Option | Long Option | Description |
---|---|---|
-A | --sample1 | Name of the sample you want to use as it appears on the Tinker box |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
-R | --range | Range to obtain loops from, in chr:start:end format. For example: -R chr1:1000:5000 |
-r | --resolution | Resolution of sample in base pairs. Only 5000 and 1000 supported. |
Optional
Short Option | Long Option | Description |
---|---|---|
-T | --TAD | Full path to the TAD file, the boundaries of which will be used to obtain loops |
-S | --score | Minimum inherent score to get loops. Default = 1 |
-d | --min_dist | Minimum distance to filter obtained loops. Default 0 bp |
-h | --help | Help message |
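The effect of the --score and --min_dist filters can be sketched in Python. This is a hypothetical illustration, not how get_loops computes or filters its output: the function `filter_loops`, the 7-column tuple layout, the strict "greater than" comparison, and measuring distance between foot starts are assumptions.

```python
def filter_loops(loops, min_score=1.0, min_dist=0):
    """Keep loops scoring above min_score whose feet are at least
    min_dist bp apart (measured between foot starts, an assumption)."""
    kept = []
    for loop in loops:
        chr1, start1, end1, chr2, start2, end2, score = loop
        if score > min_score and abs(start2 - start1) >= min_dist:
            kept.append(loop)
    return kept


# Toy bedpe-like rows with a score in the seventh column.
loops = [
    ("chr1", 1000, 6000, "chr1", 51000, 56000, 2.5),
    ("chr1", 1000, 6000, "chr1", 11000, 16000, 3.0),
    ("chr1", 1000, 6000, "chr1", 101000, 106000, 0.8),
]
for loop in filter_loops(loops, min_dist=20_000):
    print("\t".join(map(str, loop)))
```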
get_multisample_viewpoints
get_multisample_viewpoints is used to extract contact values from specific genomic viewpoints for multiple samples simultaneously.
Usage and Option Summary
get_multisample_viewpoints \
-G hg38 \
-L LAMP_DMSO,LAMP_dCBP1 \
-R chr1:40280000:40530000:MCC7_MYCL \
-V chr1:40400000:anchor1
Required
Short Option | Long Option | Description |
---|---|---|
-L | --list | Comma separated list of sample names. For ex., LAMP_DMSO,LAMP_dCBP1 |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
-R | --range | The genomic range to extract the contact values of, in chr:start:end format. For example: -R chr1:40280000:40530000:MCC7_MYCL |
-V | --viewpoint | Viewpoint in chr:start format. For example: -V chr1:40400000:anchor1 |
Optional
Short Option | Long Option | Description |
---|---|---|
-T | --table | Path to 1-col .txt file containing list of sample names, if --list option is not used |
-Q | --norm | Which normalization to use. Strictly none, cpm or aqua in lower case. Non-spike-in samples default to cpm. Spike-in samples default to aqua. |
-r | --resolution | Resolution of sample in base pairs, using which the contact values should be calculated. Default 5000. Accepted resolutions: 1000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000 |
-O | --output_name | If saving to a file is desired, provide a name for the output |
-h | --help | Help message |
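The -R and -V arguments are colon-separated strings with an optional trailing name. A minimal Python sketch of how such strings could be split is shown here; `parse_locus` is a hypothetical helper written for this example, and the rule that the first non-numeric field is the name is an assumption rather than documented behavior.

```python
def parse_locus(text):
    """Split 'chr:start[:end][:name]' into its parts. Numeric fields are
    read as coordinates; a non-numeric field is taken as the name."""
    chrom, *fields = text.split(":")
    coords = [int(f) for f in fields if f.isdigit()]
    names = [f for f in fields if not f.isdigit()]
    if not 1 <= len(coords) <= 2 or len(names) > 1:
        raise ValueError(f"cannot parse locus: {text!r}")
    return {
        "chrom": chrom,
        "start": coords[0],
        "end": coords[1] if len(coords) == 2 else None,
        "name": names[0] if names else None,
    }


rng = parse_locus("chr1:40280000:40530000:MCC7_MYCL")
vp = parse_locus("chr1:40400000:anchor1")
# A viewpoint is only meaningful if it falls inside the requested range.
inside = rng["chrom"] == vp["chrom"] and rng["start"] <= vp["start"] <= rng["end"]
print(rng, vp, inside)
```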
intersect_bedpe
Given a bedpe file, intersect_bedpe prints to standard out those rows of the bedpe that intersect with rows of the given bed file(s) on either foot of the pair. intersect_bedpe is useful for extracting biological subsets from the bedpe.
Usage and Option Summary
intersect_bedpe -A H3K27ac -P /path/to/bedpe
(or):
intersect_bedpe -A H3K27ac -B H3K27me3 -P /path/to/bedpe
Required
Short Option | Long Option | Description |
---|---|---|
-A | --bed_A | Path to the first bed file |
-P | --bedpe | Path to the bedpe file |
Optional
Short Option | Long Option | Description |
---|---|---|
-F | --flank | Genome distance in bp that the bed should be in vicinity of either foot. Default is 0 |
-V | --absence | If specified, reports those rows of the bedpe that do not intersect with rows of given bed file. Default FALSE |
-B | --bed_B | Path to the second bed file |
| | --print_bed | If specified, reports rows of bed instead of bedpe |
-h | --help | Help message |
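The "either foot" test, together with --flank and --absence, can be sketched as follows. This is an illustrative Python model, not the tool's code: the function names, half-open interval convention, and tuple layouts are assumptions.

```python
def overlaps(chrom, start, end, bed, flank=0):
    """True if [start, end), widened by flank bp on each side, overlaps
    any bed interval on the same chromosome."""
    return any(
        chrom == b_chrom and start - flank < b_end and b_start < end + flank
        for b_chrom, b_start, b_end in bed
    )


def intersect_bedpe(bedpe, bed, flank=0, absence=False):
    """Keep bedpe rows where either foot hits the bed file; with
    absence=True, keep the rows where neither foot does."""
    kept = []
    for row in bedpe:
        chr1, s1, e1, chr2, s2, e2 = row[:6]
        hit = overlaps(chr1, s1, e1, bed, flank) or overlaps(chr2, s2, e2, bed, flank)
        if hit != absence:
            kept.append(row)
    return kept


bedpe = [
    ("chr1", 1000, 2000, "chr1", 9000, 10000),
    ("chr1", 20000, 21000, "chr1", 30000, 31000),
]
peaks = [("chr1", 1500, 1600)]
print(intersect_bedpe(bedpe, peaks))                # rows touching a peak
print(intersect_bedpe(bedpe, peaks, absence=True))  # rows touching none
```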
plot_APA
plot_APA generates APA (Aggregate Peak Analysis) plots using AQuA normalized contact values from genomic pair data.
Usage and Option Summary
plot_APA \
-P /path/to/example_pairs.bedpe \
-A H3K27ac \
-G hg38 \
-O /path/to/output_directory \
-B SampleB \
--bin_size 10000 \
--hard_cap_cpm 50
(or):
plot_APA \
-P /path/to/example_pairs.bedpe \
-A H3K27ac \
-G hg38 \
-O /path/to/output_directory \
-B H3K27me3
Required
Short Option | Long Option | Description |
---|---|---|
-P | --pair | Path to the bedpe (pairs) file you want to use, without headers. |
-A | --sample1 | Name of the sample you want to use to create the plot, name it as it appears on the Tinker box |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38. |
-O | --out-dir | Full path of the directory you want to store the output plots in. |
Optional
Short Option | Long Option | Description |
---|---|---|
-B | --sample2 | The name of the second sample. If triggered, plots the delta AQuA normalized values from both samples for that pair. Useful in case vs control. |
| | --cpml | No input required. If --cpml is specified, CPM and AQuA APA values get normalized by the number of loops in the bedpe. |
| | --bin_size | Bin size you want to use for the APA plots. Default = 5000. |
| | --hard_cap_cpm | Upper limit of the CPM plot range. If not specified, upper limit will be calculated using max bin value. |
| | --hard_cap_cpm_delta | Upper limit of the CPM delta plot range. Only for two sample analysis. If not specified, upper limit will be calculated using max delta value. |
| | --hard_cap_aqua | Upper limit of the AQuA plot range. If not specified, upper limit will be calculated using max bin value. |
| | --hard_cap_aqua_delta | Upper limit of the AQuA delta plot range. Only for two sample analysis. If not specified, upper limit will be calculated using max delta value. |
-h | --help | Help message |
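Conceptually, an APA plot stacks one small contact window per bedpe pair and aggregates them bin by bin. The Python sketch below illustrates that idea under stated assumptions: the function `aggregate_windows` is hypothetical, aggregation is modeled as an element-wise sum, `per_loop=True` stands in for what --cpml is described as doing, and `hard_cap` models the upper plot limit as simple clipping. None of this is taken from the plot_APA source.

```python
def aggregate_windows(windows, per_loop=False, hard_cap=None):
    """Sum equally sized square windows (one per bedpe pair) element-wise.
    per_loop=True divides by the number of windows; hard_cap clips the
    result at an upper limit."""
    n = len(windows)
    size = len(windows[0])
    total = [
        [sum(w[i][j] for w in windows) for j in range(size)]
        for i in range(size)
    ]
    if per_loop:
        total = [[v / n for v in row] for row in total]
    if hard_cap is not None:
        total = [[min(v, hard_cap) for v in row] for row in total]
    return total


# Two toy 3x3 windows, each centered on a loop.
w1 = [[0, 1, 0], [1, 8, 1], [0, 1, 0]]
w2 = [[1, 1, 1], [1, 4, 1], [1, 1, 1]]
print(aggregate_windows([w1, w2]))
print(aggregate_windows([w1, w2], per_loop=True))
print(aggregate_windows([w1, w2], hard_cap=10))
```

Dividing by the number of loops makes aggregates from bedpe files of different sizes comparable, which is the motivation the --cpml description suggests.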
plot_contacts
plot_contacts creates contact plots with CPM/AQuA normalized contact values. View the tutorial here.
Usage and Option Summary
plot_contacts -A H3K27ac -R chr1:40280000:40530000:MCC7_MYCL -G hg38
(or):
plot_contacts -A H3K27ac -B H3K27me3 -g MCC7_MYCL -G hg38
Required
Short Option | Long Option | Description |
---|---|---|
-A | --sample_1 | Name of the sample you want to use to create the contact plot, as it appears on the Tinker box |
-R | --range | The genomic range that is to be plotted, in chr:start:end format. For example: -R chr1:40280000:40530000 |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
Optional
Short Option | Long Option | Description |
---|---|---|
-O | --output_name | Provide a name for the output pdf |
-Q | --norm | Which normalization to use. Strictly ‘none’, ‘cpm’ or ‘aqua’ in lower case. Non-spike-in samples default to cpm. Spike-in samples default to aqua. |
-B | --sample_2 | For two sample delta plots, name of the second sample. |
-r | --resolution | Resolution of sample in base pairs, using which the contact values should be calculated. Default 5000. Accepted resolutions: 1000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000 |
-p | --profiles | If contact profiles should be drawn along the diagonal, x axis and y axis. Default = FALSE |
-o | --color_one_sample | Color for contacts for single sample plots in RGB hexadecimal, ex: red = FF0000 (RRGGBB). Default = FF0000 |
-t | --color_two_sample | Color for contacts for two sample plots (delta) in RGB hexadecimal separated by ’-’, ex: 1E90FF-C71585 |
| | --annotations_default | Draw bed annotations; TSSs, ENCODE 3 enhancers, CpG islands. Default = TRUE |
| | --annotations_custom | Path to bed file to draw custom annotations. Only one custom bed supported |
| | --quant_cut | Between 0.00-1.00. Rather than using the max value of the matrix as the highest color, cap the values at a given percentile. Default 1.00 |
| | --max_cap | Set a hard cap; all contact values greater than this will be brought down to the cap value supplied |
| | --use_dump | TRUE or FALSE. Obtain raw contact matrices along with contact plot. Default FALSE |
| | --bedpe | Supply path to a bedpe file to highlight tiles of interacting bedpe feet |
| | --bedpe_color | Color for supplied bedpe in RGB hexadecimal. ex: C71585 |
-i | --inherent | TRUE or FALSE. If TRUE, normalize the contact plot using inherent normalization |
-w | --width | Manually set width of printed bin between 0 and 1. Default width calculated automatically. |
-g | --gene | Provide a gene name to automatically select TAD coordinates for interval range (-R). -g can be used in place of -R. |
-h | --help | Help message. Primer can be found at https://rb.gy/fjkwkr |
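The difference between --quant_cut and --max_cap can be shown on a toy matrix. The Python sketch below is an assumption-laden illustration: `cap_matrix` is a hypothetical helper, and the nearest-rank percentile used here is one simple convention that may differ from what plot_contacts applies internally.

```python
import math


def cap_matrix(matrix, quant_cut=1.0, max_cap=None):
    """Clip matrix values at the quant_cut percentile (nearest-rank
    convention, an assumption) and, if given, at an absolute max_cap."""
    values = sorted(v for row in matrix for v in row)
    rank = max(1, math.ceil(quant_cut * len(values)))
    ceiling = values[rank - 1]
    if max_cap is not None:
        ceiling = min(ceiling, max_cap)
    return [[min(v, ceiling) for v in row] for row in matrix]


contacts = [[1, 2], [3, 100]]
print(cap_matrix(contacts))                   # quant_cut 1.00 leaves values as-is
print(cap_matrix(contacts, quant_cut=0.75))   # the outlier is pulled down to 3
print(cap_matrix(contacts, max_cap=50))       # the outlier is pulled down to 50
```

Capping keeps a single very bright tile from compressing the color scale for the rest of the plot.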
plot_virtual_4C
plot_virtual_4C is used for visualizing chromatin interactions, similar to what the 4C (Circular Chromosome Conformation Capture) technique does. However, instead of performing a wet-lab 4C experiment, the tool uses processed data to virtually generate a 4C profile focused on interactions of a specific genomic region (viewpoint) with the rest of the genome.
Usage and Option Summary
plot_virtual_4C \
-A H3K27ac \
-G hg38 \
-R chr1:40280000:40530000:MCC7_MYCL \
-V chr1:40400000:anchor1
(or):
plot_virtual_4C \
-A H3K27ac \
-B H3K27me3 \
-G hg38 \
-R chr1:40280000:40530000:MCC7_MYCL \
-V chr1:40400000:anchor1
Required
Short Option | Long Option | Description |
---|---|---|
-A | --sample1 | Name of the first sample you want to use as it appears on the Tinker box |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
-R | --range | The genomic range that is to be plotted, in chr:start:end format. For example: -R chr1:40280000:40530000:MCC7_MYCL |
-V | --viewpoint | The viewpoint to be considered in chr:start format. For example: -V chr1:40400000:anchor1 |
Optional
Short Option | Long Option | Description |
---|---|---|
-B | --sample2 | Name of the second sample you want to use as it appears on the Tinker box |
-Q | --norm | Which normalization to use. Strictly none, cpm or aqua in lower case. Non-spike-in samples default to cpm. Spike-in samples default to aqua. |
-r | --resolution | Resolution of sample in base pairs. Default 5000. Accepted resolutions: 1000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000 |
-O | --output_name | Optional: provide a name for the plot |
| | --quant_cut | Between 0.00-1.00. Cap the values at a given percentile rather than at the maximum value |
| | --max_cap | Set a hard cap; all values greater than this will be brought down to the cap value supplied |
| | --width | Number of bins up and downstream of viewpoint locus to be considered for drawing profiles. Default 0 |
| | --height | Numeric factor to control the height of the Virtual 4C profile in the plot. Default 0.3 |
-h | --help | Help message |
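A virtual 4C profile is, in essence, one row of the contact matrix: the contacts between the viewpoint bin and every other bin in the range. The Python sketch below shows that idea. The function `virtual_4c` is hypothetical, and summing the rows of neighboring bins when `width` is greater than 0 is an assumption about how --width might be applied, not a statement of the tool's behavior.

```python
def virtual_4c(matrix, range_start, resolution, viewpoint, width=0):
    """Return the contact profile of the viewpoint bin against every bin
    in the range. With width > 0, the rows of `width` bins on each side
    of the viewpoint are summed in as well (an assumption)."""
    vp_bin = (viewpoint - range_start) // resolution
    first = max(0, vp_bin - width)
    last = min(len(matrix) - 1, vp_bin + width)
    n_cols = len(matrix[0])
    return [
        sum(matrix[r][c] for r in range(first, last + 1))
        for c in range(n_cols)
    ]


# Toy symmetric 4x4 contact matrix at 5 kb resolution, starting at position 0.
matrix = [
    [9, 4, 2, 1],
    [4, 9, 5, 2],
    [2, 5, 9, 4],
    [1, 2, 4, 9],
]
print(virtual_4c(matrix, range_start=0, resolution=5000, viewpoint=5000))
print(virtual_4c(matrix, range_start=0, resolution=5000, viewpoint=5000, width=1))
```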
query_bedpe
query_bedpe uses a bedpe file to calculate AQuA normalized or counts-per-million (CPM) contact values for given ranges in a sample and prints to standard out. View the tutorial here.
Usage and Option Summary
query_bedpe -A H3K27ac -P path/to/pairs.bedpe -G hg38
(or):
query_bedpe -A H3K27ac -B H3K27me3 -P path/to/pairs.bedpe -G hg38
Required
Short Option | Long Option | Description |
---|---|---|
-P | --pair | Full path to the bedpe (pairs) file you want to query, without headers! |
-A | --sample_1 | Name of the sample you want to use as it appears on the Tinker box |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
Optional
Short Option | Long Option | Description |
---|---|---|
-Q | --norm | Which normalization to use. Strictly ‘none’, ‘cpm’ or ‘aqua’ in lower case. Non-spike-in samples default to cpm. Spike-in samples default to aqua. |
-B | --sample_2 | The name of the second sample. If triggered, calculates the delta contact values for that pair. Useful in case vs control |
-R | --resolution | Resolution of sample in base pairs. Default 5000. Accepted resolutions: 1000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000 |
-f | --formula | Arithmetic to use to report contact values. Options: center, max, average, sum. Default = center |
-F | --fix | If FALSE, reports new coordinates based on arithmetic center or max. Default = TRUE |
| | --shrink_wrap | Squeezes a 2D bedpe interval until supplied value is reached. Default = FALSE |
| | --split | Splits a 2D bedpe interval into multiple sub-intervals greater than supplied value. Default = FALSE |
| | --padding | Joins sub-intervals in 2D space reported by --split, based on supplied value in bin units. Default = 2 |
| | --expand | Expands 1D bedpe feet in both directions based on supplied value (in bin units). Default = 0 |
-I | --inherent | If TRUE, hic values are transformed to inherent units. For one-sample tests only. Default = FALSE |
-h | --help | Help message. Primer can be found at https://rb.gy/zyfjxc |
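The four --formula choices reduce the block of bins spanned by a bedpe pair to a single number. A minimal Python sketch follows; `summarize_pair` and `cpm` are hypothetical helpers, the reading of "center" as the middle bin of the block is an assumption, and `cpm` uses the generic counts-per-million definition rather than anything specific to this tool.

```python
def summarize_pair(submatrix, formula="center"):
    """Reduce the 2D block of contact values covered by one bedpe pair
    to a single value using the chosen arithmetic."""
    values = [v for row in submatrix for v in row]
    if formula == "center":
        # Middle bin of the block (an assumed reading of "center").
        return submatrix[len(submatrix) // 2][len(submatrix[0]) // 2]
    if formula == "max":
        return max(values)
    if formula == "average":
        return sum(values) / len(values)
    if formula == "sum":
        return sum(values)
    raise ValueError(f"unknown formula: {formula}")


def cpm(count, total_contacts):
    """Generic counts-per-million scaling."""
    return count * 1_000_000 / total_contacts


block = [[1, 2, 3], [4, 10, 6], [7, 12, 0]]
for formula in ("center", "max", "average", "sum"):
    print(formula, summarize_pair(block, formula))
print("cpm of sum:", cpm(summarize_pair(block, "sum"), 3_000_000))
```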
summarize_interval
summarize_interval counts both short-range and long-range 3D contacts within specified genomic intervals.
Usage and Option Summary
summarize_interval \
-G hg38 \
-I /path/to/input_bed_file.bed \
-A H3K27ac
(or):
summarize_interval \
-G hg38 \
-I /path/to/input_bed_file.bed \
-A H3K27ac \
-B H3K27me3
Required
Short Option | Long Option | Description |
---|---|---|
-I | --input | Full path to the bed (intervals) file without headers |
-A | --sample1 | Name of the sample you want to use to calculate the contact values, as it appears on the Tinker box |
-G | --genome | The genome build the sample(s) has been processed using. Strictly hg19 or hg38 |
Optional
Short Option | Long Option | Description |
---|---|---|
-B | --sample2 | Name of the second sample you want to use as it appears on the Tinker box. Useful in case vs control. |
-Q | --AQuA | Use AQuA factors: TRUE or FALSE. Non-spike-in samples default to FALSE (CPM). Spike-in samples default to TRUE (AQuA). |
-r | --resolution | Resolution of sample in base pairs. Default 5000. Accepted resolutions: 1000,5000,10000,25000,50000,100000,250000,500000,1000000,2500000 |
-D | --distance | Distance in base pairs to classify short-range and long-range contact values. Default 15000. |
-h | --help | Help message |
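The short-range versus long-range split controlled by --distance can be sketched as a simple threshold on the separation between two contacting positions. This Python example is illustrative only: `summarize_contacts`, the (pos1, pos2, count) layout, and treating a contact exactly at the threshold as short-range are assumptions.

```python
def summarize_contacts(contacts, distance=15_000):
    """Split contact counts into short-range (separation <= distance)
    and long-range (separation > distance). The inclusive boundary is
    an assumption."""
    short_range = sum(c for a, b, c in contacts if abs(b - a) <= distance)
    long_range = sum(c for a, b, c in contacts if abs(b - a) > distance)
    return {"short_range": short_range, "long_range": long_range}


# Toy contacts inside one interval: (position 1, position 2, count).
contacts = [
    (0, 5_000, 10),
    (0, 15_000, 4),
    (0, 20_000, 3),
    (5_000, 100_000, 1),
]
print(summarize_contacts(contacts))
print(summarize_contacts(contacts, distance=5_000))
```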