Examples
This is a tool-oriented primer designed to guide you through the functionality of build_bedpe. From your tinker box, type build_bedpe -h into the command window and press Enter to view the help message.
Help message
> build_bedpe -h
Builds pairs between elements in two bed files.
Pairs can be constrained by a third bed file (usually TADs)
or by user-defined minimum and maximum distances.
Prints pairs in bedpe format to standard out.
Pairs can be use to query .hic files with query_bedpe
--------------
OPTIONS
-A|--bed_A : Path to the first bed file
-B|--bed_B : Path to the second bed file
[-T|--TAD ]: Path to the TAD file to restrict pairings outside of the TAD
[-d|--min_dist ]: Minimum distance between pairs used to drop results. Default 0 bp
[-D|--max_dist ]: Maximum distance between pairs used to drop results. Default 5 Mb
[-m|--preserve_meta]: If bed file metadata columns should be preserved.Default FALSE
[-i|--get_trans ]: If pairs between different chromosomes should be made. If TRUE, will print trans pairs only. Default FALSE
[-f|--fraction ]: Control the number of possible trans pairs to be printed. Only applicable if --get trans is TRUE. Between 0-1. Default = 1
[-h|--help ] Help message
Let’s tinker with build_bedpe using the sample data provided in your tinker box. First, let’s assign our sample bed file to the variable bed1. Notice the 3-column format for bed files: the chromosome name (the chromosome number preceded by “chr”) in column 1, the start location in column 2, and the stop location in column 3.
bed1=~/tutorials/sample_bed.bed
cat $bed1
chr16 875000 930000
chr16 1050000 1105000
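Before pointing build_bedpe at your own bed files, you may want to confirm that they follow the 3-column layout described above. The awk function below is a minimal sketch. check_bed is a hypothetical helper, and its rules (a “chr” prefix, integer coordinates, start smaller than stop) are our own checks, not rules build_bedpe is documented to enforce. It writes the two sample rows to a temporary file so it can run on its own. In your tinker box you could run check_bed "$bed1" instead.

```shell
# check_bed is a hypothetical helper, not part of aqua_tools.
# It prints "ok" and returns 0 if every line passes; otherwise it reports
# each problem with its line number and returns 1.
check_bed() {
  awk '
    NF < 3 { printf "line %d: fewer than 3 columns\n", NR; bad = 1; next }
    $1 !~ /^chr/ { printf "line %d: chromosome lacks \"chr\" prefix\n", NR; bad = 1 }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { printf "line %d: non-integer coordinate\n", NR; bad = 1; next }
    $2 + 0 >= $3 + 0 { printf "line %d: start is not less than stop\n", NR; bad = 1 }
    END { if (!bad) print "ok"; exit bad }
  ' "$1"
}

# Stand-in for ~/tutorials/sample_bed.bed, using the rows printed above.
sample=$(mktemp)
printf 'chr16\t875000\t930000\nchr16\t1050000\t1105000\n' > "$sample"
check_bed "$sample"   # prints: ok
rm -f "$sample"
```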
Homogeneous builds
A single bed file is all you need to start using build_bedpe. Here, we use the same bed file for the -A and -B parameters.
build_bedpe -A $bed1 -B $bed1
chr16 875000 930000 chr16 875000 930000
chr16 875000 930000 chr16 1050000 1105000
chr16 1050000 1105000 chr16 1050000 1105000
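The output prints each unique combination of -A and -B regions once, including each region paired with itself. The mirrored pair (chr16 1050000 1105000 with chr16 875000 930000) is not repeated.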
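The homogeneous output above pairs each region with itself and with every later region, so n regions give n(n+1)/2 pairs (3 for our 2 regions). The awk sketch below reproduces that pattern for illustration only. self_pairs is a hypothetical helper, not build_bedpe's implementation, and it applies none of the tool's distance or TAD filtering.

```shell
# self_pairs is a hypothetical helper that mimics a homogeneous build:
# region i is paired with every region j >= i (itself included), so the
# mirrored pair (j, i) is never printed a second time.
self_pairs() {
  awk 'BEGIN { OFS = "\t" }
       { region[NR] = $1 OFS $2 OFS $3 }
       END {
         for (i = 1; i <= NR; i++)
           for (j = i; j <= NR; j++)
             print region[i], region[j]
       }' "$1"
}

# Stand-in for ~/tutorials/sample_bed.bed.
sample=$(mktemp)
printf 'chr16\t875000\t930000\nchr16\t1050000\t1105000\n' > "$sample"
self_pairs "$sample"   # prints the three pairs shown above, tab-separated
rm -f "$sample"
```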
Heterogeneous builds
Let’s try another example using two different bed files.
bed2=~/tutorials/sample_bed2.bed
cat $bed2
chr16 975000 1200000
chr16 1250000 1500000
build_bedpe -A $bed1 -B $bed2
chr16 875000 930000 chr16 975000 1200000
chr16 875000 930000 chr16 1250000 1500000
chr16 1050000 1105000 chr16 975000 1200000
chr16 1050000 1105000 chr16 1250000 1500000
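With two different files there is no mirrored duplicate to skip, so every region in -A is paired with every region in -B. That gives 2 × 2 = 4 pairs here, in the same order as above. Below is a minimal awk sketch of that cross product. cross_pairs is hypothetical and only illustrates the pairing, not build_bedpe's internals.

```shell
# cross_pairs is a hypothetical helper: it pairs every line of file A with
# every line of file B (a Cartesian product), in A-major order.
# Note: the NR == FNR idiom assumes file A is not empty.
cross_pairs() {
  awk 'BEGIN { OFS = "\t" }
       NR == FNR { a[++na] = $1 OFS $2 OFS $3; next }
       { b[++nb] = $1 OFS $2 OFS $3 }
       END {
         for (i = 1; i <= na; i++)
           for (j = 1; j <= nb; j++)
             print a[i], b[j]
       }' "$1" "$2"
}

# Stand-ins for sample_bed.bed and sample_bed2.bed.
a=$(mktemp); b=$(mktemp)
printf 'chr16\t875000\t930000\nchr16\t1050000\t1105000\n' > "$a"
printf 'chr16\t975000\t1200000\nchr16\t1250000\t1500000\n' > "$b"
cross_pairs "$a" "$b"   # 4 pairs, same order as the build_bedpe output above
rm -f "$a" "$b"
```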
Minimum and maximum distances
We can fine-tune our output using the parameters -d (min_dist) and -D (max_dist). If we set -d to 0 and -D to 100000, we will only get pairs whose start coordinates are no more than 100000 base pairs apart.
build_bedpe -A $bed1 -B $bed2 -d 0 -D 100000
chr16 875000 930000 chr16 975000 1200000
chr16 1050000 1105000 chr16 975000 1200000
You can see that we have lost the second and fourth rows. The start coordinates of those pairs are more than 100000 bp apart, and our maximum distance limit (-D) was 100000.
# Row 2:
1250000 - 875000 = 375000
# Row 4:
1250000 - 1050000 = 200000
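You can apply the same arithmetic to every row at once. The sketch below appends the absolute difference between the two start coordinates (column 5 minus column 2) to each bedpe line. start_distance is a hypothetical helper. Measuring start-to-start follows the description in the text above and is not taken from build_bedpe's source.

```shell
# start_distance is a hypothetical helper: it reads bedpe lines on stdin and
# appends |start_B - start_A| as an extra, tab-separated column.
start_distance() {
  awk 'BEGIN { OFS = "\t" }
       { d = $5 - $2; if (d < 0) d = -d; print $0, d }'
}

# The four heterogeneous pairs built earlier, one per line.
{
  printf 'chr16\t875000\t930000\tchr16\t975000\t1200000\n'
  printf 'chr16\t875000\t930000\tchr16\t1250000\t1500000\n'
  printf 'chr16\t1050000\t1105000\tchr16\t975000\t1200000\n'
  printf 'chr16\t1050000\t1105000\tchr16\t1250000\t1500000\n'
} | start_distance   # last column: 100000, 375000, 75000, 200000
```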
Furthermore, if we increase -d to 80000, we keep only pairs whose start coordinates are between 80000 and 100000 (inclusive) base pairs apart. This drops the pair separated by 75000 bp (1050000 - 975000) and leaves only one pair.
build_bedpe -A $bed1 -B $bed2 -d 80000 -D 100000
chr16 875000 930000 chr16 975000 1200000
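Combined, the two limits keep a pair when its start-to-start distance lies between the minimum and the maximum. The awk sketch below reproduces both runs above on the four heterogeneous pairs. filter_by_distance and four_pairs are hypothetical helpers. Treating the maximum as inclusive matches the example, because the 100000-bp pair survives -D 100000. Treating the minimum as inclusive is an assumption.

```shell
# filter_by_distance MIN MAX is a hypothetical helper: it reads bedpe lines on
# stdin and keeps those whose start coordinates are MIN..MAX bp apart.
# Assumption: both bounds are inclusive.
filter_by_distance() {
  awk -v min="$1" -v max="$2" '
    BEGIN { min += 0; max += 0 }          # force numeric comparison
    { d = $5 - $2; if (d < 0) d = -d }
    d >= min && d <= max'
}

# four_pairs (hypothetical) prints the heterogeneous pairs from earlier.
four_pairs() {
  printf 'chr16\t875000\t930000\tchr16\t975000\t1200000\n'
  printf 'chr16\t875000\t930000\tchr16\t1250000\t1500000\n'
  printf 'chr16\t1050000\t1105000\tchr16\t975000\t1200000\n'
  printf 'chr16\t1050000\t1105000\tchr16\t1250000\t1500000\n'
}

four_pairs | filter_by_distance 0 100000       # keeps rows 1 and 3
four_pairs | filter_by_distance 80000 100000   # keeps row 1 only
```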
Distance filtering with TADs
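Instead of using the -d and -D distance constraints, we can pass a TAD file with -T so that TAD boundaries decide which pairs are built. According to the help message, -T restricts pairings outside of the TAD. Here, all three pairs from the homogeneous build are kept.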
tad=~/tutorials/TAD_goldpan.liftOver.hg38_TADids.bed
build_bedpe -A $bed1 -B $bed1 -T $tad
chr16 875000 930000 chr16 875000 930000
chr16 875000 930000 chr16 1050000 1105000
chr16 1050000 1105000 chr16 1050000 1105000
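One way to picture -T is as a rule that keeps a pair only when both of its regions fall inside the same TAD. The sketch below assumes exactly that rule. This is our reading of the help text ("restrict pairings outside of the TAD"), not confirmed behavior. It also uses a made-up TAD interval (chr16:800000-1200000) instead of the contents of TAD_goldpan.liftOver.hg38_TADids.bed. filter_by_tad is a hypothetical helper.

```shell
# filter_by_tad TAD_FILE is a hypothetical helper: it reads bedpe lines on
# stdin and keeps a pair only if both regions lie entirely within one TAD
# (same chromosome, start >= TAD start, stop <= TAD stop). This rule is an
# assumption, not build_bedpe's documented algorithm.
filter_by_tad() {
  awk 'NR == FNR { chr[++n] = $1; s[n] = $2 + 0; e[n] = $3 + 0; next }
       {
         for (t = 1; t <= n; t++)
           if ($1 == chr[t] && $4 == chr[t] &&
               $2 >= s[t] && $3 <= e[t] && $5 >= s[t] && $6 <= e[t]) { print; break }
       }' "$1" -
}

# HYPOTHETICAL TAD, invented for this illustration only.
tads=$(mktemp)
printf 'chr16\t800000\t1200000\n' > "$tads"

{
  printf 'chr16\t875000\t930000\tchr16\t875000\t930000\n'
  printf 'chr16\t875000\t930000\tchr16\t1050000\t1105000\n'
  printf 'chr16\t1050000\t1105000\tchr16\t1050000\t1105000\n'
  printf 'chr16\t875000\t930000\tchr16\t1250000\t1500000\n'   # crosses the TAD boundary
} | filter_by_tad "$tads"   # keeps the first three pairs
rm -f "$tads"
```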
Preserving metadata
If either (or both) bed file(s) have metadata in columns beyond the first three, we can retain that metadata using the --preserve_meta (or -m) option.
bed_meta=~/tutorials/sample_meta.bed
cat $bed_meta
chr16 875000 930000 X 100
chr16 1050000 1105000 Y 200
build_bedpe -A $bed_meta -B $bed_meta -m TRUE
chr16 1050000 1105000 chr16 1050000 1105000 Y 200 Y 200
chr16 875000 930000 chr16 1050000 1105000 X 100 Y 200
chr16 875000 930000 chr16 875000 930000 X 100 X 100
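In the output above, the six coordinate columns come first. The -A region's metadata (for example X 100) follows, and then the -B region's metadata (Y 200). The hypothetical awk helper below mimics that column layout on top of the homogeneous pairing so you can see where each column ends up. Its row order differs from build_bedpe's, which printed these pairs in the reverse order.

```shell
# self_pairs_meta is a hypothetical helper: it pairs region i with every
# region j >= i and prints A coords, B coords, A metadata, B metadata.
self_pairs_meta() {
  awk 'BEGIN { OFS = "\t" }
       {
         coords[NR] = $1 OFS $2 OFS $3
         meta[NR] = ""
         for (k = 4; k <= NF; k++) meta[NR] = meta[NR] (k > 4 ? OFS : "") $k
       }
       END {
         for (i = 1; i <= NR; i++)
           for (j = i; j <= NR; j++)
             print coords[i], coords[j], meta[i], meta[j]
       }' "$1"
}

# Stand-in for ~/tutorials/sample_meta.bed.
sample=$(mktemp)
printf 'chr16\t875000\t930000\tX\t100\nchr16\t1050000\t1105000\tY\t200\n' > "$sample"
self_pairs_meta "$sample"
rm -f "$sample"
```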
Trans pairings
Pairs of interest are often made between regions on the same chromosome or within the same TAD (cis pairings), but we can also make pairs between regions on different chromosomes (trans pairings). The number of possible trans pairings grows very quickly, so we use the fraction parameter, -f, to control how much of the output is printed. Note that -f only applies when --get_trans is TRUE. For this, let’s use a bed file with data for more than one chromosome. In this example, -A $bed1 and -B $bed3 give 8 trans pairings: each of the 2 regions in bed1 paired with each of the 4 chr3 regions. Invoking -f 0.5 halves the printed output. Notice that -f operates sequentially, meaning the first half of the pairs, in output order, is printed.
bed3=~/tutorials/sample_bed3.bed
build_bedpe -A $bed1 -B $bed3 --get_trans TRUE
chr16 875000 930000 chr3 125000 150000
chr16 875000 930000 chr3 135000 160000
chr16 875000 930000 chr3 145000 170000
chr16 875000 930000 chr3 155000 180000
chr16 1050000 1105000 chr3 125000 150000
chr16 1050000 1105000 chr3 135000 160000
chr16 1050000 1105000 chr3 145000 170000
chr16 1050000 1105000 chr3 155000 180000
build_bedpe -A $bed1 -B $bed3 --get_trans TRUE -f 0.5
chr16 875000 930000 chr3 125000 150000
chr16 875000 930000 chr3 135000 160000
chr16 875000 930000 chr3 145000 170000
chr16 875000 930000 chr3 155000 180000
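Two rules explain the output above. Only pairs whose chromosomes differ are kept, and -f keeps the leading share of those pairs. The awk sketch below reproduces both rules using the four chr3 regions seen in the output. trans_pairs is a hypothetical helper. The full contents of sample_bed3.bed are not shown here, so this stand-in file may be smaller than the real one. Rounding f × N down to a whole number of pairs is our assumption. With N = 8 and f = 0.5 it gives the four rows shown.

```shell
# trans_pairs A B FRACTION is a hypothetical helper: it builds every A x B
# pair whose chromosomes differ, then prints the first int(FRACTION * N) of
# the N trans pairs (rounding down is an assumption).
trans_pairs() {
  awk -v frac="$3" '
    BEGIN { OFS = "\t"; frac += 0 }
    NR == FNR { a[++na] = $1 OFS $2 OFS $3; ca[na] = $1; next }
    { b[++nb] = $1 OFS $2 OFS $3; cb[nb] = $1 }
    END {
      n = 0
      for (i = 1; i <= na; i++)
        for (j = 1; j <= nb; j++)
          if (ca[i] != cb[j]) pair[++n] = a[i] OFS b[j]
      keep = int(n * frac)
      for (k = 1; k <= keep; k++) print pair[k]
    }' "$1" "$2"
}

a=$(mktemp); b=$(mktemp)
printf 'chr16\t875000\t930000\nchr16\t1050000\t1105000\n' > "$a"
printf 'chr3\t125000\t150000\nchr3\t135000\t160000\nchr3\t145000\t170000\nchr3\t155000\t180000\n' > "$b"
trans_pairs "$a" "$b" 1     # all 8 trans pairs
trans_pairs "$a" "$b" 0.5   # the first 4
rm -f "$a" "$b"
```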
To save your bedpe output to a file, redirect standard output by adding > to the end of your command, followed by the desired file name. For example:
build_bedpe -A $bed1 -B $bed3 --get_trans TRUE -f 0.5 > tutorial.bedpe
Bedpe files can then be used with many other aqua_tools, for example to query .hic files with query_bedpe.
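Before handing a saved file to query_bedpe, a quick structural check can catch an empty or truncated file. check_bedpe is a hypothetical awk helper, not part of aqua_tools. It only confirms that each line has at least six columns and that start is less than stop on both ends. It does not verify anything else query_bedpe might expect.

```shell
# check_bedpe FILE is a hypothetical helper: it reports lines with fewer than
# six columns or with start >= stop on either end, and fails on an empty file.
check_bedpe() {
  awk 'NF < 6 || $2 + 0 >= $3 + 0 || $5 + 0 >= $6 + 0 {
         printf "line %d looks malformed: %s\n", NR, $0; bad = 1
       }
       END {
         if (NR == 0) { print "no pairs found"; exit 1 }
         if (!bad) printf "well-formed pairs: %d\n", NR
         exit bad
       }' "$1"
}

# Stand-in for tutorial.bedpe: the four trans pairs printed with -f 0.5.
saved=$(mktemp)
printf 'chr16\t875000\t930000\tchr3\t125000\t150000\n' > "$saved"
printf 'chr16\t875000\t930000\tchr3\t135000\t160000\n' >> "$saved"
printf 'chr16\t875000\t930000\tchr3\t145000\t170000\n' >> "$saved"
printf 'chr16\t875000\t930000\tchr3\t155000\t180000\n' >> "$saved"
check_bedpe "$saved"   # prints: well-formed pairs: 4
rm -f "$saved"
```

In your tinker box, you would run check_bedpe tutorial.bedpe on the real file.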