
Examples

This tool-oriented primer walks you through the functionality of build_bedpe. From your tinker box, type build_bedpe -h at the command line and press Enter to view the help message.

Help message

> build_bedpe -h

Builds pairs between elements in two bed files.
Pairs can be constrained by a third bed file (usually TADs)
or by user-defined minimum and maximum distances.
Prints pairs in bedpe format to standard out.
Pairs can be use to query .hic files with query_bedpe

--------------

OPTIONS

-A|--bed_A      	  : Path to the first bed file
-B|--bed_B      	  : Path to the second bed file
[-T|--TAD          ]: Path to the TAD file to restrict pairings outside of the TAD
[-d|--min_dist 	   ]: Minimum distance between pairs used to drop results. Default 0 bp
[-D|--max_dist 	   ]: Maximum distance between pairs used to drop results. Default 5 Mb
[-m|--preserve_meta]: If bed file metadata columns should be preserved.Default FALSE
[-i|--get_trans	   ]: If pairs between different chromosomes should be made. If TRUE, will print trans pairs only. Default FALSE
[-f|--fraction 	   ]: Control the number of possible trans pairs to be printed. Only applicable if --get trans is TRUE. Between 0-1. Default = 1
[-h|--help     	   ]  Help message

Let’s tinker with build_bedpe using provided sample data in your tinker box. First, let’s assign our sample bed file to the variable bed1. Notice the 3-column format for bed files, with chromosome number preceded by “chr” in column 1, start location in column 2, and stop location in column 3.

bed1=~/tutorials/sample_bed.bed

cat $bed1

chr16	   875000	  930000
chr16	  1050000	 1105000

Homogeneous builds

A single bed file is all you need to start using build_bedpe. Here, we use the same bed file for -A and -B parameters.

build_bedpe -A $bed1 -B $bed1

chr16	  875000	  930000	 chr16	  875000	  930000
chr16	  875000	  930000	 chr16	  1050000	 1105000
chr16	  1050000	 1105000	 chr16	  1050000	 1105000

The output lists each unique combination of -A and -B elements once, including each element paired with itself.
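The pairing pattern above can be sketched in a few lines of awk. This is only an illustration of "each unordered combination once, self-pairs included"; it is not how build_bedpe is implemented, and the function name pair_same_bed and the temporary file are hypothetical.

```shell
# Illustrative only: mimics the pairing pattern shown above, not build_bedpe's actual code.
# Reads a 3-column bed file and prints each unordered same-chromosome pair once,
# including every element paired with itself.
pair_same_bed() {
    awk 'BEGIN { OFS = "\t" }
         { chrom[NR] = $1; beg[NR] = $2; fin[NR] = $3 }
         END {
             for (i = 1; i <= NR; i++)
                 for (j = i; j <= NR; j++)
                     if (chrom[i] == chrom[j])
                         print chrom[i], beg[i], fin[i], chrom[j], beg[j], fin[j]
         }' "$1"
}

# Recreate the two-row sample file in a temporary location (hypothetical path).
demo_bed=$(mktemp)
printf 'chr16\t875000\t930000\nchr16\t1050000\t1105000\n' > "$demo_bed"
pair_same_bed "$demo_bed"
```

With n elements on one chromosome this pattern yields n(n+1)/2 pairs, which is 3 for the two-row sample file.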

Heterogeneous builds

Let’s try another example using two different bed files.

bed2=~/tutorials/sample_bed2.bed

cat $bed2

chr16	  975000	 1200000
chr16	 1250000	 1500000

build_bedpe -A $bed1 -B $bed2

chr16    875000   930000	   chr16    975000   1200000
chr16    875000   930000	   chr16   1250000   1500000
chr16   1050000  1105000	   chr16    975000   1200000
chr16   1050000  1105000	   chr16   1250000   1500000

Minimum and maximum distances

We can fine-tune our output using the parameters -d (min_dist) and -D (max_dist). If we set -d to 0 and -D to 100000, we only get pairs whose start coordinates are no more than 100000 base pairs apart.

build_bedpe -A $bed1 -B $bed2 -d 0 -D 100000

chr16	875000	930000	chr16	975000	1200000
chr16	1050000	1105000	chr16	975000	1200000

You can see that we have lost the second and fourth rows, since the start coordinates of those pairs were more than 100000 bp apart and our maximum distance limit -D was 100000.

# Row 2:
1250000 - 875000 = 375000
# Row 4:
1250000 - 1050000 = 200000

Furthermore, if we increase -d to 80000, we keep only pairs separated by more than 80000 and less than (or equal to) 100000 base pairs. This drops the pair whose start coordinates are 75000 bp apart (1050000 - 975000) and leaves only one pair.

build_bedpe -A $bed1 -B $bed2 -d 80000 -D 100000

chr16   875000   930000   chr16   975000   1200000
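The arithmetic in this section can be checked with a short awk sketch that recomputes the start-to-start distance of each pair and applies a minimum and maximum. It is a rough illustration, not build_bedpe's code: the function name filter_by_start_distance is hypothetical, and treating both bounds as inclusive is an assumption (the sample data never lands exactly on the lower bound, so it does not settle that question).

```shell
# Illustrative only: reproduces the start-to-start arithmetic used in this section.
# Assumption: both bounds are inclusive; build_bedpe's handling of the lower bound may differ.
filter_by_start_distance() {
    # $1 = bedpe file, $2 = minimum distance, $3 = maximum distance
    awk -v min="$2" -v max="$3" '
        {
            d = $5 - $2                 # start of second element minus start of first
            if (d < 0) d = -d           # use the absolute distance
            if (d >= min && d <= max) print $0 "\t" d
        }' "$1"
}

# The four unfiltered pairs from the heterogeneous build, written to a temporary file.
demo_bedpe=$(mktemp)
printf 'chr16\t875000\t930000\tchr16\t975000\t1200000\n'    >  "$demo_bedpe"
printf 'chr16\t875000\t930000\tchr16\t1250000\t1500000\n'   >> "$demo_bedpe"
printf 'chr16\t1050000\t1105000\tchr16\t975000\t1200000\n'  >> "$demo_bedpe"
printf 'chr16\t1050000\t1105000\tchr16\t1250000\t1500000\n' >> "$demo_bedpe"

filter_by_start_distance "$demo_bedpe" 0 100000      # keeps rows 1 and 3
filter_by_start_distance "$demo_bedpe" 80000 100000  # keeps row 1 only
```

The extra last column is the computed distance, which makes it easy to see why a row was kept.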

Distance filtering with TADs

Instead of using the -d and -D distance constraints, we can pass a TAD file with -T so that pairs are restricted to elements within the same TAD.

tad=~/tutorials/TAD_goldpan.liftOver.hg38_TADids.bed

build_bedpe -A $bed1 -B $bed1 -T $tad

chr16	875000	930000	chr16	   875000	   930000
chr16	875000	930000	chr16	   1050000	   1105000
chr16	1050000	1105000	chr16	   1050000	   1105000

Preserving metadata

If either (or both) bed file(s) have metadata in columns beyond the first three columns, we can choose to retain that metadata using the --preserve_meta (or -m) option.

bed_meta=~/tutorials/sample_meta.bed

cat $bed_meta

chr16	   875000	   930000	   X	  100
chr16	  1050000	  1105000	   Y	  200

build_bedpe -A $bed_meta -B $bed_meta -m TRUE

chr16   1050000   1105000   chr16	  1050000	 1105000	  Y    200	 Y   200
chr16    875000    930000   chr16	  1050000	 1105000	  X    100	 Y   200
chr16    875000    930000   chr16	   875000	  930000	  X    100	 X   100

Trans pairings

Pairs of interest are often made between regions on the same chromosome or within the same TAD (cis pairings), but we can also make pairs between regions on different chromosomes (trans pairings). The number of possible trans pairings gets very large very quickly, so we will use the fraction -f parameter to control the fraction of output printed. For this, let’s use a bed file with data for more than one chromosome. In this example, we start with 8 trans pairings with -A $bed1 and -B $bed3. Invoking -f 0.5 halves the printed output. Notice that -f operates sequentially, meaning the first half is printed.

bed3=~/tutorials/sample_bed3.bed

build_bedpe -A $bed1 -B $bed3 --get_trans TRUE

chr16	875000	930000	chr3	   125000  150000
chr16	875000	930000	chr3	   135000	 160000
chr16	875000	930000	chr3	   145000	 170000
chr16	875000	930000	chr3	   155000	 180000
chr16	1050000	1105000	chr3	   125000	 150000
chr16	1050000	1105000	chr3	   135000	 160000
chr16	1050000	1105000	chr3	   145000	 170000
chr16	1050000	1105000	chr3	   155000	 180000

build_bedpe -A $bed1 -B $bed3 --get_trans TRUE -f 0.5

chr16	875000	930000	chr3	   125000  150000
chr16	875000	930000	chr3	   135000	 160000
chr16	875000	930000	chr3	   145000	 170000
chr16	875000	930000	chr3	   155000	 180000
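The "first half is printed" behaviour can be imitated on any saved pair list with a small awk sketch. This is an assumption-laden illustration rather than the tool's logic: the function name keep_leading_fraction is hypothetical, and rounding the line count down is a guess, since the rounding rule is not stated here.

```shell
# Illustrative only: keeps the leading fraction of lines, mirroring the
# "first half" behaviour described above.
# Assumption: the number of lines kept is rounded down.
keep_leading_fraction() {
    # $1 = file, $2 = fraction between 0 and 1
    # First pass counts the lines, second pass prints the leading share.
    awk -v f="$2" 'NR == FNR { n++; next } FNR <= int(n * f)' "$1" "$1"
}

# Rebuild the eight trans pairs shown above in a temporary file.
demo_trans=$(mktemp)
for a_start in 875000 1050000; do
    for b_start in 125000 135000 145000 155000; do
        printf 'chr16\t%s\t%s\tchr3\t%s\t%s\n' \
            "$a_start" "$((a_start + 55000))" "$b_start" "$((b_start + 25000))"
    done
done > "$demo_trans"

keep_leading_fraction "$demo_trans" 0.5
```

Because the selection is sequential rather than random, every pair kept here involves the first chr16 element.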

To save your bedpe output to a file, simply add > to the end of your arguments followed by the desired file name. For example:

build_bedpe -A $bed1 -B $bed3 --get_trans TRUE -f 0.5 > tutorial.bedpe

Bedpe files can then be used by many other aqua_tools, for example to query .hic files with query_bedpe.
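Before handing a saved file to another tool, a quick structural check can catch truncated or misordered lines. The sketch below is not part of aqua_tools: the function name check_bedpe is hypothetical, and its notion of "well formed" (at least six fields, integer coordinates, start less than end for both elements) is an assumption based on the outputs shown in this primer.

```shell
# Illustrative only: a quick structural check, not part of aqua_tools.
# Assumption: a valid line has at least six fields, integer coordinates,
# and start < end for both elements of the pair.
check_bedpe() {
    awk '
        NF < 6 ||
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ ||
        $5 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/ ||
        $2 + 0 >= $3 + 0 || $5 + 0 >= $6 + 0 {
            printf "line %d looks malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }' "$1"
}

# Try it on a one-line demo file written to a temporary location.
demo_ok=$(mktemp)
printf 'chr16\t875000\t930000\tchr3\t125000\t150000\n' > "$demo_ok"
check_bedpe "$demo_ok" && echo "demo file looks well formed"
```

The same function could be pointed at tutorial.bedpe from the example above; it returns a non-zero exit status and names the offending lines if anything looks off.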