Skip to content
Convergence
GitHub

FOXO1, MYOG

Research questions

This task-based tutorial builds upon three previous tool-oriented tutorials build_bedpe, query_bedpe, and plot_contacts and also introduces an alternate solution using get_loops. Becoming familiar with these tools is recommended before diving into this tutorial. In this tutorial, we will explore two different approaches to answer the following questions:

  1. Which enhancers near the FOXO1 gene interact most frequently?
  2. Which enhancers near the MYOG gene interact most frequently?

FOXO1

To answer the first question, we will start by identifying the TAD where FOXO1 resides and obtaining all the H3K27ac peaks for the TAD using bedtools. Next, we will build genomic pairs between FOXO1 H3K27ac peaks with build_bedpe and determine contact frequencies of the pairings using query_bedpe. We will use plot_contacts to visualize the process and conclude with a visualization of the strongest enhancer loop near the FOXO1 gene.

To answer the second question, we will identify the enhancers that interact most frequently using the get_loops tool, and visualize the strongest interactions with plot_contacts. The goal of this tutorial is to demonstrate how to employ various tools to effectively analyze genomic interactions near specific genes.

Let’s start with the first question. To list all available sample data on the Tinker box, simply type the following command into the terminal window:

list_samples

For ease, let’s assign variable names to our samples and reference files.

sample=RH4_DMSO_GSM3414981
TSS=~/lab-data/hg38/reference/gencode.TSS.200bp.hg38.bed
TAD=~/lab-data/hg38/reference/TAD_goldstandard_lift.hg38.bed
PEAKS=~/lab-data/hg38/reference/primer_refs/peaks.bed
# *NOTE: the peak file $PEAKS is for illustration purposes only*

Obtain TSS

Now that we have a TSS file ready, we can obtain the FOXO1 TSS

cat $TSS | grep FOXO1 > FOXO1_tss.bed

Obtain TAD

Using the FOXO1 TSS, we can fetch the TAD where it resides

bedtools intersect \
   -a FOXO1_tss.bed \
   -b $TAD \
   -wa -wb
   | cut -f 6-8 \
   > FOXO1_tad.bed

cat FOXO1_tad.bed

chr13  39745863  40770864

Obtain peaks

Now, let’s now obtain all the H3K27ac peaks housed within the FOXO1 TAD.

bedtools intersect \
   -a $PEAKS \
   -b FOXO1_tad.bed > FOXO1_peaks.bed

Visualize locus

We can visualize the FOXO1 locus using plot_contacts.

plot_contacts \
   --sample1 $sample \
   --gene FOXO1 \
   --genome hg38 \
   --output_name FOXO1.pdf

foxo1

Zooming in..

foxo1

Construct pairs

Using build_bedpe, we can now construct genomic pairings (as a bedpe file) between all FOXO1 H3K27ac peaks, and visualize them using the --bedpe option with plot_contacts

build_bedpe \
-A $PEAKS \
-B $PEAKS \
-d 20000 > FOXO1_peaks.bedpe
bedpe=FOXO1_peaks.bedpe

plot_contacts \
--sample1 $sample \
--gene FOXO1 \
--genome hg38 \
--output_name FOXO1_bedpe.pdf \
--bedpe $bedpe \
--bedpe_color 424242

bedpe

Query pairs

Great! We now have all the genomic pairings we want! We can now obtain the contact frequencies of these pairings using query_bedpe and sort them in increasing order

query_bedpe \
   --sample1 $sample
   --pair $bedpe \
   --genome hg38 | sort -k7n

chr13   40340001  40345000	chr13  40535001	40540000	0
chr13   40440001  40445000	chr13  40575001	40580000	0
chr13   40540001  40545000	chr13  40590001	40595000	0
chr13   40340001  40345000	chr13  40575001	40580000	0.186
chr13   40380001  40385000	chr13  40440001	40445000	0.186
...
chr13   40230001  40235000	chr13  40590001  40595000	3.724
chr13   40230001  40235000	chr13  40380001  40385000	3.91
chr13   40230001  40235000	chr13  40585001  40590000	4.096
chr13   40230001  40235000  chr13  40575001  40580000	4.282

Visualize strongest loop

The last entry with the highest contact value indicates the strongest loop. Let’s save that into a new file and visualize using the --gene and --bedpe options with with plot_contacts.

annotate_loops \
   --sample1 $sample \
   --pair $bedpe \
   --genome hg38 \
   | sort -k7n | tail -1 > FOXO1_loop.bedpe
plot_contacts \

--sample1 $sample \
--gene FOXO1 \
--genome hg38 \
--output_name FOXO1_loop.pdf \
--bedpe FOXO1_loop.bedpe \
--bedpe_color 424242

There it is!

high_value

MYOG

Now, let’s look at Question 2.

Which enhancers near the MYOG gene interact most frequently?

We have an alternate way to answer this question using the tool get_loops. We’ll also use another sample to switch things up.

Visualize MYOG TAD

sample=RH41_K27ac

plot_contacts \
   --sample1 $sample \
   --gene MYOG \
   --genome hg38 \
   --output_name MYOG.pdf

gene_view

Hmmm, the diagonal is too dark. We can use --max_cap and focus on a shorter range to bring our color cap down and create better visualization. To do this, we’ll switch from using the gene parameter and instead define a specific range.

plot_contacts \
   --sample1 $sample \
   --range chr1:202800000:203500000:MYOG \
   --genome hg38 \
   --max_cap 1000 \
   --output_name MYOG_2.pdf

range_view

Get loops

Much better! Now, to get the enhancers that interact the most, we will use get_loops

get_loops \
   --sample_A $sample \
   --genome hg38 \
   --range chr1:202800000:203500000 | sort -k7n

chr1	203250000	203255000	chr1	203250000	203255000	1.003
chr1	202805000	202810000	chr1	203285000	203290000	1.015
chr1	203290000	203295000	chr1	203300000	203305000	1.022
...
chr1	203280000	203285000	chr1	203325000	203330000	4.301
chr1	203285000	203290000	chr1	203325000	203330000	4.944
chr1	203275000	203280000	chr1	203325000	203330000	5.227

All entries in this file indicate strong interactions, the last entry being the strongest. Let’s save all of them into a new file and visualize using plot_contacts.

Visualize loops

get_loops
--sample_A $sample \
--genome hg38 \
--range chr1:202800000:203500000 | sort -k7n > MYOG_loops.bedpe

plot_contacts \
--sample1 $sample \
--range chr1:202800000:203500000:MYOG \
--genome hg38 \
--max_cap 1000
--output_name MYOG_loops.pdf \
--bedpe MYOG_loops.bedpe \
--bedpe_color 424242

loops_view