Skip to content
Convergence
GitHub

SMARCA4

Research question

This task-based tutorial builds upon three previous tool-oriented tutorials build_bedpe, query_bedpe, and plot_contacts and also introduces an alternate solution using get_loops. Becoming familiar with these tools is recommended before diving into this tutorial. In this tutorial, we will explore two different approaches to answer the following questions:

How do the genomic interactions between H3K4me3 and H3K27ac peaks associated with the SMARCA4 gene differ between a control sample and a case sample?

To answer this question, we will use mm10 genes sourced from UCSC, coordinates for the SMARCA4 gene, and pre-existing peak files for H3K27ac and H3K4me3. First, we will use bedtools to extract the H3K4me3 peaks that are located in the gene body. Next, we will use build_bedpe to create genomic pairings between those peaks and all H3K27ac peaks within a user-defined genomic range. We’ll use query_bedpe to obtain the contact frequencies of the pairings and plot_contacts to visualize our process. We will conclude this tutorial with visualizations of the strongest and weakest interactions between control and case samples.

To list all available sample data on the Tinkerbox, simply type the following command into the terminal window:

list_samples.sh

Let’s assign variable names to our samples and reference files for ease

control=WT_K27ac_2i_Rep1
case=A2_K27ac_2i_Rep1

It might also be handy to have a list of mm10 genes, that’ll allow us to ask meaningful biological questions. We obtained mm10 genes from UCSC

gene_list=~/shared-files/UCSC.genes.mm10.bed

Obtain SMARCA4 gene coordinates

Since we now have a gene file handy, we can obtain the SMARCA4 gene coordinates from our gene list

gene=Smarca4
cat $gene_list | grep $gene > ${gene}.bed
cat ${gene}.bed

Obtain reference files from ENCODE-3

We also have some pre-cooked reference files ready; mESC H3K27ac and H3K4me3 peaks from ENCODE-3-

K4me3=~/path/to/E14Tg2a_K4me3_ENCSR212KGS_5kbins_merged_mm10.bed
K27ac=~/path/to/C57BL6_K27ac_ENCSR775OEV_5kbins_merged_mm10.bed

Extract H3K4me3 peaks

As we now have the SMARCA4 gene body, we can extract the H3K4me3 peaks that are located in the gene body, using bedtools


bedtools intersect \
   -a ${gene}.bed \
   -b $K4me3 \
   -wa -wb \
   | cut -f 5-7 > ${gene}_K4me3.bed

Construct pairs

Using build_bedpe, we can now construct genomic pairings (as a bedpe file) between our SMARCA4 K4me3 peak of interest and all H3K27ac peaks within a certain genomic distance.

build_bedpe.sh \
   -A ${gene}_K4me3.bed \
   -B $K27ac \
   --min_dist 30000 \
   --max_dist 300000 > ${gene}_K4me3-K27ac_pairings.bedpe

cat ${gene}_K4me3-K27ac_pairings.bedpe

bedpe=${gene}_K4me3-K27ac_pairings.bedpe

Obtain contact frequencies

Great! We now have all the genomic pairings we want! We can now obtain the contact frequencies of these pairings using query_bedpe and sort them in increasing order, based on control vs case values.

query_bedpe.sh \
   --sample1 $control 
   --sample2 $case \
   --pair $bedpe \
   --formula sum \
   --genome mm10 \
   --norm cpm \
   --resolution 5000 \
   | sort -k9n > ${gene}_K4me3-K27ac_pairings_annotated.bedpe

cat ${gene}_K4me3-K27ac_pairings_annotated.bedpe

chr9   21605000   21620000   chr9   21715000   21775000   1.237   0.94    -0.297
chr9   21605000   21620000   chr9   21780000   21825000   0.989   0.715   -0.274
chr9   21605000   21620000   chr9   21665000   21710000   0.67    0.414   -0.257
chr9   21505000   21570000   chr9   21605000   21620000   1.661   1.429   -0.232
chr9   21575000   21580000   chr9   21605000   21620000   0.177   0.038   -0.139
chr9   21605000   21620000   chr9   21830000   21880000   0.389   0.301   -0.088
chr9   21365000   21375000   chr9   21605000   21620000   0.212   0.15    -0.062
chr9   21475000   21480000   chr9   21605000   21620000   0        0       0
chr9   21605000   21620000   chr9   21885000   21890000   0        0       0
chr9   21605000   21620000   chr9   21645000   21660000   0.141    0.15    0.009
chr9   21380000   21385000   chr9   21605000   21620000   0.035    0.075   0.04
chr9   21490000   21500000   chr9   21605000   21620000   0.035    0.075   0.04
chr9   21605000   21620000   chr9   21895000   21965000   0.565    0.715   0.15
chr9   21395000   21470000   chr9   21605000   21620000   0.53     0.752   0.222

Visualize pairs

Let’s visualize our results using plot_contacts

plot_contacts.sh 
   --sample1 $control \
   --range chr9:21300000:21900000:${gene} \
   --genome mm10 \
   --output_name ${gene}.pdf \
   --norm cpm \
   --resolution 5000 \
   --max_cap 0.2 \
   --bedpe $bedpe \
   --bedpe_color 000000 \
   --annotations_custom ${gene}_K4me3.bed \
   --annotations_custom_color FF0000

pairs

Sort pairs

Which K4me3-K27ac pairs show maximum and minimum changes in contacts across the two samples? Let’s use commands to extract the top and bottom pairings from this list. Next, we will use the --bedpe option of plot_contacts to view the high and low pairs within the contact matrix.

head -n 1 \
${gene}_K4me3-K27ac_pairings_annotated.bedpe \
> low_${gene}_K4me3-K27ac_pairings_annotated.bedpe

tail -n 1 \
${gene}_K4me3-K27ac_pairings_annotated.bedpe \
> high_${gene}_K4me3-K27ac_pairings_annotated.bedpe

Visualize low contact value pairs

plot_contacts.sh \
   --sample1 $control \
   --sample2 $case \
   --range chr9:21300000:21900000:${gene} \
   --genome mm10 \
   --output_name ${gene}_1.pdf \
   --norm cpm \
   --resolution 5000 \
   --max_cap 0.2 \
   --bedpe low_${gene}_K4me3-K27ac_pairings_annotated.bedpe \
   --bedpe_color 0000FF \
   --annotations_custom ${gene}_K4me3.bed \
   --annotations_custom_color FF0000

low_pairs

Visualize high contact value pairs

plot_contacts.sh \
   --sample1 $control 
   --sample2 $case 
   --range chr9:21300000:21900000:${gene} \
   --genome mm10 \
   --output_name ${gene}_2.pdf \
   --norm cpm \
   --resolution 5000 \
   --max_cap 0.2 \
   --bedpe high_${gene}_K4me3-K27ac_pairings_annotated.bedpe \
   --bedpe_color FF0000 \
   --annotations_custom ${gene}_K4me3.bed \
   --annotations_custom_color FF0000

high_value