Skip to content
Convergence
GitHub

SOX8

Research Question

This task-based tutorial builds upon three previous tool-oriented tutorials build_bedpe, query_bedpe, and plot_contacts and also introduces an alternate solution using get_loops. Becoming familiar with these tools is recommended before diving into this tutorial. In this tutorial, we will explore two different approaches to answer the following questions:

For the SOX8 gene, which TSS-CpG pairings show the strongest and weakest acetylation signal in 3D space?

To answer this question, we will determine acetylation levels by assessing and visualizing the changes in H3K27ac HiChIP reads between DMSO (control) and dCBP1 (treatment) in the RH4 cell type. We will use bedtools to obtain the SOX8 TSS, the TAD in which SOX8 resides, as well as all the CpG sites within the TAD. Next, we will use build_bedpe to construct pairings between the SOX8 TSS and all CpG sites within the TAD, and determine contact frequencies of the pairings using query_bedpe. We will use plot_contacts to visualize the process and conclude with visualizations of both the strongest and weakest acetylation signals in our case vs control analysis.

To list all available data on the Tinker box, simply type the following command into the terminal window:

list_samples.sh

Let’s assign simple names to our control, treatment, and reference files:

control=RH4_DMSO
treatment=RH4_HDACi

TSS=~/lab-data/hg38/reference/gencode.TSS.200bp.hg38.bed
TAD=~/lab-data/hg38/reference/TAD_goldstandard_lift.hg38.bed
CPG=~/lab-data/hg38/reference/cpg.hg38.bed

Obtain TSS

Now that we have a TSS file ready, we can obtain the SOX8 TSS

cat $TSS | grep SOX8 > SOX8_tss.bed

Obtain TAD

Using the SOX8 TSS, we can fetch the TAD it resides in

bedtools intersect \
   -a SOX8_tss.bed
   -b $TAD \
   -wa -wb
   | cut -f 6-8 > SOX8_tad.bed

cat SOX8_tad.bed

chr16  790000  1259999

Obtain CpG sites

We now have the SOX8 TSS and the TAD it resides in. Using the SOX8 TAD, let’s now obtain all the CpG sites housed in it.

bedtools intersect \
   -a $CPG
   -b SOX8_tad.bed > SOX8_cpgs.bed

Visualize SOX8 TAD

We can now visualize the SOX8 TAD using plot_contacts. CpG sites and TSSs will be displayed by default.

plot_contacts.sh \
   --sample1 $control \
   --range chr16:790000:1260000:SOX8_TAD\
   --genome hg38 \
   --output_name SOX8.pdf \
   --norm aqua
   --max_cap 2 \
   --resolution 5000
   --annotations_default TRUE \
   --annotations_custom SOX8_tad.bed \
   --annotations_custom_color 657c78

sox8_tad

Let’s zoom in to SOX8.

sox8_zoom

Construct Pairs

Next, we can construct pairings between the SOX8 TSS and all CpG sites within the TAD by using build_bedpe.

build_bedpe.sh \
  -A SOX8_tss.bed \
  -B SOX8_cpgs.bed > SOX8_tss_cpg.bedpe

bedpe=SOX8_tss_cpg.bedpe

Visualize pairs

We can visualize these pairings using the --bedpe option with plot_contacts

plot_contacts.sh \
  -sample1 $control \
  -range chr16:790000:1260000:SOX8_TAD \
  -genome hg38 \
  -output_name SOX8_bedpe.pdf \
  -norm aqua \
  -resolution 5000 \
  -max_cap 2 \
  -bedpe $bedpe \
  -bedpe_color 424242

pairs

Obtain and sort contact frequencies

Great! We now have all the genomic pairings we want! We can now obtain the contact frequencies of these pairings using query_bedpe and sort them in increasing order

query_bedpe.sh \
   --sample1 $control \
   --sample2 $treatment \
   --pair SOX8_tss_cpg.bedpe \
   --genome hg38 --norm aqua --resolution 5000 \
   | sort -k9n > SOX8_tss_cpg_contact_values.bedpe

We can view the top and bottom values using head and tail commands. Since we sorted our list in increasing order, head will show us the smallest acetylation values and tail will show us the largest acetylation values.

head -n 5 SOX8_tss_cpg_contact_values.bedpe

chr16   979878   985327   chr16   981770	   981970   0.745   0      -0.745
chr16   807341   808025   chr16   981770	   981970   1.117   0.379  -0.738
chr16   817032   817833   chr16   981770	   981970   0.559   0      -0.559
chr16   869858   870117   chr16   981770	   981970   0.559   0      -0.559
chr16   800227   800806   chr16   981770	   981970   0.372   0      -0.372

tail -n 5 SOX8_tss_cpg_contact_values.bedpe

chr16   981770   981970   chr16   1089732   1089937   0       0.758   0.758
chr16   981770   981970   chr16    986820    987448   2.607   3.411   0.804
chr16   970219   971303   chr16    981770    981970   0.931   1.895   0.964
chr16   981770   981970   chr16   1050626   1050848   1.489   2.653   1.164
chr16   981770   981970   chr16   1057607   1057865   0       1.895   1.895

Visualize case-control pairs

Before we visualize the lowest and highest acetylation pairs, let’s visualize all of our pairs again, but this time we’ll look at both control and treatment samples using the two-sample version of plot_contacts

plot_contacts.sh \
   --sample1 $control \
   --sample2 $treatment \
   --range chr16:790000:1260000:SOX8_TAD \
   --resolution 5000 \
   --genome hg38 \
   --output_name SOX8_delta_all.pdf \
   --norm aqua \
   --max_cap 2 \
   --annotations_default TRUE \
   --annotations_custom SOX8_tad.bed \
   --annotations_custom_color 657c78 \
   --bedpe SOX8_tss_cpg.bedpe \
   --bedpe_color 424242

pc.png

Save pairs with lowest and highest acetylation

Let’s save the pair with the lowest acetylation:

head -n 1 \
SOX8_tss_cpg_contact_values.bedpe > low_SOX8_tss_cpg_contact_values.bedpe

And let’s also save the pair with the highest acetylation:

tail -n 1 \
SOX8_tss_cpg_contact_values.bedpe > high_SOX8_tss_cpg_contact_values.bedpe

Visualize lowest and highest acetylation pairs

Now we can plug these bedpe files into the --bedpe option of plot_contacts to visualize the pairs with the lowest and highest acetylation values in our case vs control analysis.

plot_contacts.sh \
  --sample1 $control \
  --sample2 $treatment \
  --range chr16:790000:1260000:SOX8_TAD \
  --resolution 5000 \
  --genome hg38 \
  --output_name SOX8_delta_low.pdf \
  --norm aqua \
  --max_cap 2 \
  --annotations_default TRUE \
  --annotations_custom SOX8_tad.bed \
  --annotations_custom_color 657c78 \
  --bedpe low_SOX8_tss_cpg_contact_values.bedpe \
  --bedpe_color 424242

sox8_delta_low

plot_contacts.sh \
  --sample1 $control \
  --sample2 $treatment \
  --range chr16:790000:1260000:SOX8_TAD \
  --resolution 5000 \
  --genome hg38 \
  --output_name SOX8_delta_high.pdf \
  --norm aqua \
  --max_cap 2 \
  --annotations_default TRUE \
  --annotations_custom SOX8_tad.bed \
  --annotations_custom_color 657c78 \
  --bedpe high_SOX8_tss_cpg_contact_values.bedpe \
  --bedpe_color 424242

sox8_delta_high