UniBind: maps of high-confidence direct TF-DNA interactions across species
Description
We provide here the track hub that corresponds to the map of permissive direct TF-DNA
interactions (aka TFBSs) stored in the UniBind 2021
database. In the track hub, users can find the following two tracks:
- UniBind 2021 robust TFBSs
- UniBind 2021 robust cis-regulatory modules (CRMs): the CRMs computed with the set of robust TFBSs in UniBind 2021.
UniBind is a comprehensive map of direct transcription factor (TF) - DNA
interactions across species. These interactions were obtained by uniformly
processing ~10,000 public ChIP-seq data sets using the ChIP-eat software. The
uniform processing, up to ChIP-seq peak calling, was performed by ReMap and GTRD, and the
entire collection of ChIP-seq peaks is also available on their respective websites. An entropy-based algorithm was used to automatically
delineate an enrichment zone containing direct TF-DNA interactions supported
by both strong computational evidence and strong experimental evidence. We then applied a quality control step to each set of TF-DNA interactions to identify high-quality transcription factor binding sites (TFBSs), yielding two collections of TFBSs:
- Permissive TFBSs: a collection containing all TFBSs available in UniBind.
- Robust TFBSs: a collection containing only those TFBSs that passed the quality control. More details on the quality control metrics and the selected thresholds are described in our UniBind 2021 publication.
The UniBind database hosts the complete set of TFBS predictions, the prediction model itself, and
the cis-regulatory modules (CRMs) derived from these direct TF-DNA interactions. All the
data are publicly available. For further details, please refer to the
associated publications listed in the Reference section.
Display Conventions and Configuration
- Each transcription factor is assigned a specific RGB color.
- A set of TFBSs derived from a specific ChIP-seq experiment with a specific
TF binding profile from JASPAR is
defined with a name following the format
<GEO/ArrayExpress/ENCODE/GTRD identifier>_<cell type/tissue>-<condition>_<TF name>_<JASPAR ID>.<JASPAR version>
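The naming format above can be split back into its fields programmatically. The following is a minimal sketch, not part of UniBind or ChIP-eat: the function name `parse_track_name` and the identifier `GSE00000` are hypothetical, and the code assumes that neither the data set identifier nor the TF name contains an underscore. The cell type/tissue and condition are kept as a single field, because either part may itself contain a hyphen.

```python
from typing import NamedTuple


class TrackName(NamedTuple):
    identifier: str      # GEO/ArrayExpress/ENCODE/GTRD identifier
    sample: str          # "<cell type/tissue>-<condition>", kept as one field
    tf_name: str
    jaspar_id: str
    jaspar_version: str


def parse_track_name(name: str) -> TrackName:
    """Split a TFBS set name into its fields (hypothetical helper).

    Assumption: the identifier and the TF name contain no underscore.
    """
    try:
        # The last two "_"-separated fields are the TF name and the JASPAR profile.
        head, tf_name, profile = name.rsplit("_", 2)
        # The first "_" separates the identifier from the sample description.
        identifier, sample = head.split("_", 1)
        # The JASPAR profile is "<JASPAR ID>.<JASPAR version>".
        jaspar_id, jaspar_version = profile.split(".", 1)
    except ValueError as exc:
        raise ValueError(f"unexpected TFBS set name: {name!r}") from exc
    return TrackName(identifier, sample, tf_name, jaspar_id, jaspar_version)


# "GSE00000" is a placeholder identifier, not a real data set.
parsed = parse_track_name("GSE00000_HepG2-untreated_CTCF_MA0139.1")
print(parsed.tf_name, parsed.jaspar_id, parsed.jaspar_version)
```

Splitting from the right keeps the sketch tolerant of underscores inside the cell type or condition, which seems the most likely place for them to occur.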
Methods
The entire collection of ChIP-seq data sets was uniformly processed by ReMap and GTRD up
to ChIP-seq peak calling. The resulting ChIP-seq peaks are also
available in the ReMap and GTRD databases. These peaks served as input for the ChIP-eat
data processing pipeline. The complete pipeline is designed to uniformly
process ChIP-seq data sets, from raw reads to the identification of direct
TF-DNA binding events, and it was implemented in the ChIP-eat software with
source code freely available at https://bitbucket.org/CBGR/chip-eat/. Only the
ChIP-seq datasets for which a TF binding profile for the targeted TF was
available in JASPAR were used for TFBS predictions. The enrichment zone
containing high confidence direct TF-DNA interactions was automatically defined
for each data set using an entropy-based algorithm. The diagram below
illustrates the processing steps.
Data Availability
Individual BED files for specific TFs or datasets can be found and
downloaded on the UniBind website at http://unibind2021.uio.no.
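Once a BED file has been downloaded, its TFBSs can be grouped by name with a few lines of Python. This is only a sketch under stated assumptions: it relies on the standard BED column order (chrom, start, end, name, with optional further columns), the exact columns of the UniBind files are not specified here, and the function name `tfbs_by_name` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def tfbs_by_name(lines: Iterable[str]) -> Dict[str, List[Interval]]:
    """Group BED records by their name column (hypothetical helper)."""
    grouped: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and browser/track header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # no name column, nothing to group on
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        grouped[name].append((chrom, start, end))
    return dict(grouped)


# Made-up records for illustration only.
sample_lines = [
    "chr1\t100\t119\tsiteA\t0\t+",
    "chr1\t500\t519\tsiteB\t0\t-",
    "chr2\t40\t59\tsiteA\t0\t+",
]
for name, sites in tfbs_by_name(sample_lines).items():
    print(name, len(sites))
```

To read a downloaded file, the same function can be called on an open file handle, for example `tfbs_by_name(open(path))`, where `path` points to the local BED file.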
Reference
If you use UniBind or ChIP-eat in your work, please cite:
R. Riudavets Puig, P. Boddie, A. Khan, J.A. Castro Mondragon, A. Mathelier,
UniBind: maps of high-confidence direct TF-DNA interactions across nine species.
BioRxiv (2020) https://doi.org/10.1101/2020.11.17.384578.
M. Gheorghe, G.K. Sandve, A. Khan, J. Cheneby, B. Ballester, and A. Mathelier,
A map of direct TF-DNA interactions in the human genome.
Nucleic Acids Research (2019) gky1210 https://doi.org/10.1093/nar/gky1210.
Contact
If you have questions or comments, please write to: