What is UniBind?
UniBind is a comprehensive map of direct transcription factor (TF)-DNA interactions in the genomes of nine species. The interactions were derived from thousands of ChIP-seq datasets collected from ReMap and GTRD, from which we obtained MACS2-called peaks. These genomic regions were then analysed with DAMO-optimized position weight matrices (PWMs) to predict direct TF-DNA interactions. An entropy-based algorithm was applied to the best subsequence of each peak to automatically delineate an enrichment zone containing direct TF-DNA interactions that are supported by both strong computational and strong experimental evidence. Finally, all peaks whose best subsequence did not fall within the enrichment zone were rescanned to detect any other subsequence that would fall within it.
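To make the "best subsequence per peak" step concrete, the following is a minimal sketch and not the UniBind pipeline itself. It assumes that "best" means the highest-scoring PWM window, scans the forward strand only, and reports the distance of that window to the peak summit. The matrix values, the function name best_hit and the example sequence are all hypothetical and chosen purely for illustration.

```python
# Hypothetical 4-position PWM given as log-odds scores per base.
# The numbers are illustrative only and do not come from UniBind.
PWM = [
    {"A": 1.2, "C": -1.0, "G": -0.8, "T": -1.5},
    {"A": -1.3, "C": 1.1, "G": -0.9, "T": -0.7},
    {"A": -1.1, "C": -1.2, "G": 1.3, "T": -0.6},
    {"A": -0.9, "C": -1.4, "G": -1.0, "T": 1.0},
]


def best_hit(sequence, pwm, summit):
    """Return (score, start, distance_to_summit) for the best PWM window.

    sequence: peak sequence, upper-case A/C/G/T only (forward strand).
    summit:   0-based position of the peak summit within the sequence.
    """
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the per-position scores of the bases in this window.
        score = sum(column[base] for column, base in zip(pwm, window))
        if best is None or score > best[0]:
            centre = start + width // 2
            best = (score, start, centre - summit)
    return best


score, start, distance = best_hit("TTACGTTT", PWM, summit=4)
print(f"score={score:.2f} start={start} distance_to_summit={distance}")
```

Each peak then contributes one (score, distance) pair, which is the kind of two-dimensional evidence an enrichment zone could be delineated on. A real scan would also consider the reverse strand.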
Datasets in UniBind are separated into robust and permissive collections based on two quality control metrics. First, the DAMO-optimized TF binding motif must be similar to the expected canonical motif. Second, the predicted TF binding sites (TFBSs) must be enriched around the peak summits. Datasets satisfying both criteria were classified as part of the robust collection, while the rest were classified as part of the permissive collection.
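The classification rule above can be summarised as a small sketch. The function and parameter names are hypothetical; how each criterion is actually computed (motif similarity, summit enrichment) is described in the publications and is not reproduced here.

```python
def classify_dataset(motif_matches_canonical: bool, enriched_at_summit: bool) -> str:
    """Assign a dataset to a collection from the two quality control checks."""
    # Only datasets passing both checks are considered robust.
    if motif_matches_canonical and enriched_at_summit:
        return "robust"
    return "permissive"


print(classify_dataset(True, True))
print(classify_dataset(True, False))
```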
The UniBind database hosts the complete set of TFBS predictions, as well as the prediction model itself and cis-regulatory modules derived from these direct TF-DNA interactions. All the data is publicly available. For further details, please refer to the associated publications:
R. Riudavets Puig, P. Boddie, A. Khan, J.A. Castro Mondragon, A. Mathelier,
UniBind: maps of high-confidence direct TF-DNA interactions across nine species. BMC Genomics 22, 482 (2021). https://doi.org/10.1186/s12864-021-07760-6.
M. Gheorghe, G.K. Sandve, A. Khan, J. Cheneby, B. Ballester, and A. Mathelier,
A map of direct TF-DNA interactions in the human genome. Nucleic Acids Research (2019) gky1210 https://doi.org/10.1093/nar/gky1210.
The data can be searched using the case-insensitive search option on the homepage. The ‘Advanced Options’, also available on the homepage, allow searching by TF name, cell/tissue type, species, data source and collection. Search results are presented in a responsive, paginated table along with their metadata; each entry can be clicked to view detailed information and to download TFBSs, summary plots and ChIP-seq peaks. All the metadata in the responsive tables can be downloaded as CSV files.
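Once a metadata table has been downloaded as CSV, it can be filtered locally in the same case-insensitive spirit. The sketch below is only an illustration: the column names (tf_name, cell_type, species, collection) and the placeholder rows are assumptions, not the actual layout of the UniBind export.

```python
import csv
import io

# Placeholder CSV content; column names and values are hypothetical.
SAMPLE = (
    "tf_name,cell_type,species,collection\n"
    "TF_A,cell_X,species_1,robust\n"
    "TF_B,cell_Y,species_1,permissive\n"
)


def search_rows(csv_text, term):
    """Return rows in which any field contains term, ignoring case."""
    needle = term.casefold()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if any(needle in str(value or "").casefold() for value in row.values())
    ]


for row in search_rows(SAMPLE, "tf_a"):
    print(row["tf_name"], row["collection"])
```

For a file on disk, the same function can be applied to the text returned by reading the downloaded CSV.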
The UniBind web interface was developed in Python using the model-view-controller framework Django. It uses SQLite to store TFBS metadata and Bootstrap as the front-end framework. The source code is available at https://bitbucket.org/CBGR/unibind.
Cell lines & Tissues
Source code availability
The data processing pipeline
The source code for the UniBind portal
The source code for the UniBind enrichment tool