TY - JOUR TI - Towards a systematic census of bacterial quorum sensing genes in public databases AU - Juhász, János AU - Kistóth, Éva Mercédesz AU - Nagy, Zoltán AU - Szolgay, Péter AU - Pongor, Sándor AU - Ligeti, Balázs T2 - Jedlik Labor Rep AB - The growing body of sequence data offers new opportunities to find bacterial quorum sensing (QS) genes. Here we outline a method that extracts sequence and chromosomal position patterns from known QS genes; these patterns can then be used to find similar gene arrangements in unannotated sequence data. Quorum sensing is an autocrine signaling mechanism present in various unicellular organisms, mainly bacteria. The mechanism is based on an incoherent feed-forward network that, by definition, consists of a positive and a negative feedback control loop. The heart of the mechanism is the positive feedback, or autoinducer, loop: an enzyme produces a signal molecule that binds to a sensor-receptor molecule, which in turn upregulates the production of the signal as well as the expression of various other genes. The "trick" of this mechanism is that the signal molecule can leave and re-enter the cell by either active or passive transport. Signal molecules outside the cell can influence the metabolism of other cells, so there is de facto communication between cells. If the signal concentration in the extracellular space is high enough, gene expression in all participating cells is upregulated, so the functioning of the cell population is synchronized. This simple phenomenon allows a cell population to solve problems that individual cells cannot tackle, such as colonizing a surface or infecting a host organism. The negative feedback loop, on the other hand, plays a stabilizing role that prevents signal production from growing without limit.
As an example, luxR genes encode a transcriptional regulator that controls acyl homoserine lactone-based quorum sensing (AHL QS) in many Gram-negative bacteria. In this system the AHL signal crosses the bacterial cell wall via passive transport. On the bacterial chromosome, luxR genes are usually in the direct vicinity of a luxI gene encoding the AHL signal synthase. Genes involved in stabilization are often located between these two genes or next to them. Over 15 operon types have been observed in the AHL signaling family alone. Another well-studied QS mechanism, found in Gram-positive bacteria, is encoded by the comQXPA locus of Bacillus subtilis and related bacteria; this QS system consists of 4 genes. Here the signal is a peptide that is transported across the membrane via active transport. The peptide in the extracellular space is sensed by the extracellular part of the transmembrane receptor ComP. The intracellular part of the receptor is a histidine kinase that phosphorylates the DNA-binding protein ComA. Once phosphorylated, ComA binds to the chromosome and upregulates the production of the ComX protein, which includes the peptide signal. This protein is then cleaved by the transmembrane protein ComQ, which pumps the peptide signal out into the extracellular space. The autoinducer loop of this system thus consists of 4 proteins and includes active transport, in contrast to the AHL system, where the autoinducer loop consists of only two proteins and is based on passive transport. On the other hand, the topologies of the comQXPA genes are quite conserved; minor differences occur only in the overlap of the genes concerned. A preliminary overview of the current literature revealed about 20 further well-studied quorum sensing systems.
Comprehensive sequence collections were published on the AHL and the comQXPA systems approximately 5 years ago, but the body of available bacterial sequences has grown about 10-fold in the meantime, so a survey of the new data is an important task. The challenge of such a survey is the variability of the QS systems. Importantly, we can safely detect only the known, and, let's add, well-known QS systems. The strategy generalizes the logic of our previous surveys: a QS system is considered as a generalized structure of entities and relationships that constitutes a graph in which protein-coding genes are the nodes and intergenic distances are the edges. In order to detect such a gene set within the chromosome we need to apply parsimonious and scalable solutions, because the number of sequences to be screened is now many millions and is growing exponentially. One of the problems complicating the situation is that of truncated topologies, i.e. operons where one or more members are missing. For instance, AHL operons contain only two genes, luxI and luxR, but many bacteria contain solo luxR genes, i.e. receptor genes with no signal synthase. Solo luxR genes are even more frequent than complete AHL operons. This can have many reasons. For instance, the survey may not have picked up the adjacent luxI homolog, either because its sequence is too divergent or because it is further away within the chromosome. Or the luxR gene may control an unknown type of signal synthase (as was found in a few cases). Or the solo LuxR protein may respond to an unknown type of signal. With more complex operons the situation is even more complicated, so we designed a search algorithm that employs a hierarchy of search space reduction steps based on a hierarchy of molecular descriptions, namely presence-absence, composition and full structure descriptions.
For instance, if we have an operon of four members, we keep only genomes where at least three of the components are present. From these we concentrate on genomes where the required number of elements is present within a certain distance, a value observed in the known instances of the operon. Finally, we establish the gene distances and write down the topology found. This is a highly efficient space reduction strategy, since the compute-intensive steps are limited to very few cases. An additional challenge is the recent surge in the amount of next-generation sequencing data obtained on various bacterial systems. Such datasets, for example those in the NCBI Sequence Read Archive (SRA), consist of many millions of reads each, and it is important to know whether QS systems are present or active in them. The collection of QS genes developed within our project will be a useful tool for detecting such genes directly from reads and from metagenomics datasets. The long-term goal of this project is to develop automated protocols to extract QS genes from genomic data. The output would be the topological description of the QS operons along with a quality indicator characterizing the reliability of the prediction. For the predictions we will use Hidden Markov Models as well as fast sequence comparison programs (Bowtie or BWA) ported to parallel architectures such as GPUs and FPGAs, which will be used as search engines of dedicated web servers. DA - 2019/// PY - 2019 VL - Vii IS - 1 SP - 72 EP - 74 J2 - Jedlik Laboratories Reports SN - 2064-3942 UR - https://m2.mtmt.hu/api/publication/30762833 ER -
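
The three-level search space reduction described in the abstract (presence-absence, then composition within a distance window, then full topology) might be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the names `GeneHit`, `presence_filter`, `composition_filter`, `topology` and `screen_genome` are hypothetical, the gene coordinates are invented, and the 6000 bp window is an assumed value. The sketch also assumes that per-genome gene hits (for example from an HMM search) are already available as a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneHit:
    """One detected gene: family label plus chromosomal coordinates (hypothetical type)."""
    family: str
    start: int
    end: int


def presence_filter(hits, families, min_present):
    """Level 1: keep the genome only if enough distinct operon members occur anywhere."""
    found = {h.family for h in hits if h.family in families}
    return len(found) >= min_present


def composition_filter(hits, families, min_present, max_span):
    """Level 2: find the group of hits, all ending within max_span of a starting
    gene, that contains the most distinct families (at least min_present).
    Returns an empty list if no such group exists."""
    relevant = sorted((h for h in hits if h.family in families),
                      key=lambda h: h.start)
    best, best_count = [], 0
    for i, first in enumerate(relevant):
        window = [h for h in relevant[i:] if h.end - first.start <= max_span]
        count = len({h.family for h in window})
        if count >= min_present and count > best_count:
            best, best_count = window, count
    return best


def topology(cluster):
    """Level 3: describe the cluster as graph edges (gene_a, gene_b, distance).
    A negative distance means the two genes overlap."""
    genes = sorted(cluster, key=lambda h: h.start)
    return [(a.family, b.family, b.start - a.end)
            for a, b in zip(genes, genes[1:])]


def screen_genome(hits, families, min_present, max_span):
    """Run the cheap filters first; compute the topology only for survivors."""
    if not presence_filter(hits, families, min_present):
        return None
    cluster = composition_filter(hits, families, min_present, max_span)
    if not cluster:
        return None
    return topology(cluster)


# Invented example coordinates for a comQXPA-like locus (assumption, not real data).
COM_FAMILIES = {"comQ", "comX", "comP", "comA"}
example_hits = [
    GeneHit("comQ", 1000, 1900),
    GeneHit("comX", 1890, 2060),
    GeneHit("comP", 2100, 4400),
    GeneHit("comA", 4450, 5100),
]
print(screen_genome(example_hits, COM_FAMILIES, min_present=3, max_span=6000))
```

Ordering the steps this way means the distance computation runs only on genomes that already passed the inexpensive presence-absence test, which is the point of the reduction strategy. Allowing three of four members (`min_present=3`) lets truncated operons through, at the cost of more candidates to inspect.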