Abstract

The identification of potential regulatory motifs in new sequence data is increasingly important for experimental design. Those motifs are commonly located by matches to IUPAC strings derived from consensus sequences. Although this method is simple and widely used, a major drawback of IUPAC strings is that they necessarily remove much of the information originally present in the set of sequences. Nucleotide distribution matrices retain most of the information and are thus better suited to evaluate new potential sites. However, sufficiently large libraries of pre-compiled matrices are a prerequisite for practical application of any matrix-based approach and are just beginning to emerge. Here we present a set of tools for molecular biologists that allows generation of new matrices and detection of potential sequence matches by automatic searches with a library of pre-compiled matrices. We also supply a large library (> 200) of transcription factor binding site matrices that has been compiled on the basis of published matrices as well as entries from the TRANSFAC database, with emphasis on sequences with experimentally verified binding capacity. Our search method includes position weighting of the matrices based on the information content of individual positions and calculates a relative matrix similarity. We show several examples suggesting that this matrix similarity is useful in estimating the functional potential of matrix matches and thus provides a valuable basis for designing appropriate experiments.
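The abstract describes the scoring idea only in words, so the following is a minimal Python sketch of how information-content weighting and a relative matrix similarity could fit together. It is not the authors' implementation: the formulas (Shannon information in bits over four nucleotides, and a score normalized by the best achievable weighted score), the function names, the example sites, and the 0.8 default threshold are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def frequency_matrix(sites):
    """Column-wise relative nucleotide frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for i in range(length):
        column = [s[i] for s in sites]
        matrix.append({b: column.count(b) / len(sites) for b in BASES})
    return matrix


def information_content(column):
    """Shannon information of one column in bits: 0 (uniform) to 2 (fully conserved)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)


def matrix_similarity(matrix, window):
    """Weighted score of `window`, relative to the best score the matrix allows.

    Each position is weighted by its information content (an assumed scheme),
    so conserved positions dominate. The result lies between 0 and 1.
    """
    weights = [information_content(col) for col in matrix]
    # Bases outside A/C/G/T (for example N) contribute nothing.
    observed = sum(w * col.get(b, 0.0) for w, col, b in zip(weights, matrix, window))
    best = sum(w * max(col.values()) for w, col in zip(weights, matrix))
    return observed / best if best > 0 else 0.0


def scan(matrix, sequence, threshold=0.8):
    """Slide the matrix along `sequence`; return (start, window, score) for hits."""
    n = len(matrix)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        score = matrix_similarity(matrix, window)
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned binding sites, used only to exercise the functions.
example_sites = ["TATAAA", "TATAAA", "TATATA", "TATAAT"]
example_matrix = frequency_matrix(example_sites)
example_hits = scan(example_matrix, "CCTATAAAGG", threshold=0.9)
```

Unlike an IUPAC string, which only accepts or rejects a window, this kind of score is graded, so candidate sites can be ranked before deciding which ones merit an experiment.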

Keywords

Weighting; Set (abstract data type); Sequence (biology); Biology; Computational biology; Consensus sequence; Matrix (chemical analysis); Basis (linear algebra); Similarity (geometry); Computer science; Identification (biology); Data mining; Base sequence; Genetics; Artificial intelligence; DNA; Mathematics

Publication Info

Year: 1995
Type: article
Volume: 23
Issue: 23
Pages: 4878-4884
Citations: 2602
Access: Closed

Citation Metrics

2602 citations (OpenAlex)

Cite This

Kerstin Quandt, Kornelie Frech, Holger Karas et al. (1995). MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Research, 23(23), 4878-4884. https://doi.org/10.1093/nar/23.23.4878

Identifiers

DOI: 10.1093/nar/23.23.4878