The post-genomic era presents us with an ever-increasing amount of DNA sequence and sequence-related data. Bioinformatics platforms such as those of the National Center for Biotechnology Information (NCBI) provide suites of sequence search and analysis tools. However, new web-based technologies can significantly increase the possibilities for sharing and using sequence data in different contexts. Here we present an approach that uses the capabilities of conventional web-based search engines such as Google to explore sequence and related information across multiple data sources. We have applied this approach to sequence searches involving DNA barcodes, which are short genomic regions used in biodiversity, ecological, and taxonomic studies for species-level identification.

To facilitate the use of a search engine such as Google on sequence data, we developed a character-based algorithm for DNA sequences, similar to a recently employed method. Essentially, we convert the DNA sequence into a series of "characters" that can be used to create dichotomous keys for identification. This set of characters is then compared to a library of known DNA sequences (DNA barcodes) that have themselves been subdivided in the same way. Since both the query sequence and the library of sequences have been separated into short "words", we can exploit a variety of custom-built and existing word search algorithms, such as Google's, to perform these searches. Here we provide a brief overview of this approach and two implementations using DNA barcoding data as an example.

We gathered all the cytochrome c oxidase 1 (CO1, cox1) sequences identified by the keyword BARCODE in GenBank and compiled them in a database broken into words. We also assembled all the fungal Internal Transcribed Spacer (ITS) sequences that have been generated from representative species of fungi for reconstructing the fungal tree of life (AFTOL). Our user interface consists of a simple "one box" search window. The user submits a query sequence, and the program filters out gaps and breaks the sequence into words that are piped to a conventional search engine. We used the freely available Google Desktop Search (GDS) engine to search the sequences broken into words, but it is also possible to use the commercially available Google search appliances or any other search engine for this purpose. With a conventional desktop computer as the hardware, a 650-base fragment of the 5' region of the cytochrome c oxidase I gene (the CO1 barcode) as the query, and all of the ~15,000 CO1 sequences in GenBank as the database, a search usually takes 1-2 seconds on a typical high-speed Internet connection. The answer contains the species names and sequences of the 50 closest matches to the query, sorted by their level of character (word) similarity to the query, with the differences in sequence shaded (see below).

The ITS sequences provide an example of non-coding DNA barcodes, which are difficult to analyze using conventional alignment-based methods. They were treated in the same way as the CO1 sequences.
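The word-based search described here (filter out gaps, break each sequence into words, then rank library entries by how many words they share with the query) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word length of 8, the use of overlapping words, the in-memory inverted index standing in for the search engine, and the function names `to_words`, `build_index`, and `search` are all assumptions made for illustration, since the text does not specify them.

```python
from collections import Counter, defaultdict

WORD_LENGTH = 8  # assumption: the text does not state the word size used


def to_words(sequence, k=WORD_LENGTH):
    """Drop gaps and whitespace, then split a sequence into overlapping k-letter words."""
    # Keep letters only, so alignment gaps ('-', '.') and whitespace are filtered out.
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha())
    return [cleaned[i:i + k] for i in range(len(cleaned) - k + 1)]


def build_index(library, k=WORD_LENGTH):
    """Build an inverted index mapping each word to the library entries that contain it.

    `library` is a dict of {species name: sequence}; this stands in for the
    search engine's own index (hypothetical stand-in, not Google Desktop Search).
    """
    index = defaultdict(set)
    for name, sequence in library.items():
        for word in to_words(sequence, k):
            index[word].add(name)
    return index


def search(query, index, k=WORD_LENGTH, top_n=50):
    """Return up to top_n (name, shared word count) pairs, best matches first."""
    hits = Counter()
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    return hits.most_common(top_n)
```

Calling `build_index` once on a dictionary of reference barcodes and then `search` on each query returns the closest matches by shared word count, which mirrors the "50 closest matches" ranking described in the text. A production search engine would use its own scoring, so the counts here are only a rough proxy for word similarity.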