Bioinformatics
The Bioinformatics Group supports the operations of the other
five laboratory-based branches of CUGI by providing computational infrastructure,
analysis, and public dissemination of data.
CUGI's IT infrastructure includes a 138-processor UNIX cluster for parallel
computation, several high-end Solaris computational machines for non-parallel
computation, Solaris/Windows workstations, web servers, database servers, and
multiple terabytes of storage and backup space.
CUGI's Bioinformatics Group also includes a team of professionals skilled
in several genomic bioinformatics applications, systems administration,
programming/scripting, and web development. This group provides genomic bioinformatics
services to the research community.
Please see CUGI's other service groups for bioinformatics services related to
analysis of data derived from functional genomics and proteomics, physical
map construction, and shotgun sequence assembly.
Tools
BLAST
- Compare your sequences to numerous public sequence databases.
FASTA
- Compare your sequences to numerous public sequence databases.
SSR Mining
- Look for microsatellites and primers in your sequences.
CAP3 Assembly
- Assemble your sequences at varying stringencies.
Services
CUGI uses publicly available software applications to perform base-calling
and to remove residual vector from sequences. We use our in-house scripts
to filter low-quality sequences. Typically, a sequence is considered successful
if it contains at least 100 bases with a phred quality value of 20 or higher and
fewer than 5% ambiguous bases (these parameters can be adjusted to suit the needs
of the project). All other low-quality traces are removed from the dataset.
The final sequences contain only the longest contiguous stretch of non-vector
bases, with low-quality bases trimmed off both ends. In addition, CUGI will
submit high-quality, trimmed sequences to GenBank upon request. Reports are
provided detailing individual sequence failure/success rates and overall
project success.
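The pass/fail rule above can be illustrated with a short Python sketch. This is not CUGI's in-house script; it assumes vector has already been removed, treats any base other than A, C, G or T as ambiguous, and approximates trimming by keeping the longest contiguous run of bases whose phred quality meets the cutoff. The function names (longest_hq_window, filter_read) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the text; all are adjustable per project.
MIN_HQ_BASES = 100    # minimum number of bases at or above MIN_PHRED
MIN_PHRED = 20
MAX_AMBIGUOUS = 0.05  # ambiguous-base fraction must be strictly below this


@dataclass
class ReadResult:
    name: str
    passed: bool
    trimmed: str  # empty string when the read fails


def longest_hq_window(quals: Sequence[int], min_phred: int) -> tuple[int, int]:
    """Return (start, end) of the longest contiguous run with quality >= min_phred."""
    best_start, best_end = 0, 0
    run_start = None
    for i, q in enumerate(quals):
        if q >= min_phred:
            if run_start is None:
                run_start = i
            if i + 1 - run_start > best_end - best_start:
                best_start, best_end = run_start, i + 1
        else:
            run_start = None
    return best_start, best_end


def filter_read(name: str, seq: str, quals: Sequence[int]) -> ReadResult:
    """Trim a vector-free read to its best high-quality window and apply the rule (sketch)."""
    if len(seq) != len(quals):
        raise ValueError(f"{name}: sequence and quality lengths differ")
    start, end = longest_hq_window(quals, MIN_PHRED)
    trimmed = seq[start:end].upper()
    # Assumption: anything other than A, C, G, T (e.g. N) counts as ambiguous.
    ambiguous = sum(1 for b in trimmed if b not in "ACGT")
    passed = (
        len(trimmed) >= MIN_HQ_BASES  # short-circuits before dividing by zero
        and ambiguous / len(trimmed) < MAX_AMBIGUOUS
    )
    return ReadResult(name, passed, trimmed if passed else "")
```

Applying the rule to the trimmed window, rather than to the whole trace, is one reasonable reading of the criteria; a production pipeline might instead trim ends with a sliding-window or error-probability method.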
Sequences can be provided as raw chromatogram files or as FASTA-formatted files
that have already been base-called, together with quality values.
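When sequences arrive already base-called, quality values often travel in a companion file that mirrors the FASTA layout: a ">name" header followed by whitespace-separated integer phred scores. The text does not specify this layout, so it is an assumption here, as are the helper names. A minimal Python sketch for pairing the two files by read name:

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (name, body_lines) per '>' record; name is the first word of the header."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, body
            name, body = line[1:].split()[0], []
        else:
            body.append(line)
    if name is not None:
        yield name, body


def load_reads(fasta_lines: Iterable[str],
               qual_lines: Iterable[str]) -> Dict[str, Tuple[str, List[int]]]:
    """Pair each FASTA sequence with its quality values, checking that lengths agree."""
    seqs = {n: "".join(b) for n, b in read_fasta_records(fasta_lines)}
    quals = {n: [int(x) for row in b for x in row.split()]
             for n, b in read_fasta_records(qual_lines)}
    reads = {}
    for name, seq in seqs.items():
        if name not in quals:
            raise ValueError(f"no quality values for {name}")
        if len(quals[name]) != len(seq):
            raise ValueError(f"{name}: {len(seq)} bases but {len(quals[name])} quality values")
        reads[name] = (seq, quals[name])
    return reads
```

Both functions accept any iterable of lines, so an open file handle can be passed directly.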
CUGI uses NCBI BLAST or the FASTA algorithm to find sequence similarity between
nucleotide or protein sequences and nucleotide or protein databases, using CUGI's
computational machines or our 138-processor cluster. For easy readability, an
in-house script parses the results and formats them into an Excel spreadsheet listing
the top 10 matches and the top match for each sequence. Raw output is also provided.
Customers can choose the search parameters, and we will provide output in any format requested.
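The summarizing step can be sketched in Python, assuming the search was run with BLAST+ tabular output (-outfmt 6, default 12 columns). This is not CUGI's script: it keeps the best-scoring HSP per query/subject pair, ranks subjects by bit score, and writes CSV files (which Excel opens) instead of a native spreadsheet. Function names are hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines: Iterable[str], n: int = 10) -> Dict[str, List[dict]]:
    """Group hits by query and keep the n best distinct subjects per query."""
    best: Dict[str, Dict[str, dict]] = defaultdict(dict)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        prev = best[hit["qseqid"]].get(hit["sseqid"])
        # Keep only the highest-scoring HSP for each query/subject pair.
        if prev is None or hit["bitscore"] > prev["bitscore"]:
            best[hit["qseqid"]][hit["sseqid"]] = hit
    ranked = {}
    for query, subjects in best.items():
        hits = sorted(subjects.values(), key=lambda h: (-h["bitscore"], h["evalue"]))
        ranked[query] = hits[:n]
    return ranked


def write_summaries(ranked: Dict[str, List[dict]], top_n_path: str, top1_path: str) -> None:
    """Write a top-n table and a best-hit-per-query table as CSV."""
    cols = ["qseqid", "rank", "sseqid", "pident", "length", "evalue", "bitscore"]
    with open(top_n_path, "w", newline="") as all_f, open(top1_path, "w", newline="") as one_f:
        all_w, one_w = csv.writer(all_f), csv.writer(one_f)
        all_w.writerow(cols)
        one_w.writerow(cols)
        for query in sorted(ranked):
            for rank, h in enumerate(ranked[query], start=1):
                row = [query, rank, h["sseqid"], h["pident"], h["length"],
                       h["evalue"], h["bitscore"]]
                all_w.writerow(row)
                if rank == 1:
                    one_w.writerow(row)
```

Ranking by bit score with e-value as a tiebreaker mirrors how BLAST itself orders hits within a query; other criteria (percent identity, alignment length) could be substituted if a project calls for them.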
A combination of in-house scripts and publicly available software applications
is used to mine SSRs from any set of sequences. Ideally, SSRs are mined from
sequences that have been trimmed and filtered for low quality at CUGI using a
higher phred quality cutoff. Resulting SSRs are defined as dinucleotides (motifs
containing two base pairs), trinucleotides (motifs containing three base pairs),
tetranucleotides (motifs containing four base pairs), pentanucleotides (motifs
containing five base pairs) and hexanucleotides (motifs containing six base pairs).
Typically, dinucleotides with at least five repeats, trinucleotides with at least
four repeats, and tetra-, penta- and hexanucleotides with at least three repeats
are included in the result set (although those parameters can be adjusted as needed
for the project). Forward and reverse primers (and alternate primers) for SSRs are
also generated. Primers are mostly generated for SSRs from sequences that have a GC
content between 40% and 60% with at least 20 base pairs of sequence on either side
of the SSR. Results are provided in an Excel spreadsheet.
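A minimal Python sketch of the mining thresholds and the primer-eligibility check described above. It is not CUGI's pipeline and does not design primers (a tool such as Primer3 would do that); it only flags SSRs whose parent sequence has 40-60% GC and at least 20 bp on each side. It skips motifs that are repeats of a shorter unit (so "ATAT" is not reported as a tetranucleotide) and does not merge compound or overlapping SSRs. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Minimum repeat counts by motif length, as given in the text (adjustable).
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}
MIN_FLANK = 20           # bp required on each side for primer design
GC_RANGE = (0.40, 0.60)  # acceptable GC fraction of the whole sequence


@dataclass
class SSR:
    start: int   # 0-based start in the sequence
    end: int     # exclusive end
    motif: str
    repeats: int
    primer_candidate: bool


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def find_ssrs(seq: str) -> List[SSR]:
    seq = seq.upper()
    gc_ok = GC_RANGE[0] <= gc_fraction(seq) <= GC_RANGE[1]
    found = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) <= set("ACGT") and is_primitive(motif):
                # Count how many times the motif repeats back-to-back from position i.
                reps = 1
                while seq[i + reps * k:i + (reps + 1) * k] == motif:
                    reps += 1
                if reps >= min_rep:
                    end = i + reps * k
                    flanks_ok = i >= MIN_FLANK and len(seq) - end >= MIN_FLANK
                    found.append(SSR(i, end, motif, reps, gc_ok and flanks_ok))
                    i = end  # resume scanning after this repeat
                    continue
            i += 1
    return sorted(found, key=lambda s: s.start)
```

Scanning one position at a time (rather than with a non-overlapping regex) avoids swallowing a real SSR that starts inside a rejected homopolymer run.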
CUGI uses in-house scripts and the publicly available CAP3 software application
to assemble ESTs into consensus contigs. Library files of contigs and singlets,
as well as sequence alignments (in the form of raw CAP3 output), are provided to the customer.
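Assembling "at varying stringencies" usually means rerunning CAP3 with different overlap cutoffs. The Python sketch below assumes CAP3's -p (overlap percent identity cutoff) and -o (overlap length cutoff) options and its usual output naming (input.cap.contigs, input.cap.singlets, input.cap.ace, with alignments printed to standard output); check these against the usage message of the installed version. It is not CUGI's wrapper, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def cap3_command(fasta: str, identity: int = 90, overlap: int = 40) -> List[str]:
    """Build a CAP3 command: -p = overlap percent identity cutoff, -o = overlap length cutoff."""
    return ["cap3", fasta, "-p", str(identity), "-o", str(overlap)]


def expected_outputs(fasta: str) -> Dict[str, Path]:
    """Output files CAP3 is assumed to write next to the input file."""
    base = Path(fasta)
    return {kind: base.with_name(base.name + ".cap." + kind)
            for kind in ("contigs", "singlets", "ace")}


def run_cap3(fasta: str, identity: int, overlap: int, log_path: str) -> None:
    """Run CAP3 and save its alignment output (stdout) to log_path."""
    with open(log_path, "w") as log:
        subprocess.run(cap3_command(fasta, identity, overlap), stdout=log, check=True)


def count_fasta_records(path: Path) -> int:
    """Count '>' headers, e.g. the number of contigs or singlets produced."""
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))
```

A stringency sweep is then a loop such as `for p in (85, 90, 95): run_cap3("ests.fasta", p, 40, f"ests_p{p}.out")`, comparing contig and singlet counts after each run. Because CAP3 writes its files next to the input using fixed names, each run overwrites the previous one, so copy or rename the outputs between iterations.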
Bioinformatics Quality Guarantee
CUGI guarantees correct analysis in accordance with requested
parameters, but cannot guarantee a particular outcome of any analysis.
Shipping & Delivery Costs
All bioinformatics services are deliverable at no charge by FTP download
through CUGI's FTP server. However, customers may request that results
be stored on digital media such as CD, DVD, or USB hard drive and shipped.
Customers are invoiced, on a cost-recovery basis only, for the price of
the media and shipping.
Contact Information