Bioinformatics





The Bioinformatics Group supports the operations of the other five laboratory-based branches of CUGI by providing computational infrastructure, analysis, and public dissemination of data.

CUGI's IT infrastructure includes a 138-processor UNIX cluster for parallel computation, several high-end Solaris machines for non-parallel computation, Solaris and Windows workstations, web servers, database servers, and multiple terabytes of storage and backup space.

CUGI's Bioinformatics Group is also staffed by a team of professionals skilled in genomic bioinformatics applications, systems administration, programming and scripting, and web development. This group provides genomic bioinformatics services to the research community.

Please see CUGI's other service groups for bioinformatics services related to analysis of data derived from functional genomics and proteomics, physical map construction, and shotgun sequence assembly.

Tools

BLAST - Compare your sequences to numerous public sequence databases.
FASTA - Compare your sequences to numerous public sequence databases.
SSR Mining - Look for microsatellites and primers in your sequences.
CAP3 Assembly - Assemble your sequences at varying stringencies.

Services


CUGI uses publicly available software applications to perform base-calling and to remove residual vector from sequences, and in-house scripts to filter out low-quality sequences. Typically, a sequence is considered successful if it has at least 100 base pairs with a phred value of 20 or higher and fewer than 5% ambiguous bases, although those parameters can be adjusted depending on the needs of the project. All other traces are treated as low quality and removed from the dataset. Each final sequence contains only the longest contiguous non-vector stretch of bases, with low-quality bases trimmed off both ends. In addition, CUGI will submit high-quality, trimmed sequences to GenBank upon request. Reports are provided detailing individual sequence success and failure, as well as overall project success.

Sequences can be provided as raw chromatogram files or as FASTA-formatted files that have already been base-called and include quality values.
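The default pass/fail criteria described above can be sketched in a few lines of Python. This is only an illustration and not CUGI's in-house script: the function names are hypothetical, "ambiguous" is assumed to mean the base N, the 100-base threshold is assumed to count all bases at or above the phred cutoff, and trimming is assumed to stop at the first base on each end that meets the cutoff.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Trim bases with phred < min_q from both ends (assumed trimming rule)."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def passes_filter(seq, quals, min_hq_bases=100, min_q=20, max_ambiguous=0.05):
    """Return True if the read meets the default thresholds quoted in the text.

    Assumptions: high-quality bases are counted over the whole read, and
    only 'N' is treated as an ambiguous base.
    """
    if not seq:
        return False
    hq_bases = sum(1 for q in quals if q >= min_q)
    ambiguous_fraction = sum(1 for b in seq if b in "Nn") / len(seq)
    return hq_bases >= min_hq_bases and ambiguous_fraction < max_ambiguous


# Example: a 120-base read with five low-quality bases on each end.
read = "A" * 120
scores = [10] * 5 + [30] * 110 + [10] * 5
trimmed_seq, trimmed_quals = trim_low_quality_ends(read, scores)
keep = passes_filter(trimmed_seq, trimmed_quals)
```

The thresholds are keyword arguments so that they can be changed per project, mirroring the adjustable parameters mentioned above.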

CUGI uses NCBI BLAST or the FASTA algorithm to find sequence similarity between nucleotide or protein sequences and nucleotide or protein databases, running on CUGI's computational machines or our 138-processor cluster. For readability, an in-house script parses the results and formats them in an Excel spreadsheet organized by the top 10 matches and the top match per sequence. Raw output is also provided. Customers can choose the search parameters, and we will provide output in any format requested.
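As a rough illustration of the "top matches per sequence" summary, the sketch below keeps the best N hits for each query. It is not CUGI's in-house script. It assumes the search was run with BLAST+ tabular output (the 12 standard columns of `-outfmt 6`), ranks hits by bit score, and writes CSV, which Excel can open, rather than a native spreadsheet. The function names are hypothetical.

```python
import csv
from collections import defaultdict

# Standard column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def top_hits(lines, n=10):
    """Group tabular hits by query and keep the n best by bit score."""
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            continue  # skip malformed rows
        row = dict(zip(COLUMNS, fields))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        hits[row["qseqid"]].append(row)
    return {
        query: sorted(rows, key=lambda r: r["bitscore"], reverse=True)[:n]
        for query, rows in hits.items()
    }


def write_csv(best, handle):
    """Write the selected hits to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for rows in best.values():
        writer.writerows(rows)
```

Calling `top_hits(lines, n=1)` gives the single best match per sequence, and the default `n=10` gives the top 10.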

A combination of in-house scripts and publicly available software applications is used to mine SSRs from any set of sequences. Preferably, SSRs are mined from sequences that were trimmed and quality-filtered at CUGI using a higher phred quality cutoff. Resulting SSRs are classified by motif length as dinucleotides (motifs of two base pairs), trinucleotides (three base pairs), tetranucleotides (four base pairs), pentanucleotides (five base pairs), and hexanucleotides (six base pairs). Typically, dinucleotides with at least five repeats, trinucleotides with at least four repeats, and tetra-, penta-, and hexanucleotides with at least three repeats are included in the result set, although those parameters can be adjusted as needed for the project. Forward and reverse primers (and alternate primers) for SSRs are also generated. Primers are generally generated for SSRs in sequences that have a GC content between 40% and 60% and at least 20 base pairs of sequence on either side of the SSR. Results are provided in an Excel spreadsheet.
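The default repeat thresholds and the primer eligibility rule can be illustrated with a short regular-expression search. This is a minimal sketch under stated assumptions, not the software CUGI uses: it finds perfect repeats only, skips motifs that are themselves repeats of a shorter unit (so a homopolymer run is not reported as a dinucleotide), measures GC content over the whole sequence, and does not design primers. All names are hypothetical.

```python
import re
from typing import NamedTuple

# Minimum repeat counts per motif length, taken from the defaults in the text.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


class SSR(NamedTuple):
    motif: str
    repeats: int
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. not 'ATAT')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Find perfect SSRs with motif lengths 2 to 6 in an unambiguous sequence."""
    seq = seq.upper()
    found = []
    for k, min_rep in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while True:
            m = pattern.search(seq, pos)
            if m is None:
                break
            motif = m.group(1)
            if is_primitive(motif):
                found.append(SSR(motif, (m.end() - m.start()) // k,
                                 m.start(), m.end()))
                pos = m.end()
            else:
                pos = m.start() + 1  # retry one base further along
    return found


def primer_candidate(seq, ssr, flank=20, gc_min=0.40, gc_max=0.60):
    """Apply the GC-content and flanking-sequence rule described in the text."""
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    has_flanks = ssr.start >= flank and len(seq) - ssr.end >= flank
    return gc_min <= gc <= gc_max and has_flanks


example = "ACGTTGCAAGCTTGCATGCC" + "AG" * 6 + "GGCATGCAAGCTTGCAACGT"
ssrs = find_ssrs(example)
eligible = [s for s in ssrs if primer_candidate(example, s)]
```

Passing a different `min_repeats` mapping changes the thresholds, in the same way that the parameters above can be adjusted per project.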

CUGI uses in-house scripts and the publicly available CAP3 software application to assemble ESTs into consensus contigs. Library files of contigs and singlets, as well as sequence alignments (in the form of raw output from CAP3), are provided to the customer.



Bioinformatics Quality Guarantee

CUGI guarantees correct analysis in accordance with requested parameters, but cannot guarantee a particular outcome of any analysis.

Shipping & Delivery Costs

Results of all bioinformatics services can be delivered at no charge by download from CUGI's FTP server. However, customers may request that results be stored on digital media such as CD, DVD, or USB hard drive and shipped. Customers are invoiced, on a cost-recovery basis only, for the price of the media and shipping.

Contact Information


Phone: 1 (864) 656-4292

Fax: 1 (864) 656-4293