* indicates subscription-based resources (off-campus will prompt for credentials)
START HERE: HGNC - HUGO Gene Nomenclature Committee - make sure you are using the most current symbol and name for human genes. From the symbol report page, follow the GenBank and other database links.
ArrayExpress - public archive for transcriptomics data, which is aimed at storing MIAME- and MINSEQE-compliant data in accordance with FGED recommendations. The ArrayExpress Warehouse stores gene-indexed expression profiles from a curated subset of experiments in the archive.
BioCyc - a collection of 1763 Pathway/Genome Databases. Each database in the BioCyc collection describes the genome and metabolic pathways of a single organism.
BRENDA - the main collection of enzyme functional data available to the scientific community.
CMR - The Comprehensive Microbial Resource (CMR) is a website used to display information on all of the publicly available, complete prokaryotic genomes. Common data types across all genomes in the CMR make searches more meaningful, and cross-genome analysis highlights differences and similarities between the genomes.
dbGaP - The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype. Such studies include genome-wide association studies, medical sequencing, molecular diagnostic assays, as well as association between genotype and non-clinical traits. dbGaP provides two levels of access - open and controlled - in order to allow broad release of non-sensitive data, while providing oversight and investigator accountability for sensitive data sets involving personal health information. Summaries of studies and the contents of measured variables as well as original study document text are generally available to the public, while access to individual-level data including phenotypic data tables and genotypes requires varying levels of authorization.
Ensembl Genome - a joint project between the EMBL-EBI and the Wellcome Trust Sanger Institute that aims at developing a system that maintains automatic annotation of large eukaryotic genomes. Access to all the software and data is free and without constraints of any kind. The project is primarily funded by the Wellcome Trust. It is a comprehensive source of stable annotation with confirmed gene predictions that have been integrated from external data sources. Ensembl annotates known genes and predicts new ones, with functional annotation from InterPro, OMIM and gene families.
GenBank - NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
Gene - NCBI's database for gene-specific information.
Genome - provides views for a variety of genomes, complete chromosomes, sequence maps with contigs, and integrated genetic and physical maps.
Genome Browser - created by the Genome Bioinformatics Group of the University of California at Santa Cruz.
GEO - Gene Expression Omnibus is a gene expression/molecular abundance repository supporting MIAME-compliant data submissions, and a curated, online resource for gene expression data browsing, query, and retrieval.
KEGG - a complete computer representation of the cell, the organism, and the biosphere, enabling computational prediction of higher-level complexity of cellular processes and organism behaviors from genomic and molecular information.
PRIDE - a database of ESTs and gene expression profiles obtained mainly in the Plant Science Center, RIKEN. PRIDE contains information on gene expression profiles of Zinnia elegans, and will contain that of BY-2 and other organisms such as Lotus japonicus and Arabidopsis.
SNP - The Single Nucleotide Polymorphism database (dbSNP) is a public-domain archive for a broad collection of simple genetic polymorphisms.
UniProt - comprehensive, high-quality and freely accessible resource of protein sequence and functional information.
WormBase - database of the model organism Caenorhabditis elegans and related nematodes.
Bioconductor - an open source and open development software project to provide tools for the analysis and comprehension of genomic data.
Discovery Studio Visualizer - visualize and share molecular information in a clear and consistent way, and in a wide variety of industry-standard formats. You can also create high quality graphics.
ExPASy - The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.
FirstGlance - view the 3D structures of proteins, DNA, RNA, and their complexes. FirstGlance in Jmol can display major structural features of the molecule with one click each. One-click options display secondary structure, amino and carboxy (or 3' and 5') termini, composition (protein, DNA, RNA, ligands, and solvent), the distributions of hydrophobic, polar, and charged amino acids, salt bridges and cation-pi orbital interactions for amino acids. Non-standard residues and missing side chains are flagged automatically.
GeneCards - a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. Information featured in GeneCards includes orthologies, disease relationships, mutations and SNPs, gene expression, gene function, pathways, protein-protein interactions, related drugs & compounds and direct links to cutting edge research reagents and tools such as antibodies, recombinant proteins, clones, expression assays and RNAi reagents.
GenePattern - combines a powerful scientific workflow platform with more than 100 genomic analysis tools.
GenMAPP - a free computer application designed to visualize gene expression and other genomic data on maps representing biological pathways and groupings of genes. Integrated with GenMAPP are programs to perform a global analysis of gene expression or genomic data in the context of hundreds of pathway MAPPs and thousands of Gene Ontology Terms (MAPPFinder), import lists of genes/proteins to build new MAPPs (MAPPBuilder), and export archives of MAPPs and expression/genomic data to the web.
LOCATE - a curated database that houses data describing the membrane organization and subcellular localization of proteins from the RIKEN FANTOM4 mouse and human protein sequence set. The membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing peer-reviewed publications.
NCBI Structure - Cn3D ("see in 3D") is a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service. Cn3D runs on Windows, Macintosh, and Unix. Cn3D simultaneously displays structure, sequence, and alignment, and now has powerful annotation and alignment editing features.
SAM - An Excel add-in that can be applied to data from oligo or cDNA arrays, SNP arrays, protein arrays, etc.; correlates expression data to clinical parameters including treatment, diagnosis categories, survival time, paired (before and after), quantitative (e.g. tumor volume) and one-class. Both parametric and non-parametric tests are offered. Correlates expression data with time, to study time trends. The experimental units can fall into one or two classes, or be paired. Automatic imputation of missing data via nearest neighbor algorithm (better, faster in SAM version 2.0). Adjustable threshold determines number of genes called significant. Uses data permutations to provide an estimate of the False Discovery Rate for multiple testing. Gene lists in Excel workbook form, easily exportable into TreeView, Cluster or other software.
MeV - Normalized and filtered expression files can be analyzed using TIGR MultiExperiment Viewer (MeV). MeV is a versatile microarray data analysis tool, incorporating sophisticated algorithms for clustering, visualization, classification, statistical analysis and biological theme discovery. MeV can handle several input file formats. These include the ".mev" and ".tav" files generated by TIGR Spotfinder and TIGR MIDAS, and also Affymetrix® (".txt") and Genepix® (".gpr") files.
Gene Ontology - provides a controlled vocabulary to describe gene and gene product attributes in any organism.
Literature Searching Tools
*Biological Abstracts - includes citations and some abstracts from over 6500 international life sciences journals. Fields covered include: biology, botany, zoology, biotechnology, and environmental studies.
Google Patents - All patents available through Google Patent Search come from the United States Patent and Trademark Office (USPTO). Google Patent Search covers the entire collection of issued patents and millions of patent applications made available by the USPTO, from patents issued in the 1790s through those most recently issued in the past few months. To date, the USPTO has made available approximately 7 million patents and over a million patent applications.
Google Scholar - Google Scholar searches peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations.
GOPubMed - Your keywords are submitted to PubMed and the resulting abstracts are classified using Gene Ontology and Medical Subject Headings (MeSH). MeSH is a hierarchical vocabulary covering biomedical and health-related topics. Gene Ontology is a hierarchical vocabulary for molecular biology covering cellular components, biological processes and molecular functions.
GraphPad Prism - a combination of basic biostatistics, curve fitting and scientific graphing in one program. Designed for the practical scientist, Prism does not expect you to be a statistician. It guides you through each analysis - giving you as much help as you need - and tracks and organizes your work.
iHOP - a network of concurring genes and proteins extends through the scientific literature touching on phenotypes, pathologies and gene function. iHOP provides this network as a natural way of accessing millions of PubMed abstracts. By using genes and proteins as hyperlinks between sentences and abstracts, the information in PubMed can be converted into one navigable resource, bringing all advantages of the internet to scientific literature research.
NextBio - a life science search engine that enables researchers and clinicians to access and understand the world's life sciences information. NextBio content includes pre-processed data from public resources such as NCBI GEO (Gene Expression Omnibus), ArrayExpress, SMD (Stanford Microarray Database), and many others. In addition, individual organizations and users contribute data to NextBio for the benefit of the entire scientific community. Users and organizations can also keep their data private and share it with a select group of individuals. NextBio currently supports any type of gene-centric data (gene expression, proteomics, siRNA screens, etc.) for human, mouse, rat, fly, worm and yeast. We are actively working on adding support for monkey, plants and many other organisms.
ResearchGate - ResearchGate is the world's largest professional network for scientists and researchers, with over 1.4 million users. Our online literature search provides researchers with more than 10 million full texts freely downloadable from our site, the majority sourced from partnerships with open access databases and the remainder uploaded by the authors themselves. The extensive database also provides researchers with access to approximately 45 million abstracts pulled from 7 of the largest databases (including PubMed, IEEE, and CiteSeer).
*Web of Science - Contains three ISI Citation Databases (Sciences, Arts & Humanities, Social Sciences), which together index over 8,000 peer-reviewed journals. Provides bibliographic data, author abstracts, and cited references. Useful for searching databases for articles that cite a known author or work.