Frequently Asked Questions (FAQ)

If your question is not answered here, please see our detailed help page, which includes PowerPoint tutorials and a detailed guide to our curation process. If you still can't find what you are looking for, contact us.

I would like to obtain a list of genes that have been annotated by InnateDB as having a role in innate immunity - where can I get this?

This list (human and mouse genes only) can be downloaded here. There are currently 950+ human and 500+ mouse genes annotated by InnateDB as having a role in innate immunity.

Are all 377,000 molecular interactions in InnateDB relevant to innate immunity?

No. InnateDB has annotated approximately 25,000 interactions of relevance to innate immunity. InnateDB also goes beyond innate immunity and includes imported interactions from BIOGRID, BIND, DIP, INTACT & MINT (see Resources - Other Interaction Databases). If you only want interactions relevant to innate immunity, you can download them here or retrieve them using our search [check the box: Only return InnateDB-curated interactions].

How does InnateDB designate basic cell location (nucleus, membrane, etc.) to each gene in Cerebral visualisations?

This is how the "Cerebral Localisation" is assigned in InnateDB:

Look up the Gene Ontology Cellular Compartment terms associated with each gene.
Map these GO terms to Cerebral Localisation terms using a manually created internal mapping of GO Cellular Compartment terms to Cerebral Localisation terms. This mapping file is attached and you are welcome to use it. Please note, however, that it was created in a very ad hoc way back in 2007 and has not really been updated since then, so some terms may not be mapped and some may be mapped incorrectly.
You are still left with the issue that many proteins have multiple possible sub-cellular localisations. We have made the decision in InnateDB not to duplicate such nodes throughout different localisations but rather to choose one representative node, which in our opinion is the most information rich.

We have a sub-routine in InnateDB that does this on the fly. For example, nuclear, extracellular and membrane localisations will take precedence over cytoplasm if there are multiple possible localisations, as we believe that knowing a protein is localised in these compartments is more information rich (i.e. it tells you something about what the protein might do). The take-home message is that Cerebral localisations are generated primarily for visualisation purposes and should only be used as a guide.
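
The precedence rule described above can be sketched roughly as follows. This is only an illustration and not InnateDB's actual sub-routine: the function name, the localisation labels and the ordering among nucleus, extracellular and membrane are assumptions. The only behaviour taken from the text is that these three outrank cytoplasm.

```python
# Hypothetical precedence list. The FAQ only states that nucleus, extracellular
# and membrane outrank cytoplasm; the order among those three is an assumption.
PRECEDENCE = ["nucleus", "extracellular", "membrane", "cytoplasm"]


def pick_localisation(candidates, default="unknown"):
    """Return one representative localisation from several candidates."""
    present = set(candidates)
    for loc in PRECEDENCE:
        if loc in present:
            return loc
    # No recognised localisation: fall back to a placeholder label.
    return default


print(pick_localisation(["cytoplasm", "membrane"]))
print(pick_localisation(["cytoplasm"]))
print(pick_localisation([]))
```

A single ordered list keeps the rule easy to inspect and adjust, which suits a mapping that is meant as a visual guide rather than a definitive annotation.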

Can our resource link to InnateDB?

Yes, please do. One can link to gene cards using an InnateDB or Ensembl Gene ID specified in the following URL format:

http://www.innatedb.com/getGeneCard.do?id=ENSG00000136560
http://www.innatedb.com/getGeneCard.do?id=IDBG-73552

Please contact us if you want to link to interaction cards.
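
For resources that generate links programmatically, the gene card URL format shown above could be built along these lines. The helper name gene_card_url is hypothetical; only the base URL and the id parameter come from the examples above.

```python
from urllib.parse import urlencode

# Base URL taken from the example links above.
BASE_URL = "http://www.innatedb.com/getGeneCard.do"


def gene_card_url(gene_id):
    """Build a gene card link from an Ensembl or InnateDB gene ID."""
    return BASE_URL + "?" + urlencode({"id": gene_id})


print(gene_card_url("ENSG00000136560"))
print(gene_card_url("IDBG-73552"))
```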

How do I cite InnateDB?

If you use InnateDB, we would be very grateful if you could please cite:

Lynn et al., InnateDB: facilitating systems-level analyses of the mammalian innate immune response. Molecular Systems Biology 2008; 4:218.
Breuer et al., InnateDB: systems biology of innate immunity and beyond - recent updates and continuing curation. Nucl. Acids Res. 2013; 41 (D1).
Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010; 4:117.

What is the InnateDB pathway analysis tool and why is it useful?

InnateDB Pathway Analysis enables users to determine which biological pathways are significantly over-represented (represented more than expected by chance) in a given gene/protein list.

InnateDB incorporates pathway annotation (all pathways, not just immune-relevant ones) from major public databases including KEGG, Reactome, NetPath, INOH and PID (see Resources - Other Pathway Databases) and is thus one of the most comprehensive sources of pathways available.

To do a Pathway Analysis, first upload a tab-delimited text file or Excel spreadsheet (.xls files only) of gene/protein identifiers (human, mouse or cow only) and any associated quantitative data (e.g. gene expression data fold-changes and p-values) from up to 10 conditions/time-points. Please see our help page if you are unsure how to do this and to see what IDs are accepted.

What is special about the InnateDB Gene Ontology Analysis tool?

The Gene Ontology Consortium is a major initiative to provide a controlled hierarchical vocabulary of terms for describing genes and their encoded products in terms of their molecular functions, biological processes and cellular compartments.

A Gene Ontology (GO) over-representation analysis (ORA) examines a gene/protein list for the occurrence of GO annotation terms which are more prevalent in the dataset than expected by chance. Annotations that occur more frequently than expected in a gene list can be identified, and may point towards a biological process or pathway that is being differentially regulated in the condition of interest.

In InnateDB, GO annotation is supplemented by enhanced annotation of which genes have a published role in innate immunity. For further information on this enhanced innate immunity annotation please click here.

To do a GO Analysis, first upload a tab-delimited text file or Excel spreadsheet (.xls files only) of gene/protein identifiers (human, mouse or cow only) and any associated quantitative data (e.g. gene expression data fold-changes and p-values) from up to 10 conditions/time-points. Please see our help page if you are unsure how to do this and to see what IDs are accepted.

Transcription Factor Binding Site Analysis in InnateDB - where does the data come from?

InnateDB incorporates transcription factor binding site (TFBS) data from the CisRED database for human and mouse genes. Users can analyse a list of genes to determine whether particular TFBSs are over-represented in their dataset based on this data.

To do a TFBS Analysis, first upload a tab-delimited text file or Excel spreadsheet (.xls files only) of gene/protein identifiers (human or mouse only). Please see our help page if you are unsure how to do this and to see what IDs are accepted.

I am trying to use InnateDB to analyze gene expression data for a large number of genes. However, I only have the gene symbols, and I do not have any gene IDs available i.e. from Ensembl, RefSeq, or InnateDB. Is there an easy way to convert all these gene symbols into gene IDs without having to manually enter each gene symbol into InnateDB?

There are a number of gene ID conversion tools online that can convert these gene symbols into an ID (Ensembl, RefSeq, Entrez Gene, UniProt) that is suitable for InnateDB. Here are two we have used before:
http://idconverter.bioinfo.cnio.es/IDconverter.php
http://david.abcc.ncifcrf.gov/conversion.jsp
The reason that we don't accept gene symbols is that they are not very stable IDs (they change frequently), and it is easy to mix up similarly named genes.

Can InnateDB be used to analyze RNAseq data?

Yes.

Although InnateDB was designed with the analysis of microarray data in mind, we and other users frequently use InnateDB for the analysis of other types of omics data, including RNAseq data. Ideally, for analysis in InnateDB, RNAseq data should be in the form of absolute fold-changes in gene expression (+2 = 2 fold increase in expression; -2 = 2 fold decrease in expression), which can easily be converted from the log2 fold changes produced by many packages for the analysis of RNAseq data, e.g. edgeR. Theoretically, RPKM (or similar) values for RNAseq data could be uploaded to InnateDB, but in general this is not recommended, as one should first determine whether there is a statistically significant change in gene expression across multiple biological replicates. RNAseq does NOT negate the need for biological replicates in the design of gene expression experiments!
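
As a small illustration of the conversion mentioned above, the sketch below turns a log2 fold change into the signed fold-change convention (+2 = 2 fold increase; -2 = 2 fold decrease). The function name signed_fold_change is hypothetical, and returning +1 for a log2 fold change of 0 (no change) is an assumption, since the text does not specify that case.

```python
def signed_fold_change(log2_fc):
    """Convert a log2 fold change into a signed absolute fold change."""
    if log2_fc >= 0:
        # Up-regulation (or no change): 2 ** log2_fc, e.g. 1 -> +2.
        return 2.0 ** log2_fc
    # Down-regulation: negate the reciprocal ratio, e.g. -1 -> -2.
    return -(2.0 ** -log2_fc)


for value in (1.0, -1.0, 3.0, -2.0):
    print(value, signed_fold_change(value))
```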

One potential pitfall in the analysis of RNAseq data in InnateDB (and in many other tools that were designed primarily for microarray data) is in the over-representation analysis of Gene Ontology categories. Before doing a Gene Ontology analysis of RNAseq data, we highly recommend that you read this paper by Young et al, Genome Biology, 2010. Young et al show that longer transcripts have more statistical power to detect differential expression between samples and that this effect cannot be removed by normalization or re-scaling (e.g. RPKM). They also show that the length distribution of the genes in GO categories varies widely, with some categories containing an over-representation of long genes and others containing relatively short genes. Therefore, any GO category containing a preponderance of long genes will be more likely to show up as over-represented than a category with genes of average length. The authors introduce GOseq, an R package that corrects for this bias in Gene Ontology analysis of RNAseq data.

It is also possible that a similar bias would affect the Pathway Analysis of RNAseq data, but in practice we have not found this to be the case, and the InnateDB pathway analysis tool should be suitable for the analysis of RNAseq data.

How do I export an image from the Cerebral network?

To export an image from Cerebral/Cytoscape, go to the Plugins menu and then choose "Export Cerebral View".

What is the difference between the Pathway ORA P Value and the Pathway ORA P Value Corrected?

The Pathway ORA P Value is the P value generated by the hypergeometric distribution test of whether a pathway is over-represented in the uploaded dataset more than expected by chance, prior to correction for multiple testing.

The Pathway ORA P Value Corrected is this P value after correction for multiple testing. The default correction is the Benjamini and Hochberg correction (you can also choose to use Bonferroni).
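
To make the relationship between the two values concrete, the sketch below computes an over-representation P value from the hypergeometric distribution and then applies the Benjamini and Hochberg correction. It is not InnateDB's own code: the function names, the parameter names, the example counts and the choice of background gene population are all assumptions made for illustration.

```python
from math import comb


def ora_p_value(k, n, K, N):
    """Upper-tail hypergeometric P value, P(X >= k).

    N: genes in the background population (assumed), K: genes in the pathway,
    n: genes in the uploaded list, k: uploaded genes that are in the pathway.
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


def benjamini_hochberg(p_values):
    """Return Benjamini and Hochberg corrected P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so corrected values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts for three pathways: (k, n, K, N).
tests = [(8, 50, 40, 2000), (3, 50, 100, 2000), (1, 50, 10, 2000)]
raw = [ora_p_value(*t) for t in tests]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:.3g} corrected={q:.3g}")
```

A Bonferroni correction would instead multiply each raw P value by the number of pathways tested and cap the result at 1, which is more conservative.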

I have some questions regarding the interactions from the public interaction sources which are integrated into InnateDB.
Are all these interactions checked to remove erroneous entries? Are these interactions filtered to include only those related to innate immunity? How is the downloadable data you provide different from the data available from the source directly? For example, what are the differences between the MINT file on your site versus the file available directly from MINT?

More than 352,782 interactions are integrated from public interaction databases such as MINT. Although we have a very detailed computational extraction and loading pipeline, it is of course impossible to check every one of these interactions for errors. If one of our curators comes across an interaction in the public databases that they consider incorrect, they will remove it.

Interactions are not filtered: all human and mouse interactions which can be mapped via cross-reference IDs to our database are extracted. By extracting all these interactions from the external databases we are able to provide them in a common format. On the InnateDB website we group these interactions from multiple sources into non-redundant groups.
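
One simple way to picture grouping interactions from several sources into non-redundant groups is shown below. This is a sketch under stated assumptions, not a description of how InnateDB actually groups interactions: here two records are treated as redundant when they involve the same unordered pair of participants, and the record layout, function name and the IDs IDBG-1 to IDBG-3 are placeholders.

```python
from collections import defaultdict


def group_interactions(records):
    """Group (source_db, participant_a, participant_b) records by participant pair."""
    groups = defaultdict(set)
    for source_db, a, b in records:
        # Sort the pair so that A-B and B-A fall into the same group.
        key = tuple(sorted((a, b)))
        groups[key].add(source_db)
    return dict(groups)


# Placeholder records; the IDs are illustrative, not real InnateDB entries.
records = [
    ("MINT", "IDBG-1", "IDBG-2"),
    ("BIOGRID", "IDBG-2", "IDBG-1"),
    ("DIP", "IDBG-1", "IDBG-3"),
]
for pair, sources in sorted(group_interactions(records).items()):
    print(pair, sorted(sources))
```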

Differences between our files and the XML files on the MINT website: we only extract human and mouse interactions; some interactions may be missing, depending on whether they can be cross-referenced; and we may format interactions differently.

Why do I have problems using Internet Explorer (IE) as my browser for InnateDB?

While every effort is made for InnateDB to work with recent versions of different internet browsers, bugs do sometimes occur. InnateDB has been tested to ensure it works with Google Chrome, Firefox and Safari.