1.1. GEO. Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) [1], [2] is an online repository of experimental data from microarrays, as well as from other high-throughput technologies such as SAGE and MPSS. The aim is for submitted data to be compliant with MIAME (Minimum Information About a Microarray Experiment, http://www.mged.org/Workgroups/MIAME/miame.html) [3]. It is not yet clear that this is in fact the case, as GEO is not as strict as it could be in this regard. Submitting data is rather straightforward, so many people have submitted experiments. (To date, 155,807 individual microarrays have been recorded in GEO.) One can submit records using MINiML (MIAME Notation in Markup Language); this is useful for batch uploads.
There are three types of records within GEO:
(1) GPL (platform) - Each GPL file records the specifications of one single array platform; for example, GPL1261 is the Affymetrix GeneChip Mouse Genome 430 2.0 Array, and GPL4653 is the Nimblegen Staphylococcus aureus 2.6K oligo array. The file contains information about the company, a link to its website, a short blurb about the array, release dates of the array, and various other pieces of administrative information. Most importantly, the file contains a table describing each of the features on the array. What appears in the table is, of course, platform dependent, but it will contain unique feature IDs and some information about where the sequence corresponding to each feature comes from. This might be a GenBank (or other database) accession number, the probe sequence, or a gene ID, depending on the platform.
There is also a list of all the samples (GSM) and series (GSE) using this platform.
(2) GSE (series) - Each GSE file is a record of an experiment; e.g., GSE478 concerns alveoli loss during caloric restriction, and GSE1871 concerns “mouse lung samples treated with Simvastatin and LPS and corresponded [sic] controls”. Each GSE file contains some notes on the experiment: usually the general aims of the experiment, some of the experimental conditions, the authors, links to websites, perhaps a PubMed link, etc. The platforms used for the experiment are specified (note that more than one platform may have been used). Finally, there is the list of samples (GSM) constituting this experiment. There may also be supplementary files, such as an archive of raw data from the scanner.
(3) GSM (sample) - Each GSM file is the record of one single array; e.g., GSM24056 is the Universal Mouse Reference RNA from Stratagene, used in the experiment GSE1435, “Microarray Based Comparison of two Amplification Methods For Nanogram Amounts of Total RNA”. The file contains some information which duplicates that found in the GSE file: authors, PubMed links, contacts, experimental conditions, etc. It specifies which platform (GPL) the array is, and to which series (possibly more than one) the sample belongs. Finally, there is the expression table. This table contains, in a platform-appropriate manner, a list of probe (expression) values. That is to say, for each feature on the array, the table contains an intensity, a measurement of the expression at that feature. These are processed data: the researchers have carried out whatever normalization and processing they felt was appropriate. This processing may reflect the data which appear on the other arrays in the researchers’ experiment; for example, Affymetrix arrays may be normalized using RMA [4], which takes into account variation across the arrays in an experiment. Occasionally, raw data from the scanner are provided as a supplementary file. (For example, in the case of Affymetrix arrays, the .CEL file is often provided as a supplementary file.)
We have chosen many GSEs and their associated GSMs for use in our project. We require, as will be explained below, that the GSMs have .CEL files available from GEO.
1.2. ArrayExpress. ArrayExpress (http://www.ebi.ac.uk/arrayexpress/) [5] is much stricter about MIAME compliance than GEO. One can submit to ArrayExpress using MAGE (MicroArray Gene Expression, http://www.mged.org/Workgroups/MAGE/mage.html) [6], whose markup language, MAGE-ML, is XML-based and looks much like HTML. This makes it easier for ArrayExpress to ensure that what is being submitted meets the MIAME requirements. This strictness, however, has discouraged some from submitting, and thus there are fewer records in ArrayExpress than in GEO. It is also perhaps not as user-friendly a system as GEO.
1.3. Graphviz. The graphs which we draw within StarNet are actually drawn using a publicly available package called Graphviz (www.graphviz.org). Graphviz and its variants read a text file in a specific format (the DOT language) into the layout engine and output an image file, in our case a GIF. It is through Graphviz that the graphs are given the visual properties specified by the user.
1.4. Perl. Much of the code described below was written in Perl (http://www.perl.org). Perl is a high-level programming language designed with string handling and manipulation in mind. It is thus suited to the simple sorting and editing tasks required to manipulate files from GEO and other sources, and the files which we build out of those files.
1.5. Octave. Octave (http://www.gnu.org/software/octave/) is a MATLAB-like tool, and was used to compute the correlation coefficients.
1.6. R. R (http://www.r-project.org/) is a freely available statistics package. Members of the biology community have developed BioConductor [7], a collection of R packages specifically for processing microarray data. Further packages, affy, BufferedMatrix, and AffyExtensions (which provides the tool justRMALite) [8], have been developed within BioConductor for the normalization of data from Affymetrix chips. These tools were used to normalize our microarray data.
1.7. Gene Ontology. GO [9] can be found at http://www.geneontology.org/. “The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.” There are three vocabularies: cellular component, biological process, and molecular function. (Cellular component: rough ER, membrane components, etc. Biological process: alpha-glucoside transport, signal transduction, etc. Molecular function: catalytic activity, binding, etc.) The vocabularies are each hierarchical: a nuclear chromosome is a chromosome, the nucleus is part of a cell, etc.
This is a very strictly controlled and well-maintained vocabulary. As it is not species dependent, it is very generally applicable. The Mouse Genome Database (http://www.informatics.jax.org/) is a founding member of the GO Consortium, so we know that many mouse experiments, and the corresponding literature, will be well GO annotated.
We use GO annotations in the literature to examine our networks. To see how,
1.8. Entrez. Entrez Gene [10], RefSeq [11], and GeneRIFs were used. We fixed the version, using that of April 4, 2007.
2.1. Platform Selection. We wanted to choose a single array platform from which to select experiments. We did not want to combine experiments across array platforms, as this adds further complications to the analysis. Different arrays have different genes represented, and even those genes which are common may be represented by different oligonucleotides. These differences make statistical analyses extremely difficult.
We opted for a well-known commercial platform for several reasons. First, an in-house custom array is unlikely to have many experiments run on it, as it was probably designed with a very specific purpose in mind; it would be hard to find such an array with even fifty or a hundred experiments. It will also not have the documentation associated with it that a commercial array has, nor will there be the same body of literature concerning it.
of deciding how to interpret the data from the arrays. There is also the issue of consistency: commercial arrays tend to be much more consistent in quality. Affymetrix is one of the oldest array companies, and perhaps the best known.
Next, we wanted an array which covered the whole genome. We did not wish to restrict our attention to arrays which cover only a subset of the genome: we do not want to limit ourselves to genes which have already been identified as playing a role in cardiac development, or to disregard genes which are presumed not to play a role. We are examining all genes as possible candidates. There are several platforms, even outside of the custom arrays, which are very specific in terms of which genes are featured. For example, GPL195 is an array of “genes encoding cytokines, chemokines, their receptors and other related immunoregulatory factors”.
Last, we wanted an array with many hundreds of associated experiments; some experiments for any given platform will be irrelevant, inappropriate, or unusable because not enough information is available. We wanted there to be enough associated array experiments that, even after throwing out the inappropriate ones, we would have a large number left from which to choose.
The choice we made, out of perhaps twenty or so arrays which met some of the above criteria, was the Affymetrix GeneChip Mouse Genome 430 2.0 Array. This is a whole-genome array from a reputable manufacturer, with 3,255 associated GSM files, 269 associated GSE files, and more than 20 associated experiments recorded in ArrayExpress. The platform is GPL1261 in GEO and A-AFFY-45 in ArrayExpress.
2.2. Experiment Selection. Given the choice of platform, we must choose a large number of experiments carried out on this platform.
The idea was to choose experiments which cover a wide variety of experimental conditions. Here the aim is to counteract “apparent” coregulation between genes. That is to say, in any given experiment two genes may appear to be coregulated when they are, in fact, not. If one examines a wide variety of experiments, however, the illusory relationship should disappear. It can be countered that certain genes are only coregulated in certain tissues. We are interested in genes which are coregulated in the milieu of cardiac development and function. We thus ensure that a large subset of the experiments chosen pertain to cardiac development and function. We build networks using either the entire contingent of arrays or just the cardiac cohort, and compare the derived networks. This provides a further check on our work, the ability to produce further evidence for the relationships which we have found, and the possibility of identifying new relationships which we would not otherwise have noticed.
We attempted to find experiments where the samples were not pooled (each microarray is from the tissue of a single individual), and tried to pick experiments where the same strain of mice was used.
As mentioned above, we also looked for experiments related to the heart: cardiac development in utero or over the lifetime of the animal, or under the effects of drugs or knockouts. Several time series were also included. Additionally, we include some early developmental experiments (embryonic and pre-implantation) in the cardiac cohort. We chose 2,145 experiments, 239 of which are cardiac.
2.3. Preprocessing. Up to this point, all the work has been done by hand. Now we start to use R (BioConductor, affy, and AffyExtensions), Perl, and Octave code. Sections 2.3 through 2.6 are essentially automated, with some human supervision, and consist of several thousand lines of code. All are run as scripts from a command line; there is no need for a GUI, as most of what the human user does is sit and watch.
Please note that the procedure described below is specific to Affymetrix chips, and
must be changed entirely if another manufacturer’s platform is to be used.
Affymetrix arrays are different in their design from other oligonucleotide arrays: the oligonucleotides are shorter (25 base pairs on Affymetrix arrays, as opposed to roughly 50-60 base pairs on many other platforms), and they are synthesized in situ on the substrate using photolithographic masking, whereas other technologies typically deposit pre-synthesized probes using pins, or use ink-jet technology to synthesize probes on the substrate. On most arrays each gene is interrogated by one spot on the array containing many copies of one unique oligonucleotide corresponding to a transcript. Affymetrix arrays, however, contain 11-20 “probe pairs” corresponding to each transcript, which together form a probe set. A probe pair is a pair of 25 base pair sequences corresponding to a portion of the transcript of interest. The perfect match (PM) member of the pair, as its name suggests, matches the target sequence perfectly. The mismatch (MM) member of the pair is identical to the PM except at the 13th (middle) position, where the complementary base is substituted. This difference is designed to help identify cross-hybridization of inappropriate targets to the PM probe. Each spot on the array consists of multiple copies of a single PM or MM probe, the two members of a pair occupying separate spots. The probe pairs within a given probe set are designed by Affymetrix to give reasonable coverage of the 3’ end of the transcript of interest [12], [13]. The expression level of a given gene is computed by considering the signal from each of the probe pairs within the probe set corresponding to that gene. There is much discussion in the literature as to whether the MM data should be used or ignored.
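To make the PM/MM relationship concrete, here is a minimal Python sketch (not part of the pipeline described in this report) that derives a mismatch probe from a perfect-match 25-mer by substituting the complementary base at the 13th position. The function name `mismatch_probe` and the example sequence are made up for illustration; real probe sequences come from Affymetrix's probe files.

```python
# Complementary base for each DNA nucleotide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

PROBE_LENGTH = 25
MIDDLE_INDEX = 12  # 0-based index of the 13th base


def mismatch_probe(pm: str) -> str:
    """Return the MM probe for a 25-mer PM probe (hypothetical helper).

    The MM probe equals the PM probe except that the middle (13th) base
    is replaced by its complement.
    """
    pm = pm.upper()
    if len(pm) != PROBE_LENGTH:
        raise ValueError(f"expected a {PROBE_LENGTH}-mer, got length {len(pm)}")
    if any(base not in COMPLEMENT for base in pm):
        raise ValueError("probe may contain only A, C, G, T")
    middle = COMPLEMENT[pm[MIDDLE_INDEX]]
    return pm[:MIDDLE_INDEX] + middle + pm[MIDDLE_INDEX + 1:]


if __name__ == "__main__":
    pm = "ACGTACGTACGT" + "A" + "ACGTACGTACGT"  # made-up 25-mer, not a real probe
    print(pm)
    print(mismatch_probe(pm))
```

Because only one base differs, a target that hybridizes about as strongly to the MM spot as to the PM spot suggests non-specific binding.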
Several authors [14], [15], [16] have noted that there are errors in Affymetrix’s annotations, and even in the composition of probe sets. In an effort to remedy this, Dai and colleagues [14] have provided an alternative description of the contents of Affymetrix chips. They reanalyze the probes using a different technique than Affymetrix, as well as the most up-to-date public databases. Their method additionally removes some of the ambiguity of Affymetrix’s choices: according to Affymetrix annotation, many genes are represented by several probe sets. This is a problem, as it has been suggested that it is not appropriate to determine a gene’s expression level by averaging over the probe sets corresponding to the gene [15]. It is also difficult to determine from Affymetrix’s annotation exactly which gene each feature is meant to represent. Dai et al. have one probe set per gene, and their annotation makes clear which gene a given probe set is meant to represent.
Each Affymetrix chip type has an associated .cdf file, which contains a mapping indicating which probe pairs comprise each of the probe sets. Each array experiment has an associated .cel file, recording the measured intensity of each probe cell on the chip. In analyzing an experiment, the data from the .cel file are combined with those from the .cdf file to assign an expression level to each probe set. Dai et al. have adapted their methods to provide a .cdf file which identifies probe sets by Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) numbers. We have chosen to use this description of the contents of the array. Both difficulties mentioned in the previous paragraph are resolved by this choice.
It is also worth noting that Dai and colleagues update their annotations on a fairly regular basis, essentially keeping in step with UniGene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene) builds.
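The combination of .cel and .cdf data amounts to a lookup. The Python below is a hypothetical illustration: it assumes both files have already been parsed into dictionaries (real .cdf and .cel files have their own formats, which affy reads for you), and the probe set names, cell positions, and intensities are invented.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) position of one probe cell on the chip


def probe_level_data(cdf: Dict[str, List[Cell]],
                     cel: Dict[Cell, float]) -> Dict[str, List[float]]:
    """Collect one array's PM intensities, grouped by probe set.

    cdf: probe set ID -> cells holding that probe set's PM probes
    cel: cell -> measured intensity on this array
    """
    return {probe_set: [cel[cell] for cell in cells]
            for probe_set, cells in cdf.items()}


if __name__ == "__main__":
    cdf = {"gene1_at": [(0, 0), (2, 0)], "gene2_at": [(1, 1)]}
    cel = {(0, 0): 120.0, (1, 0): 80.0, (2, 0): 150.0, (1, 1): 40.0}
    print(probe_level_data(cdf, cel))
```

The per-probe-set intensity lists are what normalization and summarization then turn into one expression value per probe set and array.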
In computing expression levels from arrays, the data must be normalized: different arrays may be processed at different times by different people, ambient conditions may differ, and thus the measured signal intensities may vary for technical reasons. Normalization attempts to remedy these effects through the application of various statistical techniques to the raw data. Many normalization methods have been proposed. We have chosen the following scheme for data normalization: RMA (Robust Multi-array Average) [4] background correction, quantile normalization, PM-only adjustment, and Tukey median polish summarization. This decision was made based on recommendations found in the literature and on current trends in normalizing microarray data [17], [18], [19], [20], [21].
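The last two steps of this scheme are simple enough to sketch directly. The pure-Python functions below are an illustrative re-implementation, not the affy code we actually ran: `quantile_normalize` gives every array the same intensity distribution, and `median_polish` summarizes one probe set by Tukey's median polish, modelled on the iteration in R's `medpolish` (rows swept first). The RMA background-correction step is omitted.

```python
from statistics import median
from typing import List

Matrix = List[List[float]]  # rows = probes, columns = arrays


def quantile_normalize(data: Matrix) -> Matrix:
    """Force every column (array) to share one distribution.

    Each value becomes the mean, across arrays, of the values with the same
    rank. Ties are broken by row order, a simplification.
    """
    n_rows, n_cols = len(data), len(data[0])
    # For each array, the row indices sorted by increasing intensity.
    orders = [sorted(range(n_rows), key=lambda i: data[i][j]) for j in range(n_cols)]
    # Mean across arrays of the k-th smallest value, for each rank k.
    rank_means = [sum(data[orders[j][k]][j] for j in range(n_cols)) / n_cols
                  for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for k, i in enumerate(orders[j]):
            result[i][j] = rank_means[k]
    return result


def median_polish(x: Matrix, max_iter: int = 10, eps: float = 0.01) -> List[float]:
    """Tukey median polish of one probe set (probes x arrays).

    Fits x[i][j] ~ overall + row[i] + col[j] by alternately sweeping out row
    and column medians; returns overall + col[j], one summary per array.
    """
    z = [list(row) for row in x]  # residuals
    n_rows, n_cols = len(z), len(z[0])
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians into the row effects.
        for i in range(n_rows):
            d = median(z[i])
            z[i] = [v - d for v in z[i]]
            row_eff[i] += d
        d = median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        # Sweep column (array) medians into the column effects.
        for j in range(n_cols):
            d = median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= d
            col_eff[j] += d
        d = median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
        # Stop once the total absolute residual stops changing.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]
```

In RMA the median polish is applied, probe set by probe set, to the log2 of the normalized PM intensities, and the per-array summaries become the expression values.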
Dai and colleagues provide the .cdf files, and the code in affy, needed to use their alternate mapping of Affymetrix’s chips for data normalization. The code is relatively simple and easy to use. The output from the affy script is not quite in a form which is usable by Octave, and thus a small Perl script is run to transform the files. The Perl script also removes the 65 quality control probes which Affymetrix has on its chips, which Dai and colleagues leave unchanged, but which we do not wish to consider.
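The reformatting step might look like the following Python sketch, a stand-in for the Perl script. It assumes, hypothetically, that the affy output is a tab-delimited table with a header row of array names and probe set IDs in the first column, that the control probe sets are those whose IDs start with `AFFX`, and that the remaining IDs have the form `<Entrez ID>_at`; these should be checked against the actual files before relying on it.

```python
import csv
import io
from typing import List, Tuple


def reformat_for_octave(table: str, control_prefix: str = "AFFX") -> Tuple[List[str], str]:
    """Split an expression table into gene IDs and a bare numeric matrix.

    Returns the gene IDs (one per matrix row) and whitespace-separated text
    with one row per gene and one column per array.
    """
    reader = csv.reader(io.StringIO(table), delimiter="\t")
    next(reader, None)  # skip the header row of array names
    gene_ids: List[str] = []
    lines: List[str] = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        probe_set, values = fields[0], fields[1:]
        if probe_set.startswith(control_prefix):
            continue  # drop quality control probe sets
        # Assumed naming: "<Entrez ID>_at" becomes "<Entrez ID>".
        gene_ids.append(probe_set[:-3] if probe_set.endswith("_at") else probe_set)
        lines.append(" ".join(values))
    return gene_ids, "".join(line + "\n" for line in lines)
```

Keeping the IDs in a separate list lets the numeric matrix be loaded as plain numbers, with row i of the matrix belonging to `gene_ids[i]`.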
2.4. Correlation Coefficients. An Octave script reads the files produced by affy and formatted by Perl, and computes the Pearson correlation coefficients between all pairs of genes. All of the correlation coefficients are output into a single file. Each line contains:
(1) the two Entrez gene IDs,
(2) the correlation coefficient,
(3) the p-value, and
(4) the number of array expression values used in the computation.
Note: This last item is vestigial, but may be useful at a later date. The idea is the following: in certain cases it will be desirable to remove from consideration the expression value of a given gene on certain arrays. This may be because the data are viewed as “bad” due to technical experimental problems, or because the expression value falls below the cutoff established for background signal. In this case, the correlation between this gene and the others can be computed based only on the arrays which have been deemed to have “good” data. If only a small subset of the arrays has been used in calculating the correlation between two genes, this can be viewed as an indicator of low-quality data. As will be seen in Section 2.5, the network building procedure filters correlations, among other things removing those where the computation involved too few arrays. For the moment this is vestigial, as we are not concerned with background, but the functionality is available for future use. It is also useful in visualize.cgi for creating confidence intervals for the correlation coefficients.
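A minimal Python sketch of one output line follows, assuming a tab-separated layout (the real file layout may differ). Arrays flagged as bad are passed as `None` and skipped, which is how the count in item (4) would shrink. The p-value here uses the Fisher z-transformation as a normal approximation; that is our substitution for illustration, not necessarily what the Octave script computes.

```python
import math
from typing import Optional, Sequence, Tuple


def correlate(x: Sequence[Optional[float]],
              y: Sequence[Optional[float]]) -> Tuple[float, float, int]:
    """Return (Pearson r, approximate two-sided p-value, arrays used).

    Arrays where either gene's value is None (deemed "bad") are skipped, so
    the third item records how many arrays actually entered the computation.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return math.nan, math.nan, n
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return math.nan, math.nan, n  # a constant gene has no defined r
    r = max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))
    if n < 4:
        p = math.nan  # Fisher's z needs at least 4 observations
    elif abs(r) == 1.0:
        p = 0.0
    else:
        z = math.atanh(r) * math.sqrt(n - 3)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return r, p, n


def format_line(gene1: str, gene2: str, x, y) -> str:
    """One line of the (hypothetical) correlations file: IDs, r, p, n."""
    r, p, n = correlate(x, y)
    return f"{gene1}\t{gene2}\t{r:.4f}\t{p:.3g}\t{n}"
```

For example, `format_line("1", "2", [1, 2, 3, 4], [1, -1, -1, 1])` yields an r of 0 from all four arrays, with a p-value of 1.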
2.5. Network Preparation. Once the computations above are complete, we are presented with an overwhelmingly large text file containing the correlations between our features. We do several things with this file.
(1) We place the data in the mm heart database. Here it will be easier to manipulate.
(2) We trim the table: we extract certain subsets of the set of correlations. This is
done using MySQL. These subsets are then placed in files, for use by StarNet.
(3) We turn the text file into a hash file. A hash file is a database file that is easily and quickly read by Perl. Each entry in such a file has a key and associated data. The keys in this case are gene IDs. The associated data is a list, which contains one entry for each other gene on the array for which there is a correlation coefficient in the trimmed correlations file: gene ID, correlation coefficient between the two genes, p-value, and number of arrays used in the computation of the correlation coefficient.
(4) We sort the hash: for each key we order the associated list, first by decreasing correlation coefficient (absolute value), then by ascending p-value, then by descending number of arrays used in the coefficient computation.
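Steps (3) and (4) can be sketched in Python with an in-memory dictionary standing in for the Perl hash file (Python's `shelve` module would be a rough analogue of the on-disk version). The input rows are hypothetical `(gene1, gene2, r, p, n)` tuples read from the trimmed correlations file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, float, float, int]     # (other gene ID, r, p-value, arrays used)
Row = Tuple[str, str, float, float, int]  # (gene1, gene2, r, p-value, arrays used)


def build_sorted_hash(rows: Iterable[Row]) -> Dict[str, List[Entry]]:
    """Map each gene ID to its sorted list of correlations with other genes."""
    table: Dict[str, List[Entry]] = defaultdict(list)
    for g1, g2, r, p, n in rows:
        # A correlation is symmetric, so file it under both genes.
        table[g1].append((g2, r, p, n))
        table[g2].append((g1, r, p, n))
    for entries in table.values():
        # Decreasing |r|, then ascending p-value, then decreasing array count.
        entries.sort(key=lambda e: (-abs(e[1]), e[2], -e[3]))
    return dict(table)
```

Sorting once here means that later network-building steps can simply read each list from the front to find a gene's strongest correlations.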
2.6. Network Building. At this point we build the networks. We have removed a large number of correlation coefficients in the trimming above, and the hashes are sorted. Each gene now has correlation coefficients not with all other genes, but only with a selected subset.
There are two types of network which are built: iterated and star. Here is a brief
description of the two network types.
2.6.1. Iterated Networks. These networks are built by looking at all of the genes.
Level 1: First, for each gene, we connect it to the other gene with which it is most closely correlated. Thus, some of the “dots” have been connected; we have taken all the disjoint points and linked some of them together into little networks.
Level 2: Next we think of each of these new little networks as a “gene” and repeat the above process: traverse each little network, and find the largest correlation coefficient pointing out of the network. Doing this, we join some of our little networks.
Level n: We repeat the process, creating new “levels”, until all the genes are connected into one network, or until we have no more correlation coefficients with which to build connections.
This is the basic idea, but the user has some choices to make. They can specify how many levels deep the network should be, or just allow it to be built as deep as it needs to be. They can also specify that the n largest correlation coefficients (for an n of their choice) be taken out of each gene or subnetwork.
We have to deal with ties, unlikely as they might be. If you specify that you are to have n connections out of a gene, and there is a tie between the n-th and (n + 1)-st connections, the extra connection will be added.
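The iterated procedure, including the rule for ties, can be sketched in Python with a union-find structure tracking which genes already belong to the same little network. This is an illustrative reconstruction from the description above, not the StarNet Perl code; ranking connections by absolute correlation is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]


def top_n_with_ties(candidates: List[Edge], n: int) -> List[Edge]:
    """The n candidates with the largest |r|, plus any tied with the n-th."""
    ranked = sorted(candidates, key=lambda e: -abs(e[2]))
    if len(ranked) <= n:
        return ranked
    cutoff = abs(ranked[n - 1][2])
    return [e for e in ranked if abs(e[2]) >= cutoff]


def iterated_network(corr: Dict[Tuple[str, str], float], n: int = 1,
                     max_levels: Optional[int] = None) -> List[Tuple[int, str, str, float]]:
    """Return (level, gene, gene, r) edges of an iterated network."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (g, h), r in corr.items():
        adj[g][h] = r
        adj[h][g] = r
    parent = {g: g for g in adj}  # union-find forest over genes

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    edges = []
    level = 0
    while max_levels is None or level < max_levels:
        # Current little networks, keyed by their union-find root.
        members = defaultdict(list)
        for g in adj:
            members[find(g)].append(g)
        if len(members) <= 1:
            break  # everything is already connected
        chosen = {}
        for root, genes in members.items():
            # Correlations pointing out of this little network.
            outgoing = [(g, h, r) for g in genes for h, r in adj[g].items()
                        if find(h) != root]
            for g, h, r in top_n_with_ties(outgoing, n):
                chosen[frozenset((g, h))] = (g, h, r)  # drop A-B / B-A duplicates
        if not chosen:
            break  # no correlations left to build connections with
        level += 1
        for g, h, r in chosen.values():
            edges.append((level, g, h, r))
            parent[find(g)] = find(h)  # merge the two little networks
    return edges
```

All connections for a level are chosen before any merging, so every little network at a given level picks its outgoing edges from the same snapshot.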
2.6.2. Star Networks. This is a simpler class of networks, with more options for the user. In these networks we are examining only one gene, and we are looking to find all the genes to which it is connected. The user specifies a gene (the seed gene) from which to build the network. They specify how many levels deep to build, or can let it go as deep as it needs.
There are three methods to build a star network.
(1) Levels - Simply include all correlations from the seed gene. This is the first level. On the second level, connect these direct neighbours of the seed gene to their direct neighbours. Repeat the procedure. There is also a variant, Levels with Internal Edges, explained below.
(2) Weights - Essentially the same as above, except that we only allow a new gene to be part of our network if the product of the correlation coefficients connecting the seed gene to the new gene is greater than a user-specified cutoff.
(3) Highest - Again, essentially the same as “Levels”. In this instance, though, we only allow the n highest correlation coefficients out of each gene, where n is user specified. There is also a variant, Highest with Internal Edges, explained below.
The flexibility allows us to build many different types of networks and to do comparative analysis. The above procedures are implemented in Perl, and the networks are output as text files. These files can be further parsed for use in other programs. The star network functionality is publicly available online at http://vanburenlab.tamhsc.edu/starnet.html. Users are able to specify a seed node, as well as the other parameters mentioned above, and have the network generated for them. This is a powerful visualization tool for dissecting out parts of large networks, and researchers will be able to examine genes of interest and discover novel relationships for further investigation.
There are two important comments to be made. The first concerns the Weights functionality. In building the network in this case, as we build a given level, we prepare for the construction of the next level by recording the weight of each gene added to this level. That is, for each gene which we connect to on this level, we record the product of the correlation coefficients connecting the seed gene to our new gene. As we process the current level, we may add a gene to that level more than once. This amounts to finding more than one path from the seed gene to a gene on the current level. The weights of the different paths may differ; we record the highest weight. Here is an example.
Our seed gene is A, and it is correlated to B and C with correlation coefficients .9 and .9, respectively. B and C are correlated to D with correlation coefficients .9 and .8, respectively. The two possible paths from A to D are:
A to B to D, with weight .81 (.9 × .9),
A to C to D, with weight .72 (.9 × .8).
D will be recorded as having a weight of .81.
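The Weights bookkeeping can be sketched as a level-by-level search in Python. Assumptions not stated in the text: the seed has weight 1, a connection is added only when the absolute value of its path weight exceeds the cutoff (so that pairs of negative correlations are handled), and, as in the basic Levels build, edges to genes on earlier levels or on the current level are not drawn. Run on the A-B-C-D example, it records .81 for D.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (parent, child, r)


def weights_network(corr: Dict[Tuple[str, str], float], seed: str, cutoff: float,
                    max_levels: Optional[int] = None) -> Tuple[List[Edge], Dict[str, float]]:
    """Star network by the Weights rule; also returns each gene's recorded weight."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (g, h), r in corr.items():
        adj[g][h] = r
        adj[h][g] = r
    level = {seed: 0}
    weight = {seed: 1.0}  # product of coefficients along the best path found
    edges: List[Edge] = []
    frontier, k = [seed], 0
    while frontier and (max_levels is None or k < max_levels):
        k += 1
        next_frontier = []
        for g in frontier:
            for h, r in sorted(adj[g].items()):
                if h in level and level[h] != k:
                    continue  # gene already on an earlier level, or on g's level
                w = weight[g] * r
                if abs(w) <= cutoff:
                    continue  # path too weak to admit this connection
                if h not in level:
                    level[h] = k
                    weight[h] = w
                    next_frontier.append(h)
                elif abs(w) > abs(weight[h]):
                    weight[h] = w  # another, stronger path to a gene on this level
                edges.append((g, h, r))
        frontier = next_frontier
    return edges, weight
```

With a cutoff of .75 instead of .5, the weaker path through C (.72) no longer qualifies, so only the edge from B reaches D.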
The second comment concerns connections from higher to lower levels, as well as those within levels. There is a contrast between the standard functionality and the Internal Edges variants of Levels and Highest.
In the basic network building (Levels), there will be no connections from higher to lower levels: if a gene on a higher level had to connect to a gene on a lower level, then that connection has already been made, from the lower level. There is, however, the possibility that a gene may be connected to another gene on the same level. In the basic functionality this is not allowed; these edges are simply not drawn. In the Levels with Internal Edges functionality these edges are drawn. As an example, consider the following simple network:
A B .9
A C .9
B C .9
B D .99
In the standard functionality the edges are A-B, A-C, and B-D. In the Internal Edges functionality the edge from B to C (and the edge from C to B) is also drawn.
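The Levels rule can be sketched in Python as a breadth-first search that records each gene's level. This is an illustrative reconstruction, not the StarNet implementation; the small example network above serves as a check, and the sketch reproduces both edge lists.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (parent, child, r)


def levels_network(corr: Dict[Tuple[str, str], float], seed: str,
                   max_levels: Optional[int] = None,
                   internal_edges: bool = False) -> List[Edge]:
    """Star network built by the Levels rule; edges run from parent to child."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (g, h), r in corr.items():
        adj[g][h] = r
        adj[h][g] = r
    level = {seed: 0}
    edges: List[Edge] = []
    frontier, k = [seed], 0
    while frontier and (max_levels is None or k < max_levels):
        k += 1  # level being built; genes in the frontier sit on level k - 1
        next_frontier = []
        for g in frontier:
            for h, r in sorted(adj[g].items()):
                if h not in level:
                    level[h] = k  # newly reached gene goes on level k
                    next_frontier.append(h)
                    edges.append((g, h, r))
                elif level[h] == k:
                    edges.append((g, h, r))  # a second parent on the same level
                elif level[h] == k - 1 and internal_edges:
                    edges.append((g, h, r))  # edge within level k - 1
                # Edges back to level k - 2 were already drawn from that level.
        frontier = next_frontier
    return edges
```

Because an internal edge is met once from each end, the Internal Edges variant naturally records both B to C and C to B.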
Note that with the Highest functionality, we could potentially feed back to our ancestors. Here is an example. Our genes are A through F, correlated as follows:
A B .9
A C .8
A D .7
B C .95
B D .91
D E .6
D F .5
We consider A as our seed gene, and use the two highest coefficients. A is connected to B and C. B is an ancestor of D, so A is as well. D should connect to A, because that connection is one of its two highest coefficients (the other being to B). This is not allowed by the script, as A and B are D’s ancestors; the two connections from D are therefore to E and F. In a similar vein, two genes on the same level may not be connected to each other. In our example there is therefore only one connection from B: to D. Thus the edges drawn are A-B, A-C, B-D, D-E, and D-F.
In Highest with Internal Edges, the two connections from D are to A and B. Similarly, there are two connections from B: to C and D. The edges here are A-B, A-C, B-C, B-D, C-B, C-A, D-B, and D-A.
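The Highest rule, with and without internal edges, can likewise be sketched in Python. Following the upshot stated in the next paragraph (all connections to a gene come from a single level), the basic variant lets a gene choose only among genes that are new or already on the next level, and applies the top-n cut after that exclusion, as the example requires. Ranking by absolute correlation and keeping ties are assumptions carried over from the iterated networks. Run on the A-F example, the sketch reproduces both edge lists in the order given.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (parent, child, r)


def top_n_with_ties(candidates: List[Tuple[str, float]], n: int) -> List[Tuple[str, float]]:
    """The n candidates with the largest |r|, plus any tied with the n-th."""
    ranked = sorted(candidates, key=lambda c: -abs(c[1]))
    if len(ranked) <= n:
        return ranked
    cutoff = abs(ranked[n - 1][1])
    return [c for c in ranked if abs(c[1]) >= cutoff]


def highest_network(corr: Dict[Tuple[str, str], float], seed: str, n: int,
                    max_levels: Optional[int] = None,
                    internal_edges: bool = False) -> List[Edge]:
    """Star network built by the Highest rule; edges run from parent to child."""
    adj: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (g, h), r in corr.items():
        adj[g][h] = r
        adj[h][g] = r
    level = {seed: 0}
    edges: List[Edge] = []
    frontier, k = [seed], 0
    while frontier and (max_levels is None or k < max_levels):
        k += 1
        next_frontier = []
        for g in frontier:
            # Basic variant: only new genes, or genes already on level k, are eligible.
            candidates = [(h, r) for h, r in adj[g].items()
                          if internal_edges or h not in level or level[h] == k]
            for h, r in top_n_with_ties(candidates, n):
                if h not in level:
                    level[h] = k
                    next_frontier.append(h)
                edges.append((g, h, r))
        frontier = next_frontier
    return edges
```

In the Internal Edges variant nothing is excluded, so D's two strongest links (to B and A) are taken, and E and F are never reached.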
The upshot of the above is that in the basic functionalities a gene can be “connected to” multiple times only in the following way: it has several (direct) parents. In other words, all connections to it are made from only one level.
We remark that the user can tell by looking at the graph where the edges come from. Edges are recorded in the network file with the parent node in the first column and the child in the second. The visualize.cgi script reads this and draws the line from parent to child with a ball at the child end.
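Since Graphviz reads networks written in the DOT language, a network file can be turned into DOT along the lines of the Python sketch below. The function `to_dot`, its parameters, and the fill colour are hypothetical; the Graphviz attributes used (`arrowhead=dot` for the ball at the child end, `shape`, `style=filled`, and `splines=line` for straight edges) are standard DOT attributes.

```python
from typing import Iterable, Tuple


def to_dot(edges: Iterable[Tuple[str, str, float]], shape: str = "ellipse",
           filled: bool = False, curved: bool = True) -> str:
    """Render (parent, child, r) edges as a Graphviz DOT digraph."""
    node_style = ', style=filled, fillcolor="lightgrey"' if filled else ""
    lines = [
        "digraph starnet {",
        f"  splines={'true' if curved else 'line'};",  # curves or straight lines
        f"  node [shape={shape}{node_style}];",        # box or ellipse, filled or outlined
        "  edge [arrowhead=dot];",                    # ball drawn at the child end
    ]
    for parent, child, r in edges:
        # Parent comes first in the network file, child second.
        lines.append(f'  "{parent}" -> "{child}" [label="{r:.2f}"];')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The resulting text can then be rendered with, for example, `dot -Tgif network.dot -o network.gif`.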
We have designed a file format for holding these networks which is both human
and machine readable. Perl outputs the networks in this file format.
We have built a front end for StarNet, through which the user can access the functionalities discussed above. The user enters a seed gene of interest, the number of levels to which to build the network, which type of network (Levels, Highest, etc.), which distribution to use, and whether to consider both positive and negative correlations. They may also enter a GO annotation term for which to search; the default is “transcription”.
The user can then set parameters for how the network should be visualized: whether genes should be labeled with Entrez gene IDs or gene symbols; whether genes should be drawn as boxes or ellipses, filled or outlined; whether genes common to the cardiac and full graphs should be highlighted; whether edges (representing correlations) should be drawn as straight lines or curves; and which colour scheme should be used for the levels.
Once these selections are made and submitted, a Perl-CGI script, visualize.cgi, processes the request. The Entrez gene ID or symbol entered by the user is checked to be sure that it is actually a valid gene and that it appears on the array. Then the desired network is built using BuildStarNetwork.pl and visualized using Graphviz.
Each node in the graph is checked to see whether its GO annotation terms contain the user-specified GO search term. Those which do are highlighted in red. By default, genes common to the cardiac and full cohort graphs are placed in brackets.
Although we do not permit a graph to be drawn from a gene which has been deprecated or discontinued by Entrez, we do allow such genes in our graphs. They are always indicated by an asterisk after their Entrez gene ID, and are always denoted by the Entrez gene ID, even if the user has asked for the graphs to be drawn with gene symbols.
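The node labelling and highlighting rules reduce to a few conditionals. In this Python sketch the function names and gene IDs are hypothetical, and the GO match is assumed to be a case-insensitive substring test, which the description does not specify.

```python
from typing import Iterable


def node_label(entrez_id: str, symbol: str, use_symbols: bool,
               deprecated: bool, in_both_graphs: bool) -> str:
    """Label for one node under the labelling rules described above."""
    if deprecated:
        label = entrez_id + "*"  # always the Entrez ID, flagged with an asterisk
    elif use_symbols:
        label = symbol
    else:
        label = entrez_id
    if in_both_graphs:
        label = f"[{label}]"  # default marking for genes common to both cohorts
    return label


def node_colour(go_terms: Iterable[str], search_term: str) -> str:
    """Red if any GO term contains the search term (assumed case-insensitive)."""
    needle = search_term.lower()
    return "red" if any(needle in term.lower() for term in go_terms) else "black"
```

With the default search term, for instance, a gene annotated with "regulation of transcription" would be drawn in red.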
The levels in the graph are colour coded, and a scale for the levels is provided below the graphs. The edges are also colour coded, indicating the strength of the correlation. We provide one scale for positive and one scale for negative correlations, for each of the two graphs: cardiac and full. Note that the scale is not a fixed absolute scale; in particular, it does not run from 0 to 1 (or -1 to 0 for the negative correlations). The scale is determined on a per-graph basis: for a given graph, the positive scale runs from the value of the smallest correlation in the graph to the value of the largest correlation in the graph.
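A per-graph scale can be sketched as a linear interpolation between two endpoint colours. In this Python sketch the yellow-to-red endpoints and the function name are hypothetical, and negative correlations are placed on their own scale by magnitude, which is an assumption.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def edge_colour(r: float, graph_rs: Sequence[float],
                weak: RGB = (255, 255, 0), strong: RGB = (255, 0, 0)) -> str:
    """Hex colour for correlation r on its graph's own positive or negative scale."""
    # Magnitudes of the correlations in this graph that share r's sign.
    same_sign = [abs(x) for x in graph_rs if (x > 0) == (r > 0)]
    lo, hi = min(same_sign), max(same_sign)
    t = 0.5 if hi == lo else (abs(r) - lo) / (hi - lo)
    rgb = [round(a + t * (b - a)) for a, b in zip(weak, strong)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)
```

Because the endpoints come from the graph itself, the weakest edge in every graph gets the same colour, which is why each graph needs its own printed scale.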
We remark as well that while edges are not directional, we draw with a certain
We also provide the user with a graph of known (experimentally verified) gene interactions involving the genes in the graph which we have just built. The nodes and edges are coded so that the type of interaction can be recognized visually, and it is easy to determine which of the genes in the interaction network appear in our correlation graphs. For details on the labeling, see the User’s Manual. The interactions come from Entrez’s GeneRIF (Gene Reference Into Function) file, obtainable from its FTP site. The file contains information on interactions derived from the BioGRID and BIND databases, as well as PubMed references and a user-curated description of the interaction. We determine whether the interaction is between proteins, DNA, or RNA using the RefSeq accession numbers provided in the file and RefSeq’s notation for these accessions.
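The classification by accession can be sketched from RefSeq's prefix convention (NM_, NR_, XM_, XR_ for RNA; NP_, XP_, YP_, AP_ for protein; NC_, NG_, NT_, NW_ for genomic DNA). The Python below is a hypothetical helper, not the StarNet code, and covers only these common prefixes; the accession numbers in the check are illustrative.

```python
# Molecule type implied by common RefSeq accession prefixes.
REFSEQ_PREFIXES = {
    "NC": "DNA", "NG": "DNA", "NT": "DNA", "NW": "DNA",
    "NM": "RNA", "NR": "RNA", "XM": "RNA", "XR": "RNA",
    "NP": "protein", "XP": "protein", "YP": "protein", "AP": "protein",
}


def molecule_type(accession: str) -> str:
    """Return "DNA", "RNA", "protein", or "unknown" for a RefSeq accession."""
    prefix = accession.strip().split("_", 1)[0].upper()
    return REFSEQ_PREFIXES.get(prefix, "unknown")


def interaction_kind(accession1: str, accession2: str) -> str:
    """Describe an interaction by the molecule types of its two partners."""
    return f"{molecule_type(accession1)}-{molecule_type(accession2)}"
```

An interaction listed between a protein accession and a genomic accession, for example, would be classed as "protein-DNA".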
Several lists are generated for the user. We explain each list below; for each of the lists described, we provide one for the cardiac and one for the full cohort.
First, we present a list of all the genes appearing in the graph. Both the Entrez gene ID and the official gene symbol appear, hyperlinked to Entrez. Synonyms and the description of the gene from Entrez are also included. Finally, we flag those genes that appear in both networks.
Next we have a list of edges in the graph. Both genes appear, with symbol and Entrez gene ID (the latter linked to Entrez). They are followed by the correlation coefficient connecting them, as well as the 95% and 99% confidence intervals for the correlation.
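One standard way to obtain such intervals, assumed here since the method is not stated, is the Fisher z-transformation: with z = atanh(r) and standard error 1/sqrt(n - 3), the interval is tanh(z ± c/sqrt(n - 3)), where c is the normal quantile (about 1.96 for 95% and 2.58 for 99%) and n is the number of arrays used. A minimal Python sketch:

```python
import math
from statistics import NormalDist
from typing import Tuple


def correlation_ci(r: float, n: int, level: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence interval for a Pearson r computed from n arrays."""
    if n <= 3:
        raise ValueError("the Fisher z interval needs more than 3 arrays")
    if abs(r) >= 1.0:
        return r, r  # atanh is infinite at |r| = 1; the interval collapses
    z = math.atanh(r)
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For r = 0.5 from 28 arrays, the 95% interval is roughly (0.16, 0.74); the interval narrows as the number of arrays grows, which is one reason the array count is kept in the correlations file.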
There is a list of known interactions between genes in our graph. Both genes are listed, with hyperlinked Entrez gene ID, gene description, description of the interaction type, PubMed ID, GeneRIF, and the database in which the interaction was documented.
Next is a list of genes in the graph which are GO annotated with the user-specified search term. Symbol and Entrez gene ID appear (the latter linked to Entrez), as well as a list of the GO terms annotating the gene, hyperlinked to GO.
The last list is of those GO terms which are enriched in our graph, with the set of genes appearing on the array as background. We provide a description of the GO term, placing an asterisk beside the description if more than one gene in the graph is annotated with that term. The GO term is hyperlinked to its entry in GO. We provide a p-value, indicating whether the enrichment is significant, and finally a list of genes annotated with the term in question. Again, the Entrez gene ID is hyperlinked to Entrez. We remark that the p-values are computed using a one-sided hypergeometric test with mid-p-values. Our analysis is based on reading in [22] and [23].
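For a GO term, let N be the number of genes on the array, K the number of those annotated with the term, n the number of genes in the graph, and k the number of graph genes annotated with the term. Under the hypergeometric null, the one-sided mid-p-value is P(X > k) + P(X = k)/2. A minimal Python sketch of that formula (our illustration, not the StarNet code):

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when n of N genes are drawn and K of the N carry the GO term."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def enrichment_mid_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided mid-p-value for enrichment: P(X > k) + 0.5 * P(X = k)."""
    upper_tail = sum(hypergeom_pmf(i, N, K, n) for i in range(k + 1, min(K, n) + 1))
    return upper_tail + 0.5 * hypergeom_pmf(k, N, K, n)
```

Counting only half of P(X = k) makes the mid-p-value less conservative than the ordinary P(X ≥ k), which matters when the counts are small.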
[1] Edgar, R., Domrachev, M., and Lash, A. E. (2002) Nucleic Acids Res 30(1), 207–10.
[2] Barrett, T., Troup, D. B., Wilhite, S. E., Ledoux, P., Rudnev, D., Evangelista, C., Kim, I. F., Soboleva, A., Tomashevsky, M., and Edgar, R. (2007) Nucleic Acids Res 35(Database issue), D760–5.
[3] Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C. A., Causton, H. C., Gaasterland, T., Glenisson, P., Holstege, F. C., Kim, I. F., Markowitz, V., Matese, J. C., Parkinson, H., Robinson, A., Sarkans, U., Schulze-Kremer, S., Stewart, J., Taylor, R., Vilo, J., and Vingron, M. (2001) Nat Genet 29(4), 365–71.
[4] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., and Speed, T. P. (2003) Biostatistics 4(2), 249–64.
[5] Parkinson, H., Kapushesky, M., Shojatalab, M., Abeygunawardena, N., Coulson, R., Farne, A., Holloway, E., Kolesnykov, N., Lilja, P., Lukk, M., Mani, R., Rayner, T., Sharma, A., William, E., Sarkans, U., and Brazma, A. (2007) Nucleic Acids Res 35(Database issue), D747–50.
[6] Spellman, P. T., Miller, M., Stewart, J., Troup, C., Sarkans, U., Chervitz, S., Bernhart, D., Sherlock, G., Ball, C., Lepage, M., Swiatek, M., Marks, W. L., Goncalves, J., Markel, S., Iordan, D., Shojatalab, M., Pizarro, A., White, J., Hubley, R., Deutsch, E., Senger, M., Aronow, B. J., Robinson, A., Bassett, D., Stoeckert, C. J., Jr., and Brazma, A. (2002) Genome Biol 3(9), RESEARCH0046.
[7] Gentleman, R. C., Carey, V. J., Bates, D. M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B., Gautier, L., Ge, Y., Gentry, J., Hornik, K., Hothorn, T., Huber, W., Iacus, S., Irizarry, R., Li, F. L. C., Maechler, M., Rossini, A. J., Sawitzki, G., Smith, C., Smyth, G., Tierney, L., Yang, J. Y. H., and Zhang, J. (2004) Genome Biol 5, R80.
[8] Irizarry, R. A., Gautier, L., and Cope, L. M. The Analysis of Gene Expression Data: Methods and Software, chapter 4, Springer-Verlag (2003).
[9] Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., Davis, A. P., Dolinski, K., Dwight, S. S., Eppig, J. T., Harris, M. A., Hill, D. P., Issel-Tarver, L., Kasarskis, A., Lewis, S., Matese, J. C., Richardson, J. E., Ringwald, M., Rubin, G. M., and Sherlock, G. (2000) Nat Genet 25(1), 25–9.
[10] Maglott, D., Ostell, J., Pruitt, K. D., and Tatusova, T. (2007) Nucleic Acids Res 35(Database
[11] Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2007) Nucleic Acids Res 35(Database issue),
http://www.affymetrix.com/products/arrays/specific/mouse430.affx Technical report.
www.affymetrix.com/support/technical/whitepapers/netaffxannot
[14] Dai, M., Wang, P., Boyd, A. D., Kostov, G., Athey, B., Jones, E. G., Bunney, W. E., Myers, R. M., Speed, T. P., Akil, H., Watson, S. J., and Meng, F. (2005) Nucleic Acids Res 33(20), e175.
[15] Stalteri, M. A. and Harrison, A. P. (2007) BMC Bioinformatics 8, 13.
[16] Perez-Iratxeta, C. and Andrade, M. A. (2005) BMC Bioinformatics 6, 183.
[17] Harr, B. and Schlotterer, C. (2006) Nucleic Acids Res 34(2), e8.
[18] Vardhanabhuti, S., Blakemore, S. J., Clark, S. M., Ghosh, S., Stephens, R. J., and Rajagopalan,
[19] Bolstad, B. M., Irizarry, R. A., Astrand, M., and Speed, T. P. (2003) Bioinformatics 19(2),
[20] Irizarry, R. A., Wu, Z., and Jaffee, H. A. (2006) Bioinformatics 22(7), 789–94.
[21] http://gepas.bioinfo.cipf.es/cgibin/tutoX?c=expresso/expresso.config.
[22] Rivals, I., Personnaz, L., Taing, L., and Potier, M. C. (2007) Bioinformatics 23(4), 401–7.
[23] Gentleman, R., Carey, V., Huber, W., Irizarry, R., and Dudoit, S. Bioinformatics and Computational Biology Solutions Using R and Bioconductor, chapter 22, Springer-Verlag (2005).