
Discovering Latent Structure in Clinical Databases
Dept. of Biostatistics and Medical Informatics
{peissig.peggy,caldwell.michael}@marshfieldclinic.org
Statistical relational learning allows algorithms to simultaneously reason about complex structure and uncertainty within a given domain. One common challenge when analyzing these domains is the presence of latent structure within the data.

We present a novel algorithm that automatically groups together different objects in a domain in order to uncover latent structure, including a hierarchy or even a heterarchy. We empirically evaluate our algorithm on three large real-world tasks where the goal is to predict whether a patient will have an adverse reaction to a medication. We found that the proposed approach produced a more accurate model than the baseline approach. Furthermore, we found latent structure that a medical collaborator deemed to be relevant and interesting.

Statistical relational learning (SRL) [1] focuses on developing learning and reasoning formalisms that combine the benefits of relational representations (e.g., first-order logic) with those of probabilistic graphical models. SRL is especially applicable for domains where (1) it is important to explicitly model uncertainty (e.g., the probabilities of predictions the model makes) and (2) the data resides in a relational database, where information from each table is needed to learn an accurate model. One particularly challenging aspect of analyzing real-world relational data is the presence of latent structure. Accurately detecting and modeling this type of structure requires combining ideas from latent variable discovery in graphical models (e.g., [2, 3]) and predicate invention in inductive logic programming (e.g., [4, 5]). Despite the fact that many domains have latent structure, very few learning algorithms [6, 7, 8] can effectively cope with its presence in rich relational domains.

To motivate the need to discover latent structure, consider the task of analyzing electronic medical records (EMRs). An EMR is a relational database that stores a patient's clinical history (e.g., disease diagnoses, prescriptions, lab test results, etc.). Recently, there has been much interest in applying machine learning to medical data [9]. When treating a patient, a doctor must decide among many different medications that could be prescribed. These medications share varying degrees of similarity in terms of (i) how they interact with the human body to counteract disease, (ii) how they interact with other drugs, and (iii) how they interact with genetic and other variations from one human body to another. In terms of medication, a patient's clinical history only contains which drugs were prescribed for the patient (e.g., name, dosage, duration). Consequently, it can be difficult to detect interesting and meaningful patterns present in the data. For example, there may be a substantial number of people who take a class of medicines, such as statins for cholesterol, and also have a certain outcome (e.g., disease diagnosis, adverse drug reaction, etc.). The data only records which specific medicine each patient has been prescribed, and the number of people who take each individual medicine and have the outcome may be too small to meet an interestingness threshold (e.g., a support threshold in association rule mining). What is missing is the ability to automatically detect related medicines and group them together. Requiring a domain expert to hand-craft all relevant features or relations necessary for a problem is a difficult and often infeasible task. For example, should drugs be grouped by which disease they treat, by which mechanism they use, by potential side-effects, or by interactions with other drugs? Ideally, the learning algorithm should automatically discover and incorporate relevant features and relations.

We propose a novel approach for automatically discovering latent structure in relational domains.

Our approach represents latent structure by grouping together, possibly hierarchically, sets of objects in a domain. The proposed approach dynamically introduces latent structure in a data-driven fashion. It automatically groups together objects and/or already-existing groups by evaluating whether the proposed grouping results in a more accurate learned model. Furthermore, each object can appear in multiple different groupings, as an object may appear in multiple contexts (e.g., a drug could be in different groupings related to its mechanism, indications, contraindications, etc.). A central challenge in the domains we consider is that they contain a large number of objects (e.g., diseases, drugs). Consequently, searching for latent relationships among all objects at once is prohibitively expensive. We introduce several search strategies that help automatically identify a small but promising subset of objects that could belong to the same latent group.

We motivate and evaluate our approach on the specific task of predicting adverse drug reactions (ADRs) from EMR data. This application is timely for a number of reasons. First, EMRs are now becoming widespread, making large amounts of this data available for research. Second, adverse drug reactions are a major risk to health, quality-of-life and the economy. ADRs are the fourth-leading cause of death in the United States. The pain reliever Vioxx™ alone was earning US$2.5 billion per year before it was found to double the risk of heart attack and was pulled from the market, while the related drug Celebrex™ also raised this risk and remains on the market. Third, accurate predictive models for ADRs are actionable: if found to be accurate in a prospective trial, such a model could easily be incorporated into EMR-based medical practices to avoid giving a drug to those at highest risk of an ADR. Using three real-world ADR tasks, we demonstrate that the proposed approach produces a more accurate model than the baseline approach. Furthermore, our algorithm uncovers latent structure that a doctor, who has expertise in our tasks of interest, deems to be interesting and relevant.

The proposed approach builds on the SRL algorithm VISTA [10], which combines automated feature construction and model learning into a single, dynamic process. VISTA uses first-order definite clauses, which can capture relational information, to define (binary) features. These features then become nodes in a Bayesian network. VISTA only selects those features that improve the underlying statistical model. It selects features by performing the following iterative procedure until some stop criterion has been met. The feature induction module proposes a set of candidate features to include in the model. VISTA evaluates each feature, f, by learning a new model (i.e., the structure of the Bayesian network) that incorporates f. To evaluate each candidate feature f, VISTA estimates the generalization ability of the model with and without the new feature by calculating the area under the precision-recall curve (in principle, any metric is possible) on a tuning set. VISTA selects the feature that results in the largest improvement in the model's score and incorporates it into the model. If no feature improves the score, the process terminates. VISTA offers several advantages for analyzing clinical data. First, the first-order rules can incorporate information from different relations within a single rule. Second, by incorporating each rule into a Bayesian network, VISTA can capture the inherent uncertainty in the data. Third, the learned rules are comprehensible to domain experts.
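As a rough sketch, the greedy selection loop just described can be written as follows. The function names (score_model, learn_model) are hypothetical hooks, not VISTA's actual API; in the real system learn_model trains a Bayesian network and score_model computes AUC-PR on the tuning set.

```python
# Sketch of VISTA's iterative feature-selection loop (hypothetical hooks).
def vista(candidate_features, score_model, learn_model, initial_features=()):
    """Greedily add the candidate feature that most improves the model score;
    stop when no remaining feature yields an improvement."""
    selected = list(initial_features)
    best_score = score_model(learn_model(selected))
    while True:
        gains = []
        for f in candidate_features:
            if f in selected:
                continue
            s = score_model(learn_model(selected + [f]))  # re-learn with f
            gains.append((s - best_score, f))
        if not gains:
            break
        gain, best_f = max(gains)
        if gain <= 0:            # no feature improves the score: terminate
            break
        selected.append(best_f)
        best_score += gain
    return selected
```

The loop is deliberately model-agnostic: any learner and any tuning-set metric can be plugged into the two hooks.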

Bayesian networks are probabilistic graphical models that encode a joint probability distribution over a set of random variables. Given a set of random variables X = {X1, . . . , Xn}, a Bayesian network B = ⟨G, Θ⟩ is defined as follows. G is a directed, acyclic graph that contains a node for each variable Xi ∈ X. For each variable (node) in the graph, the Bayesian network has a conditional probability distribution (CPD), θXi|Parents(Xi), giving the probability distribution over the values that variable can take for each possible setting of its parents, and Θ = {θX1, . . . , θXn}. A Bayesian network B encodes the following probability distribution:

P(X1, . . . , Xn) = ∏i=1..n P(Xi | Parents(Xi))
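Concretely, B factorizes the joint as P(X1, . . . , Xn) = ∏i P(Xi | Parents(Xi)). A minimal sketch of evaluating that product, with CPDs stored as plain dictionaries (a simplification, not any particular library's representation):

```python
# Minimal sketch: evaluate the factored joint of a Bayesian network.
# parents: var -> tuple of parent variables
# cpds:    var -> {(parent_values, value): probability}
def joint_probability(parents, cpds, assignment):
    """Return P(assignment) = prod_i P(x_i | parents(x_i))."""
    p = 1.0
    for var, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)   # parent setting
        p *= cpds[var][(pa_vals, assignment[var])]   # CPD lookup
    return p
```

For a two-node network A → B, the joint P(A=a, B=b) is simply P(A=a) · P(B=b | A=a).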
VISTA needs to learn the structure of a Bayesian network. That is, given a data set, it must learn both the network structure G (i.e., the arcs between different variables) and the CPDs, θXi|Parents(Xi), for each node in the network. VISTA uses tree-augmented naïve Bayes (TAN) [11] as its model. In a TAN model, there is a directed arc from the class variable to each non-class attribute (i.e., variable) in the domain. Furthermore, each non-class attribute may have at most one other parent, which allows the model to capture a limited set of dependencies between attributes. The algorithm for learning a TAN model has two nice theoretical properties [11]. First, it finds the TAN model that maximizes the log likelihood of the network structure given the data. Second, it finds this model in polynomial time.

VISTA uses formulas in first-order logic to define features. Technically, it uses the non-recursive Datalog subset of first-order logic, which, with a closed-world assumption, is equivalent to relational algebra. The alphabet for Datalog consists of three types of symbols: constants, variables, and predicates. Constants (e.g., the drug name Propranolol), which start with an upper case letter, denote specific objects in the domain. Variable symbols (e.g., disease), denoted by lower case letters, range over objects in the domain. Predicate symbols P/n, where n refers to the arity of the predicate and n ≥ 0, represent relations among objects. An example of a predicate is Diagnosis/3. A term is a constant or variable. If P/n is a predicate with arity n and t1, . . . , tn are terms, then P(t1, . . . , tn) is an atomic formula. An example of an atomic formula is Diagnosis(Patient123, 12-12-2010, Tuberculosis), where all three arguments are constants. This says that patient Patient123 had a diagnosis of Tuberculosis on December 12, 2010.

A literal is an atomic formula or its negation. A clause is a disjunction over a finite set of literals. A definite clause is a clause that contains exactly one positive literal; it can be written as a conjunction of atomic formulas that imply another atomic formula (the positive literal), as follows:
Drug(pid, date1, Ketoconazole) ∧ WithinMonth(date1, date2) ⇒ ADR(pid, date2)
All variables in a definite clause are assumed to be universally-quantified.

VISTA uses definite clauses to define features for the statistical model. Each definite clause becomes a binary feature in the underlying statistical model. The feature receives a value of one for a specific patient if data about that patient can be used to satisfy (i.e., prove) the clause, and it receives a value of zero otherwise. Feature definitions are constructed in the standard, top-down (i.e., general-to-specific) manner. We briefly describe the approach here (see [12] for more details). Each induced rule begins by containing just the target attribute (in the above example this is ADR(pid, date2)) on the right-hand side of the implication. This means that the feature matches all examples. The induction algorithm then follows an iterative procedure. It generates a set of candidate refinements by conjoining predicates to the left-hand side of the rule. This has the effect of making the feature more specific (i.e., it matches fewer examples). The search can proceed in a breadth-first, best-first, or greedy (no backtracking) manner, but we employ a breadth-first search in this paper.
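The satisfaction test that turns a clause body into a binary feature can be sketched as a small backtracking matcher. This is a simplification under stated assumptions: facts are ground tuples grouped by predicate, arguments whose names start with a lower case letter are variables, and numeric conditions such as w < 120 are not handled.

```python
# Sketch: does some variable binding prove every literal in a clause body?
# facts: predicate name -> list of ground tuples
# body:  list of (predicate, argument_tuple); lowercase args are variables
def satisfies(facts, body, binding=None):
    binding = binding or {}
    if not body:
        return True                      # every literal proved
    pred, args = body[0]
    for fact in facts.get(pred, []):
        b = dict(binding)
        ok = True
        for a, v in zip(args, fact):
            if a[0].islower():           # variable: bind, or check binding
                if b.setdefault(a, v) != v:
                    ok = False
                    break
            elif a != v:                 # constant: must match exactly
                ok = False
                break
        if ok and satisfies(facts, body[1:], b):
            return True
    return False
```

The feature value for a patient is then 1 if satisfies(...) holds over that patient's facts and 0 otherwise.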

At a high level, the key innovation of our proposed approach, LUCID (Latent Uncertain Concept Invention on-Demand), occurs when constructing feature definitions. Here, the algorithm has the ability to invent (hierarchical) clusters that pertain to a subset of the constants in the domain. Intuitively, constants that appear in the same grouping share some latent relationship. Discovering and exploiting the latent structure in the feature definitions provides several benefits. First, it allows for more compact feature definitions. Second, by aggregating across groups of objects, it helps identify important features that may not otherwise be deemed relevant by the learning algorithm.

To illustrate the intuition behind our approach, we will use a running example about ADRs to the medication Warfarin™, which is a blood thinner commonly prescribed to patients at risk of having a stroke. However, Warfarin is known to increase the risk of internal bleeding for some patients.

Consider the following feature definition:
Drug(pid, date1, Terconazole) ∧ Weight(pid, date1, w) ∧ w < 120 ⇒ ADR(pid)
This rule applies only to those patients who satisfy all the conditions on the left-hand side of the rule.

Conditioning on whether a patient has been prescribed Terconazole limits the applicability of this rule. Terconazole is an enzyme inducer, which is a type of medication known to elevate a patient's sensitivity to Warfarin. However, many other drugs in the enzyme inducer class (e.g., Rifampicin and Ketoconazole) are frequently prescribed instead of Terconazole, which makes this feature overly specific. A potentially stronger feature would replace Terconazole with an invented concept such as enzyme inducer or Warfarin elevator.

Yet, these concepts are not explicitly encoded in clinical data. By grouping together related objects, LUCID captures latent structure and is able to learn more general features. For example, we could generalize the previous rule as follows:
Cluster1(did) ∧ Drug(pid, date1, did) ∧ Weight(pid, date1, w) ∧ w < 120 ⇒ ADR(pid)
The definition for Cluster1 represents latent structure among a group of medicines.

The goal of our approach is to capture hierarchical latent structure about specific constants (i.e., objects) in the domains. First, we want to capture that specific constants are interchangeable in some cases. For example, Terconazole, Rifampicin and Ketoconazole are all enzyme inducers, and a doctor could reasonably prescribe any of them. We can accomplish this by introducing a new concept, which we generically call Cluster1, as follows:
Cluster1(Terconazole)    Cluster1(Rifampicin)    Cluster1(Ketoconazole)
These statements simply assign these drugs to Cluster1. There is no limit on the number of objects that can be assigned to each invented cluster.

Secondly, we want to be able to make use of previously discovered concepts to represent more high-level, hierarchical structure. We can do this with statements that assign further specific drugs to a new concept, Cluster2, together with a key additional statement under which all the constants that have been assigned to Cluster1 are assigned to Cluster2 as well. Once a proposed grouping has been used in a feature that has been included in the model, it is available for future reuse during the learning procedure. Reusing previously discovered concepts allows the algorithm to automatically explore tradeoffs between fine-grained groupings (e.g., enzyme inducers) and more high-level groupings (e.g., Warfarin elevators) that may be present in the data.
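Hierarchical reuse can be sketched in a few lines. The Cluster1 members come from the running example; the extra drug names in Cluster2 (DrugA, DrugB) are purely illustrative placeholders, not drugs named in the paper.

```python
# Sketch: cluster definitions may list constants directly or reuse earlier
# clusters; membership is resolved transitively.
def members(clusters, name, seen=None):
    """Return the set of constants transitively assigned to `name`."""
    seen = seen or set()
    out = set()
    for item in clusters[name]:
        if item in clusters and item not in seen:   # reuse of another cluster
            out |= members(clusters, item, seen | {name})
        elif item not in clusters:                  # plain constant
            out.add(item)
    return out

clusters = {
    "Cluster1": ["Terconazole", "Rifampicin", "Ketoconazole"],
    "Cluster2": ["DrugA", "DrugB", "Cluster1"],     # hierarchical reuse
}
```

Resolving Cluster2 yields its own drugs plus everything assigned to Cluster1, mirroring how the third statement folds the fine-grained concept into the higher-level one.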

Furthermore, it allows the algorithm to build progressively more complex concepts over time.

The key step in the algorithm is discovering the latent structure. Given a feature definition such as Rule (1), latent structure is learned in the following way. First, LUCID rewrites the feature definition by replacing the reference to the specific constant with a variable and conjoining an invented latent structure predicate to the end of the rule. For example, Rule (1) would be transformed into Rule (2), where Rule (2) has the variable did instead of the constant Terconazole and it contains the invented predicate Cluster1(did).

Second, LUCID learns a definition for the invented latent predicate (e.g., Cluster1 in Rule (2)). It begins by assigning the replaced constant to this cluster, which in the running example corresponds to the statement Cluster1(Terconazole). Next, it tries to extend the definition of the cluster by identifying a set of candidate constants that could be added to the cluster. It adds each constant to the cluster in turn. The benefit of the modified cluster is measured by seeing whether the model, which includes the feature that makes use of the extended cluster definition, improves. LUCID greedily selects the single constant that results in the largest improvement in the model's score. This procedure iterates until no addition improves the model's performance or the set of candidate constants is empty. The end result is a cluster definition as illustrated by either Cluster (3) or Cluster (4).
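The greedy extension loop can be sketched as follows. Here score_with is a hypothetical hook that re-learns and scores the model with the extended cluster definition substituted into the feature; it stands in for the full train-and-evaluate cycle.

```python
# Sketch of LUCID's greedy cluster extension: starting from the replaced
# constant, repeatedly add the single candidate that most improves the
# model score, until no candidate helps or the pool is empty.
def extend_cluster(seed, candidates, score_with, min_gain=0.0):
    cluster = {seed}
    best = score_with(cluster)
    pool = set(candidates) - cluster
    while pool:
        scored = [(score_with(cluster | {c}), c) for c in pool]
        new_score, choice = max(scored)
        if new_score - best <= min_gain:   # no addition improves the model
            break
        cluster.add(choice)
        pool.remove(choice)
        best = new_score
    return cluster
```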

The central challenge in trying to discover latent structure is the large number of concepts that could be invented. For example, when predicting adverse reactions, the data contains information about thousands of drugs and diseases. Consequently, performing a complete search, where the utility of adding each constant to the cluster is evaluated separately, is prohibitively expensive. We propose two different techniques to identify promising candidates to include in a grouping.

Restrict constants to those from "near miss" examples. To illustrate this idea, consider the following two rules:
Weight(pid, date1, w) ∧ w < 120 ⇒ ADR(pid)
Drug(pid, date1, Terconazole) ∧ Weight(pid, date1, w) ∧ w < 120 ⇒ ADR(pid)
The second rule, by adding the condition Drug(pid, date1, Terconazole), applies to fewer patients. Some patients may match Rule (5) but not the more specific Rule (6) because they took a similar, but not identical, medication. Looking at the medications prescribed to these patients (i.e., those that match Rule (5) but not Rule (6)) can potentially inform the search as to which medications can be prescribed in place of Terconazole. Therefore, this strategy restricts the search to only considering grouping together constants that appear in examples that are covered by a rule's immediate predecessor (i.e., Rule (5)) but not the rule itself (i.e., Rule (6)).
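The restriction amounts to a set difference over rule coverage. A minimal sketch, with an illustrative patient representation (the field name "drugs" and the coverage predicates are assumptions, not the system's actual data model):

```python
# Sketch of the "near miss" restriction: candidate constants are the drugs
# of patients covered by the predecessor rule but not by the refined rule.
def near_miss_candidates(patients, covers_general, covers_specific):
    """Collect drug constants from patients matching only the general rule."""
    candidates = set()
    for p in patients:
        if covers_general(p) and not covers_specific(p):
            candidates.update(p["drugs"])   # near-miss patients' medications
    return candidates
```

In the running example, covers_general would test Rule (5) and covers_specific Rule (6), so the returned set contains medications taken in place of Terconazole.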

Restrict constants to those correlated with initial constant. This approach performs a pre-computation to identify constants that are mutually replaceable. Intuitively, the idea is to discover when one constant can be used in lieu of another (e.g., one drug is prescribed in place of another drug). This can be done by employing a variant of "guilt by association," which says that objects are similar if they appear in similar contexts. Given a ground atom, Rel(A1, . . . , C1, . . . , An), constant C1 shares a context with another constant C2 (with C1 ≠ C2) if replacing C1 with C2 results in a ground atom that appears in the data. LUCID performs a preprocessing step over the training data and computes the Pearson correlation among all pairs of constants Ci and Cj:

r(Ci, Cj) = Σk (Ni,k − N̄i)(Nj,k − N̄j) / √( Σk (Ni,k − N̄i)² · Σk (Nj,k − N̄j)² )

where k ranges over all constants of the same type (k ≠ i and k ≠ j), Ni,k is the context size (i.e., the number of times that Ci shares a context with Ck), and N̄i is the average context size for Ci.
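Under these definitions, the pre-computation reduces to Pearson correlation over the context-count profiles of two constants. A sketch (N[c][k] holds the counts of how often c shares a context with k; the guard against degenerate, constant profiles is our addition):

```python
import math

# Sketch: correlation of two constants' context-count profiles.
# N: constant -> {other constant: number of shared contexts}
def context_correlation(N, ci, cj):
    ks = [k for k in N if k not in (ci, cj)]       # k != i and k != j
    if not ks:
        return 0.0
    mi = sum(N[ci].get(k, 0) for k in ks) / len(ks)   # average context size
    mj = sum(N[cj].get(k, 0) for k in ks) / len(ks)
    num = sum((N[ci].get(k, 0) - mi) * (N[cj].get(k, 0) - mj) for k in ks)
    di = math.sqrt(sum((N[ci].get(k, 0) - mi) ** 2 for k in ks))
    dj = math.sqrt(sum((N[cj].get(k, 0) - mj) ** 2 for k in ks))
    return num / (di * dj) if di and dj else 0.0
```

Pairs with high correlation are then treated as candidates for the same latent group.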

In principle, LUCID can use any evaluation metric for evaluating the quality of the model. We use the area under the precision-recall curve (AUC-PR). The tasks considered in this paper contain many more negative examples than positive examples, and this measure ignores the potentially large number of true negative examples. LUCID evaluates a model by looking at its AUC-PR on both its training set (used to learn the model structure and parameters) and an independent tuning set.

With relatively few positive examples, considering both the training and tuning set scores helps make the algorithm more robust to overfitting. Furthermore, it is likely that many features will improve the model. Therefore, candidates must improve both the AUC-PR of the training set and the AUC-PR of the tuning set by a certain percentage-based threshold to be considered. Again, using the threshold helps control overfitting by preventing relatively weak features (i.e., those that only improve the model score slightly) from being included.
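The acceptance test is a simple double-threshold check; the function below is a sketch of that rule (the function name is ours, and the default threshold matches the 2% setting used in the experiments):

```python
# Sketch: a candidate feature is accepted only if it improves AUC-PR on BOTH
# the training set and the tuning set by a relative percentage threshold.
def accept_feature(train_old, train_new, tune_old, tune_new, threshold=0.02):
    return (train_new >= train_old * (1 + threshold)
            and tune_new >= tune_old * (1 + threshold))
```

Requiring the improvement on both sets filters out features that merely fit noise in one split.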

LUCID essentially follows the same high-level control structure as the VISTA algorithm described in Section 2. The key difference is that it defines more candidate features, as it constructs and evaluates features that contain invented, latent predicates. However, it is prohibitively expensive to consider adding latent structure to each candidate feature. Therefore, LUCID restricts itself to adding latent concepts only to features that meet the following two conditions:
Condition 1: The rule under consideration improves the score of the model. This provides initial evidence that the rule is useful, but the algorithm may be able to improve its quality by modeling latent structure. Discarding rules that initially exhibit no improvement dramatically improves the algorithm's efficiency.

Condition 2: The most recent condition added to the rule must refer to a constant. Furthermore, the user must have identified this type of constant as a candidate for having latent structure. This helps reduce the search space, as not all types of constants will exhibit latent structure.
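The two conditions act as a cheap gate before the expensive clustering step. A sketch of the check, using an illustrative rule representation (the dictionary fields are assumptions, not LUCID's internal data structures):

```python
# Sketch of the two gating conditions LUCID applies before attempting
# latent structure discovery on a candidate rule.
def eligible_for_invention(rule, score_gain, latent_types):
    improves = score_gain > 0                 # Condition 1: rule helps model
    last = rule["body"][-1]                   # most recently added literal
    const = last.get("constant")              # Condition 2: refers to constant
    return improves and const is not None and const["type"] in latent_types
```

Only rules passing both tests are handed to the cluster-invention procedure, which keeps the overall search tractable.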

For each candidate feature that meets these two criteria, LUCID attempts to discover latent structure.

It invokes the procedure outlined in Section 3.2 and adds the feature it constructs (which contains an invented latent predicate) to the set of candidate features.

Given the extended set of candidate features, LUCID selects the feature that most improves the score of the model. After adding a candidate feature to the model, all other features must be re-evaluated.

The modified model, which changes the score for each candidate feature, means that LUCID must check each feature to determine which ones satisfy both of the aforementioned conditions and should be augmented with a latent predicate. The modified model, if it incorporated a feature with a latent predicate, also presents expanded opportunities for latent structure discovery: newly invented latent predicates can extend or reuse the previously introduced latent predicate definitions.

In this section, we evaluate our proposed approach on three real-world data sets. As the baseline, we compare LUCID to the VISTA algorithm [10]. In all tasks, the goal is to predict at prescription time whether a patient will have an ADR that may be related to taking the medication. We first describe the data sets we use and then present and discuss our experimental results.

Our data comes from a large multispecialty clinic that has been using electronic medical records since 1985 and has electronic data back to the early 1960's. We have received institutional review board approval to undertake these studies. For all tasks, we have access to information about observations (e.g., vital signs, family history, etc.), lab test results, disease diagnoses and medications. We only consider patient data up to one week before that patient's first prescription of the drug under consideration. This ensures that we are building predictive models only from data generated before a patient is prescribed that drug. Characteristics of each task can be found in Table 1. We now briefly describe each task.

Selective Cox-2 inhibitors (e.g., Vioxx™) are a class of pain relief drugs that were found to increase a patient's risk of having a myocardial infarction (MI) (i.e., a heart attack) [13]. Angiotensin-converting enzyme inhibitors (ACEIs) are a class of drugs commonly prescribed to treat high blood pressure and congestive heart failure. It is known that in some people, ACEIs may result in angioedema (a swelling beneath the skin). Warfarin is a commonly prescribed blood thinner that is known to increase the risk of internal bleeding for some individuals. On each task the goal is to distinguish between patients who take the medicine and have an adverse event (i.e., positive examples) and those who do not (i.e., the negative examples).

Table 2: Average AUC-PR for each approach. The best result for each task is shown in bold.

We performed stratified, ten-fold cross-validation for each task. We sub-divided the training data and used five folds for training (i.e., learning the model structure and parameters) and four folds for tuning. We require that a candidate feature result in at least a 2% improvement to the AUC-PR in order to be considered for acceptance. We set all parameters to be identical for all approaches. The only difference between the algorithms is that LUCID can introduce latent structure. Without this ability, the algorithms would construct and evaluate identical candidate feature sets.

Table 2 reports the average AUC-PRs for each task. LUCID using the "near miss" strategy (LUCID-NM) for inventing latent structure outperforms VISTA on all three tasks. On the Selective Cox-2 and Warfarin domains, LUCID-NM results in relatively large improvements in AUC-PR of 12% and 41%, respectively, when compared to VISTA. LUCID using the "correlation" strategy (LUCID-C) performs more comparably to VISTA, except on the ACEI domain, where it has the best performance overall. One possible reason that LUCID-NM does better than LUCID-C is that it more actively uses the rule to guide concept invention. By leveraging a partial feature definition, LUCID-NM is able to detect correlations among objects that may more clearly arise in the context of the rule. In contrast, LUCID-C takes a more global view with its pre-computation based strategy.

Another important evaluation measure is whether LUCID invents interesting and relevant concepts. We presented several of the invented clusterings to a medical doctor with expertise in circulatory diseases. We focus our discussion on structures from the Selective Cox-2 domain. The expert remarked on a cluster that contained the drugs diltiazem, a calcium-channel blocker, and clopidogrel (Plavix™), an antiplatelet agent. These two cardiac drugs are frequently used in acute coronary syndrome, especially after angioplasty. In terms of diseases, the expert highlighted a cluster describing cardiac catheterization and coronary angioplasty, which are consistent with acute coronary syndrome and mean that a patient is at a high risk of having a heart attack (MI). Another cluster of interest involved cholecystectomy (which is a procedure to remove the gall bladder), as in female individuals the diagnosis of MI is often confused with gall bladder pain. Finally, the expert remarked on a cluster containing hearing loss as a finding that deserves further investigation.

SRL lies at the intersection of relational learning and graphical model learning. In terms of relational learning, our approach is closely related to Dietterich and Michalski's work [14]. Their work contains an operation known as internal disjunction, which replaces a constant with a disjunction of several constants. We go beyond this work by allowing re-use of an internal disjunction and, most importantly, by explicitly modeling and reasoning about uncertainty in the data and the invented predicates. The present paper is closely related to predicate invention in relational learning, especially in inductive logic programming (e.g., [4, 5]). The present work advances beyond these approaches by explicitly modeling the uncertainty as well as the structure within the data. These approaches only invent new predicates when nothing else works, whereas our approach is much more liberal about detecting latent structure. Our approach is closely related to latent variable discovery for graphical models. Introducing latent variables into a Bayesian network often results in a simpler structure, yet it is difficult to automatically discover these latent variables from data [2, 3]. Our work goes beyond these approaches by operating in a relational setting. Consequently, our new clusters are incorporated into the Bayes net only within the context of specific rules, or definite clauses. Such rules can capture a limited context around the cluster in which it is relevant.

Our work is not the first to combine ideas from latent variable discovery and predicate invention [6, 7, 8, 15]. Popescul and Ungar [15] use an initial pre-processing step that learns clusterings and then treats cluster membership as an invented feature during learning. In contrast, in the present approach the learning task guides the construction of clusterings and also allows reuse of clusters as part of new clusters. Kemp et al. [6] propose a more advanced algorithm based on an infinite relational model, which clusters entities in a domain. The cluster that an entity is assigned to should be predictive of the relationships it satisfies. A weakness of this approach is that each entity can belong to only one cluster. Kok and Domingos [7] propose an algorithm that learns multiple relational clusters (MRC). The MRC algorithm clusters both relations and entities, and relations and entities can belong to more than one cluster. However, MRC is a transductive approach, rather than an inductive one.

These approaches have been evaluated on domains that contain information on only between 100and 200 objects. We have evaluated our approach on problems that are between one and two ordersof magnitude larger. It is unlikely that these approaches would scale to problems of this size.

We presented LUCID, a novel algorithm for latent structure discovery. We tested it within the domain of learning from electronic medical record (EMR) data to predict which patients are most at risk of suffering a given adverse drug reaction (ADR). LUCID improved the performance of the baseline SRL algorithm, and it produced meaningful latent structure. Important directions for further research include applications to other ADRs, other tasks in learning from EMRs, and other types of relational databases, as well as integrating LUCID with other SRL algorithms. Other important directions include theoretical analysis of LUCID and of the task of latent structure discovery in general. For example, how accurately can correct latent structure be discovered as the complexity of the latent structure varies, and as the amount of training data varies?
[1] L. Getoor and B. Taskar, editors. An Introduction to Statistical Relational Learning. MIT Press, 2007.

[2] G. Elidan, N. Lotner, N. Friedman, and D. Koller. Discovering hidden variables: A structure-based
approach. In NIPS 13, pages 479–485, 2000.

[3] N. L. Zhang, T. D. Nielsen, and F. V. Jensen. Latent variable discovery in classification models. Artificial
Intelligence in Medicine, 30(3):283–299, 2004.

[4] S. Muggleton and W. Buntine. Machine invention of first-order predicates by inverting resolution. In
Proc. of the 5th ICML, pages 339–352, 1988.

[5] J. Zelle, R. Mooney, and J. Konvisser. Combining top-down and bottom-up techniques in inductive logic
programming. In Proc. of the 11th ICML, pages 343–351, 1994.

[6] C. Kemp, J. Tenenbaum, T. Griffiths, T. Yamada, and N. Ueda. Learning systems of concepts with an
infinite relational model. In Proc. of the 21st AAAI, 2006.

[7] S. Kok and P. Domingos. Statistical predicate invention. In Proc. of the 24th ICML, pages 433–440, 2007.

[8] Z. Xu, V. Tresp, K. Yu, and H-P. Kriegel. Infinite hidden relational models. In Proc. of the 22nd UAI, 2006.

[9] F. Farooq, B. Krishnapuram, R. Rosales, S. Yu, J.W. Shavlik, and R. Kucherlapati. Predictive models in personalized medicine: NIPS 2010 workshop report. SIGHIT Record, 1(1):23–25, 2011.

[10] J. Davis, I. Ong, J. Struyf, E. Burnside, D. Page, and V. Santos Costa. Change of representation for
statistical relational learning. In Proc. of the 20th IJCAI, pages 2719–2726, 2007.

[11] N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29:131–163, 1997.

[12] N. Lavrac and S. Dzeroski. Inductive Logic Programming: Techniques and Applications. Ellis Horwood, 1994.
[13] P.M. Kearney, C. Baigent, J. Godwin, H. Halls, J.R. Emberson, and C. Patrono. Do selective cyclo-oxygenase-2 inhibitors and traditional non-steroidal anti-inflammatory drugs increase the risk of atherothrombosis? Meta-analysis of randomised trials. BMJ, 332:1302–1308, 2006.

[14] T. G. Dietterich and R. S. Michalski. A comparative review of selected methods for learning from exam-
ples. In Machine Learning: An Artificial Intelligence Approach, pages 41–81. 1983.

[15] A. Popescul and L. Ungar. Cluster-based concept invention for statistical relational learning. In Proc. of
the 10th ACM SIGKDD, pages 665–670, 2004.

Source: http://agbs.kyb.tuebingen.mpg.de/wikis/bg/Davis_etal.pdf
