Drugs and Disorders:
From specialized resources to Web data
1 Centre de Recherche Informatique de Montréal, 405 Ogilvy suite 101,
In this article, we focus on the may_treat predicate linking drugs and disorders. This predicate is expressed in RDF format in the VHA National Drug File Reference Terminology (NDF-RT), a specialized medical resource. The DailyMed dataset also contains this predicate, but only in textual form: for each drug there is an indication field that links the drug's URI to a literal that is a long description (more than 100 words on average). We show that natural language processing (NLP) techniques can be used to further distil the indication field to extract may_treat predicates. We then move to Web exploration and show how we can apply similar NLP techniques to find may_treat predicates. The diversity of natural language expressions and the embedding of good information among noisy and often redundant data make the Web a challenge to exploit. Still, we show that it can be used for finding predicates with success comparable to using DailyMed.
Keywords: linguistic patterns, NDF-RT, UMLS, DailyMed, text mining, Web
1 Introduction

The number of sources for medical data on the Web is large and growing every day. One only has to look at the BioPortal site to see the number of available medical ontologies, or to type a disease name into Google to find multiple sites about this condition. Many resources exist in non-RDF form as public databases that can be downloaded (UMLS, Snomed, Mesh)1 and others exist in RDF. In the Linked Data Cloud2, information about drugs and diseases can be found in many datasets such as DbPedia3, Drugbank, DailyMed, Diseasome, Medicare and SIDER4.
1 These three resources are available at http://www.nlm.nih.gov/research/ .
2 http://linkeddata.org/
3 http://dbpedia.org/
4 http://www4.wiwiss.fu-berlin.de/ provides access to Drugbank, DailyMed, Diseasome,
In this research, we explore datasets of different levels of specialization and different levels of RDFization to reflect on possible ties with research in Corpus Linguistics. To focus this study on the distinction between datasets, we look at one single predicate, the may_treat predicate linking drugs and disorders.
In section 2, we introduce the National Drug File Reference Terminology (NDF-RT) and the Unified Medical Language System (UMLS). Throughout this study we rely on may_treat predicates from NDF-RT, and on UMLS concept identifiers and labels for the subjects and objects of the may_treat predicates.
In section 3, we look at the DailyMed dataset, in which the may_treat predicate is rather expressed in textual form. For each drug, DailyMed contains an indication field5, a literal that is a long description (more than 100 words on average). We apply natural language processing techniques to further analyze this textual information and generate one or more may_treat predicates from it.
In section 4, we contrast such a resource with the Web at large, and show how the diversity of natural language expressions and the embedding of purposeful information among noisy and often redundant data make it a challenge to exploit. Still, we argue that Web data is valuable and can help expand specialized resources such as NDF-RT.
In section 5, we conclude this exploration by comparing the two resources and point to many possible ways to expand our search, either in DailyMed or on the Web at large, taking different strategies based on the differences between the resources.
2 NDF-RT and UMLS
Not part of the Linked Data Cloud, the National Drug File Reference Terminology (NDF-RT) describes and defines medications. More specifically, it describes generic ingredients or combinations of ingredients, providing their active ingredients, mechanisms of action, physiologic effects, indications and contraindications.
NDF-RT has a distribution in OWL6. It contains more than 44,000 concepts, each one with a link to a UMLS concept unique identifier (CUI). Not part of the Semantic Web (although referred to on many sites), UMLS contains over 2 million names for some 900,000 concepts from more than 60 families of biomedical vocabularies, and 12 million relations among these concepts. Each concept in UMLS has a unique id (CUI) and is associated with a set of labels. These labels are essential to perform text analysis, as they give many possible ways of matching concepts by their lexical expression.

This greatly enriches NDF-RT by providing many labels for each of its concepts. We calculated that, on average, drugs have 6 labels and diseases have 19 labels. Tables 1a and 1b show examples of drugs and disorders with various labels. Their corresponding rows form may_treat pairs.
5 To be correct, it should be called a predicate, as DailyMed is available in RDF format, but the term "field" is used throughout this article to differentiate between the indication long literal (textual data) of DailyMed and the may_treat predicate in NDF-RT.
6 The US government provides quarterly updates of the terminology in a variety of formats (XML, OWL, and text) at http://evs.nci.nih.gov/ftp1/NDF-RT/.
Table 1a. Drugs with multiple labels found from UMLS

C0980568: Theophylline, anhydrous 200mg capsule / THEOPHYLLINE 200 MG ORAL CAPSULE, EXTENDED RELEASE / THEOPHYLLINE ANHYDROUS 200 MG ORAL CAPSULE, EXTENDED RELEASE / Theophylline, anhydrous 200mg capsule (product)

Risperidone 3mg tablet / Risperidone 3mg tablet (product) / Risperidone 3mg tablet (substance) / RISPERIDONE 3 MG ORAL TABLET, FILM COATED / RISPERIDONE 3 MG ORAL TABLET, ORALLY DISINTEGRATING

Table 1b. Disorders with multiple labels from UMLS

C0003578: Apnea / APNEA / Apneas / Apnoea / RESPIRATORY ARREST / ARREST, RESPIRATORY / Apnea / Apnoea / Has stopped breathing / Not breathing / Apneic / Apnoeic

C0040517: Gilles de la Tourette syndrome / Gilles de la Tourette's syndrome / Tourette's disorder / Tourette Disorder / Syndrome, Tourette's / Tourette's Syndrome / Tourettes Syndrome / Tourette's Disease / Tourette Disease / Tourettes Disease / Combined Multiple Motor and Vocal Tic Disorder / Combined Vocal and Multiple Motor Tic Disorder
UMLS also contains a Semantic Network, which defines 54 relationships as well as 133 semantic types organized in 11 semantic groups. Two specific semantic groups interest us: "Chemical & Drugs" (which we often refer to as "drugs" in this article) and "Disorders", as they respectively represent the subject and object domains for the may_treat predicate. Table 2 shows statistics about the NDF-RT concepts with their corresponding semantic groups defined in UMLS. The distribution certainly reflects the focus of NDF-RT on drugs and disorders.
Table 2. Distribution of some UMLS Semantic Groups in NDF-RT
Among the 31,527 concepts in NDF-RT associated with the semantic group "Chemical & Drugs", only 8,836 (28%) participate in the may_treat relation. And among the 9,115 concepts in NDF-RT associated with the semantic group "Disorders", 962 (11%) participate in the may_treat relation.

These small percentages show how such a specialised resource is very valuable, but also limited in its coverage. All drugs could be involved in a may_treat relation, but only 28% of them actually are at this time (for this version of the resource7).
3 DailyMed

The DailyMed site (http://dailymed.nlm.nih.gov), published by the National Library of Medicine, provides high-quality information about marketed drugs. A Linked Data version provides an RDF view of part of this information at http://www4.wiwiss.fu-berlin.de/dailymed/. The RDF version of DailyMed is part of the Linking Open Drug Data project. It describes about 3,600 drugs and provides many predicates.

Some predicates in the RDF view link to resources, and others to literals of variable sizes. Predicates such as "adverseReaction", "clinicalPharmacology", "precaution" or "indication" lead to literals that are actually textual data on which text analysis techniques can be used to further pursue the RDFization. The indication for each drug is rather lengthy (its size varies from 1 word to 1,338 words, with a mean of 127). Figure 1 shows some examples; in bold are linguistic patterns, as we will refer to them in section 3.3.
Fluticasone propionate ointment is a medium potency corticosteroid indicated for the relief of the inflammatory and pruritic manifestations of corticosteroid-responsive dermatoses in

RYZOLT is indicated for the management of moderate to moderately severe chronic pain in adults who require around-the-clock treatment of their pain for an extended period of time.

PhosLo is indicated for the control of hyperphosphatemia in end stage renal failure and does not promote aluminum absorption.

Astelin Nasal Spray is indicated for the treatment of the symptoms of rhinitis such as rhinorrhea, sneezing, and nasal pruritus in adults and children 5 years and older, and for the treatment of the symptoms of vasomotor rhinitis, such as nasal congestion and post nasal drip in adults and children 12 years and older.

Fig. 1. Examples from DailyMed
7 Version NDFRT_Public_2011.07.05_TDE.xml.
3.1 Coverage of DailyMed drug names in NDF-RT
Our first challenge is to establish a correspondence between DailyMed concepts and NDF-RT concepts. DailyMed does not provide a UMLS CUI, nor any other CUI contained in NDF-RT (such as a MeSH CUI) that could have been used as an intermediate to link to a UMLS CUI. Matches must therefore be established via concept labels, and the process becomes prone to uncertainty and errors.
We rely on Lucene8, an open-source document indexing and retrieval software package. All UMLS CUIs with their associated labels (as presented in section 2) are indexed in Lucene. All DailyMed drug names, given by the dailymed name property, are used in turn as queries.

Different retrieval strategies are implemented in Lucene and can be parameterized, but we simply use the default TF-IDF (Term Frequency-Inverse Document Frequency) scoring, which considers all labels as a bag of words9. Using this matching process on all drugs, we established that, of its 2,305 drugs10, DailyMed has 987 that are part of NDF-RT. Some examples of matching labels are shown in Table 3.
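The matching step can be sketched as follows. This is a minimal pure-Python stand-in for Lucene's default TF-IDF scoring, not the actual implementation: each CUI's labels are pooled into one bag-of-words document, and drug names are scored against them. The two CUIs and their labels are taken from Tables 1a and 1b; everything else is illustrative.

```python
import math
from collections import Counter

def build_index(cui_labels):
    """Index each CUI's labels as a single bag-of-words document."""
    docs = {cui: Counter(" ".join(labels).lower().split())
            for cui, labels in cui_labels.items()}
    n = len(docs)
    df = Counter()                      # document frequency per term
    for bag in docs.values():
        df.update(bag.keys())
    idf = {t: math.log(n / df[t]) for t in df}
    return docs, idf

def score(query, docs, idf):
    """Rank CUIs by a simple TF-IDF dot product with the query terms."""
    terms = query.lower().split()
    ranked = []
    for cui, bag in docs.items():
        s = sum(bag[t] * idf.get(t, 0.0) for t in terms)
        if s > 0:
            ranked.append((s, cui))
    return [cui for s, cui in sorted(ranked, reverse=True)]

# Toy index: two CUIs with a few of their UMLS labels (see Table 1b).
index, idf = build_index({
    "C0040517": ["Gilles de la Tourette syndrome", "Tourette's Disease"],
    "C0003578": ["Apnea", "Respiratory arrest", "Not breathing"],
})
print(score("tourette syndrome", index, idf))  # ['C0040517']
```

Real Lucene scoring adds length normalization and other factors, but the bag-of-words intuition is the same.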
Table 3. Matching UMLS CUIs and DailyMed drug names

UMLS label "captopril, hydrochlorothiazide drug combination" matched to DailyMed name "Captopril+hydrochlorothiazide 25mg/15mg tablet"
3.2 Finding may_treat predicates in the “indication” field
The Stanford Parser is used to process indications, performing sentence splitting, tokenization, part-of-speech tagging and parsing. Using this parsing process, we are able to process 3,231 indications (92.6% of all indications11), covering 2,090 different drugs (90% of the list, from which 3% already had no indication field).
We are currently developing approaches that will take advantage of the parse tree, as promoted in the literature, especially in BioNLP, but for the research reported here, a simpler approach is used to discover may_treat predicates in indication fields. The underlying idea is very simple and consists in finding noun phrases (NPs) corresponding to drugs and disorders. These become candidates for may_treat predicates to be validated against NDF-RT predicates. Taking all pairs of NPs is a bit naïve if precision is our goal, but here, recall will be measured and used as a comparison point for the later Web analysis.

8 Lucene is available for download at http://lucene.apache.org/java/docs/index.html.
9 Information Retrieval strategies are beyond the scope of this article, and we refer the reader to the introductory book by Manning et al. (2008).
10 The SPARQL endpoint http://www4.wiwiss.fu-berlin.de/dailymed/sparql was queried in July 2011 to obtain drug names and indications.
11 Some sentences were very long, with long enumerations of disorders and side-effects. These, among others, represent types of sentences that the parser could not digest.
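The naïve candidate generation amounts to crossing every drug NP with every disorder NP found in a sentence. A sketch follows; the NP-to-CUI lookup is mocked with a small dictionary (placeholder CUIs, not real ones), where in reality it would query the UMLS index described above.

```python
from itertools import product

# Hypothetical NP -> (CUI, semantic group) lookup, standing in for the
# Lucene/UMLS matching step (CUIs here are placeholders, not real ones).
MOCK_UMLS = {
    "fluticasone propionate": ("C_DRUG_1", "Chemical & Drugs"),
    "corticosteroid-responsive dermatoses": ("C_DISORDER_1", "Disorders"),
}

def candidate_pairs(noun_phrases):
    """Cross every drug NP with every disorder NP in a sentence to
    produce candidate may_treat pairs."""
    drugs, disorders = [], []
    for np in noun_phrases:
        hit = MOCK_UMLS.get(np.lower())
        if hit is None:            # NP matched no CUI in the two groups
            continue
        cui, group = hit
        if group == "Chemical & Drugs":
            drugs.append(cui)
        elif group == "Disorders":
            disorders.append(cui)
    return list(product(drugs, disorders))

nps = ["Fluticasone propionate", "the relief",
       "corticosteroid-responsive dermatoses"]
print(candidate_pairs(nps))  # [('C_DRUG_1', 'C_DISORDER_1')]
```

Each candidate pair is then checked against the may_treat pairs of NDF-RT to measure recall.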
The Stanford Parser gives lists of NPs for each sentence. Each NP can then be matched to concept labels in UMLS using Lucene. Perfect matches are ranked first, followed by partial matches.
As an example, shown in Table 4, the first sentence of Figure 1 is processed and its NPs are matched to UMLS CUIs. The list of CUIs found in UMLS is then restricted to the ones in the semantic groups "Chemical & Drugs" and "Disorders" (see the semantic group in the last column of Table 4).
With this process, we find 849 drugs from DailyMed (41% of the 2,090 analyzed) which participate in a may_treat relation in NDF-RT. Of these, only 457 drugs (54%) contain an NP that can be linked to a disorder listed in the may_treat pairs from NDF-RT for that drug.
Table 4. Noun phrases extracted from the sentence with their closest CUIs from UMLS

C0015456 Facial Dermatoses (Disease or Syndrome)
Table 5 shows examples of pairs for which the drug is found in the may_treat pairs, but the disorder NPs are not. Even without being medical experts, these pairs seem "reasonable" and could be suggested as candidates (see the last column of Table 5 for comparison). In our opinion, if the process is to be used for knowledge discovery, it is best to imagine a system in which automatic discovery is not a final step, but a step within a semi-automatic process leading to may_treat predicate candidates to be validated by a medical expert.
Table 5. Examples of pairs extracted from indication fields for which the drug is the subject of may_treat but the disorder is not its object

Wounds and Injuries / Hyponatremia / Dehydration
Bipolar Disorder / Epilepsies, Partial / Phobic anxiety disorder / Pain
C0700442 fibrillation / C0232197 Fibrillation / Ventricular Fibrillation
3.3 Analyzing how may_treat is expressed
We now focus on the may_treat pairs which were found in the indication fields, to see how this information is expressed in text. There are 954 sentences in which may_treat pairs are found (covering the 457 drugs mentioned earlier). In these sentences, we simply record what occurs between the two elements of a pair as a possible linguistic pattern. Such possible patterns were emphasized in bold in Figure 1.
The use of linguistic patterns for knowledge discovery has been the subject of much research in corpus linguistics and terminology. The general idea is to find sentences containing known relations to discover how these relations are expressed in natural language. Language is ambiguous and varied, but when specific relations are expressed, some more or less regular patterns can be discovered. Once these patterns are discovered, they can be used (with care, as they are often noisy) to discover new instances of relations.
All linguistic patterns recorded become candidate patterns, and the weight of each one is calculated. To do so, we take into account that each indication sentence could lead to multiple may_treat candidate pairs, and therefore to multiple pattern candidates. A weight of 1/nbCandidates is assigned to each pattern candidate for that sentence. The total weight over all sentences for all patterns is then calculated12. We show the top 20 patterns in Table 6 below.
Table 6. Patterns found between may_treat pairs in DailyMed indication fields

are indicated for the long-term management of
is indicated for the topical treatment of
is indicated for the temporary relief of
is indicated for use in the treatment of
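The 1/nbCandidates weighting scheme can be sketched in a few lines (the sentence data here is invented, each inner list standing for the candidate patterns extracted from one indication sentence):

```python
from collections import defaultdict

def weight_patterns(sentences):
    """Each sentence contributes a total weight of 1, split evenly
    among the candidate patterns it produced."""
    totals = defaultdict(float)
    for patterns in sentences:      # one list of candidate patterns per sentence
        if not patterns:
            continue
        w = 1.0 / len(patterns)     # 1/nbCandidates for this sentence
        for p in patterns:
            totals[p] += w
    # Highest total weight first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Two toy sentences: the first yielded two candidate patterns,
# the second a single one.
ranked = weight_patterns([
    ["is indicated for the treatment of", "for the treatment of"],
    ["is indicated for the treatment of"],
])
print(ranked[0])  # ('is indicated for the treatment of', 1.5)
```

Splitting each sentence's weight this way keeps sentences with many NP pairs from dominating the pattern ranking.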
3.4 Conclusions on DailyMed
The exploration of DailyMed and other RDF resources to fully exploit their textual data and transform it into RDF triples is a research topic in itself, deserving more research effort. But as the focus of this article is the exploitation of Web data, we stop our exploration here and move to the Web. We will use this exploration as a comparison point, and also as a first entry point into the noisy Web data. In our brief exploration of DailyMed, we have shown that:

1. Coverage differs from NDF-RT, with only 987 of its 2,305 drugs present in NDF-RT.
2. Indication fields can be analyzed with the Stanford Parser to do part-of-speech tagging and retrieve NPs, which can be matched to drugs and disorders.
3. Only a portion of the indications lead to known may_treat pairs from NDF-RT. This comparison establishes recall, but does not inform us about the value of new knowledge found. If the method is able to recall known information, we infer that it will also be able to find new information, which can become candidate information to be reviewed by medical experts before being added to a specialized resource.

12 The size of candidate patterns is empirically set to a maximum of 50 characters before we calculate the weights. With our naïve method of finding all NPs, we generate very long patterns and need to set a size limit.
4. Label matching is not obvious and could be a research topic in itself. At present, we rely on the default TF-IDF strategy implemented in Lucene.

5. Ways of expressing the may_treat relation are varied but still limited, and almost all patterns contain the keyword "indicated". Our purpose here is to find what seems to be the most common way of expressing the may_treat relation. We pursue, in another research project, a full use of these patterns (expressed syntactically) for the purpose of precisely extracting all information within DailyMed indications.
4 Web Data
We first discuss the presence of drug information on the Web. We then look into how to find may_treat pairs using the single word "indicated", a word common to all patterns found in DailyMed. We show some positive results, as some pairs from NDF-RT can be found in this way, providing support for the method. We then discuss the common linguistic patterns for may_treat pairs on the noisy Web.
4.1 Are drugs mentioned on the Web?
A first investigation concerns the actual presence of drugs on the Web. Before we even look at whether may_treat pairs are present, we first look at the presence of the drugs themselves. Contrary to DailyMed, we do not have a list of drugs and indications, so we must search for the drugs using their known labels.
As we have seen earlier, UMLS gives us about 6 labels per drug and 19 labels per disease. We establish "presence" by finding hit counts for the labels, using the Bing API13. Tables 7a and 7b show examples of hit counts for a few labels of three drugs and three disorders.
We randomly chose 8,000 pairs among the 47,218 may_treat pairs in NDF-RT, and we calculated statistics on the presence of the drugs and diseases on the Web. The number of different drugs in these pairs happens to be 3,560, and the number of different disorders is 779. We found that all disorders have at least one label with a presence on the Web. That is not the case for the drugs: we calculated that 72% of them (2,547/3,560) have no presence on the Web.

It is probably incorrect to say that these drugs have no Web presence at all; rather, using their different labels as found in UMLS, we are not able to access them. As future work, we can investigate looking at active ingredients or other information about the drugs to find them.
13 The Bing API allows web searches to be embedded in a Java program. Information and download can be found at: http://msdn.microsoft.com/en-us/library/dd251056.aspx
Table 7a. Examples of hit counts for drug labels

Table 7b. Examples of hit counts for disease labels
Table 8 shows the distribution of hit counts based on the frequency of different labels for drugs and diseases.

Table 8. Result of hit counts for drugs and diseases
4.2 Building a data set for experimentation
As mentioned earlier, 8,000 may_treat pairs were randomly selected to establish the presence of drugs and disorders on the Web. The most problematic category is the drugs, as they tend to be mentioned in very specific ways and a large proportion is not found on the Web.
To generate pairs for the may_treat Web exploration, a minimum hit count for either the drug or the disease label was set at 100,000, and from those, pairs with a joint hit count of more than 10,000 were selected.
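This selection reduces to a two-threshold filter. In the sketch below, we read the 100,000 threshold as applying to both individual hit counts, as Table 10's caption ("pairs with individual hit counts above 100000") suggests; the hit-count figures in the example are invented.

```python
def select_pairs(pairs, min_individual=100_000, min_joint=10_000):
    """pairs: iterable of (drug_hits, disorder_hits, joint_hits) tuples.
    Keep pairs where both individual hit counts clear the individual
    threshold and the joint query count exceeds the joint threshold."""
    return [p for p in pairs
            if min(p[0], p[1]) >= min_individual and p[2] > min_joint]

sample = [(500_000, 2_000_000, 40_000),   # kept
          (120_000, 90_000, 50_000),      # disorder label below 100K
          (300_000, 300_000, 9_000)]      # joint count too low
print(select_pairs(sample))  # [(500000, 2000000, 40000)]
```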
Table 9 shows examples of pairs with their joint hit counts. Table 10 shows the distribution of hit counts for pairs whose drugs and disorders have hit counts above 100K.
Table 9. Examples of joint hit counts for the random pairs selected

Table 10. Statistics on joint hit counts for pairs with individual hit counts above 100,000
4.3 Looking for may_treat relations on the Web
Analysis of the DailyMed indications showed that the most frequent linguistic patterns indicative of the may_treat relation all contained the word "indicated". This word becomes the entry point into Web data. The following steps are performed for each drug to build a drug corpus (a set of sentences) for it.
1. Use the Bing API to find the top 20 Web documents14 with the query "drugLabel" AND "indicated".
2. For each document retrieved:
   a. Retrieve the text from the page (using JSoup15).
   b. Split the text into sentences (using the Stanford Parser).
   c. Filter out sentences that are too long, as they become problematic for the parser (the maximum is set to 500 characters).
   d. Filter the sentences to keep the ones containing both the drug label and the word "indicated".
Step (2d) is important when working with Web data, as there is much redundancy, and if we use statistical techniques, it will affect our results.
Table 11 shows some examples of text retrieved on the Web. These sentences are often not in a state on which strict linguistic analysis can be performed.
Table 11. Information about experimental data (random pairs chosen)
http://www.capsaicin.co.uk/pages/1369/clinical-studies the journal of proteome research stated that prior studies have indicated that capsaicin may help fight obesity by decreasing calorie intake, shrinking fat tissue, lowering fat levels in the blood.
http://intensivecareunit.wordpress.com/2009/04/05/labetalol/ indications labetalol is indicated for the acute management of severe hypertension associated with a normal or adequate cardiac output.
overdosage leucovorin is indicated to diminish the toxicity and counteract the effect of inadvertently administered overdosages of methotrexate.
http://oswaldyves.com/ phentermine is indicated only for monotherapy, the drug should not be used in combination with selective serotonin-reuptake inhibitor antidepressants.
On the corpus built from these sentences, we perform the following:
1. Find the NPs. Rather than using the full parser as for the DailyMed indications, we proceed by performing part-of-speech tagging with the Stanford Parser and looking for sequences of nouns.
2. For each NP, find possible associated UMLS CUIs, and keep only the NPs to which one or more (max 5) CUIs of the "Disorders" semantic group can be matched.
3. Calculate the frequency of all possible UMLS CUIs to keep the most frequent ones.
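Step 3 is a straightforward frequency count over the CUIs matched by NP occurrences across the corpus. A sketch, where the NP-to-CUI lookup uses placeholder CUIs rather than real ones:

```python
from collections import Counter

def rank_disorder_cuis(np_to_cuis, corpus_nps, top_n=5):
    """Count every disorder CUI matched by an NP occurrence in the
    corpus and return the most frequent ones."""
    counts = Counter()
    for np in corpus_nps:
        counts.update(np_to_cuis.get(np, []))
    return counts.most_common(top_n)

# Placeholder lookup: each NP maps to up to 5 disorder CUIs.
lookup = {"severe hypertension": ["C0000001"],
          "hypertension": ["C0000001", "C0000002"]}
print(rank_disorder_cuis(
    lookup, ["severe hypertension", "hypertension", "hypertension"]))
# [('C0000001', 3), ('C0000002', 2)]
```

The most frequent CUIs become the ranked disorder candidates that are then compared against NDF-RT, as in Table 12.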
14 Web pages from the DailyMed web site are removed so as not to have overlapping data.
15 JSoup is a Java HTML parser available at http://jsoup.org/ .
In Table 12, we show some examples of NP frequencies and links to NDF-RT pairs, with the last column indicating whether the pair is found or not.
Table 12. Disorders found in Web sentences, their frequency rank and their presence in NDF-RT
We computed some statistics to measure the retrieval capability of this experiment. We found that 216 drugs were part of the experimental data. Of those, there were 16 drugs for which no sentences were retrieved from the Web. For the other 200 drugs, we managed to gather an average of 20 sentences, from which we extracted an average of 32 NP candidates. Among these 200 drugs, for 69 of them (35%) none of the NPs generated led to information that is part of the NDF-RT may_treat pairs. For the other 65%, we were able to find the correct answer at an average rank of 4.
This means that for about 65% of the drugs looked at, by searching the Web for 20 pages and analyzing the sentences containing the drug name and the word "indicated", we are able to retrieve information found in a may_treat pair of a specialised and recognized resource. This is an interesting result.
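The two figures reported above (the fraction of drugs for which a known disorder is retrieved, and the average rank at which the first known disorder appears) can be computed as follows; the drug names, CUIs and gold pairs in the example are invented:

```python
def recall_stats(ranked_candidates, gold):
    """ranked_candidates: drug -> list of disorder CUIs, best first.
    gold: drug -> set of disorder CUIs from NDF-RT may_treat pairs.
    Returns (fraction of drugs with a hit, average rank of first hit)."""
    hits, ranks = 0, []
    for drug, candidates in ranked_candidates.items():
        for rank, cui in enumerate(candidates, start=1):
            if cui in gold.get(drug, set()):
                hits += 1
                ranks.append(rank)
                break                    # only the first hit counts
    n = len(ranked_candidates)
    avg_rank = sum(ranks) / len(ranks) if ranks else None
    return hits / n, avg_rank

frac, avg = recall_stats(
    {"drugA": ["C1", "C2"], "drugB": ["C3"]},
    {"drugA": {"C2"}, "drugB": {"C9"}})
print(frac, avg)  # 0.5 2.0
```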
In future work, we will do the same for each disease, building a disease corpus by gathering sentences containing the disease label.
4.4 How are pairs actually expressed on the Web?
In the previous experiment, we showed that via the entry point "indicated", we are able to access some may_treat pairs as expressed in textual data on the Web. In the present experiment, we try to discover how may_treat pairs are actually expressed on the Web, to compare the linguistic patterns found with those from the DailyMed analysis.
We repeat the retrieval process from the previous Web experiment, launching queries of the form "drugLabel" AND "disorderLabel" on the Bing API. We retrieve all the sentences containing both the drug and the disorder from the top 20 pages. We compile all patterns no longer than 100 characters separating the pair and calculate their frequency of occurrence over all pairs.
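Extracting the "forward" pattern between a drug mention and a later disorder mention in a sentence can be sketched as (case-insensitive substring matching is an assumption; the labels in the example come from Figure 1):

```python
def between_pattern(sentence, drug, disorder, max_len=100):
    """Return the text between the drug mention and a later disorder
    mention, or None if the order or length constraints fail."""
    low = sentence.lower()
    i = low.find(drug.lower())
    if i < 0:
        return None
    j = low.find(disorder.lower(), i + len(drug))
    if j < 0:                      # disorder absent or before the drug
        return None
    pattern = sentence[i + len(drug):j].strip()
    return pattern if len(pattern) <= max_len else None

s = "PhosLo is indicated for the control of hyperphosphatemia in renal failure."
print(between_pattern(s, "PhosLo", "hyperphosphatemia"))
# 'is indicated for the control of'
```

Counting these patterns over all retrieved sentences yields the frequency ranking of Table 13.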
Table 13 shows the top patterns found16. This is obviously noisier than the clean DailyMed patterns. Most interesting is that the word "indicated" does not even appear among the top 20 patterns. These patterns do contain words such as "calming", "treat" and "treatment". Many frequent patterns, such as "for", "in", "(", or even " " (space), would certainly not be useful for searching the Web.
Table 13. Patterns extracted from may_treat sentences on the Web

41.0  helps slow down and reverse the process of
4.5 Conclusions on Web Data
We showed that with a good entry point into the Web, it is possible to find interesting data. As we mentioned for DailyMed, we are not able to judge the value of the results ourselves; we would have to show them to a domain expert. All we can evaluate is recall on known data, and we find about 65% recall at an average rank of 4.
As with other knowledge discovery processes, we suggest using this one to show candidates to a human expert, who can evaluate the interest of the disorders retrieved and decide whether or not they should be added to the resource.
In the last section, we showed how the knowledge expressed "naturally" on the Web is quite different from what was found in the DailyMed resource. The patterns retrieved are very noisy, containing words such as "and", or even just a parenthesis. We would have to perform the experiment of searching for a drug plus such a pattern to evaluate the results, but based on our previous background and expertise with knowledge patterns, we are confident that patterns with such general words would not lead to good results, unless the Web search is first constrained thematically. For example, with tools like TerminoWeb we can gather a domain-specific corpus and then look for patterns, so that a context is set.
16 We focus on "forward" patterns, assuming the drug occurs first in the sentence and the disorder occurs second. This was true of the indications in DailyMed, but is not necessarily true of occurrences of drug/disorder pairs on the Web; we will look at backward patterns in future work.
5 Conclusion

We have presented an exploration of three resources: (1) NDF-RT, with direct access to RDF information; (2) DailyMed, with an indication field in which textual information can be found and analyzed; and (3) the Web at large, where valuable information is also found, but hidden among large amounts of noisy information.
Table 14. Comparing DailyMed and Web Data

Percentage of drugs for which a known disorder was found: DailyMed 53.8%; Web 60.6%
Table 14 summarizes our findings from DailyMed and the Web. The 3,231 sentences in DailyMed are the subset that we could parse (using full parsing, as opposed to only part-of-speech tagging for the Web). The percentage of drugs for which a known disorder was found is 53.8% for DailyMed (457/849) and 60.6% for the Web (131/216). We obviously are not comparing equal sets, but still, the Web result is quite interesting. It shows that with a good entry point, the Web can lead to a recall rate comparable to a resource such as DailyMed, in which we are certain that the indication information will lead to a may_treat predicate. This entry point "indicated" is valuable; with normal bootstrap methods (as usually suggested in knowledge discovery with linguistic patterns), very noisy patterns would have been found. Nevertheless, some patterns discovered on the Web seem to have potential ("treatment", "calming") and we should explore them in future work.
Obviously, the recall evaluation against a resource already in RDF format does not do justice to the process if it is to be used for knowledge discovery. Nevertheless, as we are not medical experts, evaluating recall assures us that our method is at least able to retrieve known information. Information not found is not necessarily wrong but hopefully new, and should be validated before being incorporated into a specialized resource.
The coverage problem for drug labels on the Web should be investigated, as 72% of the drugs had zero hits for all of their labels. Although UMLS contains an average of 6 labels per drug, many were not found because they are so specific. Some simple label modifications could work, for example excluding the information about format and quantity from the labels.
Beyond Web searches, for which we need adequate labels, a persistent underlying challenge in this text analysis process is the matching of labels, whether we look at Web sentences or at sentences from DailyMed indications. In this research, we have relied on the simplest matching algorithm in Lucene, but we should investigate this further. In complement to better label matching, we can also explore inferencing. We sometimes saw that a more general disorder was mentioned in the text while NDF-RT had a may_treat pair with a more specific disorder. The generic-specific predicate could be used for inferences.
New experimentations should also be done on the opposite findings of drugs for
In conclusion, much future work is envisaged to further exploit the content of each resource with more refined methods. First, we should deploy more precise analysis for parsing and distilling the knowledge from DailyMed. For the Web, we need to access quality information and deal with noise and redundancy. Redundancy is an issue we briefly mentioned but need to come back to: a pure copy of information is noise in a statistical process, but redundancy could be used as a certainty evaluation on new information, if different recognized Web sources all corroborate the same information. The present research has shown that textual data found on the Web can be valuable, so it is worth exploring ways of enriching specialized resources with it.
References

1. Rubin, D.L., Moreira, D.A., Kanjamala, P.P., Musen, M.A. (2008), AAAI Spring Symposium Series, Symbiotic Relationships between Semantic Web and Knowledge Engineering, Stanford University.
2. Lincoln, M.J., et al. (2004), U.S. Department of Veterans Affairs Enterprise Reference Terminology strategic overview, Studies in Health Technology and Informatics, 107, 391-395.
3. Carter, J.S., Brown, S.H., Erlbaum, M.S., Gregg, W., Elkin, P.L., Speroff, T., Tuttle, M.S. (2002), Initializing the VA medication reference terminology using UMLS Metathesaurus co-occurrences, Proceedings of the AMIA Annual Symposium, Boston, p. 116-120.
4. Bodenreider, O. (2004), The Unified Medical Language System (UMLS): integrating biomedical terminology, Nucleic Acids Research, vol. 32 (Database issue).
5. Jentzsch, A., Zhao, J., Hassanzadeh, O. (2009), Linking Open Drug Data, Triplification Challenge.
6. Manning, C.D., Raghavan, P., Schütze, H. (2008), Introduction to Information Retrieval, Cambridge University Press.
7. Klein, D., Manning, C.D. (2003), Accurate Unlexicalized Parsing, Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423-430.
8. Miwa, M., Pyysalo, S., Hara, T., Tsujii, J. (2010), A Comparative Study of Syntactic Parsers for Event Extraction, Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, ACL 2010, Uppsala, Sweden, p. 37-45.
9. Kim, J.-D., Wang, Y., Takagi, T., Yonezawa, A. (2011), Overview of Genia Event Task in BioNLP Shared Task 2011, Proceedings of the BioNLP Shared Task 2011 Workshop, 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, p. 7-15.
10. Auger, A., Barrière, C. (2010), Probing Semantic Relations: Exploration and identification in specialized texts, Benjamins Current Topics, John Benjamins.
11. Agbago, A., Barrière, C. (2005), Corpus Construction for Terminology, Proceedings of the Corpus Linguistics Conference Series, Birmingham, UK.
12. Brin, S. (1998), Extracting Patterns and Relations from the World Wide Web, Proceedings of the International Workshop on the Web and Databases, pp. 172-183.