
Coping with WORDNET Sense Proliferation
Alessandro Artale, Anna Goy*, Bernardo Magnini, Emanuele Pianta
& Carlo Strapparava
IRST, Istituto per la Ricerca Scientifica e Tecnologica
[artale | magnini | pianta | strappa@irst.itc.it]
*Dipartimento di Informatica - University of Torino, Italy

Abstract

WORDNET makes a great number of fine-grained word sense distinctions. However, what could be seen as an advantage has often been considered a problem from a computational point of view: a great number of sense distinctions makes the problem of word sense disambiguation harder. One way to face this issue is to reduce the number of senses, for example by grouping them into equivalence classes which abstract over some aspects of the meanings of words. In this paper we try a different approach. Although we recognize that some sense distinctions in WORDNET are dubious, we prefer to keep the semantic richness of WORDNET and to make some proposals to extend it in order to make the task of word sense disambiguation easier.

Introduction

Lexical semantic research in recent years (Calzolari, 1992; Pustejovsky, 1995) has emphasized the centrality of the notion of word sense in the organization of a computational lexicon. The availability of word sense repositories, such as WORDNET (Miller, 1990), has increased the interest in building concrete NLP applications that can take advantage of sense distinctions. However, a well known problem for the computational use of WORDNET is that, although it includes a large number of word senses, little information is available that can be used for sense disambiguation. Although some of the WORDNET sense distinctions are ill-motivated, in this paper we take the view that the large majority of them are reasonable. We make some proposals for extensions to WORDNET, which can be used to improve [...]

Some of the data presented in the paper are derived from Italian WORDNET (Magnini & Strapparava, 1997; Magnini et al., 1994), an extension of the English WORDNET to Italian, currently under development at IRST. The approach we propose will be tested in the context of the LE TAMIC-P project (Transparent Access to Multiple Information for the Citizen - Pensions), an information access system specifically designed for the Public Administration domain.

The paper is organized as follows. In section 1 we introduce a new semantic relation (pertain-to-subject), useful to disambiguate word senses against topical contexts. Section 2 suggests extending verbal frames with more accurate selectional restrictions expressed as logical compositions of noun synsets. Section 3 analyzes disambiguation problems of WORDNET adjective senses [...]

1. Adding Subject Field Labels

Experimental work by Leacock, Towell & Voorhees (1995) shows that knowing the topic of the discourse (topical context) allows current algorithms for word disambiguation to select the correct sense of a word in 70% of cases; human subjects seem to perform the same task with comparable results. For example, if a human subject is given the word sheet and the topical context “sleeping”, he/she is very likely to select the meaning “bed linen” instead of “piece of paper”. Miller (1995) suggests that topical context could be used to choose among WORDNET senses. For instance, if the domain of discourse is limited to air travel, only one of the nine senses listed in WORDNET 1.6 for the word flight is likely to occur. To use topical context for disambiguating WORDNET senses at least the following steps are needed:

1. to define a set of subject codes;
2. to associate subject codes to WORDNET synsets;
3. to label discourse segments with subject codes.

Point 3 is out of the scope of this article; in the rest of this section we will concentrate on points 1 and 2.

The first issue is what counts as a subject code. Let us consider first how subject codes are used in existing lexical resources. If we look at paper dictionaries we find that the best approximation to the notion of subject code are field labels such as: Anat (anatomy), Archeol (archeology), Bot (botany), etc. The number of such labels varies among dictionaries. Here is a sample list of seven dictionaries of different nature; five of them are monolingual, two bilingual; four are large-size, one is medium-size and two are pocket-size; the relevant languages are English, Italian, Spanish and Swedish. We give the number of distinct field labels for each dictionary:

• Oxford Adv. Learner's Dict. of Current Eng. (English [...]
• DISC (Italian monolingual, large-size): 41
• Garzanti (Italian monolingual, large-size): 58
• Palazzi (Italian monolingual, large-size): 129
• Goeteborg Lexical DataBase (Swedish monolingual, [...]
• Herder (Bilingual Italian-Spanish, pocket-size): 48
• Collins GEM (Bilingual Ita-Eng, pocket-size): 47

The union set of all the labels includes 177 labels. Here is a table reporting how many labels occur in how many [...]

[Table: number of labels occurring in a given number of dictionaries]

As the data show, a relatively small set of labels (40) occurs in almost all dictionaries whereas a large set (99) [...]

It is worth noting that some WORDNET glosses include a field label of this type (at the beginning of the definition, between parentheses). See for example the synsets for computer program and its hypernyms.

{program, programme, computer program, ...} [...]
[...] -- ((computer science) written programs [...]

WORDNET 1.6 glosses include approx. 200 subject labels. The use of labels is quite free and there seems to be no established set of labels that all lexicographers use. For example, as a label of medical terms one can find one of the following codes: med and pathology, med, medical, medicine and pathology. Some labels are very idiosyncratic, i.e. they label only one synset (for instance: bacteriology, classical antiquity, matrix algebra). Approx. 3500 synsets are labeled by subject codes, i.e. 3.5% of all synsets. We can conclude that the use of subject labels in WORDNET 1.6 is not systematic and has quite limited coverage. To give just one example, no field label distinguishes the two senses of mouse:

{mouse} -- (any of numerous small rodents ...)
{mouse} -- (a hand-operated data input ...)

The question now becomes: are field labels, as they are used by paper dictionaries and WordNet definitions, suitable for word sense disambiguation? The answer is: probably not. Field labels are mainly used to signal the specialist use of a word, i.e. words that are used in a specific discipline, craft or activity (Landau, 1994). They are not used to disambiguate the meaning of words. There are two consequences: (a) many ambiguous words do not have any field label (because they do not belong to any specialist terminology); (b) only a very restricted number of labels refer to non-specialist subjects.

To overcome these shortcomings let us try a different approach. If we look at the subject labels used by dictionaries we see that most of them are words that we can look up in WordNet. Thus, we could use the synsets themselves as subject identifiers. Then, to associate word senses to subject fields we need to introduce a new semantic relation between synsets. We call it the pertain-to-subject relation. Its meaning is as follows:

the lexical concept identified by synset S1 pertains to the subject field identified by synset S2

This solution has at least two main advantages: 1) the definition of subject codes reduces to a selection among the existing WORDNET synsets; 2) we can associate subject fields to synsets by introducing an instance of a known class of WORDNET relations (semantic relations).

We tried to map the kind of subject labels used by paper dictionaries onto WN synsets, and found that it is always possible to find a one-to-one correspondence. This is quite a good result, as one often wants to couple information coming from WORDNET with the information coming from the definitions and the glosses available through Machine Readable Dictionaries. For example, this is a crucial task in the project for the semi-automatic construction of the Italian version of WORDNET, under way at IRST. Note that, contrary to what happens with paper dictionaries, it is sometimes difficult to match the subject labels used in WN glosses with WN synsets: for instance, it is not possible to find a synset corresponding to the subject Scandinavian [...] lexicographers use subject descriptions which are more specific than any existing lexical concept in WORDNET. Generally speaking, finding the right level of granularity for topical contexts is a problematic issue. We feel that using the kind of granularity supplied by WORDNET synsets is a sensible and balanced solution to the problem. So in the above example we would use a less specific subject, that is the concept identified by the [...]

So far we have proposed a solution for step 1 (defining a set of subject codes). The second step (adding pertain-to-subject relations) needs to be done by hand. An experimental project to this end is under way at IRST. Notice that the pertain-to-subject relation has an interesting feature that makes our task easier: we can assume that if S1 pertains-to-subject S2, then the same relation holds for all the hyponyms of S1. Thus, we can use the WORDNET hierarchy to add subject field information in a very [...]

2. Adding Verbal Selectional Restrictions

Selectional restrictions provide explicit semantic information that the verb supplies about its arguments (Jackendoff, 1990). Although this information could be profitably used for verbal sense disambiguation, there seem to be at least two open questions relevant for the introduction of selectional restrictions into the WORDNET framework: (i) a decision has to be taken whether a selectional restriction is a lexical relation, i.e., it has to be associated to a word, or a conceptual one, i.e., it has to be associated to a synset; (ii) it is necessary to identify the appropriate degree of detail in the description of selectional restrictions.

As far as the first point is concerned, WORDNET currently implements selectional restrictions as lexical relations, that is, syntactic frames and their restrictions are associated to verbal word forms. This is necessary because verbs in the same synset can have different surface behaviors and so they need different selectional restrictions. In the following example the Italian verbs “scrivere” (write) and “redigere” (indite), which are synonyms in the synset Write-Compose (see Figure 1), admit different selectional restrictions:

(1) Proust ha scritto/composto/*redatto la “Recherche” nel 1912. (Proust wrote/composed/*indited the “Recherche” in 1912.)

However, it also seems reasonable that verbs belonging to the same synset share common properties (because they are synonyms) and that these properties can be represented at the synset level. In our view a verbal synset is a homogeneous conceptual representation of a state/action which is linguistically lexicalized by the verbs belonging to the synset. As such, a verbal synset can be described by a fixed number of participants in the state/action, each of them playing a semantic role and each of them restricted to be of a particular kind. For instance, the Write-Compose synset requires an agent, who has to be a human, and a theme, that has to be a kind [...]

Given the above considerations we propose to represent selectional restrictions at the synset level, where they provide generic and typical restrictions over semantic participants in the state/action described by a verbal synset. As far as more specific uses of a single verb form are concerned (as is the case for the verb indite in sentence 1), more specific information needs to be added to the single verb entry. This latter point will not be further [...]

[Figure 1 table columns: Synset Label, Italian Synset, English Synset]
Figure 1. Correspondences between Italian and English synsets for the verb ‘scrivere’ (write)

As far as the level of description of selectional restrictions is concerned, all the English verbs of WORDNET are described by resorting to a set of 35 different syntactic frames, which in turn include only two restrictions, that is “Something” and “Somebody”. For example, the frames provided for the verb “Write” in the synset {publish, write} are given in the form of two patterns, where the dots can be substituted by the verb stem:

[...]

This level of description in many cases turns out to be too general for a word sense disambiguation task. In (Artale et al. 1997) we argued that a more detailed level of selectional restrictions than the one implemented in WORDNET would make sense disambiguation more effective. In particular we suggested defining selectional restrictions as a logical combination of WORDNET noun synsets. The appropriate combination of synsets for an argument position has to be both general enough to preserve all the human readings, and restricted enough to discriminate among different senses of both verb and noun. Figure 2 shows selectional restrictions for the senses of the verb write. For each sense a conventional name which unambiguously identifies the synset is reported, as well as the argument positions admitted for that sense, along with the indication of the selectional restrictions.

[Figure 2 table; recoverable column headers: WORDNET Synset, Indirect-Object; recoverable cells: Somebody; (Communication ∧ ¬Signal) ∨ (Signal ∨ Measure-Amount ∨ Language-Unit ∨ Property)]
Figure 2. Synset Selectional Restrictions

We approached the problem of selecting the right verb sense by finding the appropriate selectional restrictions. This proved to be a difficult and time-consuming task. In order to achieve a good trade-off between discrimination power and precision level we adopted an empirical process with successive steps of refinement. We started with general selectional restrictions and then validated them against a previously collected corpus. But it is also true that some form of reusability applies, at least when building selectional restrictions for the various senses of the same verb. Let us consider the write senses. The restriction for the Object of Write-Communicate is just the union of the ones we imposed for Write-Compose and Write-Trace. We built the Object restriction for the Write-Send sense by refining the Object restrictions of the Write-Compose and Write-Communicate senses, looking for all kinds of Communications that we can send. A simple look at the selectional restrictions shows evidence for a hierarchical relation between the two senses Write-Send and Write-Communicate, also confirmed empirically. We would note that, every time a troponymy relation between two verbs holds - defined as the co-occurrence of both lexical implication and temporal co-extension between two verbs - a subsumption relation between the corresponding selectional restrictions holds, too. Obviously, a hierarchical structure would make the addition of new selectional restrictions easier [...]

An experiment was made that shows both the plausibility of WORDNET senses for describing lexical entries and the usability of WORDNET for carrying out lexical discrimination. In the experiment a small number of lexical entries was built to allow an Italian parser to analyze a set of sentences. Whenever the parser tries to build a (partially recognized) constituent it incrementally verifies the admissibility of the semantic part of such a constituent. In particular, whenever a noun is associated with a verbal argument an ISA function is triggered to check whether the synset of the noun is subsumed by the selectional restriction of the corresponding verbal argument. As soon as this semantic test fails the [...]

As an example of the use of selectional restrictions for disambiguation, consider the following sentence: Peter writes his name to Mary, where name is subsumed by the synset Signal. The only allowed senses for write are Write-Trace and Write-Communicate. Indeed, since a Signal cannot be the object of a composition, the sense Write-Compose is discarded. This same argument applies to the remaining senses. It is interesting to note that, even if this is an ambiguous case, the preferred reading is the one of Write-Communicate since the noun phrase to Mary fills the indirect object [...]

Two hypotheses on selectional restrictions have been checked, i.e., the one with general WORDNET frames and the other with more refined selectional restrictions. The analyses produced by a parser have been compared with the set of interpretations given by a human. Results are reported in Figure 3.

[Figure 3 table columns: Experimental Setting, # of readings, Discrimination Rate, Precision; surviving row label: Discrimination with WORDNET Full Hierarchy]
Figure 3. Quantitative results obtained on 60 sentences

These results have to be interpreted considering that the focus of the experiment is on selectional restrictions, which of course are just one among the various kinds of information involved in lexical discrimination. It is worth mentioning here some other crucial information sources: (i) world knowledge (e.g., it is very strange to Write a Paper on a Newspaper-Periodic); (ii) aspectual properties of the verb, e.g., it is very difficult to interpret the sentence “Mary is writing an article on the newspaper” with the Write-Publish sense, because publishing is a culminative process. As for the first point, a WORDNET sense should provide information about the sense-related verbal default arguments (Pustejovsky, 1995). This is relevant because sense disambiguation is crucially affected by the kind of adjuncts the sense admits (Gomez et al. 1997). Consider [...]

(3) Mary wrote a letter on the blackboard.

While in sentence (2) write is ambiguous between Write-Compose and Write-Trace, the verbal adjunct on the blackboard in sentence (3) eliminates the Write-Compose sense, allowing only the Write-Trace interpretation. This kind of disambiguation can be carried out by adding more structure to a verb synset. As proposed in (Gomez et al. 1997), we can associate to each verb a frame-like representation where every thematic role is annotated with the syntactic relation introducing it - including the possible preposition allowed - together with the semantic restriction required by the thematic role. Under this working hypothesis, the verb hierarchy would be crucial since we could exploit the inheritance mechanism during [...]

3. Adjective Polysemy

One aspect of the word disambiguation task, when interpreting a sentence, is related to head-modifier constructions. In such constructions, the disambiguation usually consists in choosing the proper sense for the modifier, given the one of the head [1]. Among head-modifier constructions, noun-adjective ones are particularly interesting since the meaning of adjectives strongly depends on the context, and the main feature of the linguistic context is the noun they modify.

One of the best known examples of the difficulty in selecting the proper sense for the modifier is the adjective good, when modifying different nouns (good news, good knife, good sandwich, good wife, etc.): WORDNET lists 25 senses for good (as an adjective). A simpler example is given by adjectives which denote psychological states (sad, happy, etc.) [2]. Let us consider the Italian adjective allegro (happy/cheerful). The Italian dictionary Palazzi-Folena gives the following definition for allegro:

allegro 1. che sente o dimostra allegria (stato d'animo lieto e festoso, allegrezza); di temperamento o disposizione allegra - è un tipo allegro. 2. brioso, che infonde allegria - colore, spettacolo allegro, musica allegra. [3]

Adjectival entries in Italian WORDNET are still under development; however we assume that the synsets available for allegro will correspond to these two [...]

[Figure 4 table columns: Synset Label, Italian Synset]
Figure 4. Italian synsets for the adjective allegro

(4) Papà è allegro questa sera (Dad is happy tonight)
(5) Vorrei comprarmi un quadro allegro per il soggiorno (I would like to buy a cheerful painting for the living room)

In (4) allegro refers directly to the psychological state of a human being (the one denoted by "papà"), while in (5) its meaning is something like "which causes happiness/cheerfulness in people watching it". The main point here is that we can disambiguate allegro, i.e. we can select the "causative" sense, only by taking into account the semantic properties of the noun it refers to, i.e. quadro (painting), which denotes an artifact.

As far as psychological adjectives are concerned, we can have one more reading, i.e. the "manifestative" one (see Bouillon 1996), as in (6), where affettuosa (loving/affectionate) means "that expresses/manifests love".

(6) Maria mi ha scritto una lettera molto affettuosa (Maria wrote me a very affectionate letter)

The availability of these three interpretations - "stative", as in (4), "causative", as in (5), and "manifestative", as in (6) - depends on the kind of adjective involved, since not every psychological adjective allows all three, but also on the semantic type of the modified noun. As far as the first kind of information is concerned, the availability of one, two, or three readings is encoded in the number of senses of the adjectival entry. As for the interaction with the meaning of the noun, intuitively, the disambiguation strategy is the following:

• if the noun denotes an event (scampagnata allegra - cheerful trip), then the "stative" reading is not available;
• if the noun denotes a physical object, then we need a distinction between (at least) artifacts and natural kinds:
− if it is an artifact (quadro allegro - cheerful painting), the expression seems to be ambiguous (at least) between two readings: the "causative" ("a painting that makes people watching it cheerful") and the "manifestative" ("a painting that expresses the painter's [...]
− if it is a natural kind (fiore allegro - cheerful flower), then only the "causative" reading seems to be available ("a flower that makes people watching it [...]
• if the noun denotes a human being, then we have [...]
− if it refers to a "role" (pittore allegro - cheerful painter), then all three readings are available [4] ("a cheerful person, who is a painter", "a painter whose paintings make people watching it cheerful", "a painter whose paintings [...]
− if it does not refer to any "role" (ragazzo allegro - cheerful boy), then the "stative" reading seems to be [...]

If each sense in the WORDNET entry for the adjective contains the selectional restriction for the argument to be modified, then the disambiguation task could be performed by matching such restrictions with the semantic type of the head noun, i.e. with its WORDNET synset (or one of its hyperonyms). For instance, the hyperonym hierarchy of the synset corresponding to the first sense of [...] the hyperonym hierarchy of the synset corresponding to the first sense of painting (quadro) contains [...]

On the adjective side, the Allegro-stative synset will have the selectional restriction human, while the Allegro-causative one will have artifact: this is the information that enables the linguistic interpreter to choose the proper sense in cases such as (4) and (5).

[1] This task relies on the assumption that the head has already been disambiguated; actually, these two steps, i.e. the choice of the proper sense for the head and then for the modifier, need not [...]
[2] For an analysis of such adjectives, see (Goy 1998).
[3] 1. Who feels or shows happiness (cheerful mood); with a happy temperament or disposition - he is a happy guy. 2. Lively, that infuses happiness - cheerful color, show, music.
[4] Maybe with different degrees of acceptability.

Conclusions

In this paper we made three proposals for coping with the so-called problem of sense proliferation in WordNet. Instead of reducing the richness of WordNet sense distinctions, we propose to add new information useful for [...]

Acknowledgements

This work has been partially funded by the LE-4253 TAMIC-P project. Italian WORDNET has been partially developed in the framework of the ILEX (Italian Lexicon) [...]

Bibliographical References

Artale, A., Magnini, B., Strapparava, C. (1997). WordNet for Italian and Its Use for Lexical Discrimination. In Maurizio Lenzerini (Ed.), AI*IA 97: Advances in Artificial Intelligence. Proceedings of the 5th Congress of the Italian Association for Artificial Intelligence, Roma, Italy, 16-19 September 1997, [...]
Bouillon, P. (1996). Mental states adjectives: the perspective of generative lexicon. In Proceedings of [...]
Briscoe, T. (1991). Lexical Issues in Natural Language Processing. In Klein, E. and Veltman, F. (eds.): Esprit Symposium on Natural Language and Speech, Berlin, [...]
Calzolari, N. (1992). Acquiring and Representing Semantic Information in a Lexical Knowledge Base. In Pustejovsky, J. & Bergler, S. (eds.), Lexical Semantics and Knowledge Representation, Springer-Verlag [...]
Delmonte, R., Ferrari, G., Goy, A., Lesmo, L., Magnini, B., Pianta, E., Stock, O., Strapparava, C. (1996). ILEX - Un dizionario computazionale dell'Italiano. In Proceedings of the 5th Convegno Nazionale della Associazione Italiana per l'Intelligenza Artificiale, [...]
Gomez, F., Segami, C. & Hull, R. (1997). Determining Prepositional Attachment, Prepositional Meaning, Verb Meaning, and Thematic Roles. Computational [...]
Goy, A. (1998). Il ruolo della semantica lessicale nella comprensione del linguaggio naturale: il caso degli aggettivi in italiano. PhD thesis, Università di Torino.
Jackendoff, R. (1990). Semantic Structures. Current Studies in Linguistics. The MIT Press, Cambridge, MA.
Landau, S.I. (1994). Dictionaries: The art & craft of lexicography. New York: The Scribner Press.
Leacock, C., Towell, G. & Voorhees, E.M. (1996). Towards building contextual representations of word senses using statistical models. In Boguraev, B. & Pustejovsky, J. (Eds.), Corpus processing for lexical acquisition (pp. 97-113). Cambridge, MA: The MIT Press.
Magnini, B. & Strapparava, C. (1997). Costruzione di una base di conoscenza lessicale per l’italiano basata su WORDNET. In Proceedings of the XXVII Congresso Internazionale di Studi della Società di Linguistica Italiana “Linguaggio e Cognizione”, [...]
Magnini, B., Strapparava, C., Ciravegna, F., Pianta, E. (1994). Multilingual Lexical Knowledge Bases: Applied WORDNET Prospects. In Proceedings of the Workshop The Future of the Dictionary, Grenoble.
Miller, G.A. (ed.) (1990). WORDNET: An on-line lexical database. International Journal of Lexicography (special issue), 3(4), pp. 235-312.
Miller, G.A. (1995). A lexical database for English. Communications of the ACM, 38(11), pp. 39-41.
Palazzi, F. & Folena, G. (1992). Dizionario della lingua [...]
Pustejovsky, J. (1995). The Generative Lexicon. The MIT Press [...]
Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioural Sciences. McGraw-Hill, [...]

Source: http://multiwordnet.fbk.eu/paper/wordnet-granada.pdf

Life-style advice to men who have had one or more abnormal sperm function tests

Any alteration in adverse factors can take 10-12 weeks to show an [...]

Parameters measured in sperm function tests

[...] normal fertilisation after intercourse, but cannot be guaranteed to do so. A poor swim up has less than 4 million/ml rapidly motile sperm and would be unlikely to achieve fertilisation after normal intercourse or standard in-vitro fertilisation (IVF). Persistently poor sperm swim u

Luxan labels: insects

Luxan Houtinsecticide-P NW
Active ingredient: permethrin
Content: 2 g/l
Contains: kerosene, light fraction,
Type of preparation: liquid

[...] pressure with a coarse droplet. If necessary, apply the required amount in more than one treatment, so that in each treatment the liquid just does not drip off. To control the deathwatch beetle (Xestobium rufovillosum), the woodwork must [...]