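Word Sense Disambiguation: Algorithms and Applications (English reprint edition) is a volume in the series of original-language works on computational linguistics and language technology. For a computer to understand human language, it must resolve ambiguity, and in computational linguistics word sense disambiguation (WSD) has long been a subject of research. This book brings together the results of recent international research on WSD. It covers nearly every topic in the field and is of considerable scholarly value.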
About the author:
Eneko Agirre is an associate professor at the University of the Basque Country, Spain.
Contents:
Reading Guide…1
Contributors…16
Foreword…19
Preface…23
1 Introduction…1
Eneko Agirre and Philip Edmonds
1.1 Word Sense Disambiguation…1
1.2 A Brief History of WSD Research…4
1.3 What is a Word Sense?…8
1.4 Applications of WSD…10
1.5 Basic Approaches to WSD…12
1.6 State-of-the-Art Performance…14
1.7 Promising Directions…15
1.8 Overview of This Book…19
1.9 Further Reading…21
References…22
2 Word Senses…29
Adam Kilgarriff
2.1 Introduction…29
2.2 Lexicographers…30
2.3 Philosophy…32
2.3.1 Meaning is Something You Do…32
2.3.2 The Fregean Tradition and Reification…33
2.3.3 Two Incompatible Semantics?…33
2.3.4 Implications for Word Senses…34
2.4 Lexicalization…35
2.5 Corpus Evidence…39
2.5.1 Lexicon Size…41
2.5.2 Quotations…42
2.6 Conclusion…43
2.7 Further Reading…44
Acknowledgments…45
References…45
3 Making Sense About Sense…47
Nancy Ide and Yorick Wilks
3.1 Introduction…47
3.2 WSD and the Lexicographers…49
3.3 WSD and Sense Inventories…51
3.4 NLP Applications and WSD…55
3.5 What Level of Sense Distinctions Do We Need for NLP, If Any?…58
3.6 What Now for WSD?…64
3.7 Conclusion…68
References…68
4 Evaluation of WSD Systems…75
Martha Palmer, Hwee Tou Ng and Hoa Trang Dang
4.1 Introduction…75
4.1.1 Terminology…76
4.1.2 Overview…80
4.2 Background…81
4.2.1 WordNet and Semcor…81
4.2.2 The Line and Interest Corpora…83
4.2.3 The DSO Corpus…84
4.2.4 Open Mind Word Expert…85
4.3 Evaluation Using Pseudo-Words…86
4.4 Senseval Evaluation Exercises…86
4.4.1 Senseval-1…87
Evaluation and Scoring…88
4.4.2 Senseval-2…88
English All-Words Task…89
English Lexical Sample Task…89
4.4.3 Comparison of Tagging Exercises…91
4.5 Sources of Inter-Annotator Disagreement…92
4.6 Granularity of Sense: Groupings for WordNet…95
4.6.1 Criteria for WordNet Sense Grouping…96
4.6.2 Analysis of Sense Grouping…97
4.7 Senseval-3…98
4.8 Discussion…99
References…102
5 Knowledge-Based Methods for WSD…107
Rada Mihalcea
5.1 Introduction…107
5.2 Lesk Algorithm…108
5.2.1 Variations of the Lesk Algorithm…110
Simulated Annealing…110
Simplified Lesk Algorithm…111
Augmented Semantic Spaces…113
Summary…113
5.3 Semantic Similarity…114
5.3.1 Measures of Semantic Similarity…114
5.3.2 Using Semantic Similarity Within a Local Context…117
5.3.3 Using Semantic Similarity Within a Global Context…118
5.4 Selectional Preferences…119
5.4.1 Preliminaries: Learning Word-to-Word Relations…120
5.4.2 Learning Selectional Preferences…120
5.4.3 Using Selectional Preferences…122
5.5 Heuristics for Word Sense Disambiguation…123
5.5.1 Most Frequent Sense…123
5.5.2 One Sense Per Discourse…124
5.5.3 One Sense Per Collocation…124
5.6 Knowledge-Based Methods at Senseval-2…125
5.7 Conclusions…126
References…127
6 Unsupervised Corpus-Based Methods for WSD…133
Ted Pedersen
6.1 Introduction…133
6.1.1 Scope…134
6.1.2 Motivation…136
Distributional Methods…137
Translational Equivalence…139
6.1.3 Approaches…140
6.2 Type-Based Discrimination…141
6.2.1 Representation of Context…142
6.2.2 Algorithms…145
Latent Semantic Analysis (LSA)…146
Hyperspace Analogue to Language (HAL)…147
Clustering By Committee (CBC)…148
6.2.3 Discussion…150
6.3 Token-Based Discrimination…150
6.3.1 Representation of Context…151
6.3.2 Algorithms…151
Context Group Discrimination…152
McQuitty’s Similarity Analysis…154
6.3.3 Discussion…157
6.4 Translational Equivalence…158
6.4.1 Representation of Context…159
6.4.2 Algorithms…159
6.4.3 Discussion…160
6.5 Conclusions and the Way Forward…161
Acknowledgments…162
References…162
7 Supervised Corpus-Based Methods for WSD…167
Lluís Màrquez, Gerard Escudero, David Martínez and German Rigau
7.1 Introduction to Supervised WSD…167
7.1.1 Machine Learning for Classification…168
An Example on WSD…170
7.2 A Survey of Supervised WSD…171
7.2.1 Main Corpora Used…172
7.2.2 Main Sense Repositories…173
7.2.3 Representation of Examples by Means of Features…174
7.2.4 Main Approaches to Supervised WSD…175
Probabilistic Methods…175
Methods Based on the Similarity of the Examples…176
Methods Based on Discriminating Rules…177
Methods Based on Rule Combination…179
Linear Classifiers and Kernel-Based Approaches…179
Discourse Properties: The Yarowsky Bootstrapping Algorithm…181
7.2.5 Supervised Systems in the Senseval Evaluations…183
7.3 An Empirical Study of Supervised Algorithms for WSD…184
7.3.1 Five Learning Algorithms Under Study…185
Naïve Bayes (NB)…185
Exemplar-Based Learning (kNN)…186
Decision Lists (DL)…187
AdaBoost (AB)…187
Support Vector Machines (SVM)…189
7.3.2 Empirical Evaluation on the DSO Corpus…190
Experiments…191
7.4 Current Challenges of the Supervised Approach…195
7.4.1 Right-Sized Training Sets…195
7.4.2 Porting Across Corpora…196
7.4.3 The Knowledge Acquisition Bottleneck…197
Automatic Acquisition of Training Examples…198
Active Learning…199
Combining Training Examples from Different Words…199
Parallel Corpora…200
7.4.4 Bootstrapping…201
7.4.5 Feature Selection and Parameter Optimization…202
7.4.6 Combination of Algorithms and Knowledge Sources…203
7.5 Conclusions and Future Trends…205
Acknowledgments…206
References…207
8 Knowledge Sources for WSD…217
Eneko Agirre and Mark Stevenson
8.1 Introduction…217
8.2 Knowledge Sources Relevant to WSD…218
8.2.1 Syntactic…219
Part of Speech (KS 1)…219
Morphology (KS 2)…219
Collocations (KS 3)…220
Subcategorization (KS 4)…220
8.2.2 Semantic…220
Frequency of Senses (KS 5)…220
Semantic Word Associations (KS 6)…221
Selectional Preferences (KS 7)…221
Semantic Roles (KS 8)…222
8.2.3 Pragmatic/Topical…222
Domain (KS 9)…222
Topical Word Association (KS 10)…222
Pragmatics (KS 11)…223
8.3 Features and Lexical Resources…223
8.3.1 Target-Word Specific Features…224
8.3.2 Local Features…225
8.3.3 Global Features…227
8.4 Identifying Knowledge Sources in Actual Systems…228
8.4.1 Senseval-2 Systems…229
8.4.2 Senseval-3 Systems…231
8.5 Comparison of Experimental Results…231
8.5.1 Senseval Results…232
8.5.2 Yarowsky and Florian (2002)…233
8.5.3 Lee and Ng (2002)…234
8.5.4 Martínez et al. (2002)…237
8.5.5 Agirre and Martínez (2001a)…238
8.5.6 Stevenson and Wilks (2001)…240
8.6 Discussion…242
8.7 Conclusions…245
Acknowledgments…246
References…247
9 Automatic Acquisition of Lexical Information and Examples…253
Julio Gonzalo and Felisa Verdejo
9.1 Introduction…253
9.2 Mining Topical Knowledge About Word Senses…254
9.2.1 Topic Signatures…255
9.2.2 Association of Web Directories to Word Senses…257
9.3 Automatic Acquisition of Sense-Tagged Corpora…258
9.3.1 Acquisition by Direct Web Searching…258
9.3.2 Bootstrapping from Seed Examples…261
9.3.3 Acquisition via Web Directories…263
9.3.4 Acquisition via Cross-Language Evidence…264
9.3.5 Web-Based Cooperative Annotation…268
9.4 Discussion…269
Acknowledgments…271
References…272
10 Domain-Specific WSD…275
Paul Buitelaar, Bernardo Magnini, Carlo Strapparava and Piek Vossen
10.1 Introduction…275
10.2 Approaches to Domain-Specific WSD…277
10.2.1 Subject Codes…277
10.2.2 Topic Signatures and Topic Variation…282
Topic Signatures…282
Topic Variation…283
10.2.3 Domain Tuning…284
Top-down Domain Tuning…285
Bottom-up Domain Tuning…285
10.3 Domain-Specific Disambiguation in Applications…288
10.3.1 User-Modeling for Recommender Systems…288
10.3.2 Cross-Lingual Information Retrieval…289
10.3.3 The MEANING Project…292
10.4 Conclusions…295
References…296
11 WSD in NLP Applications…299
Philip Resnik
11.1 Introduction…299
11.2 Why WSD?…300
Argument from Faith…300
Argument by Analogy…301
Argument from Specific Applications…302
11.3 Traditional WSD in Applications…303
11.3.1 WSD in Traditional Information Retrieval…304
11.3.2 WSD in Applications Related to Information Retrieval…307
Cross-Language IR…308
Question Answering…309
Document Classification…312
11.3.3 WSD in Traditional Machine Translation…313
11.3.4 Sense Ambiguity in Statistical Machine Translation…315
11.3.5 Other Emerging Applications…317
11.4 Alternative Conceptions of Word Sense…320
11.4.1 Richer Linguistic Representations…320
11.4.2 Patterns of Usage…321
11.4.3 Cross-Language Relationships…323
11.5 Conclusions…325
Acknowledgments…325
References…326
A Resources for WSD…339
A.1 Sense Inventories…339
A.1.1 Dictionaries…339
A.1.2 Thesauri…341
A.1.3 Lexical Knowledge Bases…341
A.2 Corpora…343
A.2.1 Raw Corpora…343
A.2.2 Sense-Tagged Corpora…345
A.2.3 Automatically Tagged Corpora…347
A.3 Other Resources…348
A.3.1 Software…348
A.3.2 Utilities, Demos, and Data…349
A.3.3 Language Data Providers…350
A.3.4 Organizations and Mailing Lists…350
Index of Terms…353
Index of Authors and Algorithms…361