[非辞書](非辞書.md), [论文](论文.md) [[我的语言学书单]]

# Original text

In morphology and lexicography, a lemma (pl.: lemmas or lemmata) is the canonical form,[[1]](#cite_note-1) dictionary form, or citation form of a set of word forms.[[2]](#cite_note-2) In English, for example, _break_, _breaks_, _broke_, _broken_ and _breaking_ are forms of the same lexeme, with _break_ as the lemma by which they are indexed. Lexeme, in this context, refers to the set of all the inflected or alternating forms in the paradigm of a single word, and lemma refers to the particular form that is chosen by convention to represent the lexeme. Lemmas have special significance in highly inflected languages such as Arabic, Turkish, and Russian. The process of determining the lemma for a given lexeme is called lemmatisation. The lemma can be viewed as the chief of the principal parts, although lemmatisation is at least partly arbitrary.

Morphology
----------

The form of a word that is chosen to serve as the lemma is usually the least [marked](/wiki/Markedness "Markedness") form, but there are several exceptions such as the use of the infinitive for verbs in some languages.

For English, the citation form of a [noun](/wiki/Noun "Noun") is the [singular](/wiki/Grammatical_number "Grammatical number") (and non-possessive) form: _mouse_ rather than _mice_. For multiword lexemes that contain [possessive adjectives](/wiki/Possessive_adjective "Possessive adjective") or [reflexive pronouns](/wiki/Reflexive_pronoun "Reflexive pronoun"), the citation form uses a form of the [indefinite pronoun](/wiki/Indefinite_pronoun "Indefinite pronoun") _one_: _do one's best_, _perjure oneself_. In European languages with [grammatical gender](/wiki/Grammatical_gender "Grammatical gender"), the citation form of regular adjectives and nouns is usually the masculine singular.[_[citation needed](/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")_] If the language also has [cases](/wiki/Grammatical_case "Grammatical case"), the citation form is often the masculine singular nominative.
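
As a rough illustration of how a lemma indexes the forms of its lexeme, the sketch below builds a tiny lookup table in Python. The names `LEXEMES`, `FORM_TO_LEMMA` and `lemmatise` are hypothetical, and the table covers only the _break_ and _mouse_ examples from the text; practical lemmatisers rely on much larger lexicons or morphological rules.

```python
# Toy lexicon: each lemma maps to the set of word forms in its lexeme.
# The entries are illustrative and cover only the examples in the text.
LEXEMES = {
    "break": {"break", "breaks", "broke", "broken", "breaking"},
    "mouse": {"mouse", "mice"},
}

# Invert the lexicon so that every word form points back to its lemma.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in LEXEMES.items() for form in forms
}


def lemmatise(word):
    """Return the lemma of a known form, or the lower-cased word if unknown."""
    key = word.lower()
    return FORM_TO_LEMMA.get(key, key)


print(lemmatise("Breaking"))  # break
print(lemmatise("mice"))      # mouse
```

Falling back to the input for unknown words is a design choice of this sketch, not a general rule.
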

For many languages, the citation form of a [verb](/wiki/Verb "Verb") is the [infinitive](/wiki/Infinitive "Infinitive"): [French](/wiki/French_language "French language") __[aller](https://en.wiktionary.org/wiki/aller#French "wikt:aller")__, [German](/wiki/German_language "German language") __[gehen](https://en.wiktionary.org/wiki/gehen#German "wikt:gehen")__, [Hindustani](/wiki/Hindustani_language "Hindustani language") [जाना](https://en.wiktionary.org/wiki/%E0%A4%9C%E0%A4%BE%E0%A4%A8%E0%A4%BE#Spanish "wikt:जाना")/[جانا](https://en.wiktionary.org/wiki/%D8%AC%D8%A7%D9%86%D8%A7#Spanish "wikt:جانا"), [Spanish](/wiki/Spanish_language "Spanish language") __[ir](https://en.wiktionary.org/wiki/ir#Spanish "wikt:ir")__. English verbs usually have an infinitive, which in its bare form (without the particle _to_) is the least marked form (for example, _break_ is chosen over _to break_, _breaks_, _broke_, _breaking_, and _broken_); for [defective verbs](/wiki/Defective_verb "Defective verb") with no infinitive the present tense is used (for example, _must_ has only one form while _shall_ has no infinitive, and both lemmas are their lexemes' present tense forms).
For [Latin](/wiki/Latin "Latin"), [Ancient Greek](/wiki/Ancient_Greek "Ancient Greek"), [Modern Greek](/wiki/Modern_Greek "Modern Greek"), and [Bulgarian](/wiki/Bulgarian_language "Bulgarian language"), the first person singular [present tense](/wiki/Present_tense "Present tense") is traditionally used, but some modern dictionaries use the infinitive instead (except for Bulgarian, which lacks infinitives; for [contracted verbs](/wiki/Ancient_Greek_verbs#Contracted_verbs "Ancient Greek verbs") in Ancient Greek, an uncontracted first person singular present tense is used to reveal the contract vowel: φιλέω _philéō_ for φιλῶ _philō_ "I love" [implying affection], ἀγαπάω _agapáō_ for ἀγαπῶ _agapō_ "I love" [implying regard]). [Finnish](/wiki/Finnish_language "Finnish language") dictionaries list verbs not under their root, but under the first infinitive, marked with _-(t)a_, _-(t)ä_.

For [Japanese](/wiki/Japanese_language "Japanese language"), the non-past (present and future) tense is used. For [Arabic](/wiki/Arabic "Arabic") the third-person singular masculine of the past/perfect tense is the least-marked form and is used for entries in modern dictionaries. In older dictionaries, which are still commonly used, the [triliteral](/wiki/Triliteral "Triliteral") of the word, either a verb or a noun, is used. This is similar to [Hebrew](/wiki/Hebrew_language "Hebrew language"), which also uses the third-person singular masculine perfect form, e.g. ברא _bara'_ "create", כפר _kaphar_ "deny".
[Georgian](/wiki/Georgian_language "Georgian language") uses the [verbal noun](/wiki/Verbal_noun "Verbal noun"). For [Korean](/wiki/Korean_language "Korean language"), _-da_ is attached to the stem.

In [Tamil](/wiki/Tamil_language "Tamil language"), an [agglutinative language](/wiki/Agglutinative_language "Agglutinative language"), the verb stem (which is also the imperative form, the least marked one) is often cited, e.g., _[இரு](https://en.wiktionary.org/wiki/%E0%AE%87%E0%AE%B0%E0%AF%81#Tamil "wiktionary:இரு")_.

In [Irish](/wiki/Irish_language "Irish language"), words are highly inflected by case (genitive, nominative, dative and vocative) and by their place within a sentence because of [initial mutations](/wiki/Irish_initial_mutations "Irish initial mutations"). The noun _cainteoir_, the lemma for the noun meaning "speaker", has a variety of forms: _chainteoir_, _gcainteoir_, _cainteora_, _chainteora_, _cainteoirí_, _chainteoirí_ and _gcainteoirí_.

Some phrases are cited in a sort of lemma: _[Carthago delenda est](/wiki/Carthago_delenda_est "Carthago delenda est")_ (literally, "Carthage must be destroyed") is a common way of citing [Cato](/wiki/Cato_the_Elder "Cato the Elder"), but what he said was nearer to _censeo Carthaginem esse delendam_ ("I hold Carthage to be in need of destruction").
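
The verb citation-form conventions surveyed in this section can be collected in a small table. The sketch below is only a summary of the statements above: the names `VERB_CITATION_FORM` and `languages_by_convention` are hypothetical, the labels are paraphrases, and nuances such as older Arabic dictionaries using the triliteral are left out.

```python
from collections import defaultdict

# Verb citation-form conventions as described in the text above.
# Labels are paraphrased; this is a summary, not an authoritative list.
VERB_CITATION_FORM = {
    "French": "infinitive",
    "German": "infinitive",
    "Hindustani": "infinitive",
    "Spanish": "infinitive",
    "English": "bare infinitive",
    "Latin": "first-person singular present",
    "Ancient Greek": "first-person singular present",
    "Modern Greek": "first-person singular present",
    "Bulgarian": "first-person singular present",
    "Finnish": "first infinitive",
    "Japanese": "non-past",
    "Arabic": "third-person singular masculine perfect",
    "Hebrew": "third-person singular masculine perfect",
    "Georgian": "verbal noun",
    "Korean": "stem + -da",
    "Tamil": "verb stem",
}


def languages_by_convention(table):
    """Group language names by the citation-form convention they share."""
    groups = defaultdict(list)
    for language, convention in table.items():
        groups[convention].append(language)
    return dict(groups)


groups = languages_by_convention(VERB_CITATION_FORM)
print(groups["infinitive"])  # ['French', 'German', 'Hindustani', 'Spanish']
```
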

Lexicography
------------

In a dictionary, the lemma "go" represents the [inflected](/wiki/Inflection "Inflection") forms "go", "goes", "going", "went", and "gone". The relationship between an inflected form and its lemma is usually denoted by an angle bracket, e.g., "went" < "go". The disadvantage of such simplifications is the inability to look up a declined or conjugated form of the word, but some dictionaries, like [Webster's Dictionary](/wiki/Webster%27s_Dictionary "Webster's Dictionary"), list "went". Multilingual dictionaries vary in how they deal with this issue: the [Langenscheidt](/wiki/Langenscheidt "Langenscheidt") dictionary of German does not list _ging_ (< _gehen_), but the Cassell does.

Lemmas or [word stems](/wiki/Word_stem "Word stem") are used often in [corpus linguistics](/wiki/Corpus_linguistics "Corpus linguistics") for determining word frequency. In that usage, the specific definition of "lemma" is flexible depending on the task it is being used for.
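
A minimal sketch of lemma-based frequency counting follows, using the "go" forms listed above. The table `GO_FORMS`, the function `lemma_frequencies` and the sample sentence are assumptions made for illustration; an actual corpus study would use a full lemmatiser and proper tokenisation.

```python
from collections import Counter

# Hypothetical form-to-lemma table covering only the "go" example above.
GO_FORMS = {
    "go": "go",
    "goes": "go",
    "going": "go",
    "went": "go",
    "gone": "go",
}


def lemma_frequencies(tokens, form_to_lemma):
    """Count tokens by lemma; unknown forms are counted under themselves."""
    return Counter(form_to_lemma.get(t.lower(), t.lower()) for t in tokens)


tokens = "She went home and he goes out while they are going away".split()
freq = lemma_frequencies(tokens, GO_FORMS)
print(freq["go"])  # 3
```

Counting by surface form instead would give "went", "goes" and "going" a frequency of 1 each, which is why the choice of what counts as a lemma affects the resulting figures.
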

Pronunciation
-------------

A word may have different [pronunciations](/wiki/Pronunciation "Pronunciation"), depending on its [phonetic](/wiki/Phonetic "Phonetic") environment (the neighbouring sounds) or on the degree of [stress](/wiki/Stress_(linguistics) "Stress (linguistics)") in a sentence. An example of the latter is the [weak and strong forms](/wiki/Weak_and_strong_forms_in_English "Weak and strong forms in English") of certain English [function words](/wiki/Function_word "Function word") like _some_ and _but_ (pronounced /sʌm/, /bʌt/ when stressed but /s(ə)m/, /bət/ when unstressed). Dictionaries usually give the pronunciation used when the word is pronounced alone (its [isolation form](/wiki/Isolation_form "Isolation form")) and with stress, but they may also note common weak forms of pronunciation.

Difference between stem and lemma
---------------------------------

The [stem](/wiki/Word_stem "Word stem") is the part of the word that never changes even when morphologically inflected; a lemma is the least marked form of the word. For example, from "produced", the lemma is "produce", but the stem is "produc-".
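
Under the "part that never changes" definition, a stem can be approximated as the longest prefix shared by a set of related written forms. The sketch below assumes a hand-picked list of forms and a hypothetical helper `orthographic_stem`; it works on spelling only and is not a morphological analysis.

```python
import os

# Related written forms; the list is an illustrative assumption.
FORMS = ["produce", "produced", "producing", "production"]


def orthographic_stem(forms):
    """Return the longest prefix shared by every form in the list."""
    # os.path.commonprefix compares the strings character by character.
    return os.path.commonprefix(list(forms))


print(orthographic_stem(FORMS))  # produc
```
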
The stem stops at "produc-" rather than "produce" because there are words such as **produc**tion and **produc**ing.[[3]](#cite_note-3)[_[failed verification](/wiki/Wikipedia:Verifiability "Wikipedia:Verifiability")_] In linguistic analysis, the stem is defined more generally as the analyzed base form from which all inflected forms can be formed.[_[citation needed](/wiki/Wikipedia:Citation_needed "Wikipedia:Citation needed")_] When [phonology](/wiki/Phonology "Phonology") is taken into account, defining the stem as the unchangeable part of the word is not useful, as can be seen in the phonological forms of the words in the preceding example: "produced" [/prəˈdjuːst/](/wiki/Help:IPA/English "Help:IPA/English") vs. "production" [/prəˈdʌkʃən/](/wiki/Help:IPA/English "Help:IPA/English").

Some lexemes have several stems but one lemma. For instance, the verb "[to go](https://en.wiktionary.org/wiki/go#English "wikt:go")" has the stems "go" and "went" due to [suppletion](/wiki/Suppletion "Suppletion"): the past tense was co-opted from a different verb, "[to wend](https://en.wiktionary.org/wiki/wend#English "wikt:wend")".

Headword
--------

A **headword** or **catchword**[[4]](#cite_note-4) is the **lemma** under which a set of related [dictionary](/wiki/Dictionary "Dictionary") or [encyclopaedia](/wiki/Encyclopaedia "Encyclopaedia") entries appears. The headword is used to locate the entry, and dictates its alphabetical position.
Depending on the size and nature of the dictionary or encyclopedia, the entry may include alternative meanings of the word, its [etymology](/wiki/Etymology "Etymology"), [pronunciation](/wiki/Pronunciation "Pronunciation") and [inflections](/wiki/Inflection "Inflection"), related lemmas such as [compound words](/wiki/Compound_word "Compound word") or phrases that contain the headword, and encyclopedic information about the concepts represented by the word.

For example, the entry for the headword _[bread](/wiki/Bread "Bread")_ may contain the following (simplified) definitions:

**Bread**

_(noun)_

* A common food made from the combination of [flour](/wiki/Flour "Flour"), [water](/wiki/Water "Water") and [yeast](/wiki/Yeast "Yeast")
* Money _(slang)_

_(verb)_

* To coat in breadcrumbs

**to know which side your bread is buttered**: to know how to act in your own best interests.

The _[Academic Dictionary of Lithuanian](/wiki/Academic_Dictionary_of_Lithuanian "Academic Dictionary of Lithuanian")_ contains around 500,000 headwords. The _[Oxford English Dictionary](/wiki/Oxford_English_Dictionary "Oxford English Dictionary")_ (OED) has around 273,000 headwords along with 220,000 other lemmas,[[5]](#cite_note-5) while _[Webster's Third New International Dictionary](/wiki/Webster%27s_Third_New_International_Dictionary "Webster's Third New International Dictionary")_ has about 470,000.[[6]](#cite_note-6) The _[Deutsches Wörterbuch](/wiki/Deutsches_W%C3%B6rterbuch "Deutsches Wörterbuch")_ (DWB), the largest lexicon of the [German language](/wiki/German_language "German language"), has around 330,000 headwords.[[7]](#cite_note-BBAW-7) These values are cited by the dictionary makers and may not use exactly the same definition of a headword. In addition, headwords may not accurately reflect a dictionary's physical size.
The _OED_ and the _DWB_, for instance, include exhaustive historical reviews and exact citations from [source documents](/wiki/Source_document "Source document") not usually found in standard dictionaries.

The term 'lemma' comes from the practice in Greco-Roman antiquity of using the word to refer to the headwords of marginal [glosses](/wiki/Gloss_(annotation) "Gloss (annotation)") in [scholia](/wiki/Scholia "Scholia"); for this reason, the [Ancient Greek](/wiki/Ancient_Greek "Ancient Greek") plural form is sometimes used, namely _lemmata_ (Greek λῆμμα, pl. λήμματα).

See also
--------

* [Lexeme](/wiki/Lexeme "Lexeme")
* [Lexical Markup Framework](/wiki/Lexical_Markup_Framework "Lexical Markup Framework")
* [Null morpheme](/wiki/Null_morpheme "Null morpheme")
* [Principal parts](/wiki/Principal_parts "Principal parts")
* [Root (linguistics)](/wiki/Root_(linguistics) "Root (linguistics)")
* [Uninflected word](/wiki/Uninflected_word "Uninflected word")

References
----------

1. **[^](#cite_ref-1 "Jump up")**
2. **[^](#cite_ref-2 "Jump up")** Francis, W.N.; Kučera, H (1982). _Frequency Analysis of English Usage: Lexicon and Usage_. Boston: Houghton Mifflin.
3. **[^](#cite_ref-3 "Jump up")** ["Natural Language Toolkit — NLTK 3.0 documentation"](http://www.nltk.org/). Nltk.org. 2015-09-05. Retrieved 2015-09-27.
4. **[^](#cite_ref-4 "Jump up")** _[Oxford English Dictionary](/wiki/Oxford_English_Dictionary "Oxford English Dictionary")_, 3rd. edition, 2018, [_s.v._](https://www.oed.com/view/Entry/28833), definition 5
5. **[^](#cite_ref-5 "Jump up")** ["Glossary - Oxford English Dictionary"](http://public.oed.com/how-to-use-the-oed/glossary/). public.oed.com. Retrieved 3 October 2016.
6. **[^](#cite_ref-6 "Jump up")** ["Mwunabridged"](http://www.merriam-webster.com/premium/mwunabridged/). www.merriam-webster.com. Retrieved 3 October 2016.
7. **[^](#cite_ref-BBAW_7-0 "Jump up")** [The Deutsches Wörterbuch](http://www.bbaw.de/en/research/dwb) [Archived](https://web.archive.org/web/20160812083200/http://www.bbaw.de/en/research/dwb) 2016-08-12 at the [Wayback Machine](/wiki/Wayback_Machine "Wayback Machine") at the BBAW, retrieved 22-June-2012.

External links
--------------

* _**[Wiktionary:Lemmas](https://en.wiktionary.org/wiki/Wiktionary:Lemmas "wiktionary:Wiktionary:Lemmas")**_ in Wiktionary, the free dictionary.