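Chinese Abstract
Comparative sentences are a highly persuasive way for people to express personal opinions. With the continuing development of Internet technology, a large amount of user-generated content (UGC) has appeared online, and comparative sentences are often used in it to compare products of the same category. Using natural language processing techniques to identify comparative sentences and extract their elements is currently a valuable line of research. The results can provide enterprises with user opinions and give Internet users comprehensive product evaluations and points to note.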
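The work of this thesis is as follows:
(1) Rule-based Chinese comparative sentence recognition
Comparative-sentence rules are summarized from a large corpus to form a template library, and the templates are matched directly against the sentences to be identified. Tests in the automotive electronics domain show that the method is effective.
(2) Element extraction based on rules and word segmentation
By combining a product name table, an attribute table, and comparison rules, the method extracts the comparison subject, the comparison object, the compared attribute, and the comparison rule. Tests in the automotive electronics domain show that the method is feasible.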
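The template-matching recognition in item (1) can be illustrated with a minimal Python sketch. The template names and regular expressions in `TEMPLATES` are hypothetical examples of common Chinese comparative constructions (比, 不如, 相比, 优于, "A 和 B 一样"), not the thesis's actual template library; under this sketch, a sentence is labelled comparative when at least one template matches it.

```python
from __future__ import annotations

import re

# Hypothetical template library. Each entry is a regular expression for a
# common Chinese comparative construction; the thesis builds its own library
# from corpus statistics, which is not reproduced here.
TEMPLATES = {
    # "A 比 B ...": exclude 比较 (relatively), 比如 (for example), 比例 (ratio)
    "bi": re.compile(r"比(?!较|如|例)"),
    # "A 不如 B ...": A is not as ... as B
    "buru": re.compile(r"不如"),
    # "与 B 相比 ..." / "比起 B ..."
    "xiangbi": re.compile(r"相比|相较|比起"),
    # verbs expressing superiority or inferiority
    "superior": re.compile(r"优于|胜过|超过|逊于|不及"),
    # equality: "A 和 B 一样 ..."
    "equal": re.compile(r"(和|跟|与|同).+?(一样|相同|差不多)"),
}


def match_templates(sentence: str) -> list[str]:
    """Return the names of all templates that match the sentence, in library order."""
    return [name for name, pattern in TEMPLATES.items() if pattern.search(sentence)]


def is_comparative(sentence: str) -> bool:
    """Label a sentence as comparative if at least one template matches."""
    return bool(match_templates(sentence))


if __name__ == "__main__":
    for s in ["A车的油耗比B车更低", "这款车的空间比较大", "与B车相比,A车更省油"]:
        print(s, is_comparative(s), match_templates(s))
```

Because the templates are plain regular expressions, extending the library only means adding entries; the negative lookahead on 比 shows one way a rule can exclude non-comparative uses such as 比较 ("relatively").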
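Keywords: comparative sentence recognition; comparative sentence element extraction; rules; word segmentation
Graduation Thesis Design Report: English Abstract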
Title: Mining Algorithms for Comparative Reviews on E-Commerce Platforms
Abstract
Comparative sentences are a persuasive way for people to express personal opinions.
With the continuing development of Internet technology, a large amount of user-generated content (UGC) has appeared online, and comparative sentences are commonly used in it to compare products of the same category. Using natural language processing to identify comparative sentences and extract their elements is currently a valuable research topic. The results can provide enterprises with user opinions and give Internet users comprehensive product evaluations and points to note.
The main work of this thesis covers the following two aspects:
(1) Rule-Based Chinese Comparative Sentence Recognition
Rules for comparative sentences are summarized from a large corpus to form a template library, and the templates are matched directly against the sentences to be identified. Tests in the automotive electronics domain show that the method is effective.
(2) Comparative Sentence Element Extraction Based on Rules and Word Segmentation
Based on the comparative pattern set, the product name set, the attribute set, and other resources, and taking into account how comparative sentences appear in the corpus, the method extracts comparative relations. Tests in the automotive electronics domain show that the method is feasible.
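The element extraction in item (2) can be sketched in the same way. In this sketch, `PRODUCTS`, `ATTRIBUTES`, and `KEYWORDS` are small hypothetical stand-ins for the product name set, attribute set, and comparative pattern set, and `segment` is a simple forward-maximum-matching segmenter used in place of whatever segmentation tool the thesis actually used. Only the 比 and 不如 patterns are handled: the first product before the keyword becomes the subject, the first product after it becomes the object, the first attribute word in the sentence becomes the compared attribute, and the matched pattern is reported as the rule.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical resources; the thesis builds its own product and attribute
# libraries for the automotive electronics domain. These entries are made up.
PRODUCTS = {"A车", "B车", "宝马X5", "奥迪Q7"}
ATTRIBUTES = {"油耗", "空间", "动力", "价格"}
# Hypothetical comparative keywords mapped to rule names.
KEYWORDS = {"比": "bi", "不如": "buru"}

LEXICON = PRODUCTS | ATTRIBUTES | set(KEYWORDS)
MAX_WORD_LEN = max(len(w) for w in LEXICON)


def segment(sentence: str) -> list[str]:
    """Forward maximum matching against LEXICON; unknown characters become single tokens."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in LEXICON:
                tokens.append(piece)
                i += size
                break
    return tokens


@dataclass
class Comparison:
    subject: str | None
    obj: str | None
    attribute: str | None
    rule: str | None


def extract(sentence: str) -> Comparison:
    """Extract subject, object, attribute, and rule around the first comparative keyword."""
    tokens = segment(sentence)
    pos = next((k for k, t in enumerate(tokens) if t in KEYWORDS), None)
    if pos is None:
        return Comparison(None, None, None, None)
    before, after = tokens[:pos], tokens[pos + 1:]
    subject = next((t for t in before if t in PRODUCTS), None)
    obj = next((t for t in after if t in PRODUCTS), None)
    attribute = next((t for t in tokens if t in ATTRIBUTES), None)
    return Comparison(subject, obj, attribute, KEYWORDS[tokens[pos]])


if __name__ == "__main__":
    print(extract("A车的油耗比B车更低"))
    print(extract("宝马X5的动力不如奥迪Q7"))
```

Forward maximum matching is used here only to keep the sketch free of third-party dependencies; any dictionary-based segmenter loaded with the same product and attribute tables would play the same role.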
Keywords: comparative sentence recognition; comparative sentence element extraction; rules; word segmentation
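Contents
1 Introduction
1.1 Research Background and Significance
1.2 Research Status in China and Abroad
1.3 Main Research Content of This Thesis
1.4 Organization of This Thesis
2 Comparative Sentence Recognition and Element Extraction: Basic Tasks, Techniques, and Evaluation Metrics
2.1 Overview of Comparative Sentence Recognition and Element Extraction
2.2 Overview of Mainstream Comparative Sentence Recognition Techniques
2.3 Overview of Mainstream Comparative Element Extraction Techniques
2.4 Overview of Evaluation Metrics
2.5 Chapter Summary
3 Rule-Based Chinese Comparative Sentence Recognition
3.1 Data Preprocessing
3.2 Comparative Sentence Recognition Algorithm
3.3 Analysis of Recognition Experiment Results
3.4 Error Analysis of Comparative Sentence Recognition
3.5 Chapter Summary
4 Comparative Element Extraction Based on Rules and Word Segmentation
4.1 Building the Product and Attribute Libraries
4.2 Core Algorithm for Comparative Element Extraction
4.3 Analysis of Experimental Results
4.4 Error Analysis of the Element Extraction Method
4.5 Chapter Summary
5 Conclusions and Outlook
5.1 Main Conclusions
5.2 Future Work
Acknowledgements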