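Abstract: With the rapid development of the Internet, network applications have permeated every aspect of daily life. Email, one of the most important network applications, has become an indispensable part of modern communication. However, the flood of spam not only wastes users' time and resources but also consumes large amounts of network bandwidth and mail-server storage, and it poses a threat to network security. Preventing spam is therefore of real practical importance. The naive Bayes algorithm is widely used in text classification. This thesis analyzes the principle of the naive Bayes algorithm and its application to spam filtering, presents the complete filtering workflow, and designs and implements a Chinese-language spam filtering system in Python that achieves good results.
Keywords: naive Bayes; text classification; spam filtering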
A Classifier Based on Python
Abstract: With the rapid development of the Internet, network applications have penetrated every aspect of people's daily lives. E-mail, an important network application, has become an indispensable part of modern communication. However, spam not only wastes users' time and resources but also consumes considerable network bandwidth and mail-server storage space, and it poses a threat to network security. Preventing spam therefore has important practical significance. Naive Bayes is widely used in text classification. This paper analyzes the principle of naive Bayes and its application to spam filtering, presents the complete filtering process, and uses Python to design and implement a Chinese spam filtering system that achieves good results.
Keywords: naive Bayes; text classification; spam filtering
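To make the naive Bayes principle mentioned in the abstract concrete, here is a minimal Python sketch of a multinomial naive Bayes spam classifier with add-one (Laplace) smoothing. It is an illustration under stated assumptions, not the system implemented in this thesis: the class name `NaiveBayesSpamFilter`, the labels `"spam"`/`"ham"`, the `alpha` parameter and the toy training messages are all hypothetical, and messages are assumed to be already segmented into words (Chinese text would need word segmentation first).

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical multinomial naive Bayes classifier with add-alpha smoothing.

    Illustrative sketch only: messages are assumed to be pre-segmented
    lists of words, and both classes need at least one training message.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha                                   # 1.0 gives Laplace smoothing
        self.word_counts = {c: Counter() for c in self.LABELS}
        self.doc_counts = Counter()                          # training messages per class
        self.vocab = set()                                   # every word seen in training

    def train(self, tokens, label):
        """Add one labelled, tokenized message to the model."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        """Return log P(label) plus the sum of log P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)  # class prior
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def spam_probability(self, tokens):
        """Posterior P(spam | tokens), normalised over the two classes."""
        diff = self._log_score(tokens, "ham") - self._log_score(tokens, "spam")
        diff = max(min(diff, 700.0), -700.0)  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, tokens):
        """Return the label with the higher log score."""
        return max(self.LABELS, key=lambda c: self._log_score(tokens, c))


# Toy training data (hypothetical; English words used for readability)
demo = NaiveBayesSpamFilter()
for words, label in [
    (["free", "prize", "click"], "spam"),
    (["free", "discount", "link"], "spam"),
    (["meeting", "time", "schedule"], "ham"),
    (["project", "meeting", "report"], "ham"),
]:
    demo.train(words, label)

print(demo.classify(["free", "prize"]))                    # spam
print(round(demo.spam_probability(["free", "prize"]), 3))  # 0.857
```

Scores are accumulated as log probabilities so that long messages do not underflow to zero; the exact probability formulas and heuristics used by the thesis itself may differ from this sketch.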
Contents
Abstract (Chinese)
Abstract (English)
Contents
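1 Introduction
1.1 Overview of Classifiers
1.2 Steps in Implementing and Constructing a Classifier
1.3 Two Types of Classifiers
1.3.1 Decision Tree Classifiers
1.3.2 The Naive Bayes Model
2 Overview of the Python Language
3 Spam
3.1 Overview of Spam
3.2 Categories of Spam
3.3 Harms of Spam
3.3.1 Fraud
3.3.2 Soaring Costs of Spam
4 Naive Bayes Spam Filtering
4.1 History of Naive Bayes
4.2 Development of Naive Bayes
4.3 Mathematical Foundations
4.3.1 Computing the Probability That a Message Containing a Given Word Is Spam
4.3.2 The Spam Probability of a Single Word (Spamliness)
4.3.3 Combining the Probabilities of Individual Words
4.3.4 Alternative Forms of the Combined Probability Formula
4.3.5 Handling Rare Words
4.3.6 Other Heuristics
5 Algorithm
5.1 Algorithm Model
5.2 Design Approach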