A Malicious Web Page Detection Method Based on Web Crawling and Multiple Features

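Abstract: With the rapid development and spread of communication networks and terminals, people depend on the network more and more. While the Internet brings great convenience, it also brings security risks. As network technology advances, malicious websites disguise themselves more effectively and become harder to detect. Some attackers exploit network vulnerabilities to turn a website into a malicious one; when a user visits such a site, the computer may be compromised without the user noticing, which can crash the system.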

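This thesis uses a web crawler to fetch web pages and regular expressions to match the normal and malicious scripts needed, that is, JavaScript code segments, which are then extracted and saved to files. Known malicious code and normal code are compared to summarize and extract features, and the features are saved to a file as the input for classifier training. Training on different categories of malicious script code yields the corresponding classifier models, and finally classifier tests verify the usability and accuracy of the models. The malicious web page detection function is implemented in code.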
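The extraction step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the regular expression, the indicator tokens in `INDICATORS`, and the function names `extract_scripts` and `script_features` are all hypothetical, since the abstract does not list the actual patterns or features. Fetching the page (the crawler itself) is left out; the sketch starts from HTML that has already been downloaded.

```python
import re

# Matches inline <script>...</script> elements.
# DOTALL lets "." span newlines; IGNORECASE accepts <SCRIPT> as well.
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)

# Hypothetical indicator tokens, chosen only for illustration.
INDICATORS = ("eval(", "unescape(", "document.write(", "fromCharCode(")


def extract_scripts(html: str) -> list[str]:
    """Return the non-empty bodies of inline <script> elements."""
    return [body.strip() for body in SCRIPT_RE.findall(html) if body.strip()]


def script_features(script: str) -> dict[str, float]:
    """Turn one script into a small numeric feature dictionary."""
    # One count per indicator token.
    features = {f"count:{tok}": float(script.count(tok)) for tok in INDICATORS}
    # Total script length in characters.
    features["length"] = float(len(script))
    # Length of the longest whitespace-free run, a rough hint of packed code.
    longest = max((len(run) for run in re.findall(r"\S+", script)), default=0)
    features["longest_token"] = float(longest)
    return features
```

Each feature dictionary could then be written to a file, one row per script, and used as classifier input. A regular expression is a pragmatic choice for a sketch like this, but it is not a full HTML parser and can miss or mis-split unusual markup.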

Keywords: malicious script, JavaScript, web crawler, classifier, feature extraction

Graduation Project Report (Thesis): Foreign-Language Abstract

Title: A Malicious Web Page Detection Method Based on Multiple Features

Abstract: With the rapid development and spread of communication networks and terminals, people's dependence on the network keeps growing. The Internet has brought great convenience, but it has also brought security risks. As network technology advances, malicious websites are becoming better disguised and harder to detect. Some attackers exploit network vulnerabilities to turn a website into a malicious one; when people visit it, the computer may be compromised without their noticing, which can crash the system.

This thesis uses a web crawler to fetch web pages and regular expressions to match the normal and malicious scripts needed, that is, JavaScript snippets, which are then extracted and stored in files. Known malicious code and normal code are compared to summarize and extract features, which are saved to a file as the input for classifier training. Training on different categories of malicious script code yields the corresponding classifier models, and finally classifier tests verify the usability and accuracy of the models. A malicious web page detection function is implemented in code.

Keywords: malicious script, JavaScript, web crawler, classifier, feature extraction

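1 Introduction

1.1 Background and Significance

1.2 Related Work on Malicious Web Page Detection

1.3 Organization of This Thesis

2 Malicious Scripts in Web Pages

2.1 Malicious Web Page Scripts

2.2 Scripting Languages

3 Sample Feature Selection and Extraction

3.1 Data Acquisition

3.2 Sample Feature Selection and Extraction

4 Classifier Training and Testing

4.1 Introduction to Classifiers

4.2 Classifier Training

4.3 Classifier Testing and Results

Conclusion

References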

1 Introduction

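With the rapid development and spread of communication networks and terminals, people depend on the network more and more. At the same time, malicious websites of every kind keep emerging. According to data from Google's search center, more than 10% of web pages are malicious. In China in particular, malicious pages account for as much as 43.21% of all web pages, so the network security situation is increasingly severe [1]. Malicious web page detection is therefore a topic of real practical significance. As network technology advances, malicious websites are disguised more effectively and are harder to spot, so detection methods must keep pace with the rapid development of the Internet. This project analyzes the characteristics of malicious websites fairly comprehensively at multiple levels in order to implement a detection method suited to today's network environment.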
