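Abstract In recent years, with the rapid development of the Internet, the volume of Web data has grown explosively. Finding the information one needs accurately and quickly has become an important problem. As user needs grow more varied, it is often hard to obtain all the required information directly from a search engine. To resolve this tension between users and search engines, web crawler technology, which handles the acquisition of network data within a search engine, has been evolving step by step. A web crawler is a program that automatically fetches web pages; it downloads pages from the World Wide Web for the search engine and is an important component of it [1]. Distributed web crawlers stand out among crawlers: compared with other crawler technologies, they have clear advantages in crawling speed and scale.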
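The thesis first introduces the basic principles, classification, and background knowledge of search engines and distributed web crawlers, classifying and describing the different kinds of crawlers from several perspectives. Then, starting from crawler architecture, it analyzes the characteristics of each kind of crawler in detail.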
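Finally, the thesis proposes a general design approach for crawling e-commerce product detail pages, and discusses and demonstrates the concrete process in detail. Taking Jingdong Mall as an example, it implements the crawling and downloading of product detail page information.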
Thesis keywords distributed; crawler; search engine; e-commerce; Jingdong Mall
Foreign-Language Abstract of the Graduation Project Report
Title Design and implementation of a distributed crawler server
Abstract In recent years, with the rapid development of the Internet, Web data has grown explosively. How to find information quickly and accurately has become a very important issue. As user requirements become broader and more diversified, it is difficult to get all the information we need directly from search engines. To resolve this contradiction between users and search engines, the web crawler, the part of a search engine responsible for acquiring data from the Internet, has been changing step by step. A web crawler is a program that extracts web pages automatically; it downloads pages from the World Wide Web for the search engine and is an important component of it [1]. The distributed web crawler is among the best of these: it has obvious advantages over other crawler technologies in search speed and scale.
First, the paper introduces the basic principles, classification, and background knowledge of search engines and distributed web crawlers, and describes the different crawlers from several angles. Then, starting from crawler architecture, the paper analyzes the characteristics of each kind of crawler.
Finally, this paper proposes a general design approach for crawling product detail pages from e-commerce websites, and discusses the specific process in detail. Using Jingdong Mall as an example, we show how the information on product detail pages is crawled and downloaded.
Keywords distributed; crawler; search engine; e-commerce; Jingdong Mall
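Contents
1 Introduction
1.1 Source of the Topic
1.2 Background and Significance of the Topic
1.3 Main Work of This Thesis
1.4 Organization of the Thesis
2 Fundamentals of Search Engines
2.1 Current State of Search Engines
2.2 Classification of Search Engines
2.3 Basic Principles of Search Engines
2.4 Components of Search Engines
2.5 Significance of Web Crawlers for Search Engines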