Design and Implementation of a Distributed Crawler Server

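Abstract: In recent years, with the rapid development of the Internet, Web data has grown explosively, and finding the information one needs quickly and accurately has become an important problem. As user needs become more varied and diverse, it is often hard to obtain everything a user needs directly from a search engine. To resolve this tension between users and search engines, web crawler technology, which handles the data-acquisition part of a search engine, has been evolving step by step. A web crawler is a program that actively fetches web pages: it downloads pages from the World Wide Web for a search engine and is an important component of one [1]. Distributed web crawlers stand out among crawlers, with clear advantages over other crawling techniques in crawl speed and scale.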
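The basic loop a crawler runs, fetching a page, extracting its links, and queuing unseen links for later download, can be sketched as a breadth-first traversal. This is a minimal illustration, not the thesis's implementation: it assumes a hypothetical in-memory `FAKE_WEB` dictionary in place of real HTTP downloads so that it runs offline, and the names `fetch_links` and `crawl` are illustrative.

```python
from collections import deque

# Hypothetical stand-in for the Web: maps each URL to the links found on that page.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/1", "http://b.example/"],
    "http://a.example/1": ["http://a.example/"],
    "http://b.example/": ["http://b.example/2"],
    "http://b.example/2": [],
}


def fetch_links(url):
    """Pretend to download `url` and return its outgoing links (assumption: no real HTTP)."""
    return FAKE_WEB.get(url, [])


def crawl(seeds, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were 'downloaded'."""
    frontier = deque(seeds)  # URLs waiting to be fetched
    seen = set(seeds)        # URLs already queued, so nothing is crawled twice
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)    # a real crawler would store the downloaded page here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order


if __name__ == "__main__":
    print(crawl(["http://a.example/"]))
```

The `seen` set is what keeps the crawler from looping on cyclic links such as the one between `http://a.example/` and `http://a.example/1`; `max_pages` bounds the crawl.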

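This thesis first introduces the basic principles, classification, and fundamentals of search engines and distributed web crawlers, classifying and describing different crawlers from several perspectives. It then turns to crawler architectures and analyzes the characteristics of each type of crawler in detail.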
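One common way a distributed crawler divides work among its nodes is to partition the URL space by host name, so that every URL of a given site is handled by the same node. The sketch below illustrates that idea under this assumption only; it is not necessarily the architecture adopted in this thesis, and `assign_node` and `partition` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def assign_node(url, num_nodes):
    """Map a URL to one of `num_nodes` crawler nodes by hashing its host name.

    All URLs of the same host land on the same node, so each node can apply
    per-site politeness limits on its own and no URL is fetched by two nodes.
    """
    host = urlsplit(url).hostname or ""
    # md5 gives a hash that is stable across processes and machines,
    # unlike Python's built-in hash() for strings, which is randomized per run.
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, num_nodes):
    """Group URLs into one work queue per node."""
    queues = [[] for _ in range(num_nodes)]
    for url in urls:
        queues[assign_node(url, num_nodes)].append(url)
    return queues


if __name__ == "__main__":
    urls = ["http://a.example/", "http://a.example/item/1", "http://b.example/"]
    for node, queue in enumerate(partition(urls, 3)):
        print(node, queue)
```

A plain modulo scheme reassigns most hosts whenever the number of nodes changes; consistent hashing is a common alternative when nodes join or leave frequently.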

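Finally, this thesis proposes a general design for crawling product detail pages on e-commerce sites and describes and demonstrates the process in detail. Taking Jingdong Mall (JD.com) as an example, it crawls and downloads the information on the mall's product detail pages.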
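A "general" detail-page crawler usually separates the page-specific part (which elements hold which fields) from the generic extraction code, so that a new site only needs a new field table. The sketch below shows that separation with Python's standard `html.parser`. The class names in `FIELD_CLASSES` and the `SAMPLE` page are hypothetical and are not Jingdong Mall's actual markup; the parser also assumes each field's text sits directly inside one element with no nested tags.

```python
from html.parser import HTMLParser

# Hypothetical site-specific table: CSS class name -> output field.
# A real site's markup must be inspected and this table adjusted.
FIELD_CLASSES = {"item-name": "name", "item-price": "price", "item-shop": "shop"}


class DetailPageParser(HTMLParser):
    """Collects the text of elements whose class appears in `field_classes`."""

    def __init__(self, field_classes):
        super().__init__()
        self.field_classes = field_classes
        self.current = None  # field whose text is being read, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.field_classes:
                self.current = self.field_classes[cls]
                self.fields.setdefault(self.current, "")

    def handle_endtag(self, tag):
        self.current = None  # assumes no nested tags inside a field element

    def handle_data(self, data):
        if self.current is not None:
            self.fields[self.current] += data.strip()


def extract_product(html, field_classes=FIELD_CLASSES):
    """Return a dict of product fields found in one detail page."""
    parser = DetailPageParser(field_classes)
    parser.feed(html)
    parser.close()
    return parser.fields


SAMPLE = """
<html><body>
  <div class="item-name">Example Phone 128GB</div>
  <span class="item-price">1999.00</span>
  <a class="item-shop" href="#">Example Store</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_product(SAMPLE))
```

On some e-commerce sites, fields such as the price are loaded by separate requests after the page itself, so the downloaded HTML alone may not contain every field; the field table would then need to be paired with those additional requests.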

Keywords: distributed crawler; search engine; e-commerce; Jingdong Mall

Foreign-Language Abstract of the Graduation Design Report

Title: Design and Implementation of a Distributed Crawler Server

Abstract: In recent years, with the rapid development of the Internet, Web data has shown explosive growth, and finding information quickly and accurately has become a very important issue. As user requirements have become broader and more diversified, it is difficult to obtain all the information we need directly from search engines. To resolve this conflict between users and search engines, the web crawler, the part of a search engine responsible for acquiring data from the Internet, has been changing step by step. A web crawler is a program that extracts web pages automatically; it downloads pages from the World Wide Web for a search engine and is an important component of it [1]. Distributed web crawlers are among the best of these, with obvious advantages over other crawler technologies in crawl speed and scale.

First, this paper introduces the basic principles, classification, and fundamentals of search engines and distributed web crawlers, describing different crawlers from various angles. Then, starting from crawler architecture, it analyzes the characteristics of each kind of crawler.

Finally, this paper proposes a general approach to crawling the information on product detail pages of e-commerce websites and discusses the specific process in detail. Using Jingdong Mall as an example, it shows how the details of product pages are downloaded.

Keywords: distributed crawler; search engine; e-commerce; Jingdong Mall

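1 Introduction

1.1 Source of the Project

1.2 Background and Significance

1.3 Main Work of This Thesis

1.4 Organization of the Thesis

2 Fundamentals of Search Engines

2.1 Current State of Search Engines

2.2 Classification of Search Engines

2.3 Basic Principles of Search Engines

2.4 Components of a Search Engine

2.5 Significance of Web Crawlers for Search Engines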
