    XML URL Classification based on their semantic structure orientation for Web Mining Applications
    Research on the Web is an emerging field. Examples include improving the quality of the Web through usability testing, Web information extraction, browsing the Web on small-screen devices such as mobile phones and PDAs (Personal Digital Assistants), and tracking product opinions by analyzing user reviews. In general, this work is called Web Mining. According to the analysis target, Web Mining can be divided into three types: Web Usage Mining, Web Structure Mining and Web Content Mining.
    The World Wide Web Consortium (W3C) stated that HTML has many drawbacks, such as a limited set of defined tags, case insensitivity, a semi-structured nature, and a design aimed only at displaying data with limited options. To overcome these difficulties, technologies such as XML and Flash (with good design options) were later introduced. Web developers therefore started migrating to these emerging Web technologies to better describe the semantic structure of Web page contents, and today more Web pages are built with XML and Flash technologies [3].
    Many research fields have opened up around these new technologies. We proposed a dataset creation technique for XML URLs [4]. We then analyzed the dataset by XML semantic structure orientation type and categorized it into four types: Pure XML Web pages, RSS XML Web pages, HTML Embedded XML Web pages, and Code Based/Sitemap XML Web pages. Fig. 1 depicts the XML URL categories. In this article we focus on XML URL classification, proposing a new method based on semantic orientation for future Web mining applications such as Web page segmentation, noise removal, Web page adaptation, and Search Engine Optimization (SEO).
     Fig. 1 Dataset Analysis and Classification
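    The four categories can be illustrated with a minimal sketch that labels an XML document by its root element and namespaces. This is not the algorithm proposed in this paper; the function names, the label strings and the rules (an `rss` root for RSS, a `urlset` or `sitemapindex` root for sitemaps, any XHTML-namespaced element for HTML Embedded XML) are assumptions chosen only to make the categories concrete.

```python
import xml.etree.ElementTree as ET

# Assumed marker for HTML content embedded in XML: the XHTML namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def local_name(tag):
    """Strip a namespace such as '{http://...}urlset' down to 'urlset'."""
    return tag.rsplit("}", 1)[-1].lower()


def classify_xml(text):
    """Return one of the four category labels for an XML document string.

    Hypothetical heuristic for illustration, not the paper's method.
    """
    root = ET.fromstring(text)
    name = local_name(root.tag)
    if name == "rss":
        return "RSS XML"
    if name in ("urlset", "sitemapindex"):
        return "Code Based/Sitemap XML"
    # Any element in the XHTML namespace (or an <html> element) counts as
    # embedded HTML under this sketch's assumption.
    for element in root.iter():
        if element.tag.startswith(XHTML_NS) or local_name(element.tag) == "html":
            return "HTML Embedded XML"
    return "Pure XML"


if __name__ == "__main__":
    sample = '<rss version="2.0"><channel><title>News</title></channel></rss>'
    print(classify_xml(sample))
```

    A real classifier would first have to fetch the content behind each URL and handle malformed documents, which this sketch leaves out.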
    Contribution: In light of the deficiencies of the above-mentioned manual process, in this paper we propose an algorithm to classify XML URLs based on their semantic structure orientation. We then analyze the accuracy of the system through extensive experiments, using accuracy measures such as Precision and Recall. Experimental results show that the proposed method achieves an overall accuracy of 97.36%.
    Organization: After providing basic information about XML URLs and why they matter as a research area in Section 1, we present related work in Section 2. Section 3 presents the knowledge base creation method for XML URL classification. Section 4 describes the training and testing phases of the proposed system, and Section 5 presents the results and analysis of the experiments conducted on the proposed system using our XML URL dataset [4].
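    For reference, Precision and Recall for a multi-class classifier can be computed per category as in the sketch below. The function name and the sample labels are illustrative assumptions; they are not the paper's data or results.

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1   # correctly assigned to this category
        else:
            fp[guess] += 1   # wrongly assigned to the guessed category
            fn[truth] += 1   # missed for the true category
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    truth = ["rss", "rss", "pure", "sitemap"]
    guess = ["rss", "pure", "pure", "sitemap"]
    for label, (p, r) in sorted(precision_recall(truth, guess).items()):
        print(f"{label}: precision={p:.2f} recall={r:.2f}")
```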
    2. Related Works
    In 2003, the Vision Based Page Segmentation (VIPS) algorithm [3] was proposed to extract the semantic structure of a Web page. The semantic structure is a hierarchy in which each node corresponds to a block and is assigned a value indicating its degree of coherence based on visual perception. VIPS may not work well when blocks are not visibly different, and in many cases the weights of visual separators are measured inaccurately, because it does not take document object model (DOM) tree information into account.
    Gestalt Theory [5] is a psychological theory that can explain the human visual perceptive process. Four basic laws, Proximity, Similarity, Closure and Simplicity, are drawn from Gestalt Theory and implemented in a program to simulate how humans understand the layout of Web pages. A graph-theoretic approach [6] was introduced to decide whether nodes of the DOM tree should be placed together. The authors of [7] proposed a novel Web page segmentation algorithm based on finding the Gomory-Hu tree in a planar graph. The algorithm first distils vision and structure information from a Web page to construct a weighted undirected graph, whose vertices are the leaf nodes of the DOM tree and whose edges represent the visible position relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Since the graph is planar, the algorithm is very efficient.
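    The idea of Gomory-Hu tree based clustering can be sketched as follows. This is a minimal illustration, not the implementation from [7]: it builds the tree following Gusfield's formulation (n-1 minimum-cut computations, here with a plain Edmonds-Karp max flow), and then assumes the simplest partitioning rule, removing the k-1 lightest tree edges. The graph, its weights and all function names are hypothetical, and the sketch does not exploit planarity.

```python
from collections import deque


def min_cut(n, cap, s, t):
    """Edmonds-Karp max flow. Returns (cut value, nodes on the s side)."""
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: the nodes reached form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        push, v = float("inf"), t
        while v != s:                      # bottleneck along the path
            push = min(push, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # update residual capacities
            residual[parent[v]][v] -= push
            residual[v][parent[v]] += push
            v = parent[v]
        flow += push


def gomory_hu_tree(n, cap):
    """Return tree edges (node, parent, weight) for an undirected graph."""
    parent = [0] * n
    weight = [0] * n
    for s in range(1, n):
        t = parent[s]
        value, side = min_cut(n, cap, s, t)
        weight[s] = value
        for i in range(n):
            if i != s and i in side and parent[i] == t:
                parent[i] = s
        if parent[t] in side:              # re-hang s above t when needed
            parent[s] = parent[t]
            parent[t] = s
            weight[s] = weight[t]
            weight[t] = value
    return [(v, parent[v], weight[v]) for v in range(1, n)]


def cluster(n, tree_edges, k):
    """Split into k clusters by dropping the k-1 lightest tree edges."""
    kept = sorted(tree_edges, key=lambda e: e[2], reverse=True)[: n - k]
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for u, v, _ in kept:
        root[find(u)] = find(v)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=min)


if __name__ == "__main__":
    # Hypothetical page graph: two tightly linked groups of leaf nodes
    # joined by one weak edge.
    edges = [(0, 1, 5), (0, 2, 5), (1, 2, 5), (3, 4, 5), (3, 5, 5), (4, 5, 5), (2, 3, 1)]
    cap = [[0] * 6 for _ in range(6)]
    for u, v, w in edges:
        cap[u][v] = cap[v][u] = w
    print(cluster(6, gomory_hu_tree(6, cap), 2))
```

    In this toy graph the weak edge between the two groups becomes the lightest tree edge, so cutting it separates the two blocks.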