    A survey of the literature shows that no concrete work has been done on Flash Web pages. We therefore concentrate here on XML Web page classification, which also opens avenues for future research.
    3. XML URL Classification based on their semantic orientation
    The architecture of the proposed system, shown in Fig. 2, lays out the steps we followed to carry out the classification process. Each individual process operates on XML web pages, and each step is discussed in the following sections.
    Fig. 2 Architecture of the Proposed System
    3.1 Knowledge base
    The knowledge base is the domain knowledge used to guide the search or to evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, which organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources). Here, the Knowledge Base is created in four steps, as follows.
    (1) Redundancy analysis of the XML URL dataset
    (2) Source code extraction
    (3) Tag extraction using DOM structure
    (4) Knowledge Base creation by tag redundancy analysis
    3.1.1 XML URL Redundancy Analysis
    In our proposed classification method, redundancy analysis is the first step in creating the Knowledge Base. Once the XML URL datasets of each type (Pure XML, Code based XML, HTML embedded XML and RSS XML) have been created, they are processed individually in this phase.
    In the first step, the algorithm reads each source file (Pure XML, Code based XML, HTML embedded XML and RSS XML) line by line and fetches the URLs. Each fetched URL is tested against the destination file for redundancy using sequential search. If the URL is not present in the destination file it is appended; otherwise it is skipped. This process continues until the last URL in the source file has been read. The result is the set of unique XML URLs for each category.
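    The redundancy check described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `append_unique_urls` and the example URLs are hypothetical, and in-memory lists stand in for the source and destination files.

```python
def append_unique_urls(source_urls, destination):
    """Append each URL from source_urls to destination unless already present.

    Both arguments are plain lists here (an assumption); the text describes
    files that are read and written line by line.
    """
    for raw_line in source_urls:
        url = raw_line.strip()
        if not url:
            continue  # ignore blank lines in the source file
        # `in` on a list is a sequential search, as described in the text.
        if url not in destination:
            destination.append(url)
    return destination


# Hypothetical RSS XML source file content, one URL per line.
rss_urls = [
    "http://example.com/feed.xml\n",
    "http://example.com/news.xml\n",
    "http://example.com/feed.xml\n",
]
unique_rss = append_unique_urls(rss_urls, [])
```

    Sequential search costs time proportional to the size of the destination list for every URL; for large datasets a set would give the same result faster, at the price of departing from the procedure as described.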
    3.1.2 Source Code Extraction
    The result vector of the first step is given as input to the second step, which extracts the source code of each unique URL. The algorithm reads the URLs from the input file and retrieves the source code over Transmission Control Protocol (TCP). The extracted source code is saved in an automatically created destination file named after the URL number.
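    A possible sketch of this step is given below. The paper only states that the source is retrieved over TCP; the use of HTTP through Python's `urllib.request`, the file naming pattern `url_<number>.txt`, and the names `fetch_source` and `save_sources` are all assumptions made for illustration. The fetcher is passed in as a parameter so that the demonstration at the end can run with a stub instead of a real network connection.

```python
import os
import tempfile
import urllib.request


def fetch_source(url, timeout=10):
    """Download the raw source of a URL (HTTP over TCP, an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_sources(urls, out_dir, fetch=fetch_source):
    """Save the source of each URL to a file named after its URL number."""
    os.makedirs(out_dir, exist_ok=True)
    saved_paths = []
    for number, url in enumerate(urls, start=1):
        try:
            data = fetch(url)
        except OSError:
            # urllib's URLError is a subclass of OSError; skip unreachable URLs.
            continue
        path = os.path.join(out_dir, f"url_{number}.txt")
        with open(path, "wb") as handle:
            handle.write(data)
        saved_paths.append(path)
    return saved_paths


# Demonstration with a stub fetcher, so no network access is needed.
demo_dir = tempfile.mkdtemp()
demo_paths = save_sources(
    ["http://example.com/a.xml", "http://example.com/b.xml"],
    demo_dir,
    fetch=lambda url: b"<root/>",
)
```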
    3.1.3 Tag extraction using DOM Structure
    After the source code of the XML URLs has been extracted, the third step extracts the tags using the Document Object Model (DOM) tree structure. The extracted source code is read line by line and the algorithm locates the tags using the DOM. The tags found are then extracted and stored in a correspondingly named file.
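    Tag extraction through a DOM tree might look like the sketch below. It assumes the source is well-formed XML and uses Python's standard `xml.dom.minidom` parser; the paper does not say which DOM implementation was used, and HTML embedded XML pages would likely need a more tolerant parser. The function name `extract_tags` and the sample document are hypothetical.

```python
from xml.dom import minidom


def extract_tags(xml_source):
    """Return the tag name of every element in document order."""
    document = minidom.parseString(xml_source)
    # "*" matches every element, including the root element.
    return [element.tagName for element in document.getElementsByTagName("*")]


# Hypothetical source code of an RSS XML page.
sample = (
    "<rss><channel><title>News</title>"
    "<item><title>A</title></item></channel></rss>"
)
tags = extract_tags(sample)
# tags == ["rss", "channel", "title", "item", "title"]
```

    Repeated tags are kept at this stage, since duplicates are only removed in the next step.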
    3.1.4 Unique Tag Identification
    The result vector of Step 3 is processed here to identify the unique tags and create the knowledge base. In this phase, we read each tag file and compare its tags with those in the destination file. A tag is appended if it does not already exist in the destination file; otherwise it is skipped and we move to the next tag of the source file. This process is carried out for every tag file, with each comparison made against the destination file.
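    Merging several tag files into one knowledge base can be sketched as below. The name `build_knowledge_base` is hypothetical, and lists of tag names stand in for the tag files. Unlike the sequential comparison described in the text, this sketch tracks already seen tags in a set, which is a substituted technique that yields the same unique tags.

```python
def build_knowledge_base(tag_files):
    """Merge the tags of all tag files, keeping the first occurrence of each."""
    knowledge_base = []  # unique tags in order of first appearance
    seen = set()         # fast membership test (replaces sequential search)
    for tags in tag_files:
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                knowledge_base.append(tag)
    return knowledge_base


# Hypothetical tag files extracted from two RSS XML pages.
kb_rss = build_knowledge_base([
    ["rss", "channel", "title"],
    ["rss", "channel", "item", "title"],
])
```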
     
    Fig. 3 Block diagram of XML URL Classification
    All four steps are carried out on each type of XML URL in turn to create the tag dictionary (Knowledge Base). After the knowledge base has been created for each category of XML URL, matching and representation are performed on the testing dataset. For each testing XML URL, the source code and its tags are extracted.
    The extracted tags are then matched against the Knowledge Bases to identify the class of each URL. Matching is done against all four Knowledge Bases: KBRSS, KB Pure, KBHTML, and KB Code. Using string matching, the overall matching level is calculated as the number of tags matched divided by the number of tags in the source file. The Knowledge Base with the highest matching percentage is taken as the class of the URL.
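    The matching level and the class decision can be illustrated with the sketch below. The knowledge base contents are invented for the example, the function names `match_level` and `classify` are hypothetical, and the handling of ties (the first knowledge base with the top score wins) is an assumption, since the text does not say how ties are resolved.

```python
def match_level(tags, knowledge_base):
    """Fraction of the page's tags that appear in the knowledge base."""
    if not tags:
        return 0.0  # avoid division by zero for pages without tags
    matched = sum(1 for tag in tags if tag in knowledge_base)
    return matched / len(tags)


def classify(tags, knowledge_bases):
    """Return the best matching class name and the score of every class."""
    scores = {
        name: match_level(tags, kb) for name, kb in knowledge_bases.items()
    }
    best = max(scores, key=scores.get)  # first maximum wins on ties
    return best, scores


# Invented knowledge bases, one per category named in the text.
knowledge_bases = {
    "KBRSS": {"rss", "channel", "item", "title", "link"},
    "KB Pure": {"catalog", "book", "title"},
    "KBHTML": {"html", "body", "title"},
    "KB Code": {"project", "dependency"},
}
page_tags = ["rss", "channel", "title", "item", "title"]
best, scores = classify(page_tags, knowledge_bases)
# best == "KBRSS": all 5 tags match, a matching level of 1.0
```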