Open Access System for Information Sharing


 

Article
Cited 3 times in Web of Science; cited 3 times in Scopus
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field | Value | Language
dc.contributor.author | Han, WS | -
dc.contributor.author | Wooseong Kwak | -
dc.contributor.author | Hwanjo Yu | -
dc.contributor.author | Jeong-Hoon Lee | -
dc.contributor.author | Min-Soo Kim | -
dc.date.accessioned | 2016-04-01T08:09:56Z | -
dc.date.available | 2016-04-01T08:09:56Z | -
dc.date.created | 2013-10-23 | -
dc.date.issued | 2014-03-10 | -
dc.identifier.issn | 0020-0255 | -
dc.identifier.other | 2014-OAK-0000028660 | -
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/27370 | -
dc.description.abstract | Extracting tuples from HTML pages has been an important issue in various web applications. Commercial tuple extraction systems have enjoyed some success in extracting tuples by regarding HTML pages as tree structures and exploiting XPath queries to find attributes of tuples in the HTML pages. However, such systems would be vulnerable to small changes in the web pages. In this paper, we propose a robust tuple extraction system which utilizes spatial relationships among elements rather than XPath queries. Spatial information (e.g., 2-D coordinates) of elements is maintained in the DOM tree when a web page is rendered in a browser. Our system regards elements in the rendered page as spatial objects in the 2-D space and executes spatial joins to extract target elements. Since humans also identify an element in a web page by its relative spatial location, our system, which extracts elements by their spatial relationships, could potentially be as robust as manual extraction. To specify and execute spatial joins, we propose a new query language, RAQuery, based on topological relationships between any spatial objects in the 2-D space. We then propose spatial join algorithms that efficiently process RAQuery using the novel notions of group match and prunable relation group. We next propose a tuple construction algorithm that builds tuples from the elements obtained by the spatial joins; it can construct tuples even when no boundary HTML elements are specified for the tuples in the web page. Extensive experimental results using real HTML pages confirm that our solutions are far more robust than existing tuple extraction systems without sacrificing performance. (C) 2013 Elsevier Inc. All rights reserved. | -
dc.description.statementofresponsibility | X | -
dc.language | English | -
dc.publisher | ELSEVIER SCIENCE INC | -
dc.relation.isPartOf | INFORMATION SCIENCES | -
dc.title | Leveraging spatial join for robust tuple extraction from web pages | -
dc.type | Article | -
dc.contributor.college | 창의IT융합공학과 (Department of Creative IT Engineering) | -
dc.identifier.doi | 10.1016/J.INS.2013.09.027 | -
dc.author.google | Han W.-S., Kwak W., Yu H., Lee J.-H., Kim M.-S. | -
dc.relation.volume | 261 | -
dc.relation.startpage | 132 | -
dc.relation.lastpage | 148 | -
dc.contributor.id | 10056897 | -
dc.relation.journal | INFORMATION SCIENCES | -
dc.relation.sci | SCI | -
dc.collections.name | Journal Papers | -
dc.type.rims | ART | -
dc.identifier.bibliographicCitation | INFORMATION SCIENCES, v.261, pp.132 - 148 | -
dc.identifier.wosid | 000331689700008 | -
dc.date.tcdate | 2019-02-01 | -
dc.citation.endPage | 148 | -
dc.citation.startPage | 132 | -
dc.citation.title | INFORMATION SCIENCES | -
dc.citation.volume | 261 | -
dc.contributor.affiliatedAuthor | Han, WS | -
dc.contributor.affiliatedAuthor | Hwanjo Yu | -
dc.contributor.affiliatedAuthor | Jeong-Hoon Lee | -
dc.identifier.scopusid | 2-s2.0-84891829208 | -
dc.description.journalClass | 1 | -
dc.description.journalClass | 1 | -
dc.description.wostc | 2 | -
dc.description.scptc | 2 | *
dc.date.scptcdate | 2018-05-121 | *
dc.type.docType | Article | -
dc.subject.keywordAuthor | Tuple extraction | -
dc.subject.keywordAuthor | Wrapper | -
dc.subject.keywordAuthor | Spatial join | -
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | -
dc.description.journalRegisteredClass | scie | -
dc.description.journalRegisteredClass | scopus | -
dc.relation.journalResearchArea | Computer Science | -
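The abstract describes treating rendered page elements as spatial objects in 2-D space and extracting targets with spatial joins on topological relationships. As a rough illustration only, the Python sketch below models each element as an axis-aligned bounding box and runs a naive nested-loop join with a hypothetical `right_of` predicate. The `Box` type, `right_of`, `spatial_join`, and the sample coordinates are assumptions made for this example; this is not RAQuery syntax and not the paper's group-match or prunable-relation-group algorithms.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical model of a rendered element: a text label plus its
    axis-aligned bounding box in page coordinates (y grows downward)."""
    label: str
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge


def vertically_overlaps(a: Box, b: Box) -> bool:
    """True if the vertical extents of a and b intersect (open intervals)."""
    return a.y1 < b.y2 and b.y1 < a.y2


def right_of(a: Box, b: Box) -> bool:
    """Hypothetical predicate: b starts at or after a's right edge and the
    two boxes share some vertical extent (i.e., they sit on the same row)."""
    return b.x1 >= a.x2 and vertically_overlaps(a, b)


def spatial_join(
    outer: List[Box], inner: List[Box], pred: Callable[[Box, Box], bool]
) -> List[Tuple[str, str]]:
    """Naive nested-loop spatial join: return every (outer, inner) label pair
    whose boxes satisfy pred. Costs |outer| * |inner| predicate evaluations."""
    return [(a.label, b.label) for a in outer for b in inner if pred(a, b)]


# Illustrative coordinates only, not data from the paper.
labels = [Box("Price:", 10, 100, 60, 120), Box("Rating:", 10, 130, 60, 150)]
values = [Box("$25", 70, 100, 110, 120), Box("4.5", 70, 130, 100, 150)]

pairs = spatial_join(labels, values, right_of)
print(pairs)  # [('Price:', '$25'), ('Rating:', '4.5')]
```

Because the predicate looks only at rendered geometry, wrapping a value in an extra element would leave the result unchanged as long as its on-screen position stays the same, which is the kind of change that can break a hard-coded XPath. A practical system would also avoid the quadratic nested loop, for example with a spatial index or plane sweep.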



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


유환조 (YU, HWANJO)
Dept. of Computer Science & Engineering

