1 - CSci 6907: Data Management and Exploration on the Web...


CSci 6907: Data Management and Exploration on the Web
Nan Zhang

Course Information
  Meeting time: Mondays 06:10-08:40PM
  Meeting location: Philips Hall, Room 108
  Office Hours: Mondays 12:00-2:00pm
  Office: Academic Center 715
  Phone: (202) 994-5919
  Email: nzhang10@gwu.edu
  Course Website
    o http://www.seas.gwu.edu/~nzhang10/6907/

Course Website

Course Schedule

Grading
  Homework
    o 5% * 5 = 25%
  Presentation
    o 25%
  Project
    o 50%

Surface Web
  Crawling and Information Retrieval
  from slides for Bing Liu, Web Data Mining, Springer, 2007.

The Deep Web
  Deep Web vs Surface Web
    o Dynamic contents, unlinked pages, private web, contextual web, etc.
    o Estimated size: 91,850 vs 167 terabytes [1], hundreds or thousands of times larger than the surface web [2]
  [1] SIMS, UC Berkeley, How much information? 2003
  [2] Bright Planet, Deep Web FAQs, 2010, http://www.brightplanet.com/the-deep-web/

Hidden Web Repositories
  Diagram labels: Web User, Hidden Repository, Owner

Deep Web Repository: Example I
  Enterprise Search Engines' Corpus
    o Unstructured data
    o Keyword search
    o Top-k
  Sample query: Asthma

Exploration: Example I
  Metasearch engine
    o Discovers deep web repositories of a given topic
    o Integrates query answers from multiple repositories
    o For result re-organization, evaluates the quality of each repository through analytics
      e.g., how large is the repository?
      e.g., average length of documents of a given topic
  Diagram labels: Disease info, Treatment info

Example II
  Yahoo! Auto, other online e-commerce websites
    o Structured data
    o Form-like search
    o Top-1500

Exploration: Example II
  Third-party services for an individual repository
    o Find fake products
    o Price distribution
    o Construction of a universal mobile interface
  Third-party services for multiple repositories
    o Repository comparison
    o Consumer behavior analysis
  Main Tasks
    o Resource discovery
    o Data integration
    o Single-/cross-site analytics

Example III
  Semi-structured data
  Graph browsing
  Local view
  Picture from Jay Goldman, Facebook Cookbook, O'Reilly Media, 2008.

Exploration: Example III
  For commercial advertisers:
    o Market penetration of a social network
    o Buzzword tracking
  For private detectives:
    o Find pages related to an individual
  For individual page owners:
    o Understand the (relative) popularity of one's own page
    o Understand how new posts affect the popularity
    o Understand how to promote the page
  Main Tasks: resource discovery and data integration are less of a challenge; analytics on very large amounts of data becomes the main challenge.

Summary of Main Tasks/Obstacles
  Find where the data are
    o Resource discovery: find URLs of deep web repositories
    o Required by: Metasearch engine, shopping website comparison, consumer behavior modeling, etc....
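
The Example I slides describe a repository that exposes only keyword search with a top-k limit, and name "average length of documents of a given topic" as one analytic a metasearch engine might want. The following is a minimal Python sketch of why that interface makes such analytics hard. It is an illustration only, not a method from the course: the class HiddenRepository, the function average_returned_length, the sample documents, and the choice of k are all hypothetical.

```python
from typing import List


class HiddenRepository:
    """Toy stand-in for a deep web repository (hypothetical, for illustration).

    Outsiders cannot list the documents; they can only issue keyword
    queries and see at most k matching documents per query.
    """

    def __init__(self, documents: List[str], k: int) -> None:
        self._documents = documents
        self._k = k

    def search(self, keyword: str) -> List[str]:
        """Return at most k documents that contain the keyword as a word."""
        needle = keyword.lower()
        matches = [d for d in self._documents if needle in d.lower().split()]
        return matches[: self._k]


def average_returned_length(repo: HiddenRepository, keyword: str) -> float:
    """Average word count over the documents one query returns.

    This is a naive estimate: it sees only the top-k results, so it can
    differ from the true average over all matching documents.
    """
    results = repo.search(keyword)
    if not results:
        return 0.0
    return sum(len(d.split()) for d in results) / len(results)


if __name__ == "__main__":
    docs = [
        "asthma treatment with inhaled steroids",
        "asthma symptoms",
        "asthma triggers include dust pollen and smoke",
        "diabetes treatment overview",
    ]
    repo = HiddenRepository(docs, k=2)
    # Three documents mention asthma, but the interface reveals only two.
    print(len(repo.search("asthma")))
    print(average_returned_length(repo, "asthma"))
```

In this toy setup the third asthma document never appears in the results, so both the repository size and the average document length observed from outside are off. That gap is the kind of obstacle the analytics tasks in these slides have to work around.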