paper_115

A Taxonomy of JavaScript Redirection Spam

Kumar Chellapilla
Microsoft Live Labs
One Microsoft Way
Redmond, WA 98052
+1 425 707 7575
kumarc@microsoft.com

Alexey Maykov
Microsoft Live Labs
One Microsoft Way
Redmond, WA 98052
+1 425 705 5193
amaykov@microsoft.com

ABSTRACT
Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediate (on page load) but may also be triggered by a timer or a harmless user event such as a mouse move. JavaScript redirection is the most notorious of the redirection techniques and is hard to detect because many of the prevalent crawlers are script-agnostic. In this paper, we study common JavaScript redirection spam techniques on the web. Our findings indicate that obfuscation techniques are highly prevalent among JavaScript redirection spam pages. These obfuscation techniques limit the effectiveness of static analysis and of systems based on static features. Based on our findings, we recommend a robust countermeasure using a lightweight JavaScript parser and engine.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.m [Information Storage and Retrieval]: Miscellaneous

General Terms
Algorithms, Measurement, Performance, Experimentation, Languages.

Keywords
Web search, web spam, JavaScript, redirection spam.

1. INTRODUCTION
Web spam pages can be broadly categorized by whether they employ boosting techniques, hiding techniques, or both [1]. While content and link spam comprise common search engine rank boosting techniques, cloaking and redirection spam are hiding techniques. Among the redirection spam techniques, script-based redirection is the most notorious and the most difficult to catch. Script redirection spam presents spam content to a script-agnostic crawler, but automatically redirects a script-capable browser to another URL as soon as the page is loaded.
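To illustrate why obfuscation limits static analysis while script execution does not, the following is a minimal sketch and not the authors' system. The function names, the stub `window` object, and the example URL are all hypothetical. The static check searches the script text for the word `location`; the dynamic check runs the script against a stub object and observes whether `location.href` changes.

```javascript
// Naive static check: looks for a literal reference to `location` in the text.
function staticLooksLikeRedirect(source) {
  return /\blocation\b/.test(source);
}

// Dynamic check: run the script against a stub `window` and report the URL
// it assigned to location.href, or null if none was assigned.
// NOTE: `new Function` is NOT a sandbox. It is used here only to keep the
// sketch short; untrusted scripts need an isolated engine.
function dynamicRedirectTarget(source) {
  const stubWindow = { location: { href: "" } };
  try {
    new Function("window", source)(stubWindow);
  } catch (err) {
    return null; // the script failed under the stub environment
  }
  return stubWindow.location.href || null;
}

// A plain redirect and an obfuscated one that builds the same property
// names and URL by string concatenation (hypothetical examples).
const plainScript = 'window.location.href = "http://example.com/target";';
const obfuscatedScript =
  'window["loc" + "ation"]["hr" + "ef"] = "http://" + "example.com/target";';

console.log(staticLooksLikeRedirect(plainScript));
console.log(staticLooksLikeRedirect(obfuscatedScript));
console.log(dynamicRedirectTarget(obfuscatedScript));
```

In this sketch the text-based check flags only the plain script, whereas executing either script against the stub reveals the same target URL.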
In this paper, we study common JavaScript redirection spam techniques on the web. A review of redirection techniques is presented in the rest of Section 1. Section 2 briefly presents previous work on redirection spam. JavaScript features that facilitate redirection and hiding are presented in Section 3. We present a data set of URLs and estimate the prevalence of JavaScript redirection spam in Section 4 and Section 5, respectively. Section 6 presents a taxonomy along with representative examples.

In this paper, we limit our analysis of script redirection spam to client-side scripts that run in the browser. Further, we use the term JavaScript [2] interchangeably with JScript [3]; these are the Mozilla Foundation's and Microsoft's implementations, respectively, of the ECMAScript standard [4].

Modern browsers can be redirected in one of three ways, namely, using HTTP protocol status codes, using a meta refresh tag in the page header, or using a client-side script.
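The three redirection mechanisms can be told apart with a small classifier. This is a rough sketch under stated assumptions: the `response` object shape (`status`, lowercase `headers`, `body`), the function name, and the example URLs are hypothetical, and the regular expressions cover only the simplest unobfuscated forms of each mechanism.

```javascript
// Classify which redirection mechanism, if any, a fetched page uses.
// `response` is a hypothetical plain object: { status, headers, body }.
function classifyRedirect(response) {
  const { status, headers = {}, body = "" } = response;

  // 1. HTTP status code: a 3xx response carrying a Location header.
  if (status >= 300 && status < 400 && headers.location) {
    return { mechanism: "http-status", target: headers.location };
  }

  // 2. Meta refresh: <meta http-equiv="refresh" content="0; url=...">.
  const meta = body.match(
    /<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i
  );
  if (meta) {
    const url = meta[0].match(/url\s*=\s*['"]?([^'"\s>]+)/i);
    return { mechanism: "meta-refresh", target: url ? url[1] : null };
  }

  // 3. Client-side script: a literal assignment to location or location.href.
  //    Only the plain form is matched; obfuscated scripts will be missed.
  const script = body.match(/location(?:\.href)?\s*=\s*["']([^"']+)["']/);
  if (script) {
    return { mechanism: "script", target: script[1] };
  }

  return { mechanism: "none", target: null };
}

// Usage with hypothetical responses, one per mechanism.
const viaStatus = classifyRedirect({
  status: 301,
  headers: { location: "http://example.com/a" },
});
const viaMeta = classifyRedirect({
  status: 200,
  body: '<meta http-equiv="refresh" content="0; url=http://example.com/b">',
});
const viaScript = classifyRedirect({
  status: 200,
  body: '<script>window.location.href = "http://example.com/c";</script>',
});
console.log(viaStatus.mechanism, viaMeta.mechanism, viaScript.mechanism);
```

The first two mechanisms are visible to a crawler without running any code, which is one way to read the paper's point that script-based redirection is the hardest of the three to catch.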