COMP 3322B Modern Technologies on World Wide Web
2nd semester 2018-2019
WWW Basics (O2)
Dr. C Wu, Department of Computer Science, The University of Hong Kong

Web

The Web, or World Wide Web (WWW, W3)
- an Internet application
- an information space where documents and other web resources are identified by URIs, interlinked by hypertext links, and can be accessed via the Internet

A web page contains
- a base HTML file
- referenced objects: HTML files, JPEG images, audio/video clips, etc.
- each object is addressable by a URI

[Figure: a Web browser and a Web server]

URI

URI (Uniform Resource Identifier): a string of characters used to identify a resource, with the syntax below

  scheme:[//[user:password@]hostname[:port]][/]path[?query][#fragment]

- query: along with the path, identifies a resource; often used to carry information in the format of "key=value" pairs
- fragment: identifies a secondary resource (e.g., some portion or subset of the primary resource)

Examples: …

URI (cont'd)

There are two types of URIs

URL (Uniform Resource Locator)
- the most common form of URI
- in addition to identifying a resource, specifies the means of locating the resource by specifying both its primary access mechanism and network location

Examples (the slide labels the scheme and the hostname or IP address):
  http://
  telnet://192.0.2.16:23/

URI (cont'd)

URN (Uniform Resource Name)
- identifies a resource by name in a particular namespace
- a URN can be used to refer to a resource without implying its location or how to access it
- e.g., the International Standard Book Number (ISBN) for uniquely identifying a book is a URN, "urn:isbn:0-486-27557-4"
- e.g., "urn:ietf:rfc:2141" refers to IETF's RFC 2141

HTTP

Hypertext Transfer Protocol
- the Web's application-layer protocol
- includes two types of messages: HTTP request and HTTP response
- implemented in the client-server model
  - client program: a browser that requests, receives and "displays" web objects
  - server program: a Web server that sends objects in response to requests

[Figure: a PC running Explorer and a Mac running Safari exchange HTTP requests and HTTP responses with a server running the Apache Web server]

Client and server interaction

First HTTP request
- The first HTTP request is initiated by the user, by typing the URL (http://) in the browser.
- HTTP request (client to server): File: /~cwu/c0322/index.html

Client and server interaction

The file /~cwu/c0322/index.html on the server:

  <html>
    <head>
    <link rel="stylesheet" type="text/css" href="style.css">
    </head>
    <body>
      <img border="0" src="picture/header.jpg">
      <h1>This is a sample web page.</h1>
    </body>
  </html>

[Figure: the client's browser window shows an image placeholder ("Image ?") and the text "This is a sample web page."]

HTTP request (client to server): File: /~cwu/c0322/index.html
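As a rough illustration of the URI syntax and of the first request, the sketch below splits a URL into its components with Python's standard urllib.parse and then assembles the text of a GET request for it. The helper name build_get_request is hypothetical, and the host i.cs.hku.hk is borrowed from the Host header example elsewhere in these slides; pairing it with this path is an assumption, since the URL on the slide is not preserved.

```python
from urllib.parse import urlsplit

# Split one of the slide's URL examples into its components.
parts = urlsplit("telnet://192.0.2.16:23/")
print(parts.scheme, parts.hostname, parts.port, parts.path)


def build_get_request(url: str) -> str:
    """Hypothetical helper: build the text of a GET request for `url`."""
    u = urlsplit(url)
    target = u.path or "/"
    if u.query:
        target += "?" + u.query
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {u.netloc}",
        "Connection: keep-alive",
        "User-Agent: Mozilla/5.0",
        "",  # blank line that ends the header section
        "",  # empty body
    ]
    return "\r\n".join(lines)


# Assumed URL: the host is not given on the slide.
request = build_get_request("http://i.cs.hku.hk/~cwu/c0322/index.html")
print(request)
```

The fragment part of a URI is deliberately left out of the request: it is used by the browser and is not sent to the server.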
HTTP response (server to client): Content-Type: text/html, followed by the HTML file

  <html>
    <head>
    <link rel="stylesheet" type="text/css" href="style.css">
    …

Client and server interaction

Subsequent HTTP requests
- The browser receives the index.html page and continues to initiate subsequent HTTP requests to GET the required objects from the server.
- HTTP request (client to server): File: /~cwu/c0322/picture/header.jpg
- HTTP response (server to client): Content-Type: image/jpeg, followed by the image file

[Figure: the client's browser window shows an image placeholder ("Image ?") and the text "This is a sample web page."]

Client and server interaction

The file /~cwu/c0322/style.css on the server:

  h1 {
    font-family: Arial, Helvetica, Times;
    font-size: 20pt;
    font-style: italic;
    color: red;
  }

- HTTP request (client to server): File: /~cwu/c0322/style.css
- HTTP response (server to client): Content-Type: text/css, followed by the CSS file

[Figure: the client's browser window shows the styled text "This is a sample web page."]

HTTP 1.1

- HTTP 1.1 has been the widely used HTTP version since its standardization in 1997
- multiple objects can be sent over the same TCP connection between client and server (persistent connection)

[Figure: client and server timeline. "Connect?" and "OK!" form the connection setup overhead; they are followed by the file request and response for the 1st file, then the file request and response for the 2nd file, and so on, all on the same connection.]

- A keep-alive time (for the connection) is specified in the headers of the HTTP request and response messages.
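The subsequent requests described above can be sketched as follows: after receiving index.html, the browser scans it for referenced objects and resolves each reference against the page's own URL. This is a minimal sketch using Python's standard html.parser and urllib.parse; the class name ObjectFinder is hypothetical, the host i.cs.hku.hk is an assumption (the slide's URL is not preserved), and a real browser recognises many more kinds of reference than link and img.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ObjectFinder(HTMLParser):
    """Collect the URLs referenced by <link href=...> and <img src=...>."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.objects = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and "href" in attrs:
            self.objects.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and "src" in attrs:
            self.objects.append(urljoin(self.base_url, attrs["src"]))


# Assumed page URL; only the path comes from the slides.
BASE = "http://i.cs.hku.hk/~cwu/c0322/index.html"
PAGE = """<html>
  <head>
  <link rel="stylesheet" type="text/css" href="style.css">
  </head>
  <body>
    <img border="0" src="picture/header.jpg">
    <h1>This is a sample web page.</h1>
  </body>
</html>"""

finder = ObjectFinder(BASE)
finder.feed(PAGE)
for url in finder.objects:
    print("GET", url)
```

With a persistent connection, the requests for these objects can reuse the TCP connection that carried index.html instead of paying the connection setup overhead again.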
HTTP/2

- HTTP/2 is the newer version of HTTP, standardised in May 2015
- developed based on Google's SPDY protocol
- supported by most major browsers by the end of 2015; supported by 32.5% of the top 10 million websites as of January 2019
- highly compatible with HTTP 1.1: most header fields, status codes, methods, etc.
- improving page load speed by
  - data compression of HTTP headers
  - server responds with more data than what the client requested
  - allowing multiple requests and responses to be in flight concurrently in the same TCP connection
  - etc.
- providing negotiation mechanisms for client and server to decide whether to use HTTP 1.1 or HTTP/2

HTTP/2

[Figure: three client and server timelines, each starting with "Connect?" and "OK!".
 HTTP 1.1: Request index.html, Response index.html, then Request header.jpg, Response header.jpg.
 HTTP/2, server responds with more data than what the client requested: Request index.html, Response index.html, header.jpg, …
 HTTP/2, allow multiple requests and responses to be in flight concurrently in the same TCP connection: Request index.html and Request books.html are sent back to back, followed by the two responses.]

HTTP is media independent

- Any type of data can be sent by HTTP as long as both the client and the server know how to handle the data
- How content is handled is determined by the Multipurpose Internet Mail Extensions (MIME) specification
- The data type is determined by Content-Type: type/subtype

  type     subtypes
  text     plain, html, css, xml, etc.
  image    jpeg, gif, etc.
  video    mpeg, quicktime, etc.
  …        …

- It can be included as one header in an HTTP request or response
- For example, if the content is an HTML web page, then Content-Type is text/html; if the content is a JPEG image, then Content-Type is image/jpeg.
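To make the type/subtype idea concrete, the sketch below derives a Content-Type header from a file name using Python's standard mimetypes module. The helper name content_type_header is hypothetical, and the fallback to application/octet-stream for unknown extensions is a common convention rather than something stated in the slides.

```python
import mimetypes


def content_type_header(filename: str) -> str:
    """Hypothetical helper: guess the MIME type from the file extension."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown types are labelled as generic binary data.
    return f"Content-Type: {mime_type or 'application/octet-stream'}"


for name in ("index.html", "picture/header.jpg", "style.css"):
    print(name, "->", content_type_header(name))
```

A real server may use its own configuration rather than file extensions to decide the type, so this is only one possible way to fill in the header.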
HTTP is stateless

- Stateless means that the server maintains no information about past client requests
- HTTP was originally designed to transmit static web pages only (i.e., the same web page will not have different contents for different clients)
- Q: Then how can some web pages remember one user's login session?
- A: HTTP alone cannot provide tailored responses for different client requests. Other web technologies were invented to keep the "state" of each client session, e.g., "cookies" or "sessions".

HTTP request

HTTP request message format: request line, header, blank line, body

Request line: request method, space, URL, space, HTTP version, \r\n

Example:
  GET /~cwu/c0322/index.html HTTP/1.1

Methods
- GET: Request a web page.
- HEAD: Request a web page's header only. The server will not include the body of the web page in its response.
- POST: One function of POST is to provide a block of data to the server, such as the data entered in a form, to a data-handling process at the server.

For a complete list, please refer to RFC 7231 (…/html/rfc7231#page-24)

HTTP request

Header: each header line has the form "header type: value" and ends with \r\n

Example:
  GET /~cwu/c0322/index.html HTTP/1.1
  Host: i.cs.hku.hk
  Connection: keep-alive
  User-Agent: Mozilla/5.0

Headers (with example values)
- Host (i.cs.hku.hk): Shows the host and port number (default 80) of the resource being requested.
- Connection (keep-alive / close): Whether the TCP connection should be closed or not after retrieving the object.
- User-Agent (Mozilla/5.0): The name of the client that initiates the request.
- Keep-Alive (300): If Connection is "keep-alive", keep the TCP connection alive for 300 seconds.
- Referer (…index.html): This request was initiated by clicking a hyperlink in the page named by this value. Note that the header name is spelled "Referer" in HTTP.

For a complete list, please refer to RFC 7231 (…/html/rfc7231#page-33)

HTTP request

Blank line and body

  GET /~cwu/c0322/index.html HTTP/1.1
  Host: i.cs.hku.hk
  Connection: keep-alive
  User-Agent: Mozilla/5.0
  (blank line)
  (body, empty here)

- There is no content in the "Body" part of this GET HTTP request.
- The "Body" can be used to carry data such as form data submitted by the client.

HTTP response

HTTP response message format: status line, header, blank line, body

Status line: HTTP version, space, status code, space, status phrase, \r\n

Example:
  HTTP/1.1 200 OK

Status codes (code, phrase, meaning)
- 200 OK: The request is successful.
- 404 Not Found: The document is not found.
- 403 Forbidden: Service denied.
- 500 Internal Server Error: Internal server error.
- 301 Moved Permanently: Moved permanently.

For a complete list, please refer to RFC 7231 (…/html/rfc7231#page-47)

HTTP response

Header: each header line has the form "header type: value" and ends with \r\n

Example:
  HTTP/1.1 200 OK
  Date: Fri, 4 Jan 2019 17:35:23 GMT
  Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive

Headers (with example values)
- Date (Fri, 4 Jan 2019 17:35:23): Date and time the message was sent.
- Server (Apache/2.2.20 …): Information about the server.
- Keep-Alive (timeout=5, max=100): Close the connection once it has been idle for more than 5 seconds, and serve at most 100 requests on it.
- Connection (keep-alive / close): Whether the TCP connection should be closed or not.
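The response format above (status line, header lines, blank line, body) can be illustrated with a small parser. This is a minimal sketch in Python: the function name parse_response is hypothetical, the message is the example from the slides with a placeholder body, and real parsers also handle details such as repeated headers and binary bodies.

```python
def parse_response(raw: str):
    """Split a raw HTTP response into status line parts, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")      # the blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")        # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body


RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 4 Jan 2019 17:35:23 GMT\r\n"
    "Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d\r\n"
    "Keep-Alive: timeout=5, max=100\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
    "<html>...</html>"  # placeholder body
)

version, code, phrase, headers, body = parse_response(RAW)

# Keep-Alive carries comma-separated name=value parameters.
keep_alive = {
    key.strip(): int(value)
    for key, value in (p.split("=") for p in headers["keep-alive"].split(","))
}
print(code, phrase, keep_alive)
```

Header names are lower-cased here because HTTP header names are case-insensitive, which is why "Keep-Alive" and "keep-alive" both appear in the slides.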
For a complete list, please refer to RFC 7231 (…/html/rfc7231#page-64)

HTTP response

Blank line and body

  HTTP/1.1 200 OK
  Date: Fri, 4 Jan 2019 17:35:24 GMT
  Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive
  (blank line)
  <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://…">
  <html>
    <head>…</head>
    <body>…</body>
  </html>

Another HTTP response example

HTTP response with status code 301

  HTTP/1.1 301 Moved Permanently
  Date: Fri, 4 Jan 2019 17:35:33 GMT
  Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
  Keep-Alive: timeout=5, max=100
  Location: …
  Connection: Keep-Alive
  (blank line)
  <html>
    <body>If you are not redirected within 5 seconds, please click <a href="…">…</a>.</body>
  </html>

- The Location header tells the browser the new page to be loaded.

Web caching

- A Web cache, also called a "proxy server", is a network entity that responds to HTTP requests on behalf of an origin Web server
- a web browser can be configured so that all of the user's HTTP requests are first directed to a proxy server
- the proxy server receives the HTTP requests from a client; if the requested object is cached and up-to-date, the proxy server will send the object to the client on behalf of the destination web server

[Figure: Client, Proxy server, Web server]

Web caching

Purposes
- speed up access to resources
- keep machines behind the proxy server anonymous, mainly for security
- log / audit usage, e.g. to provide company employee Internet usage reporting
- scan transmitted content for malware before delivery
- etc.

[Figure: Client, Proxy server, Web server]
Proxy miss

Cache table at the proxy server (Resource, Last-modified): empty

- Client browsers can be configured to have all HTTP requests sent to a proxy server.
- HTTP request (client to proxy server):
    GET /image.jpg HTTP/1.1
    Host: google.com
    …
- The proxy server checks if the requested object (image.jpg in google.com) is cached or not.
- If it is not cached, it forwards the HTTP request to the destination Web server.
- HTTP request (proxy server to Web server):
    GET /image.jpg HTTP/1.1
    Host: google.com
    …

Proxy miss

- HTTP response (Web server to proxy server):
    HTTP/1.1 200 OK
    Date: 1/9/2018 18:15:24
    Last-modified: 1/2/2018 9:00:00
    Content-Type: image/jpeg
    (blank line)
    data…
- Proxy server caches the HTTP response: the proxy server receives the HTTP response and keeps a local copy of the object. It also registers the "Last-modified" time of the object.

Cache table at the proxy server (Resource, Last-modified):
  http://google.com/image.jpg    1/2/2018 9:00:00
  …                              …

Proxy miss

- Proxy server response: the proxy server forwards the HTTP response to the client.
- HTTP response (proxy server to client):
    HTTP/1.1 200 OK
    Date: 1/9/2018 18:15:24
    Last-modified: 1/2/2018 9:00:00
    Content-Type: image/jpeg
    (blank line)
    data…

Proxy hit

Cache table at the proxy server (Resource, Last-modified):
  http://google.com/image.jpg    1/2/2018 9:00:00
  …                              …

- HTTP request (client to proxy server):
    GET /image.jpg HTTP/1.1
    Host: google.com
    …
- The proxy server receives the HTTP request from the client, and the requested object is found to be cached. But is the cache up-to-date?
- The proxy server checks if the cached object has been modified by asking the Web server: "Please send me the object if it has been modified since the (Last-modified date time)".
- HTTP request (proxy server to Web server):
    GET /image.jpg HTTP/1.1
    Host: google.com
    If-modified-since: 1/2/2018 9:00:00

Proxy hit

- Web server response: if the cached object is not modified, the Web server replies "It is not modified!" (response code = 304).
- HTTP response (Web server to proxy server):
    HTTP/1.1 304 Not Modified
    Date: 1/9/2018 18:55:22
- Note that the HTTP response message is much shorter than sending the whole image.

Proxy hit

- Proxy server response: the proxy server sends the cached object to the client.
- HTTP response (proxy server to client):
    HTTP/1.1 200 OK
    Date: 1/9/2018 18:55:40
    Last-modified: 1/2/2018 9:00:00
    Content-Type: image/jpeg
    (blank line)
    data…

Reference

Computer Networking: A Top Down Approach (6th Edition)
- HTTP: Chapter 2.2.1-2.2.3
- Web caching: Chapter 2.2.5
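The proxy miss and proxy hit exchanges in the Web caching slides can be sketched as a small in-memory simulation. Everything here is an illustrative assumption rather than a real proxy: the classes OriginServer and ProxyCache are hypothetical, no network is involved, and the "Last-modified" values are compared as plain strings in the slides' simplified date format, whereas real servers compare timestamps.

```python
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    last_modified: str = ""
    body: bytes = b""


class OriginServer:
    """Hypothetical Web server holding (last_modified, data) per path."""

    def __init__(self):
        self.objects = {"/image.jpg": ("1/2/2018 9:00:00", b"jpeg data v1")}

    def get(self, path, if_modified_since=None):
        last_modified, body = self.objects[path]
        if if_modified_since == last_modified:
            return Response(304)                 # Not Modified: no body is sent
        return Response(200, last_modified, body)


class ProxyCache:
    """Hypothetical proxy server with a cache table: path -> (last_modified, data)."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}
        self.log = []

    def get(self, path):
        if path not in self.cache:
            # Proxy miss: forward the request, keep a local copy, then reply.
            resp = self.origin.get(path)
            self.cache[path] = (resp.last_modified, resp.body)
            self.log.append("miss")
            return resp
        # Object is cached: ask the origin whether it changed (conditional GET).
        last_modified, body = self.cache[path]
        resp = self.origin.get(path, if_modified_since=last_modified)
        if resp.status == 304:
            self.log.append("hit")
            return Response(200, last_modified, body)   # serve the cached copy
        # The object changed at the origin: refresh the cached copy.
        self.cache[path] = (resp.last_modified, resp.body)
        self.log.append("refresh")
        return resp


origin = OriginServer()
proxy = ProxyCache(origin)
first = proxy.get("/image.jpg")     # miss
second = proxy.get("/image.jpg")    # hit, validated with a 304
origin.objects["/image.jpg"] = ("2/2/2018 9:00:00", b"jpeg data v2")
third = proxy.get("/image.jpg")     # cached copy is stale, so it is refreshed
print(proxy.log)
```

The third request goes beyond the slides, which only show the case where the object is unchanged; it is included to show what the conditional request is guarding against.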