COMP 3322B
Modern Technologies
on World Wide Web
2nd semester 2018-2019
WWW Basics (O2)
Dr. C Wu
Department of Computer Science
The University of Hong Kong

Web
The Web, or World Wide Web (WWW, W3)
  an Internet application
  an information space where documents and other web resources are identified by URIs, interlinked by hypertext links, and can be accessed via the Internet
A web page contains
  a base HTML file
  referenced objects
    HTML files, JPEG images, audio/video clips, etc.
    each object is addressable by a URI
[Figure: a Web browser and a Web server]

URI
URI (Uniform Resource Identifier): a string of characters used to identify a resource, with the syntax below
scheme:[//[user:password@]hostname[:port]][/]path[?query][#fragment]
  query: along with the path, identifies a resource; often used to carry information in the format of "key=value" pairs
  fragment: identifies a secondary resource (e.g., some portion or subset of the primary resource)
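The components in the syntax above can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch; the URI itself (host, user, password, path and query) is made up for illustration and does not come from the slides.

```python
from urllib.parse import urlsplit, parse_qs

def split_uri(uri):
    """Break a URI into the components named in the generic syntax."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "query": parse_qs(parts.query),  # "key=value" pairs -> dict of lists
        "fragment": parts.fragment,
    }

# Hypothetical URI, invented for this example only.
example = "http://alice:secret@www.example.org:8080/docs/index.html?lang=en&page=2#intro"
for name, value in split_uri(example).items():
    print(f"{name:9} {value}")
```

Note that the fragment is handled by the client only; it is not sent to the server as part of an HTTP request.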
URI (cont'd)
There are two types of URIs
URL (Uniform Resource Locator), the most common form of URI: in addition to identifying a resource, it specifies the means of locating the resource by giving both its primary access mechanism and its network location
  Examples:
    http://…  (scheme, then a host name or IP address)
    telnet://192.0.2.16:23/

URI (cont'd)
URN (Uniform Resource Name) identifies a resource by name in a particular namespace. A URN can be used to refer to a resource without implying its location or how to access it.
  e.g., the International Standard Book Number (ISBN) that uniquely identifies a book can be written as a URN, "urn:isbn:0-486-27557-4"
  e.g., "urn:ietf:rfc:2141" refers to IETF's RFC 2141

HTTP
Hypertext Transfer Protocol
  the Web's application-layer protocol
  includes two types of messages: HTTP request and HTTP response
Implemented in the client-server model
  client program: a browser that requests, receives and "displays" web objects
  server program: a Web server that sends objects in response to requests
[Figure: a PC running Explorer and a Mac running Safari each exchange HTTP requests and HTTP responses with a server running the Apache Web server]

Client and server interaction
HTTP request
File: /~cwu/c0322/index.html
First HTTP request
The first HTTP request is initiated by the user, by typing the URL in the browser.
http://…
[Figure: the Client sends the HTTP request to the Server]

Client and server interaction
HTTP request
  File: /~cwu/c0322/index.html
HTTP response
  Content-Type: text/html

<html>
  <head>
  <link rel="stylesheet" type="text/css" href="style.css">
  </head>
  <body>
      <img border="0" src="picture/header.jpg">
      <h1>This is a sample web page.</h1>
  </body>
</html>

[Client display: an image placeholder and the text "This is a sample web page."]

Client and server interaction
Subsequent HTTP requests
The browser receives the index.html page and continues to initiate subsequent HTTP requests to GET the required objects from the server.
HTTP request
  File: /~cwu/c0322/picture/header.jpg
HTTP response
  Content-Type: image/jpeg
  (The image file)

Client and server interaction
HTTP request
  File: /~cwu/c0322/style.css
HTTP response
  Content-Type: text/css
  (The CSS file)

h1 {
      font-family : Arial, Helvetica, Times;
      font-size : 20pt;
      font-style : italic;
      color : red;
}

[Client display: "This is a sample web page." rendered with the style sheet applied]
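How a browser works out which subsequent requests to send can be sketched with Python's standard HTML parser. This is only an illustration, not how any particular browser is implemented. The host name www.example.org is a placeholder, because the slides show only the path /~cwu/c0322/index.html.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ReferencedObjects(HTMLParser):
    """Collect the URLs of objects that a base HTML file refers to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Style sheets are referenced by <link href=...>, images by <img src=...>.
        if tag == "link" and attrs.get("href"):
            self.urls.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))

INDEX_HTML = """<html>
<head>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body>
<img border="0" src="picture/header.jpg">
<h1>This is a sample web page.</h1>
</body>
</html>"""

def objects_to_fetch(html, base_url):
    """Return absolute URLs of the objects the page needs, in document order."""
    parser = ReferencedObjects(base_url)
    parser.feed(html)
    return parser.urls

# Placeholder host; only the path is taken from the slides.
BASE = "http://www.example.org/~cwu/c0322/index.html"
for url in objects_to_fetch(INDEX_HTML, BASE):
    print("GET", url)
```

Relative references such as style.css are resolved against the URL of the base HTML file, which is why the requests in the slides use /~cwu/c0322/style.css and /~cwu/c0322/picture/header.jpg.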

HTTP 1.1
HTTP 1.1 has been widely used since its standardization in 1997
  multiple objects can be sent over the same TCP connection between client and server (persistent connection)
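A persistent connection can be observed locally with Python's standard library. This is a rough sketch under stated assumptions: a throwaway server on 127.0.0.1 stands in for the Web server, and the Handler class and fetch_over_one_connection function are names invented here. The server records the client's TCP port for every request; if both requests arrive from the same port, they travelled over the same connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open
    seen_client_ports = []         # client TCP port of every request received

    def do_GET(self):
        Handler.seen_client_ports.append(self.client_address[1])
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # so the next response can follow on the same connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def fetch_over_one_connection(paths):
    """Request every path over a single TCP connection; return the bodies."""
    Handler.seen_client_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            bodies.append(conn.getresponse().read().decode())
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return bodies

print(fetch_over_one_connection(["/index.html", "/style.css"]))
print("client ports seen by the server:", Handler.seen_client_ports)
```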
[Figure: after connection setup ("Connect?" / "OK!"), the Client sends a request and the Server returns a response for the 1st file, then for the 2nd file, and so on, over the same connection; the setup exchange is labelled "Connection setup overhead"]
A keep-alive time (for the connection) is specified in the headers of the HTTP request and response messages.

HTTP/2
HTTP/2 is the newer version of HTTP, standardised in May
2015
developed based on Google’s SPDY protocol
supported by most major browsers by end of 2015; supported by
32.5% of the top 10 million websites as of January 2019
highly compatible with HTTP 1.1: most header fields, status codes,
methods, etc.
improving page load speed by
data compression of HTTP headers
server responds with more data than what the client requested
allowing multiple requests and responses to be in flight concurrently in
the same TCP connection
etc.
providing negotiation mechanisms for client and server to decide whether to use HTTP 1.1 or HTTP/2
[Figure: HTTP/2 versus HTTP 1.1. With HTTP 1.1, after connection setup the Client requests index.html and then header.jpg, each request followed by its response. With HTTP/2, the server responds with more data than what the client requested (index.html, header.jpg, ...), and multiple requests and responses are allowed to be in flight concurrently in the same TCP connection.]

HTTP is media independent
Any type of data can be sent by HTTP as long as both the client and the server know how to handle the data
How content is handled is determined by the Multipurpose Internet Mail Extensions (MIME) specification
The data type is determined by Content-Type: type/subtype

type     sub-types
text     plain, html, css, xml, etc.
image    jpeg, gif, etc.
video    mpeg, quicktime, etc.
...      ...

It can be included as one header in an HTTP request or response
For example, if the content is an HTML web page, then Content-Type is text/html; if the content is a JPEG image, then Content-Type is image/jpeg.
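A server typically picks the Content-Type from the file name. The sketch below uses Python's standard mimetypes module to do that for the three objects of the sample page. The fallback to application/octet-stream for unknown extensions is a common convention and an assumption here, not something the slides specify.

```python
import mimetypes

def content_type_for(filename):
    """Guess the type/subtype for a file name from its extension."""
    mime, _encoding = mimetypes.guess_type(filename)
    # Assumed fallback for unknown extensions: generic binary data.
    return mime or "application/octet-stream"

for name in ("index.html", "style.css", "picture/header.jpg"):
    print(f"{name}: Content-Type: {content_type_for(name)}")
```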

HTTP is stateless
Stateless means that the server keeps no information about past client requests
HTTP was originally designed to transmit static web pages only (i.e., the same web page does not have different contents for different clients)
Q: Then how can some web pages remember a user's login session?
A: HTTP alone cannot provide tailored responses for different client requests. Other web technologies were invented to keep the "state" of each client session, e.g., "cookies" or "sessions".

HTTP request
HTTP Request message format
  Request line
  Header
  Blank line (\r\n)
  Body

Request line
  method, space, URL, space, HTTP version, \r\n
  Example: GET /~cwu/c0322/index.html HTTP/1.1

Method   Description
GET      Request a Web page.
HEAD     Request a Web page's header only. The server will not include the body of the web page in its response.
POST     One function of POST is to provide a block of data to the server, such as the data entered in a FORM, to a data-handling process at the server.

For a complete list, please refer to RFC 7231 (page 24).

Header
  each header line has the form "Header type: value"
  Example:
    GET /~cwu/c0322/index.html HTTP/1.1
    Host: i.cs.hku.hk
    Connection: keep-alive
    User-Agent: Mozilla/5.0
    (blank line, \r\n)
    (body)

Header       Example values       Description
Host         i.cs.hku.hk          Shows the host and port number (default 80) of the resource being requested.
Connection   keep-alive / close   Whether the TCP connection should be closed or not after retrieving the object.
User-Agent   Mozilla/5.0          The name of the client that initiates the request.
Keep-Alive   300                  If Connection is "keep-alive", keep the TCP connection alive for 300 seconds.
Referer      …index.html          This request is initiated by clicking a hyperlink in ….

For a complete list, please refer to RFC 7231 (page 33).

Blank line and Body
There is no content in the "Body" part of this GET HTTP request.
The "Body" can be used to carry data such as form data submitted by the client.
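The message format above (request line, header lines, blank line, body) can be sketched as two small Python helpers. The names build_request and parse_request are invented for this illustration. The parser assumes well-formed input and is not a full HTTP implementation.

```python
def build_request(method, url, headers, body=""):
    """Assemble an HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {url} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with \r\n; an empty line separates the header from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

def parse_request(message):
    """Split a well-formed request back into its parts."""
    head, _, body = message.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, url, version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, url, version, headers, body

request = build_request(
    "GET",
    "/~cwu/c0322/index.html",
    {"Host": "i.cs.hku.hk", "Connection": "keep-alive", "User-Agent": "Mozilla/5.0"},
)
print(request.replace("\r\n", "\\r\\n\n"))
print(parse_request(request))
```

For a GET request the body is empty, so the message ends right after the blank line.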

HTTP response
HTTP Response message format
  Status line
  Header
  Blank line (\r\n)
  Body

Status line
  HTTP version, space, status code, space, status phrase, \r\n
  Example: HTTP/1.1 200 OK

Code   Phrase                  Meaning
200    OK                      The request is successful.
301    Moved permanently       Moved permanently.
403    Forbidden               Service denied.
404    Not found               The document is not found.
500    Internal server error   Internal server error.

For a complete list, please refer to RFC 7231 (page 47).

Header
  each header line has the form "Header type: value"
  Example:
    HTTP/1.1 200 OK
    Date: Fri, 4 Jan 2019 17:35:23 GMT
    Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
    Keep-Alive: timeout=5, max=100
    Connection: Keep-Alive

Header       Example values                 Description
Date         Fri, 4 Jan 2019 17:35:23 GMT   Date and time the message was sent.
Server       Apache/2.2.20 …                Information about the server.
Keep-Alive   timeout=5, max=100             Close the connection if it is idle for more than 5 seconds; allow at most 100 requests on this connection.
Connection   keep-alive / close             Whether the TCP connection should be closed or not.

For a complete list, please refer to RFC 7231 (page 64).

Blank line and Body
HTTP/1.1 200 OK
Date: Fri, 4 Jan 2019 17:35:24 GMT
Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://…">
<html>
      <head>…</head>
      <body>…</body>
</html>
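Reading a response follows the same layout in reverse: status line, header lines, blank line, body. The parse_response helper below is an illustrative sketch for well-formed messages only. It also looks up standard status phrases in Python's http.HTTPStatus enumeration, whose capitalisation differs slightly from the table above.

```python
from http import HTTPStatus

def parse_response(message):
    """Split a well-formed HTTP response into its parts."""
    head, _, body = message.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split at the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), phrase, headers, body

RAW = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 4 Jan 2019 17:35:23 GMT\r\n"
    "Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d\r\n"
    "Keep-Alive: timeout=5, max=100\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
    "<html>...</html>"
)

version, code, phrase, headers, body = parse_response(RAW)
print(version, code, phrase)
print(headers["date"])
for status in (200, 301, 403, 404, 500):
    print(status, HTTPStatus(status).phrase)
```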

Another HTTP response example
HTTP response with status code 301
HTTP/1.1 301 Moved permanently
Date: Fri, 4 Jan 2019 17:35:33 GMT
Server: Apache/2.2.20 (Unix) mod_ssl/2.2.20 OpenSSL/0.9.7d
Keep-Alive: timeout=5, max=100
Location: …
Connection: Keep-Alive

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://…">
<html>
<body>If you are not redirected within 5 seconds, please click <a href="…">…</a>.</body>
</html>

The Location header tells the browser the new page to be loaded.

Web caching
Web cache, also called “proxy server”, is a network
entity that responds to HTTP requests on behalf of an
origin Web server
a web browser can be configured so that all of the user’s
HTTP requests are first directed to a proxy server
the proxy server receives the HTTP requests from a client;
if the requested object is cached and up-to-date, the
proxy server will send the object to the client on behalf of
the destination web server
[Figure: Client, Proxy server, Web server]

Web caching
Purposes
speed up access to resources
keep machines behind proxy server anonymous, mainly for
security
log / audit usage, e.g. to provide company employee Internet
usage reporting
scan transmitted content for malware before delivery
etc.

Proxy miss
Client browsers can be configured to have all HTTP requests sent to a Proxy server.

Proxy cache table (initially empty)
Resource                      Last-modified
…                             …

HTTP request (Client to Proxy server)
GET /image.jpg HTTP/1.1
Host: google.com
…

The Proxy server checks if the requested object (image.jpg in google.com) is cached or not.
If it is not cached, it forwards the HTTP request to the destination Web server.

HTTP request (Proxy server to Web server)
GET /image.jpg HTTP/1.1
Host: google.com
…

HTTP response (Web server to Proxy server)
HTTP/1.1 200 OK
Date: 1/9/2018 18:15:24
Last-modified: 1/2/2018 9:00:00
Content-Type: image/jpeg

data…

Proxy server caches the HTTP response
The Proxy server receives the HTTP response and keeps a local copy of the object. It also registers the "Last-modified" time of the object.

Proxy cache table
Resource                      Last-modified
http://google.com/image.jpg   1/2/2018 9:00:00
…                             …

Proxy server response
The Proxy server forwards the HTTP response to the Client.

Proxy hit
Proxy cache table
Resource                      Last-modified
http://google.com/image.jpg   1/2/2018 9:00:00
…                             …

HTTP request (Client to Proxy server)
GET /image.jpg HTTP/1.1
Host: google.com
…

The Proxy server receives the HTTP request from the Client, and the requested object is found to be cached. But is the cache up-to-date?
The Proxy server checks if the cached object has been modified by asking the Web server: "Please send me the object if it has been modified since the (Last-modified date time)".

HTTP request (Proxy server to Web server)
GET /image.jpg HTTP/1.1
Host: google.com
If-modified-since: 1/2/2018 9:00:00

Web server response
HTTP/1.1 304 Not Modified
Date: 1/9/2018 18:55:22

If the cached object is not modified, the Web server replies "It is not modified!" (Response Code = 304).
Note that the HTTP response message is much shorter than sending the whole image.
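The miss and hit cases can be sketched as a small in-memory simulation. Everything here is an illustrative assumption: OriginServer and ProxyServer are invented classes, no real network traffic is involved, and the slide date 1/2/2018 is read as day/month/year. On a hit, the proxy always revalidates with the origin, as in the slides. Real caches may also serve a fresh copy without asking.

```python
from datetime import datetime

class OriginServer:
    """Stands in for the Web server: objects with a Last-modified time."""

    def __init__(self):
        self.objects = {}  # url -> (last_modified, data)

    def put(self, url, data, modified):
        self.objects[url] = (modified, data)

    def get(self, url, if_modified_since=None):
        modified, data = self.objects[url]
        # Conditional GET: no body is sent if nothing changed since that time.
        if if_modified_since is not None and modified <= if_modified_since:
            return 304, modified, None
        return 200, modified, data

class ProxyServer:
    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # url -> (last_modified, data)
        self.log = []    # ("miss" or "hit", status code returned by the origin)

    def get(self, url):
        if url in self.cache:
            cached_modified, cached_data = self.cache[url]
            status, modified, data = self.origin.get(url, cached_modified)
            self.log.append(("hit", status))
            if status == 304:
                return 200, cached_data  # serve the local copy
        else:
            status, modified, data = self.origin.get(url)
            self.log.append(("miss", status))
        # Keep a local copy and register its Last-modified time.
        self.cache[url] = (modified, data)
        return 200, data

URL = "http://google.com/image.jpg"
origin = OriginServer()
origin.put(URL, b"data...", datetime(2018, 2, 1, 9, 0, 0))  # 1/2/2018 9:00:00
proxy = ProxyServer(origin)
proxy.get(URL)  # first request: proxy miss
proxy.get(URL)  # second request: proxy hit, origin answers 304
print(proxy.log)
```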

HTTP response (Proxy server to Client)
HTTP/1.1 200 OK
Date: 1/9/2018 18:55:40
Last-modified: 1/2/2018 9:00:00
Content-Type: image/jpeg

data…

Proxy server response
The Proxy server sends the cached object to the Client.

Reference in Computer Networking: A Top Down Approach (6th Edition)
HTTP: Chapter 2.2.1-2.2.3
Web caching: Chapter 2.2.5