Transport Layer


Goals:
- understand the principles behind transport-layer services: multiplexing/demultiplexing, reliable data transfer, flow control, congestion control
- related performance issues
- instantiation and implementation in the Internet

Overview:
- transport-layer services
- multiplexing/demultiplexing
- connectionless transport: UDP
- connection-oriented transport: TCP (reliable transfer, flow control, connection management)
- principles of congestion control
- TCP congestion control
- TCP performance issues

Transport services and protocols
- provide logical communication between app processes running on different hosts
- transport protocols run in end systems (primarily)
- transport vs. network layer services:
  - network layer: data transfer between end systems
  - transport layer: data transfer between processes; relies on, and enhances, network-layer services
[figure: five-layer protocol stacks (application, transport, network, data link, physical) in the end systems; only the lower three layers in intermediate routers; logical end-to-end transport between the end systems]

Transport-layer protocols
Internet transport services:
- reliable, in-order unicast delivery (TCP): congestion control, flow control, connection setup
- unreliable ("best-effort"), unordered unicast or multicast delivery: UDP
- services not available: real-time guarantees, bandwidth guarantees, reliable multicast

Multiplexing/demultiplexing
- Demultiplexing at the rcv host: delivering received segments to the correct socket
- Multiplexing at the send host: gathering data from multiple sockets, enveloping data with a header (later used for demultiplexing)
- Recall: segment = unit of data exchanged between transport-layer entities; the segment header (Ht) encapsulates application-layer data
[figure: processes P1-P4 on three hosts, with sockets between the application and transport layers; host 1 multiplexes, hosts 2 and 3 demultiplex]
- multiplexing/demultiplexing is based on port numbers and IP addresses: source and dest port #s in each segment, plus other header fields
- recall: well-known port numbers for specific applications
[figure: TCP/UDP segment format, 32 bits wide: source port #, dest port #, other header fields, application data (message)]

Connectionless demultiplexing
- Create sockets with port numbers (port numbers are 16-bit values; the original slide's 99111/99222 exceed that range, so valid ports are used here):
    DatagramSocket mySocket1 = new DatagramSocket(9111);
    DatagramSocket mySocket2 = new DatagramSocket(9222);
- UDP socket identified by a two-tuple: (dest IP address, dest port number)
- When a host receives a UDP segment: it checks the destination port number in the segment and directs the UDP segment to the socket with that port number
- IP datagrams with different source IP addresses and/or source port numbers are directed to the same socket

Connectionless demux (cont)
    DatagramSocket serverSocket = new DatagramSocket(6428);
[figure: client A (SP 9157) and client B (SP 5775) both send to server C's port 6428; the server's replies swap SP and DP. SP provides the "return address"]

Connection-oriented demux
- TCP socket identified by a 4-tuple: source IP address, source port number, dest IP address, dest port number
- recv host uses all four values to direct a segment to the appropriate socket
- Server host may support many simultaneous TCP sockets: each socket identified by its own 4-tuple
- Web servers have different sockets for each connecting client; non-persistent HTTP will have a different socket for each request
[figure: clients A (SP 9157) and B (SP 5775) connect to port 80 on server C; each connection gets its own socket]

Multiplexing/demultiplexing: examples
[figure: simple telnet app: host A, source port x, to dest port 23 on server B; WWW: clients at hosts A and C (source ports x and y) to dest port 80 on WWW server B]

UDP: User Datagram Protocol [RFC 768]
- "no frills," "bare bones" Internet transport protocol
- "best effort" service; UDP segments may be lost, or delivered out of order to the app
- connectionless: no handshaking between UDP sender and receiver; each UDP segment is handled independently of the others

Why is there a UDP?
- no connection establishment (which can add delay)
- simple: no connection state at sender or receiver => requires fewer resources and is easier to implement
- small segment header => less transmission overhead
- no congestion control: UDP can blast away as fast as desired => can be harmful to other TCP connections
- TCP can't support multicast

UDP uses:
- often used for streaming multimedia apps: loss tolerant, rate sensitive
- other uses (why?): DNS, SNMP
- reliable transfer over UDP: add reliability at the application layer, with application-specific error recovery!
UDP segment format
[figure: 32 bits wide: source port #, dest port #; length, checksum; application data (message). "length" = length, in bytes, of the UDP segment, including the header]

UDP checksum
Goal: detect "errors" (e.g., flipped bits) in the transmitted segment
Sender:
- treat segment contents as a sequence of 16-bit integers
- checksum: addition (1's complement sum) of segment contents
- sender puts the checksum value into the UDP checksum field
Receiver:
- compute checksum of the received segment
- check whether the computed checksum equals the checksum field value:
  NO - error detected
  YES - no error detected. But maybe errors nonetheless? More later...
Note: include a pseudo IP header in computing the checksum. Why is this necessary, esp. when the IP header already contains its own checksum?

Internet Checksum Example
When adding numbers, a carryout from the most significant bit needs to be added back to the result ("wraparound").
Example: add two 16-bit integers:

      1110011001100110
    + 1101010101010101
    ------------------
    1 1011101110111011    (17 bits: carry out of the MSB)

  wraparound:  1011101110111011 + 1 = 1011101110111100  (sum)
  checksum  =  1's complement of sum = 0100010001000011

Principles of Reliable data transfer
- important in app., transport, and link layers
- top-10 list of important networking topics!
- the characteristics of the unreliable channel (e.g. packet loss, delay, duplication, corruption, out-of-order delivery) determine the complexity of the reliable data transfer protocol (rdt)

Rdt1.0: reliable transfer over a reliable channel
- underlying channel perfectly reliable: no bit errors, no loss of packets
- separate actions for sender, receiver: sender sends data into the underlying channel; receiver reads data from the underlying channel
- we have ignored the flow-control issue here

Rdt2.0: channel with bit errors
- underlying channel may flip bits in a packet (recall: UDP checksum to detect bit errors)
- the question: how to recover from errors?
  - acknowledgements (ACKs): receiver explicitly tells the sender that a pkt was received OK
  - negative acknowledgements (NAKs): receiver explicitly tells the sender that a pkt had errors
  - sender retransmits the pkt on receipt of a NAK
  - human scenarios using ACKs, NAKs?
- new mechanisms in rdt2.0 (beyond rdt1.0): error detection; receiver feedback: control msgs (ACK, NAK) rcvr->sender

rdt2.0 has a fatal flaw!
What happens if the ACK/NAK is corrupted?
- the sender doesn't know what happened at the receiver!
- it can't just retransmit: possible duplicate
What to do?
- sender ACKs/NAKs the receiver's ACK/NAK? What if that ACK/NAK is lost?
- retransmit, but this might cause retransmission of a correctly received pkt!
Handling duplicates:
- sender adds a sequence number to each pkt
- sender retransmits the current pkt if the ACK/NAK is garbled
- receiver discards (doesn't deliver up) a duplicate pkt
Stop and wait: the sender sends one packet, then waits for the receiver's response

rdt2.1: discussion
Sender:
- seq # added to pkt; two seq. #'s (0, 1) will suffice
- must check if the received ACK/NAK is corrupted
- twice as many states: state must "remember" whether the "current" pkt has seq # 0 or 1
Receiver:
- must check if the received packet is a duplicate; state indicates whether 0 or 1 is the expected pkt seq #
- the seq. # is needed because the receiver cannot know whether its last ACK/NAK was received OK at the sender

rdt2.2: a NAK-free protocol
- same functionality as rdt2.1, using ACKs only
- instead of a NAK, the receiver sends an ACK for the last pkt received OK; the receiver must explicitly include the seq # of the pkt being ACKed
- a duplicate ACK at the sender results in the same action as a NAK: retransmit the current pkt

rdt3.0: channels with errors and loss
Assumption: the underlying channel/network can corrupt as well as lose packets (data or ACKs); checksums, seq. #s, ACKs, and retransmissions will help, but are not enough.
Q: how to deal with loss?
Approach: stop-and-wait. The sender sends a packet and then waits a "reasonable" amount of time for an ACK from the receiver (drawbacks?):
- retransmits if no ACK is received in this time
- if the pkt (or ACK) is just delayed (not lost): the retransmission will be a duplicate, but use of seq. #'s already handles this
- receiver must specify the seq # of the pkt being ACKed
- requires a countdown timer

rdt3.0 in action
[figures: rdt3.0 sender/receiver timelines]
rdt3.0 is still not able to deal with packets which arrive after excessive delay => possible remedy: do not reuse a sequence number "too" soon.

Basic Mechanisms used in RDT protocols to tackle channel/network imperfections
Very important question: what are the assumptions on channel/network characteristics?
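The 1's complement checksum worked example above can be sketched in code. This is an illustrative sketch (class and method names are my own, not from any standard library or RFC implementation): sum 16-bit words with end-around carry, then complement.

```java
// Sketch of the 16-bit Internet (1's complement) checksum described above.
// Class/method names are illustrative, not from any RFC reference code.
public class InternetChecksum {
    // Sum 16-bit words with end-around carry, then take the 1's complement.
    public static int checksum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            if ((sum & 0x10000) != 0) {      // carryout of the MSB:
                sum = (sum & 0xFFFF) + 1;    // wrap it around, add to result
            }
        }
        return (~sum) & 0xFFFF;              // 1's complement of the sum
    }

    public static void main(String[] args) {
        // the two 16-bit integers from the worked example above
        int[] words = {0b1110011001100110, 0b1101010101010101};
        System.out.println(Integer.toBinaryString(checksum(words)));
        // receiver check: all words plus the checksum must sum to all 1s
    }
}
```

A receiver performs the same summation over data plus checksum and looks for 1111111111111111; anything else means an error was detected.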
- Use a segment/packet checksum to detect packet error/corruption
- Use feedback (ACK messages) from the receiver side to indicate proper/problematic receipt of segments
- Use sequence numbers in transmitted segments, as well as in ACK messages, to detect and resolve ambiguity caused by packet loss, duplication, excessive delay and out-of-order delivery:
  - an unexpected sequence number in a received packet implies packet loss, duplication, or out-of-order delivery
  - an ACK with an unexpected sequence number indicates delivery problems to the sender
- Use retransmissions and a timer to recover from packet error, loss, excessive delay, etc.

rdt3.0: stop-and-wait operation
[figure: timeline. Sender: first packet bit transmitted at t = 0, last packet bit at t = L/R. Receiver: first packet bit arrives, last packet bit arrives, ACK sent. Sender: ACK arrives, next packet sent at t = RTT + L/R]

Performance of rdt3.0
rdt3.0 works, but its performance stinks.
Example: 1 Gbps link, 15 ms e2e propagation delay, 1 KB packet:

  T_transmit = L (packet length in bits) / R (transmission rate, bps)
             = (8 kb/pkt) / (10^9 b/sec) = 8 microsec
  U_sender   = (L/R) / (RTT + L/R) = 0.008 / 30.008 = 0.00027   (times in msec)

- U_sender: utilization, the fraction of time the sender is busy sending
- 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
- the network protocol limits use of the physical resources!
- Not as bad if multiple parallel connections are allowed to share the link

Pipelined protocols
Pipelining: the sender allows multiple "in-flight", yet-to-be-acknowledged pkts
- the range of sequence numbers must be increased
- buffering at sender and/or receiver
Two generic forms of pipelined protocols: go-Back-N and selective repeat

Pipelining: increased utilization
[figure: timeline with three packets sent back-to-back; the ACK for the first arrives at t = RTT + L/R, by which time the ACKs for the 2nd and 3rd packets are also on the way]
Increase utilization by a factor of 3!

  U_sender = (3 * L/R) / (RTT + L/R) = 0.024 / 30.008 = 0.0008

Here the sender is allowed to send a max. of 3 packets (= window size) without hearing an ack from the receiver.

Go-Back-N
Sender:
- k-bit seq # in the pkt header
- "window" of up to N consecutive unack'ed pkts allowed
- ACK(n): ACKs all pkts up to and including seq # n: a "cumulative ACK"; may receive duplicate ACKs (see receiver)
- timer for each in-flight pkt
- timeout(n): retransmit pkt n and all higher seq # pkts in the window

GBN: receiver
- receiver is simple; ACK-only: always send an ACK for the correctly-received pkt with the highest in-order seq #
  - may generate duplicate ACKs
  - need only remember expectedseqnum
- out-of-order pkt: discard (don't buffer) -> no receiver buffering! ACK the pkt with the highest in-order seq #

GBN in action
[figure: GBN sender/receiver timeline]

Selective Repeat
- receiver individually acknowledges all correctly received pkts; buffers pkts, as needed, for eventual in-order delivery to the upper layer
- sender only resends pkts for which an ACK was not received; sender keeps a timer for each unACKed pkt
- sender window: N consecutive seq #'s; again limits the seq #s of sent, unACKed pkts
- needs more bits in the ACK message to signify which packets have/have not been received

Selective repeat: sender, receiver windows
[figure: sender and receiver window diagrams]

Selective repeat
sender:
- data from above: if the next available seq # is in the window, send the pkt
- timeout(n): resend pkt n, restart its timer
- ACK(n) in [sendbase, sendbase+N]: mark pkt n as received; if n == smallest unACKed pkt, advance the window base to the next unACKed seq #
receiver:
- pkt n in [rcvbase, rcvbase+N-1]: send ACK(n); out-of-order: buffer; in-order: deliver (also deliver buffered, in-order pkts), advance the window to the next not-yet-received pkt
- pkt n in [rcvbase-N, rcvbase-1]: ACK(n)
- otherwise: ignore

Selective repeat in action
[figure: SR sender/receiver timeline]

Go-Back-N vs. Selective Repeat
Try to compare the two protocols along the following dimensions:
- implementation complexity
- per-packet/segment overhead, i.e. header size
- inefficiency due to unnecessary retransmissions
Is one protocol always better than the other? Under what kind of networks (e.g. in terms of size, delay, speed, loss probability) will Go-Back-N perform better than Selective Repeat? Under what kind of networks will Selective Repeat perform better than Go-Back-N?

Dilemma for the Selective-repeat protocol
Example: seq #'s 0, 1, 2, 3; window size = 3
- the receiver sees no difference between the two scenarios!
- it incorrectly passes duplicate data up as new in (a)
Q: what relationship between the size of the seq # space and the window size?
A: S >= 2W.

Requirements for Sliding Window Protocol Design
Max. window size (W) should be large enough to avoid under-utilization of the link bandwidth (BW):
- max. throughput allowed by a sliding window protocol = W / RTT; if this is less than the link bandwidth (BW), we can never fully utilize the link, i.e. no good! (in this interpretation, make sure correct units are used for each quantity)
- therefore we should have W / RTT >= BW ------- (1)
This determines the # of bits required for encoding the window size in the packet header.

To avoid the sliding-window dilemma:
- if the receiver does buffer out-of-order arrivals, the size of the sequence number space (S) has to be at least twice as big as the max. window size (W), i.e. S >= 2W ------- (2)
- on the other hand, if the receiver simply discards out-of-order arrivals, it is sufficient to have S > W ------- (2')
In practice, reliable transport protocols such as TCP do buffer out-of-seq. arrivals, so it is safer to use (2) instead of (2').

Requirements for Sliding Window Protocol Design (cont'd)
In networks where packets can be delayed excessively, e.g. in the Internet, the number of sequence numbers (S) should be large enough to avoid confusion due to premature sequence number reuse.
- The Internet uses the notion of Max. Segment Lifetime (MSL) to quantify this design issue: it is assumed that no packet can live in the network for more than MSL (typical values of MSL range from 30 ~ 120 secs). If a packet does not arrive within a period of MSL, it is safe to assume that it will never arrive.
- If we can make sure we have enough sequence numbers (i.e. no need to re-use the same seq. #) for each newly sent packet (or byte) in the most stressful situation, i.e. sending as much and as fast as allowed by the link bandwidth within a period of 2*MSL (why *2?), we can avoid confusion caused by sequence number reuse.
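Relations (2) and (2') above are easy to sanity-check in code. A minimal sketch (class/method names are my own) that replays the dilemma example with seq #s 0..3 and window size 3:

```java
// Checks for the sliding-window sequence-space relations (2) and (2') above.
public class SeqSpaceCheck {
    // Receiver buffers out-of-order arrivals (selective repeat, TCP): S >= 2W
    public static boolean safeWithBuffering(int S, int W) { return S >= 2 * W; }

    // Receiver discards out-of-order arrivals (go-Back-N): S > W suffices
    public static boolean safeWithoutBuffering(int S, int W) { return S > W; }

    public static void main(String[] args) {
        // the dilemma example above: S = 4 (seq #s 0..3), window W = 3
        System.out.println(safeWithBuffering(4, 3));     // ambiguous for SR
        System.out.println(safeWithoutBuffering(4, 3));  // fine for GBN
        // shrinking the window to W = 2 makes selective repeat safe:
        System.out.println(safeWithBuffering(4, 2));
    }
}
```

With S = 4 and W = 3 the buffering check fails, which is exactly the dilemma scenario: the receiver cannot tell a retransmitted old packet from a new one.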
Requirements for Sliding Window Protocol Design (cont'd)
The previous requirement imposes the following relation between MSL (in sec), BW (in bits/sec), S, and P = the number of bytes covered by a given sequence number:
- e.g. for TCP, each byte is assigned a sequence number, therefore P = 1; for a protocol which sends a constant packet size of n bytes and assigns 1 seq. # per packet, P = n
- 2 * MSL * BW = max. # of bits one can transmit into the link in a period of 2*MSL
- thus the max. # of seq. #'s required in the same period = 2*MSL*BW / (8*P), and this should be less than S, i.e.

  2 * MSL * BW / (8*P) < S ------- (3)

N.B.: in some scenarios excessive delay is impossible, e.g. a point-to-point link directly connecting the transmitter and receiver; there, MSL is not an issue.
N.B.: to represent a seq. # space of size S, only log2(S) bits are required (to be carried in the protocol header).
Depending on the situation, Eqs. (1), (2) or (2') and/or (3) can be used to determine the header field-size requirements of a protocol.
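Relation (3) can be plugged in for TCP-like numbers. The sketch below (illustrative names; MSL = 120 sec and a 1 Gbps link are example values I chose, not from the text) shows why a 32-bit sequence space gets tight at gigabit speeds, which is one motivation for the timestamp protection added by RFC 1323:

```java
// Sizing the sequence-number space per relation (3): S > 2*MSL*BW / (8*P).
public class SeqSpaceSizing {
    // Minimum S so that no sequence number is reused within 2*MSL.
    public static double minSeqSpace(double mslSec, double bwBps, double bytesPerSeq) {
        return 2 * mslSec * bwBps / (8 * bytesPerSeq);
    }

    public static void main(String[] args) {
        // TCP numbers every byte (P = 1); try MSL = 120 sec, BW = 1 Gbps
        double s = minSeqSpace(120, 1e9, 1);
        int bits = (int) Math.ceil(Math.log(s) / Math.log(2));
        System.out.printf("need S > %.0f seq numbers (~%d header bits)%n", s, bits);
        // about 3e10 numbers, i.e. ~35 bits: more than TCP's 32-bit field
    }
}
```

At 1 Gbps and MSL = 120 sec, about 3e10 sequence numbers are consumed, needing roughly 35 bits; TCP's 32-bit field alone cannot guarantee uniqueness at that speed.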
TCP: Overview   RFCs: 793, 1122, 1323, 2018, 2581
- point-to-point: one sender, one receiver
- reliable, in-order byte stream: no "message boundaries"
- pipelined: TCP congestion and flow control set the window size
- full duplex data: bi-directional data flow in the same connection; MSS: maximum segment size
- connection-oriented: handshaking (exchange of control msgs) init's the sender and receiver state before data exchange
- flow controlled: the sender will not overwhelm the receiver

TCP reliable data transfer
- TCP creates an rdt service on top of IP's unreliable service
- pipelined segments; cumulative acks
- TCP uses a single retransmission timer
- retransmissions are triggered by: timeout events; duplicate acks
- initially, consider a simplified TCP sender: ignore duplicate acks; ignore flow control and congestion control

TCP seq. #'s and ACKs
- Seq. #'s: the byte-stream "number" of the first byte in the segment's data
- ACKs: the seq # of the next byte expected from the other side; cumulative ACK
- Q: how does the receiver handle out-of-order segments? A: the TCP spec doesn't say; it is up to the implementation
[figure: simple telnet scenario. User types 'C'; host A sends Seq=42, ACK=79, data='C'. Host B ACKs receipt of 'C' and echoes back 'C' (Seq=79, ACK=43, data='C'). Host A ACKs receipt of the echoed 'C' (Seq=43, ACK=80)]

TCP sender events:
data rcvd from app:
- create a segment with seq #; the seq # is the byte-stream number of the first data byte in the segment
- start the timer if not already running (think of the timer as being for the oldest unacked segment); expiration interval: TimeOutInterval
timeout:
- retransmit the segment that caused the timeout
- restart the timer
ACK rcvd:
- if it acknowledges previously unacked segments: update what is known to be acked; start the timer if there are outstanding segments

TCP sender (simplified):

  NextSeqNum = InitialSeqNum
  SendBase = InitialSeqNum
  loop (forever) {
    switch(event)
    event: data received from application above
      create TCP segment with sequence number NextSeqNum
      if (timer currently not running)
        start timer
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout
      retransmit not-yet-acknowledged segment with smallest sequence number
      start timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
          start timer
      }
  } /* end of loop forever */

Comment: SendBase-1 is the last cumulatively ack'ed byte.
Example: SendBase-1 = 71; y = 73, so the rcvr wants 73+; y > SendBase, so new data is acked.

TCP: retransmission scenarios
[figures: host A/B timelines for (1) the lost ACK scenario: Seq=92, 8 bytes data; ACK=100 lost; A times out and retransmits; (2) premature timeout: Seq=92, 8 bytes and Seq=100, 20 bytes sent; A's timer for Seq=92 expires before ACK=100 arrives, so Seq=92 is retransmitted even though it was received; (3) cumulative ACK scenario: ACK=100 is lost, but ACK=120 arrives before timeout and covers both segments]

TCP segment structure
[figure: 32 bits wide: source port #, dest port #; sequence number; acknowledgement number; header length, unused bits, flags U A P R S F; rcvr window size; checksum; ptr to urgent data; options (variable length); application data (variable length)]
- URG: urgent data (generally not used); ACK: ACK # valid; PSH: push data now (generally not used); RST, SYN, FIN: connection estab (setup, teardown commands)
- Internet checksum (as in UDP)
- sequence and acknowledgement numbers count by bytes of data (not segments!)
- rcvr window size: # bytes the rcvr is willing to accept, aka the Advertised Window

TCP Flow Control
- flow control: the sender won't overrun the receiver's buffers by transmitting too much, too fast
- receiver: explicitly informs the sender of the (dynamically changing) amount of free buffer space, via the rcvr window size field in the TCP segment (also called the Advertised Window field)
- sender: keeps the amount of transmitted, unACKed data less than the most recently received rcvr window size

TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value?
- longer than RTT (note: RTT will vary)
- too short: premature timeout, unnecessary retransmissions
- too long: slow reaction to segment loss
Q: how to estimate RTT?
- SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions and cumulatively ACKed segments
- SampleRTT will vary; we want the estimated RTT "smoother": use several recent measurements, not just the current SampleRTT

TCP Round Trip Time and Timeout

  EstimatedRTT = (1 - a) * EstimatedRTT + a * SampleRTT

- exponential weighted moving average: the influence of a given sample decreases exponentially fast
- typical value of a: 0.125 = 1/8

Setting the timeout: EstimatedRTT plus a "safety margin"; a large variation in EstimatedRTT -> a larger safety margin:

  Deviation = (1 - b) * Deviation + b * |SampleRTT - EstimatedRTT|
  (typically, b = 0.25 = 1/4)
  Timeout = EstimatedRTT + 4 * Deviation

Example RTT estimation:
[figure: SampleRTT and EstimatedRTT vs. time (seconds); RTT in milliseconds, roughly 100-350 ms; EstimatedRTT tracks SampleRTT smoothly]

In practice, due to the use of coarse-grain timers by most operating systems:
- all RTT measurements and retransmission timeout (RTO) values are scheduled at 500 msec granularity
- when a connection is set up, the RTO is initially set to 3 sec per RFC 2988 => minimal RTO = 0.5 ~ 1 sec; 1 ~ 1.5 sec is quite common
- waiting (idle) for a timeout before retransmitting becomes very costly (bandwidth inefficient), especially in high-bandwidth networks!
- partially overcome by fast retransmission/recovery as in TCP-Reno (more discussion later)

TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments
- initialize TCP variables: seq. #s; buffers and flow control info (e.g. RcvWindow)
- client (connection initiator):
    Socket clientSocket = new Socket("hostname", "port number");
- server (contacted by the client):
    Socket connectionSocket = welcomeSocket.accept();

Three-way handshake:
Step 1: the client end system sends a TCP SYN control segment to the server; it specifies the initial seq #. Why?
Step 2: the server end system receives the SYN and replies with a SYNACK control segment:
- ACKs the received SYN
- allocates buffers
- specifies the server->receiver initial seq. #. Why?
Step 3: the client end sends an ACK (which can be combined with the client's first request message)
(SYN flood: a Denial-of-Service attack)

TCP Connection Management (cont.)
Closing a connection: the client closes the socket: clientSocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server
Step 2: the server receives the FIN, replies with an ACK, closes the connection, and sends a FIN
Step 3: the client receives the FIN and replies with an ACK; it enters "timed wait" and will respond with an ACK to received FINs
Step 4: the server receives the ACK; connection closed
Why is it necessary to have the "timed wait" period?

Principles of Congestion Control
Congestion:
- informally: "too many sources sending too much data too fast for the network to handle"
- different from flow control!
- manifestations: lost packets (buffer overflow at routers); long delays (queueing in router buffers)
- a top-10 problem!
top-10 Causes/costs of congestion two senders, two two receivers one router, infinite one buffers no retransmission no large delays when large congested maximum maximum achievable throughput Causes/costs of congestion: scenario 3 Another “cost” of congestion: when packet dropped, any “upstream transmission capacity when used for that packet was wasted! Approaches towards congestion control Two broad approaches towards congestion control: End-to-end congestion control: no explicit feedback from no network congestion inferred from congestion end-system observed loss, delay approach taken by TCP approach Network-assisted congestion control: routers provide feedback to routers end systems single bit indicating single congestion (SNA, DECbit, TCP/IP ECN, ATM) explicit rate sender explicit should send at Case study: ATM ABR congestion control ABR: available bit rate: “elastic service” if sender s path if path “underloaded”: sender should use sender available bandwidth if sender s path congested: if path sender throttled to sender minimum guaranteed rate RM (resource management) cells: sent by sender, interspersed sent with data cells bits in RM cell set by switches bits (“network-assisted”) NI bit: no increase in rate NI (mild congestion) CI bit: congestion indication CI RM cells returned to sender by RM receiver, with bits intact Case study: ATM ABR congestion control two-byte ER (explicit rate) field in RM cell two-byte congested switch may lower ER value in cell congested sender send rate thus minimum supportable rate on path sender send EFCI bit in data cells: set to 1 in congested switch EFCI if data cell preceding RM cell has EFCI set, sender sets CI bit if in returned RM cell ATM ABR support flow/congestion control but NOT reliable data delivery ATM (only aim to provide “low” cell loss) TCP Congestion Control end-end control (no network assistance) end-end sender limits transmission: sender LastByteSent-LastByteAcked LastByteSent-LastByteAcked W = min(CongWin, 
rcvrWin) min(CongWin CongWin is dynamic, function of perceived network congestion CongWin is rcvr Win is advertised by the receiver for flow control purpose, I.e. not rcvr is letting the sender to overrun the receiver Roughly speaking, the rate (TCP throughput) is proportional to W RTT Bytes/sec TCP congestion control: “Probing” for usable bandwidth: ideally: transmit as fast as ideally: possible (Congwin as large as as possible) without loss increase Congwin until packet increase Congwin until loss (Assuming packet loss is always due to congestion !! not always true, e.g. wireless env.) loss: decrease Congwin, then loss: decrease Congwin begin probing (increasing) again How does sender recognize packet loss event ? loss event = timeout or loss or 3 duplicate acks duplicate Two “phases” of CongWin Two CongWin Increase slow start (exponential) slow congestion avoidance (linear) congestion Q: When should the exponential increase switch to linear? A: When CongWin gets to 1/2 of its CongWin gets value before timeout. 
Important variables:
- CongWin
- threshold: defines the boundary between the slow-start phase and the congestion-avoidance phase
- at a loss event, threshold is set to 1/2 of CongWin just before the loss event

TCP Slow Start
- when the connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec => initial rate = 20 kbps
- the available bandwidth may be >> MSS/RTT: desirable to quickly ramp up to a respectable rate
- when the connection begins, increase the rate exponentially fast until the first loss event, or until the threshold is reached

TCP Slow Start (more)
- increase the rate exponentially until the first loss event: double CongWin every RTT, done by incrementing CongWin for every ACK received
[figure: host A sends one segment, then two, then four, one window per RTT]

  Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
      Congwin++
    until (loss event OR CongWin > threshold)

- summary: the initial rate is slow but ramps up exponentially fast, i.e. not so slow after all!
- loss event: timeout or three duplicate ACKs

TCP Congestion Avoidance

  Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    until (loss event) {
      every w segments ACKed:
        Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart          (1)

  (1): TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
  /* the above is the behavior of TCP-Tahoe */

Fast Retransmit/Recovery in TCP-Reno
- after 3 dup ACKs: CongWin is cut in half; the window then grows linearly
- but after a timeout event: CongWin is instead set to 1 MSS; the window then grows exponentially to a threshold, then grows linearly
Philosophy:
- 3 dup ACKs indicate the network is capable of delivering some segments
- a timeout before 3 dup ACKs is "more alarming"

TCP Tahoe vs. Reno
[figure: congestion window size (segments) vs. transmission round; after a loss, Tahoe drops to 1 segment and slow-starts, while Reno drops to the threshold (half the previous window) and grows linearly]

TCP AIMD
- additive increase: increase CongWin by 1 MSS every RTT in the absence of loss events: probing
- multiplicative decrease: cut CongWin in half after a loss event
[figure: long-lived TCP connection; congestion window sawtooth between 8, 16 and 24 Kbytes over time]

Summary: TCP Congestion Control
Three basic mechanisms: slow start; conservative behavior after timeout events; AIMD
- when CongWin is below Threshold, the sender is in the slow-start phase; the window grows exponentially
- when CongWin is above Threshold, the sender is in the congestion-avoidance phase; the window grows linearly
- when a triple duplicate ACK occurs, Threshold is set to CongWin/2 and CongWin is set to Threshold
- when a timeout occurs, Threshold is set to CongWin/2 and CongWin is set to 1 MSS

TCP congestion avoidance is AIMD (additive increase, multiplicative decrease): increase the window by 1 MSS per RTT; decrease the window by a factor of 2 on a loss event.

TCP Fairness
Fairness goal: if N TCP sessions share the same bottleneck link of capacity R, each should get 1/N of the link capacity.
[figure: TCP connections 1 and 2 sharing a bottleneck router of capacity R]
Is TCP fair?
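One way to build intuition for this question is a toy AIMD simulation of two sessions sharing a bottleneck. This sketch is illustrative only (names, starting windows, and the synchronous-loss assumption are mine): both sessions add 1 unit per RTT, and both halve when their sum exceeds the capacity R.

```java
// Toy AIMD model: two sessions with the same RTT sharing capacity R.
public class AimdFairness {
    // Returns the two window sizes after the given number of RTT rounds.
    public static double[] run(double r, double w1, double w2, int rounds) {
        for (int i = 0; i < rounds; i++) {
            if (w1 + w2 > r) {        // overload: multiplicative decrease
                w1 /= 2;
                w2 /= 2;
            } else {                  // additive increase, 1 unit per RTT
                w1 += 1;
                w2 += 1;
            }
        }
        return new double[]{w1, w2};
    }

    public static void main(String[] args) {
        double[] s = run(100, 10, 80, 200);   // very unequal start
        System.out.printf("shares after 200 RTTs: %.2f / %.2f%n", s[0], s[1]);
        // Additive increase preserves the gap between the sessions, while each
        // halving shrinks it, so the two windows converge toward equal shares.
    }
}
```

This is the mechanism behind the equal-bandwidth-share diagram that follows: the gap |w1 - w2| is untouched by additive increase but halved at every loss, so the trajectory spirals onto the fair-share line.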
Two competing sessions:
- ASSUMPTION: same RTT for both sessions
- Additive increase gives a slope of 1 as throughput increases
- Multiplicative decrease reduces throughput proportionally
[Figure: Connection 1 throughput vs. Connection 2 throughput, both axes up to R; repeated cycles of additive increase (congestion avoidance) and halving the window on loss converge toward the equal bandwidth share line]

Fairness (more)
Fairness and UDP:
- Multimedia apps often do not use TCP: they do not want their rate throttled by congestion control
- Instead they use UDP: pump audio/video at a constant rate, tolerate packet loss
- Research area: TCP friendly

Fairness and parallel TCP connections:
- Nothing prevents an app from opening parallel connections between 2 hosts; Web browsers do this
- Example: a link of rate R supports 9 existing connections
  - a new app that asks for 1 TCP connection gets rate R/10
  - a new app that asks for 11 TCP connections gets R/2 !

Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
- TCP connection establishment
- data transmission delay
- slow start

Notation, assumptions:
- Assume one link between client and server of rate R
- S: MSS (bits)
- O: object size (bits)
- no retransmissions (no loss, no corruption)

Window size:
- First assume a fixed congestion window of W segments
- Then a dynamic window, modeling slow start

Fixed congestion window (1)
First case: WS/R > RTT + S/R: the ACK for the first segment in the window returns before the window's worth of data is sent
  delay = 2RTT + O/R

Fixed congestion window (2)
Second case: WS/R < RTT + S/R: the sender must wait for an ACK after sending a window's worth of data
  delay = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
where K is the number of windows that cover the object.
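The two fixed-window cases collapse into one computation, since the per-window stall term S/R + RTT - WS/R simply drops out when it is non-positive. A sketch under the slide's assumptions (the function name and example numbers are mine):

```python
import math

def fixed_window_delay(O, S, R, RTT, W):
    """Latency to fetch an O-bit object with a fixed congestion window.

    O: object size (bits), S: MSS (bits), R: link rate (bits/s),
    RTT: round-trip time (s), W: window (segments). No losses assumed.
    """
    K = math.ceil(O / (W * S))           # number of windows covering the object
    stall = S / R + RTT - W * S / R      # idle time per window (case 2 only)
    if stall <= 0:                       # case 1: WS/R >= RTT + S/R, no stalls
        return 2 * RTT + O / R
    return 2 * RTT + O / R + (K - 1) * stall

# Made-up example: 10,000-bit object, 1,000-bit segments, 10 kbps link, 1 s RTT.
# fixed_window_delay(10_000, 1_000, 10_000, 1.0, W=2)  -> 6.6 s (stalls)
# fixed_window_delay(10_000, 1_000, 10_000, 1.0, W=20) -> 3.0 s (no stalls)
```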
HTTP Delay Modeling
Assume a Web page consists of:
- 1 base HTML page (of size O bits)
- M images (each of size O bits)

Non-persistent HTTP:
- M+1 TCP connections in series
- Response time = (M+1)O/R + (M+1)2RTT + sum of idle times

Persistent HTTP with pipelining:
- 2 RTT to request and receive the base HTML file
- 1 RTT to request and receive all M images
- Response time = (M+1)O/R + 3RTT + sum of idle times

Non-persistent HTTP with X parallel connections:
- Suppose M/X is an integer
- 1 TCP connection for the base file
- M/X sets of parallel connections for the images
- Response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times

[Chart: HTTP response time (in seconds) for RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5, at link rates of 28 Kbps, 100 Kbps, 1 Mbps and 10 Mbps, comparing non-persistent, persistent and parallel non-persistent HTTP]
- For low bandwidth, connection & response time are dominated by transmission time
- Persistent connections give only a minor improvement over parallel connections

[Chart: the same comparison for RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5]
- For larger RTT, response time is dominated by TCP establishment & slow start delays
- Persistent connections now give an important improvement, particularly in high delay-bandwidth networks
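Dropping the "sum of idle times" term (which depends on slow start), the three response-time formulas can be compared directly. An illustrative sketch (function and scheme names are mine):

```python
def http_response_time(M, O, R, RTT, scheme, X=1):
    """RTT + transmission terms of the slide formulas; idle times omitted.

    M: number of images, O: size of each object (bits), R: link rate (bits/s).
    """
    tx = (M + 1) * O / R                    # transmit base page + M images
    if scheme == "non-persistent":
        return tx + (M + 1) * 2 * RTT       # one serial connection per object
    if scheme == "persistent-pipelined":
        return tx + 3 * RTT                 # 2 RTT base + 1 RTT for all images
    if scheme == "parallel":                # X parallel non-persistent conns
        return tx + (M // X + 1) * 2 * RTT
    raise ValueError(scheme)

# Slide numbers at R = 1 Mbps: RTT = 100 ms, O = 5 Kbytes = 40,000 bits,
# M = 10, X = 5:
# non-persistent        -> 0.44 + 2.2 = 2.64 s
# parallel (X = 5)      -> 0.44 + 0.6 = 1.04 s
# persistent-pipelined  -> 0.44 + 0.3 = 0.74 s
```

As the charts above suggest, the gap between the schemes is entirely in the RTT terms, so it grows with the round-trip time.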
Remaining problems with high bandwidth-delay product networks
- Slow start is too slow (Congwin = 1 initially)
- Massive losses at the end of the initial slow start, resulting in a low threshold value
- Slow window increase (1 MSS per RTT) in congestion avoidance mode
- Poor link utilization

TCP over Wireless
The problem: wireless communication is characterized by
- sporadic high bit-error rates
- intermittent connectivity due to handoffs
TCP always treats segment losses as an indication of network congestion (which is not true in a wireless environment) and, in response, reduces its Congwin size.
Unnecessary window reduction leads to throughput degradation!

TCP over Wireless (cont'd)
Proposed solutions:
- Link-layer enhancements:
  - Use link-level retransmission or forward error correction at the link layer to hide non-congestion losses from TCP
  - Still, TCP may not be fully shielded from wireless losses, e.g. due to TCP timeouts
- End-to-end solutions:
  - Use SACK for efficiently recovering from multiple losses
  - Use Explicit Loss Notification (ELN) to distinguish between congestion-related and other types of losses
- Split connections:
  - Use a separate, specialized connection across the wireless hop

TCP over Wireless (cont'd)
Assessments of the proposed solutions:
- Good performance is obtained with a combination of end-to-end and link-level solutions
- TCP-aware link-level solutions have been shown to perform better than other link-level solutions
- SACK has been found to be very effective as an end-to-end solution
- Splitting connections is not a must for good performance
- See "A Comparison of Mechanisms for Improving TCP Performance over Wireless Links" by Hari Balakrishnan, Venkat Padmanabhan, Srinivasan Seshan, and Randy H. Katz, IEEE/ACM Transactions on Networking, December 1997, for details
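The ELN idea among the end-to-end solutions fits in two lines: only congestion losses shrink the window. A hypothetical sketch (real ELN requires support from the network or link layer to flag the cause of each loss; the name and signature are mine):

```python
def react_to_marked_loss(congwin, wireless_loss):
    """Shrink CongWin only when the loss is congestion-related.

    wireless_loss: True if ELN flagged the loss as a wireless bit error.
    """
    if wireless_loss:
        return congwin            # just retransmit; keep the sending rate
    return max(congwin // 2, 1)   # usual multiplicative decrease
```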
(link available in the class webpage)

Summary
- principles behind transport layer services:
  - multiplexing/demultiplexing
  - reliable data transfer (review only)
  - flow control (review only)
  - congestion control, particularly its realization in TCP
- instantiation and implementation in the Internet:
  - UDP
  - TCP
  - variants of TCP
- problems with TCP and how TCP can deal with them:
  - high bandwidth-delay product networks
  - wireless environments
This note was uploaded on 12/08/2010 for the course IEG3310, taught by Professor Wing C. Lau during the Spring '10 term at CUHK.