Kicking Around Packets: Understanding How the Internet Works
In this post, I try to answer the question “What is the internet and how does it really work?”
Before going into further details, here is a useful list of some of the key terms used in this article.
- End-system or host: This is a device (e.g., a computer) that actually wants to use the internet. It is the ultimate consumer of the communication services offered by the internet. It typically executes a network application on behalf of the user and uses the internet to that end. This article uses the terms ‘host’ and ‘end-system’ interchangeably.
- Packet switches or routers: Different networks are interconnected to each other through a series of packet-switches. They are mainly responsible for forwarding packets of information from one node to another. Packet switches are of two kinds: link-layer switches and routers (discussed below). For the most part, this article uses the term ‘routers’ to refer to packet-switches in general.
- Communication links: These are the physical media that connect routers to one another. Packets travel from one router to the next via a communication link.
Understanding the Internet
The internet is a network of networks.
Each end-system accesses the internet through an Internet Service Provider (ISP). An ISP is typically a telecom company providing internet service to end-users.
The starting point of the internet is an access ISP. The access ISP is the network which connects a host with the rest of the internet.
Each ISP has its own network. The overarching goal of the internet is to enable any two end-systems to communicate with each other, irrespective of where they are situated. Since it is not practically feasible for one ISP to provide a direct line of communication between every pair of end-points, networks are interconnected with each other instead. In other words, each network relies on one or more other networks to deliver a message. Here’s how Wikipedia defines ‘internetworking’:
“Internetworking is the practice of interconnecting multiple computer networks, such that any pair of hosts in the connected networks can exchange messages irrespective of their hardware-level networking technology. The resulting system of interconnected networks are called an internetwork, or simply an internet.”
These interconnections can be direct, i.e., between two networks directly. This kind of interconnection is called peering, where two ISPs at the same level of the hierarchy choose to mutually allow each other access to their respective networks. Most often, though, interconnections take place indirectly, i.e., to reach your destination end-system’s access ISP, you will have to go via one or more intermediary networks (which are typically larger networks with larger coverage). These kinds of indirect interconnections are called transit. Thus, to pass a message from one end-user to another, “the traffic will often be transferred through several indirect interconnections to reach the end-user” (source).
Source: Computer Networking: A Top-Down Approach
How the Internet Works: Kicking Around Packets
Largely, this is how the internet works: when a piece of information needs to be sent from one end-system to another, the information is divided into several segments, called packets. These packets are then sent to the destination end-system through the internet, which comprises several communication links and packet switches.
Communication links are the physical media through which packets are transmitted. Examples include optical fibers and twisted copper wires.
Each packet switch has multiple communication links attached to it. A packet switch takes each packet it receives on one of its attached communication links and forwards it onto another of its attached links. There are two main types of packet switches: routers and link-layer switches. While both forward packets towards the destination host, link-layer switches are used for connecting devices within a network. They are mostly used in the access network (for connecting a device to an access ISP). Routers, on the other hand, are used in the network core, i.e., for sending packets across networks. As both link-layer switches and routers do the fundamental task of forwarding packets towards the destination end-point, the rest of this article will refer to packet-switches as routers.
How does a router know which communication link to forward a given packet to? Every packet transmitted by a source host contains the IP address of the destination host. An IP address is hierarchical in nature. By looking at a specific part of the packet’s destination IP address and consulting its own forwarding table (which maps IP addresses to outbound communication links), the router determines the correct communication link to forward the packet to.
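The lookup described above can be sketched in a few lines. This is only an illustration, assuming the "specific part" of the address is a network prefix and that the router picks the most specific (longest) matching prefix; the prefixes and link names below are made up for the example.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outbound link name.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "link-0",         # default route
    ipaddress.ip_network("203.0.113.0/24"): "link-1",
    ipaddress.ip_network("203.0.113.128/25"): "link-2",  # more specific
}


def outbound_link(destination: str) -> str:
    """Return the link of the longest prefix that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if address in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]


print(outbound_link("203.0.113.200"))  # matched by the /25 prefix
print(outbound_link("203.0.113.5"))    # matched by the /24 prefix
print(outbound_link("198.51.100.7"))   # falls back to the default route
```

Because the table is keyed by prefixes rather than by individual hosts, a handful of entries can cover a very large number of destinations.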
On the destination end-system, the original message is re-composed by re-assembling all the packets. The whole process is similar to how cargo gets delivered over roads. First, you divide the goods into small parcels (packets) and load them onto trucks. Then each of these trucks travels on highways (communication links), and at each intersection (packet switch) it is redirected to the next highway, which will eventually take it to its destination.
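Splitting a message into packets and putting it back together can be sketched as follows. This is a toy model rather than how any real protocol lays out its packets: the sequence number attached to each chunk and the 8-byte packet size are assumptions made for the example.

```python
import random


def packetize(message: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split a message into (sequence_number, chunk) packets."""
    return [(i // size, message[i:i + size])
            for i in range(0, len(message), size)]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the message by ordering packets by sequence number."""
    return b"".join(chunk for _, chunk in sorted(packets))


message = b"Kicking around packets across the internet"
packets = packetize(message, size=8)

random.shuffle(packets)  # packets may take different routes and arrive out of order
print(reassemble(packets) == message)
```

The sequence numbers play the role of the labels on each truck's cargo: without them, the destination could not tell in which order the pieces belong.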
To ensure consistency, each actor involved in the exchange of information over the internet is governed by a set of protocols. Two of the most fundamental protocols are the Internet Protocol (IP) and the Transmission Control Protocol (TCP).
Here’s a brief description of how a bit gets “kicked around” between two end-systems:
“Consider a bit traveling from one end system, through a series of links and routers, to another end system. This poor bit gets kicked around and transmitted many, many times! The source end system first transmits the bit, and shortly thereafter the first router in the series receives the bit; the first router then transmits the bit, and shortly thereafter the second router receives the bit; and so on. Thus our bit, when traveling from source to destination, passes through a series of transmitter-receiver pairs” (Source)
A layered approach to Understanding the Internet
Given the complexity and the numerous moving parts involved in the working of the internet, it is more elegant to organise the many happenings between two hosts as a series of top-down layers of protocols.
“To provide structure to the design of network protocols, network designers organize protocols—and the network hardware and software that implement the protocols— in layers.”(Source)
To understand this layered architecture, it is important to note that at each layer, the protocol provides some service to the layer above, by performing certain actions and by utilising the services provided by the layer immediately below it.
A layered architecture provides the advantage of simplifying the architecture of the internet by breaking it down into specific, well-defined parts, each of which performs a specific task by depending on the service provided by the layer beneath it. More importantly, it adds modularity to the entire system, thereby allowing one to modify the implementation of the service provided by a specific layer, without affecting the rest of the system (so long as the modified implementation provides the same service and depends on the same layer to provide that service).
A protocol layer can be implemented in software (the application and transport layers), in hardware (the physical and link layers), or in a mixture of both (the network layer).
When taken together, the protocols of the various layers are called the protocol stack. According to the TCP/IP architecture, the Internet protocol stack consists of five layers: the application layer, the transport layer, the network layer, the link layer and the physical layer.
Here’s a brief summary of what each of the protocol layers does:
- Application layer: This layer provides services to the end-user, by executing an application that uses the internet. It does this by sending and receiving a series of messages from and to another host(s). It hides the complexities of how the network is laid out.
- Transport layer: This layer provides services to the application layer, by ensuring that the application layer messages are transported to and from the correct destination. It hides how the underlying network is managed (e.g., how packets of information are actually delivered to the correct destination).
- Network layer: This layer provides services to the transport layer by finding a “best effort” path via multiple routers, to deliver packets of information from the source host to the destination host. It hides how each router forwards packets to the successive router.
- Link layer: It provides services to the network layer by actually determining how to deliver packets of information between two given routers.
- Physical layer: It provides the service of physically transmitting “bits” of information to the next router, in a way that the receiver can understand and reconstruct them.
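One way to picture the list above is as a set of nested envelopes: each layer wraps what it receives from the layer above and adds its own addressing. The sketch below is only illustrative. The class and field names are invented for this example, real headers carry many more fields, and the physical layer (which would turn the frame into bits) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:            # transport layer envelope
    src_port: int
    dst_port: int
    payload: bytes        # the application layer message


@dataclass(frozen=True)
class Datagram:           # network layer envelope
    src_ip: str
    dst_ip: str
    payload: Segment


@dataclass(frozen=True)
class Frame:              # link layer envelope
    src_mac: str
    dst_mac: str
    payload: Datagram


def send_down(message: bytes) -> Frame:
    """Wrap an application message on its way down the sender's stack."""
    segment = Segment(src_port=50000, dst_port=80, payload=message)
    datagram = Datagram(src_ip="192.0.2.10", dst_ip="198.51.100.20",
                        payload=segment)
    return Frame(src_mac="aa:aa:aa:aa:aa:aa", dst_mac="bb:bb:bb:bb:bb:bb",
                 payload=datagram)


def receive_up(frame: Frame) -> bytes:
    """Unwrap the envelopes in reverse order on the receiver's stack."""
    datagram = frame.payload      # link layer hands the datagram up
    segment = datagram.payload    # network layer hands the segment up
    return segment.payload        # transport layer hands the message up


frame = send_down(b"GET /index.html")
print(receive_up(frame))
```

Each function only touches the envelope that belongs to its own layer, which is the modularity that the layered design is meant to provide.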
Here’s an Analogy involving a King and a Queen
Before moving forward and delving into the details of each protocol layer, it would be convenient to take an analogy to better understand the services provided by each layer. Let’s go back to a time before the advent of modern communication technologies: a world where one relied on postal services to send messages over a large distance. Now let’s imagine the King of a distant kingdom—presently at the battlefield—wishes to send a letter to his beloved Queen—presently residing at the Royal Palace. Let’s follow how the letter gets delivered to the Queen, from the prism of the five network protocol layers.
- Application Layer: The King at the battlefield dictates the letter to his personal assistant, who writes it out and sends it to the Royal Secretary for delivery.
- Here, the King represents the user, the battlefield is the end-system, the King’s personal assistant represents the application layer and the Royal Secretary represents the transport layer.
- Transport Layer: The Royal Secretary puts the Royal Seal, adds the address of the Royal Palace and hands it over to the local post office.
- Here, the local post office represents the first router in the network layer and the Royal Palace is the destination end-system.
- Network Layer: The postal officer realizes that the destination is far-off and that there is no direct route to the Royal Palace. But, looking at the address, the postal officer understands that the Royal Palace is due East. So, they deliver the letter to the next postal caravan moving East.
- Link Layer: The caravan carries a load of postal traffic from this postal office to the next postal office in the East. The caravan leader draws out the route to the next postal office in the eastward direction.
- Here, the caravan leader is the link layer, connecting the first postal office (a router) to the next postal office (another router).
- Physical Layer: The caravan leader employs their trusted nephew to carry the King’s letter. The nephew, along with their camel-driven carriage, carries the King’s letter, following the route drawn by the caravan leader.
- The nephew and their carriage are the physical layer, physically carrying the postal traffic from one postal office (router) to the next.
- Physical Layer: Once the carriage reaches the next postal office, the nephew unloads the mail from their carriage.
- Link Layer: The guard at the next postal office unpacks the postal load and hands over all its contents to the postal officer of that postal office.
- The guard is the link layer, passing on the load received from the carriage driven by the nephew (physical layer). The postal officer is the second router.
- Network Layer: The postal officer at the next postal office receives the postal load delivered by the caravan and examines it. They find the King’s letter, and realize that the Royal Post Office is in the next town (it is only natural that the Royal Family has a post office of their own!). However, they have to find a way to physically transport the letter to the Royal Post Office. Luckily, they have a local pigeon sender who regularly delivers mail locally. The postal officer delivers the King’s letter to the local pigeon sender.
- Here, the local pigeon sender is the link layer and the pigeon is the physical layer.
- Link Layer: The pigeon sender knows the way to the Royal Post Office. They instruct the pigeon to fly to the Royal Post Office.
- Physical Layer: The pigeon flies away towards the Royal Post Office.
- Physical Layer: The pigeon successfully lands at the entrance of the Royal Pigeon Receiver at the Royal Post Office.
- Link Layer: The Royal Pigeon Receiver at the Royal Post Office receives the pigeon’s parcel, cleans it up and hands it over to the postal officer at the Royal Post Office.
- Here, the Royal Pigeon Receiver represents the link layer, which hands over the information received from the pigeon (physical layer) to the postal officer (network layer).
- Network Layer: The postal officer at the Royal Post Office accepts the mail from the Royal Pigeon Receiver. They pass on the King’s letter to the Queen’s Secretary.
- The postal officer represents the network layer, while the Queen’s Secretary is the transport layer.
- Transport Layer: The Queen’s Secretary receives the King’s letter from the Royal Post Office, verifies the Royal Seal, ensures that it is complete and hands it over to the Queen’s personal assistant.
- Here, the Queen’s personal assistant represents the application layer, accepting the message delivered by the Queen’s Secretary (transport layer).
- Application Layer: The Queen’s personal assistant knocks on the door of the Queen’s room and reads out loud what the King had to say. The Queen is delighted to hear from the King!
Now, let’s try to understand how each of these layers actually works.
First, it is important to understand what a ‘network application’ is. It is a program that runs on one host and communicates with another host, using the internet. Strictly speaking, it is not programs but processes running on different hosts that communicate over the internet:
“In the jargon of operating systems, it is not actually programs but processes that communicate. A process can be thought of as a program that is running within an end system” (Source)
Processes on different hosts communicate with each other via messages. Typically, a network application consists of two different processes running on two different hosts that send messages to each other (e.g., the server and client processes on the web).
Each process sends and receives messages over the internet by using a programming interface called the socket. Sockets act like doors: when the network application needs to send a message across the internet, it simply pushes it through the door, with the expectation that the rest will be taken care of by the network. Sockets in an end-system are identified by port numbers.
In fact, sockets are the interface between the application layer and the transport layer.
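To make the "door" idea concrete, here is a small sketch using Python's standard `socket` module. Both ends run in the same script over the loopback address purely for illustration; in practice the two processes would live on different hosts. The choice of a stream (TCP) socket and the message text are assumptions for the example.

```python
import socket

# "Server" process: open a door and wait. Port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# "Client" process: push a message through its own door, addressed by (host, port).
client = socket.create_connection((host, port), timeout=5)
client.sendall(b"hello through the door")
client.shutdown(socket.SHUT_WR)   # signal that nothing more will be sent

# Server side: accept the connection and read until the client is done.
conn, client_addr = server.accept()
conn.settimeout(5)
chunks = []
while True:
    data = conn.recv(1024)
    if not data:
        break
    chunks.append(data)
received = b"".join(chunks)

print(received, "arrived on port", port, "from client port", client_addr[1])

for s in (conn, client, server):
    s.close()
```

Neither side says anything about routers, links or paths. The application only names a host and a port and leaves everything else to the layers below.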
Now, coming back to the application layer itself, it is defined as the “communications protocols and interface methods used in process-to-process communications across an Internet Protocol (IP) computer network” (source). Mainly, an application layer protocol standardizes the manner in which data should be exchanged between two hosts. It does not deal with how this data will, in fact, be transferred.
Examples of application layer protocols include HTTP (Hypertext Transfer Protocol), FTP (File Transfer Protocol) and SMTP (Simple Mail Transfer Protocol).
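As an illustration of what "standardizing the manner of exchange" means, the sketch below builds and reads back a minimal HTTP/1.1 request. The helper names are invented for this example, and only the request line and two headers are shown; real clients typically send more.

```python
def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as the bytes sent on the wire."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: method, target, version
        f"Host: {host}",          # header naming the server being addressed
        "Connection: close",
        "",                       # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_request_line(raw: bytes) -> tuple[str, str, str]:
    """Extract (method, target, version) from the first line of a request."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = request_line.split(" ")
    return method, target, version


request = build_request("example.com", "/index.html")
print(request.decode("ascii"))
print(parse_request_line(request))
```

The protocol fixes the format of the text, so any client and any server that follow it can understand each other. How those bytes reach the other host is left to the lower layers.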
The transport layer transports the application-layer messages (received over the socket) between two application end-points (or hosts). The main objective of this layer is to provide logical communication between two application processes. This means that this layer hides the details of the internet from the application layer, giving the application the impression that it is directly connected to the other end-system’s application.
It is important to note that the transport layer protocols are implemented in the end-systems (or hosts). They are not implemented by intermediate routers or communication links.
In the internet, there are two main transport layer protocols, both of which can transport application layer messages:
- Transmission Control Protocol (TCP): TCP provides a connection-oriented, reliable data transfer service. It guarantees that messages are delivered, and that they are delivered in the correct order. It also provides flow control, which keeps a sender from overwhelming the receiver, and congestion control, which reduces the sender’s transmission rate when the routers and links between the two hosts are congested. TCP also divides long messages into smaller segments.
- User Datagram Protocol (UDP): UDP provides a connectionless service. It provides no guarantees towards reliability, flow control, or congestion control.
The transport layer protocols convert each application layer message into one or more transport layer packets called segments, each of which is then handed to the network layer to be sent across the routers and links.
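How can a reliable service be built on top of an unreliable one? The sketch below shows the basic idea with a stop-and-wait scheme: send one segment, wait for its acknowledgement, and retransmit if nothing comes back. This is a deliberately simplified model and not how TCP is actually implemented. The `LossyChannel` class, the fixed loss pattern, and the way timeouts are modelled (a dropped transmission simply returns `None`) are all assumptions for the example.

```python
def segment(message: bytes, size: int) -> list[bytes]:
    """Split an application message into segments of at most `size` bytes."""
    return [message[i:i + size] for i in range(0, len(message), size)]


class LossyChannel:
    """Unreliable channel that drops the transmissions numbered in `drop`."""

    def __init__(self, drop: set[int]):
        self.drop = drop
        self.sent = 0

    def deliver(self, item):
        self.sent += 1
        return None if self.sent in self.drop else item


def reliable_transfer(message: bytes, size: int, channel: LossyChannel) -> bytes:
    delivered = []        # receiver: data handed to the application, in order
    next_expected = 0     # receiver: sequence number it is waiting for
    for seq, chunk in enumerate(segment(message, size)):   # sender loop
        acked = False
        while not acked:
            packet = channel.deliver((seq, chunk))
            if packet is None:
                continue                  # segment lost: sender times out, resends
            pkt_seq, data = packet
            if pkt_seq == next_expected:  # new segment: accept it
                delivered.append(data)
                next_expected += 1
            # A duplicate is discarded, but it is still acknowledged.
            ack = channel.deliver(pkt_seq)
            acked = ack == seq            # a lost ACK also triggers a resend
    return b"".join(delivered)


channel = LossyChannel(drop={3, 7})       # one data segment and one ACK get lost
message = b"reliable delivery over an unreliable channel"
result = reliable_transfer(message, size=8, channel=channel)
print(result == message, "after", channel.sent, "transmissions")
```

The sequence numbers do two jobs here: they let the receiver discard a duplicate caused by a lost acknowledgement, and they keep the data in order. UDP offers none of this, so an application using UDP has to tolerate loss or handle it itself.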
The network layer receives a transport layer segment and a destination address from the transport layer protocol of the source host. The network layer provides the service of delivering this segment to the correct destination host’s transport layer.
The Internet Protocol (IP) is the core network layer protocol of the internet. IP implements a best effort service: while it makes a “best effort” to deliver packets between hosts, it does not guarantee that a packet will eventually be delivered. Nor does it guarantee that packets will arrive in order, or that they will not be tampered with in transit.
The network layer protocol wraps each transport layer segment in a network layer packet called a “datagram”. Each datagram contains the address of the destination host. Once this is done, the source host pops each datagram into the network, i.e., it sends the datagram to its nearby router. To reach the ultimate destination host, the datagram needs to traverse several routers. Each router forwards a datagram that it receives on any of its input links to the next appropriate router. It decides which router to forward the datagram to by looking at the destination address and its own forwarding table. Thus, to send one packet from one host to another, each router has to make a local decision that is compatible with that of the next router. In this way, we get a global convergence on packet delivery using a series of local decisions at the router level.
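The "series of local decisions" can be sketched with a toy network of three routers. Everything here is hypothetical: the router names, the prefixes, and the `"deliver"` marker that stands for handing the datagram to a directly attached host. The prefixes do not overlap, so the first match is enough, and the hop limit is only a safety net for the example.

```python
import ipaddress

# Each router knows only its own table: destination prefix -> next hop.
TABLES = {
    "R1": {"10.0.2.0/24": "R2", "10.0.3.0/24": "R2"},
    "R2": {"10.0.2.0/24": "deliver", "10.0.3.0/24": "R3"},
    "R3": {"10.0.3.0/24": "deliver"},
}


def next_hop(router: str, destination: str) -> str:
    """Local decision made by one router, using only its own table."""
    address = ipaddress.ip_address(destination)
    for prefix, hop in TABLES[router].items():
        if address in ipaddress.ip_network(prefix):
            return hop
    raise LookupError(f"{router} has no route to {destination}")


def route(first_router: str, destination: str, max_hops: int = 16) -> list[str]:
    """Follow the chain of local decisions and return the routers visited."""
    path = [first_router]
    while len(path) <= max_hops:
        hop = next_hop(path[-1], destination)
        if hop == "deliver":
            return path
        path.append(hop)
    raise RuntimeError("hop limit exceeded")


print(route("R1", "10.0.3.9"))
print(route("R1", "10.0.2.5"))
```

No router ever sees the whole path. `route` exists only so that we, as observers, can trace the result of decisions that each router makes independently.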
Note that routing, i.e., the choice of the route or path that a packet will traverse within the network layer, is implemented by the routers and is not determined by the end-systems. This is very convenient, as it insulates the host application from changes in the internet’s routing architecture (RFC 1122). This is a direct consequence of having a modular protocol stack.
Also, as another corollary of the modular architecture of the protocol stack, routers remain stateless. They do not maintain any state about the end-to-end flow of information. Each router inspects the destination IP address of a packet and determines the next router to forward it to. This enables the effective utilization of redundant paths: two packets headed for the same destination may take two different routes (RFC 1122).
The network layer provides the service of transporting packets between two end-systems, via a series of routers. The link layer protocols provide the service of actually transporting packets from one node (a host or a router) to the next node (a router or a host). This is done by establishing connectivity with the next node and encapsulating each network layer datagram in a “frame” to transmit it to that node. In other words, link layer protocols provide the service of wrapping datagrams in frames and transporting them across each individual link connecting two specific nodes. At each node, the network layer (after determining the next forwarding router/host) passes the network layer datagram to the respective link layer protocol, which sends it across to the next node. At the next node, the link layer protocol unwraps the frame and hands over the datagram to the network layer protocol of that node.
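The hop-by-hop role of the link layer can be sketched as follows: the datagram stays the same from end to end, while a fresh frame is built for every individual link. The class names, node names and addresses are invented for this illustration, and real frames carry link-layer addresses and other fields that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    src_ip: str
    dst_ip: str
    data: bytes


@dataclass(frozen=True)
class Frame:
    sender: str          # node at the near end of this link
    receiver: str        # node at the far end of this link
    datagram: Datagram


def carry(datagram: Datagram, path: list[str]) -> list[Frame]:
    """Move a datagram along `path`, building a new frame for every link."""
    frames = []
    for sender, receiver in zip(path, path[1:]):
        frame = Frame(sender, receiver, datagram)  # sender's link layer wraps
        frames.append(frame)
        datagram = frame.datagram                  # receiver's link layer unwraps
    return frames


datagram = Datagram("192.0.2.10", "198.51.100.20", b"hello")
for frame in carry(datagram, ["host-A", "R1", "R2", "host-B"]):
    print(frame.sender, "->", frame.receiver, "carrying datagram for",
          frame.datagram.dst_ip)
```

The frame's sender and receiver change on every link, but the destination IP address inside the datagram never does. That is the difference between link layer delivery (node to node) and network layer delivery (host to host).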
Examples of link layer protocols include WiFi, Ethernet and the Point-to-Point Protocol (PPP).
The physical layer provides the service of transporting each individual bit of a given frame from one node to the next. The protocol of this layer will depend on the actual link connecting the two nodes. In other words, physical layer protocols determine how to encode digital information and transmit it across a communication link.
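As one example of what "encoding digital information" can mean, the sketch below uses Manchester encoding, a line code in which every bit becomes a pair of signal levels with a transition in the middle. The convention chosen here (0 as high then low, 1 as low then high) is one common choice and is an assumption of this example; different links use different encodings.

```python
def to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def from_bits(bits: list[int]) -> bytes:
    """Pack a list of bits (length a multiple of 8) back into bytes."""
    return bytes(
        int("".join(str(b) for b in bits[i:i + 8]), 2)
        for i in range(0, len(bits), 8)
    )


def manchester_encode(bits: list[int]) -> list[int]:
    """Map each bit to two signal levels: 0 -> (1, 0), 1 -> (0, 1)."""
    signal = []
    for bit in bits:
        signal.extend((0, 1) if bit else (1, 0))
    return signal


def manchester_decode(signal: list[int]) -> list[int]:
    """Recover bits by reading the signal two levels at a time."""
    pairs = zip(signal[0::2], signal[1::2])
    return [1 if pair == (0, 1) else 0 for pair in pairs]


frame = b"Hi"
signal = manchester_encode(to_bits(frame))
print(signal)
print(from_bits(manchester_decode(signal)))
```

The guaranteed transition in every bit period lets the receiver keep its clock in step with the sender, at the cost of sending two signal levels per bit.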
As shown above, the network layer does not make any guarantees regarding delivery of messages. It is the job of the end-systems (transport layer and application layer) to ensure integrity, encryption and orderly delivery of information. Thus, as the lower layers of the protocol stack do not implement complex functionalities, the internet is designed as a “dumb” network that relies on “smart” end-systems to carry out complex tasks. To put it another way, the lower level actors (routers) perform simple tasks efficiently, while higher level actors are expected to use the services of the lower level actors to perform complex tasks. By pushing the complexity to the edges, the internet can be scaled to be compatible with any kind of network or device.