12 Things to Know About HTTP

November 03, 2021

This post is all about HTTP and how the World Wide Web works.

1. What is HTTP?

HyperText Transfer Protocol (HTTP) is a network protocol that dictates how information is exchanged on the web. A network protocol is a system of rules and conventions that governs how two devices in a communication system (or network) exchange information. HTTP is a client-server protocol, meaning that it involves the exchange of information between a client and a server.

2. What is a server?

A server is a computer or a system where the files and resources that make up a website are actually stored. It connects to the internet and facilitates the exchange of data with client devices connected to the web.

A server has a software component and a hardware component. The hardware component refers to the physical computer that stores the website’s software and files. The software component determines and controls how web users (i.e. clients) can use files hosted by the server. At a basic level, it refers to an HTTP server.

3. HTTP is Stateless

HTTP works through the exchange of individual messages between clients and servers. Each message is independent of the others: the protocol itself keeps no link between successive requests or responses. Each request sent by a client should therefore be self-contained, carrying everything the server needs to process it and send a response. This makes HTTP a stateless protocol.

4. What is a URL?

A Uniform Resource Locator (URL) is a web address (often called a ‘link’) that points to a resource on the web. Plainly speaking, a URL is the location a browser needs to know to retrieve a specific set of data. URLs are also the human access point to the web: when a user types in a URL, the browser (or any other user agent) initiates a set of steps to reach the resource location and retrieve information.

For example, the URL of this page is: https://otee.dev/2021/11/03/HTTP-explainer.html

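Every URL must conform to a specific syntax. Accordingly, a URL typically consists of the following parts: a scheme (such as https), a domain name (such as otee.dev), an optional port number, a path to the resource, optional parameters (the query string), and an optional fragment.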
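The parts of a URL can be inspected with Python's standard urllib.parse module. This is only a sketch: the first URL is this page's address, while the second one (with an explicit port, a `lang` parameter and a `headers` fragment) is a made-up variant used purely for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# The URL of this post, as given above.
post_url = "https://otee.dev/2021/11/03/HTTP-explainer.html"
# Hypothetical variant (not a real link) that also carries a port, a query and a fragment.
full_url = "https://otee.dev:443/2021/11/03/HTTP-explainer.html?lang=en#headers"

post = urlsplit(post_url)
full = urlsplit(full_url)

print(post.scheme)    # https
print(post.hostname)  # otee.dev
print(post.port)      # None (no explicit port, so the scheme's default applies)
print(post.path)      # /2021/11/03/HTTP-explainer.html

print(full.port)             # 443
print(parse_qs(full.query))  # {'lang': ['en']}
print(full.fragment)         # headers
```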

5. How does a Client Connect to a Server?

To exchange messages with a web server, an HTTP client first needs to establish a Transmission Control Protocol (TCP) connection with it. To do so, the client needs to know the location of the web server, namely its IP address and TCP port number. Both can be derived from the URL: the domain name maps to an IP address, and the port is either written in the URL or defaults to 80 for http and 443 for https.

The domain name of a URL is a human-readable alias for the IP address. The user agent converts a textual domain name to an IP address through a facility called the Domain Name System (DNS). The DNS is like a phonebook: it maintains a map of domain names and their corresponding IP addresses.

Once the IP address of the web server is found, the user agent opens a TCP (Transmission Control Protocol) connection to the web server and starts exchanging HTTP messages with it.
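The two steps described above (name lookup, then a TCP connection) can be sketched with Python's socket module. To keep the example runnable without internet access, it connects to a throwaway listener on the local machine; the `resolve` helper is a name chosen for this illustration, not part of any library.

```python
import socket

def resolve(host, port):
    """Return the IP addresses the system resolver reports for a host name."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry ends with a socket address whose first item is the IP address.
    return sorted({info[4][0] for info in infos})

# Stand-in for a web server: a listener on this machine (assumption for the demo).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the operating system pick a free port
listener.listen(1)
port = listener.getsockname()[1]

# Step 1: turn a name into IP addresses.
addresses = resolve("localhost", port)

# Step 2: open a TCP connection to an IP address and port.
connection = socket.create_connection(("127.0.0.1", port), timeout=5)
peer_ip, peer_port = connection.getpeername()
print(addresses, peer_ip, peer_port)

connection.close()
listener.close()
```

With network access, calling `resolve("otee.dev", 443)` would perform the same lookup for this site's domain name.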

6. What are HTTP Messages?

Clients and servers communicate with each other over the web by exchanging individual messages. Messages sent by a client are called HTTP requests, and messages sent by the server to the client are called HTTP responses.

Request messages request an action on the web server (e.g. requesting access to a web resource) and response messages carry results of the request back to the client.

An HTTP message comprises the following parts: a start line, headers, and a body.
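As a rough illustration of these three parts, the sketch below splits a hand-written HTTP/1.1 request into its start line, headers and body. The request itself (the `/notes` path and its contents) is invented for the example, and the textual layout shown applies to HTTP/1.x; the `split_message` helper is hypothetical and not a full parser.

```python
def split_message(raw):
    """Split a raw HTTP/1.1 message into (start_line, headers, body)."""
    # An empty line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # Note: a plain dict keeps only the last value of a repeated header.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

# Hypothetical request, written out by hand for illustration.
raw_request = (
    b"POST /notes HTTP/1.1\r\n"
    b"Host: localhost:4000\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"Hello, world!"
)

start_line, headers, body = split_message(raw_request)
print(start_line)               # POST /notes HTTP/1.1
print(headers["content-type"])  # text/plain
print(body)                     # b'Hello, world!'
```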

7. Start line

This is the first line of any HTTP message.

For HTTP requests, the start line is called a ‘request line’, and contains the following components: a method, the path component of the URL (along with parameters, if any), and the HTTP version (e.g. HTTP/2). The first line of the following image shows the request line for a post on this site.

Request Message

Note that in the above example, GET is the name of the method, /2021/10/20/using-heroku-scheduler-to-initiate-triggers.html is the path component of the URL, and HTTP/2 is the HTTP version used for the request.

For HTTP responses, the start line is called the ‘status line’, and it contains the HTTP version, a status code, and optionally a short reason phrase.

Response Message

In the above example, HTTP/2 is the HTTP version and 200 is the status code.
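The start lines discussed above can be taken apart with a few string operations. The following is a simplified sketch rather than a validating parser; the helper names are made up for this example, and the `404 Not Found` line is an extra illustrative input.

```python
def parse_request_line(line):
    """Split a request line into method, target (path plus parameters) and version."""
    method, target, version = line.split(" ", 2)
    return {"method": method, "target": target, "version": version}

def parse_status_line(line):
    """Split a response start line into version, status code and optional reason phrase."""
    parts = line.split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return {"version": parts[0], "status": int(parts[1]), "reason": reason}

request = parse_request_line(
    "GET /2021/10/20/using-heroku-scheduler-to-initiate-triggers.html HTTP/2"
)
ok = parse_status_line("HTTP/2 200")
missing = parse_status_line("HTTP/1.1 404 Not Found")  # illustrative extra input

print(request["method"])                    # GET
print(ok["status"])                         # 200
print(missing["status"], missing["reason"]) # 404 Not Found
```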

8. Headers

HTTP headers convey additional information about an HTTP message, in the form of a list of name-value pairs. Certain standard headers are often sent along with request and response messages, but we can also send our own custom headers with a message.

Here are examples of a few common headers:

user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/100.1 (KHTML, like Gecko) Chrome/1.0.0.81 Safari/100.36
Content-Type: text/html; charset=UTF-8
Set-Cookie: yummy_cookie=choco
Set-Cookie: tasty_cookie=strawberry
cookie: yummy_cookie=choco; tasty_cookie=strawberry;
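The cookie headers above hint at how they relate: the server hands out values with Set-Cookie, and the client sends them back in a single Cookie header on later requests. A minimal sketch of that round trip, using Python's standard http.cookies module and ignoring cookie attributes such as expiry or path, might look like this:

```python
from http.cookies import SimpleCookie

# Values of the two Set-Cookie response headers from the example above.
set_cookie_values = ["yummy_cookie=choco", "tasty_cookie=strawberry"]

# Collect the cookies the way a very simple client might.
jar = SimpleCookie()
for value in set_cookie_values:
    jar.load(value)

# Build the Cookie request header that the client would send back.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

print(jar["yummy_cookie"].value)  # choco
print(cookie_header)              # yummy_cookie=choco; tasty_cookie=strawberry
```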

9. Body

The HTTP message body (also called the ‘entity body’) is an optional part of an HTTP message. It is the third and last part of an HTTP message and is separated from the headers by an empty line. The body of an HTTP message is the actual data that is transmitted during an HTTP transaction.

The body of an HTTP message can carry various types of data, including text, HTML documents, JSON, images, etc. This makes the content headers (such as content-type and content-length) crucial, as they describe the type and size of the content of the body.
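To make the link between a body and its content headers concrete, here is a small sketch that prepares two bodies and the headers describing them. The `describe_body` helper and the sample data are assumptions made for this example.

```python
import json

def describe_body(body, content_type):
    """Return the content headers that describe a body made of raw bytes."""
    return {
        "Content-Type": content_type,
        # Content-Length counts bytes, not characters.
        "Content-Length": str(len(body)),
    }

# Sample data, invented for the illustration.
json_body = json.dumps({"name": "Otee", "likes": "coding"}).encode("utf-8")
text_body = "café".encode("utf-8")

json_headers = describe_body(json_body, "application/json")
text_headers = describe_body(text_body, "text/plain; charset=UTF-8")

print(json_headers["Content-Type"])    # application/json
print(text_headers["Content-Length"])  # 5 (four characters, but "é" takes two bytes in UTF-8)
```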

10. Request Methods

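Request methods, often called HTTP verbs, describe the nature of the action that the server needs to perform on a resource. These are the four most common methods used in HTTP requests: GET (retrieve a resource), POST (submit data to the server), PUT (create or replace the resource at the given URL), and DELETE (remove a resource).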

Apart from the above four methods, there are other methods as well, such as HEAD, CONNECT, OPTIONS, TRACE, etc.
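One way to see request methods in action is to run a tiny server locally and send it requests with different methods. The sketch below uses only Python's standard library; the handler simply reports back the method it received, and the `/notes/1` path is a placeholder rather than a real resource.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoMethodHandler(BaseHTTPRequestHandler):
    """Toy handler that answers every supported method by echoing it back."""

    def _reply(self):
        body = f"{self.command} {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if self.command != "HEAD":  # a HEAD response carries headers only
            self.wfile.write(body)

    # http.server dispatches to do_<METHOD>, so map each method to the same reply.
    do_GET = do_POST = do_PUT = do_DELETE = do_HEAD = _reply

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

# Bind to port 0 so the operating system picks a free port.
server = HTTPServer(("127.0.0.1", 0), EchoMethodHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

results = {}
for method in ("GET", "POST", "PUT", "DELETE", "HEAD"):
    conn = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    conn.request(method, "/notes/1")
    response = conn.getresponse()
    results[method] = (response.status, response.read().decode("utf-8"))
    conn.close()

server.shutdown()
server.server_close()

print(results["GET"])   # (200, 'GET /notes/1')
print(results["HEAD"])  # (200, '')
```

A real server would of course act differently for each method; the point here is only that the method is a token in the request line that the server reads and dispatches on.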

12. Status Codes

Just as methods tell the server what to do, status codes tell the client what happened with the request. Status codes are three-digit numeric codes, ranging between 100 and 599. Status codes often are accompanied by a short reason phrase, which summarises the meaning of that code.

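Registered codes have a definite meaning, and the registry of codes is maintained by the Internet Assigned Numbers Authority (IANA). Codes are classified into five categories, on the basis of the first digit: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).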

Two of the most common status codes are 200 and 404. 200 stands for OK, implying that the HTTP request was successful. 404 stands for Not Found, implying that the requested resource could not be found.

Not every code in a category has a meaning assigned by the protocol. For example, the status code 580 does not have any specific meaning.
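The classification by first digit can be sketched in a few lines of Python. The `describe` helper is a name made up for this example; reason phrases are looked up in the standard library's http.HTTPStatus, which only knows the codes it defines.

```python
from http import HTTPStatus

# The five classes, keyed by the first digit of the status code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return (category, reason phrase or None) for a three-digit status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    category = CLASSES[code // 100]
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = None  # the code is in a valid range but has no defined meaning here
    return category, phrase

print(describe(200))  # ('success', 'OK')
print(describe(404))  # ('client error', 'Not Found')
print(describe(580))  # ('server error', None)
```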

13. Making HTTP Requests over the CLI

Apart from browsers, we can generate HTTP requests over the Command Line Interface (CLI) using cURL, which stands for client URL. The simplest usage of this tool is to pass the URL to the curl command:

$ curl https://otee.dev/

The above command will return the body of the response message (in the above example, this would be an HTML document). curl assumes the request method to be GET if we do not mention it. So, if we want to send a request with any other method, we should use the -X option followed by the name of that method:

$ curl -X POST http://localhost:4000/

Also, we can send specific headers (including custom ones) using the -H or --header option:

$ curl -H 'Content-Length: 13' http://localhost:4000/

curl supports over two hundred options, making it a versatile tool for generating API requests, checking for errors, etc. Here are a few options that can come in handy:

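# Save the response body to a file
$ curl -o output.html https://otee.dev/
# Show only the response headers
$ curl -X GET -I http://localhost:4000/
# Print the request and response details (verbose mode)
$ curl -v http://localhost:4000/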

The above request returns the following response:

Response Message

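# Discard the response body and keep only the verbose output
$ curl https://otee.dev/ -v > /dev/null
# Follow redirects
$ curl https://oitee.github.io/ -L
# Send name-value pairs as the body of a POST request
$ curl -X POST -d name=Otee -d likes=coding http://localhost:4000/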

Why name-value pairs? When data is sent with the -d option, curl sets the content-type of the request to application/x-www-form-urlencoded by default. See the last request header of the example below:

Default content-type header
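To show what a form-urlencoded body actually looks like, the sketch below encodes the same name-value pairs as the `-d name=Otee -d likes=coding` request with Python's urllib.parse, and then decodes them the way a server might. The `q` value at the end is an invented example to show how special characters are escaped.

```python
from urllib.parse import urlencode, parse_qs

# The same pairs as in the curl command with -d name=Otee -d likes=coding.
form = {"name": "Otee", "likes": "coding"}

body = urlencode(form)
headers = {
    "Content-Type": "application/x-www-form-urlencoded",
    "Content-Length": str(len(body.encode("utf-8"))),
}

print(body)            # name=Otee&likes=coding
print(parse_qs(body))  # {'name': ['Otee'], 'likes': ['coding']}

# Spaces and reserved characters are escaped so they cannot be confused
# with the separators between pairs (hypothetical value for illustration).
print(urlencode({"q": "a b&c"}))  # q=a+b%26c
```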

So, if we want to send any other type of data, we should set the content-type header explicitly:

Text/html content-type header