Getting Started: How the Interwebs Work

There are a ton of places on the web to learn HTML, so I sometimes wonder why I am writing this at all. What helped me most in my quest to learn how to program was looking ahead at what I would eventually be able to do, because the stuff I made at first was less than impressive.

This guide, split into parts, will hopefully help you learn how to program for the web. I am going to discuss how the web works, how to make a simple page, and what happens when that page is requested. Later, we will learn how to style the page (CSS), how to manipulate the page (using JavaScript) and how to do some server-side programming (PHP and MySQL).

I’ll keep it as short as possible

Here is something you will likely hear: the web is a client-server model. “What is a client?”, you may ask. “What is a server?”, you may follow up. Well, they are computers, not much different from the one you are sitting at. You, looking at the page through your browser, are the client; my computer (the server) is running some free software called Apache, allowing me to share this content with you.
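To make the two roles concrete, here is a minimal sketch that plays both sides on one machine. It assumes Python's built-in http.server module as a stand-in for Apache (it is not what my server runs), and the helper names start_server and fetch are made up for this example.

```python
import http.client
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_server(directory):
    """Play the server: share the files in `directory` on a free local port."""
    # Assumption: Python's built-in handler stands in for Apache here.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # Port 0 asks the operating system for any free port.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    # Run the server in the background so the client below can talk to it.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(host, port, path="/"):
    """Play the client: ask the server for `path`, return (status, body)."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        return response.status, response.read()
    finally:
        connection.close()
```

Calling start_server on a folder that contains an index.html, and then fetch with the host and port from server.server_address, should hand back the contents of that file, just like a browser would receive them.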

Buildings have addresses, and so do computers. Your computer’s internet address is called an IP (Internet Protocol) address, and it identifies your computer (or, quite often, your home or office network) online. It is in the format 123.123.123.123, where each of the four numbers can be anything between 0 and 255. With your IP address, I can see if your computer is online (as long as it’s not specially configured to hide) using a program called ping. If you were running Apache, like I am, I could use your IP address to see the files you were serving.

At the time of writing, one of Google’s IP addresses is 74.125.67.100; try going to http://74.125.67.100 . That takes you to the same place as http://google.com , because that address belongs to one of their servers. Domain names (like google.com, whitehouse.gov or unitedway.org) are a system set up to simplify the requests that clients have to make. It would be a pain to try and remember 74.125.67.100 rather than google.com; I think of it like saying “we’re located right next to the huge statue you can see from miles away” rather than 241123 West 53rd Street or 63.23211 latitude by -71.343219 longitude.

That system is called DNS (the Domain Name System), and it is run by DNS servers; there are several of them, but not quite as many as you would think. These servers’ purpose is to map the domain names in your requests to the corresponding IP addresses, kind of like phone operators did in the days of yore. You say “google.com”, it says “74.125.67.100”. Have I made that abundantly clear?
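If you want to poke at this yourself, here is a small sketch using Python's standard ipaddress and socket modules. The function names is_ipv4 and lookup are my own inventions for illustration; the second one simply asks your operating system's resolver, which in turn asks DNS.

```python
import ipaddress
import socket


def is_ipv4(text):
    """Return True if `text` looks like 123.123.123.123 with each part 0-255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ipaddress.AddressValueError:
        return False


def lookup(name):
    """Ask the system resolver (and, through it, DNS) for an IPv4 address."""
    # If `name` is already an IPv4 address, it is returned unchanged.
    return socket.gethostbyname(name)
```

For example, is_ipv4("74.125.67.100") is true while is_ipv4("256.1.1.1") is not, because 256 is out of range. Calling lookup("google.com") returns whatever address DNS hands out today, which will most likely differ from the one quoted above.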

Requesting files

When the server receives a request, it looks for the requested file and transmits the content of that file to your computer. Many requests are implied; for instance, http://www.google.com/ is not a file, just an address. The file you are actually viewing is http://www.google.com/index.html (or something like it). There is nothing magical about this. It’s important to realize that this is really not much different from looking at “My Documents\synergy.docx”; it is just a regular old file in a folder. Regarding index.html: server software, like Apache, is set up to show specific files if none is requested, if a non-existent file is requested or if a file you are not authorized to view is requested. There is no doubt that you have been served a 404 (Not Found) page before, and maybe even a 403 (Forbidden); let’s not forget the dreaded 500 (Internal Server Error).
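The lookup described above can be sketched in a few lines. This is only an illustration of the idea, not how Apache actually does it; the function name locate and the choice to answer 403 for paths that wander outside the shared folder are assumptions of mine.

```python
from pathlib import Path


def locate(doc_root, url_path, index_name="index.html"):
    """Return (status_code, file_path_or_None) for a requested URL path."""
    root = Path(doc_root).resolve()
    target = (root / url_path.lstrip("/")).resolve()
    # Assumption: anything outside the shared folder is off limits (403).
    if target != root and root not in target.parents:
        return 403, None
    # No file named in the request: fall back to the default file.
    if target.is_dir():
        target = target / index_name
    # Nothing there: the familiar 404.
    if not target.is_file():
        return 404, None
    return 200, target
```

So a request for "/" quietly becomes a request for index.html in the shared folder, and a request for a file that does not exist comes back as a 404.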

The HTTP Protocol

HTTP (Hypertext Transfer Protocol) is the protocol that web browsers and web servers use to communicate with each other over the Internet. Clients (web browsers) send requests to web servers for web elements such as web pages and images. In the simplest case, once the server has processed the request and sent its response, the connection between client and server is closed, and a new connection must be made for the next request. (HTTP/1.1 and later usually keep a connection open and reuse it for several requests, but each request is still handled on its own.) Many other protocols are connection oriented, meaning that the two computers communicating with each other keep the connection open for the whole conversation.
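Under the hood, an HTTP request is just a few lines of plain text. The sketch below builds the text a browser might send and picks apart the first line of a server's reply. The helper names build_get_request and parse_status_line are hypothetical, and a real browser sends many more headers than this.

```python
def build_get_request(host, path="/"):
    """Build the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # what we want and which protocol version
        f"Host: {host}",          # which site on that server we mean
        "Connection: close",      # ask the server to hang up afterwards
        "",                       # a blank line marks the end of the request
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(line):
    """Split a reply's first line into (version, status code, reason)."""
    parts = line.strip().split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason
```

Feeding parse_status_line the line "HTTP/1.1 404 Not Found" gives back the version, the number 404 and the text "Not Found", which is exactly the error you see when a page is missing.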

Summary of a Request

When you type a URL into a web browser, this is what happens:

  1. You type in a URL containing a domain name or an IP address. A domain name is looked up through DNS to find the server's IP address.
  2. The web browser connects to the server and sends an HTTP request for a file.
  3. The server receives the request and checks for the requested page. If the page exists, the server sends it. If the server cannot find the requested page, it will send an HTTP 404 error message.
  4. The web browser receives the page back and, in the simplest case, the connection is closed.
  5. The browser then parses the page and looks for other elements it needs to complete the web page. These usually include images, JavaScript files, CSS files, Flash, etc.
  6. For each element needed, the browser makes an additional HTTP request to the server, over a new or reused connection.
  7. When the browser has finished loading all images, Flash, etc., the page will be completely loaded in the browser window.
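Step 5 is the one people tend to find mysterious, so here is a rough sketch of it using Python's standard html.parser module. Real browsers are far more thorough; the class and function names are made up for this example, and it only looks at images, scripts and stylesheets.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceFinder(HTMLParser):
    """Collect the URLs of extra files a page needs (images, scripts, CSS)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            # Turn relative addresses into full URLs the browser can request.
            self.urls.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "link" and attributes.get("href"):
            if (attributes.get("rel") or "").lower() == "stylesheet":
                self.urls.append(urljoin(self.base_url, attributes["href"]))


def find_subresources(html, base_url):
    """Return the list of URLs that step 6 would go on to request."""
    finder = SubresourceFinder(base_url)
    finder.feed(html)
    return finder.urls
```

Each URL in the returned list would then get its own HTTP request in step 6. Ordinary links to other pages are deliberately ignored, since the browser only follows those when you click them.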

The web browser then renders the HTML markup into fancy-looking webpages (like this one). In summary, a lot is going on behind the scenes of the web; though it seems quite complex, it is really quite simple to understand. We will cover HTML in our next article.

Thanks for reading!


About the Author

Rob McVey

I am a software developer/IT professional helping businesses save money through informed purchase consulting; website development and marketing; and process automation.