Basics of URL category based web filtering

There are two common ways of performing web filtering when browsing the Internet. One is granular down to the level of a single web page, while the other is a rather coarse tool, since it can only filter at the server level. In this article we try to explain why this difference is so important to be aware of.

When you visit a web page, your web browser connects from your local computer to a server located on the Internet. The server may be in a person’s home, in a company’s server room, or in a huge data center run by one of the large IT companies that we have learned to appreciate to a higher or lower degree over the years.

In technical terms, your web browser requests an HTML page (HyperText Markup Language), and the request is sent over the Internet to the server holding the page using one of the protocols HTTP (Hypertext Transfer Protocol) or HTTPS (HTTP over an encrypted connection).

When you perform web filtering, you use either URL based filtering (Uniform Resource Locator; a common web address) or IP based filtering (Internet Protocol; the standard for communication over the Internet). A URL is a web address such as http://www.cronlab.com/news/latest.html, and an IP address is a group of numbers such as 192.168.1.78. The IP address is used to identify and locate a server on the Internet.
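To make the difference concrete, here is a minimal Python sketch that splits the example URL into its parts. The function name filter_keys is our own invention for illustration. The point is simply that the host part is all that can be translated into an IP address, while the path only exists in the URL.

```python
from urllib.parse import urlsplit

def filter_keys(url: str) -> dict:
    """Split a URL into the parts each filtering style can look at."""
    parts = urlsplit(url)
    return {
        # The host name is what gets translated into an IP address,
        # so it is all a server-level filter has to work with.
        "host": parts.hostname,
        # The path identifies the individual page on that server.
        # Only a URL based filter takes it into account.
        "path": parts.path,
    }

keys = filter_keys("http://www.cronlab.com/news/latest.html")
print(keys)
```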

Several large service providers offer their customers the IP based approach, which involves maintaining categories per IP address together with a DNS lookup (Domain Name System; the service that translates between domain names and IP addresses). In the example above the domain name is cronlab.com. The server www.cronlab.com has an IP address of the same form as the example address given above. This information is used to decide what is allowed and what is not.

CronLab instead uses URL based web filtering, which provides much finer granularity than IP based web filtering. The problem with IP based categorisation lies in using the IP address as the basis for deciding what is being requested through a web browser. A single web server may hold web pages of many different categories, which can make it impossible to assign one category to the server that fits all the pages deployed on it.

IP based web filtering is server filtering, not web page filtering. A capable IP based web filtering company typically holds around 400 to 500 million IP addresses in its database. That sounds like a lot, and it is. But at the time of writing there are more than 1 billion web sites connected to the Internet, so a lot of sites and servers are not categorised by these service providers (ref: http://www.internetlivestats.com/total-number-of-websites/).

At CronLab we have more than 18 billion URLs categorised in our database, because we categorise web pages and not web servers. One web server holds numerous web pages, as you surely already know, and one web server can host several different types of web sites, which makes IP based web filtering technology pretty basic in its capabilities.

As an example, an HTTP request for the latest sports news at CNN could look like http://www.cnn.com/news/sports/latest.html. A request for the latest political news on the same web site, and thereby the same server, could look like http://www.cnn.com/news/politics/latest.html.

The IP based method can only define one common category, based upon the host name and its IP address. In the above example the defining information is www.cnn.com. Most likely the resulting category would be news.

A company that would like to allow political news but block sports news would end up disappointed. It can only allow or block news as a whole, and most likely it would allow news and thereby also sports news. In conclusion, IP based web filtering can never set multiple categories for content on web servers using the same IP address, which is a huge problem given how common it is to host many virtual domains on one physical or virtual server.
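The limitation can be sketched in a few lines of Python. Everything in the two tables is made up for illustration: the IP address comes from a range reserved for documentation and is not CNN’s real address, and a real filter would resolve the host through DNS (for instance with socket.gethostbyname) and query a vendor database rather than two small dictionaries.

```python
from urllib.parse import urlsplit

# Hypothetical data standing in for a DNS lookup and a vendor database.
HOST_TO_IP = {"www.cnn.com": "203.0.113.10"}  # documentation-range address
IP_CATEGORY = {"203.0.113.10": "news"}        # one category per server

def ip_based_category(url: str) -> str:
    """Return the category of the server behind the URL, ignoring the path."""
    host = urlsplit(url).hostname
    ip = HOST_TO_IP.get(host)
    return IP_CATEGORY.get(ip, "uncategorised")

sports = ip_based_category("http://www.cnn.com/news/sports/latest.html")
politics = ip_based_category("http://www.cnn.com/news/politics/latest.html")
print(sports, politics)
```

Both requests come back with the same category, because the path never takes part in the lookup. Whatever rule is set for that category applies to sports and politics alike.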

In the URL based case, the full URL is used to define the category of the requested web page, so it is easy to set different policies for sports news and political news. The categorisation in this case would most probably be news as the main category, with sports news and political news as two sub-categories. In the administrative portal for our services you would set rules that allow political news but block sports news.
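A simplified sketch of the URL based idea is shown below. It is not how our database or portal is implemented. The category table, the category names and the longest-prefix matching rule are all assumptions chosen to keep the example short.

```python
from urllib.parse import urlsplit

# Hypothetical category table keyed by host name plus path prefix.
URL_CATEGORIES = {
    "www.cnn.com/": "news",
    "www.cnn.com/news/sports/": "news/sports",
    "www.cnn.com/news/politics/": "news/politics",
}

# Hypothetical policy: block sports news, allow everything else.
BLOCKED = {"news/sports"}

def url_category(url: str) -> str:
    """Return the category of the longest table prefix that matches the URL."""
    parts = urlsplit(url)
    target = f"{parts.hostname}{parts.path or '/'}"
    matches = [prefix for prefix in URL_CATEGORIES if target.startswith(prefix)]
    if not matches:
        return "uncategorised"
    # The longest matching prefix is the most specific category.
    return URL_CATEGORIES[max(matches, key=len)]

def is_allowed(url: str) -> bool:
    """Apply the policy to the category found for the URL."""
    return url_category(url) not in BLOCKED

print(is_allowed("http://www.cnn.com/news/sports/latest.html"))
print(is_allowed("http://www.cnn.com/news/politics/latest.html"))
```

Because the path takes part in the lookup, the two pages end up in different sub-categories and the policy can treat them differently, even though they live on the same server.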

Our database is updated daily, and the categories of the top 1 million most visited web sites according to Alexa (http://www.alexa.com/) are always reviewed and up to date. This makes the URL based web filtering method the ultimate choice. We have more than 140 different main and sub-categories to set rules for. They are furthermore translated into more than 180 languages, so anyone can understand what to block and what not.

If you need to integrate web filtering into your products, or want a custom-made implementation of our web filtering solutions, don’t hesitate to call us. You will find all of our details below.
