Monday, June 13, 2011

Enter: The Deep Web


Alright, to some people this will be old news. To others, it will be the most intriguing thing since the internet itself went mainstream. This is a little thing here in the States called the Deep Web. What is the Deep Web, you ask? Well, if you think about the World Wide Web... you can probably guess. It's the majority of the internet, the part that search engines never show you. There are various reasons a page ends up there: it requires registration to access (most databases), nothing links to it (no crawling there...), it's served over FTP instead of HTTP (Google no likey), and so on. For a brief but more fulfilling explanation, go here to the wiki article on it.

OK, and here's why these things exist: search engines like Google and Yahoo! use web crawlers. In very basic terms, a web crawler is an information robot that follows things like links and hits from page to page and catalogues what it finds into the search engine's index, a process called crawling. That's pretty nifty for finding the most popular and accessible websites around, but unfortunately being popular is relatively rare. For a quick number on how much of the internet is hidden: there is reportedly 500 times more information hidden from typical search engines than they present. There is a supposed 176,000 trillion terabytes of information on the internet. How 'bout dem apples? Deep Web search engines, however, take a less automatic approach: they use crawlers that catalogue word hits, topics, and accessibility within a finite range of topics (such as science, health, computer technology, people...). So they are far more limited in scope, but they give a much better representation of what is available within a specific topic.
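If you want to see why link-following leaves so much behind, here is a minimal sketch in Python. It is not how Google or Yahoo! actually work. It assumes a made-up miniature "web" stored in a dictionary (the page names, the `login_required` flag, and the `crawl` function are all invented for illustration) and shows that a crawler starting from one page only ever indexes what it can reach by following links without logging in.

```python
from collections import deque

# A made-up miniature "web": each page lists the pages it links to.
# Every page name and flag here is hypothetical, for illustration only.
TOY_WEB = {
    "home":    {"links": ["news", "shop", "members"], "login_required": False},
    "news":    {"links": ["home", "archive"],         "login_required": False},
    "shop":    {"links": ["home"],                    "login_required": False},
    "archive": {"links": [],                          "login_required": False},
    "members": {"links": ["records"],                 "login_required": True},
    "records": {"links": [],                          "login_required": True},
    "orphan":  {"links": ["home"],                    "login_required": False},
}


def crawl(web, seed):
    """Follow links breadth-first from `seed` and return the set of pages indexed."""
    indexed = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        info = web.get(page)
        # The crawler has no account, so a login wall stops it cold:
        # it can read neither the page nor the links on it.
        if info is None or info["login_required"]:
            continue
        indexed.add(page)
        for link in info["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


surface = crawl(TOY_WEB, "home")
deep = set(TOY_WEB) - surface

print("indexed:", sorted(surface))  # ['archive', 'home', 'news', 'shop']
print("hidden: ", sorted(deep))     # ['members', 'orphan', 'records']
```

In this toy setup, "members" and "records" stay hidden because of the login wall, and "orphan" stays hidden because nothing links to it, even though it is perfectly public. Those are two of the reasons mentioned above for a page landing in the Deep Web.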

So now you are probably asking yourself, "Why do I honestly give a damn about another series of search engines?" Because this can affect you positively, negatively, or neutrally. You care because a company that gets savvy with the Deep Web can have total access to people's official state records, residency, phone numbers, histories, web accounts (browsing, not hacking), and any mention of you on past internet sites (remember Geocities...?). Potentially all of the internet is stored somewhere else on the internet, which means all site histories can be accessed (in theory) by people who are good enough to get at them (yet again, purely through access, not hacking; hackers use other methods to get at information). On the other hand, it means you can find friends and family, and check up on your own security on the web and the security of your loved ones. After all, the less these databases come up with on you, the safer you usually are from phishing, hacking, theft, identity problems...
Another great benefit to knowing these exist is in the academic and professional worlds: serious databases, such as those for job searching or research, require an account to access anything, and therefore Google and Yahoo! don't give a damn about them.

This blog post has some very good Deep Web Engines if you are curious and wish to explore.

This website, called pipl.com, is fun if you want to see what's on the web about you. Be forewarned, it is kind of spooky what can come up.
Anyways, I found this to be very interesting because I want to be an Information Technology Specialist. So, there you have it. The web is deep, the web is all-knowing, the web is mostly hidden.
