by mike171562 » 08/28/2007 4:43 am
I am working on a script to build up my web directory categories. My goal is to have the script spider websites and add them to the appropriate categories in my directory. Is there any information I could look at on how to use a script to classify websites based on their content, e.g. 'shopping', 'travel', 'porn', etc.? The script can easily parse the text from web pages and compare it to a list of keywords, but that is unreliable. I would also like to know how sites like Google and dmoz.org build up their directories and keep them current.
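
To make the keyword approach concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not the actual script: the keyword lists and the names `CATEGORY_KEYWORDS`, `TextExtractor`, `extract_text`, `score_categories`, and `classify` are all made up for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical keyword lists; a real directory would need far larger ones.
CATEGORY_KEYWORDS = {
    "shopping": {"cart", "checkout", "price", "shipping", "sale"},
    "travel": {"hotel", "flight", "booking", "destination", "tour"},
}


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def score_categories(text, category_keywords=CATEGORY_KEYWORDS):
    """Return {category: number of keyword hits} for the given text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        category: sum(words[kw] for kw in keywords)
        for category, keywords in category_keywords.items()
    }


def classify(text, category_keywords=CATEGORY_KEYWORDS):
    """Return the best-scoring category, or None if no keyword matched."""
    scores = score_categories(text, category_keywords)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Calling `classify(extract_text(page_html))` returns the category with the most keyword hits, or `None` when nothing matches. The weakness is easy to see: the result depends entirely on raw word counts, so a page that mentions a word like "sale" or "hotel" in passing can be scored for the wrong category, and a page that uses none of the listed words is not classified at all.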