Building up directories with a script

Post by mike171562 » 08/28/2007 4:43 am

I am working on a script to build up my web directory categories. My goal is to have the script spider websites and add them to different categories in my directory. Is there any information I could look at on how to use a script to classify websites based on their content, e.g. 'shopping', 'travel', 'porn', etc.? The script can easily parse the text from web pages and compare it to a list of keywords, but that is unreliable. I was also wondering how sites like Google and dmoz.org build up their directories and keep them current.
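
A minimal sketch of the keyword comparison described above, for reference. The category lists and the function name guess_category() are made up for illustration, and the sketch shares the weakness already mentioned: it only counts exact word matches, so it is easy to fool.

```php
<?php
// Hypothetical category keyword lists; a real directory would need far larger ones.
$categories = [
    'shopping' => ['buy', 'cart', 'price', 'checkout'],
    'travel'   => ['flight', 'hotel', 'holiday', 'booking'],
];

// Return the category whose keywords appear most often in $html,
// or null if no keyword matches at all.
function guess_category(string $html, array $categories): ?string
{
    // Strip markup, lowercase, split into words, then count each word once.
    $counts = array_count_values(str_word_count(strtolower(strip_tags($html)), 1));

    $best = null;
    $bestScore = 0;
    foreach ($categories as $name => $keywords) {
        $score = 0;
        foreach ($keywords as $keyword) {
            $score += $counts[$keyword] ?? 0;
        }
        if ($score > $bestScore) {
            $best = $name;
            $bestScore = $score;
        }
    }
    return $best;
}
```

Calling guess_category($pageHtml, $categories) on a fetched page gives a first guess that still needs checking by hand.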


Post by xprt007 » 08/30/2007 5:20 am

Hi

I am certainly no expert here, but I am sure DMOZ has ALL submissions & updates checked manually before they are entered into the directory. I think Google also partly uses that DMOZ data for their directory section.
I stand to be corrected, though.

Post by lee_m » 12/06/2007 11:48 am

I would hazard a guess that verification is done via keyword density. Presumably a site that's advertising dog food should have predominantly dog-related words in its meta title and description, with the keyword density of these words in the page's body text at around 1 to 3%.

Using this kind of approach, I'm sure you could knock up a basic script that runs automatically and detects incorrect classification.
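
A rough sketch of what such a check could look like. The function names keyword_density() and looks_misclassified() are hypothetical, and the 1% floor is only the guess from above, not a known rule used by any directory.

```php
<?php
// Hypothetical helper: percentage of words in $text that match any of $keywords.
function keyword_density(string $text, array $keywords): float
{
    // Strip markup, lowercase, and split into words.
    $words = str_word_count(strtolower(strip_tags($text)), 1);
    if (count($words) === 0) {
        return 0.0;
    }

    $keywords = array_map('strtolower', $keywords);
    $hits = 0;
    foreach ($words as $word) {
        if (in_array($word, $keywords, true)) {
            $hits++;
        }
    }
    return 100.0 * $hits / count($words);
}

// Flag a listing when the density of its category keywords falls below
// the assumed minimum (1% by default).
function looks_misclassified(string $text, array $keywords, float $min = 1.0): bool
{
    return keyword_density($text, $keywords) < $min;
}
```

Run from cron over every listing, something like looks_misclassified($bodyText, ['dog', 'puppy']) would give a list of entries worth a manual look rather than a final verdict.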

If you have no PHP knowledge, why not check out SourceForge? There may be a freebie about.

Lee

