Simple web data extractor in PHP (cURL)

Completed. Posted Mar 30, 2011. Paid on delivery.

We would like a PHP script that extracts data from websites. I assume cURL is the right tool for this, but I am not an expert.

Steps for the PHP script:

1- The script must connect to a remote MySQL server configured in a [login to view URL]

2- The script must find the first unprocessed record (lastchange = NULL) and lock it, so that no other PHP process picks it up.

3- The script must do the work described below.

4- The script must write the results to the MySQL table and unlock the record.
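The lock and unlock steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the posted schema has no lock column, so `locked_by` is a hypothetical extra column, and `claimNextUrl` / `finishUrl` are made-up helper names. The PDO connection is assumed to be configured to throw exceptions on errors.

```php
<?php
// Sketch only: `locked_by` is a hypothetical column, not part of the posted schema.

/** Claims the first unprocessed row; returns it, or null when none is free. */
function claimNextUrl(PDO $db, string $workerId): ?array
{
    // Retry a few times in case another worker claims the same row first.
    for ($attempt = 0; $attempt < 5; $attempt++) {
        $row = $db->query(
            'SELECT codigo, url FROM url
             WHERE lastchange IS NULL AND locked_by IS NULL
             ORDER BY codigo LIMIT 1'
        )->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return null; // nothing left to process
        }
        // Atomic claim: only one worker's UPDATE can still see locked_by IS NULL.
        // Assigning lastchange to itself keeps MySQL's ON UPDATE from stamping it.
        $claim = $db->prepare(
            'UPDATE url SET locked_by = :worker, lastchange = lastchange
             WHERE codigo = :id AND locked_by IS NULL'
        );
        $claim->execute([':worker' => $workerId, ':id' => (int) $row['codigo']]);
        if ($claim->rowCount() === 1) {
            return $row;
        }
    }
    return null;
}

/** Stores the extracted meta tags, marks the row as processed and unlocks it. */
function finishUrl(PDO $db, int $codigo, array $meta): void
{
    $done = $db->prepare(
        'UPDATE url SET Title = :title, description = :description,
                keywords = :keywords, locked_by = NULL,
                lastchange = CURRENT_TIMESTAMP
         WHERE codigo = :id'
    );
    $done->execute([
        ':title'       => $meta['Title'] ?? null,
        ':description' => $meta['description'] ?? null,
        ':keywords'    => $meta['keywords'] ?? null,
        ':id'          => $codigo,
    ]);
}
```

Claiming with a single conditional UPDATE avoids holding a transaction open while pages are being downloaded. A row-level `SELECT ... FOR UPDATE` inside a transaction would be another option on InnoDB.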

Process for the PHP script:

1- Visit a URL and extract from the home page the meta tags `Title` + `description` + `keywords`, plus `date_of_html`.
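A minimal sketch of this step, assuming that `Title` means the page's `<title>` element and that `date_of_html` can be taken from the server's Last-Modified time (the posting does not define either precisely). `fetchPage` and `extractMeta` are hypothetical helper names.

```php
<?php
/** Downloads a page with cURL. Returns [html, Last-Modified unix time or null]. */
function fetchPage(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => 20,   // seconds
        CURLOPT_FILETIME       => true, // ask cURL to record the Last-Modified time
    ]);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $modified = curl_getinfo($ch, CURLINFO_FILETIME); // -1 when the server sends none
    return [$html, $modified >= 0 ? $modified : null];
}

/** Extracts the title, description and keywords from an HTML string. */
function extractMeta(string $html): array
{
    $meta = ['Title' => null, 'description' => null, 'keywords' => null];
    if (trim($html) === '') {
        return $meta;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['Title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        $name = strtolower($tag->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $meta[$name] = trim($tag->getAttribute('content'));
        }
    }
    return $meta;
}
```

Note that the `Title`, `description` and `keywords` columns are varchar(250), so longer values would need truncating before they are stored.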

2- Spider the first 10 links found on the home page (internal links only, not external) to extract emails, phone numbers and fax numbers.
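One possible way to pick the internal links is sketched below. It assumes "internal" means "same host as the home page" (so `www.` and bare hosts count as different sites) and resolves relative links against the site root only. `splitLinks` is a hypothetical helper name.

```php
<?php
/**
 * Splits the links of a page into internal ones (same host, capped at $limit)
 * and outbound ones. Simplified: relative links are resolved against $baseUrl.
 */
function splitLinks(string $html, string $baseUrl, int $limit = 10): array
{
    $result = ['internal' => [], 'outbound' => []];
    if (trim($html) === '') {
        return $result;
    }
    $baseHost = strtolower((string) parse_url($baseUrl, PHP_URL_HOST));

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        // Skip empty links, in-page anchors and non-HTTP schemes.
        if ($href === '' || $href[0] === '#'
            || preg_match('/^(mailto|tel|javascript):/i', $href)) {
            continue;
        }
        $host = parse_url($href, PHP_URL_HOST);
        if ($host === false) {
            continue; // malformed URL
        }
        if ($host === null) {
            // Relative link: attach it to the home page URL.
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
            $host = $baseHost;
        }
        if (strtolower($host) === $baseHost) {
            if (count($result['internal']) < $limit
                && !in_array($href, $result['internal'], true)) {
                $result['internal'][] = $href;
            }
        } elseif (!in_array($href, $result['outbound'], true)) {
            $result['outbound'][] = $href;
        }
    }
    return $result;
}
```

The `internal` list is what would be spidered next; the `outbound` list is what the posting wants stored separately.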

Once the extracted information has been sorted, update the MySQL record, unlock it and start on a new URL from the table.

All outbound links and extra emails found during the process will be added to Mysql2.
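Collecting the emails, phones and faxes could look something like the sketch below. The patterns are rough heuristics, not a complete solution: only numbers preceded by a label such as "Tel" or "Fax" are picked up, and `extractContacts` is a hypothetical helper name. One plausible reading of the posting is that the first email goes into the main record and the rest become the extra emails.

```php
<?php
/** Heuristically extracts emails, labelled phone numbers and fax numbers. */
function extractContacts(string $html): array
{
    // Scan the raw markup so that addresses inside mailto: links are found too.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $m);
    $emails = array_values(array_unique(array_map('strtolower', $m[0])));

    // Replace tags with spaces so that adjacent elements do not run together.
    $text = html_entity_decode(preg_replace('/<[^>]+>/', ' ', $html));

    $phones = [];
    $faxes  = [];
    // A label, an optional separator, then 7 to 20 characters of number.
    $pattern = '/\b(fax|telephone|tel|phone)\b\.?\s*:?\s*([+(]?\d[\d\s().-]{5,17}\d)/i';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        if (strtolower($match[1]) === 'fax') {
            $faxes[] = $match[2];
        } else {
            $phones[] = $match[2];
        }
    }
    return [
        'emails' => $emails,
        'phones' => array_values(array_unique($phones)),
        'faxes'  => array_values(array_unique($faxes)),
    ];
}
```

The number pattern is capped at 20 characters to match the `telefono` and `fax` columns, which are varchar(20).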

To avoid eating resources, the script should free its memory (or something similar) after each run.

If no records are found, an alert should be sent by email to an administrator so that new records can be added to the MySQL table.
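The overall loop, including the memory clean-up and the "queue is empty" alert, might be organised as in this sketch. `runWorker` and its three callbacks are hypothetical. The callbacks stand in for the database and email code so that the loop itself stays independent of both.

```php
<?php
/**
 * Processes jobs until the queue is empty or $maxJobs is reached.
 * Returns the number of jobs processed.
 */
function runWorker(
    callable $claimNext,   // returns the next job array, or null when none is left
    callable $process,     // does the extraction work for one job
    callable $alertAdmin,  // called once with a message when the queue is empty
    int $maxJobs = 100
): int {
    $done = 0;
    while ($done < $maxJobs) {
        $job = $claimNext();
        if ($job === null) {
            $alertAdmin('No unprocessed records left in table url');
            break;
        }
        $process($job);
        $done++;
        unset($job);         // drop the reference to the finished job
        gc_collect_cycles(); // reclaim cyclic garbage left behind by the job
    }
    return $done;
}
```

In a real run, the alert callback could simply wrap PHP's `mail()`, for example `function (string $msg) { mail('admin@example.com', 'URL queue empty', $msg); }`, where the address is a placeholder. Capping the number of jobs per run and letting cron restart the script is another simple way to keep memory use bounded.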

--> Mysql1 for url is as attached in .sql <--

CREATE TABLE `url` (
  `codigo` int(11) NOT NULL auto_increment,
  `email` varchar(50) default NULL,
  `origendeldato` varchar(30) default NULL,
  `url` varchar(50) NOT NULL,
  `Title` varchar(250) default NULL,
  `description` varchar(250) default NULL,
  `keywords` varchar(250) default NULL,
  `telefono` varchar(20) NOT NULL,
  `fax` varchar(20) default NULL,
  `pais` char(2) default NULL,
  `empresa` varchar(50) default NULL,
  `nombre` varchar(50) default NULL,
  `rubro` varchar(20) default NULL,
  `lastchange` timestamp NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
  PRIMARY KEY (`codigo`),
  UNIQUE KEY `base` (`base`),
  UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

--> Mysql2 for extraemails <--

`email` varchar(50) default NULL,

`url` varchar(50) NOT NULL,

When I said `Title` + `description` + `keywords`, I meant the HTML meta tags.

Skills: Apache, C++ Programming, Data Processing, Linux, PHP

Project ID: #1004781

About the project

4 bids, remote project, active Apr 13, 2011

Awarded to:

SigmaVisual

We can help with your project; please check PMB and our ratings/reviews to get an idea of our experience.

$100 USD in 5 days
(278 reviews)
8.2

4 freelancers are bidding an average of $103 for this job

srinichal

I can deliver this asap

$180 USD in 4 days
(164 reviews)
7.5
wildlily980

I can do this based on php

$70 USD in 2 days
(60 reviews)
6.7