Here is a function that reads the concerts listed on a MySpace band page. The code retrieves the “shows page” for a given MySpace username, then parses the HTML to find and decode the data.
Since MySpace returns the page to me in Italian (this probably depends on IP-based geolocation), the function uses an array of Italian month abbreviations. You will probably need to change it, or you can try to improve the function by adding a header to the cURL request to specify the language of the page (I think it’s possible).
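The language header idea can be sketched as follows. This is only a minimal sketch: it assumes the server honours the standard HTTP `Accept-Language` request header, and I have not verified that MySpace does so rather than relying on the IP address alone. The helper names `acceptLanguageHeader` and `applyLanguage` are hypothetical and are not part of the original function.

```php
<?php
// Hypothetical helper: builds an Accept-Language header line.
// $languages is ordered by preference, e.g. array("en-US", "en").
function acceptLanguageHeader(array $languages) {
    $parts = array();
    foreach (array_values($languages) as $i => $lang) {
        // The first language carries the implicit quality 1;
        // later ones get decreasing q-values (0.9, 0.8, ... down to 0.1).
        $tenths = max(1, 10 - $i);
        $parts[] = ($i === 0) ? $lang : $lang.";q=0.".$tenths;
    }
    return "Accept-Language: ".implode(",", $parts);
}

// Hypothetical helper: attaches the header to an existing cURL handle.
// Call it before curl_exec().
function applyLanguage($ch, array $languages) {
    return curl_setopt($ch, CURLOPT_HTTPHEADER, array(acceptLanguageHeader($languages)));
}

echo acceptLanguageHeader(array("en-US", "en")), "\n";
// prints "Accept-Language: en-US,en;q=0.9"
```

Inside `myspaceConcerts` this would mean calling `applyLanguage($ch, array("en-US", "en"));` just before `curl_exec($ch)`. If the page then comes back in English, the `$months` array has to be changed to the abbreviations the page actually prints.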
You can see a DEMO here.
function myspaceConcerts($user) {
    $ch = curl_init("http://www.myspace.com/".$user."/shows");
    curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
    curl_setopt($ch, CURLOPT_POST, FALSE);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_NOBODY, FALSE);
    curl_setopt($ch, CURLOPT_VERBOSE, FALSE);
    curl_setopt($ch, CURLOPT_REFERER, "");
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
    $page = curl_exec($ch);
    if ($page === FALSE) {
        return array(); // request failed, nothing to parse
    }

    // look for band name (the username is escaped so it cannot break the pattern)
    preg_match_all("#<a class=\"userLink\" href=\"/".preg_quote($user, "#")."\">(.*)</a>#Us", $page, $a);
    $band = isset($a[1][0]) ? trim(strip_tags($a[1][0])) : "";

    // months array is in Italian because my web server receives the pages in Italian;
    // you will probably have to change this array to match the MySpace response
    $months = array("gen"=>"01","feb"=>"02","mar"=>"03","apr"=>"04","mag"=>"05","giu"=>"06","lug"=>"07","ago"=>"08","set"=>"09","ott"=>"10","nov"=>"11","dic"=>"12");

    $out = array();
    $c = 0; // concerts counter
    // split the page into one chunk per event list item
    $li = preg_split("/<li class=\"moduleItem event( odd| even)?( first| last)? vevent\" ?>/i", $page);
    for ($i = 0; $i < count($li); $i++) {
        if (stristr($li[$i], "<div class=\"entryDate\">")) {
            // find date; the page shows only month and day, so the year is guessed
            preg_match_all("#<span class=\"month\">(.*)</span>#Us", $li[$i], $temp);
            $month = $months[strip_tags(trim($temp[1][0]))];
            preg_match_all("#<span class=\"day\">(.*)</span>#Us", $li[$i], $temp);
            $day = str_pad(strip_tags(trim($temp[1][0])), 2, "0", STR_PAD_LEFT);
            $year = date("Y");
            $data = $year."-".$month."-".$day;
            if ($data < date("Y-m-d")) {
                // a date already past this year is assumed to belong to next year
                $data = (date("Y") + 1)."-".$month."-".$day;
            }
            // find venue
            preg_match_all("#<h4>(.*)</h4>#Us", $li[$i], $temp);
            $posto = strip_tags(trim($temp[1][0]));
            // find city
            preg_match_all("#<span class=\"locality\">(.*)</span>#Us", $li[$i], $temp);
            $citta = strip_tags(trim($temp[1][0]));
            // find region
            preg_match_all("#<span class=\"region\">(.*)</span>#Us", $li[$i], $temp);
            $region = strip_tags(trim($temp[1][0]));
            // find country
            preg_match_all("#<span class=\"country-name\">(.*)</span>#Us", $li[$i], $temp);
            $stato = strip_tags(trim($temp[1][0]));
            // build output array
            $out[$c]["band"] = $band;
            $out[$c]["date"] = $data;
            //$out[$c]["time"] = ""; not parsed
            $out[$c]["venue"] = $posto;
            //$out[$c]["url"] = ""; not parsed
            $out[$c]["where"] = $citta.",".$region.",".$stato;
            $c++;
        }
    }
    return $out;
}
This function is included in the Mini Bot Class with many other small spiders.
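The year-guessing step in `myspaceConcerts` (the page prints only the month and the day) is easier to check when it is isolated. Here is a minimal sketch of that step as a separate helper. The name `resolveShowDate` and the `$today` parameter are hypothetical, added so the behaviour can be tried against a fixed date. Unlike the original code, this version lower-cases the abbreviation and returns `null` when it is not in the map.

```php
<?php
// Hypothetical helper, not part of the original function.
// $monthAbbr: month abbreviation as printed on the page (e.g. "mag")
// $day:       day of the month as printed on the page (e.g. "7")
// $months:    abbreviation => two-digit month map, as in myspaceConcerts()
// $today:     reference date in Y-m-d format; defaults to the current date
function resolveShowDate($monthAbbr, $day, array $months, $today = null) {
    if ($today === null) {
        $today = date("Y-m-d");
    }
    $key = strtolower(trim($monthAbbr));
    if (!isset($months[$key])) {
        return null; // unknown abbreviation, e.g. page served in another language
    }
    $day = str_pad(trim($day), 2, "0", STR_PAD_LEFT);
    $year = (int) substr($today, 0, 4);
    $date = $year."-".$months[$key]."-".$day;
    // Y-m-d dates compare correctly as plain strings;
    // a date already past is assumed to belong to next year
    if ($date < $today) {
        $date = ($year + 1)."-".$months[$key]."-".$day;
    }
    return $date;
}

$months = array("gen"=>"01", "mag"=>"05", "dic"=>"12");
echo resolveShowDate("mag", "7", $months, "2024-03-01"), "\n";  // prints "2024-05-07"
echo resolveShowDate("gen", "15", $months, "2024-03-01"), "\n"; // prints "2025-01-15"
```

The second call shows the rollover: 15 January is already past on 1 March, so the show is assumed to take place the following year. A show listed more than a year ahead, or a past show still on the page, would get the wrong year with this rule.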
Hi,
For the date and time, search for this:
That’s better =)
Great, I really needed a function like this. I had already done something similar with simplephpdom, not very different. The pity is that the code depends on the structure of the page.