03.11.2010., 17:26
#1
/
Registration date: Oct 2006
Location: /
Posts: 2,053
PHP & XPath
I'd like to scrape ZET's timetable, i.e. the links to the individual timetables. The problem is that my script runs through all the links except the last 6. If I save the page as .HTML and scrape that local copy instead, everything goes through fine.
Where is the problem? Maybe in the 6th link from the bottom, which contains "/(" in its URL? That's the only thing that comes to mind right now, although I don't see why it would be a problem. In any case, it prints everything up to /media/39180/182.pdf. After that, nothing.
The link is: http://www.zet.hr/autobus/dnevni.aspx
The code is:
Code:
<?php
$target_url = "http://www.zet.hr/autobus/dnevni.aspx";
// IE 6 user-agent string (the "IE 6 – " label was not part of the actual string)
$userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
$html = curl_exec($ch);
// stop on a transfer error or an empty response body
if ($html === false || $html === '') {
    echo "<br />cURL error number: " . curl_errno($ch);
    echo "<br />cURL error: " . curl_error($ch);
    exit;
}
curl_close($ch);

// parse the HTML into a DOMDocument (@ suppresses warnings about malformed markup)
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the links on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->evaluate("/html/body/div[@id='container']/div[@id='content']/div[@id='autobus']/ul/li//a");
echo "length: " . $hrefs->length;
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo "<br />" . ($i + 1) . " | " . $url;
}
?>
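
One way to test the "/(" suspicion in isolation is to run the same kind of query against a small inline document. The sketch below is only that: the markup is hypothetical (the real ZET page is not reproduced here), the href containing "/(" is an invented stand-in for the actual link, and the relative query //div[@id='autobus']//a is a swapped-in alternative to the absolute path used above. It also collects libxml's parse messages instead of hiding them with @, which may help show where the live page and the saved copy start to differ.

```php
<?php
// Hypothetical sample markup; only /media/39180/182.pdf comes from the post,
// the other two hrefs are invented for illustration.
$html = <<<HTML
<html><body>
<div id="container"><div id="content"><div id="autobus">
<ul>
  <li><a href="/media/39180/182.pdf">182</a></li>
  <li><a href="/media/example/(1).pdf">link with a parenthesis</a></li>
  <li><a href="/media/example/last.pdf">last</a></li>
</ul>
</div></div></div>
</body></html>
HTML;

// Collect libxml parse problems instead of suppressing them with "@".
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$errors = libxml_get_errors();
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Relative query: matches every <a> under the "autobus" div,
// regardless of the exact chain of ancestors.
$links = $xpath->query("//div[@id='autobus']//a");

$urls = [];
foreach ($links as $a) {
    $urls[] = $a->getAttribute('href');
}

echo "links: " . count($urls) . "\n";
foreach ($urls as $i => $url) {
    echo ($i + 1) . " | " . $url . "\n";
}
// Print whatever the parser complained about, with line numbers.
foreach ($errors as $e) {
    echo "line {$e->line}: " . trim($e->message) . "\n";
}
```

If the link with the parenthesis is listed here, the "/(" itself is probably not what stops the loop, and the parse messages for the live page become the more interesting thing to compare against the saved copy.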

Last edited by: svebee. 03.11.2010. at 17:32.