
PHP scraping: fetching web page content with curl_init

Date: 2018-05-22
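What can you do when file_get_contents or a plain curl request cannot fetch a page? The function below fetches the page with curl_init and sends the same request headers a browser would. The last section explains where those headers come from.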


    // Fetch $url with browser-like request headers and return the decoded JSON response.
    function getCurl($url){
        $headers = array(
	'Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
	'Accept-Encoding:gzip, deflate',
	'Accept-Language:zh-CN,zh;q=0.9',
	'Cache-Control:max-age=0',
	'Connection:keep-alive',
	'Cookie:BAIDUID=762D167F1A19D476B5F8B1E7E87B788A:FG=1; BIDUPSID=762D167F1A19D476B5F8B1E7E87B788A; PSTM=1510715646; BDUSS=3RwVlc0UVJoa3hGc3lBLU02bko4T3pHYUo3NFpYTVdDNHJRLS1XZkNSOGxNMjFhQUFBQUFBJCQAAAAAAAAAAAEAAAB8exUWeXViaW5fa2V0eTEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACWmRVolpkVaT; MCITY=-340%3A119%3A; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; PSINO=7; BDRCVFR[feWj1Vr5u3D]=I67x6TjHwwYf0; pgv_pvi=9686848512; pgv_si=s9426284544; H_PS_PSSID=1435_21114_20692_26350_20930',
	'Host:sp0.baidu.com', // must match the host in $url; remove this line to let curl set it
	'Upgrade-Insecure-Requests:1',
	'User-Agent:Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36'
        );
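        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
        curl_setopt($ch, CURLOPT_ENCODING, ''); // decode the gzip/deflate body requested via Accept-Encoding
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // HTTPS request: skip certificate and host verification
        curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
        // Execute the request and get the response body
        $result = curl_exec($ch);
        // Release the curl handle
        curl_close($ch);
        if ($result === false) {
            return null; // the request itself failed
        }
        // Convert the GBK response to UTF-8, then decode it as JSON (null if the body is not JSON)
        $data = @iconv("GBK", "UTF-8//IGNORE", $result);
        return json_decode($data, true);
    }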
    
    
    
    // Usage, from inside the same class:
    $url = 'https://sp0.baidu.com/';
    $str = $this->getCurl($url);
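
When a request like the one above comes back empty, it usually helps to see why before changing headers at random. The following is only a minimal sketch, not part of the original function: `fetchWithDiagnostics` is a hypothetical helper name, and the timeout values are assumptions. It uses curl_errno, curl_error and curl_getinfo to report the transport error and the HTTP status code.

```php
<?php
// Hypothetical helper (not from the original article): fetch a URL and
// report why the request failed instead of silently returning nothing.
function fetchWithDiagnostics($url, array $headers = array())
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // seconds; assumed value
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // seconds; assumed value

    $body   = curl_exec($ch);                        // false on transport errors
    $errno  = curl_errno($ch);                       // 0 when curl itself succeeded
    $error  = curl_error($ch);                       // human-readable error text
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 when no response arrived
    curl_close($ch);

    return array(
        'ok'     => $body !== false && $status >= 200 && $status < 300,
        'status' => $status,
        'errno'  => $errno,
        'error'  => $error,
        'body'   => $body === false ? '' : $body,
    );
}

// Usage with the URL from the article.
$res = fetchWithDiagnostics('https://sp0.baidu.com/');
if (!$res['ok']) {
    echo "Request failed: HTTP {$res['status']}, curl error {$res['errno']}: {$res['error']}\n";
}
```

A non-zero curl error points to a network, DNS or TLS problem. A 403 or a redirect to a login page more likely means the server rejected the headers or cookies that were sent.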

Where do the headers come from?


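1. Open the URL in Google Chrome

2. Press F12

3. Click the Network tab, reload the page, select the request, and copy its Request Headers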

As shown in the figure:

1.png
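
The headers copied from the Network panel arrive as one block of text, while CURLOPT_HTTPHEADER expects an array of "Name: value" strings. As a rough sketch of that conversion (the helper name `headersFromDevTools` and the sample input are illustrative assumptions, not part of the original code):

```php
<?php
// Hypothetical helper (not from the original article): turn the raw
// "Request Headers" text copied from the browser's Network panel into the
// list of "Name: value" strings that CURLOPT_HTTPHEADER expects.
function headersFromDevTools($raw)
{
    $headers = array();
    foreach (preg_split('/\r\n|\r|\n/', $raw) as $line) {
        $line = trim($line);
        // Keep only "Name: value" lines. Blank lines, the request line
        // ("GET / HTTP/1.1") and HTTP/2 pseudo-headers such as
        // ":authority" do not match and are skipped.
        if (!preg_match('/^([A-Za-z0-9-]+)\s*:\s*(.*)$/', $line, $m)) {
            continue;
        }
        $headers[] = $m[1] . ': ' . $m[2];
    }
    return $headers;
}

// Sample input, shortened from the headers used in getCurl().
$raw = "GET / HTTP/1.1\nAccept-Language:zh-CN,zh;q=0.9\nCache-Control:max-age=0\n";
$headers = headersFromDevTools($raw);
// $headers can now be passed to curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
```

Cookies copied this way belong to a logged-in browser session and expire, so a request that stops working after a while may only need a fresh copy of the headers.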


