How to download source code of a Google search results page

@codewitch Jul 19, 2021

Hello. This PHP code saves the source of an online page to the local hard drive, and it works. Code one:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php
ini_set('display_errors', true);
error_reporting(E_ALL);

$url = 'https://www.php.net';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
if (empty($html)) {
    echo "<pre>cURL request failed:\n".curl_error($ch)."</pre>";
} else {
    //echo "<pre>".htmlspecialchars($html)."</pre>";
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]

If I replace the URL with a Google search results page, namely https://www.google.com/search?q=blue+car, I get this code, code two:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php
ini_set('display_errors', true);
error_reporting(E_ALL);

$url = 'https://www.google.com/search?q=blue+car';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
if (empty($html)) {
    echo "<pre>cURL request failed:\n".curl_error($ch)."</pre>";
} else {
    //echo "<pre>".htmlspecialchars($html)."</pre>";
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]

But then Google **blocks it**, and what I get on my local hard drive is this:

[CODE]
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3Dblue%2Bcar&gl=CZ&m=0&pc=srp&hl=cs&src=1">here</A>.
</BODY></HTML>
[/CODE]

How can I modify code two so that the source of the actual Google results page is saved to my local hard drive? Thank you.

EDIT: if I modify the link to https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3Dblue%2Bcar&gl=CZ&m=0&pc=srp&hl=cs&src=1 then Google returns this:

[CODE]
<html lang=en><meta charset=utf-8><meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"><title>Error 400 (Bad Request)!!1</title><style nonce="MRQAkCi9o4FBmjWGoQsyhA">*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{color:#222;text-align:unset;margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px;}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}pre{white-space:pre-wrap;}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}</style><div id="af-error-container"><a href=//www.google.com><span id=logo aria-label=Google></span></a><p><b>400.</b> <ins>That’s an error.</ins><p>The server cannot process the request because it is malformed. It should not be retried. <ins>That’s all we know.</ins></div>
[/CODE]

Which is not what I want either.
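Since file.txt ends up holding a "302 Moved" page or a 400 error page instead of the results, it may help to look at the HTTP status code before writing the file. The following is only a sketch: `describe_response()` and `fetch_with_status()` are hypothetical helper names, not part of the code above, and they rely on the standard `curl_getinfo()` constants `CURLINFO_RESPONSE_CODE` and `CURLINFO_REDIRECT_URL`.

[CODE]
```php
<?php
// Hypothetical helper: turn an HTTP status code into a short description.
function describe_response(int $status, ?string $redirectUrl): string
{
    if ($status === 0) {
        return "no response";                 // cURL got no HTTP reply at all
    }
    if ($status >= 300 && $status < 400) {
        return "redirect to " . ($redirectUrl ?? "(unknown)");
    }
    if ($status >= 400) {
        return "error $status";
    }
    return "ok";
}

// Hypothetical helper: fetch a URL and report what kind of answer came back.
function fetch_with_status(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $html = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    $redirect = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    // CURLINFO_REDIRECT_URL is empty or false when there is no redirect.
    return [$html, describe_response($status, $redirect ?: null)];
}
```
[/CODE]

Calling `fetch_with_status('https://www.google.com/search?q=blue+car')` and writing the file only when the second element is "ok" would keep the redirect page out of file.txt.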


@daveyerwin Jul 19, 2021 — Yes, your code works as expected!

Simple explanation: the code between <?php and ?> is server-side code. It executes on the server, and only the computed output is returned.

@NogDog Jul 19, 2021 — You may need to add additional cURL settings so that headers are sent that make it look like a "normal" client-side request. It probably wants a valid "user agent" header, and possibly some others. See https://developer.mozilla.org/en-US/docs/Glossary/Request_header for a list of possibilities, and you might want to google a bit to see if anyone has a list of suggested headers to use for them.
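A rough sketch of what such settings could look like, using `CURLOPT_USERAGENT` and `CURLOPT_HTTPHEADER`. The helper names `browser_like_options()` and `fetch_page()` are made up for illustration, and the `Accept` and `Accept-Language` values are assumptions; which headers Google actually checks is not known here.

[CODE]
```php
<?php
// Hypothetical helper: build cURL options that resemble a browser request.
// The header values below are illustrative assumptions, not a verified list.
function browser_like_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml',
            'Accept-Language: en-US,en;q=0.9',
        ],
    ];
}

// Hypothetical helper: run the request; returns the body, or false on failure.
function fetch_page(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, browser_like_options($url, $userAgent));
    return curl_exec($ch);
}
```
[/CODE]

The result of `fetch_page()` can then be written to file.txt in the same way as in code two.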

@codewitch (author) Jul 19, 2021 — Hello NogDog. Thank you for your post. I have it. Here is the code:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.google.com/search?q=blue+car');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0');
$html = curl_exec($ch);

if (empty($html)) {
    echo "<pre>cURL request failed:\n".curl_error($ch)."</pre>";
} else {
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]

Then I found this website: http://useragentstring.com/index.php. It helps to define the details of the user agent header value that has to be written into the code in this post (in my code it is the string that starts with the word Mozilla).

It works!

@jeymelee Jul 20, 2021 — @DaveyErwin#1634368 111

@developer_web Jul 23, 2021 — @codewitch#1634381

Thank God!

Good that you opened this thread. The code in it will come in handy for my PHP bot. Even though I can build you an .exe bot in 2 minutes to download the HTML source of the page the bot is currently on, that means the bot has to run on your home PC. It is better to have it running on the web host's server, and that means it must be built in PHP, Python, etc.

Anyway, I'm curious why you need to download source code onto your HDD. Do you want to analyze a competitor's HTML? The downloader will never show you the page's PHP code, though. So do bear that in mind.
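To illustrate why the PHP code never shows up in a download, here is a small sketch. The function name `render_template()` is made up for this example; it uses output buffering to capture what a script containing embedded PHP would send to the client.

[CODE]
```php
<?php
// Hypothetical demo: capture what the server sends for a page with embedded PHP.
function render_template(): string
{
    ob_start();                       // start collecting output
    ?><p>Sum: <?php echo 1 + 2; ?></p><?php
    return ob_get_clean();            // the text a downloader would receive
}
```
[/CODE]

The string returned by `render_template()` contains the computed value, not the `echo 1 + 2;` statement, and that is all a tool like the cURL script in this thread can ever save.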
