
How to download source code of a Google search results page

Hello. There is this PHP code that saves the source of an online page to the local hard drive. It works. Code one:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php
ini_set('display_errors', true);
error_reporting(E_ALL);

$url = 'https://www.php.net';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
if (empty($html)) {
    echo "<pre>cURL request failed:\n" . curl_error($ch) . "</pre>";
} else {
    //echo "<pre>" . htmlspecialchars($html) . "</pre>";
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]

If I replace the link with a Google search results URL, namely https://www.google.com/search?q=blue+car, I get this code, code two:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php
ini_set('display_errors', true);
error_reporting(E_ALL);

$url = 'https://www.google.com/search?q=blue+car';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
if (empty($html)) {
    echo "<pre>cURL request failed:\n" . curl_error($ch) . "</pre>";
} else {
    //echo "<pre>" . htmlspecialchars($html) . "</pre>";
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]

But then Google **blocks it**, and what I get on the local hard drive is this:

[CODE]
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF="https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3Dblue%2Bcar&amp;gl=CZ&amp;m=0&amp;pc=srp&amp;hl=cs&amp;src=1">here</A>.
</BODY></HTML>
[/CODE]

How can I modify code two so that the source code of the actual Google results page is saved to my local hard drive? Thank you.

EDIT: if I modify the link to https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3Dblue%2Bcar&amp;gl=CZ&amp;m=0&amp;pc=srp&amp;hl=cs&amp;src=1 then Google returns this:

[CODE]
<html lang=en><meta charset=utf-8><meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"><title>Error 400 (Bad Request)!!1</title><style nonce="MRQAkCi9o4FBmjWGoQsyhA">*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{color:#222;text-align:unset;margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px;}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}pre{white-space:pre-wrap;}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}</style><div id="af-error-container"><a href=//www.google.com><span id=logo aria-label=Google></span></a><p><b>400.</b> <ins>That’s an error.</ins><p>The server cannot process the request because it is malformed. It should not be retried. <ins>That’s all we know.</ins></div>
[/CODE]

Which is not what I want either.
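
A side note on the EDIT above: the URL there still contains `&amp;`, which is how `&` is written inside an HTML attribute. If the link was copied verbatim from the saved "302 Moved" body, those entities would need decoding before the string is used as a request URL. Whether that alone explains the 400 response is an assumption. A minimal sketch of the decoding step, with a hypothetical helper name `extract_redirect_url`:

```php
<?php
// Hypothetical helper (not from the thread): pull the first link target out
// of a small "302 Moved" HTML body and decode its HTML entities.
function extract_redirect_url(string $html): ?string
{
    // Match the first <a href="..."> regardless of tag/attribute case.
    if (!preg_match('/<a\s+href="([^"]+)"/i', $html, $matches)) {
        return null; // no link found in the body
    }
    // Inside HTML, "&" is written as "&amp;"; turn it back into a plain "&".
    return htmlspecialchars_decode($matches[1], ENT_QUOTES);
}
```

Passing the saved 302 body to this function should give a URL with plain `&` separators, which is the form cURL expects.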


5 Comments

@daveyerwin (Jul 19, 2021): Yes, your code works as expected!

Simple explanation: the code between <?php and ?> is server-side code. It executes on the server, and only its computed output is returned as the page source.

@NogDog (Jul 19, 2021): You may need to add additional cURL settings so that the request sends headers that make it look like a "normal" client-side request. It probably wants a valid "User-Agent" header, and possibly some others. See https://developer.mozilla.org/en-US/docs/Glossary/Request_header for a list of possibilities, and you might want to google a bit to see if anyone has a list of suggested headers to use.
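
As a rough illustration of that suggestion, here is a sketch that sets a User-Agent plus a couple of other request headers through cURL. The function names and the specific header values are assumptions chosen for illustration, not a list that Google is known to require:

```php
<?php
// Build cURL options that resemble a browser request.
// The Accept and Accept-Language values below are illustrative assumptions.
function browser_like_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
        CURLOPT_USERAGENT      => $userAgent,  // sent as the User-Agent header
        CURLOPT_HTTPHEADER     => [
            'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
            'Accept-Language: en-US,en;q=0.5',
        ],
        CURLOPT_ENCODING       => '',          // accept any encoding cURL can decode
    ];
}

/**
 * Run a request with the given options.
 *
 * @return string|false The response body, or false on failure.
 */
function fetch_with_options(array $options)
{
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    return curl_exec($ch);
}
```

Calling `fetch_with_options(browser_like_options($url, $userAgent))` would then perform the request. Keeping the options in an array makes it easy to add or remove headers while experimenting.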
@codewitch (author, Jul 19, 2021): Hello NogDog. Thank you for your post. I have it. Here is the code:

[CODE]
<!DOCTYPE html>
<html>
<body>

<!-- this program saves source code of a website to an external file -->

<?php

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.google.com/search?q=blue+car');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0');
$html = curl_exec($ch);

if (empty($html)) {
    echo "<pre>cURL request failed:\n" . curl_error($ch) . "</pre>";
} else {
    $myfile = fopen("file.txt", "w") or die("Unable to open file!");
    fwrite($myfile, $html);
    fclose($myfile);
}
?>

</body>
</html>
[/CODE]


Then I found this website: http://useragentstring.com/index.php. It lets you look up the details of the User-Agent string that has to be written into the code in this post (in my code it starts with the word Mozilla).

It works!
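
A possible refinement, offered as a sketch rather than a guaranteed fix: follow redirects and check the HTTP status before writing the file, so that a "302 Moved" or error body is not saved by mistake. The helper names `fetch_page` and `save_if_ok`, the timeout, and the redirect limit are assumptions. Google may still answer with a consent or block page depending on region and request.

```php
<?php
// Hypothetical helper: fetch a URL and return its status code, body, and error text.
function fetch_page(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,       // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,       // follow 3xx redirects
        CURLOPT_MAXREDIRS      => 5,          // assumed limit to avoid redirect loops
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_TIMEOUT        => 15,         // assumed timeout in seconds
    ]);
    $body = curl_exec($ch);
    return [
        'status' => (int) curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'body'   => $body === false ? '' : $body,
        'error'  => curl_error($ch),
    ];
}

// Hypothetical helper: write the body to $path only for a 2xx response
// with a non-empty body. Returns true if the file was written.
function save_if_ok(array $response, string $path): bool
{
    $isSuccess = $response['status'] >= 200 && $response['status'] < 300;
    if (!$isSuccess || $response['body'] === '') {
        return false;
    }
    return file_put_contents($path, $response['body']) !== false;
}
```

With these, `save_if_ok(fetch_page($url, $userAgent), 'file.txt')` would return false instead of silently saving a redirect or error page.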
@jeymelee (Jul 20, 2021): @DaveyErwin#1634368 111
@developer_web (Jul 23, 2021): @codewitch#1634381

Thank God!

Good that you opened this thread. The code in it will come in handy for my PHP bot. Even though I can build you an .exe bot in 2 minutes to download the HTML source code of the page the bot is currently on, that would mean the bot has to run on your home PC. It is better to have it running on the web host's server, and that means it must be built in PHP, Python, etc.

Anyway, I'm curious why you need to download source code onto your HDD. Do you want to analyze a competitor's HTML? The downloader will never show you the page's PHP code, though. So do bear that in mind.