
How To Capture the First Paragraph?

Folks,

Say I fetched a page:

file_get_html($url);

Now I need to grab the first whole paragraph into another variable, $par, and then take the first 255 characters of $par. Those go into my index (a MySQL DB) as the link's description.
OK, I'm not going to ask you to write the whole code, but tell me at least which PHP function grabs paragraphs in an HTML file?
My plan is to scrape all of the page's paragraphs one by one into an array, treat array index 0 as the first paragraph, and then take 255 characters from that element.
Also, let me know how to limit the scraped text to 255 characters. Which PHP function should I look into?
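
For what it's worth, here is a minimal sketch of the plan described above. It is not the Simple HTML DOM approach that file_get_html() belongs to; it swaps in PHP's bundled DOM extension so the example has no external dependencies. The helper name firstParagraphSnippet() is made up for illustration. It assumes the DOM extension is enabled and falls back to substr() when mbstring is missing. With Simple HTML DOM, the equivalent lookup is usually written as $html->find('p', 0).

```php
<?php
// Sketch only: firstParagraphSnippet() is a hypothetical helper, not a library function.

/**
 * Return up to $maxLen characters from the first non-empty <p> element,
 * or null when the markup contains no such paragraph.
 */
function firstParagraphSnippet(string $html, int $maxLen = 255): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('p') as $p) {
        // Collapse runs of whitespace and trim both ends.
        $text = trim(preg_replace('/\s+/', ' ', $p->textContent));
        if ($text !== '') {
            // mb_substr() counts characters; substr() counts bytes.
            return function_exists('mb_substr')
                ? mb_substr($text, 0, $maxLen)
                : substr($text, 0, $maxLen);
        }
    }

    return null;
}
```

Calling firstParagraphSnippet($markup) on a fetched page's markup gives the text to store as the description. Note that "first <p> element" is only a guess at the first real paragraph; on many pages it may be a cookie notice or a navigation blurb.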


3 Comments

@developer_web (author), Mar 07.2021: Has no one tried capturing or scraping content from the first paragraph of a page?
@NogDog, Mar 07.2021: No, and "first paragraph" is very vague. How do you know what the first paragraph is, versus the page title, versus navigation links, versus advertisements, versus daily announcements, versus...?
@developer_web (author), Mar 07.2021: @NogDog#1628993

Mmm. What do you suggest, then?

You know NogDog, you have a good nose like a dog to sniff out the wheat from the chaff!

Maybe I'll call my agent not a Spidee but a Dogee!

You should build one yourself!

Anyway, what do you suggest I do, then?

For example, I can send my crawler to your homepage and extract the title, the meta tags, and the anchor texts of all the links that point to your homepage. I can likewise extract the anchor texts of links on external sites that point to your homepage. Voilà! I will have a list of keywords and keyphrases that describe your homepage, and I can use these to list your homepage in my index.
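
As a rough sketch of that extraction step, assuming the page markup has already been fetched into a string: the helper name extractPageSignals() and the shape of the array it returns are invented for illustration. It uses PHP's bundled DOM extension rather than any particular crawler library.

```php
<?php
// Sketch only: extractPageSignals() is a hypothetical helper, not a library function.

/**
 * Collect the <title>, the named <meta> tags, and every link
 * (href plus anchor text) found in the given markup.
 */
function extractPageSignals(string $html): array
{
    $signals = ['title' => null, 'meta' => [], 'links' => []];
    if (trim($html) === '') {
        return $signals; // DOMDocument::loadHTML() rejects empty input
    }

    // Collect parser warnings silently; scraped HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $signals['title'] = trim($titles->item(0)->textContent);
    }

    // Index meta tags by lower-cased name, e.g. "description", "keywords".
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name !== '') {
            $signals['meta'][$name] = $meta->getAttribute('content');
        }
    }

    // Keep only links that carry visible anchor text.
    foreach ($doc->getElementsByTagName('a') as $a) {
        $text = trim(preg_replace('/\s+/', ' ', $a->textContent));
        if ($text !== '') {
            $signals['links'][] = ['href' => $a->getAttribute('href'), 'text' => $text];
        }
    }

    return $signals;
}
```

Run on the homepage itself, this yields the title and meta tags. Run on other pages, the 'links' entries can be filtered by href to keep only the anchor texts that point at the homepage.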

However, I am still not satisfied. It seems my crawler extracts very little info about your homepage that way. Get my point, NogDog? There should be more baskets for the dog to dive its nose into, to find more and more (tonnes of) trails to your homepage with plenty of descriptions of it. But where exactly are these baskets, the ones that contain enough descriptions of your homepage? Five or ten phrases about your homepage are not good enough. I need a hundred at least. That way, someone is bound to search for one of those 100 phrases and have your site pulled up on my SERP. With a mere 10 or so phrases describing your link, there is much less chance that anyone searches for one of them. Get my worries?

I don't want my users seeing too many "no results found" messages when they keyphrase search.

Keyphrases are more important than single keywords, because keyphrases are spot on for describing a link and single keywords are not. When I say keyphrases, I mean long-tail keywords.