www.webdeveloper.com

Thread: Functions to generate meta keywords based on keyword density?

  1. #1
    Join Date: Mar 2010 | Posts: 701

    Functions to generate meta keywords based on keyword density?

    I want to dynamically generate meta keywords for my blog and forum. Is there a PHP function that can determine keyword density in a string, or a function that can group together identical values in an array?
    If the first function exists, I can turn the post into a string and use the words with the highest density (other than common words) as the keywords.
    If the second function exists, I can explode() the post and group together the repeated words to determine density.
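    For reference, the second kind of function does exist: array_count_values() groups identical values in an array and counts them. Below is a minimal sketch, assuming "density" simply means a word's share of all the words in the post; the helper name wordDensity() is hypothetical.

```php
<?php
// Hypothetical helper: returns each word's share of the total word count,
// highest first. Case is folded so "Nation" and "nation" count together.
function wordDensity(string $text): array
{
    // str_word_count($text, 1) returns the words themselves as an array
    $words = array_map('strtolower', str_word_count($text, 1));
    $total = count($words);
    if ($total === 0) {
        return [];
    }
    // array_count_values() groups identical values: word => occurrences
    $counts = array_count_values($words);
    arsort($counts); // most frequent first
    $density = [];
    foreach ($counts as $word => $n) {
        $density[$word] = $n / $total;
    }
    return $density;
}

// Usage: print_r(wordDensity($postText));
```

    Common words still need to be filtered out before the top entries are used as keywords.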

  2. #2
    Join Date: Mar 2010 | Posts: 672
    That's a cool idea, but I'm not sure it would be worth the effort to develop. Google ignores the meta keywords tag completely, and it carries very little weight in the ranking algorithms of the other major search engines.
    As for implementing it, I could be wrong, but I don't believe there is a stock PHP function that does this. I'd implement it by first filtering any stop words out of the string, then iterating through the remaining words, adding each one to an array and keeping a count for any repeated words. Then I'd sort the array by number of occurrences, take the first few words, and add them to the meta keywords tag.
    Depending on the size of your text this could be a relatively intensive task, so you'd probably want to do it as part of your CMS, save the results to a new database field, and simply print out the contents of that field when displaying the page.
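    The filter, count, sort and slice steps described above can be sketched as follows. This is a minimal sketch: the helper name metaKeywordsTag() and the two-word stop list in the usage line are hypothetical, and the keywords are passed through htmlspecialchars() before going into the tag.

```php
<?php
// Hypothetical helper following the steps above: drop stop words, count the
// remaining words, sort by count and emit the top $limit as a meta tag.
function metaKeywordsTag(string $text, array $stopWords, int $limit = 5): string
{
    $words = array_map('strtolower', str_word_count($text, 1));
    // Remove stop words (array_diff keeps the original keys, which is fine here)
    $words = array_diff($words, $stopWords);
    // word => number of occurrences, most frequent first
    $counts = array_count_values($words);
    arsort($counts);
    $top = array_slice(array_keys($counts), 0, $limit);
    // Escape in case a word contains characters that are special in HTML
    $content = htmlspecialchars(implode(',', $top), ENT_QUOTES);
    return '<meta name="keywords" content="' . $content . '">';
}

// Usage (illustrative stop-word list only):
echo metaKeywordsTag('The cat and the dog chased the cat.', ['the', 'and'], 2);
```

    The returned string is what you would store in the database field and print in the page head.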

  3. #3
    Join Date: Aug 2004 | Location: Ankh-Morpork | Posts: 19,147
    While I agree that these days the keywords meta tag is of limited use (and can actually be damaging if abused), I felt like coming up with a solution anyway.
    PHP Code:
    <?php
    class Keywords
    {
       private $stopWords = array("a", "about", "above", "across",
          "after", "afterwards", "again", "against", "all", "almost", "alone",
          "along", "already", "also", "although", "always", "am", "among",
          "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow",
          "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at",
          "back", "be", "became", "because", "become", "becomes", "becoming",
          "been", "before", "beforehand", "behind", "being", "below", "beside",
          "besides", "between", "beyond", "bill", "both", "bottom", "but", "by",
          "call", "can", "cannot", "cant", "co", "con", "could", "couldnt", "cry",
          "de", "describe", "detail", "do", "done", "down", "due", "during", "each",
          "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough",
          "etc", "even", "ever", "every", "everyone", "everything", "everywhere",
          "except", "few", "fifteen", "fifty", "fill", "find", "fire", "first",
          "five", "for", "former", "formerly", "forty", "found", "four", "from",
          "front", "full", "further", "get", "give", "go", "had", "has", "hasnt",
          "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein",
          "hereupon", "hers", "herself", "him", "himself", "his", "how", "however",
          "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is",
          "it", "its", "itself", "keep", "last", "latter", "latterly", "least",
          "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill",
          "mine", "more", "moreover", "most", "mostly", "move", "much", "must",
          "my", "myself", "name", "namely", "neither", "never", "nevertheless",
          "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing",
          "now", "nowhere", "of", "off", "often", "on", "once", "one", "only",
          "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves",
          "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather",
          "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious",
          "several", "she", "should", "show", "side", "since", "sincere", "six",
          "sixty", "so", "some", "somehow", "someone", "something", "sometime",
          "sometimes", "somewhere", "still", "such", "system", "take", "ten",
          "than", "that", "the", "their", "them", "themselves", "then", "thence",
          "there", "thereafter", "thereby", "therefore", "therein", "thereupon",
          "these", "they", "thick", "thin", "third", "this", "those", "though",
          "three", "through", "throughout", "thru", "thus", "to", "together", "too",
          "top", "toward", "towards", "twelve", "twenty", "two", "un", "under",
          "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were",
          "what", "whatever", "when", "whence", "whenever", "where", "whereafter",
          "whereas", "whereby", "wherein", "whereupon", "wherever", "whether",
          "which", "while", "whither", "who", "whoever", "whole", "whom", "whose",
          "why", "will", "with", "within", "without", "would", "yet", "you", "your",
          "yours", "yourself", "yourselves"
       );

       /**
        * Get most common non-stop-words in string
        * @return array
        * @param string $text
        * @param int $nbrWords Number of words to return, default = 5
        */
       public function getKeywords($text, $nbrWords = 5)
       {
          // Split the text into an array of words
          $words = str_word_count($text, 1);
          // Lower-case every word so the stop-word comparison is case-insensitive
          array_walk($words, array($this, 'filter'));
          $words = array_diff($words, $this->stopWords);
          // word => number of occurrences, most frequent first
          $wordCount = array_count_values($words);
          arsort($wordCount);
          // Debug output: uncomment to inspect the full frequency table
          // echo "<pre>";
          // print_r($wordCount);
          // echo "</pre>";
          $wordCount = array_slice($wordCount, 0, $nbrWords);
          return array_keys($wordCount);
       }

       // array_walk() callback: lower-cases each word in place
       private function filter(&$val, $key)
       {
          $val = strtolower($val);
       }
    }

    // USAGE:
    $text = "
    Four score and seven years ago, our fathers brought forth
    upon this continent a new nation, conceived in liberty
    and dedicated to the proposition that all men are created equal.
    Now we are engaged in a great civil war, testing whether this
    nation or any other nation so conceived and so dedicated
    can long endure.
    ";
    $test = new Keywords();
    $keywords = $test->getKeywords($text, 3);
    echo implode(",", $keywords); // nation,conceived,dedicated
    (stop-word list taken from http://armandbrahaj.blog.al/2009/04/...sh-stop-words/)
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  4. #4
    Join Date: Mar 2010 | Posts: 701
    Thanks, I'll try that once my web host fixes my database (I'm considering switching).

  5. #5
    Join Date: Mar 2010 | Posts: 701
    NogDog, it works (almost) perfectly.
    The only problem is that contractions like "I'll", "should've" and "can't" are included in the keywords. Here's a fixed version for anyone else who needs it:

    PHP Code:
    <?php
    class Keywords
    {
       private $stopWords = array("a", "about", "above", "across",
          "after", "afterwards", "again", "against", "all", "almost", "alone",
          "along", "already", "also", "although", "always", "am", "among",
          "amongst", "amoungst", "amount", "an", "and", "another", "any", "anyhow",
          "anyone", "anything", "anyway", "anywhere", "are", "around", "as", "at",
          "back", "be", "became", "because", "become", "becomes", "becoming",
          "been", "before", "beforehand", "behind", "being", "below", "beside",
          "besides", "between", "beyond", "bill", "both", "bottom", "but", "by",
          // "couldn" rather than "couldn't": apostrophes are replaced by spaces first
          "call", "can", "cannot", "cant", "co", "con", "could", "couldn",
          "de", "detail", "do", "done", "down", "due", "during", "each",
          "eg", "eight", "either", "eleven", "else", "elsewhere", "empty", "enough",
          "etc", "even", "ever", "every", "everyone", "everything", "everywhere",
          "except", "few", "fifteen", "fifty", "fill", "find", "first",
          "five", "for", "former", "formerly", "forty", "found", "four", "from",
          "front", "full", "further", "get", "give", "go", "had", "has", "hasnt",
          "have", "he", "hence", "her", "here", "hereafter", "hereby", "herein",
          "hereupon", "hers", "herself", "him", "himself", "his", "how", "however",
          "hundred", "ie", "if", "in", "inc", "indeed", "interest", "into", "is",
          "it", "its", "itself", "keep", "last", "latter", "latterly", "least",
          "less", "ltd", "made", "many", "may", "me", "meanwhile", "might", "mill",
          "mine", "more", "moreover", "most", "mostly", "move", "much", "must",
          "my", "myself", "name", "namely", "neither", "never", "nevertheless",
          "next", "nine", "no", "nobody", "none", "noone", "nor", "not", "nothing",
          "now", "nowhere", "of", "off", "often", "on", "once", "one", "only",
          "onto", "or", "other", "others", "otherwise", "our", "ours", "ourselves",
          "out", "over", "own", "part", "per", "perhaps", "please", "put", "rather",
          "re", "same", "see", "seem", "seemed", "seeming", "seems", "serious",
          "several", "she", "should", "show", "side", "since", "sincere", "six",
          "sixty", "so", "some", "somehow", "someone", "something", "sometime",
          "sometimes", "somewhere", "still", "such", "take", "ten",
          "than", "that", "the", "their", "them", "themselves", "then", "thence",
          "there", "thereafter", "thereby", "therefore", "therein", "thereupon",
          "these", "they", "thin", "third", "this", "those", "though",
          "three", "through", "throughout", "thru", "thus", "to", "together", "too",
          "top", "toward", "towards", "twelve", "twenty", "two", "un", "under",
          "until", "up", "upon", "us", "very", "via", "was", "we", "well", "were",
          "what", "whatever", "when", "whence", "whenever", "where", "whereafter",
          "whereas", "whereby", "wherein", "whereupon", "wherever", "whether",
          "which", "while", "whither", "who", "whoever", "whole", "whom", "whose",
          "why", "will", "with", "within", "without", "would", "yet", "you", "your",
          "yours", "yourself", "yourselves",
          // fragments left over from contractions once apostrophes become spaces
          "ll", "t", "s", "d", "ve", "m"
       );

       /**
        * Get most common non-stop-words in string
        * @return array
        * @param string $text
        * @param int $nbrWords Number of words to return, default = 5
        */
       public function getKeywords($text, $nbrWords = 5)
       {
          // Replace apostrophes with spaces, so "I'll" becomes "I ll"
          $text = preg_replace('/\'/', ' ', $text);
          $words = str_word_count($text, 1);
          array_walk($words, array($this, 'filter'));
          $words = array_diff($words, $this->stopWords);
          $wordCount = array_count_values($words);
          arsort($wordCount);
          $wordCount = array_slice($wordCount, 0, $nbrWords);
          return array_keys($wordCount);
       }

       // array_walk() callback: lower-cases each word in place
       private function filter(&$val, $key)
       {
          $val = strtolower($val);
       }
    }

    // Strip HTML from a post and return its top five keywords as a comma-separated string
    function meta_keywords($text) {
       $text = strtolower(strip_tags($text));
       $post = new Keywords();
       $keywords = $post->getKeywords($text, 5);
       return implode(",", $keywords);
    }

    $text = "Four score and seven years ago, our fathers brought forth
    upon this continent a new nation, conceived in liberty
    and dedicated to the proposition that all men are created equal.
    Now we are engaged in a great civil war, testing whether this
    nation or any other nation so conceived and so dedicated
    can long endure.";

    echo meta_keywords($text);
    Last edited by narutodude000; 07-25-2010 at 09:05 PM.

  6. #6
    Join Date: Aug 2004 | Location: Ankh-Morpork | Posts: 19,147
    I might change that preg_replace() to:
    PHP Code:
    $text = preg_replace('/\'\w*\b/', ' ', $text);
    That would take care of both contractions and possessives. You might need to add handling for a few special cases, such as "don't".
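    A quick way to see what that pattern does (the sample sentence is made up for illustration): each apostrophe is removed together with the word characters that follow it, leaving stems such as "I", "John" and "can".

```php
<?php
// Apply the suggested pattern: an apostrophe plus any word characters after it
$text = "I'll check John's notes, but we can't stay.";
$cleaned = preg_replace('/\'\w*\b/', ' ', $text);
// The remaining words, ready for the stop-word filter
print_r(str_word_count($cleaned, 1));
```

    One caveat: the pattern also matches an opening single quote, so a word written as 'hello' (in single quotation marks) is removed entirely, not just its quote.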

  7. #7
    Join Date: Mar 2010 | Location: Singapore | Posts: 367
    Since the stop-word list may grow or shrink over time, it would be better to store it in a text file or a database table. That way, any change to the list only means editing the text file or running SQL against the table, and your PHP program stays unchanged.
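    A minimal sketch of the text-file variant, assuming a plain-text file with one stop word per line; the function name loadStopWords() is hypothetical.

```php
<?php
// Hypothetical loader: reads one stop word per line from a plain-text file.
function loadStopWords(string $path): array
{
    // Strip line endings and skip empty lines
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read stop-word file: $path");
    }
    // Normalise case and surrounding whitespace
    $words = array_map(fn($w) => strtolower(trim($w)), $lines);
    // Drop whitespace-only lines and duplicates, then reindex
    return array_values(array_unique(array_filter($words, fn($w) => $w !== '')));
}
```

    Because a property default must be a constant expression, the Keywords class would load the list in a constructor instead, e.g. $this->stopWords = loadStopWords(__DIR__ . '/stopwords.txt'); where the file name is only an example.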

  8. #8
    Join Date: Aug 2004 | Location: Ankh-Morpork | Posts: 19,147
    Quote, originally posted by sohguanh:
    Since the stop-word list may grow or shrink over time, it would be better to store it in a text file or a database table. That way, any change to the list only means editing the text file or running SQL against the table, and your PHP program stays unchanged.
    Yes: I assumed that was self-evident, but it's probably good that you mentioned it, since it may not in fact be self-evident to everyone.
