www.webdeveloper.com

Thread: PHP RSS Aggregator theory

  1. #1
    Join Date
    Oct 2008
    Posts
    28

    PHP RSS Aggregator theory

    I'm thinking about creating a website that takes in multiple RSS feeds, merges the data into an array and then outputs the latest 10 or so items.

    Pseudocode
    PHP Code:
    # Initialise feedArray
    $feedArray = array();

    foreach feed :
        # Grab and read feed
        # Add feed data into $feedArray - ie title, link, description, date
    endforeach;

    # Sort feed array

    # Output
    foreach item in feedArray :
        # Output data
    endforeach;

    Does that make sense?

  2. #2
    Join Date
    Oct 2008
    Posts
    28
    This is the code I've come up with - and it seems to be working alright for me.

    Can anyone suggest improvements, especially with regard to the way I'm handling the date?

    Cheers

    Here's the code.

    PHP Code:
    <?php

    // Convert date
    function get_date($date) {

        // Date starts in the format Mon, 08 Jun 2009, followed by a time and a timezone
        $str = $date;

        $strArray = explode(' ', $str);
        array_shift($strArray); // drop the day name
        array_pop($strArray);   // drop the timezone

        switch ($strArray[1]) {
            case 'Jan':
                $strArray[1] = '01';
                break;

            case 'Feb':
                $strArray[1] = '02';
                break;

            case 'Mar':
                $strArray[1] = '03';
                break;

            case 'Apr':
                $strArray[1] = '04';
                break;

            case 'May':
                $strArray[1] = '05';
                break;

            case 'Jun':
                $strArray[1] = '06';
                break;

            case 'Jul':
                $strArray[1] = '07';
                break;

            case 'Aug':
                $strArray[1] = '08';
                break;

            case 'Sep':
                $strArray[1] = '09';
                break;

            case 'Oct':
                $strArray[1] = '10';
                break;

            case 'Nov':
                $strArray[1] = '11';
                break;

            case 'Dec':
                $strArray[1] = '12';
                break;

            default:
                break;
        }

        // re-form date as year-month-day time
        $date = $strArray[2].'-'.$strArray[1].'-'.$strArray[0].' '.$strArray[3];
        return $date;
    }

    // Parse feed
    function parse_feed($feed='') {

        $rss = simplexml_load_file($feed);

        if ($rss) { // Feed is valid and well formed

            $newsfeed = array();
            $i = 0;

            foreach ($rss->channel->item as $item) {
                $newsfeed[$i]['title'] = $item->title;
                $newsfeed[$i]['pubDate'] = get_date($item->pubDate);
                $newsfeed[$i]['description'] = $item->description;
                $newsfeed[$i]['link'] = $item->link;
                $i++;
            }

            return $newsfeed;
        }
    }

    // Feeds to parse
    $google = parse_feed('http://news.google.co.uk/news?um=1&ned=uk&hl=en&q=football&output=rss');
    $bbc = parse_feed('http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss.xml');

    // Array to store feed data in
    $merged = array();

    // Add feed 1 data to array
    foreach ($google as $data) :
        $merged[] = $data;
    endforeach;

    // Add feed 2 data to array
    foreach ($bbc as $data) :
        $merged[] = $data;
    endforeach;


    // Sort the merged items by pubDate, newest first.
    // Build a column of pubDate values; $merged is passed last so it is reordered to match.
    foreach ($merged as $key => $row) {
        $pubdate[$key] = $row['pubDate'];
    }

    array_multisort($pubdate, SORT_DESC, SORT_STRING, $merged);

    // Output array
    foreach ($merged as $m) :
        echo $m['pubDate'] . ' - ' . $m['title'] . '<br />';
    endforeach;

  3. #3
    Join Date
    Oct 2008
    Posts
    150
    You can replace that get_date() function with something like this:
    PHP Code:
    function get_date($date)
    {
        return date('M-d-D, Y', strtotime($date));
    }
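
    Since strtotime() already returns an integer Unix timestamp, one option is to sort on that number directly and only format the date at output time. The following is a minimal sketch under that assumption; the sample items and the newest_first() callback are hypothetical and not part of the posted code.

```php
<?php
// Hypothetical sample items; in the thread these would come from parse_feed().
$items = array(
    array('title' => 'Older story',  'pubDate' => 'Mon, 08 Jun 2009 09:15:00 GMT'),
    array('title' => 'Newest story', 'pubDate' => 'Tue, 09 Jun 2009 18:30:00 GMT'),
    array('title' => 'Middle story', 'pubDate' => 'Tue, 09 Jun 2009 07:00:00 GMT'),
);

// Attach an integer Unix timestamp to every item.
foreach ($items as $key => $item) {
    $items[$key]['timestamp'] = strtotime($item['pubDate']);
}

// Comparison callback (hypothetical name): larger timestamps sort first.
function newest_first($a, $b)
{
    if ($a['timestamp'] == $b['timestamp']) {
        return 0;
    }
    return ($a['timestamp'] > $b['timestamp']) ? -1 : 1;
}

usort($items, 'newest_first');

// Keep at most the 10 most recent items.
$latest = array_slice($items, 0, 10);

foreach ($latest as $item) {
    echo date('Y-m-d H:i', $item['timestamp']) . ' - ' . $item['title'] . "\n";
}
```

    Sorting on the integer avoids depending on the display format: a string such as the one produced by 'M-d-D, Y' starts with the month name, so it would not sort chronologically.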


  4. #4
    Join Date
    Oct 2008
    Posts
    28
    @SodBuster

    Thanks

    I've amended my code a little now. It displays items grouped by how old they are, i.e. last hour, 1-2 hours old, 2-4 hours old and over 4 hours old.

    PHP Code:
    <?php
    // Convert date
    function get_date($date)
    {
        return date('Y-m-d H:i:s', strtotime($date));
    }

    // Convert date to timestamp
    function convert_to_timestamp($date)
    {
        return strtotime($date);
    }

    // Parse feed
    function parse_feed($feed='') {

        $rss = simplexml_load_file($feed);

        if ($rss) { // Feed is valid and well formed

            $newsfeed = array();
            $i = 0;

            foreach ($rss->channel->item as $item) {
                $newsfeed[$i]['title'] = $item->title;
                $newsfeed[$i]['pubDate'] = get_date($item->pubDate);
                $newsfeed[$i]['timestamp'] = convert_to_timestamp($item->pubDate);
                $newsfeed[$i]['description'] = $item->description;
                $newsfeed[$i]['link'] = $item->link;
                $i++;
            }

            return $newsfeed;
        }
    }

    // Feeds to parse
    $google = parse_feed('http://news.google.co.uk/news?um=1&ned=uk&hl=en&q=football&output=rss');
    $bbc = parse_feed('http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss.xml');

    // Array to store feed data in
    $merged = array();

    // Add feed 1 data to array
    foreach ($google as $data) :
        $merged[] = $data;
    endforeach;

    // Add feed 2 data to array
    foreach ($bbc as $data) :
        $merged[] = $data;
    endforeach;


    // Sort the merged items by pubDate, newest first.
    // Build a column of pubDate values; $merged is passed last so it is reordered to match.
    foreach ($merged as $key => $row) {
        $pubdate[$key] = $row['pubDate'];
    }

    array_multisort($pubdate, SORT_DESC, SORT_STRING, $merged);

    // Output array
    foreach ($merged as $m) :
        if (time() - $m['timestamp'] <= 3600) { // Last Hour
            if ($last != 1) {
                echo '<h1>Last Hour</h1>';
                $last = 1;
            }
            echo $m['pubDate'] . ' - ' . $m['title'] . '<br />';

        } elseif (time() - $m['timestamp'] <= 7200 && time() - $m['timestamp'] > 3600) { // between 1 and 2 hours
            if ($onetotwo != 1) {
                echo '<h1>1-2 Hours Old</h1>';
                $onetotwo = 1;
            }
            echo $m['pubDate'] . ' - ' . $m['title'] . '<br />';

        } elseif (time() - $m['timestamp'] <= 14400 && time() - $m['timestamp'] > 7200) { // between 2 and 4 hours
            if ($twotofour != 1) {
                echo '<h1>2-4 Hours Old</h1>';
                $twotofour = 1;
            }
            echo $m['pubDate'] . ' - ' . $m['title'] . '<br />';

        } else { // over 4 hours old
            if ($overfour != 1) {
                echo '<h1>Over 4 Hours Old</h1>';
                $overfour = 1;
            }
            echo $m['pubDate'] . ' - ' . $m['title'] . '<br />';
        }
    endforeach;
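
    The four branches above repeat the same echo and each keep their own flag. One possible way to tidy that up, sketched here with a hypothetical age_bucket() helper and made-up sample items, is to map each item's age to a heading and print the heading whenever it changes. The thresholds are the ones used in the post; the items are assumed to be sorted newest first already.

```php
<?php
// Hypothetical helper: map an item's age in seconds to the heading it belongs under.
function age_bucket($timestamp, $now)
{
    $age = $now - $timestamp;
    if ($age <= 3600) {
        return 'Last Hour';
    }
    if ($age <= 7200) {
        return '1-2 Hours Old';
    }
    if ($age <= 14400) {
        return '2-4 Hours Old';
    }
    return 'Over 4 Hours Old';
}

// Fixed, made-up "current time" so the example is repeatable; real code would use time().
$now = 1244570400;

// Made-up items, already sorted newest first.
$items = array(
    array('title' => 'A', 'timestamp' => $now - 600),
    array('title' => 'B', 'timestamp' => $now - 5000),
    array('title' => 'C', 'timestamp' => $now - 6000),
    array('title' => 'D', 'timestamp' => $now - 20000),
);

$current = null; // heading printed most recently
foreach ($items as $item) {
    $bucket = age_bucket($item['timestamp'], $now);
    if ($bucket !== $current) {
        echo '<h1>' . $bucket . '</h1>';
        $current = $bucket;
    }
    echo $item['title'] . '<br />';
}
```

    Because the checks run in ascending order, each one only needs an upper bound, and a single $current variable replaces the four separate flags.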

  5. #5
    Join Date
    Oct 2008
    Posts
    150
    You're filling up the global space with arrays that are never used again ($bbc and $google). I'd replace this:
    PHP Code:
    // Feeds to parse
    $google = parse_feed('http://news.google.co.uk/news?um=1&ned=uk&hl=en&q=football&output=rss');
    $bbc = parse_feed('http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss.xml');

    // Array to store feed data in
    $merged = array();

    // Add feed 1 data to array
    foreach ($google as $data) :
        $merged[] = $data;
    endforeach;

    // Add feed 2 data to array
    foreach ($bbc as $data) :
        $merged[] = $data;
    endforeach;
    with something like this:
    PHP Code:
    $feeds = array('google' => 'http://news.google.co.uk/news?um=1&ned=uk&hl=en&q=football&output=rss',
                   'bbc'    => 'http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss.xml');

    foreach ($feeds as $feed) {
        $merged[] = parse_feed($feed);
    }

    This will also make the code more flexible (easier to add/delete/change feed urls).

  6. #6
    Join Date
    Oct 2008
    Posts
    28
    Quote: Originally Posted by Sodbuster
    PHP Code:
    foreach ($feeds as $feed) {
        $merged[] = parse_feed($feed);
    }

    I've had to change the above to

    PHP Code:
    foreach ($feeds as $feed) :
        $fe = parse_feed($feed);
        foreach ($fe as $f) :
            $merged[] = $f;
        endforeach;
    endforeach;
    As the code you suggested wasn't working.
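
    A small sketch of why the inner loop makes a difference, assuming parse_feed() returns a list of item arrays. fake_parse_feed() below is a hypothetical stand-in so the example runs without fetching anything.

```php
<?php
// Hypothetical stand-in for parse_feed(): returns a list of items for one feed.
function fake_parse_feed($name)
{
    return array(
        array('title' => $name . ' item 1'),
        array('title' => $name . ' item 2'),
    );
}

$feeds = array('google', 'bbc');

// Appending the whole result: one element per FEED, each holding a list of items.
$nested = array();
foreach ($feeds as $feed) {
    $nested[] = fake_parse_feed($feed);
}

// Appending item by item: one element per ITEM, which is the shape
// the sorting and output loops expect.
$flat = array();
foreach ($feeds as $feed) {
    foreach (fake_parse_feed($feed) as $item) {
        $flat[] = $item;
    }
}

echo count($nested) . "\n"; // 2 (feeds)
echo count($flat) . "\n";   // 4 (items)
```

    With the nested shape, $row['pubDate'] does not exist at the top level, so the later sort has nothing to work with.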

    As for creating an array of feeds - that's a good idea - I actually was going to do that in the future - as the feed urls are going to be pulled from a database once I've got the basic concept down.

    Seems to be working fine now.

    Full code is here -
    PHP Code:
    <?php
    # Set default timezone
    date_default_timezone_set('Europe/London');
    ini_set('display_errors', 1);
    error_reporting(E_ALL|E_STRICT);

    // Convert date
    function get_date($date)
    {
        return date('Y-m-d H:i:s', strtotime($date));
    }

    // Convert date to timestamp
    function convert_to_timestamp($date)
    {
        return strtotime($date);
    }

    // Parse feed
    function parse_feed($feed) {

        $rss = simplexml_load_file($feed);

        if ($rss) { // Feed is valid and well formed

            $newsfeed = array();
            $i = 0;

            foreach ($rss->channel->item as $item) {
                $newsfeed[$i]['title'] = $item->title;
                $newsfeed[$i]['pubDate'] = get_date($item->pubDate);
                $newsfeed[$i]['timestamp'] = convert_to_timestamp($item->pubDate);
                $newsfeed[$i]['description'] = $item->description;
                $newsfeed[$i]['link'] = $item->link;
                $i++;
            }

            return $newsfeed;
        }

        // Feed could not be loaded: return an empty list so the caller's foreach still works
        return array();
    }

    $feeds = array('google' => 'http://news.google.co.uk/news?um=1&ned=uk&hl=en&q=football&output=rss',
                   'bbc'    => 'http://newsrss.bbc.co.uk/rss/sportonline_uk_edition/front_page/rss.xml'
                  );

    $merged = array();

    foreach ($feeds as $feed) :
        $fe = parse_feed($feed);
        foreach ($fe as $f) :
            $merged[] = $f;
        endforeach;
    endforeach;

    // Sort the merged items by pubDate, newest first.
    // Build a column of pubDate values; $merged is passed last so it is reordered to match.
    $pubdate = array();
    foreach ($merged as $key => $row) {
        $pubdate[$key] = $row['pubDate'];
    }

    array_multisort($pubdate, SORT_DESC, SORT_STRING, $merged);

    // Output array
    foreach ($merged as $m) :
        if (time() - $m['timestamp'] <= 3600) { // Last Hour
            if (!isset($last)) {
                echo '<h1>Last Hour</h1>';
                $last = 1;
            }
            echo date('d-m-Y H:i', $m['timestamp']) . ' - ' . $m['title'] . '<br />';

        } elseif (time() - $m['timestamp'] <= 7200 && time() - $m['timestamp'] > 3600) { // between 1 and 2 hours
            if (!isset($onetotwo)) {
                echo '<h1>1-2 Hours Old</h1>';
                $onetotwo = 1;
            }
            echo date('d-m-Y H:i', $m['timestamp']) . ' - ' . $m['title'] . '<br />';

        } elseif (time() - $m['timestamp'] <= 14400 && time() - $m['timestamp'] > 7200) { // between 2 and 4 hours
            if (!isset($twotofour)) {
                echo '<h1>2-4 Hours Old</h1>';
                $twotofour = 1;
            }
            echo date('d-m-Y H:i', $m['timestamp']) . ' - ' . $m['title'] . '<br />';

        } else { // over 4 hours old
            if (!isset($overfour)) {
                echo '<h1>Over 4 Hours Old</h1>';
                $overfour = 1;
            }
            echo date('d-m-Y H:i', $m['timestamp']) . ' - ' . $m['title'] . '<br />';
        }
    endforeach;

  7. #7
    Join Date
    Oct 2008
    Posts
    150
    That code was untested. I posted it to give an idea. I think this:
    PHP Code:
    $merged = array();
    foreach ($feeds as $feed) {
        $merged = $merged + parse_feed($feed);
    }

    will do the same as this:
    PHP Code:
    foreach ($feeds as $feed) :
        $fe = parse_feed($feed);
        foreach ($fe as $f) :
            $merged[] = $f;
        endforeach;
    endforeach;

  8. #8
    Join Date
    Oct 2008
    Posts
    28
    Unfortunately I can't seem to get the code to work that way. At least I've got it working with the extra foreach.

    Cheers for your help.
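
    One possible reason a one-line merge fails here, assuming it relied on the array union operator (+): union keeps the left-hand value whenever both arrays share a key, and every parsed feed is numbered from 0. array_merge() renumbers numeric keys instead, so it behaves like the extra foreach. A minimal sketch with made-up items:

```php
<?php
// Made-up items standing in for two parsed feeds; both are indexed from 0.
$first  = array(array('title' => 'google 1'), array('title' => 'google 2'));
$second = array(array('title' => 'bbc 1'), array('title' => 'bbc 2'), array('title' => 'bbc 3'));

// Union: keys 0 and 1 already exist in $first, so only key 2 of $second is added.
$union = $first + $second;

// array_merge() renumbers numeric keys and appends every item.
$merged = array_merge($first, $second);

echo count($union) . "\n";  // 3
echo count($merged) . "\n"; // 5
```

    Inside the feed loop that would read $merged = array_merge($merged, parse_feed($feed)), provided parse_feed() always returns an array.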
