www.webdeveloper.com

Thread: Extract contents using regex.

  1. #1
    Join Date
    Sep 2013
    Posts
    221

    Extract contents using regex.

    I want to extract everything inside <span class="jcn"> ... </span> using a regex.

    Below is the markup I want to extract the content from.


    <section class="jbbg">

    <a class='jdr' href='javascript:void(0);' onClick="return openDiv('jrtp');"></a>
    <span class="jcn">
    <a href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET" title='Goga Thali Pure Veg & Banquet Hall in Kandivali West, Mumbai' >Goga Thali Pure Veg & Banquet Hall</a>
    </span>

    <section class="jrat">
    <a rel="nofollow" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET#rvw"><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s0'></span></a>
    <a class="jrt" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET#rvw">1 rating</a>
    <span class="jrt"> |</span>
    <a class="rate_this" onclick="_ct('ratethis','lspg');" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET/writereview">Rate this</a>
    </section>

    Thanks in advance.
    strad solutions www.stradsolutions.com

  2. #2
    Join Date
    Apr 2010
    Posts
    88
    PHP Code:
    $pattern = '#<span class="jcn">(.+?)</span>#si';
    if (preg_match_all($pattern, $content, $matches))
    {
        // $matches[0] holds the full matches, $matches[1] the captured contents
        print_r($matches);
    }


  3. #3
    Join Date
    Sep 2013
    Posts
    221
    Well, this code only prints the array. I want all the content of <span class="jcn"> to be displayed.

  4. #4
    Join Date
    Apr 2010
    Posts
    88
    You should loop through the $matches[0] array (the full matches) or the $matches[1] array (the captured contents) and print whatever you need.
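
    As a rough illustration of that advice, here is a minimal sketch. The sample markup and the $inner array are made up for the example; in the thread, $content would be the fetched page.

```php
<?php
// Hypothetical sample input; in the thread this would be the fetched page.
$content = '<span class="jcn"><a href="/one">First</a></span>'
         . '<span class="jcn"><a href="/two">Second</a></span>';

preg_match_all('#<span class="jcn">(.+?)</span>#si', $content, $matches);

// $matches[0]: full matches, including the <span> tags.
// $matches[1]: only the text captured by the (.+?) group.
$inner = array();
foreach ($matches[1] as $match) {
    $inner[] = $match;
    // Escape before printing so the browser shows the markup instead of rendering it.
    echo htmlspecialchars($match), "\n";
}
```

    Looping over $matches[0] instead would print each match with its surrounding <span> tags.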

  5. #5
    Join Date
    Sep 2013
    Posts
    221
    Well, the array values are not displaying the contents.
    Can someone help me extract the correct content of <span class="jcn">?

  6. #6
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,337
    Works fine for me:
    PHP Code:
    <?php

    $text = <<<EOD
    <section class="jbbg">

    <a class='jdr' href='javascript:void(0);' onClick="return openDiv('jrtp');"></a>
    <span class="jcn">
    <a href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET" title='Goga Thali Pure Veg & Banquet Hall in Kandivali West, Mumbai' >Goga Thali Pure Veg & Banquet Hall</a>
    </span>

    <section class="jrat">
    <a rel="nofollow" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET#rvw"><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s10'></span><span class='s0'></span></a>
    <a class="jrt" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET#rvw">1 rating</a>
    <span class="jrt"> |</span>
    <a class="rate_this" onclick="_ct('ratethis','lspg');" href="http://justdial.com/Mumbai/Goga-Thali-Pure-Veg-Banquet-Hall-&lt;near&gt;-Opposite-Sanghavi-Apartment-Kandivali-West/022PXX22-XX22-130928123504-L6D4_TXVtYmFpIFJlc3RhdXJhbnRz_BZDET/writereview">Rate this</a>
    </section>
    EOD;

    preg_match_all('#<span class="jcn">(.+?)</span>#si', $text, $matches);
    foreach ($matches[1] as $match) {
        echo "<pre>".htmlspecialchars($match)."</pre>";
    }
    If that's not what you want, then you need to be more specific about what you need, and you need to show us the actual code you are using.
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  7. #7
    Join Date
    Sep 2013
    Posts
    221
    Well, I tried the code below. It works for getting the contents of <span class="jcn">.

    <html>
    <body>
    <?php
    $content = file_get_contents('http://justdial.com/Mumbai/Restaurants/ct-304085');
    preg_match_all('#<span class="jcn">(.+?)</span>#si', $content, $matches);
    foreach($matches[1] as $match) {
    echo "<pre>".htmlspecialchars($match)."</pre>";
    }
    ?>
    </body>
    </html>

    But what I want now is to get only the contents of the <a> tags from the output of the code above.
    Can someone help me with this?

  8. #8
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,337
    Take the results you get from the above process, and apply the same logic for the <a> tags.
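
    One way to read "apply the same logic" is a second regex pass over each captured span. The sketch below assumes a made-up two-entry sample and a hypothetical $names array; the <a> pattern is an illustration, not something posted in the thread.

```php
<?php
// Hypothetical sample standing in for one page's markup.
$content = '<span class="jcn">
<a href="http://example.com/a" title="A">Cafe A</a>
</span>
<span class="jcn">
<a href="http://example.com/b" title="B">Diner B</a>
</span>';

// First pass: grab the inner HTML of every <span class="jcn">.
preg_match_all('#<span class="jcn">(.+?)</span>#si', $content, $spans);

// Second pass: apply the same idea to the <a> tag inside each span.
$names = array();
foreach ($spans[1] as $spanHtml) {
    if (preg_match('#<a\b[^>]*>(.*?)</a>#si', $spanHtml, $a)) {
        $names[] = trim($a[1]);
    }
}

echo implode(', ', $names), "\n";
```

    Note that the second pass runs on the raw captured HTML. Running it on text that has already been through htmlspecialchars() would find nothing, because the "<" characters are escaped by then.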

    A more robust solution would probably be to make use of the DOM extension, though.
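
    A minimal sketch of what the DOM route might look like, using DOMDocument and DOMXPath. The sample markup, the XPath expression, and the $names array are assumptions made for the example, not code from the thread.

```php
<?php
// Hypothetical sample standing in for the fetched page.
$html = '<html><body><section class="jbbg">
<span class="jcn"><a href="http://example.com/a">Cafe A</a></span>
<span class="jrt"> |</span>
<span class="jcn"><a href="http://example.com/b">Diner B</a></span>
</section></body></html>';

// Collect parse warnings internally; real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Select every <a> that sits directly inside <span class="jcn">.
$xpath = new DOMXPath($dom);
$names = array();
foreach ($xpath->query('//span[@class="jcn"]/a') as $link) {
    $names[] = trim($link->textContent);
}

print_r($names);
```

    The @class="jcn" test is an exact string comparison, so a span whose class attribute lists several classes would need a contains()-style expression instead.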

  9. #9
    Join Date
    Sep 2013
    Posts
    221
    Thanks for your advice, I am now able to extract the contents. Below is the code:
    <html>
    <body>
    <?php


    set_time_limit(300);

    $content = file_get_contents('http://justdial.com/Mumbai/Restaurants/ct-304085');
    preg_match_all('#<span class="jcn">(.+?)</span>#si', $content, $matches);
    // Run the <a> regex on each raw span (not an escaped copy) and collect the link texts.
    $names=array();
    foreach($matches[1] as $match) {
    if (preg_match('/<a\s+href[^>]+>([^<]+)<\/a>/', $match, $m)) {
    $names[]=trim($m[1]);
    }
    }
    $n=implode(', ', $names);
    echo $n;





    //no:

    ini_set( "display_errors", 0); //warning not to display


    $xml = '<span class="ctc jcn">Details i want to get</span>';
    $dom = new DOMDocument;
    $dom->loadHTML($content);
    $spans = $dom->getElementsByTagName('a');

    foreach ($spans as $span) {
    if($span->getAttribute('class') == 'ctc f13') {

    $t=trim(strip_tags($span->nodeValue));

    echo $t;
    }
    }
    //insert:
    $con=mysqli_connect("localhost","root","","test");
    // Check connection
    if (mysqli_connect_errno())
    {
    echo "Failed to connect to MySQL: " . mysqli_connect_error();
    }

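    // Use a prepared statement so quotes in the scraped text cannot break the query.
    $stmt=mysqli_prepare($con,"INSERT INTO together (names,numbers) VALUES (?,?)");
    mysqli_stmt_bind_param($stmt,"ss",$n,$t);
    if (!mysqli_stmt_execute($stmt))
    {
    die('Error: ' . mysqli_stmt_error($stmt));
    }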
    echo "1 record added";
    mysqli_close($con);
    ?>
    </body>
    </html>
    But now I want to insert these values, i.e. one name and one number together as a single record in my database.
    Can someone please help me with this?
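
    The thread ends without an answer to this, so the following is only a sketch of one possible approach: collect names and numbers into two arrays, pair them by position, and insert one row per pair with a prepared statement. The helper names pairRecords() and insertRecords() are hypothetical, and the sketch assumes the Nth name on the page belongs to the Nth number, which the real page may not guarantee. The table and column names are taken from the script above.

```php
<?php
// Pair the Nth name with the Nth number; extra entries on either side are dropped.
function pairRecords(array $names, array $numbers)
{
    $rows = array();
    $count = min(count($names), count($numbers));
    for ($i = 0; $i < $count; $i++) {
        $rows[] = array($names[$i], $numbers[$i]);
    }
    return $rows;
}

// Insert one row per pair with a prepared statement and return the row count.
function insertRecords(mysqli $con, array $rows)
{
    $stmt = $con->prepare('INSERT INTO together (names, numbers) VALUES (?, ?)');
    if ($stmt === false) {
        die('Error: ' . $con->error);
    }
    $name = null;
    $number = null;
    // Bound by reference: each execute() sends the current values of $name and $number.
    $stmt->bind_param('ss', $name, $number);
    foreach ($rows as $row) {
        list($name, $number) = $row;
        if (!$stmt->execute()) {
            die('Error: ' . $stmt->error);
        }
    }
    $stmt->close();
    return count($rows);
}

// Hypothetical data; in the script above these arrays would be filled by the scraping loops.
$rows = pairRecords(array('Cafe A', 'Diner B'), array('111', '222'));
```

    With a connection from mysqli_connect() as in the script above, calling insertRecords($con, $rows) would then add one record per name and number pair.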
