www.webdeveloper.com
Results 1 to 6 of 6

Thread: Scraping php website

  1. #1
    Join Date
    Oct 2017
    Posts
    3

    Question Scraping php website

    Hello, does anybody knows way how to scrap some site where i have from menu 5000 tabs, and every tabs have houndreds datas. Now i need that site in offline mode. How to do that ?

  2. #2
    Join Date
    Jul 2013
    Location
    Voorheesville NY USA
    Posts
    1,877
    You want to try and explain that a little better?

    'scrap'?
    'houndreds'?
    'from menu 5000 tabs'?
    'datas'?
    'in offline mode'?

    Yes - I am picky. It makes it easier to figure out what one is trying to relate.

    PS - a website once up and running is almost not recognizable as a "php website". How does one know it was written in php unless the url shows it as such? And why do you care for the purposes of your problem?
    JG
    PS - If you're posting here you should be using:

    error_reporting(E_ALL);
    ini_set('display_errors', '1');


    at the top of ALL php code while you develop it!

  3. #3
    Join Date
    Mar 2007
    Location
    localhost
    Posts
    5,764
    He wants a site scraper tool.

    In short, instead of just helping yourself to the site, why not ask the site owner if you can use their data with credits and attribution to them...
    --> JavaScript Frameworks like JQuery, Angular, Node <--
    ... and please remember to wrap code with forum BBCode tags:-

    [CODE]...[/CODE] [HTML]...[/HTML] [PHP]...[/PHP]

    If you can't think outside the box, you will be trapped forever with no escape...

  4. #4
    Join Date
    Jul 2013
    Location
    Voorheesville NY USA
    Posts
    1,877
    I barely gleaned that from the original post, but thought I would drive home the need for posters to re-read their post or at least try to use their language skills in as good a way as possible.

    But - knowing that the task involved scraping I still couldn't understand what all the other references were about. And so far there seems to be no more info forthcoming...
    JG
    PS - If you're posting here you should be using:

    error_reporting(E_ALL);
    ini_set('display_errors', '1');


    at the top of ALL php code while you develop it!

  5. #5
    Join Date
    Mar 2007
    Location
    localhost
    Posts
    5,764
    Amigo, lo siento, no entiendo porque no hablo espaņol.
    --> JavaScript Frameworks like JQuery, Angular, Node <--
    ... and please remember to wrap code with forum BBCode tags:-

    [CODE]...[/CODE] [HTML]...[/HTML] [PHP]...[/PHP]

    If you can't think outside the box, you will be trapped forever with no escape...

  6. #6
    Join Date
    Oct 2017
    Location
    Lithuania
    Posts
    71
    It depends on your needs. If you just want to scrap whole site at once, use any basic tool as HTTrack ( http://www.httrack.com/ ).

    If you need very specific parts of site (just some text data, no images, etc.), and want them to be re-downloaded at specific periods every X hours/days/weeks, you may need to write your own scrapper (PHP + curl should be enough), which extracts just the text you want.

    Or, as \\.\ wisely suggest, contact site's owner asking if he can provide you with data or some kind of API.

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center

"

"

X vBulletin 4.2.2 Debug Information

  • Page Generation 0.33835 seconds
  • Memory Usage 2,887KB
  • Queries Executed 15 (?)
More Information
Template Usage (32):
  • (1)SHOWTHREAD
  • (1)ad_footer_end
  • (1)ad_footer_start
  • (1)ad_global_above_footer
  • (1)ad_global_below_navbar
  • (1)ad_global_header1
  • (1)ad_global_header2
  • (1)ad_navbar_below
  • (1)ad_showthread_firstpost_sig
  • (1)ad_showthread_firstpost_start
  • (1)ad_thread_first_post_content
  • (1)ad_thread_last_post_content
  • (1)footer
  • (1)forumjump
  • (1)forumrules
  • (1)gobutton
  • (1)header
  • (1)headinclude
  • (1)headinclude_bottom
  • (6)memberaction_dropdown
  • (1)navbar
  • (4)navbar_link
  • (1)navbar_moderation
  • (1)navbar_noticebit
  • (1)navbar_tabs
  • (2)option
  • (6)postbit
  • (6)postbit_onlinestatus
  • (6)postbit_wrapper
  • (1)spacer_close
  • (1)spacer_open
  • (1)tagbit_wrapper 

Phrase Groups Available (6):
  • global
  • inlinemod
  • postbit
  • posting
  • reputationlevel
  • showthread
Included Files (26):
  • ./showthread.php
  • ./global.php
  • ./includes/class_bootstrap.php
  • ./includes/init.php
  • ./includes/class_core.php
  • ./includes/config.php
  • ./includes/functions.php
  • ./includes/functions_navigation.php
  • ./includes/class_friendly_url.php
  • ./includes/class_hook.php
  • ./includes/class_bootstrap_framework.php
  • ./vb/vb.php
  • ./vb/phrase.php
  • ./includes/functions_facebook.php
  • ./includes/functions_calendar.php
  • ./includes/functions_bigthree.php
  • ./includes/class_postbit.php
  • ./includes/class_bbcode.php
  • ./includes/functions_reputation.php
  • ./includes/functions_notice.php
  • ./packages/vbattach/attach.php
  • ./vb/types.php
  • ./vb/cache.php
  • ./vb/cache/db.php
  • ./vb/cache/observer/db.php
  • ./vb/cache/observer.php 

Hooks Called (70):
  • init_startup
  • friendlyurl_resolve_class
  • init_startup_session_setup_start
  • database_pre_fetch_array
  • database_post_fetch_array
  • init_startup_session_setup_complete
  • global_bootstrap_init_start
  • global_bootstrap_init_complete
  • cache_permissions
  • fetch_threadinfo_query
  • fetch_threadinfo
  • fetch_foruminfo
  • load_show_variables
  • load_forum_show_variables
  • global_state_check
  • global_bootstrap_complete
  • global_start
  • style_fetch
  • global_setup_complete
  • showthread_start
  • showthread_getinfo
  • strip_bbcode
  • friendlyurl_clean_fragment
  • friendlyurl_geturl
  • forumjump
  • cache_templates
  • cache_templates_process
  • template_register_var
  • template_render_output
  • fetch_template_start
  • fetch_template_complete
  • parse_templates
  • fetch_musername
  • notices_check_start
  • notices_noticebit
  • process_templates_complete
  • friendlyurl_redirect_canonical
  • showthread_post_start
  • showthread_query_postids
  • showthread_query
  • bbcode_fetch_tags
  • bbcode_create
  • showthread_postbit_create
  • postbit_factory
  • postbit_display_start
  • postbit_imicons
  • bbcode_parse_start
  • bbcode_parse_complete_precache
  • bbcode_parse_complete
  • postbit_display_complete
  • memberaction_dropdown
  • tag_fetchbit_complete
  • forumrules
  • navbits
  • navbits_complete
  • build_navigation_data
  • build_navigation_array
  • check_navigation_permission
  • process_navigation_links_start
  • process_navigation_links_complete
  • set_navigation_menu_element
  • build_navigation_menudata
  • build_navigation_listdata
  • build_navigation_list
  • set_navigation_tab_main
  • set_navigation_tab_fallback
  • navigation_tab_complete
  • fb_like_button
  • showthread_complete
  • page_templates