Thread: Reading a PDF file character by character

  1. #1
    Join Date
    Jul 2012

    Hi, I want to extract some advertising text from a PDF file. The problem is that I want to display it exactly as it appears in the PDF. All the advertisements are inside boxes, as shown (for example) in this picture:


    I'd like to display all this content the way I want using CSS and JS. Is it possible to extract the fonts and the text properties at the same time? I want to do this because I want to handle all the PDF content in a website.

    I work with PHP, JS, and MySQL, and I remember some C from university. Can someone tell me how to do that? I can already extract all the text content, but I would very much like the text to look the way it does in the PDF. It needs to be plain text, of course: I can't use a small picture for each box. I would draw the boxes with JS and CSS, with border=1 or something like that. Greetings, and hoping for your answer, Leonardo.

  2. #2
    Join Date
    Mar 2009
    Quote Originally Posted by Pergamino View Post
    the text to look the way it does in the PDF. It needs to be plain text, of course

  3. #3
    Join Date
    May 2012
    St. Helens, UK
    So, basically: You want an automated way of extracting the text and formatting information, and then recreating the look - as closely as possible - using HTML and CSS?
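
    The second half of that idea, recreating the look with HTML and CSS, could be sketched roughly as follows once each piece of text is available together with its position and font properties. This is only a minimal sketch: the `TextSpan` and `Box` shapes, their field names, and the helper functions are hypothetical and do not come from any particular PDF library. It also assumes the coordinates have already been converted to points measured from the top-left corner of the page (PDF itself measures from the bottom-left).

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextSpan:
    """One run of text plus its properties (hypothetical shape, not a real library type)."""
    text: str
    x: float      # left edge in points, measured from the left of the page
    y: float      # top edge in points, measured from the top of the page (assumed)
    font: str     # font family name
    size: float   # font size in points
    bold: bool = False


@dataclass
class Box:
    """The rectangular frame around one advertisement (hypothetical shape)."""
    x: float
    y: float
    width: float
    height: float


def span_to_html(span: TextSpan) -> str:
    """Render one text run as an absolutely positioned <span>."""
    # Drop single quotes so the font name cannot break out of the CSS string.
    safe_font = escape(span.font.replace("'", ""))
    weight = "bold" if span.bold else "normal"
    style = (
        f"position:absolute;left:{span.x:g}pt;top:{span.y:g}pt;"
        f"font-family:'{safe_font}';font-size:{span.size:g}pt;"
        f"font-weight:{weight};white-space:pre"
    )
    # escape() keeps characters such as & and < in the ad text from becoming markup.
    return f'<span style="{style}">{escape(span.text)}</span>'


def box_to_html(box: Box) -> str:
    """Render one frame as an empty <div> with a 1px border."""
    style = (
        f"position:absolute;left:{box.x:g}pt;top:{box.y:g}pt;"
        f"width:{box.width:g}pt;height:{box.height:g}pt;"
        "border:1px solid black;box-sizing:border-box"
    )
    return f'<div style="{style}"></div>'


def page_to_html(spans, boxes, width: float, height: float) -> str:
    """Wrap all boxes and text runs in one relatively positioned page container."""
    parts = [f'<div style="position:relative;width:{width:g}pt;height:{height:g}pt">']
    parts += [box_to_html(b) for b in boxes]
    parts += [span_to_html(s) for s in spans]
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up sample data for one boxed ad on an A4-sized page (595 x 842 pt).
    spans = [TextSpan("FOR SALE: bike & helmet", 80, 100, "Helvetica", 12, bold=True)]
    boxes = [Box(72, 90, 200, 60)]
    print(page_to_html(spans, boxes, 595, 842))
```

    The text stays real, selectable text and the boxes are plain bordered elements rather than images. Getting the spans and boxes out of the PDF in the first place is the hard part and is not covered by this sketch.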

    There are ways of extracting the text manually: see http://desktoppub.about.com/od/pdf/f/pdfextraction.htm or http://labnol.blogspot.co.uk/2006/09...documents.html -- both of which I found by Googling "extracting text from PDF" -- but as far as I know there's no easy way of doing so automatically.

  4. #4
    Join Date
    Apr 2013
    You want to read a PDF file character by character.
    More precisely, that means running the PDF through optical character recognition (OCR) software if your PDF is an all-graphics file (indicated by it being impossible to highlight the text).

    The results of course depend on your OCR software and the settings you apply before recognition.

    In any case, the procedure is likely to involve a lot of work and only pays off if the text contains lots of repetitions and you can use a CAT tool afterwards. Otherwise, just use a printout and type the translation into Word.

    I suggest you choose a suitable tool to help you. Whenever I have a similar need, I use this professional PDF SDK. Then you will understand: file conversion is a convenience. It saves you from having to retype the document in Word from scratch. A converted file cannot be used as a final document. You will save yourself untold hours of frustration if you get your brain around this simple fact.

    Download the free trial of Yiigo and try it. Let me know if this helped and whether you were able to do this successfully.

    Kind Regards,

  5. #5
    Join Date
    Jun 2013
    Apart from Adobe Reader, there are many easy-to-use PDF readers that offer more features than Adobe Reader. With a PDF reading tool like this, you can quickly extract text or images from a PDF file and post-process those images and text as well as the PDF file itself.
