www.webdeveloper.com
  1. #1
    Join Date
    Jul 2012
    Posts
    11

    Reading a PDF file character by character

    Hi, I want to extract some advertising text from a PDF file. The problem is that I want to display it exactly as it appears in the PDF. The advertisements are all inside boxes, as shown in this picture, for example:

    pdf_exmple.jpg

    I'd like to reproduce all of this content with CSS and JS. Is it possible to extract the fonts and the text properties at the same time? I want to do this because I need to handle all of the PDF content on a website.

    I work with PHP, JS and MySQL, and I remember some C from university. Can someone tell me how to do that? I can already extract all of the text content, but I would really like the text to look the way it does in the PDF. It has to be plain text, of course: I can't use a small picture of each box, so I would draw the boxes with JS and CSS, using border=1 or something like that. Greetings, and hoping for your answer, Leonardo.
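
    To make the goal concrete, here is a minimal sketch of the rendering half of the problem: turning text that has already been extracted, together with its position and font, into absolutely positioned HTML boxes. It is written in Python for illustration only. The `TextSpan` structure and the `spans_to_html` function are hypothetical names, not part of any PDF library, and the vertical placement is a rough approximation that treats each box as starting one font size above the baseline.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextSpan:
    """One run of text plus the properties needed to redraw it (hypothetical)."""
    text: str
    x: float      # left edge in PDF points, measured from the left of the page
    y: float      # baseline in PDF points, measured from the BOTTOM of the page
    font: str     # font name as reported by whatever extractor is used
    size: float   # font size in points


def spans_to_html(spans, page_width, page_height, boxed=True):
    """Render spans as absolutely positioned <div>s inside one page container."""
    border = "border:1px solid #000;" if boxed else ""
    parts = [
        f'<div class="page" style="position:relative;'
        f'width:{page_width}pt;height:{page_height}pt;">'
    ]
    for s in spans:
        # PDF puts the origin at the bottom-left corner, CSS at the top-left,
        # so the y axis is flipped. Subtracting the font size is an
        # approximation of where the top of the text box sits.
        top = page_height - s.y - s.size
        style = (
            f"position:absolute;left:{s.x}pt;top:{top}pt;"
            f"font-family:{s.font};font-size:{s.size}pt;{border}"
        )
        # Escape both the style attribute and the text so that characters
        # such as < or " in the PDF cannot break the markup.
        parts.append(f'<div style="{escape(style)}">{escape(s.text)}</div>')
    parts.append("</div>")
    return "\n".join(parts)
```

    The output stays plain text in the page, so it can be searched and restyled with CSS; only the positions and font sizes come from the PDF.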

  2. #2
    Join Date
    Mar 2009
    Posts
    452
    Quote Originally Posted by Pergamino View Post
    this text to appear as it does in the PDF, but I need plain text, of course

  3. #3
    Join Date
    May 2012
    Location
    St. Helens, UK
    Posts
    74
    So, basically: You want an automated way of extracting the text and formatting information, and then recreating the look - as closely as possible - using HTML and CSS?

    There are ways of extracting the text manually: see http://desktoppub.about.com/od/pdf/f/pdfextraction.htm or http://labnol.blogspot.co.uk/2006/09...documents.html -- both of which I found by Googling "extracting text from PDF" -- but as far as I know there's no easy way of doing so automatically.
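
    For what it's worth, the text and its font do sit side by side inside a PDF page's content stream: `Tf` selects a font and size, `Td` moves the text position, and `Tj` shows a string. The following is only a sketch, in Python, of reading those three operators from a stream that has already been pulled out of the file and decompressed. The function name `extract_text_runs` and the returned dictionary keys are made up for illustration. Real PDFs also use `TJ` arrays, transformation matrices, hex strings and custom font encodings, none of which this handles.

```python
import re

# Tokens of a (simplified) PDF content stream.
TOKEN = re.compile(
    r"""
    (?P<string>\((?:\\.|[^\\()])*\))          # literal string, no nested parentheses
  | (?P<name>/[^\s/\[\]()<>]+)                # name such as /F1
  | (?P<number>[+-]?(?:\d+\.?\d*|\.\d+))      # integer or real number
  | (?P<op>[A-Za-z'"*]+)                      # operator such as Tf, Td, Tj
    """,
    re.VERBOSE,
)


def _unescape(literal):
    """Strip the outer parentheses and undo \\( \\) \\\\ escapes only."""
    return re.sub(r"\\([()\\])", r"\1", literal[1:-1])


def extract_text_runs(content):
    """Return one dict per Tj operator with its text, font, size and position."""
    runs = []
    operands = []            # operands seen since the previous operator
    font, size = None, None  # text state persists across BT/ET blocks
    x = y = 0.0
    for match in TOKEN.finditer(content):
        kind, value = match.lastgroup, match.group()
        if kind != "op":
            operands.append(value)
            continue
        if value == "BT":                      # begin text: position resets
            x = y = 0.0
        elif value == "Tf" and len(operands) >= 2:
            font = operands[-2].lstrip("/")
            size = float(operands[-1])
        elif value == "Td" and len(operands) >= 2:
            x += float(operands[-2])           # Td offsets are relative
            y += float(operands[-1])
        elif value == "Tj" and operands:
            runs.append({"text": _unescape(operands[-1]),
                         "font": font, "size": size, "x": x, "y": y})
        operands = []                          # every operator consumes its operands
    return runs
```

    A stream such as `BT /F1 12 Tf 72 700 Td (Room for rent) Tj ET` would yield one run carrying the string together with the font name `F1`, the size and the position. In practice an established PDF library is likely a better starting point than hand-rolled parsing.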
    Crisialu Web Design
    Daihuws's Blog

    "There is no human problem which could not be solved if people would simply do as I advise."

  4. #4
    Join Date
    Apr 2013
    Posts
    5
    You want to read a PDF file character by character.
    More precisely, that means running the PDF through character recognition (OCR) software if your PDF is an all-graphics file (you can tell because it is impossible to highlight the text).
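
    The "can I highlight the text" check can also be roughed out in code. This is a heuristic sketch in Python, not a reliable test: it looks through each `stream ... endstream` block, tries Flate (zlib) decompression, and searches for text-showing operators. The name `looks_like_text_pdf` is hypothetical, and PDFs that use other filters, object streams or encryption will not be classified correctly.

```python
import re
import zlib

# Body of every "stream ... endstream" block. The lookbehind keeps the
# "stream" inside "endstream" from being taken as the start of a block.
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

# A string or array immediately followed by a text-showing operator.
TEXT_OP = re.compile(rb"\)\s*Tj|>\s*Tj|\]\s*TJ")


def looks_like_text_pdf(data: bytes) -> bool:
    """Guess whether a PDF has a text layer (True) or is images only (False)."""
    for match in STREAM.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            body = zlib.decompressobj().decompress(raw)
        except zlib.error:
            body = raw  # not Flate-compressed; inspect the bytes as they are
        if TEXT_OP.search(body):
            return True
    return False
```

    If this returns False for a file, OCR is probably the only route; if it returns True, the text can usually be extracted directly.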

    The results, of course, depend on your OCR software and the settings you apply before recognition.

    In any case, the procedure is likely to involve a lot of work, and it only pays off if the text contains lots of repetition and you can use a CAT tool afterwards. Otherwise, just use a printout and type the translation into Word.

    I suggest choosing a suitable tool to help you. Whenever I have a similar need, I use this professional PDF SDK. Then you will understand that file conversion is a convenience: it saves you from having to retype the document in Word from scratch. A converted file cannot be used as a final document. You will save yourself untold hours of frustration if you get your head around this simple fact.

    Download the free trial of Yiigo and try it. Let me know if this helped and whether you were able to do this successfully.

    Kind Regards,
    Arron
