Reading a PDF file character by character
Hi, I want to extract some advertising text from a PDF file. The problem is that I want to show it exactly as it appears in the PDF. All the adverts are inside boxes, as shown (for example) in this picture.
I'd like to display all this content the way I want using CSS and JS. Is it possible to extract the fonts and text properties at the same time? I want to do this because I want to handle all of the PDF content in a website.
I work with PHP, JS and MySQL, and I remember something about C from university. Can someone tell me how to do this? I can already extract all the text content, but I would really like the text to look the way it does in the PDF. Of course I need plain text: I can't use a small picture of each box, I want to draw the boxes with JS and CSS, with border=1 or something like that. Greetings, and hoping for your answer, Leonardo.
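One way to get the effect described in the question is to keep each run of text as real HTML text but pin it to the coordinates it had on the PDF page. The Python sketch below assumes you already have, from whatever extraction tool you use, a list of text spans carrying the text, font name, size, colour and bounding box; the `Span` field names, the `spans_to_html` function and the separate list of advert `boxes` are all hypothetical. It also assumes coordinates in points with the origin at the top-left of the page (the PDF format itself puts the origin at the bottom-left, so y values may need flipping), and a browser can only use a font it actually has, so unusual fonts would need `@font-face` rules. The same idea ports directly to PHP or JavaScript.

```python
from html import escape
from typing import Iterable, TypedDict


class Span(TypedDict):
    """One run of uniformly styled text (hypothetical field names)."""
    text: str
    font: str                                # font family name reported by the extractor
    size: float                              # font size in points
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1 in points, origin top-left
    color: str                               # any CSS colour, e.g. "#000000"


Box = tuple[float, float, float, float]      # x0, y0, x1, y1 of one advert frame


def spans_to_html(spans: Iterable[Span], page_width: float, page_height: float,
                  boxes: Iterable[Box] = ()) -> str:
    """Rebuild one PDF page as absolutely positioned HTML text plus bordered boxes."""
    parts = [f'<div class="pdf-page" style="position:relative;'
             f'width:{page_width}pt;height:{page_height}pt">']
    # Emit the frames first so the text is drawn on top of them.
    for x0, y0, x1, y1 in boxes:
        parts.append(f'<div style="position:absolute;left:{x0}pt;top:{y0}pt;'
                     f'width:{x1 - x0}pt;height:{y1 - y0}pt;'
                     f'border:1px solid #000;box-sizing:border-box"></div>')
    for s in spans:
        x0, y0, _x1, _y1 = s["bbox"]
        style = (f"position:absolute;left:{x0}pt;top:{y0}pt;"
                 f"font-family:'{s['font']}',sans-serif;font-size:{s['size']}pt;"
                 f"color:{s['color']};white-space:pre")
        # escape() also encodes quotes, so the style attribute stays well-formed.
        parts.append(f'<div style="{escape(style)}">{escape(s["text"])}</div>')
    parts.append("</div>")
    return "\n".join(parts)
```

Because each span stays plain text inside an absolutely positioned `<div>`, it can be selected, searched and restyled with CSS, and the boxes get a 1px border instead of being images. Absolute positioning reproduces the layout closely but does not reflow; if the text should wrap on narrow screens, you would have to group spans into paragraphs instead.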
Originally Posted by Pergamino
So, basically: You want an automated way of extracting the text and formatting information, and then recreating the look - as closely as possible - using HTML and CSS?
There are ways of extracting the text manually: see http://desktoppub.about.com/od/pdf/f/pdfextraction.htm or http://labnol.blogspot.co.uk/2006/09...documents.html -- both of which I found by Googling "extracting text from PDF" -- but as far as I know there's no easy way of doing so automatically.
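For the automatic route, it helps to know what is inside a PDF page: a content stream of operators such as `/F1 12 Tf` (select font resource `F1` at 12 pt) and `(Hello) Tj` (show a string). Reading that stream character by character yields the text and its font together. The sketch below is a deliberately small tokenizer, not a full PDF parser, and `extract_runs` is a hypothetical name. It assumes the stream has already been decompressed (most are Flate-compressed, which Python's `zlib` can inflate); it returns the font's resource name (such as `F1`) rather than the real font, which has to be looked up in the page's `/Resources` dictionary; and it decodes bytes as Latin-1, which is only right for simple fonts with a standard encoding, so fonts that need a `ToUnicode` CMap will come out garbled. In practice a maintained library such as pdfminer.six or PyMuPDF handles all of this.

```python
import re

WHITESPACE = b" \t\r\n\f\x00"
DELIMITERS = b"()<>[]{}/%"
_STOP = WHITESPACE + DELIMITERS
_ARRAY_START = object()  # sentinel marking "[" on the operand stack


def _read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Read a (...) string; i points just past "(". Returns (bytes, index past ")")."""
    out = bytearray()
    depth = 1
    simple = {ord("n"): 10, ord("r"): 13, ord("t"): 9, ord("b"): 8, ord("f"): 12}
    while i < len(data):
        c = data[i]
        if c == 0x5C and i + 1 < len(data):  # backslash escape
            nxt = data[i + 1]
            if nxt in simple:
                out.append(simple[nxt])
                i += 2
            elif nxt in b"\r\n":  # line continuation: backslash + EOL is dropped
                i += 3 if data[i + 1:i + 3] == b"\r\n" else 2
            elif 0x30 <= nxt <= 0x37:  # octal escape of up to three digits
                j = i + 1
                while j < min(i + 4, len(data)) and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            else:  # \( \) \\ and any other escaped byte stand for themselves
                out.append(nxt)
                i += 2
            continue
        if c == ord("("):
            depth += 1  # balanced inner parentheses belong to the string
        elif c == ord(")"):
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out.append(c)
        i += 1
    raise ValueError("unterminated string in content stream")


def extract_runs(stream: bytes) -> list[tuple[str, float, str]]:
    """Scan an uncompressed content stream and return (font resource, size, text)."""
    runs: list[tuple[str, float, str]] = []
    operands: list = []
    font, size = "", 0.0
    i, n = 0, len(stream)
    while i < n:
        c = stream[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):  # comment runs to end of line
            while i < n and stream[i] not in b"\r\n":
                i += 1
        elif c == ord("("):
            text, i = _read_literal(stream, i + 1)
            operands.append(text)
        elif stream.startswith((b"<<", b">>"), i):  # dictionary delimiters: skip
            i += 2
        elif c == ord("<"):  # hex string such as <48656C6C6F>
            j = stream.index(b">", i)
            digits = re.sub(rb"\s", b"", stream[i + 1:j])
            digits += b"0" * (len(digits) % 2)  # odd length: pad with a trailing 0
            operands.append(bytes.fromhex(digits.decode("ascii")))
            i = j + 1
        elif c == ord("["):
            operands.append(_ARRAY_START)
            i += 1
        elif c == ord("]"):
            if _ARRAY_START in operands:  # fold everything since "[" into one list
                start = max(k for k, v in enumerate(operands) if v is _ARRAY_START)
                operands[start:] = [operands[start + 1:]]
            i += 1
        elif c in b"{})>":  # stray delimiters that carry no text
            i += 1
        else:  # name, number or operator
            j = i + 1
            while j < n and stream[j] not in _STOP:
                j += 1
            token = stream[i:j].decode("latin-1")
            i = j
            if token.startswith("/"):
                operands.append(token[1:])
                continue
            try:
                operands.append(float(token))
                continue
            except ValueError:
                pass  # not a number, so it is an operator
            if token == "Tf" and len(operands) >= 2:
                font, size = str(operands[-2]), float(operands[-1])
            elif token in ("Tj", "'", '"') and operands and isinstance(operands[-1], bytes):
                runs.append((font, size, operands[-1].decode("latin-1")))
            elif token == "TJ" and operands and isinstance(operands[-1], list):
                # Numbers in a TJ array are spacing adjustments; keep only the strings.
                pieces = [p for p in operands[-1] if isinstance(p, bytes)]
                runs.append((font, size, b"".join(pieces).decode("latin-1")))
            operands.clear()
    return runs
```

The spacing numbers inside `TJ` arrays are dropped here; large negative values often stand for word gaps, so a fuller version might insert a space when an adjustment passes some threshold.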
You want to read a PDF file character by character.
More precisely, you want to read the PDF into character recognition (OCR) software, if your PDF consists only of graphics (which you can tell because the text cannot be highlighted).
The results of course depend on your OCR software and the settings you apply before recognition.
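The highlighting test has a rough programmatic counterpart: if none of the streams in the file contain a text object (`BT ... ET`) with a show operator such as `Tj` or `TJ`, the PDF is probably image-only and OCR is the remaining option. The sketch below is only a heuristic under simplifying assumptions, and `has_text_layer` is a hypothetical name: it inflates Flate-compressed streams only (streams using other filters, such as LZW or ASCII85, are scanned as raw bytes and may be missed), and it also scans non-page streams such as embedded fonts, where binary data could produce a false positive.

```python
import re
import zlib

# A stream body sits between the "stream" keyword (plus end-of-line) and "endstream".
_STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
_BEGIN_TEXT = re.compile(rb"\bBT\b")    # start of a text object
_SHOW_TEXT = re.compile(rb"\bT[jJ]\b")  # Tj / TJ text-showing operators


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic: True if some stream in the file looks like it draws text."""
    for match in _STREAM.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj ignores the end-of-line bytes left after the zlib data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-compressed: inspect the bytes as they are
        if _BEGIN_TEXT.search(data) and _SHOW_TEXT.search(data):
            return True
    return False
```

A `False` result suggests rendering the pages to images and running OCR software on them; a `True` result means the text can probably be extracted directly, without OCR.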
In any case, the procedure is likely to involve a lot of work and only pays off if the text contains lots of repetitions and you can use CAT (computer-assisted translation) software afterwards. Otherwise, just work from a printout and type the translation into Word.
I suggest choosing a suitable tool to help you. Whenever I have a similar need, I use a professional PDF SDK. Keep in mind that file conversion is a convenience: it saves you from having to retype the document in Word from scratch, but a converted file cannot be used as a final document. You will save yourself untold hours of frustration if you get your head around this simple fact.
Download the free trial of Yiigo and try it. Let me know whether this helped and whether you were able to do it successfully.
Apart from Adobe Reader, there are many easy-to-use PDF readers with more features than Adobe Reader. With a PDF reading tool like this, you can quickly extract text or images from a PDF file and do some post-processing on those images or text, as well as on the PDF file itself.