www.webdeveloper.com

Thread: help inflating PDF streams

  1. #1
    Join Date
    Jan 2004
    Posts
    13

    help inflating PDF streams

    Hi,

    I have a PDF file I need to manipulate inside a Sencha web app.

    I need to load the file, search for specific patterns in it (for example, numbers formatted like \d\d,\d\d\d), highlight the different text patterns in different colors, and add some new text and some JavaScript functions to it.
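For the pattern search on its own, a minimal sketch could look like the following. It assumes the text has already been extracted into a plain string; `findFormattedNumbers` is a hypothetical helper name, not something from a library.

```javascript
// Hypothetical helper: find numbers formatted like \d\d,\d\d\d and report where they are.
function findFormattedNumbers(text) {
  // \b keeps a match from starting or ending inside a longer run of digits.
  const pattern = /\b\d{2},\d{3}\b/g;
  const hits = [];
  for (const m of text.matchAll(pattern)) {
    // m.index is the offset of the match in the input string.
    hits.push({ value: m[0], index: m.index });
  }
  return hits;
}
```

The returned offsets refer to the extracted text, not to byte positions in the PDF file, so they cannot be used directly as xref offsets.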

    I thought I could take advantage of the incremental update feature of the PDF format to add these highlights, but to do that I need to be able to read the content of the file and keep the references in the PDF's xref table correct.

    I read the file using an AJAX call, then load the responseText into a string so I can search, update and manipulate the text.

    The problem is that some of the objects are compressed into streams using /Filter/FlateDecode, which makes the data in those streams unreadable and the references in the string I use to manipulate the PDF incorrect.
    I need to inflate the compressed streams to get plain text I can work with.

    I tried to use zLib.js to inflate the encoded section, with no success. I also tried converting it to different encodings, but that did not work either.
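One likely cause of the failure is that reading binary data through responseText decodes it as text and can alter the compressed bytes; requesting the file with `responseType = 'arraybuffer'` avoids that. Below is a rough sketch of the inflate step itself, written for Node.js with its built-in `zlib` module (in a browser, a library such as pako could play the same role). `inflateFlateStreams` is a hypothetical helper, and it assumes each stream has a direct integer /Length, a single /FlateDecode filter, and no predictor in /DecodeParms.

```javascript
const zlib = require('node:zlib');

// Hypothetical helper: inflate every /FlateDecode stream in a PDF held in a Buffer.
// Assumes /Length is a direct integer (not "n 0 R") and no /DecodeParms predictor.
function inflateFlateStreams(pdf) {
  // latin1 maps each byte to exactly one character, so string indexes are byte offsets.
  const text = pdf.toString('latin1');
  const streamKeyword = />>\s*stream\r?\n/g;
  const results = [];
  let m;
  while ((m = streamKeyword.exec(text)) !== null) {
    // The stream dictionary sits between the "obj" keyword and the "stream" keyword.
    const dictStart = Math.max(0, text.lastIndexOf('obj', m.index));
    const dict = text.slice(dictStart, m.index);
    const length = /\/Length\s+(\d+)\b(?!\s+\d+\s+R)/.exec(dict);
    if (!length || !/\/Filter\s*\[?\s*\/FlateDecode/.test(dict)) continue;
    const start = m.index + m[0].length;
    const end = start + Number(length[1]);
    // Slice the raw bytes, never the decoded string, before inflating.
    results.push(zlib.inflateSync(pdf.subarray(start, end)));
    // Skip the binary payload so it is not scanned for keywords.
    streamKeyword.lastIndex = end;
  }
  return results;
}
```

Using /Length rather than searching for `endstream` avoids cutting the data short if those bytes happen to occur inside the compressed payload.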

    Does anyone have a code sample, or can anyone point me to a resource that shows how to inflate an encoded PDF stream using JavaScript?

    Or maybe there is a library that can already do what I need?

    Thanks

    Erez

  2. #2
    Join Date
    Oct 2010
    Location
    Versailles, France
    Posts
    1,264
    I use PHP and Xpdf to read the content of PDF files with this function:

    Code:
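    function pdfTxt($n){
        // Convert the PDF at path $n to UTF-8 text with Xpdf's pdftotext; escape the path for the shell.
        shell_exec('pdftotext -enc UTF-8 '.escapeshellarg($n).' pdf.txt');
        $c=file_get_contents('pdf.txt');
        // Replace a run of blank and dot-only lines specific to these files with a single space.
        return preg_replace("@\x0D\x0A\x0D\x0A\.\x0D\x0A\x0D\x0A\.\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A@"," ",$c);
    }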
    $n is the path and name of the PDF file to read.
    The preg_replace call removes some blank lines that are specific to these particular files.

    These updates are made on a local server, which reads the PDF files in order to publish a succinct book of the French administration (click on the date of nomination to access the PDF files).

  3. #3
    Join Date
    Jan 2004
    Posts
    13
    Hi,

    Thanks for the reply.

    I used your example to create a function that converts my PDF to a text file and extracts the sections I need to manipulate.

    Do you also add sections to your PDF file and reconstruct a new PDF containing the new or updated data, or do you only use the converted file to read the data?

    Do you have an example showing how to add an incremental update to the PDF file so that the new sections and data show up in the updated PDF?
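For reference, an incremental update appends the new or replaced objects after the original bytes, followed by a new xref section and a trailer whose /Prev entry points at the previous xref. The sketch below (Node.js) shows that layout for a single object. `appendIncrementalUpdate` is a hypothetical helper; it assumes the original file uses a classic xref table and trailer rather than an xref stream, and it does not carry over optional trailer entries such as /Info or /ID.

```javascript
// Hypothetical helper: append one new or replaced object as an incremental update.
// Assumes a classic xref table and trailer (not an xref stream) in the original file.
function appendIncrementalUpdate(pdf, objNum, objBody) {
  // latin1 maps each byte to one character, so string indexes are byte offsets.
  const text = pdf.toString('latin1');
  const prevXref = /startxref\s+(\d+)\s+%%EOF\s*$/.exec(text);
  const trailer = text.slice(Math.max(0, text.lastIndexOf('trailer')));
  const root = /\/Root\s+(\d+\s+\d+\s+R)/.exec(trailer);
  const size = /\/Size\s+(\d+)/.exec(trailer);
  if (!prevXref || !root || !size) throw new Error('unsupported PDF layout');

  // Make sure the appended section starts on its own line.
  const sep = text.endsWith('\n') ? '' : '\n';
  const objOffset = pdf.length + sep.length;
  const obj = `${objNum} 0 obj\n${objBody}\nendobj\n`;
  const xrefOffset = objOffset + Buffer.byteLength(obj, 'latin1');
  // Each xref entry is exactly 20 bytes: 10-digit offset, generation, 'n', 2-byte EOL.
  const entry = `${String(objOffset).padStart(10, '0')} 00000 n \n`;
  const newSize = Math.max(Number(size[1]), objNum + 1);
  const tail =
    `xref\n${objNum} 1\n${entry}` +
    `trailer\n<< /Size ${newSize} /Root ${root[1]} /Prev ${prevXref[1]} >>\n` +
    `startxref\n${xrefOffset}\n%%EOF\n`;
  // The original bytes are left untouched; everything new is appended.
  return Buffer.concat([pdf, Buffer.from(sep + obj + tail, 'latin1')]);
}
```

Because the original bytes are never modified, the existing xref offsets stay valid, which is why the approach needs byte offsets from the raw file rather than offsets from a decoded string.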

    Erez

  4. #4
    Join Date
    Oct 2010
    Location
    Versailles, France
    Posts
    1,264
    I just read the data, display them in a form (to correct any errors) and store them in a text file structured as follows:
    Code:
    |M.|Thierry BONNET|sous-préfet hors classe, sous-préfet de Provins|secrétaire général de la préfecture de la Guyane (classe fonctionnelle III)|20 juillet 2013|joe_20130720_0042.pdf
    |M.|Alain VALLET|ingénieur général des mines|directeur régional et interdépartemental (groupe I) de l’environnement et de l’énergie de la région Ile-de-France à compter du 1er septembre 2013|19 juillet 2013|joe_20130719_0079.pdf
    |Mme|Nathalie MARTHIEN|administratrice civile hors classe|préfète de l’Ariège|19 juillet 2013|joe_20130719_0074.pdf
    |M.|Salvador PEREZ|préfet de l’Ariège|préfet de la Charente|19 juillet 2013|joe_20130719_0073.pdf
    |Mme|Danièle POLVE-MONTMASSON|préfète de la Charente|préfète de la Manche|19 juillet 2013|joe_20130719_0072.pdf
    |M.|Pierre SIMUNEK|administrateur civil hors classe|secrétaire général des îles Wallis-et-Futuna|17 juillet 2013|joe_20130717_0081.pdf
    |Mme|Catherine WALTERSKI|administratrice civile|sous-préfète, secrétaire générale de la préfecture de Saint-Pierre-et-Miquelon|17 juillet 2013|joe_20130717_0078.pdf
    These data are enough to build the book, which is organized by function and location.
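A record in that layout can be split back into fields along these lines. This is only a sketch: `parseRecord` and the field names are guesses based on the sample lines, not names used in the actual application.

```javascript
// Hypothetical parser for one record line; the field names are guesses.
function parseRecord(line) {
  // Each line starts with '|', so the first split element is empty and is skipped.
  const [, title, name, previousPost, newPost, date, file] = line.split('|');
  return { title, name, previousPost, newPost, date, file };
}
```

This relies on no field containing a `|` character, which holds for the sample lines shown.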
    Last edited by 007Julien; 07-22-2013 at 05:15 PM.
