www.webdeveloper.com

Thread: help inflating PDF streams

  1. #1
    Join Date
    Jan 2004
    Posts
    13

    help inflating PDF streams

    Hi,

    I have a PDF file I need to manipulate inside a Sencha web app.

    I need to load the file, search for specific patterns in it (for example, numbers formatted like \d\d,\d\d\d), highlight the different text patterns in different colors, and add some new text and some JavaScript functions to it.
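
    A minimal sketch of the pattern search described above, assuming the text has already been extracted into a plain JavaScript string. The helper name `findGroupedNumbers` is hypothetical, and the pattern only covers the `\d\d,\d\d\d` example. Inside a raw PDF content stream a number may be split across several text operators, so this would only apply to decoded text.

```javascript
// Hypothetical helper: find numbers shaped like 12,345 and report where they start.
function findGroupedNumbers(text) {
  // Two digits, a comma, three digits, with word boundaries so that
  // longer digit runs such as 112,345 or 12,3456 are not matched.
  const pattern = /\b\d{2},\d{3}\b/g;
  const hits = [];
  for (const match of text.matchAll(pattern)) {
    hits.push({ value: match[0], index: match.index });
  }
  return hits;
}

const hits = findGroupedNumbers("Total 12,345 and 67,890 units");
console.log(hits);
```

    The returned indexes refer to positions in the extracted string, not to byte offsets in the PDF file.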

    I thought I would take advantage of the incremental update feature of the PDF format to add these highlights, but to do that I need to be able to read the content of the file and keep the references in the PDF's xref table correct.

    I read the file using an AJAX call, then load the responseText into a string so I can search, update and manipulate the text.
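
    One thing that may be worth checking with this approach: `responseText` is decoded as text, and bytes that are not valid in the assumed charset can be replaced, so binary stream data may not survive the round trip. The sketch below illustrates the effect with Node's `Buffer` (an assumption made only to keep the demo runnable; the sample bytes are made up). `loadPdfBytes` is a hypothetical browser-side helper that requests an `ArrayBuffer` instead of text.

```javascript
// Hypothetical browser-side helper (defined here, not called in this demo):
// ask XMLHttpRequest for raw bytes rather than a decoded string.
function loadPdfBytes(url, onLoad) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.responseType = "arraybuffer";
  xhr.onload = () => onLoad(new Uint8Array(xhr.response));
  xhr.send();
}

// Made-up bytes resembling the start of a compressed stream.
const original = Buffer.from([0x78, 0x9c, 0xed, 0xbd, 0x07, 0x60]);

// Decoding as UTF-8 and encoding again: invalid sequences become U+FFFD,
// so the bytes that come back differ from the ones that went in.
const viaUtf8 = Buffer.from(original.toString("utf8"), "utf8");

// latin1 maps every byte to exactly one character, so it round-trips.
const viaLatin1 = Buffer.from(original.toString("latin1"), "latin1");

const utf8RoundTrips = original.equals(viaUtf8);
const latin1RoundTrips = original.equals(viaLatin1);
console.log({ utf8RoundTrips, latin1RoundTrips });
```

    If the bytes are already damaged at this stage, no inflater will be able to decode the stream, and any byte offsets computed from the string will also be off.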

    The problem is that some of the objects are compressed into streams using /Filter/FlateDecode, which makes the data in those streams unreadable and makes the references (byte offsets) in the string I use to manipulate the PDF incorrect.
    I need to inflate the compressed streams to get a simple text file I can work with.
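
    For reference, a /FlateDecode stream holds zlib-format data, so inflating it comes down to cutting out exactly the bytes between `stream` and `endstream` and passing them to a zlib inflater. A minimal sketch using Node's built-in `zlib` module follows. The helper name `inflateFirstStream` and the tiny hand-built object are assumptions for illustration, and the helper only handles a direct `/Length` value with no `/DecodeParms` predictor.

```javascript
const zlib = require("zlib");

// Hypothetical helper: inflate the first stream found in a buffer of PDF bytes.
function inflateFirstStream(pdfBytes) {
  const keyword = pdfBytes.indexOf("stream");
  if (keyword < 0) throw new Error("no stream keyword found");

  // Read /Length from the text before the keyword. An indirect
  // reference such as "/Length 12 0 R" is deliberately not matched.
  const dict = pdfBytes.subarray(0, keyword).toString("latin1");
  const found = /\/Length\s+(\d+)\s*(?:\/|>>)/.exec(dict);
  if (!found) throw new Error("direct /Length not found");
  const length = Number(found[1]);

  // The stream keyword is followed by CRLF or LF before the data starts.
  let dataStart = keyword + "stream".length;
  if (pdfBytes[dataStart] === 0x0d) dataStart += 1;
  if (pdfBytes[dataStart] === 0x0a) dataStart += 1;

  return zlib.inflateSync(pdfBytes.subarray(dataStart, dataStart + length));
}

// Demo: build a made-up stream object, then inflate it again.
const content = Buffer.from("BT /F1 12 Tf (12,345) Tj ET", "latin1");
const deflated = zlib.deflateSync(content);
const pdfObject = Buffer.concat([
  Buffer.from(`4 0 obj\n<< /Length ${deflated.length} /Filter /FlateDecode >>\nstream\n`, "latin1"),
  deflated,
  Buffer.from("\nendstream\nendobj\n", "latin1"),
]);

const inflatedText = inflateFirstStream(pdfObject).toString("latin1");
console.log(inflatedText);
```

    In a browser the same slice could be handed to a zlib port that works on a `Uint8Array`. The key point is that the input has to be the raw bytes, not a decoded string.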

    I tried to use zLib.js to inflate the encoded section, with no success. I also tried converting it to different encodings, but that did not work either.

    Does anyone have a code sample, or can anyone direct me to a resource that shows how to inflate an encoded PDF stream using JavaScript?

    Maybe there is a library that can already do what I need?

    Thanks

    Erez

  2. #2
    Join Date
    Oct 2010
    Location
    Versailles, France
    Posts
    1,290
    I use PHP and Xpdf to read the content of PDF files with this function:

    Code:
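    function pdfTxt($n){
        // Extract the text with Xpdf's pdftotext; escapeshellarg() protects against spaces and shell metacharacters in the path.
        shell_exec('pdftotext -enc UTF-8 '.escapeshellarg($n).' pdf.txt');
        $c=file_get_contents('pdf.txt');
        // Replace a recurring run of CRLF-separated filler lines (specific to these files) with a single space.
        return preg_replace("@\x0D\x0A\x0D\x0A\.\x0D\x0A\x0D\x0A\.\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A[^\x0D]+\x0D\x0A\x0D\x0A@"," ",$c);
    }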
    $n is the path and name of the PDF file to read.
    The preg_replace call removes some blank lines that are specific to these particular files.

    These updates are made on a local server to read the PDF files and to publish a succinct book of the French administration (click on the date of nomination to access the PDF files).

  3. #3
    Join Date
    Jan 2004
    Posts
    13
    Hi,

    thanks for the reply.

    I used your example to create a function that converts my PDF to a text file and extracts the sections I need to manipulate.

    Do you also add sections to your PDF file and reconstruct a new PDF containing the new or updated data, or are you just using the converted file to read the data?

    Do you have an example showing how to add an incremental update to that PDF file so the new sections and data will show in the updated PDF?
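
    For reference, an incremental update appends the new or replaced objects after the original bytes, followed by a cross-reference section listing only those objects and a trailer whose /Prev entry points at the previous cross-reference section. A minimal sketch, assuming the original file uses a classic xref table (not a cross-reference stream) and that the previous startxref value, the /Root reference and the new /Size have already been read from it. `appendIncrementalUpdate` and every value in the demo are hypothetical placeholders.

```javascript
// Hypothetical helper: append one incremental-update section to a PDF buffer.
// `objects` is a list of { num, body }, where body is the text placed
// between "obj" and "endobj". Generation 0 is assumed for every object.
function appendIncrementalUpdate(original, objects, prevXrefOffset, rootRef, size) {
  const parts = [];
  let offset = original.length;
  const push = (text) => {
    const chunk = Buffer.from(text, "latin1");
    parts.push(chunk);
    offset += chunk.length;
  };

  // Make sure the appended section starts on a fresh line.
  if (original[original.length - 1] !== 0x0a) push("\n");

  // Write the objects and remember the byte offset of each one.
  const entries = [];
  for (const { num, body } of objects) {
    entries.push({ num, offset });
    push(`${num} 0 obj\n${body}\nendobj\n`);
  }

  // One xref subsection per object; each entry is exactly 20 bytes long.
  const xrefOffset = offset;
  push("xref\n");
  for (const entry of entries) {
    push(`${entry.num} 1\n${String(entry.offset).padStart(10, "0")} 00000 n \n`);
  }
  push(`trailer\n<< /Size ${size} /Root ${rootRef} /Prev ${prevXrefOffset} >>\n`);
  push(`startxref\n${xrefOffset}\n%%EOF\n`);

  return Buffer.concat([original, ...parts]);
}

// Demo with a stand-in for the original file; 9 is a placeholder /Prev offset
// and the annotation dictionary is intentionally incomplete.
const original = Buffer.from("%PDF-1.4\n% existing objects, xref table and trailer elided\n%%EOF\n", "latin1");
const updated = appendIncrementalUpdate(
  original,
  [{ num: 7, body: "<< /Type /Annot /Subtype /Highlight >>" }],
  9,
  "1 0 R",
  8
);
console.log(updated.toString("latin1"));
```

    For a highlight to actually show up, the page object would also need a new version in the same update whose /Annots array references the new annotation object.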

    Erez

  4. #4
    Join Date
    Oct 2010
    Location
    Versailles, France
    Posts
    1,290
    I just read the data, display it in a form (to correct any errors) and store it in a text file structured as follows:
    Code:
    |M.|Thierry BONNET|sous-préfet hors classe, sous-préfet de Provins|secrétaire général de la préfecture de la Guyane (classe fonctionnelle III)|20 juillet 2013|joe_20130720_0042.pdf
    |M.|Alain VALLET|ingénieur général des mines|directeur régional et interdépartemental (groupe I) de l’environnement et de l’énergie de la région Ile-de-France à compter du 1er septembre 2013|19 juillet 2013|joe_20130719_0079.pdf
    |Mme|Nathalie MARTHIEN|administratrice civile hors classe|préfète de l’Ariège|19 juillet 2013|joe_20130719_0074.pdf
    |M.|Salvador PEREZ|préfet de l’Ariège|préfet de la Charente|19 juillet 2013|joe_20130719_0073.pdf
    |Mme|Danièle POLVE-MONTMASSON|préfète de la Charente|préfète de la Manche|19 juillet 2013|joe_20130719_0072.pdf
    |M.|Pierre SIMUNEK|administrateur civil hors classe|secrétaire général des îles Wallis-et-Futuna|17 juillet 2013|joe_20130717_0081.pdf
    |Mme|Catherine WALTERSKI|administratrice civile|sous-préfète, secrétaire générale de la préfecture de Saint-Pierre-et-Miquelon|17 juillet 2013|joe_20130717_0078.pdf
    This data is enough to build the book, working by function and location.
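
    A minimal sketch of reading one such line back into named fields. The field names (`title`, `name`, `formerPost`, `newPost`, `date`, `file`) are guesses based on the sample rows, not names given in this thread.

```javascript
// Hypothetical helper: split one pipe-delimited record into named fields.
function parseRecord(line) {
  // The leading "|" produces an empty first element, which is skipped.
  const [, title, name, formerPost, newPost, date, file] = line.split("|");
  return { title, name, formerPost, newPost, date, file };
}

const sample = "|M.|Salvador PEREZ|préfet de l’Ariège|préfet de la Charente|19 juillet 2013|joe_20130719_0073.pdf";
const record = parseRecord(sample);
console.log(record);
```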
    Last edited by 007Julien; 07-22-2013 at 05:15 PM.

  5. #5
    Join Date
    Sep 2015
    Posts
    22
    Quote Originally Posted by ehboym
    Hi,

    thanks for the reply.

    I used your example to create a function that converts my PDF to a text file and extracts the sections I need to manipulate.

    Do you also add sections to your PDF file and reconstruct a new PDF containing the new or updated data, or are you just using the converted file to read the data?

    Do you have an example showing how to add an incremental update to that PDF file so the new sections and data will show in the updated PDF?

    Erez
    Hi, Erez.
    Have you tried the code mentioned above? How did it go?



    Best regards,
    Pan

