Pulling attachments from Outlook.msg


BrainDonor
11-16-2011, 02:05 PM
Let me start off by saying that I know nearly nothing about Perl.

I found a Perl script that parses an Outlook .msg e-mail and converts it to MIME, which works well for further parsing and database use with PHP. The e-mails are not on the Exchange server; I am copying them from my Outlook client to a directory on our server. From there I run the conversion script.

The script ( http://wiki.sabayon.org/index.php?title=HOWTO:_Read_Microsoft_Outlook_.MSG_files_in_Linux ) is working well and I am on my way. However, I have one stumbling block: I need to be able to yank any and all attachments from these e-mails and save them to a directory on our server. Attached is the Message.pm script that seems to do the bulk of the work in this conversion. Can someone give me a push as to how I might accomplish this? Many, many thanks in advance!

Tom

Sixtease
11-17-2011, 02:16 AM
So, if you do to_email_mime, are the attachments there in the output as MIME parts or are they completely missing?

If they are missing, my tip would be to try contacting the module author. Handling attachments would be a good feature to have in the module. You can report a bug for the module here: https://rt.cpan.org/Public/Dist/Display.html?Name=Email-Outlook-Message
and you can write an email to the author (his email is in the source file you included).

However, if the attachments are in the output MIME file, then of course, don't bother the author. Instead, post an example output file here and let's see how to get the attachments out.

BrainDonor
11-17-2011, 10:41 AM
It looks like they are there. I have a test message with two attachments. I see this in the mime output file. Each of these is followed by several lines of unreadable characters.

Attachment 1

--1321467930.cBC1B241.16468
Content-ID: <1321467930.1adEEBc03.16468@flserv>
Content-Type: application/octet-stream; name="test.xls"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.xls"


Attachment 2

--1321467930.cBC1B241.16468
Content-ID:
Content-Type: application/octet-stream; name="test2.xls"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test2.xls"


Thanks for your help!
Tom
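
The "unreadable characters" after each header block are most likely the attachment data in base64, as the Content-Transfer-Encoding header indicates. A minimal sketch of the round trip, using the core MIME::Base64 module; the sample bytes are made up for illustration and do not come from the message above:

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Made-up binary payload standing in for an attachment's raw bytes.
my $raw = join '', map { chr } 0 .. 255;

# Roughly what a part with "Content-Transfer-Encoding: base64" carries:
# plain ASCII text, wrapped into lines.
my $encoded = encode_base64($raw);

# Decoding restores the original bytes.
my $decoded = decode_base64($encoded);

print "round trip ok\n" if $decoded eq $raw;
```

So the data in the MIME file is not expected to look like the original file until it has been decoded.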

Sixtease
11-18-2011, 11:28 AM
Then it looks like Email::MIME (http://search.cpan.org/perldoc?Email::MIME) is what you need. If I understand the documentation correctly, then you should do something like:
use Email::MIME;
my $mime = Email::MIME->new($message);
my @parts = $mime->subparts;
PART:
for my $part (@parts) {
    my $filename = $part->filename(1);
    next PART if not $filename;
    open my $filehandle, '>', $filename or do {
        warn ("Could not open $filename for writing, skipping");
        next PART;
    }
    print {$filehandle} $part->body;
    close $filehandle;
}
This assumes that you have Email::MIME installed and that the email in question is in the $message variable. Also, this code is totally untested; it's just a lead to get you started.
Hope this helps.

BrainDonor
11-18-2011, 12:33 PM
Thanks for the reply. I tried the code you posted and received an error. Not sure what to do here. Thanks again for your help...and patience. :)

Scalar found where operator expected at /test/snag_attachments.pl line 13, near "} $part" (Missing operator before $part?)
syntax error at /test/snag_attachments.pl line 13, near "print"
syntax error at /test/snag_attachments.pl line 15, near "}"

Sixtease
11-19-2011, 04:32 AM
Which version of Perl are you using? It looks like it has problems with the line
print {$filehandle} $part->body
which should be just fine in modern perls though...

BrainDonor
11-19-2011, 07:39 AM
We are essentially WAMP...where the M is MSSQL. The Perl we're using is part of the MKS Toolkit. perl -v provides this: "This is perl, v5.8.5 built for MSWin32-x86-multi-thread"

Sixtease
11-20-2011, 03:36 AM
Yes, of course I'm stupid. There's a semicolon missing after the do block. :-)

use Email::MIME;
my $mime = Email::MIME->new($message);
my @parts = $mime->subparts;
PART:
for my $part (@parts) {
    my $filename = $part->filename(1);
    next PART if not $filename;
    open my $filehandle, '>', $filename or do {
        warn ("Could not open $filename for writing, skipping");
        next PART;
    };
    print {$filehandle} $part->body;
    close $filehandle;
}

BrainDonor
11-20-2011, 06:50 PM
Thanks! Seems to be working now...unless I'm doing something wrong. To be certain, how do I pass the path to the file I want to extract from into the script?

like this?

perl -w /path/to/perl_script.pl /path/to/file.mime

?

Sixtease
11-21-2011, 05:09 AM
Yes, in that case, the filename is accessible in the @ARGV array. The complete script could then look like this:
#!/usr/bin/perl
use strict;
use warnings;
use Email::MIME;

# The path to the MIME file is the first command-line argument.
my $message_filename = shift @ARGV;
open my $message_filehandle, '<', $message_filename
    or die "Couldn't open file '$message_filename': $!";
# Read the whole file in one go by undefining the record separator locally.
my $message = do {
    local $/;
    <$message_filehandle>
};
close $message_filehandle;

my $mime = Email::MIME->new($message);
my @parts = $mime->subparts;
PART:
for my $part (@parts) {
    my $filename = $part->filename(1);
    next PART if not $filename;
    # Each part is written to the current directory under its own filename.
    open my $filehandle, '>', $filename or do {
        warn ("Could not open $filename for writing, skipping: $!");
        next PART;
    };
    print {$filehandle} $part->body;
    close $filehandle;
}
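
The script above writes into the current directory. Since the goal stated at the top of the thread is to save attachments to a particular directory, here is a rough sketch of building the output path. The helper name attachment_path and the directory name are hypothetical; the helper also drops any directory components from the name found in the email, so a crafted filename cannot point outside the target directory.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build a path inside $dir for an attachment name
# taken from an email, discarding any directory components in the name.
sub attachment_path {
    my ($dir, $name) = @_;
    $name =~ s{.*[/\\]}{}s;    # keep only what follows the last / or \
    # Fall back to a placeholder name if nothing usable is left.
    $name = 'attachment.bin' if $name eq '' or $name =~ /^\.+$/;
    return File::Spec->catfile($dir, $name);
}

print attachment_path('attachments', 'test.xls'), "\n";
print attachment_path('attachments', '../../etc/passwd'), "\n";
```

In the loop, the open call would then take attachment_path($output_dir, $filename) instead of $filename, where $output_dir is a variable you would define yourself.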

BrainDonor
11-21-2011, 07:33 AM
Thanks!...I'll try that. :)

BrainDonor
11-21-2011, 09:04 AM
The script extracts the files, thanks! However, when I went to open one of the extracted files, "test.xls", Excel said that it was corrupted. Excel tried to repair it, but couldn't. Now I wonder if the original script that converted the msg to MIME isn't maintaining the integrity of the attachments?
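
One possible cause worth ruling out, assuming the script runs under Perl on Windows as described earlier in the thread: file handles there default to text mode, in which every "\n" byte written becomes "\r\n". That would damage a binary file such as an .xls. Calling binmode on the handle before printing turns the translation off. A minimal sketch, with a made-up payload and a temporary file standing in for a real attachment:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up binary payload containing newline bytes.
my $payload = "PK\x03\x04\n\x00\xff\n\r\nend";

my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                 # write bytes untouched, no newline translation
print {$out} $payload;
close $out or die "Could not close '$path': $!";

open my $in, '<', $path or die "Could not open '$path': $!";
binmode $in;                  # read bytes untouched as well
my $read_back = do { local $/; <$in> };
close $in;

print "bytes written: ", length($payload), ", bytes on disk: ", (-s $path), "\n";
```

In the extraction script this would mean adding a binmode $filehandle; line right after the output file is opened. Whether that is the actual cause here is not confirmed in the thread.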

Sixtease
11-21-2011, 09:40 AM
Hmm, you should try it on a human-readable file, like a text attachment, so you can check yourself whether the files stay unchanged or if perhaps there's a systematic change...
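
For binary attachments, a byte-for-byte comparison against the original file can serve the same purpose. A small sketch using the core File::Compare module; the temporary files are stand-ins for the original and the extracted attachment, and the write_temp helper is only there to set up the example:

```perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempfile);

# Write $bytes to a fresh temporary file in binary mode and return its path.
sub write_temp {
    my ($bytes) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $bytes;
    close $fh or die "Could not close '$path': $!";
    return $path;
}

my $original  = write_temp("line one\nline two\n");
my $identical = write_temp("line one\nline two\n");
my $altered   = write_temp("line one\r\nline two\r\n");

# compare() returns 0 for equal files, 1 for different files, -1 on error.
print "identical copy: ", compare($original, $identical), "\n";
print "altered copy:   ", compare($original, $altered), "\n";
```

If the extracted file differs, comparing the two file sizes is a quick hint: a size increase matching the number of newline bytes in the original would point at newline translation.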

BrainDonor
11-21-2011, 10:23 PM
Will do.