@PACKAGE@ Utilities - Version @VERSION@
Packages
The various source and binary packages are available at http://www.five-ten-sg.com/@PACKAGE@/packages/.
The most recent documentation is available at http://www.five-ten-sg.com/@PACKAGE@/.
The most recent developer documentation for the shared library is available at http://www.five-ten-sg.com/@PACKAGE@/devel/.
A Mercurial source code repository for this project is available at http://hg.five-ten-sg.com/@PACKAGE@/.
This version can now convert both 32 bit Outlook files (pre 2003), and the
64 bit Outlook 2003 pst files. Utilities are supplied to convert email messages
to both mbox and MH mailbox formats, and to DII load file format for use with
many of the CT Summation products.
Contacts can be converted to a simple list, to vcard format, or to ldif format
for import to an LDAP server.
The libpff project
has some excellent documentation of the pst file format.
readpst(1), readpst @VERSION@, 2016-08-29
readpst - convert PST (MS Outlook Personal Folders) files to mbox and other formats
Synopsis
readpst pstfile
Description
readpst is a program that can read an Outlook
PST (Personal Folders) file and convert it into an mbox file, a format
suitable for KMail, a recursive mbox structure, or separate emails.
Options
-C default-charset
Set the character set to be used for items with an unspecified character set.
-D
Include deleted items in the output.
-M
Output messages in MH (rfc822) format as separate files. This will create
folders as named in the PST file, and will put each email together with
any attachments into its own file. These files will be numbered from 1
to n with no leading zeros. This format has no from quoting.
-S
Output messages into separate files. This will create folders as
named in the PST file, and will put each email in its own file. These
files will be numbered from 1 to n with no leading zeros. Attachments
will also be saved in the same folder as the email message. The
attachments for message $m are saved as $m-$name, where $name is the
original name of the attachment, or 'attach$n' if the attachment had
no name, and $n is another sequential index with no leading zeros.
This format has no from quoting.
-V
Show program version and exit.
-a attachment-extension-list
Set the list of acceptable attachment extensions. Any attachment that
does not have an extension on this list will be discarded. All attachments
are acceptable if the list is empty, or this option is not specified.
-b
Do not save the attachments for the RTF format of the email body.
-c format
Set the Contact output mode. Use -cv for vcard format or -cl for an email list.
-d debug-file
Specify name of debug log file. The log file is now an ascii file,
instead of the binary file used in previous versions.
-e
Same as the -M option, but each output file will include an extension
chosen from .eml, .ics, or .vcf. This format has no from quoting.
-h
Show summary of options and exit.
-j jobs
Specifies the maximum number of parallel jobs. Specify 0 to suppress
running parallel jobs. Folders may be processed in parallel. Output
formats that place each mail message in a separate file (-M, -S, -e)
may process the contents of individual folders in parallel.
-k
Changes the output format to KMail. This format uses mboxrd from quoting.
-m
Same as the -e option, but also write .msg files.
-o output-directory
Specifies the output directory. The directory must already exist, and
is entered after the PST file is opened, but before any processing of
files commences.
-q
Changes to silent mode. No feedback is printed to the screen, except
for error messages.
-r
Changes the output format to Recursive. This will create folders
as named in the PST file, and will put all emails in a file called
"mbox" inside each folder. Appointments go into a file called
"calendar", address book entries go into a file called "contacts",
and journal entries go into a file called "journal". These files
are then compatible with all mbox-compatible email clients. This
format uses mboxrd from quoting.
-t output-type-codes
Specifies the item types that are processed. The argument is a sequence
of single letters from (e,a,j,c) for (email, appointment, journal, contact)
types. The default is to process all item types.
-u
Sets Thunderbird mode, a submode of recursive mode. This causes
two extra .type and .size meta files to be created. This format uses
mboxrd from quoting.
-w
Overwrite any previous output files. Beware: When used with the -S
switch, this will remove all files from the target folder before
writing. This is to keep the count of emails and attachments correct.
-8
Output bodies in UTF-8, rather than original encoding, if a UTF-8
version is available.
From Quoting
Output formats that place each mail message in a separate file (-M, -S, -e, -m)
don't do any from quoting.
Output formats that place multiple email messages in a single file (-k, -r, -u)
now use mboxrd from quoting rules.
If none of those switches are specified, the default output format uses mboxrd
from quoting rules, since it produces multiple email messages in a single file.
Earlier versions used mboxo from quoting rules for all output formats.
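The mboxrd rule can be sketched in a few lines of Python. This illustrates the general mboxrd convention and is not code taken from readpst; the helper name mboxrd_quote is hypothetical. Under mboxrd, any body line made of zero or more '>' characters followed by "From " gets one additional '>' prepended, which keeps the quoting reversible.

```python
import re

# Matches a line that starts with zero or more '>' followed by "From ".
_FROM_LINE = re.compile(r"^>*From ")

def mboxrd_quote(body: str) -> str:
    """Apply mboxrd From quoting to a message body (hypothetical helper)."""
    quoted = []
    for line in body.split("\n"):
        if _FROM_LINE.match(line):
            # Prepend exactly one '>' so the original can be recovered later.
            line = ">" + line
        quoted.append(line)
    return "\n".join(quoted)
```

By contrast, the older mboxo rule quotes only lines that begin with "From " exactly, so after quoting, a line that originally began with ">From " can no longer be told apart from a quoted one.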
Author
This manual page was originally written by Dave Smith
<dave.s@earthcorp.com>, and updated by Joe Nahmias <joe@nahmias.net>
for the Debian GNU/Linux system (but may be used by others). It was
subsequently updated by Brad Hards <bradh@frogmouth.net>, and converted to
xml format by Carl Byington <carl@five-ten-sg.com>.
Copyright
Copyright (C) 2002 by David Smith <dave.s@earthcorp.com>.
XML version Copyright (C) 2008 by 510 Software Group <carl@five-ten-sg.com>.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
You should have received a copy of the GNU General Public License along
with this program; see the file COPYING. If not, please write to the
Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Version
@VERSION@
lspst(1), lspst @VERSION@, 2016-08-29
lspst - list PST (MS Outlook Personal Folders) file data
Synopsis
lspst pstfile
Options
-V
Show program version and exit.
-d debug-file
Specify name of debug log file. The log file is now an ascii file,
instead of the binary file used in previous versions.
-h
Show summary of options and exit.
Description
lspst is a program that can read an Outlook
PST (Personal Folders) file and produce a simple listing of the
data (contacts, email subjects, etc).
Author
lspst was written by Joe Nahmias <joe@nahmias.net> based on readpst.
This man page was written by 510 Software Group <carl@five-ten-sg.com>.
Copyright
Copyright (C) 2004 by Joe Nahmias <joe@nahmias.net>.
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
You should have received a copy of the GNU General Public License along
with this program; see the file COPYING. If not, please write to the
Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Version
@VERSION@
pst2ldif(1), pst2ldif @VERSION@, 2016-08-29
pst2ldif - extract contacts from a MS Outlook .pst file in .ldif format
Synopsis
pst2ldif pstfilename
Options
-V
Show program version. Subsequent options are then ignored.
-b ldap-base
Sets the ldap base value used in the dn records. You probably want to
use something like "o=organization, c=US".
-c class
Sets the objectClass values for the contact items. This class needs to be
defined in the schema used by your LDAP server, and at a minimum it must
contain the ldap attributes given below. This option may be specified
multiple times to generate entries with multiple object classes.
-d debug-file
Specify name of debug log file. The log file is now an ascii file,
instead of the binary file used in previous versions.
-l extra-line
Specify an extra line to be added to each ldap entry. This
option may be specified multiple times to add multiple lines
to each ldap entry.
-o
Use the old ldap schema, rather than the default new ldap schema.
The old schema generates multiple postalAddress attributes for
a single entry. The new schema generates a single postalAddress
(and homePostalAddress when available) attribute with $ delimiters
as specified in RFC4517. Using the old schema also generates two
extra leading entries, one for "dn:ldap base", and one for
"dn: cn=root, ldap base".
-h
Show summary of options. Subsequent options are then ignored.
Description
pst2ldif reads the contact information from a MS Outlook .pst file
and produces a .ldif file that may be used to import those contacts
into an LDAP database. The following ldap attributes are generated
for the old ldap schema:
cn givenName sn personalTitle company mail postalAddress l st postalCode c homePhone telephoneNumber facsimileTelephoneNumber mobile description
The following attributes are generated for the new ldap schema:
cn givenName sn title o mail postalAddress homePostalAddress l st postalCode c homePhone telephoneNumber facsimileTelephoneNumber mobile description labeledURI
Copyright
Copyright (C) 2008 by 510 Software Group <carl@five-ten-sg.com>
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
You should have received a copy of the GNU General Public License along
with this program; see the file COPYING. If not, please write to the
Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Version
@VERSION@
pst2dii(1), pst2dii @VERSION@, 2016-08-29
pst2dii - extract email messages from a MS Outlook .pst file in DII load format
Synopsis
pst2dii -f ttf-font-file pstfilename
Options
-B bates-prefix
Sets the bates prefix string. The bates sequence number is appended to
this string, and printed on each page.
-O dii-output-file
Name of the output DII load file.
-V
Show program version. Subsequent options are then ignored.
-b bates-number
Starting bates sequence number. The default is zero.
-c bates-color
Font color for the bates stamp on each page, specified as 6 hex digits
as rrggbb values. The default is ff0000 for bright red.
-d debug-file
Specify name of debug log file. The log file is now an ascii file,
instead of the binary file used in previous versions.
-f ttf-font-file
Specify name of a true type font file. This should be a fixed pitch font.
-h
Show summary of options. Subsequent options are then ignored.
-o output-directory
Specifies the output directory. The directory must already exist.
Description
pst2dii reads the email messages from a MS Outlook .pst file
and produces a DII load file that may be used to import message
summaries into a Summation DII system. The DII output file contains
references to the image and attachment files in the output directory.
Copyright
Copyright (C) 2008 by 510 Software Group <carl@five-ten-sg.com>
This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any
later version.
You should have received a copy of the GNU General Public License along
with this program; see the file COPYING. If not, please write to the
Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Version
@VERSION@
outlook.pst(5), 2016-08-29
outlook.pst - format of MS Outlook .pst file
Synopsis
outlook.pst
Overview
Low level or primitive items in a .pst file are identified by an I_ID
value. Higher level or composite items in a .pst file are identified by
a D_ID value.
There are two separate b-trees indexed by these I_ID and D_ID values.
Starting with Outlook 2003, the file format changed from one with 32
bit pointers, to one with 64 bit pointers. We describe both formats
here.
32 bit File Header
The 32 bit file header is located at offset 0 in the .pst file.
We only support index types 0x0e, 0x0f, 0x15, and 0x17, and encryption
types 0x00, 0x01 and 0x02. Index type 0x0e is the older 32 bit Outlook
format. Index type 0x0f seems to be rare, and so far the data seems
to be identical to that in type 0x0e files. Index type 0x17 is the
newer 64 bit Outlook format. Index type 0x15 seems to be rare, and
according to the libpff project should have the same format as type
0x17 files. It was found in a 64-bit pst file created by Visual
Recovery. It may be that index types less than 0x10 are 32 bit, and
index types greater than or equal to 0x10 are 64 bit, and the low order
four bits of the index type is some subtype or minor version number.
Encryption type 0x00 is no encryption, type 0x01 is
"compressible" encryption which is a simple substitution cipher, and
type 0x02 is "strong" encryption, which is a simple three rotor Enigma
cipher from WWII.
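As a rough illustration of the supported type values listed above, the following Python sketch maps an index type to a pointer width and an encryption type to a name. The function and table names are hypothetical, and guess_pointer_width encodes only the speculative "less than 0x10 means 32 bit" rule mentioned in the text, not a confirmed property of the format.

```python
# Index and encryption types listed as supported in the text above.
INDEX_TYPE_BITS = {0x0E: 32, 0x0F: 32, 0x15: 64, 0x17: 64}
ENCRYPTION_NAMES = {0x00: "none", 0x01: "compressible", 0x02: "strong"}

def classify_header(index_type: int, encryption_type: int) -> tuple:
    """Return (pointer width in bits, encryption name) or raise ValueError."""
    if index_type not in INDEX_TYPE_BITS:
        raise ValueError(f"unsupported index type 0x{index_type:02x}")
    if encryption_type not in ENCRYPTION_NAMES:
        raise ValueError(f"unsupported encryption type 0x{encryption_type:02x}")
    return INDEX_TYPE_BITS[index_type], ENCRYPTION_NAMES[encryption_type]

def guess_pointer_width(index_type: int) -> int:
    """Speculative rule from the text: index types below 0x10 are 32 bit."""
    return 32 if index_type < 0x10 else 64
```

For the four supported index types the speculative rule and the explicit table agree; the rule would only matter for index types that have not been observed yet.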
offsetIndex1 is the file offset of the root of the
index1 b-tree, which contains (I_ID, offset, size, unknown) tuples
for each item in the file. backPointer1 is the value that should
appear in the parent pointer of that root node.
offsetIndex2 is the file offset of the root of the
index2 b-tree, which contains (D_ID, DESC-I_ID, TREE-I_ID, PARENT-D_ID)
tuples for each item in the file. backPointer2 is the value that should
appear in the parent pointer of that root node.
64 bit File Header
The 64 bit file header is located at offset 0 in the .pst file.
32 bit Index 1 Node
The 32 bit index1 b-tree nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 12 byte records that
are active. The nodeLevel is non-zero for this style of nodes.
The leaf nodes have a different format. The backPointer must
match the backPointer from the triple that pointed to this node.
Each item in this node is a triple of (I_ID, backPointer, offset)
where the offset points to the next deeper node in the tree, the
backPointer value must match the backPointer in that deeper node,
and I_ID is the lowest I_ID value in the subtree.
64 bit Index 1 Node
The 64 bit index1 b-tree nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 24 byte records that
are active. The nodeLevel is non-zero for this style of nodes.
The leaf nodes have a different format. The backPointer must
match the backPointer from the triple that pointed to this node.
Each item in this node is a triple of (I_ID, backPointer, offset)
where the offset points to the next deeper node in the tree, the
backPointer value must match the backPointer in that deeper node,
and I_ID is the lowest I_ID value in the subtree.
32 bit Index 1 Leaf Node
The 32 bit index1 b-tree leaf nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 12 byte records that
are active. The nodeLevel is zero for these leaf nodes.
The backPointer must match the backPointer from the triple
that pointed to this node.
Each item in this node is a tuple of (I_ID, offset, size, unknown).
The two low order bits of the I_ID value seem to be flags. I have
never seen a case with bit zero set. Bit one indicates that the
item is not encrypted. Note that references
to these I_ID values elsewhere may have the low order bit set (and
I don't know what that means), but when we do the search in this
tree we need to clear that bit so that we can find the correct item.
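The lookup described for the index1 tree can be sketched as follows. This is a minimal sketch over an in-memory model, not the library's implementation: the nodes dictionary (file offset to a dict with "level", "back_pointer" and "items") is an assumed representation, the function name find_item is hypothetical, and the items in each interior node are assumed to be sorted by ascending I_ID.

```python
def find_item(nodes, root_offset, root_back_pointer, i_id):
    """Look up i_id in an in-memory model of the index1 b-tree.

    nodes maps a file offset to a dict with keys "level", "back_pointer"
    and "items"; this layout is an assumption made for illustration.
    Returns the leaf tuple (I_ID, offset, size, unknown) or None.
    """
    target = i_id & ~1  # clear the low order flag bit before searching
    offset, expected_back = root_offset, root_back_pointer
    while True:
        node = nodes[offset]
        if node["back_pointer"] != expected_back:
            raise ValueError("backPointer mismatch")
        if node["level"] == 0:
            # Leaf items are (I_ID, offset, size, unknown) tuples.
            for item in node["items"]:
                if item[0] == target:
                    return item
            return None
        # Interior items are (I_ID, backPointer, offset) triples, where I_ID
        # is the lowest I_ID in the subtree; keep the last one <= target.
        chosen = None
        for child_id, child_back, child_offset in node["items"]:
            if child_id <= target:
                chosen = (child_back, child_offset)
        if chosen is None:
            return None
        expected_back, offset = chosen
```

The same descent applies to the 64 bit tree; only the on-disk record sizes differ.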
64 bit Index 1 Leaf Node
The 64 bit index1 b-tree leaf nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 24 byte records that
are active. The nodeLevel is zero for these leaf nodes.
The backPointer must match the backPointer from the triple
that pointed to this node.
Each item in this node is a tuple of (I_ID, offset, size, unknown).
The two low order bits of the I_ID value seem to be flags. I have
never seen a case with bit zero set. Bit one indicates that the
item is not encrypted. Note that references
to these I_ID values elsewhere may have the low order bit set (and
I don't know what that means), but when we do the search in this
tree we need to clear that bit so that we can find the correct item.
32 bit Index 2 Node
The 32 bit index2 b-tree nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 12 byte records that
are active. The nodeLevel is non-zero for this style of nodes.
The leaf nodes have a different format. The backPointer must
match the backPointer from the triple that pointed to this node.
Each item in this node is a triple of (D_ID, backPointer, offset)
where the offset points to the next deeper node in the tree, the
backPointer value must match the backPointer in that deeper node,
and D_ID is the lowest D_ID value in the subtree.
64 bit Index 2 Node
The 64 bit index2 b-tree nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 24 byte records that
are active. The nodeLevel is non-zero for this style of nodes.
The leaf nodes have a different format. The backPointer must
match the backPointer from the triple that pointed to this node.
Each item in this node is a triple of (D_ID, backPointer, offset)
where the offset points to the next deeper node in the tree, the
backPointer value must match the backPointer in that deeper node,
and D_ID is the lowest D_ID value in the subtree.
32 bit Index 2 Leaf Node
The 32 bit index2 b-tree leaf nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 16 byte records that
are active. The nodeLevel is zero for these leaf nodes.
The backPointer must match the backPointer from the triple
that pointed to this node.
Each item in this node is a tuple of (D_ID, DESC-I_ID, TREE-I_ID,
PARENT-D_ID). The DESC-I_ID points to the main data for this item
(Associated Descriptor Items 0x7cec, 0xbcec, or 0x0101) via the index1
tree. The TREE-I_ID is zero or points to an Associated Tree Item
0x0002 via the index1 tree. The PARENT-D_ID points to the parent of
this item in this index2 tree.
64 bit Index 2 Leaf Node
The 64 bit index2 b-tree leaf nodes are 512 byte blocks with the
following format.
The itemCount specifies the number of 32 byte records that
are active. The nodeLevel is zero for these leaf nodes.
The backPointer must match the backPointer from the triple
that pointed to this node.
Each item in this node is a tuple of (D_ID, DESC-I_ID, TREE-I_ID,
PARENT-D_ID). The DESC-I_ID points to the main data for this item
(Associated Descriptor Items 0x7cec, 0xbcec, or 0x0101) via the index1
tree. The TREE-I_ID is zero or points to an Associated Tree Item
0x0002 via the index1 tree. The PARENT-D_ID points to the parent of
this item in this index2 tree.
32 bit Associated Tree Item 0x0002
A D_ID value may point to an entry in the index2 tree with a non-zero
TREE-I_ID which points to this descriptor block via the index1
tree. It maps local ID2 values (referenced in the main data for the
original D_ID item) to I_ID values. This descriptor block contains
triples of (ID2, I_ID, CHILD-I_ID) where the local ID2 data can be
found via I_ID, and CHILD-I_ID is either zero or it points to another
Associated Tree Item via the index1 tree.
In the above 32 bit leaf node, we have a tuple of (0x61, 0x02a82c,
0x02a836, 0). Here 0x02a836 is the I_ID of the associated tree, and we can
look up that I_ID value in the index1 b-tree to find the (offset, size)
of the data in the .pst file.
64 bit Associated Tree Item 0x0002
This descriptor block contains a tree that maps local ID2 values
to I_ID entries, similar to the 32 bit version described above.
Associated Descriptor Item 0xbcec
Contains information about the item, which may be email, contact, or
other Outlook types. In the above leaf node, we have a tuple of (0x21,
0x00e638, 0, 0). Here 0x00e638 is the I_ID of the associated descriptor, and we
can look up that I_ID value in the index1 b-tree to find the (offset, size)
of the data in the .pst file.
This descriptor is eventually decoded to a list of MAPI elements.
Note the signature of 0xbcec. There are other descriptor block formats
with other signatures. Note the indexOffset of 0x013c - starting at
that position in the descriptor block, we have an array of two byte
integers. The first integer (0x000b) is a (count-1) of the number of
overlapping pairs following the count. The first pair is (0, 0xc), the
next pair is (0xc, 0x14) and the last (12th) pair is (0x123, 0x13b).
These pairs are (start,end+1) offsets of items in this block. So we
have count+2 integers following the count value.
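The pair array at indexOffset can be decoded roughly as shown below. This is a sketch that follows the description above and assumes little-endian 16 bit integers; the function name read_index_pairs is hypothetical.

```python
import struct

def read_index_pairs(block: bytes, index_offset: int) -> list:
    """Return the (start, end+1) pairs stored at index_offset in block."""
    (stored,) = struct.unpack_from("<H", block, index_offset)
    # stored is (number of pairs - 1), so stored + 2 integers follow it.
    values = struct.unpack_from(f"<{stored + 2}H", block, index_offset + 2)
    # Adjacent integers overlap to form the pairs.
    return list(zip(values, values[1:]))
```

With a stored value of 0x000b this reads 13 integers and yields 12 pairs, matching the example above.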
Note the b5offset of 0x0020, which is a type that I will call an index
reference. Such index references have at least two different forms,
and may point to data either in this block, or in some other block.
External pointer references have the low order 4 bits all set, and are
ID2 values that can be used to fetch data. This value of 0x0020 is an
internal pointer reference, which needs to be right shifted by 4 bits
to become 0x0002, which is then a byte offset to be added to the above
indexOffset plus two (to skip the count), so it points to the (0xc,
0x14) pair.
So far we have only described internal index references where the high
order 16 bits are zero. That suffices for single descriptor
blocks. But in the case of the type 0x0101 descriptor block, we have
an array of subblocks. In this case, the high order 16 bits of an
internal index reference are used to select the subblock. Each
subblock starts with a 16 bit indexOffset which points to the count
and array of 16 bit integer pairs which are offsets in the current
subblock.
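The two forms of index reference can be told apart and resolved along these lines. This is a sketch based only on the rules stated above, with little-endian byte order assumed; the names decode_index_ref and resolve_internal are hypothetical.

```python
import struct

def decode_index_ref(ref: int):
    """Split an index reference into its parts, following the text above."""
    if ref & 0xF == 0xF:
        # External reference: the value is an ID2 used to fetch data elsewhere.
        return ("external", ref)
    subblock = ref >> 16               # zero for single descriptor blocks
    byte_offset = (ref & 0xFFFF) >> 4  # byte offset into the pair array
    return ("internal", subblock, byte_offset)

def resolve_internal(block: bytes, index_offset: int, byte_offset: int):
    """Return the (start, end+1) pair an internal reference points to.

    block is the selected (sub)block and index_offset is its indexOffset;
    the extra 2 skips the count that precedes the pair array.
    """
    return struct.unpack_from("<2H", block, index_offset + 2 + byte_offset)
```

For the b5offset of 0x0020 from the example, decode_index_ref gives a byte offset of 2 in subblock 0, and resolve_internal then reads the pair that starts two bytes into the array.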
Finally, we have the offset and size of the "b5" block located at offset 0xc
with a size of 8 bytes in this descriptor block. The "b5" block has the
following format:
Note the descoffset of 0x0040, which again is an index reference. In
this case, it is an internal pointer reference, which needs to be
right shifted by 4 bits to become 0x0004, which is then a byte offset
to be added to the above indexOffset plus two (to skip the count), so
it points to the (0x14, 0x7c) pair. The datasize (6) plus the b5 code
(02) gives the size of the entries, in this case 8 bytes. We now have
the offset 0x14 of the descriptor array, composed of 8 byte entries
that describe MAPI elements. Each descriptor entry has the following
format:
For some reference types (2, 3, 0xb) the value is used directly. Otherwise,
the value is an index reference, which is either an ID2 value, or an
offset, to be right shifted by 4 bits and used to fetch a pair from the
index table to find the offset and size of the item in this descriptor block.
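The handling of the 8 byte descriptor entries can be sketched as below. The field layout used here (a 16 bit item type, a 16 bit reference type, then a 32 bit value, all little-endian) is an assumption chosen to fit the 8 byte entry size and the (item type, reference type, value) triple; it is not confirmed by the text. The function name is hypothetical.

```python
import struct

DIRECT_REFERENCE_TYPES = {0x2, 0x3, 0xB}  # value is used as-is

def parse_descriptor_entries(data: bytes):
    """Yield (item_type, reference_type, value, is_direct) per 8 byte entry.

    Assumed layout, not confirmed by the text: little-endian 16 bit item
    type, 16 bit reference type, then a 32 bit value.
    """
    usable = len(data) - len(data) % 8  # ignore any trailing partial entry
    for pos in range(0, usable, 8):
        item_type, ref_type, value = struct.unpack_from("<HHI", data, pos)
        # When is_direct is False, value is an index reference (an ID2 value
        # or an internal offset) that still has to be resolved.
        yield item_type, ref_type, value, ref_type in DIRECT_REFERENCE_TYPES
```

The entry size of 8 corresponds to the datasize (6) plus the b5 code (02) from the example above.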
The following reference types are known, but not all of these
are implemented in the code yet.
The following item types are known, but not all of these
are implemented in the code yet.
Associated Descriptor Item 0x7cec
This style of descriptor block is similar to the 0xbcec format.
This descriptor is also eventually decoded to a list of MAPI elements.
Note the signature of 0x7cec. There are other descriptor block
formats with other signatures.
Note the indexOffset of 0x017a - starting at that position in the
descriptor block, we have an array of two byte integers. The first
integer (0x0006) is a (count-1) of the number of overlapping pairs
following the count. The first pair is (0, 0xc), the next pair is (0xc, 0x14)
and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1)
offsets of items in this block. So we have count+2 integers following
the count value.
Note the 7coffset of 0x0040, which is an index reference. In this case,
it is an internal reference pointer, which needs to be right shifted by 4 bits
to become 0x0004, which is then a byte offset to be added to the above
indexOffset plus two (to skip the count), so it points to the (0x14, 0xea)
pair. We have the offset and size of the "7c" block located at offset 0x14
with a size of 214 bytes in this case. The "7c" block starts with
a header with the following format:
Note the b5Offset of 0x0020, which is an index reference. In this case,
it is an internal reference pointer, which needs to be right shifted by 4 bits
to become 0x0002, which is then a byte offset to be added to the above
indexOffset plus two (to skip the count), so it points to the (0xc,
0x14) pair. Finally, we have the offset and size of the "b5" block
located at offset 0xc with a size of 8 bytes in this descriptor block.
The "b5" block has the following format:
Note the descoffset of 0x0060, which again is an index reference. In this
case, it is an internal pointer reference, which needs to be right shifted by 4
bits to become 0x0006, which is then a byte offset to be added to the
above indexOffset plus two (to skip the count), so it points to the
(0xea, 0xf0) pair. The datasize (2) plus the b5 code (04) gives the size
of the entries, in this case 6 bytes. We now have the offset 0xea of an
unused block of data in an unknown format, composed of 6 byte entries.
That gives us (0xf0 - 0xea)/6 = 1, so we have a recordCount of one.
We have seen cases where the descoffset in the b5 block is zero, and
the index2Offset in the 7c block is zero. This has been seen for
objects that seem to be attachments on messages that have been
read. Before the message was read, it did not have any attachments.
Note the index2Offset above of 0x0080, which again is an index reference. In this
case, it is an internal pointer reference, which needs to be right shifted
by 4 bits to become 0x0008, which is then a byte offset to be added to
the above indexOffset plus two (to skip the count), so it points to the
(0xf0, 0x155) pair. This is an array of tables of four byte integers.
We will call these the IND2 tables. The size of each of these tables is
specified by the recordSize field of the "7c" header. The number of
these tables is the above recordCount value derived from the "b5" block.
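The arithmetic for recordCount and the slicing of the IND2 tables can be sketched as follows. Both helper names are hypothetical, and the sketch assumes the tables are stored back to back starting at the first offset of the pair that index2Offset resolves to.

```python
def record_count_from_pair(start: int, end: int, entry_size: int) -> int:
    """Number of entries between a (start, end+1) pair, e.g. (0xea, 0xf0)."""
    return (end - start) // entry_size

def split_ind2_tables(block: bytes, start: int, record_size: int,
                      record_count: int) -> list:
    """Return record_count consecutive tables of record_size bytes each."""
    return [block[start + i * record_size: start + (i + 1) * record_size]
            for i in range(record_count)]
```

For the example above, record_count_from_pair(0xea, 0xf0, 6) reproduces the recordCount of one.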
Now the remaining data in the "7c" block after the header starts at
offset 0x2a. There should be itemCount 8 byte items here, with the
following format:
The ind2Offset is a byte offset into the current IND2 table of some value.
If that is a four byte integer value, then once we fetch that, we have
the same triple (item type, reference type, value) as we find in the
0xbcec style descriptor blocks. If not, then this value is used directly.
These 8 byte descriptors are processed recordCount times, each
time using the next IND2 table. The item and reference types are as
described above for the 0xbcec format descriptor block.
32 bit Associated Descriptor Item 0x0101
This descriptor block contains a list of I_ID values. It is used when
an I_ID (that would normally point to a type 0x7cec or 0xbcec
descriptor block) contains more data than can fit in any single
descriptor of those types. In this case, it points to a type 0x0101
block, which contains a list of I_ID values that themselves point to
the actual descriptor blocks. The total length value in the 0x0101
header is the sum of the lengths of the blocks pointed to by the list
of I_ID values. The result is an array of subblocks, that may contain
index references where the high order 16 bits specify which descriptor
subblock to use. Only the first descriptor subblock contains the
signature (0xbcec or 0x7cec).
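Assembling the subblocks behind a type 0x0101 block can be sketched as below. The fetch callable is hypothetical and stands in for a lookup of an I_ID through the index1 tree that returns the bytes of the block; the function names are also illustrative only.

```python
def load_subblocks(i_ids, total_length, fetch):
    """Fetch each listed block and check the 0x0101 total length.

    fetch is a hypothetical callable mapping an I_ID to the bytes of the
    block it points to (via the index1 tree).
    """
    subblocks = [fetch(i_id) for i_id in i_ids]
    if sum(len(b) for b in subblocks) != total_length:
        raise ValueError("subblock lengths do not add up to total length")
    return subblocks

def select_subblock(subblocks, ref: int) -> bytes:
    """Pick the subblock named by the high order 16 bits of an internal ref."""
    return subblocks[ref >> 16]
```

Keeping the subblocks as a list, rather than concatenating them, preserves the subblock boundaries that the high order 16 bits of an internal index reference rely on.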
64 bit Associated Descriptor Item 0x0101
This descriptor block contains a list of I_ID values, similar to the
32 bit version described above.