This class is geared toward indexing attachments.

It requires you to "assign" the attachment to a topic, which is used as TOPIC_URL for permission purposes. In addition, you may set another MidgardObject as the source object; its GUID is stored in the __SOURCE field of the index.

The document's type is "midcom_attachment", though for several reasons it is not directly derived from the midcom document type. The two should nevertheless be compatible in terms of usage.

Example Usage:

$document = new midcom_services_indexer_document_attachment($attachment, $object);

Where $attachment is the attachment to be indexed and $object is the object the attachment is associated with. The corresponding topic will be detected using the object's GUID through NAP. If this fails, you have to set the members $topic_guid, $topic_url and $component manually.

Todo: More DBA stuff: use DBA classes, which allow you to implicitly load the parent object using get_parent.
See: \global\midcom_services_indexer
See: \global\midcom_helper_metadata


__construct (\MidgardAttachment $attachment, \MidgardObject $source)

Create a new attachment document



\MidgardAttachment $attachment: The attachment to index.


\MidgardObject $source: The source object to which the attachment is bound.

_get_attachment_content (resource $handle)

Returns the first four megabytes of the file referenced by $handle.

The limit is in place to avoid clashes with the PHP memory limit; it should be enough for most text-based attachments anyway.

If you omit $handle, a handle to the document's attachment is created. If no handle is specified, the automatically created handle is closed after reading the data; otherwise you have to close the handle yourself afterwards.



resource $handle: A valid file handle to read from, or null to automatically create a handle to the current attachment.
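The handle-ownership rule described above can be sketched in isolation. This is a minimal illustration and not the MidCOM implementation: the function name read_attachment_head, the $path parameter (standing in for "the current attachment") and the ATTACHMENT_READ_LIMIT constant are all assumptions made for the example.

```php
<?php
// Hypothetical stand-alone sketch; names are illustrative, not MidCOM API.
const ATTACHMENT_READ_LIMIT = 4 * 1024 * 1024; // four megabytes

function read_attachment_head($handle = null, ?string $path = null, int $limit = ATTACHMENT_READ_LIMIT): string
{
    $opened_here = false;
    if ($handle === null) {
        if ($path === null) {
            throw new InvalidArgumentException('Either a handle or a path is required.');
        }
        // No handle given: open one ourselves and remember that we own it.
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new RuntimeException("Could not open {$path}.");
        }
        $opened_here = true;
    }

    // Reads at most $limit bytes, or less if the file ends earlier.
    $content = stream_get_contents($handle, $limit);

    // Only close handles created here; a caller-supplied handle stays open.
    if ($opened_here) {
        fclose($handle);
    }

    return $content === false ? '' : $content;
}
```

A caller that passes its own handle remains responsible for calling fclose() on it, which mirrors the behaviour documented for _get_attachment_content.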

_process_attachment ()

_process_mime_binary ()

Any binary file will have its name in the abstract unless no title is defined, in which case the document's title already contains the file's name.
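The title/abstract rule can be illustrated with a small helper. This is a sketch under assumptions: binary_title_and_abstract is a hypothetical function, and leaving the abstract empty when no title is defined is a guess, since the description above only says the title already carries the file name in that case.

```php
<?php
// Hypothetical helper illustrating the rule; returns [title, abstract].
function binary_title_and_abstract(string $filename, string $title = ''): array
{
    if ($title === '') {
        // No title defined: the file name serves as the title.
        // Assumption: the abstract is left empty rather than repeating it.
        return [$filename, ''];
    }

    // A title exists, so the file name goes into the abstract.
    return [$title, $filename];
}
```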

_process_mime_html ()

Processes HTML-style attachments (and should therefore work with XML too) by stripping tags and resolving entities.
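Stripping tags and resolving entities can be done with PHP's built-in strip_tags() and html_entity_decode(). The sketch below is one plausible way to do it, not necessarily what the method does internally; the function name html_to_index_text is hypothetical, and the whitespace collapsing step is an addition for readability of the result.

```php
<?php
// Hypothetical helper; not taken from the MidCOM source.
function html_to_index_text(string $html): string
{
    // Remove all markup, keeping only the text between tags.
    $text = strip_tags($html);

    // Turn entities such as &amp; or &eacute; into their characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Assumption: collapse runs of whitespace left behind by removed tags.
    // preg_replace() returns null on invalid UTF-8; fall back to $text then.
    $collapsed = preg_replace('/\s+/u', ' ', $text) ?? $text;

    return trim($collapsed);
}
```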

_process_mime_pdf ()

Convert a PDF attachment to plain text and index it.

_process_mime_plaintext ()

Simple plain-text driver, just copies the attachment.

_process_mime_richtext ()

Convert an RTF attachment to plain text and index it.

_process_mime_word ()

Convert a Word attachment to plain text and index it.

_process_topic ()

Tries to determine the topic GUID and component using NAP's reverse-lookup capabilities.

If this fails, you have to set the members $topic_guid, $topic_url and $component manually.

_write_attachment_tmpfile ()

Creates a temporary copy of the attachment. The caller must delete it manually after completing processing.


Returns string: The name of the temporary file.
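The "create a temporary copy, let the caller clean up" contract can be sketched with tempnam() and a try/finally block. The helper write_tmpfile below is hypothetical and takes the content as a string, which is an assumption made to keep the example self-contained; the real method works on the document's attachment.

```php
<?php
// Hypothetical helper; not the MidCOM implementation.
function write_tmpfile(string $content): string
{
    $tmpname = tempnam(sys_get_temp_dir(), 'att');
    if ($tmpname === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    if (file_put_contents($tmpname, $content) === false) {
        // Do not leave an empty temporary file behind on failure.
        unlink($tmpname);
        throw new RuntimeException("Could not write to {$tmpname}.");
    }

    return $tmpname;
}

// Usage: the caller owns the file and removes it once processing is done.
$tmp = write_tmpfile('example');
try {
    $size = filesize($tmp); // stand-in for the actual processing step
} finally {
    unlink($tmp);
}
```

Using finally ensures the temporary file is removed even if the processing step throws.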



mixed $_attachment


mixed $_source