This is a class geared at indexing attachments.

It requires you to "assign" the attachment to a topic, which is used as TOPIC_URL for permission purposes. In addition, you may set another MidgardObject as the source object; its GUID is stored in the __SOURCE field of the index.

The document's type is "midcom_attachment", although, for several reasons, the class is not derived directly from midcom. The two should nevertheless be compatible in terms of usage.

Example Usage:

$document = new midcom_services_indexer_document_attachment($attachment, $object);
$indexer->index($document);

Here $attachment is the attachment to be indexed and $object is the object the attachment is associated with. The corresponding topic is detected through NAP using the object's GUID. If this fails, you have to set the members $topic_guid, $topic_url and $component manually.

Todo: More DBA stuff: use DBA classes, which allow you to implicitly load the parent object using get_parent.
Package: midcom.services
See: \global\midcom_services_indexer
See: \global\midcom_helper_metadata

 Methods

__construct (\MidgardAttachment $attachment, \MidgardObject $source)

Create a new attachment document

Parameters

$attachment

\MidgardAttachment The attachment to index.

$source

\MidgardObject The source object to which the attachment is bound.

_get_attachment_content (resource $handle)

Returns the first four megabytes of the file referenced by $handle.

The limit is in place to avoid clashes with the PHP memory limit; it should be enough for most text-based attachments anyway.

If you omit $handle, a handle to the document's attachment is created and automatically closed after the data has been read. If you pass your own handle, you have to close it yourself afterwards.

Parameters

$handle

resource A valid file handle to read from, or null to automatically create a handle to the current attachment.
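
The read limit and the handle ownership rule described above can be sketched as follows. This is a minimal stand-alone illustration, not the class's actual implementation: the function name read_attachment_head, the constant MAX_INDEX_BYTES and the $path parameter are hypothetical, and a plain file path stands in for the document's attachment.

```php
<?php
// Hypothetical stand-alone sketch; names are illustrative, not part of MidCOM.
const MAX_INDEX_BYTES = 4 * 1024 * 1024; // four megabytes

/**
 * Read at most MAX_INDEX_BYTES from $handle.
 *
 * If $handle is null, the file at $path is opened here and closed again
 * after reading. A handle passed in by the caller is left open.
 */
function read_attachment_head($handle = null, string $path = ''): string
{
    $own_handle = ($handle === null);
    if ($own_handle) {
        if (!is_readable($path)) {
            return '';
        }
        $handle = fopen($path, 'rb');
        if ($handle === false) {
            return '';
        }
    }

    // Stops at the limit or at end of file, whichever comes first.
    $content = stream_get_contents($handle, MAX_INDEX_BYTES);

    if ($own_handle) {
        fclose($handle);
    }
    return $content === false ? '' : $content;
}
```

Capping the read keeps the whole content in a single string without risking the PHP memory limit on very large attachments.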

_process_attachment ()

_process_mime_binary ()

Any binary file will have its name in the abstract unless no title is defined, in which case the document's title already contains the file's name.

_process_mime_html ()

Processes HTML-style attachments (and should therefore work with XML too): strips tags and resolves entities.
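
As a rough illustration of "strips tags and resolves entities", the sketch below uses PHP's built-in strip_tags and html_entity_decode. The function name html_to_index_text is hypothetical, and the order of the two steps and the assumption of UTF-8 input are choices made for this example, not something the class is known to do.

```php
<?php
// Hypothetical helper; illustrates the technique only.
function html_to_index_text(string $html): string
{
    // Remove markup first, so that escaped text such as "&lt;b&gt;"
    // is not mistaken for a real tag once it has been decoded.
    $text = strip_tags($html);

    // Turn entities like &amp; or &quot; back into plain characters.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}
```

For instance, html_to_index_text('<p>Fish &amp; chips</p>') yields the plain string "Fish & chips".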

_process_mime_pdf ()

Convert a PDF attachment to plain text and index it.

_process_mime_plaintext ()

Simple plain-text driver, just copies the attachment.

_process_mime_richtext ()

Convert an RTF attachment to plain text and index it.

_process_mime_word ()

Convert a Word attachment to plain text and index it.

_process_topic ()

Tries to determine the topic GUID and component using NAP's reverse-lookup capabilities.

If this fails, you have to set the members $topic_guid, $topic_url and $component manually.

_write_attachment_tmpfile ()

Creates a temporary copy of the attachment. The caller must delete it manually after completing processing.

Returns

string The name of the temporary file.
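
The create-then-clean-up contract can be sketched like this. It is a minimal example under stated assumptions: write_tmp_copy is a hypothetical stand-alone function that takes an open, readable handle instead of the document's attachment, and the 'attachment-' prefix is arbitrary.

```php
<?php
// Hypothetical stand-alone sketch; not the class's actual implementation.
function write_tmp_copy($source): string
{
    $tmpname = tempnam(sys_get_temp_dir(), 'attachment-');
    if ($tmpname === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    $target = fopen($tmpname, 'wb');
    if ($target === false) {
        throw new RuntimeException('Could not open the temporary file.');
    }

    // Copy everything from the current position of $source to the temp file.
    stream_copy_to_stream($source, $target);
    fclose($target);

    return $tmpname;
}

// Usage: the caller owns the temporary file and removes it when done.
$source = fopen('php://memory', 'w+b');
fwrite($source, 'example attachment content');
rewind($source);

$tmpname = write_tmp_copy($source);
fclose($source);
try {
    // ... run an external converter on $tmpname here ...
} finally {
    unlink($tmpname);
}
```

Wrapping the processing step in try/finally makes sure the temporary file is removed even if the conversion throws.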

 Properties

 

mixed $_attachment

 

mixed $_source