This class is geared to ease indexing of datamanager2 driven documents.

The user invoking the indexing must have full read permissions to the object; otherwise the NAP or Metadata objects will probably fail to load.

Basic indexing operation

This class uses a number of conventions (see below) to merge an existing, datamanager2 driven document into an indexing capable document. It requires the caller to instantiate the datamanager2, as this class has no way of knowing where to take the schema database from.

Additional information is taken out of the Metadata record and the NAP record, both of which have to be available to the indexer.

The RI (the GUID) from the base class is left untouched.

Indexing field defaults:

Unless you specify anything else explicitly in the schema, the class will merge all text based fields together to form the content field of the index record, to allow for easy searching of the document. This will not include any metadata like keywords or summaries.

If the schema contains a field named abstract, it will also be used as the abstract field for the indexing process. In the same way, fields named title or author will be used for the index document's title or author respectively. The contents of abstract, title and author will also be appended to the content field at the end of the object construction, which eases searching over these fields.

If no abstract field is present, the first 200 characters of the content area are used instead.
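The abstract fallback can be pictured with a minimal sketch. The function name derive_abstract is hypothetical and not part of MidCOM; it only illustrates the rule stated above, and the real class may count characters differently.

```php
<?php
// Hypothetical helper, not part of MidCOM: picks the abstract for an index document.
function derive_abstract(string $abstract, string $content): string
{
    // An explicit abstract always wins.
    if ($abstract !== '') {
        return $abstract;
    }
    // Otherwise fall back to the first 200 characters of the content.
    // Assumption: substr() counts bytes; a multibyte-aware variant would use mb_substr().
    return substr($content, 0, 200);
}
```

With an empty abstract and a 500 character content string, the sketch returns the first 200 characters; a non-empty abstract is returned unchanged.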

Not all types can be indexed; check the documentation of the types in question for their indexing capabilities. In general, if the system has to index a non-text field, it will use the CSV representation for implicit conversion.

Metadata processing is done by the base class.

Document title:

There is no NAP interaction anymore to determine the document title. Therefore, you should either have an auto-indexed title field, or an assortment of other fields manually assigned to index to the title field.

Configurability using the Datamanager schema:

You can decorate datamanager fields with various directives influencing the indexing. See the Datamanager's schema documentation for details. Basically, you can choose from the following indexing methods using the key 'index_method' for each field:

The default auto mode will use the above guidelines to determine the indexing destination automatically, adding data to the content, abstract, title and author fields respectively.
You can specify abstract, content, title or author to indicate that the field should be used for the indicated document field. The content selector may be specified more than once, indicating that the content of the relevant fields should be merged.
Any date field can be indexed into its own range-filterable field using the date method. In this case, two document fields are created: one containing the filterable timestamp, named directly after the schema field, and a second one with the _TS postfix, set as noindex, containing the plain timestamp.
Finally, you can explicitly index a field as a separate document field using one of the four field types keyword, unindexed, unstored or text. You can further control whether the content of these fields is also added to the main content field. This is useful if you want fields to be searchable both by explicit field specification and through the default field for simpler searches. This is controlled by setting the boolean key 'index_merge_with_content' in the field, which defaults to true.
noindex will prevent indexing of this field.
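The routing rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the class's implementation: route_fields, the layout of the input array and the layout of the returned array are all hypothetical. Only the keys 'index_method' and 'index_merge_with_content' and their values come from the description above. Date handling and CSV conversion are left out.

```php
<?php
// Hypothetical sketch: routes field values into an index document by 'index_method'.
// Input layout (an assumption): name => ['value' => ..., 'index_method' => ..., 'index_merge_with_content' => ...].
function route_fields(array $fields): array
{
    $doc = ['content' => '', 'abstract' => '', 'title' => '', 'author' => '', 'fields' => []];
    $special = ['abstract', 'title', 'author'];
    $separate = ['keyword', 'unindexed', 'unstored', 'text'];

    foreach ($fields as $name => $field) {
        $method = $field['index_method'] ?? 'auto';
        $value = (string) $field['value'];

        if ($method === 'noindex') {
            continue; // Field is skipped entirely.
        }
        if ($method === 'auto') {
            // Auto mode: well-known names map to their document field, the rest is content.
            $method = in_array($name, $special, true) ? $name : 'content';
        }
        if ($method === 'content') {
            $doc['content'] .= $value . "\n"; // Several content fields are merged.
        } elseif (in_array($method, $special, true)) {
            $doc[$method] = $value;
        } elseif (in_array($method, $separate, true)) {
            $doc['fields'][$name] = ['type' => $method, 'value' => $value];
            // Merging into the main content defaults to true.
            if ($field['index_merge_with_content'] ?? true) {
                $doc['content'] .= $value . "\n";
            }
        }
    }
    return $doc;
}

$doc = route_fields([
    'title'  => ['value' => 'Hello'],
    'body'   => ['value' => 'Main text'],
    'tags'   => ['value' => 'news', 'index_method' => 'keyword', 'index_merge_with_content' => false],
    'secret' => ['value' => 'hidden', 'index_method' => 'noindex'],
]);
```

In this sketch the tags field ends up only as a separate keyword field because merging is switched off, while secret is not indexed at all.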

The document's type is "midcom_datamanager2".

Be aware that this class is designed to work on datamanager2 instances, not formmanagers, controllers or storage backends. It is also only targeted for the actual database storage backend, so the nullstorage backend will not work.

package	midcom.services
see	\global\midcom_services_indexer
see	\global\midcom_helper_datamanager2_datamanager

Methods

__construct (\midcom_helper_datamanager2_datamanager $datamanager)

The constructor initializes the member variables and invokes _process_datamanager, which will read and process the information out of that instance.

The document is ready for indexing after construction. On any critical error, midcom_error is triggered.

Parameters

$datamanager

\midcom_helper_datamanager2_datamanager &$datamanager The fully initialized datamanager2 instance to use.

_process_auto_field (string $name)

This helper will process the given field using the guidelines given in the class documentation.

Parameters

$name

string The name of the field that should be automatically processed.

_process_datamanager ()

Processes the information contained in the datamanager instance.

The function iterates over the fields in the schema, and processes them according to the rules given in the introduction.

_add_as_date_field (string $name)

This function tries to convert the field $name into a date representation.

Unixdate fields are used directly (localtime is used, not GMT); other fields will be parsed with strtotime.

Invalid strings which are not parseable using strtotime will be stored as a "0" timestamp.

Be aware that this only works for dates within the range of a UNIX timestamp. For all other cases you should use an ISO 8601 representation, which should work as well with Lucene range queries.
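The conversion described above can be sketched as follows. The function name to_index_timestamp is hypothetical, and treating only integers as ready-made timestamps is a simplifying assumption; the method itself inspects the datamanager2 field type.

```php
<?php
// Hypothetical helper mirroring the described behaviour; not the MidCOM implementation.
function to_index_timestamp($value): int
{
    // Assumption: integer values are already Unix timestamps and are used directly.
    if (is_int($value)) {
        return $value;
    }
    $timestamp = strtotime((string) $value);
    // strtotime() returns false for unparseable input; store 0 in that case.
    return $timestamp === false ? 0 : $timestamp;
}
```

Strings without an explicit time zone are interpreted in PHP's default time zone, which matches the note about localtime above.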

todo	Refactor this to use DateTime

Parameters

$name

string The name of the field that should be stored.

_complete_fields ()

Completes all fields which are not yet complete:

content is completed with author, title and, if necessary, abstract.

The title is set to the document's URL if no title has been set yet. The title is not added to the content field in that case.
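A rough sketch of this completion step, under stated assumptions: the function complete_fields below, the array keys and the $url parameter are illustrative and not the class's properties, and "if necessary" is read here as "when an abstract is present".

```php
<?php
// Hypothetical sketch of the completion step; not the MidCOM implementation.
function complete_fields(array $doc, string $url): array
{
    // Append author, title and abstract to the content when they are set.
    foreach (['author', 'title', 'abstract'] as $key) {
        if ($doc[$key] !== '') {
            $doc['content'] .= "\n" . $doc[$key];
        }
    }
    // Fall back to the URL only after merging, so the URL never reaches the content.
    if ($doc['title'] === '') {
        $doc['title'] = $url;
    }
    return $doc;
}

$completed = complete_fields(
    ['content' => 'Body', 'author' => 'Ann', 'title' => '', 'abstract' => ''],
    'https://example.com/doc'
);
```

Doing the title fallback last is what keeps the URL out of the searchable content in this sketch.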

Properties

\midcom_helper_datamanager2_datamanager $_datamanager

The datamanager instance of the document we need to index.

This is passed by reference through the constructor.

\midcom_helper_datamanager2_schema $_schema

The schema in use.

This is a reference to the schema held by the datamanager2 instance.