The user invoking the indexing must have full read permissions on the object; otherwise the NAP or Metadata objects probably cannot be loaded successfully.
Basic indexing operation
This class uses a number of conventions (see below) to merge an existing, datamanager2-driven document into an indexing-capable document. It requires the caller to instantiate the datamanager2, as this class has no way of knowing where to take the schema database from.
Additional information is taken out of the Metadata record and the NAP record, both of which have to be available to the indexer.
The RI (the GUID) from the base class is left untouched.
Indexing field defaults:
Unless you specify anything else explicitly in the schema, the class will merge all text based fields together to form the content field of the index record, to allow for easy searching of the document. This will not include any metadata like keywords or summaries.
If the schema contains a field named abstract, it will also be used as the abstract field for the indexing process. In the same way, fields named title or author will be used for the index document's title or author respectively. The contents of abstract, title and author will also be appended to the content field at the end of the object construction, which makes it easier to search over these fields.
If no abstract field is present, the first 200 characters of the content area are used instead.
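The defaults above can be illustrated with a minimal sketch. It is not the class's actual code: the function name sketch_build_index_record and the flat name => text array it takes are hypothetical stand-ins for a datamanager2 schema that contains only text fields. Note that substr counts bytes, whereas the real implementation may count characters.

```php
<?php
// Hypothetical helper, not part of MidCOM: mimics the documented defaults
// for a schema whose fields are all plain text.
function sketch_build_index_record(array $fields): array
{
    $special = ['abstract', 'title', 'author'];
    $record = ['content' => '', 'abstract' => '', 'title' => '', 'author' => ''];

    foreach ($fields as $name => $value) {
        if (in_array($name, $special, true)) {
            // Fields named abstract, title or author map to the
            // index document's field of the same name.
            $record[$name] = (string) $value;
        } else {
            // Every other text field is merged into the content field.
            $record['content'] .= $value . "\n";
        }
    }

    // Without an abstract field, fall back to the first 200 characters
    // (bytes in this sketch) of the content.
    if ($record['abstract'] === '') {
        $record['abstract'] = substr($record['content'], 0, 200);
    }

    return $record;
}
```

Calling it with a title and a long body field yields a record whose abstract is the first 200 characters of the body.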
Not all types can be indexed; check the documentation of the types in question for their indexing capabilities. In general, if the system has to index a non-text field, it uses the CSV representation for implicit conversion.
Metadata processing is done by the base class.
Document title:
There is no NAP interaction anymore to determine the document title. Therefore, you should either have an auto-indexed title field, or an assortment of other fields manually assigned to index to the title field.
Configurability using the Datamanager schema:
You can decorate datamanager fields with various directives influencing the indexing. See the Datamanager's schema documentation for details. Basically, you can choose from the following indexing methods using the key 'index_method' for each field:
The document's type is "midcom_datamanager2".
Be aware that this class is designed to work on datamanager2 instances, not formmanagers, controllers or storage backends. It is also only targeted for the actual database storage backend, so the nullstorage backend will not work.
package | midcom.services |
---|---|
see | \global\midcom_services_indexer |
see | \global\midcom_helper_datamanager2_datamanager |
The document is ready for indexing after construction. On any critical error, midcom_error is triggered.
\midcom_helper_datamanager2_datamanager
&$datamanager The fully initialized datamanager2 instance to use
string
The name of the field that should be automatically processed.
The function iterates over the fields in the schema, and processes them according to the rules given in the introduction.
Unixdate fields are used directly (local time is used, not GMT); other fields are parsed with strtotime.
Invalid strings which are not parseable using strtotime will be stored as a "0" timestamp.
Be aware that this only works for dates that fall within the range of a UNIX timestamp. For all other cases you should use an ISO 8601 representation, which should also work with Lucene range queries.
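As a rough illustration of the date rules above, the following sketch converts a field value to a timestamp and shows one way to produce an ISO 8601 string for dates outside the timestamp range. Both function names are hypothetical and not part of this class; only strtotime and the DateTimeImmutable API are standard PHP.

```php
<?php
// Hypothetical helper: unixdate values pass through unchanged,
// everything else goes through strtotime().
function sketch_to_timestamp($value, bool $is_unixdate): int
{
    if ($is_unixdate) {
        return (int) $value;
    }
    $timestamp = strtotime((string) $value);
    // strtotime() returns false for unparseable strings; store 0 instead.
    return ($timestamp === false) ? 0 : $timestamp;
}

// Hypothetical helper: ISO 8601 representation in UTC, usable for dates
// that cannot be expressed reliably as a UNIX timestamp.
function sketch_to_iso8601(string $value): string
{
    $date = new DateTimeImmutable($value, new DateTimeZone('UTC'));
    return $date->format('Y-m-d\TH:i:s\Z');
}
```

Strings in this ISO 8601 form sort lexicographically in chronological order, which is what makes them suitable for range queries.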
todo | Refactor this to use DateTime |
---|
string
The name of the field that should be stored
content is completed with author, title and, if necessary, abstract.
The title is set to the document's URL if no title is set yet. The title is not added to the content field in that case.
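The completion step might look roughly like the sketch below. The function name and the array keys (including document_url) are hypothetical, and reading "if necessary" as "only when the abstract text is not already part of the content" is an assumption, not something this documentation states.

```php
<?php
// Hypothetical sketch of the final completion step; not the class's code.
function sketch_complete_fields(array $doc): array
{
    // Append author and title so that they become searchable via content.
    $doc['content'] .= $doc['author'] . "\n" . $doc['title'] . "\n";

    // Assumption: the abstract is appended only if its text is not already
    // contained in the content (for example when it was derived from it).
    if ($doc['abstract'] !== '' && strpos($doc['content'], $doc['abstract']) === false) {
        $doc['content'] .= $doc['abstract'] . "\n";
    }

    // The URL fallback happens after the content was completed, so the
    // URL never ends up in the content field.
    if ($doc['title'] === '') {
        $doc['title'] = $doc['document_url'];
    }

    return $doc;
}
```

Applying the fallback last is what keeps the URL out of the searchable content.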
This is passed by reference through the constructor.
This is referenced into the datamanager2 instance.