$score
$score : double
This is the score of this document. Only populated on resultset documents, of course.
This class is geared to ease indexing of datamanager driven documents. The user invoking the indexing must have full read permissions to the object.
Basic indexing operation
This class uses a number of conventions (see below) to merge an existing datamanager driven document into an indexing capable document. It requires the caller to instantiate the datamanager, as this class has no way of knowing where to take the schema database from.
The RI (the GUID) from the base class is left untouched.
Indexing field defaults:
Unless you specify anything else explicitly in the schema, the class will merge all text based fields together to form the content field of the index record, to allow for easy searching of the document. This will not include any metadata like keywords or summaries.
If the schema contains a field abstract, it will also be used as abstract field for the indexing process. In the same way, fields named title or author will be used for the index document's title or author respectively. The contents of abstract, title and author will also be appended to the content field at the end of the object construction, which makes it easier to search over these fields.
If no abstract field is present, the first 200 characters of the content area are used instead.
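The defaults described above can be illustrated with a minimal, standalone sketch. It is not MidCOM's implementation: the function name build_index_defaults_sketch, the flat name => text input array and the newline separator are assumptions made for illustration only, and the sketch assumes the fallback abstract is taken before title and author are appended.

```php
<?php
// Hypothetical helper, NOT part of MidCOM: mimics the documented defaults.
// $fields maps field names to their (already plain-text) contents.
function build_index_defaults_sketch(array $fields): array
{
    $special = ['abstract', 'title', 'author'];

    // Merge all remaining text fields into the content field.
    $parts = [];
    foreach ($fields as $name => $value) {
        if (!in_array($name, $special, true)) {
            $parts[] = $value;
        }
    }
    $content = implode("\n", $parts);

    // Without an abstract field, fall back to the first 200 characters of the content.
    if (isset($fields['abstract'])) {
        $abstract = $fields['abstract'];
    } elseif (function_exists('mb_substr')) {
        $abstract = mb_substr($content, 0, 200, 'UTF-8');
    } else {
        $abstract = substr($content, 0, 200); // byte-based fallback
    }

    // Append abstract, title and author to the content so they are searchable too.
    foreach ($special as $name) {
        if (isset($fields[$name])) {
            $content .= "\n" . $fields[$name];
        }
    }

    return [
        'content'  => $content,
        'abstract' => $abstract,
        'title'    => $fields['title'] ?? '',
        'author'   => $fields['author'] ?? '',
    ];
}
```

For a schema with only a title and one long text field, the sketch yields a 200 character abstract and a content field that ends with the title.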
Not all types can be indexed; check the documentation of the types in question for their indexing capabilities. In general, if the system has to index a non-text field, it will use the field's CSV representation for implicit conversion.
Metadata processing is done by the base class.
Document title:
You should either have an auto-indexed title field, or an assortment of other fields manually assigned to index to the title field.
Configurability using the Datamanager schema:
You can decorate datamanager fields with various directives influencing the indexing. See the Datamanager's schema documentation for details. Basically, you can choose from the following indexing methods using the key 'index_method' for each field:
The document's type is "midcom_datamanager".
$creator : \midcom_db_person
The MidgardPerson who created the object.
This is optional.
$editor : \midcom_db_person
The MidgardPerson who modified the object the last time.
This is optional.
$topic_url : string
The full path to the topic that houses the document.
For external resources, this should be either a MidCOM topic with which the resource is associated or some "directory" by which you could filter. You may also leave it empty, which prevents the document from appearing in any topic-specific search.
The value should be fully qualified, as returned by MIDCOM_NAV_FULLURL, including a trailing slash, e.g. https://host/path/to/topic/
This is optional.
$_metadata : \midcom_helper_metadata
The metadata instance attached to the object to be indexed.
$_i18n : \midcom_services_i18n
The i18n service, used for charset conversion.
$datamanager : \midcom\datamanager\datamanager
The datamanager instance of the document we need to index.
__construct(\midcom\datamanager\datamanager $datamanager)
The constructor initializes the member variables and invokes _process_datamanager, which will read and process the information out of that instance.
The document is ready for indexing after construction. On any critical error, midcom_error is triggered.
\midcom\datamanager\datamanager | $datamanager | The fully initialized datamanager instance to use |
get_field_record(string $name) : Array
Returns the complete internal field record, including type and UTF-8 encoded content.
This should normally not be used from the outside; it is geared towards the indexer backends, which need the full field information during indexing.
string | $name | The name of the field. |
The full content record.
add_date(string $name, integer $timestamp)
Add a date field. A timestamp is expected, which is automatically converted to a suitable ISO timestamp before storage.
Direct specification of the ISO timestamp is not yet possible due to lacking validation outside the timestamp range.
If a field of the same name is already present, it is overwritten silently.
string | $name | The field's name. |
integer | $timestamp | The timestamp to store. |
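As a rough illustration of the timestamp conversion, the following standalone sketch turns a UNIX timestamp into an ISO 8601 string. The exact format and timezone the indexer stores are not stated here, so the UTC format string and the function name timestamp_to_iso_sketch are assumptions.

```php
<?php
// Hypothetical helper, NOT the MidCOM implementation.
// Assumes an ISO 8601 timestamp in UTC is a "suitable" storage format.
function timestamp_to_iso_sketch(int $timestamp): string
{
    return gmdate('Y-m-d\TH:i:s\Z', $timestamp);
}

echo timestamp_to_iso_sketch(0), "\n"; // prints 1970-01-01T00:00:00Z
```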
add_date_pair(string $name, integer $timestamp)
Create a normal date field and an unindexed _TS-postfixed timestamp field at the same time.
This is useful because the date fields are not stored in a readable format; it cannot even be determined that they were dates in the first place. The _TS field is therefore handy if you need the original timestamp value.
string | $name | The field's name, "_TS" is appended for the plain-timestamp field. |
integer | $timestamp | The timestamp to store. |
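The pairing can be pictured as two records written under related names. In this standalone sketch the field store is a plain array, and the type labels 'date' and 'unindexed', the ISO format, and the function name add_date_pair_sketch are all assumptions rather than MidCOM internals.

```php
<?php
// Hypothetical helper, NOT the MidCOM implementation.
// Stores a converted date field plus the raw timestamp under "<name>_TS".
function add_date_pair_sketch(array $fields, string $name, int $timestamp): array
{
    // Converted date field (format is an assumption).
    $fields[$name] = [
        'type'    => 'date',
        'content' => gmdate('Y-m-d\TH:i:s\Z', $timestamp),
    ];
    // Plain timestamp, kept so the original value can be read back.
    $fields[$name . '_TS'] = [
        'type'    => 'unindexed',
        'content' => (string) $timestamp,
    ];
    return $fields;
}

$fields = add_date_pair_sketch([], 'created', 86400);
echo $fields['created_TS']['content'], "\n"; // prints 86400
```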
add_result(string $name, string $content)
Add a search result field. This should normally not be done manually; the indexer calls this function when creating a document out of a search result.
string | $name | The field's name. |
string | $content | The field's content, which is assumed to be UTF-8 already |
html2text(string $text) : string
Convert HTML to plain text (relatively simple):
Basically, JavaScript blocks and HTML Tags are stripped, and all HTML Entities are converted to their native equivalents.
Don't replace with an empty string but with a space, so that constructs like
string | $text | The HTML text to convert to plain text |
The converted text.
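A conversion along these lines could look like the following standalone sketch. It is not the class's actual code: the regular expressions, the final whitespace collapsing and the name html2text_sketch are assumptions. It does follow the documented idea of replacing tags with a space so that adjacent words are not glued together.

```php
<?php
// Hypothetical helper, NOT the MidCOM implementation.
function html2text_sketch(string $html): string
{
    // Drop script blocks together with their content.
    $text = preg_replace('#<script\b[^>]*>.*?</script>#is', ' ', $html);
    // Replace every remaining tag with a space, not with an empty string.
    $text = preg_replace('#<[^>]+>#', ' ', $text);
    // Convert HTML entities to their native characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace (an addition of this sketch).
    return trim(preg_replace('/\s+/', ' ', $text));
}

echo html2text_sketch('<p>Hello</p><script>alert(1)</script><b>World</b> &amp; more'), "\n";
// prints: Hello World & more
```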
is_a(string $document_type) : boolean
Checks whether the given document is an instance of given document type.
This is equivalent to the is_a object hierarchy check, except that it works with MidCOM documents.
string | $document_type | The base type to search for. |
Indicating relationship.
_add_field(string $name, string $type, string $content, boolean $is_utf8 = false)
Internal helper which actually stores a field.
string | $name | The field's name. |
string | $type | The field's type. |
string | $content | The field's content. |
boolean | $is_utf8 | Set this to true explicitly, to override charset conversion and assume $content is UTF-8 already. |
add_person(string $name, \midcom_db_person $person)
Add a person field.
string | $name | The field's name. |
\midcom_db_person | $person | The field's content. |
read_person(string $id) : \midcom_db_person
Get person by given ID, caches results.
string | $id | GUID or ID to get person for |
object
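The caching behaviour can be sketched as simple memoization keyed by the identifier. The class PersonCacheSketch and its loader callback are hypothetical stand-ins; how the real method loads a person and what it caches on failure is not specified here.

```php
<?php
// Hypothetical illustration of per-ID result caching, NOT the MidCOM implementation.
final class PersonCacheSketch
{
    /** @var array<string, mixed> Results keyed by GUID or ID. */
    private $cache = [];
    /** @var callable Stand-in for the real person lookup. */
    private $loader;
    /** @var int Number of times the loader was actually invoked. */
    public $loads = 0;

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function read(string $id)
    {
        // array_key_exists also caches null results (an assumption of this sketch).
        if (!array_key_exists($id, $this->cache)) {
            $this->loads++;
            $this->cache[$id] = ($this->loader)($id);
        }
        return $this->cache[$id];
    }
}

$cache = new PersonCacheSketch(function (string $id) {
    return (object) ['id' => $id];
});
$first  = $cache->read('abc');
$second = $cache->read('abc'); // served from the cache, the loader is not called again
```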
add_as_date_field(\Symfony\Component\Form\FormView $field)
This function tries to convert the $field into a date representation. Unixdate fields are used directly (localtime is used, not GMT); other fields will be parsed with strtotime.
Invalid strings which are not parseable using strtotime will be stored as a "0" timestamp.
Be aware that this only works for dates within the range of a UNIX timestamp. For all other cases you should use an ISO 8601 representation, which should work with Lucene range queries as well.
\Symfony\Component\Form\FormView | $field | The field that should be stored |
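The fallback behaviour for unparseable values can be sketched in isolation. This standalone helper works on a raw value rather than a FormView, and its name field_to_timestamp_sketch as well as the digits-only check for detecting a UNIX timestamp are assumptions.

```php
<?php
// Hypothetical helper, NOT the MidCOM implementation.
// Numeric values are treated as UNIX timestamps; everything else goes through strtotime().
function field_to_timestamp_sketch($value): int
{
    if (is_int($value) || (is_string($value) && preg_match('/^\d+$/', $value) === 1)) {
        return (int) $value;
    }
    $parsed = strtotime((string) $value);
    // strtotime() returns false on failure; store 0 in that case.
    return $parsed === false ? 0 : $parsed;
}

echo field_to_timestamp_sketch('1970-01-02 00:00:00 UTC'), "\n"; // prints 86400
echo field_to_timestamp_sketch(''), "\n";                        // prints 0
```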