* An LDB Mapping Module -*- outline -*-

** Scenario

Usage of an instance of the map module requires the full set of data
to be split into two "partitions", a local (fallback) one and a remote
(mapped) one. The remote partition will be used to store additional
data but may differ from the local partition in the schema it uses.
The goal is to store as much data as possible on the remote partition.

A user is required to define mappings between local and remote data
schemas to account for these differences, but also to enable usage of
the remote partition for as large a dataset as possible. Data not
explicitly accounted for in the defined mappings is ignored by the
remote partition and stored on the local partition instead.

** Assumptions

*** Mapped vs. Not Mapped

Each record exists in either of two states:

**** Not mapped:

The record and all of its data are stored on the local partition; the
additional attribute "isMapped" is not present in the record.

**** Mapped:

The record is split into a local and a remote part, stored on the
local and remote partition respectively; an additional internal
attribute "isMapped" is added to the local part of the record.

The local part of a mapped record is allowed to consist only of the
"isMapped" attribute, denoting the case when all "real" data is mapped
to the remote partition.

*** Internal attribute "isMapped"

The local part of each mapped record includes an internal attribute
"isMapped", the value of which is irrelevant. It is not present in the
local part of a record that is not mapped.

*** Local != Fallback

Each attribute must either be mapped or ignored; the local part should
be reserved for ignored data and not serve as fallback storage for
failed mappings. Failure to map an attribute indicates a problem with
the specified mappings and should be reported.

*** No smart mapping

The map module has no knowledge of constraints present in the backend
of the remote partition, such as objectClass restrictions.
Constraint violation results from the remote backend indicate a
problem with the specified mappings and should be reported.

*** Uniqueness of local names

The specified mappings should be uniquely indexable by the local name
of a handled attribute. Generation of multiple remote attributes from
a single local attribute will be possible but should be defined in a
single mapping to ease conversions from the remote to the local format
(as for search responses).

(This might be problematic for attributes that form a record's RDN, as
only one attribute can be used for that at a time...)

*** Stable naming contexts

Mappings must not tamper with the naming context of a record. The
naming context is (part of) what specifies the destination partition
for a dataset; its modification is defined globally in a special
record of the map instance.

** Alternatives to local "isMapped"

*** Local "isMappedTo"

Instead of setting a local attribute to just denote whether a record
is mapped or not, the attribute could contain the DN of the remote
part of the record's data. When the attribute is found, its contents
could be extracted and instantly reused to access the remote data.

*** Remote "isMappedFrom"

Instead of the local part pointing to the remote part, the latter
could be assigned an attribute containing the DN of the local part of
the data. Finding the remote part of a record with a local DN would
then consist of searching for a record that claims to be mapped from
that local DN.

*** More GUIDs

Use GUIDs instead of DNs to identify related records.

*** Another "masterGUID=%" RDN

The idea from local_password could be reused. This would also save us
from mapping DNs back and forth all over the place.

** Problems

*** with these alternatives to "isMapped"

The obvious problem is the use of a local/remote schema which might
prevent our use of arbitrary internal attributes for mapped data.
Another is mapping the directory structure in the remote backend,
which the third alternative would prevent. As far as I understand, we
would also need to be in samdb to be able to use GUIDs.

*** with partitions

When I use partitions (or really just different baseDNs) to separate
the local and remote records, I *should* make sure to only modify DNs
under the local baseDN. What should one do in the case of a rename
request where one of the old and new DNs is under the local baseDN but
the other isn't?

*** with object existence

The premises include that objects are always created locally, even if
only to hold an "isMapped" attribute, but that they don't need to
exist remotely. The structure of the remote partition exactly mirrors
that of the local partition, though, i.e. an object has the same DN
relative to the local baseDN as to the remote baseDN (except for
mappings of the DN components, but those don't matter here). This
implies that *all* containers along this path must exist both locally
*and* remotely, regardless of their attribute components.

This means it *could* happen that a container is added (which isn't
mapped for some reason) and later a record containing mapped data is
created below that container. The remote request will then fail due
to the missing remote container, which, from the user's point of view,
*does* exist.

** Mappings

*** Abstractly

A mapping represents, viewed abstractly, a function of type

    Request x Context --> Request

The structure of LDB modules performs dispatching based on request
type itself, so we can substitute the request contents for Request in
that type.
Disregarding delete and rename requests, which contain only one or
two target DNs, the contents of a request are of type LDB message, so
we assume a type

    Message x Context --> Request

The ldb_map module disregards context completely, modelling only
functions

    Message --> Message

and a few restricted subforms of types

    Message Element --> Message Element

or even just

    String --> String

This is convenient for most cases and increases the expressiveness of
the "mapping definition language", but limits the overall power of
mappings: e.g. the password_hash module cannot currently be expressed
with the ldb_map module due to lacking context (the domain data search
result and the array `attrs' of the `password_hash_mod_search_self'
function). The only way to access that data is by making (synchronous)
LDB calls from within the attribute generation function.

*** As Balls of Mud

The canonical way to fetch context is to prefix the "real" request
with a search request that results in the required data, which is
then stuffed into the request's async context.

This pattern could of course be factored into the map module; mappings
requiring this data would then need ways to request it, either as an
array of strings (for the simpler case where additional info about the
record itself is required) or as an array of (search) requests (for
the general case, allowing the whole database to be queried).
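
As a rough illustration of the mapping model and the context-prefetch
pattern above, the following Python sketch splits one message into a
local and a remote part. Everything in it is hypothetical and not the
ldb or ldb_map API: messages are plain dicts, `Mapping`,
`index_mappings`, `split_message` and `MappingError` are made-up
names, and the caller-supplied `fetch_context` stands in for the
search request that would be prefixed to the real request. Attribute
names are compared case-sensitively here, unlike in LDAP.

```python
# Hypothetical sketch: none of these names belong to the real ldb/ldb_map API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A message is modelled as a plain dict: attribute name -> list of values.
Message = Dict[str, List[str]]


class MappingError(Exception):
    """A mapped attribute could not be converted (Local != Fallback)."""


@dataclass
class Mapping:
    """One mapping, identified by the local name of the attribute it handles."""
    local_name: str
    # Converts local values plus prefetched context into remote attributes;
    # one mapping may generate several remote attributes.
    convert: Callable[[List[str], Message], Message]
    # Context this mapping needs, in the simple "array of strings" form.
    context_attrs: List[str] = field(default_factory=list)


def index_mappings(mapping_list: List[Mapping]) -> Dict[str, Mapping]:
    """Index mappings by local attribute name, rejecting duplicates."""
    index: Dict[str, Mapping] = {}
    for m in mapping_list:
        if m.local_name in index:
            raise ValueError(f"duplicate mapping for {m.local_name!r}")
        index[m.local_name] = m
    return index


def split_message(msg: Message, mappings: Dict[str, Mapping],
                  fetch_context: Callable[[List[str]], Message]
                  ) -> Tuple[Message, Optional[Message]]:
    """Split msg into its local part and its remote part (None if unmapped)."""
    # Gather every mapping's context request and fetch it once, up front;
    # fetch_context stands in for the prefixed search request.
    wanted = sorted({a for name in msg if name in mappings
                     for a in mappings[name].context_attrs})
    context = fetch_context(wanted) if wanted else {}

    local: Message = {}
    remote: Message = {}
    for name, values in msg.items():
        mapping = mappings.get(name)
        if mapping is None:
            # Not covered by any mapping: ignored remotely, kept locally.
            local[name] = list(values)
            continue
        try:
            generated = mapping.convert(values, context)
        except Exception as exc:
            # Never fall back to local storage; report the broken mapping.
            raise MappingError(f"cannot map attribute {name!r}") from exc
        for remote_name, remote_values in generated.items():
            remote.setdefault(remote_name, []).extend(remote_values)

    if not remote:
        return local, None            # record stays "not mapped"
    local["isMapped"] = ["TRUE"]      # only the presence of the attribute matters
    return local, remote
```

Collecting all context requests before running any conversion mirrors
the idea of factoring the prefixed search into the map module: each
mapping only declares which attributes it needs, and a failed
conversion is reported instead of silently landing on the local
partition.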

** Requests

*** Delete

    look for remote record
    if found:
        delete remote record
    delete local record

*** Rename

    look for remote record
    if found:
        rename remote record
    rename local record

*** Add

    for each attribute:
        if it is local:
            request it locally
        if it doesn't require context:
            map it
            request it remotely
        otherwise:
            register requested query
    if query registered:
        get context from query result
    if remote record requested:
        add remote record with context
        add local record with "isMapped"
    otherwise:
        add local record w/o "isMapped"

This doesn't make sense: we're trying to *add* the remote record, so
we can't possibly query it before adding it!

Our only choices are to either find out immediately which mappings
cannot be performed due to missing context data, or to split the
remote message into an immediate and a postponed part, the immediate
one being run *right now* and the postponed one, well, postponed until
after that. The postponed message would then have to be split up
again and again until either the immediate part is empty (i.e. some
data cannot be added) or the postponed part is empty (i.e. we're done
and can go home now).

*** Modify

    if no requested changes are remote:
        just run local request
    look for remote record
    if not found:
        turn remote request into "add"
        add remote record (with context)
        register relation between local and remote record
    otherwise:
        modify remote record (with context)
    modify local record

*** Search

    if neither a remote attribute nor "*" is requested:
        just run local request
    otherwise:
        run query with local attributes and "isMapped"
        if "isMapped" is not set and remote attributes were requested:
            abort
        otherwise:
            query remote DN with remote attributes
            unmap remote result
            merge local and remote result
        remove "isMapped" from result

Hmm... Assume that both the local and remote parts of a split parse
tree are non-null.
Assume we are in the async callback of the local search, after having
found a local record matching the local parse tree, and are preparing
the search for the matching remote record. That remote record must
match the remote parse tree so that the original parse tree matches
the merged record, so we should be able to search for the matching
remote DN *and* the remote parse tree. Right?

So, we'll consider these cases:

- there is no parse tree:
  - just search as we did before
- the parse tree is local:
  - use the original parse tree for the local search
  - continue as before
- the parse tree is remote:
  - search locally as before
  - use the mapped parse tree for the remote search
  - on match:
    - merge remote result into local result
  - otherwise:
    - skip local result
- the parse tree can be split:
  - use the local parse tree for the local search
  - use the mapped remote parse tree for the remote search
  - on match:
    - merge remote result into local result
  - otherwise:
    - skip local result
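
The case analysis above can be sketched in Python under strong
simplifying assumptions: a parse tree is either absent (`None`) or an
AND of attribute/value equality tests, mappings are 1:1 attribute
renames, the local and remote parts of a record share the same DN, and
requested attributes are not modelled. An OR tree generally cannot be
split this way, since a record may satisfy a local disjunct without
satisfying the remote one. The notes do not cover records without
"isMapped"; the sketch assumes they keep all data locally and tests
the original tree against them. All names are hypothetical, not ldb
API.

```python
# Hypothetical sketch of split-parse-tree search; not the ldb/ldb_map API.
from typing import Dict, List, Optional, Tuple

Message = Dict[str, List[str]]
# None means "no parse tree"; otherwise an AND of (attribute, value) tests.
Tree = Optional[List[Tuple[str, str]]]


def split_tree(tree: Tree, attr_map: Dict[str, str]) -> Tuple[Tree, Tree]:
    """Split an AND tree into a local part and a mapped remote part."""
    if tree is None:
        return None, None
    local = [(a, v) for a, v in tree if a not in attr_map]
    remote = [(attr_map[a], v) for a, v in tree if a in attr_map]
    # An empty conjunction is represented as "no tree" (matches everything).
    return (local or None), (remote or None)


def matches(msg: Message, tree: Tree) -> bool:
    return tree is None or all(v in msg.get(a, []) for a, v in tree)


def unmap(remote_msg: Message, attr_map: Dict[str, str]) -> Message:
    """Rename remote attributes back to their local names."""
    reverse = {r: l for l, r in attr_map.items()}
    return {reverse.get(a, a): list(vs) for a, vs in remote_msg.items()}


def search(local_db: Dict[str, Message], remote_db: Dict[str, Message],
           tree: Tree, attr_map: Dict[str, str]) -> Dict[str, Message]:
    local_tree, remote_tree = split_tree(tree, attr_map)
    results: Dict[str, Message] = {}
    for dn, local_msg in local_db.items():
        visible = {a: list(vs) for a, vs in local_msg.items() if a != "isMapped"}
        if "isMapped" not in local_msg or dn not in remote_db:
            # Not mapped (or remote part missing): all data is local,
            # so test the original tree against the local record.
            if matches(local_msg, tree):
                results[dn] = visible
            continue
        # Local search with the local part of the tree (None = no filter).
        if not matches(local_msg, local_tree):
            continue
        remote_msg = remote_db[dn]
        # Remote search for the same DN with the mapped remote tree.
        if not matches(remote_msg, remote_tree):
            continue  # no remote match: skip the local result
        visible.update(unmap(remote_msg, attr_map))  # merge, "isMapped" removed
        results[dn] = visible
    return results
```

With an empty remote part represented as `None`, the "no parse tree"
and "local parse tree" cases fall out of the same code path as the
split case, because a `None` tree matches every remote record.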