Tutorial: Writing an RRE SearchPlatform implementation

RRE can now connect to search APIs other than Solr and Elasticsearch through a generic Maven plugin, which offers much the same options as the existing Solr and Elasticsearch plugins. To use it, you need to write your own SearchPlatform implementation that connects to your search engine. This tutorial walks through how to do that.

We’ll assume that you’re familiar with the basic concepts of RRE: the versioned configuration model, the evaluation process, and how to run it. Additional details can be found in the RRE wiki.

The example code accompanying this tutorial can be found on GitHub. There is additional information about using the generic search plugin in the RRE GitHub repository.

Initial setup

Start a new project in your favourite IDE. We’ll be using Maven in the example. Your starting pom.xml should look similar to this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.github</groupId>
    <artifactId>rre-searchplatform-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <!-- Compiler properties -->
        <jdk.version>1.8</jdk.version>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <!-- RRE version -->
        <rre.version>1.0</rre.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>io.sease</groupId>
            <artifactId>rre-search-platform-api</artifactId>
            <version>${rre.version}</version>
        </dependency>
    </dependencies>
</project>

Our only dependency so far is the RRE search platform API, which defines the SearchPlatform behaviour we need to implement to connect RRE and our search API.

SearchPlatform basics

SearchPlatform defines a set of methods which are run during the evaluation cycle. These break down into three groups.

Before evaluation starts

  • beforeStart(Map<String, Object> configuration) allows initialisation of overall platform configuration. The configuration map is taken from the searchPlatformConfiguration section in the pom.xml.
  • start() starts the search platform, if necessary.
  • afterStart() checks that the platform has started correctly.

During evaluation

  • load(File dataToBeIndexed, File configFolder, String collection, String version) runs once per configuration set during the evaluation process, and initialises the data and configuration for that set.
  • executeQuery(String index, String version, String query, String[] fields, int maxRows) runs once per query, and fetches a set of results from the search engine.

After evaluation

  • beforeStop() prepares to stop the search platform.
  • close() closes any resources used by the search platform (as part of the Closeable interface).

There are also a number of utility methods which we will cover as we go through the implementation.
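
To make the ordering concrete, here is a minimal sketch of the call sequence described above. The PlatformLifecycle interface and the runEvaluation driver are hypothetical stand-ins written for this illustration; they are not part of the RRE API, and the real methods take more parameters. The exact interleaving of load and query calls across versions is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {

    /** Hypothetical, cut-down stand-in for the lifecycle methods of SearchPlatform. */
    public interface PlatformLifecycle {
        void beforeStart();
        void start();
        void afterStart();
        void load(String collection, String version);
        void executeQuery(String collection, String version, String query);
        void beforeStop();
        void close();
    }

    /** Records the name of each call so the order can be inspected afterwards. */
    public static class RecordingPlatform implements PlatformLifecycle {
        public final List<String> calls = new ArrayList<>();

        @Override public void beforeStart() { calls.add("beforeStart"); }
        @Override public void start() { calls.add("start"); }
        @Override public void afterStart() { calls.add("afterStart"); }
        @Override public void load(String collection, String version) {
            calls.add("load " + version);
        }
        @Override public void executeQuery(String collection, String version, String query) {
            calls.add("executeQuery " + version);
        }
        @Override public void beforeStop() { calls.add("beforeStop"); }
        @Override public void close() { calls.add("close"); }
    }

    /** Drives the platform through the three groups: before, during and after evaluation. */
    public static void runEvaluation(PlatformLifecycle platform, String collection,
                                     List<String> versions, List<String> queries) {
        // Before evaluation starts
        platform.beforeStart();
        platform.start();
        platform.afterStart();

        // During evaluation: one load per configuration set, one executeQuery per query and version
        for (String version : versions) {
            platform.load(collection, version);
        }
        for (String query : queries) {
            for (String version : versions) {
                platform.executeQuery(collection, version, query);
            }
        }

        // After evaluation
        platform.beforeStop();
        platform.close();
    }

    public static void main(String[] args) {
        RecordingPlatform platform = new RecordingPlatform();
        runEvaluation(platform, "core", Arrays.asList("v1.0", "v1.1"), Arrays.asList("{\"q\": \"london\"}"));
        platform.calls.forEach(System.out::println);
    }
}
```

Running main prints one line per lifecycle call, which is a quick way to check your mental model before implementing the real interface.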

Our search engine

Our search engine interface is very simple. It lives on a local server, and we communicate with it over HTTP:

GET http://localhost:8080/searcher/search?q=searchterm&fields=id,name&pageSize=12

It returns JSON that looks like this:

{
    "totalResults": 154,
    "metadata": {
        "requestTime": 400,
        "pageSize": 12,
        ... other metadata ...
    },
    "documents": [
        {
            "id": "1a",
            "name": "First document"
        },
        ... 11 more documents ...
    ]
}

RRE doesn’t really mind what your documents look like, as long as they have a unique ID field. When RRE queries the search engine, it expects the response to contain the total number of results available, and a list containing one map per returned document.

SearchPlatform implementation

In your IDE, create a new class – for the purpose of this tutorial, we’ll call it ExampleJsonSearchPlatform. It should implement the RRE SearchPlatform interface (io.sease.rre.search.api.SearchPlatform):

public class ExampleJsonSearchPlatform implements SearchPlatform {
}

If your IDE can auto-generate the methods you need to implement, now is the time to do that.

Constructor

It is important that your SearchPlatform implementation has a zero-argument constructor, so you should either omit the constructor entirely, or make it explicit (if you need to pre-initialise a client, for example) by adding one:

public ExampleJsonSearchPlatform() {
    // Zero-arg constructor
}

Utility methods

We’ll also implement the utility methods here, with some explanations on the way through.

public String getName() {
    return "Example JSON search";
}

public boolean isRefreshRequired() {
    return false;
}

public boolean isSearchPlatformConfiguration(String indexName, File searchEngineStartupSettings) {
    return searchEngineStartupSettings.isFile() && searchEngineStartupSettings.getName().equals("settings.json");
}

public boolean isCorporaRequired() {
    return false;
}

getName() returns a name for this platform, used for logging purposes.

isRefreshRequired() indicates to RRE whether or not the platform needs to be refreshed before it can be used. If you’re using an internal platform and need to load data every time, this should return true. If you’re connecting to an external search platform, as we are for this example, we’ll assume that the platform is ready to go, and return false.

isCorporaRequired() indicates to RRE whether this search platform requires a data file in order to be used. As above, if you’re using an internal platform and need to load data, this should return true. In our case, with a platform that is ready to use, it returns false.

isSearchPlatformConfiguration(index, file) indicates to RRE whether the incoming file contains configuration for this search platform. This is called once per configuration version, just prior to the load() method. If this never returns true, the platform will never load its configuration.
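
The file check in isSearchPlatformConfiguration() is easy to try out in isolation. The sketch below copies the same condition into a standalone class. The class name, the isSettingsFile and createIn helpers, and the temporary directory are all hypothetical and exist only to make the example runnable.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SettingsFileCheckSketch {

    /** Same condition as the tutorial's isSearchPlatformConfiguration(), minus the unused index name. */
    public static boolean isSettingsFile(File candidate) {
        return candidate.isFile() && candidate.getName().equals("settings.json");
    }

    /** Creates a fresh temporary directory for the demonstration. */
    public static File createTempDir() {
        try {
            return Files.createTempDirectory("rre-config-sketch").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates an empty file with the given name inside the given directory. */
    public static File createIn(File dir, String name) {
        try {
            Path created = Files.createFile(dir.toPath().resolve(name));
            return created.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File dir = createTempDir();
        File settings = createIn(dir, "settings.json");
        File other = createIn(dir, "schema.xml");

        System.out.println(isSettingsFile(settings)); // a regular file with the expected name
        System.out.println(isSettingsFile(other));    // a regular file with a different name
        System.out.println(isSettingsFile(dir));      // a directory, so isFile() is false
    }
}
```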

beforeStart, start and afterStart

For our implementation, these methods can remain blank. If your search platform requires some additional configuration, such as reading HTTP timeout settings, initialising an HTTP client, or similar, these methods are where this should happen.

public void beforeStart(Map<String, Object> platformConfiguration) {
}

public void start() {
}

public void afterStart() {
}
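
If you do need settings at start-up, beforeStart() is the place to read them. Here is a minimal sketch, assuming a hypothetical timeoutMillis entry in the searchPlatformConfiguration section. The key name, the default value and the readInt helper are illustrative and are not defined by RRE. The helper accepts both numbers and strings, since it is an assumption here that values may arrive as either.

```java
import java.util.HashMap;
import java.util.Map;

public class PlatformConfigSketch {

    /** Assumed default, used when the key is missing or unparseable. */
    static final int DEFAULT_TIMEOUT_MILLIS = 5000;

    private int timeoutMillis = DEFAULT_TIMEOUT_MILLIS;

    /** Has the same parameter shape as SearchPlatform.beforeStart(). */
    public void beforeStart(Map<String, Object> platformConfiguration) {
        timeoutMillis = readInt(platformConfiguration, "timeoutMillis", DEFAULT_TIMEOUT_MILLIS);
    }

    public int getTimeoutMillis() {
        return timeoutMillis;
    }

    /** Reads an integer setting, falling back to the default rather than failing. */
    static int readInt(Map<String, Object> config, String key, int defaultValue) {
        Object value = (config == null) ? null : config.get(key);
        if (value instanceof Number) {
            return ((Number) value).intValue();
        }
        if (value instanceof String) {
            try {
                return Integer.parseInt(((String) value).trim());
            } catch (NumberFormatException e) {
                return defaultValue;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("timeoutMillis", "1500");

        PlatformConfigSketch platform = new PlatformConfigSketch();
        platform.beforeStart(config);
        System.out.println("Timeout: " + platform.getTimeoutMillis() + "ms");
    }
}
```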

Loading the configuration details

As mentioned above, the load() method is run once per configuration set. For internal data platforms, it will index a data corpus, in addition to configuring version-specific information.

Our configuration information solely consists of the base URL for our search platform, so we’ll extract and record that. We use the Jackson ObjectMapper class to read the JSON settings file and convert it directly to a map.

private final Map<String, String> baseUrls = new HashMap<>();

@Override
public void load(File corpusFile, File settingsFile, String collection, String version) {
    try {
        Map<String, String> settingsMap = new ObjectMapper().readValue(settingsFile, 
                TypeFactory.defaultInstance().constructMapType(HashMap.class, String.class, String.class));
        baseUrls.put(getFullyQualifiedDomainName(collection, version), settingsMap.get("baseUrl"));
    } catch (IOException e) {
        System.err.println("Could not read settings from " + settingsFile.getName() + " :: " + e.getMessage());
    }
}

Note the use of getFullyQualifiedDomainName() to generate a mapping key for the collection and version. This is another utility method, with a default implementation provided as part of the RRE platform.
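
The idea of keying settings on collection and version can be sketched without RRE on the classpath. The keyFor helper below is a hypothetical stand-in for getFullyQualifiedDomainName(); the real default implementation may format its key differently. The collection name, versions and URLs are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class BaseUrlRegistrySketch {

    private final Map<String, String> baseUrls = new HashMap<>();

    /** Hypothetical key format: any scheme works, provided it is unique per collection and version. */
    static String keyFor(String collection, String version) {
        return collection + "_" + version;
    }

    /** What load() does in the tutorial: remember the base URL for one configuration set. */
    public void register(String collection, String version, String baseUrl) {
        baseUrls.put(keyFor(collection, version), baseUrl);
    }

    /** What executeQuery() does first: look the base URL up again, or get null if it was never loaded. */
    public String lookup(String collection, String version) {
        return baseUrls.get(keyFor(collection, version));
    }

    public static void main(String[] args) {
        BaseUrlRegistrySketch registry = new BaseUrlRegistrySketch();
        registry.register("core", "v1.0", "http://localhost:8080/searcher/search");
        registry.register("core", "v1.1", "http://localhost:8081/searcher/search");

        System.out.println(registry.lookup("core", "v1.0"));
        System.out.println(registry.lookup("core", "v1.1"));
        System.out.println(registry.lookup("core", "v2.0")); // never registered
    }
}
```

Because each version has its own entry, two configuration sets can point at different deployments of the same search API and be evaluated side by side.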

Executing the query

The executeQuery method is where the actual communication with the search platform happens, converting the results to a format RRE can process. The parameters here are:

  • collection, version: the collection and version values used to identify the configuration set in use.
  • query: the content of the query template with current query values substituted for the placeholders, as a single string. In our case, we’re expecting this to be JSON.
  • fields: the fields to return from the results, as specified in the pom.xml.
  • maxRows: the maximum number of rows to fetch – calculated during evaluation.

We’re going to build a query URL, then use Jackson’s ObjectMapper to call the URL and convert the response into an instance of an internal results class. That will then be converted into a QueryOrSearchResponse object to return to the RRE evaluation loop.

(We’re also using the Apache Commons Lang library for StringUtils.join(), making it much easier to build our query parameters.)

public QueryOrSearchResponse executeQuery(String collection, String version, String query, String[] fields, int maxRows) {
    QueryOrSearchResponse searchResponse;

    // Look up the search settings
    final String baseUrl = baseUrls.get(getFullyQualifiedDomainName(collection, version));

    if (baseUrl == null) {
        System.err.println("No base URL found for index " + collection + " " + version);
        searchResponse = new QueryOrSearchResponse(0, Collections.emptyList());
    } else {
        try {
            // Build the URL query parameters
            Collection<String> urlQuery = new ArrayList<>();
            for (Map.Entry<String, String> qp : convertQueryToMap(query).entrySet()) {
                urlQuery.add(qp.getKey() + "=" + URLEncoder.encode(qp.getValue(), "UTF-8"));
            }
            // Add the fields to the query parameters
            urlQuery.add("fields=" + StringUtils.join(fields, ','));
            // Add the page size to the query parameters
            urlQuery.add("pageSize=" + maxRows);

            // Build the URL
            final URL queryUrl = new URL(baseUrl + "?" + StringUtils.join(urlQuery, "&"));

            // Make the request
            JsonSearchResponse jsonSearchResponse = new ObjectMapper().readValue(queryUrl, JsonSearchResponse.class);
            // Convert the response
            searchResponse = new QueryOrSearchResponse(jsonSearchResponse.getTotalResults(), jsonSearchResponse.getDocuments());
        } catch (IOException e) {
            System.err.println("Caught IOException making query :: " + e.getMessage());
            searchResponse = new QueryOrSearchResponse(0, Collections.emptyList());
        }
    }

    return searchResponse;
}

private Map<String, String> convertQueryToMap(String query) {
    try {
        final ObjectMapper mapper = new ObjectMapper();
        // Read values as strings so the result matches the declared Map<String, String> return type
        return mapper.readValue(query, new TypeReference<HashMap<String, String>>() {
        });
    } catch (IOException e) {
        System.err.println("Cannot convert incoming query string to Map! " + e.getMessage());
        return Collections.emptyMap();
    }
}

@JsonIgnoreProperties(ignoreUnknown = true)
public static class JsonSearchResponse {
    private final long totalResults;
    private final List<Map<String, Object>> documents;

    public JsonSearchResponse(@JsonProperty("totalResults") long totalResults,
                              @JsonProperty("documents") List<Map<String, Object>> documents) {
        this.totalResults = totalResults;
        this.documents = documents;
    }

    public long getTotalResults() {
        return totalResults;
    }

    public List<Map<String, Object>> getDocuments() {
        return documents;
    }
}
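
The URL-building step inside executeQuery() can also be written with only the JDK, which makes it easy to unit-test away from Jackson and Commons Lang. This is a sketch, not the tutorial’s code: buildQueryUrl is a hypothetical helper, it takes the already-parsed query parameters as a map, and it uses String.join() in place of StringUtils.join(). As in the code above, only the parameter values are URL-encoded.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryUrlSketch {

    /** Builds baseUrl?key=value&...&fields=a,b&pageSize=n from the parsed query parameters. */
    public static String buildQueryUrl(String baseUrl, Map<String, String> queryParams,
                                       String[] fields, int maxRows) {
        List<String> parts = new ArrayList<>();
        try {
            for (Map.Entry<String, String> qp : queryParams.entrySet()) {
                parts.add(qp.getKey() + "=" + URLEncoder.encode(qp.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every JVM, so this should not happen
            throw new IllegalStateException(e);
        }
        // Add the fields and the page size, as in the tutorial's executeQuery()
        parts.add("fields=" + String.join(",", fields));
        parts.add("pageSize=" + maxRows);
        return baseUrl + "?" + String.join("&", parts);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order, so the URL is predictable
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "first document");

        String url = buildQueryUrl("http://localhost:8080/searcher/search", params,
                new String[]{"id", "name"}, 12);
        System.out.println(url);
        // prints http://localhost:8080/searcher/search?q=first+document&fields=id,name&pageSize=12
    }
}
```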

Complete!

And that completes our basic search platform implementation! Obviously there are plenty of places where things can go wrong here – we’re not catching badly formatted responses, for example – but the basic implementation details are here.
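
As one example of tightening things up, the documents list could be checked before it is handed to RRE. The sketch below rests on an assumption about what “badly formatted” might mean here: a missing documents array, or documents with no value in the ID field. The sanitise helper and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResponseSanitiserSketch {

    /** Returns only the documents that carry a value for the ID field; never returns null. */
    public static List<Map<String, Object>> sanitise(List<Map<String, Object>> documents, String idField) {
        if (documents == null) {
            // The response had no documents array at all
            return Collections.emptyList();
        }
        List<Map<String, Object>> valid = new ArrayList<>();
        for (Map<String, Object> doc : documents) {
            if (doc != null && doc.get(idField) != null) {
                valid.add(doc);
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Map<String, Object> good = new HashMap<>();
        good.put("id", "1a");
        good.put("name", "First document");

        Map<String, Object> missingId = new HashMap<>();
        missingId.put("name", "No identifier");

        List<Map<String, Object>> documents = new ArrayList<>();
        documents.add(good);
        documents.add(missingId);

        System.out.println(sanitise(documents, "id"));
        System.out.println(sanitise(null, "id"));
    }
}
```

Whether to drop such documents quietly, log them, or fail the query is a design choice; dropping them changes the result list that gets evaluated, so logging at the very least is worth considering.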

See a fuller implementation in the example GitHub repository.