Client/Server

Search server, clients and REST APIs

Garamond is designed as a client-server architecture in which the server receives requests, performs the search, recommendation or ranking operations and returns a response containing the search results back to the client.

Note
  • The clients do not depend on the Garamond package and are very lightweight.
  • The preferred way of communicating with the server is through the REST API using HTTP clients such as curl, etc.

In the root directory of the package the search server utility and two thin clients can be found:

Server

The search server listens on an ip and/or socket for incoming requests. Once one is received, it is processed and the response sent back to same socket. Looking at the gars command line help

$ ./gars --help
Activating environment at `~/projects/Garamond.jl/Project.toml`
[ Info: ~ GARAMOND ~ v"0.2.0" commit: 55dd103 (2019-10-23)
usage: gars [-d DATA-CONFIG] [--log-level LOG-LEVEL] [-l LOG]
            [-u UNIX-SOCKET] [-w WEB-SOCKET-PORT]
            [--web-socket-ip WEB-SOCKET-IP] [-p HTTP-PORT]
            [--http-ip HTTP-IP] [-i SEARCH-SERVER-PORT] [-h]

optional arguments:
  -d, --data-config DATA-CONFIG
                        data configuration file
  --log-level LOG-LEVEL
                        logging level (default: "info")
  -l, --log LOG         logging stream (default: "stdout")
  -u, --unix-socket UNIX-SOCKET
                        UNIX socket for data communication
  -w, --web-socket-port WEB-SOCKET-PORT
                        WEB socket data communication port (type:
                        UInt16)
  --web-socket-ip WEB-SOCKET-IP
                        WEB socket data communication IP (default:
                        "127.0.0.1")
  -p, --http-port HTTP-PORT
                        HTTP port for REST services (type: Int64)
  --http-ip HTTP-IP     HTTP IP for REST services (default: "0.0.0.0")
  -i, --search-server-port SEARCH-SERVER-PORT
                        Internal TCP port for the search server (type:
                        Int64, default: 9000)
  -h, --help            show this help message and exit

starting the server becomes quite straightforward.

For example, to start the server listening to a web socket at port 9100 and to a UNIX socket at /tmp/some/socket:

$ ./gars -d ./search_data_config.json -u /tmp/some/socket -w 9100 --log-level info
Activating environment at `~/projects/Garamond.jl/Project.toml`
[ Info: • Loaders (custom): sample_loader.jl
[ Info: ~ GARAMOND ~ v"0.2.0" commit: 55dd103 (2019-10-23)
[ [2019-10-19 11:15:55][INFO][sample_loader.jl:179] Sample data loader: read 23269 records from /opt/data/sample
[ [2019-10-19 11:16:11][INFO][search.jl:28] Searchers loaded. Notifying I/O servers...
[ [2019-10-19 11:16:11][INFO][search.jl:34] SEARCH server online @127.0.0.1:9000...
[ [2019-10-19 11:16:11][INFO][unixsocket.jl:30] UNIX-Socket server online @/tmp/some/socket...
[ [2019-10-19 11:16:11][INFO][websocket.jl:21] Web-Socket server online @127.0.0.1:9100...

Through the -d switch, a data configuration is specified for use. It holds all the necessary information related to data loading, indexing, searching, recommending etc.

Clients

garc: Commandline client

The commandline client garc sends the request to an open Unix socket and waits the search response on the same socket. It is worthwhile checking the available commandline options:

$ ./garc --help
usage: garc [--log-level LOG-LEVEL] [-u UNIX-SOCKET]
            [--return-fields [RETURN-FIELDS...]] [--pretty]
            [--max-matches MAX-MATCHES]
            [--response-size RESPONSE-SIZE]
            [--search-method SEARCH-METHOD]
            [--max-suggestions MAX-SUGGESTIONS] [--id-key ID-KEY] [-k]
            [--env-operation ENV-OPERATION ENV-OPERATION]
            [--ranker RANKER] [--input-parser INPUT-PARSER] [-h]
            [query]

positional arguments:
  query                 the search query (default: "")

optional arguments:
  --log-level LOG-LEVEL
                        logging level (default: "warn")
  -u, --unix-socket UNIX-SOCKET
                        UNIX socket for data communication (default:
                        "")
  --return-fields [RETURN-FIELDS...]
                        List of fields to return (ignores wrong names)
  --pretty              output is a pretty print of the results
  --max-matches MAX-MATCHES
                        maximum number of results for internal
                        neighbor searches (type: Int64, default: 10)
  --response-size RESPONSE-SIZE
                        maximum number of results to return (type:
                        Int64, default: 10)
  --search-method SEARCH-METHOD
                        type of match done during search (type:
                        Symbol, default: :exact)
  --max-suggestions MAX-SUGGESTIONS
                        How many suggestions to return for each
                        mismatched query term (type: Int64, default:
                        0)
  --id-key ID-KEY       The linear ID key (default:
                        "garamond_linear_id")
  -k, --kill            Kill the search engine server
  --env-operation ENV-OPERATION ENV-OPERATION
                        Environment operation
  --ranker RANKER       The ranker to use; avalilable: noop_ranker
                        (default: "noop_ranker")
  --input-parser INPUT-PARSER
                        The input parser to use; available:
                        noop_input_parser, base_input_parser (default:
                        "noop_input_parser")
  -h, --help            show this help message and exit

Assuming that a search server is running and listening at /tmp/some/socket, querying the server can be done with:

$ ./garc "a test query" -u /tmp/some/socket --max-matches 5 --pretty --return-fields one two three
Elapsed search time: 0.0007750988006591797s.
1/1 search ensemble yielded 5 results.
searcher-1
[0.55555475] ~ one: 4000.0 two: test  three: X15 _linear_id: 15
[0.52799106] ~ one: 2000.0 two: query  three: X42 _linear_id: 42
...

garw: Web-socket client

The web client garw starts a HTTP server that locally serves a page: it is the page that has to connect to the search server through a user-specified web-socket. Therefore, garw is technically not fully a client but for the sake of consistency we will consider it to be one. Its commandline arguments are more simplistic:

$ ./garw --help
usage: garw [--log-level LOG-LEVEL] [-w WEB-SOCKET-PORT]
            [--web-socket-ip WEB-SOCKET-IP] [-p HTTP-PORT]
            [--web-page WEB-PAGE] [--return-fields [RETURN-FIELDS...]]
            [-h]

optional arguments:
  --log-level LOG-LEVEL
                        logging level (default: "warn")
  -w, --web-socket-port WEB-SOCKET-PORT
                        WEB socket data communication port (type:
                        UInt16, default: 0x0000)
  --web-socket-ip WEB-SOCKET-IP
                        WEB socket data communication IP (default:
                        "127.0.0.1")
  -p, --http-port HTTP-PORT
                        HTTP port for the http server (type: Int64,
                        default: 8888)
  --web-page WEB-PAGE   Search web page to serve
  --return-fields [RETURN-FIELDS...]
                        List of fields to return (ignores wrong names)
  -h, --help            show this help message and exit

Assuming a search server is running using a web socket at port 9100 (as in the first gars example above), one can start serving the default webpage at the default port by simply running:

$ ./garw -w 9100
[ Info: ~ GARAMOND ~ (web-socket client)
[ Info: Serving page on 127.0.0.1:8888

Using a browser, one can open the page at locahost:8888 and search.

HTTP client

A generic HTTP client such as curl can easily be used. Assuming that the search server listens at localhost:9000,

$curl -d '{<request JSON content>}' -H "Content-Type: application/json" http://localhost:9000/api/search

will send a request to the server. The content of the request is a JSON file conforming to the REST API specification.

REST API

The REST API exposes the search engine's functionality through HTTP requests. These can be either GET requests (for simple functionality) or POST requests, in which the body of the message contains a correctly formatted JSON string.

Endpoints

Assuming a search server listening at <ip>:<port>, the exposed API endpoints are:

operationHTTP request typeURIdescription
SearchPOSThttp://<ip>:<port>/api/searchSearches for a given input in the indexed data
RecommendPOSThttp://<ip>:<port>/api/recommendReturns a list of similar entities to one specified, using a specified recommender
RankPOSThttp://<ip>:<port>/api/rankRanks a given list of ids using a specified ranker
Environment-relatedPOSThttp://<ip>:<port>/api/envopSaves, loads or re-indexes a full search environment (data + searchers)
KillGEThttp://<ip>:<port>/api/killKills the operation
Get configurationGEThttp://<ip>:<port>/api/read-configsReturns the data configuration of the engine

HTTP status codes

Request body format

The specific functionality of the engine operations i.e. search, ranking is set through parameters passed in the HTTP request body. The underlying format of the request body is JSON, of the form:

{
    "<key>":<value>,
 }

The following tables detail the key names, types and default values for each operation supported by the engine.

Note

The default values present in the tables below are found in https://github.com/zgornel/Garamond.jl/blob/master/src/config/defaults.jl

keyrequiredtypedefaultdescription
queryString-The input query.
input_parserString-Input parser to use. Available: "noop_input_parser" (no specific parsing) and "base_input_parser" (constructs data filters and queries).
return_fieldsList of strings-A list with the names of the database columns to be returned.
sort_fields-List of stringsDEFAULT_SORT_FIELDSA list with the names for the fields to sort by when filtering. Sort precedence is given by list order.
sort_reverse-BooleanDEFAULT_SORT_REVERSEFlag specifying whether to reverse the sorting i.e. largest number/letter first.
search_method-StringDEFAULT_SEARCH_METHODDefault search method. Only used by "search_recommender".
searchable_filters-List of strings<empty list>A list of field names whose values will be inserted in the search query if the fields are used for filtering in the query.
max_matches-IntegerDEFAULT_MAX_MATCHESThe maximum number of search results to generate internally from each searcher. Note that still response_size recommendations are returned.
response_size-IntegerDEFAULT_RESPONSE_SIZEThe maximum number of results to return in the response.
response_page-IntegerDEFAULT_RESPONSE_PAGEWhich page of response_size results to return in the response.
max_suggestions-IntegerDEFAULT_MAX_SUGGESTIONSThe maximum number of suggestions to return for each mismatched keyword of the query.
custom_weights-DictionaryDEFAULT_CUSTOM_WEIGHTSA dictionary where the keys are strings with searcher ids and the values are weights of the result scores to be used in result aggregation (if the case). In this way, the importance of search results from different searchers can be tuned.
ranker-StringDEFAULT_RANKER_NAMEThe name of the ranker. Available: "noop_ranker" (no ranking).
keyrequiredtypedefaultdescription
recommenderString-The name of the recommender to use. Available: "noop_recommender" (no recommendation) and "search_recommender" (search-based recommender).
recommend_idString-The id of the record for which recommendations (similar items) are sought.
recommend_id_keyString-The database name of the column holding the recommend id.
input_parserString-Input parser to use. Available: "noop_input_parser" (no specific parsing) and "base_input_parser" (constructs data filters and queries). The "base_input_parser" has to be used with "search_recommender".
filter_fieldsList of strings-Contains the names of the fields that will be used by the recommender. Only used in "search_recommender".
return_fieldsList of strings-A list with the names of the database columns to be returned.
sort_fields-List of stringsDEFAULT_SORT_FIELDSA list with the names for the fields to sort by when filtering. Sort precedence is given by list order.
sort_reverse-BooleanDEFAULT_SORT_REVERSEFlag specifying whether to reverse the sorting i.e. largest number/letter first.
search_method-StringDEFAULT_SEARCH_METHODDefault search method. Only used by "search_recommender".
searchable_filters-List of strings<empty list>A list of field names whose values will be inserted in the search query sent to the searchers, if the field names appear also in filter_fields. This guarantees a better match between results returned by querying the database (filtering) and the indexed data (search).
max_matches-IntegerDEFAULT_MAX_MATCHESThe maximum number of recommendations to generate internally. Note that still response_size recommendations are returned.
response_size-IntegerDEFAULT_RESPONSE_SIZEThe maximum number of results to return in the response.
response_page-IntegerDEFAULT_RESPONSE_PAGEWhich page of response_size results to return in the response.
ranker-StringDEFAULT_RANKER_NAMEThe name of the ranker. Available: "noop_ranker" (no ranking).
keyrequiredtypedefaultdescription
rankerString-The name of the ranker. Available: "noop_ranker" (no ranking).
rank_idsList of strings-The ids to be ranked.
rank_id_keyString-The database name of the column holding the ids to be ranked.
return_fieldsList of strings-A list with the names of the database columns to be returned.
response_size-IntegerDEFAULT_RESPONSE_SIZEThe maximum number of results to return in the response.
response_page-IntegerDEFAULT_RESPONSE_PAGEWhich page of response_size results to return in the response.
keyrequiredtypedefaultdescription
cmdString-The operation being performed. Available: "load", "save" and "reindex".
cmd_argumentString-Argument of the operation. For "load" and "save" it should be a filepath, for "reindex", the searcher id or *.

No parameters needed.

No parameters needed.

Response format

If the search, recommendation and ranking requests are successful a HTTP response with status code 200 is received. The body of the HTTP response message is a JSON string representing the actual results of the operation. Its keys and values are detailed below:

keytypedescription
n_searchersIntegerThe total number of searchers.
n_searchers_w_resultsIntegerThe total number of searchers that returned data.
suggestionsDictionaryA dictionary containing for each searcher, a list of suggestions for each missing token.
elapsed_timeFloatThe time elapsed in executing the request, except for the building of results.
resultsDictionaryA dictionary containing for each searcher a list of dictionaries, each of the latter containing individual result data. For filtering operations, a random id is generated to which the list of results is associated.
n_total_resultsIntegerThe total number of results returned by the engine.

For more information on how the internal engine result structure is used to construct the JSON output, consult the build_response function.