Text Based Queries

Let's first look at why OpenSearch has advantages over MySQL (SQL) for full-text search.

MySQL/SQL Limitations:

OpenSearch (NoSQL) Advantages:

OpenSearch is a search and analytics engine that handles text-based queries efficiently. Knowing how to construct text-based queries in OpenSearch is essential for effective data retrieval and analysis.

This section covers the concepts and techniques behind OpenSearch text-based queries.

Match Query:

Match Phrase Query

The match_phrase query searches for documents that contain an exact phrase in a given field, such as the description field. The structure of the query is:

{
    "query": {
        "match_phrase": {
            "<field>": "<phrase>"
        }
    }
}

Let's search for documents with the exact phrase “without beam background” in the description field.

search_query = {"query": {"match_phrase": {"description": "without beam background"}}}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
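What "exact phrase" means can be sketched in plain Python, with no cluster involved: the phrase terms must appear next to each other and in the same order. This is only a rough model. The `tokenize` helper (lowercase, split on whitespace) is a stand-in for the real analyzer, and the sample descriptions are hypothetical, not taken from the tutorial index.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase tokens occur consecutively and in order in text."""
    tokens, wanted = tokenize(text), tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


# Hypothetical descriptions, used only for illustration.
docs = [
    "Simulated data without beam background",
    "Beam background studied without shielding",
]
matches = [d for d in docs if contains_phrase(d, "without beam background")]
print(matches)
```

Both descriptions contain all three words, but only the first has them adjacent and in order, so only the first counts as a phrase match in this sketch.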

Match query

The match query is a basic query type in OpenSearch. It searches for documents that contain specific words in a specified field, such as the description field. The structure of the query is:

{
    "query": {
        "match": {
            "<field>": "<text>"
        }
    }
}

Let's search for documents containing the words “without” or “beam” in the description field. By default, the query returns documents that contain either word.

search_query = {"query": {"match": {"description": "without beam"}}}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"])

You should see three documents, with filenames expx.myfile2.root, expx.myfile3.root and expx.myfile4.root, because each of them contains either the word “without” or the word “beam”. You can also set the operator to “and” so that all the words must be present in the field:

{
    "query": {
        "match": {
            "<field>": {
                "query": "<text>",
                "operator": "and"
            }
        }
    }
}

For example, to get the documents containing both “beam” and “cherenkov”:

search_query = {
    "query": {"match": {"description": {"query": "beam cherenkov", "operator": "and"}}}
}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
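Since the two forms of the match query differ only in the operator, it can be convenient to build the body with a small helper. The sketch below is one possible way to do that; `build_match_query` is a hypothetical helper name, not part of any client library, and it only constructs the dictionary that would be passed as `body` to `es.search`.

```python
from typing import Any, Dict


def build_match_query(field: str, text: str, operator: str = "or") -> Dict[str, Any]:
    """Return a match-query body for `field`; `operator` is "or" or "and"."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# Either word may be present (the default behaviour).
any_word = build_match_query("description", "without beam")

# Both words must be present.
all_words = build_match_query("description", "beam cherenkov", operator="and")
print(all_words)
```

The resulting dictionaries can then be used in place of the hand-written `search_query` in the examples above.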

query_string

Wild card

Wildcard queries search for documents based on patterns or partial matches within a field. In the example below, a query_string query with a wildcard searches for documents whose description field contains any L trigger, i.e. L1 or L2.

search_query = {
    "query": {"query_string": {"default_field": "description", "query": "L*"}}
}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["description"])

You should see a single document, with filename expx.myfile1.root.
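To see which terms a pattern such as `L*` would pick out, the matching can be imitated locally with Python's `fnmatch` module. This is only an approximation: it assumes the field's terms are lowercased words split on whitespace, and the description string is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def terms_matching(text: str, pattern: str) -> List[str]:
    """Return the lowercased words of `text` that match a * / ? pattern."""
    tokens = text.lower().split()
    return [t for t in tokens if fnmatchcase(t, pattern.lower())]


# Hypothetical description, used only for illustration.
description = "Physics data with L1 and L2 trigger"
print(terms_matching(description, "L*"))
```

In this sketch the pattern is compared against individual terms, not against the whole description string, which is why a description that merely contains the letter "l" somewhere inside a word is not selected.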

Prefix Query:

Prefix queries are used to search for documents where a specified field starts with a specific prefix. For example, the prefix query below searches for documents where the filename field starts with “expx.”:

search_query = {"query": {"prefix": {"filename": "expx."}}}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["filename"])

This query will match documents whose filenames begin with “expx.”, such as expx.myfile1.root and expx.myfile2.root.

Fuzzy Query:

Fuzzy queries find documents with terms similar to a specified term, allowing for some degree of error or variation. In the example below, a fuzzy query searches for documents with terms similar to “physic” in the description field:

search_query = {"query": {"fuzzy": {"description": "physic"}}}

search_results = es.search(index=index_name, body=search_query)

for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["description"])

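This query will match documents whose description contains a term within a small edit distance of “physic”, such as “physics”. No fuzziness parameter is specified here, so the default (AUTO) applies, which chooses the allowed number of edits from the length of the term.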
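Similarity in a fuzzy query is measured as an edit distance: the number of single-character changes needed to turn one term into another. The sketch below computes the plain Levenshtein distance (insertions, deletions, substitutions) to show which terms are close to the query term “physic”. It is a simplified illustration: by default OpenSearch also treats swapping two adjacent characters as a single edit, which this function does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


for term in ["physics", "physical", "production"]:
    print(term, edit_distance("physic", term))
```

Terms one or two edits away from the query term are the kind a fuzzy query can reach, while a term like “production” is far too many edits away to match.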