Text Based Queries¶
Let's first understand why OpenSearch has advantages over MySQL (SQL) for full-text search.
MySQL/SQL Limitations:
Relational Structure: MySQL is optimized for structured, relational data, not large-scale text search.
Full-Text Search: MySQL supports FULLTEXT indexes, but it offers far less text analysis (for example, it has no built-in stemming) and is generally slower for full-text search over large volumes of unstructured data.
Row-Based Indexing: Its regular indexes are organized around row values, so a pattern search such as LIKE '%term%' on a large text column cannot use them and has to scan the rows.
OpenSearch (NoSQL) Advantages:
Inverted Index: OpenSearch uses an inverted index, making text search faster by indexing individual terms, not rows.
Scalability: OpenSearch is built for horizontal scaling, distributing data and queries across nodes.
Text Processing: It has built-in analyzers (tokenization, stemming), making it ideal for fast, accurate full-text search.
Near Real-Time: Newly indexed documents become searchable shortly after they are written (by default within about one second), even across large datasets.
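To make the inverted-index idea concrete, here is a minimal pure-Python sketch. It is not how OpenSearch is implemented; the sample documents and the names `analyze`, `build_inverted_index` and `search_any` are hypothetical and chosen only for illustration. Each term maps to the set of document IDs that contain it, so a lookup does not need to scan the text of every document.

```python
import re
from collections import defaultdict

# Hypothetical sample documents (not the tutorial's index).
docs = {
    1: "Sample produced with beam background",
    2: "Sample produced without beam background",
    3: "Cherenkov detector calibration run",
}


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search_any(index, text):
    """Return sorted IDs of documents containing at least one query term."""
    result = set()
    for term in analyze(text):
        result |= index.get(term, set())
    return sorted(result)


inverted = build_inverted_index(docs)
print(search_any(inverted, "Beam"))
print(search_any(inverted, "cherenkov beam"))
```

Because the query text goes through the same `analyze` step as the documents, the search for "Beam" finds documents that contain "beam" in any capitalization.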
OpenSearch is a powerful search and analytics engine that handles text-based queries efficiently. Understanding how to construct and use text-based queries in OpenSearch is crucial for effective data retrieval and analysis.
This section covers the concepts and techniques involved in OpenSearch text-based queries.
Match Query:¶
Match Phrase Query¶
The match_phrase query searches for documents containing an exact phrase in a field, such as the description field. The structure of the query is:
{
  "query": {
    "match_phrase": {
      "<field>": "<phrase>"
    }
  }
}
Let's search for document(s) with the exact phrase “without beam background” in the description field.
search_query = {"query": {"match_phrase": {"description": "without beam background"}}}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
Match query¶
The match query is a basic query type in OpenSearch used to search for documents containing specific words in a specified field, such as the description field. The query text is analyzed into terms, and by default a document matches if it contains any of them. The structure of the query is:
{
  "query": {
    "match": {
      "<field>": "<text>"
    }
  }
}
Let's search for documents containing the words “without” or “beam” in the description field. By default, the query matches documents containing either of the words.
search_query = {"query": {"match": {"description": "without beam"}}}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
You should see three documents with filenames expx.myfile2.root, expx.myfile3.root and expx.myfile4.root, as these documents contain either the word “without” or the word “beam”.
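The difference between match and match_phrase can be illustrated without a cluster. The sketch below is a simplified pure-Python model that assumes whitespace tokenization and lowercasing; the descriptions and the helper names `matches_any` and `matches_phrase` are hypothetical and do not come from the tutorial's index.

```python
def tokens(text):
    """Simplified analysis: lowercase and split on whitespace."""
    return text.lower().split()


def matches_any(doc_text, query_text):
    """Model of a default match query: at least one query term occurs."""
    doc_terms = set(tokens(doc_text))
    return any(term in doc_terms for term in tokens(query_text))


def matches_phrase(doc_text, phrase):
    """Model of match_phrase: the terms occur consecutively and in order."""
    doc, want = tokens(doc_text), tokens(phrase)
    return any(doc[i:i + len(want)] == want
               for i in range(len(doc) - len(want) + 1))


# Hypothetical descriptions for illustration only.
descriptions = [
    "sample without beam background",
    "beam test with background",
    "cosmic run",
]

for text in descriptions:
    print(text,
          matches_any(text, "without beam"),
          matches_phrase(text, "without beam background"))
```

In this model the second description satisfies the match query because it contains “beam”, but it does not satisfy the phrase query because the three words never appear next to each other in that order.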
You can also set the operator to and so that all of the words must be present in the field.
{
  "query": {
    "match": {
      "<field>": {
        "query": "<text>",
        "operator": "and"
      }
    }
  }
}
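The match request body, with or without an operator, can be generated by a small helper. This is only a convenience sketch: `build_match_query` is a hypothetical function, not part of any client library, and it simply returns the dictionary that would be passed as the body of a search call.

```python
def build_match_query(field, text, operator=None):
    """Return a match query body (hypothetical helper).

    operator may be None, "and" or "or".
    """
    if operator is None:
        # Short form: the field maps directly to the query text.
        return {"query": {"match": {field: text}}}
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    # Long form: the field maps to an object holding the text and options.
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


print(build_match_query("description", "without beam"))
print(build_match_query("description", "beam cherenkov", operator="and"))
```

In both forms the field name is the key directly under match; the long form only replaces the plain string with an object so that options such as operator can sit next to the query text.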
For example, to get the documents containing both “beam” and “cherenkov”:
search_query = {
    "query": {"match": {"description": {"query": "beam cherenkov", "operator": "and"}}}
}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
query_string¶
Wild card¶
Wildcard patterns are used to search for documents based on patterns or partial matches within a field. In the example below, a wildcard pattern inside a query_string query is used to search for documents where the description field contains any L trigger, i.e. L1 or L2.
search_query = {
    "query": {"query_string": {"default_field": "description", "query": "L*"}}
}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["description"])
You should see a single document with filename expx.myfile1.root.
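As a rough mental model, a wildcard pattern is compared against individual terms of the field rather than against the whole text. The sketch below uses Python's standard `fnmatch` module to imitate that. It assumes simple whitespace tokenization with lowercasing, which is a simplification of what an analyzer does, and the descriptions and the `has_wildcard_term` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def has_wildcard_term(text, pattern):
    """True if any lowercased, whitespace-separated term matches the pattern."""
    pattern = pattern.lower()
    return any(fnmatchcase(term, pattern) for term in text.lower().split())


# Hypothetical descriptions for illustration only.
descriptions = [
    "data with L1 and L2 trigger",
    "data without beam background",
]

for text in descriptions:
    print(text, has_wildcard_term(text, "L*"))
```

Note that in this model any term beginning with “l” would match, not only trigger names, so a short pattern like this is only selective when few other words start with the same letter.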
Prefix Query:¶
Prefix queries are used to search for documents where a specified field starts with a specific prefix. For example, the prefix query below searches for documents where the filename field starts with “expx.”:
search_query = {"query": {"prefix": {"filename": "expx."}}}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["filename"])
This query will match documents whose filenames start with “expx.”, such as expx.myfile1.root.
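The effect of a prefix match can be checked locally with plain string operations. The sketch below assumes the filename is compared as a single, un-analyzed value (whether that holds in a real index depends on the field mapping); the filenames and the `prefix_filter` helper are hypothetical.

```python
# Hypothetical filenames for illustration only.
filenames = [
    "expx.myfile1.root",
    "expx.myfile2.root",
    "expy.myfile1.root",
    "myexpx.root",
]


def prefix_filter(names, prefix):
    """Keep only the names that begin with the given prefix."""
    return [name for name in names if name.startswith(prefix)]


print(prefix_filter(filenames, "expx."))
```

The prefix has to match from the very first character, including the trailing dot, so a name that merely contains “expx” somewhere else is not selected.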
Fuzzy Query:¶
Fuzzy queries are used to find documents with terms similar to a specified term, allowing for some degree of error or variation. In the example below, a fuzzy query is used to search for documents with terms similar to “physic” in the description field:
search_query = {"query": {"fuzzy": {"description": "physic"}}}
search_results = es.search(index=index_name, body=search_query)
for hit in search_results["hits"]["hits"]:
    print(hit["_source"]["description"])
This query will match documents containing terms within a small edit distance of “physic”, such as “physics”. How much variation is tolerated is controlled by the fuzziness parameter.
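Fuzzy matching is based on edit distance, i.e. the number of single-character changes needed to turn one term into another. The sketch below computes the plain Levenshtein distance as a simplified illustration; the `edit_distance` function and the candidate words are hypothetical, and the search engine's own distance measure may differ in detail (for example in how it counts swapped adjacent characters).

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


# Hypothetical candidate terms for illustration only.
for candidate in ["physics", "physical", "produced"]:
    print(candidate, edit_distance("physic", candidate))
```

A term is a fuzzy match when its distance to the query term does not exceed the allowed fuzziness. The limit can also be set explicitly in the request, for example with a body such as {"query": {"fuzzy": {"description": {"value": "physic", "fuzziness": 1}}}}.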