Search seems like such a basic part of the internet that few people give it much thought. Let’s face it, it’s not the most exciting subject next to cutting-edge topics like machine learning or the newest DevOps ideas. Still, we suggest taking a minute to reconsider humble search, because we often take for granted how good it has become.
Consider shopping on Amazon. You type the title of a book you’ve been wanting into the search bar, press enter, and receive a page of results in two, maybe three seconds. According to one estimate, Amazon sells more than 75 million products, and within seconds one search found the one you were looking for. That’s remarkable; how can they search such a large product database so quickly? And it isn’t just Amazon. The user experience of Google, YouTube, Spotify, and any given news site is built primarily on you searching for some string of text and promptly receiving a comprehensive, accurate list of relevant results.
Imagine if any of these sites regularly took 10 seconds or more to return search results. We’re sure you wouldn’t stay long before moving on to a speedier site. Or imagine having to wade through page after page of irrelevant results to locate what you were looking for. No store or streaming service would survive if you couldn’t find what you came for. Search is a critical online operation that feels truly magical: it returns immediate, relevant results from millions or billions of rows of data.
Working this magic for your own site, however, doesn’t have to be as difficult as you might assume. As with many other things, AWS provides not one but two services for building low-cost, high-throughput search solutions: CloudSearch and ElasticSearch. If they sound similar, that’s because they are. However, there are subtle distinctions that may influence your choice of service.
CloudSearch’s search engine is powered by Apache Solr. More precisely, it is Solr-based software that has been modified so it can be managed through the AWS console or API.
The Solr project describes the software as highly dependable, scalable, and fault-tolerant, offering distributed indexing, replication, load-balanced querying, automatic failover and recovery, centralized configuration, and more. Many of the world’s major websites use Solr to power their search and navigation functionality.
Such power, however, does not come easily. Using Solr to power your site’s search still necessitates installation (either on a dedicated server, in Docker, or in a Kubernetes cluster) and the configuration of extra nodes if fault tolerance is desired. Make sure you prepare for regular server maintenance and Solr updates as needed.
Most developers do not want to spend time running applications like this and would gladly swap this time sink for a better alternative. CloudSearch is that alternative.
CloudSearch “makes it simple and cost-effective to set up, administer, and grow a search solution for your website or application,” according to AWS. It’s a managed service, similar to EC2 or S3, in which AWS runs the application for you (Solr in this case). The AWS console and APIs abstract away the application itself. Installation, upgrades, and management all happen behind the scenes. With a managed service, you can get to work considerably faster and with much less administrative overhead; there’s no need to set up servers or install the software.
To get started with CloudSearch, you’ll need to first create a search domain, which “encapsulates a collection of data you wish to search, the search instances that process your search queries, and a configuration that determines how your data is indexed and searched.” It’s basically all of your data and the choices you’ve configured for searching it.
As noted earlier, CloudSearch supports multiple nodes for scalability. Setting up these nodes is the first step in configuring CloudSearch: you select an instance type and the number of nodes. Instance types range from small to 2xlarge, offering progressively larger data capacity and faster indexing.
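To make the setup step concrete, here is a minimal Python sketch of creating a domain and requesting an instance type and replica count. It assumes a client object that behaves like boto3’s `cloudsearch` client (`create_domain` and `update_scaling_parameters`). The domain name `books`, the instance type value, and the `RecordingClient` stand-in are illustrative assumptions, included only so the sketch runs without AWS credentials.

```python
from typing import Any, Dict, List, Tuple


def provision_search_domain(client: Any, domain_name: str,
                            instance_type: str = "search.small",
                            replicas: int = 2) -> Dict[str, Any]:
    """Create a search domain, then request an instance type and replica count.

    `client` is assumed to behave like boto3.client("cloudsearch").
    """
    client.create_domain(DomainName=domain_name)
    scaling = {
        "DesiredInstanceType": instance_type,
        "DesiredReplicationCount": replicas,
    }
    client.update_scaling_parameters(DomainName=domain_name,
                                     ScalingParameters=scaling)
    return scaling


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def create_domain(self, **kwargs: Any) -> None:
        self.calls.append(("create_domain", kwargs))

    def update_scaling_parameters(self, **kwargs: Any) -> None:
        self.calls.append(("update_scaling_parameters", kwargs))


fake = RecordingClient()
provision_search_domain(fake, "books", instance_type="search.small", replicas=2)
for name, kwargs in fake.calls:
    print(name, kwargs)
```

With real credentials, passing `boto3.client("cloudsearch")` in place of the stand-in should issue the same two calls against your account.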
Next, you need to define your index fields. Think of them as spreadsheet columns: the index field configuration must list every field your real data will contain so that CloudSearch knows what to expect. The configuration can come from a file in a format such as JSON, XML, or CSV that you upload or store in an S3 bucket, or from a DynamoDB table in your AWS account. Regardless of format, the following step is to review the field types.
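As a rough illustration of what an index field configuration looks like, the sketch below guesses a CloudSearch field type for each key in one sample record. The type names (`int`, `double`, `literal`, `text`, `date`, and their `-array` variants) are CloudSearch index field types; the guessing rules and the sample book record are our own simplifying assumptions, not the logic CloudSearch itself applies, which is why checking the field types afterwards matters.

```python
from datetime import datetime
from typing import Any, Dict, List


def guess_field_type(value: Any) -> str:
    """Map a Python value to a plausible CloudSearch field type (heuristic)."""
    if isinstance(value, bool):          # bool is a subclass of int, so check it first
        return "literal"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):          # flat lists only; nested lists are not handled
        inner = guess_field_type(value[0]) if value else "text"
        return inner + "-array"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "date"
        except ValueError:
            # Assumption: multi-word strings are free text, single tokens are exact-match
            return "text" if " " in value else "literal"
    return "text"


def build_index_fields(sample: Dict[str, Any]) -> List[Dict[str, str]]:
    """Build one field definition per key in the sample record."""
    return [
        {"IndexFieldName": name, "IndexFieldType": guess_field_type(value)}
        for name, value in sample.items()
    ]


sample_book = {
    "title": "The Left Hand of Darkness",
    "isbn": "9780441478125",
    "year": 1969,
    "price": 8.99,
    "genres": ["scifi", "classic"],
    "published": "1969-03-01T00:00:00Z",
}
for field in build_index_fields(sample_book):
    print(field)
```

Each dictionary has the shape that boto3’s `define_index_field` accepts as its `IndexField` argument, so the output could be reviewed by hand and then applied one field at a time.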
The actual data upload follows, using the same file formats or DynamoDB. One intriguing option is pulling your files from an S3 bucket. This opens up plenty of automation possibilities involving other parts of your application that already feed data into S3.
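For JSON uploads, CloudSearch expects a document batch: a list of `add` and `delete` operations, each with a document `id`. The sketch below builds such a batch with the standard library only; the book records and IDs are made up for illustration.

```python
import json
from typing import Any, Dict, Iterable


def build_document_batch(adds: Dict[str, Dict[str, Any]],
                         deletes: Iterable[str] = ()) -> str:
    """Serialize add and delete operations as a JSON document batch."""
    operations = [
        {"type": "add", "id": doc_id, "fields": fields}
        for doc_id, fields in adds.items()
    ]
    operations += [{"type": "delete", "id": doc_id} for doc_id in deletes]
    return json.dumps(operations)


batch = build_document_batch(
    adds={
        "book-1": {"title": "The Left Hand of Darkness", "year": 1969},
        "book-2": {"title": "Dune", "year": 1965},
    },
    deletes=["book-0"],
)
print(batch)
```

One way to send the result, assuming boto3 is available, is the `cloudsearchdomain` client’s `upload_documents` call with `contentType="application/json"`, pointed at your domain’s document endpoint. That step is not exercised here.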
Consider putting all of your microservices’ logs into a bucket: many log files, all filled with useful real-time data and metrics, but valuable only if you can find what you need in them. Point CloudSearch at that bucket, and you have an important new tool for extracting actionable data from those logs. To keep real-time search results accurate, your search domain is updated quickly as the data source changes.
After CloudSearch has ingested all of the data, regardless of source, you can run your first search, either through the AWS console or over the HTTP API (with your search parameters in the URL as query strings). Because search is exposed through an API, the possibilities for what you can do with your queries are nearly limitless. See the official CloudSearch getting started docs for more on configuring a search domain with example data.
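A search over the HTTP API is simply a GET request with the query encoded in the URL. The sketch below assembles such a URL using the `2013-01-01` API version path and the `q`, `size`, and `return` parameters. The endpoint hostname is a placeholder; your domain’s real search endpoint is shown in the AWS console.

```python
from typing import Sequence
from urllib.parse import urlencode


def build_search_url(endpoint: str, query: str, size: int = 10,
                     return_fields: Sequence[str] = ("title",)) -> str:
    """Build a CloudSearch search URL with the parameters as query strings."""
    params = {
        "q": query,                          # the search terms
        "size": size,                        # maximum number of hits to return
        "return": ",".join(return_fields),   # fields to include in each hit
    }
    return f"https://{endpoint}/2013-01-01/search?{urlencode(params)}"


# Placeholder endpoint; substitute the search endpoint of your own domain.
url = build_search_url(
    "search-books-example.us-east-1.cloudsearch.amazonaws.com",
    "left hand",
    size=5,
    return_fields=("title", "year"),
)
print(url)
```

Fetching that URL with any HTTP client returns the matching documents as JSON, provided the domain’s access policy allows the request.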
So that’s CloudSearch, but as you’ll recall, the purpose of this essay isn’t only to expound on it, but also to show the contrasts between it and ElasticSearch, AWS’s other managed search service.
So what exactly is ElasticSearch? Like CloudSearch, it is an AWS-managed service that handles all server and application administration for you, which is always a bonus for busy developers. In this case the underlying software is ElasticSearch itself, one of the three open-source projects (along with Logstash and Kibana) that make up the ELK stack.
Getting started with ElasticSearch on AWS is fairly similar to getting started with CloudSearch, with one exception: you don’t need to define index fields first, because ElasticSearch builds the index automatically from the actual data. Otherwise, the process is the same: create a search domain, upload data, then search from the console or endpoint. Again, the AWS documentation is a good place to start.
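To show what uploading data without predefined fields can look like, here is a small sketch that builds a request body for ElasticSearch’s `_bulk` API: newline-delimited JSON in which each document is preceded by an action line. The index name `books` and the records are illustrative assumptions. With dynamic mapping, ElasticSearch infers the field types from the documents it receives.

```python
import json
from typing import Any, Dict


def build_bulk_body(index: str, docs: Dict[str, Dict[str, Any]]) -> str:
    """Build an NDJSON body for the _bulk API: one action line, then one source line."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "books",
    {
        "1": {"title": "The Left Hand of Darkness", "year": 1969},
        "2": {"title": "Dune", "year": 1965},
    },
)
print(body, end="")
```

The body would be sent as a POST to the domain endpoint’s `/_bulk` path with a `Content-Type` of `application/x-ndjson`. Depending on the domain’s access policy, the request may also need to be signed, which this sketch does not cover.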
ElasticSearch has excellent scalability, allowing numerous nodes in a cluster to perform data processing and querying.
An extra trick, however, is Kibana, another component of the ELK stack. As its website puts it, Kibana can be used to build powerful dashboards and stunning analytics for your data. It runs as part of your managed ElasticSearch deployment; all you have to do is look in the AWS console for the URL endpoint to access your instance.
Now that we’ve examined both managed “Search-as-a-Service” products, it’s time to answer the real question: which one should you use? As with any good tech stack architecture discussion, it depends. The two search solutions differ in their strengths, targeted use cases, ecosystems, and pricing.
The greatest difference is that CloudSearch is fully managed by AWS, so there isn’t much you can do under the hood. Depending on your use case, this may be a positive; you may just want something easy and powerful that can be set up in a few clicks, more of a set-it-and-forget-it solution.
ElasticSearch, on the other hand, is a full-fledged open-source project with a massive ecosystem and community. The degree of customization and extensibility is great. If that is what you’re looking for instead of the turnkey CloudSearch, you’re good to go.
Your specific use case also influences which managed solution makes the most sense. CloudSearch is a more focused search solution; Amazon even uses it to power search on amazon.com. ElasticSearch is more of an extensible framework built around a search core that can be used for analysis, dashboards, visualizations, and integration with other ecosystem products for content management or business intelligence.
Security is a major consideration in any project that deals with data. Fortunately, both services include security features. Both offer built-in access control; CloudSearch in particular relies on IAM for authentication and authorization, providing a security model that is consistent with all other AWS services. ElasticSearch’s security features are based on a plugin called Shield, which allows for integration with numerous identity providers (IdPs) as well as encryption, auditing, IP filtering, and other security measures.
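As a rough sketch of what IAM-based access control can look like for CloudSearch, the snippet below builds an access policy document that allows a principal to run searches. The account ID and user name are hypothetical, and we are assuming `cloudsearch:search` and `cloudsearch:suggest` as the action names for read-only access; check the CloudSearch access policy documentation before relying on them.

```python
import json
from typing import List


def build_access_policy(principal_arns: List[str], actions: List[str]) -> str:
    """Build a JSON access policy that allows the given principals the given actions."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arns},
                "Action": actions,
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Hypothetical principal; replace with a real IAM user or role ARN.
policy_json = build_access_policy(
    ["arn:aws:iam::123456789012:user/search-frontend"],
    ["cloudsearch:search", "cloudsearch:suggest"],
)
print(policy_json)
```

A policy like this is attached to the search domain itself, for example through the console or boto3’s `update_service_access_policies`, so that only the listed principals can query it.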
Finally, pricing: for comparable-sized instance types, CloudSearch is somewhat more costly per hour. This is a difficult comparison to make, however, because the instance types aren’t exactly equivalent.
The distinctions between CloudSearch and ElasticSearch seem slight at first, but on closer examination the two begin to diverge. Despite their differences, both are strong managed tools for developers who need to handle their users’ searches and interactions with vast volumes of data. Hopefully, we’ve covered enough to point you in the right direction so you can start resolving your search woes!