Implementers of enterprise search solutions have a fundamental decision to make between deploying a specialized piece of search hardware (MaxxCAT or Google), or a myriad of software solutions that range from toolkits, API’s, or full blown applications. This white paper outlines the considerable difference between the approaches in an effort to educate implementers on the relative merits of each approach.
If not the most important criteria, performance must rank highly in any candid evaluation. Software-only solutions are limited not only by the performance of the software itself, but also the underlying hardware, and the limitations imposed onto the software by the operating system. Increasing the performance of software-only solutions is not necessarily as easy as just running it on faster hardware. Doubling the clock speed, or number of cores in a general purpose server may have little to no impact on the performance of the software. Even techniques such as increasing physical memory, using faster disk drives, or multiple drives (which are more or less hardware techniques anyway) are hit or miss because the general purpose software running on general purpose hardware is not tuned, sized, or configured into an optimal configuration.
Furthermore, in order to run on the largest number of configurations, software only solutions must be built to the least common denominator of their projected installed base. Portability concerns, operating system dependencies and the inability to take advantage of specialized hardware including coprocessors or FPGAs hamper the design of general purpose software. There are many substantial differences between Operating Systems - the Microsoft NTFS file system has been optimized for one set of criteria, whereas with Linux, there are a wide variety of file systems available, all optimized for different criteria. In order to run optimally on Solaris, NT, or any of the multitude of flavors of Linux, a software solution developer would have a combinatorial challenge to maintain a code base that worked optimally on each platform.
One of the many reasons that the MaxxCAT™ product line offers performance, in some cases multiple orders of magnitude faster than software solutions, is because the solution is built as a single monolithic component. MaxxCAT™ engineers do not have to write device interfaces for every file system, or accept the limitations of a general purpose file system that is the LCD of the installed base. Instead, a single file system, the best file system for hierarchical performance is improved and optimized. The layers of code above the file system can take advantage of the fact that the system has been tuned to perform optimally for the single task of high performance search. Portability to other file systems, operating systems, or different processors is not a design constraint in any way. Instead, the monolithic unit is built at a level to perform its single task better than a software solution which needs configured to the lowest common denominator of platform and available hardware supplied by the customer.
Even slower performing appliances like the Google Mini have the benefit of being built for a single, known hardware environment, and can attempt to make the best use of the available hardware. The distinction between MaxxCAT™ and Google becomes the question of, at a given dollar level of investment, who gets more performance from the hardware.
Another performance indicator where there is a big distinction between hardware appliance based search solutions and software solutions is how scaling is handled. Scalability is where performance meets simplicity and either things get much more complicated, or they scale simply to handle growing data sets. In the case of a software solution, moving from 1 node to 2 nodes involves acquiring, configuring, possibly testing more hardware, or finding a suitable device from existing inventory. And then the same old process begins anew - install the software on the second machine, patch the software on the second machine, configure the operating system on the second machine etc. Then, depending upon the software, the data and user customizations may have to be migrated to the second node, applications that use the software may have to be reconfigured to become aware of the additional node, API config files may have to be adjusted etc. Scaling a software solution can be very complicated and time consuming, often requiring a monetary investment in support plans of the software developer, if available. Not very simple.
Within the MaxxCAT™ product line, adding an additional node is as simple as acquiring the second device from MaxxCAT™, plugging it in and connecting it to the network. Because the MaxxCAT™ design is monolithic, even at the cluster level, the nodes are self-aware and self activating. Each MaxxCAT™ unit can easily synchronize data, redistribute the query load, and they require no additional configuration or modification of user interfaces. This simplicity is possible because all of the components are designed to work together, and therefore, much of the complexity of converting a single instance of a search solution into a parallel instances is done inside the box, not by an administrator or technical expert.
Finally, once applications move from single threaded, single instance, to multi-homed, things become even more complicated for software based enterprise search solutions. Issues of concurrency, synchronization, latency are now foremost, and although these features are certainly programmed into software solutions, they suffer from the same limitations as general purpose hardware running general purpose software. The synchronization mechanisms, the latency time outs, etc have to drawn from the pool of LCDs, and have to either work across the universe of possible installed base, or allow for the user, again, to configure parameters to tune the distributive aspects of the solution. But this is not the case for well designed, hardware solutions like MaxxCAT™, where the configuration and optimization are built into the product, not left for the customer to struggle with.
Even the simplest of software solutions require a certain amount of handholding to install and administer. Software presupposes a working hardware platform, but there is also administrative overhead in installing the software, configuring the software, backing up the software, upgrading the software etc. This is all in addition to the tasks required to keep the underlying hardware running.
Specialized hardware appliances, be they firewalls, mailservers, or in the case of Enterprise Search, dedicated search hardware, have the intrinsic advantage of operating as a “black box” in the case of Google, or as a “grey box” in the case of MaxxCAT™. This box paradigm abstracts away much of the underlying complexity of the hardware, software and operating system, and exports a simple, but robust, API that can be manipulated to achieve desired results with a minimum of time spent tuning, configuring, patching, installing etc. MaxxCAT™ search appliances can be up and performing their job in as little as 5 minutes. Plug them in, hook up the network and search.
Well designed software can have the perception of being simpler to customize -- it’s software after all. Just change some bits around and go. Hardware, involves installing a physical device, wiring things etc. So often, software is perceived as having a potential edge here. However, most of the things that affect enterprise search results, do not need to be configured. Register word sizes are fine at 64 bits, the amount of disk storage can be safely fixed at some upper bound, and so on. Customization comes in only at the top level of the solution -- the user interface. What does the user see, what data sources does the appliance connect to and so on.
A well designed search appliance, such as the MaxxCAT™ EX5000 provides maximum configurability and customization at the top level of the application stack, by exporting an extremely simple mechanism for retrieval. The hardware can rapidly locate relevant results, interacting optimally with internal subsystems with zero configuration by the user. The part that matters, the part that needs to be customized is left exposed at the top level. So in this sense, hardware gives all of the simplicity of customization as software at the top level, while obscuring all of the nitty-gritty details of buffer allocation, disk scheduling, interrupt handling, etc. from the need to be customized or tuned. Not so with software -- the software has to be installed in a certain location, it has to be registered with the operating system for start up at boot time, it has to have security access to the required ports, etc. And then, the software still needs managed.
Other costs creep in when simplicity is lost. With a hardware solution, either the appliance is working, or it is not. However, with software, tracking down problems is much more time consuming. The software vendor blames the hardware vendor (or the customer!). The hardware vendor blames the operating system. The OS vendor blames the external storage manufacturer, and so on. The customer is stuck in the middle, playing traffic cop trying to determine the root of the problem. With a search appliance, it is obvious where the problem is, and much simpler to resolve. At least in the case of MaxxCAT™, hardware problems can be solved with a single phone call to one vendor, and possibly the replacement of faulty hardware.
An argument that has been constructed in favor of software is that the development of specialized hardware can not keep pace with the development of software. Anyone familiar with Moore’s Law, knows that new hardware cycles quickly . At any given time, software can cycle ahead. It is but a small matter, the argument goes, for software to take advantage of any new developments in general purpose hardware simply by upgrading the underlying hardware. The software-only solution can benefit from the upgrades to hardware, if the installed software resides on an upgradable machine assuming the hardware upgrades are not cost-prohibitive. Also contributing to this reasoning is that the speed of general purpose hardware increases much faster than the speed of specialized hardware because the cost and pace of development of general purpose hardware is spread across a much wider installed base. As such, it is only a matter of time for tomorrow’s general purpose laptop to be faster than today’s special purpose super computer. This is something that Maxxcat uses to it’s advantage.
By simply building its hardware platform from generally available commodity hardware, as opposed to specialized board or circuit designs. MaxxCAT™ has taken advantage of performance gains. By designing units from commodity hardware, Moore’s Law works in favor of hardware solutions, not against it. Just as software can become faster by virtue of the performance gains in the underlying general purpose hardware, so can hardware appliances. In either case, the hardware must be upgraded to take advantage of newer technology, however, the upgrade path on a hardware-only solution is much simpler. Just upgrade the hardware (plug it in, hook it to the network.) With software though, the underlying general purpose hardware, operating system, network must be installed, and then the software must be loaded, and perhaps reconfigured to take advantage of new capabilities in the underlying platform.
In conclusion, we hope that this paper has presented some of the distinctions between hardware-based solutions and software-based solutions, and will help to persuade those who are not yet convinced to further explore the performance, simplicity and scalability aspects of specialized hardware solutions.