As you might know, last August, we launched "10 million docs in a box". Building on the premise of scale with simplicity, we (the engineering team) challenged ourselves to see how much further we could go, while still keeping the architecture extremely simple. After many late-night sessions spent diagramming on the whiteboards and chugging cappuccinos, we had a breakthrough.
The end result was a new architecture: (GSA)n. When we tested it out, the product manager was pretty excited about all the new features and search power. He was used to hearing about the millions of docs we could handle – but this time we were going to push it to a new realm: billions.
The idea was simple: build technology to connect as many appliances as you'd like, whether in one location, separated across departments, or even spread across continents, and still provide a unified set of results to the end user, the employee searching for an elusive document or piece of information. This would not only give our customers an unparalleled ability to scale, but also enable them to integrate all the data in their organizations. Information doesn’t do you much good if you can’t find it! That was our guiding principle.
One of our beta customers, MTCSC Inc., really needs the geographic integration. When we caught up with them, MTCSC was in the midst of deploying over 50 GSAs all over the world for a federal customer, connecting to over 2,500 data sources that hold data on websites, file shares, databases, and SharePoint servers. The new GSA 6.0 architecture is now helping them integrate information from the varied data centers to provide users with a single, unified set of results.
So imagine there is a database that might “live” in Egypt, some documents in a data center in Sydney, and a file share whose home base is Los Angeles. The new GSA 6.0 integration can handle searching through all those data stores and give the employee who is looking a simple page of search results, one that looks as easy to use as Google.com, even though the backend search is really complicated. And, since we were feeling ambitious, we added a Ranking Framework (where administrators can easily feed in server logs and other enterprise-specific information to improve the relevance of search results), multiple new biasing options, and an administrative API to provide more control for automation of common tasks. We also added support for both early binding and late binding, providing organizations with flexible security policies to meet their needs.
The bottom line: safer, higher-quality, more customizable enterprise search with (GSA)n. This morning, both Google and our customers, including MTCSC, spoke at an event on all the new developments leading to (GSA)n and the 6.0 version. We talked so much about searching a billion documents that we decided to try something we’ve never seen done before: set up and showcase the actual infrastructure required to search a billion docs, and you can see it here. It is surprisingly small and simple, and it is pretty cool to know that we can now take the amount of content in the entire Google.com index in the year 2000 (when Google.com was searching through just a billion docs) and pack it into a server rack that could fit in the corner of my living room. And I have a normal-sized living room.
Posted by Shamim Alpha, Enterprise Search Engineer
Learn more about the Google Search Appliance 6.0 at google.com/gsa