Enterprise Content Management (ECM) systems are useful for managing and version-controlling an organization's information assets. But let's be honest: they don't necessarily have the best search mechanisms for retrieving the valuable information they contain. The default search capabilities of ECM systems create new silos, where users must log in to multiple applications to find information. Our mission at Google is to provide a unified search experience across all enterprise content sources. Because ECM systems differ in their interfaces and support different security mechanisms, specific connectors need to be built for each of them.

We just open-sourced an interesting project that will make it easy to build connectors to ECM systems. This new connector framework provides rich service provider interfaces (SPIs) for writing connectors to different content sources. It also provides a security infrastructure for securely indexing and serving documents stored in ECM systems. Finally, it provides rich administrative capabilities for configuring connectors to different ECM systems from one central place. The framework is designed for building connectors to ECM systems as well as to other content sources, whether or not their content is web-enabled.
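To make the idea of a connector SPI more concrete, here is a minimal sketch in Java. The names `ContentConnector`, `SourceDocument`, `InMemoryConnector` and `traverseAll` are hypothetical and are not the Connector Manager's actual SPI (that is documented in the project's javadocs), and the checkpoint-based batch traversal is likewise an assumption chosen for illustration. It shows the intended division of labor: the connector author implements traversal and authorization for one content source, and the framework drives traversal generically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified connector SPI for illustration only.
public class ConnectorSketch {

    /** A document pulled from a content source, together with its readers. */
    record SourceDocument(String id, String content, Set<String> allowedUsers) {}

    /** What a connector author would implement for one content source (hypothetical). */
    interface ContentConnector {
        /** Returns up to batchSize documents whose ids sort after the checkpoint (null = start). */
        List<SourceDocument> traverse(String checkpoint, int batchSize);

        /** Asks the source system whether a user may read a document right now. */
        boolean isAuthorized(String user, String documentId);
    }

    /** A toy in-memory repository standing in for a real ECM system. */
    static class InMemoryConnector implements ContentConnector {
        private final TreeMap<String, SourceDocument> repo = new TreeMap<>();

        void put(SourceDocument doc) { repo.put(doc.id(), doc); }

        @Override
        public List<SourceDocument> traverse(String checkpoint, int batchSize) {
            // Resume strictly after the checkpoint id, in id order.
            Map<String, SourceDocument> tail =
                checkpoint == null ? repo : repo.tailMap(checkpoint, false);
            List<SourceDocument> batch = new ArrayList<>();
            for (SourceDocument doc : tail.values()) {
                if (batch.size() == batchSize) break;
                batch.add(doc);
            }
            return batch;
        }

        @Override
        public boolean isAuthorized(String user, String documentId) {
            SourceDocument doc = repo.get(documentId);
            return doc != null && doc.allowedUsers().contains(user);
        }
    }

    /** Framework side: pull every document in batches, advancing the checkpoint each time. */
    static List<String> traverseAll(ContentConnector connector, int batchSize) {
        List<String> fed = new ArrayList<>();
        String checkpoint = null;
        while (true) {
            List<SourceDocument> batch = connector.traverse(checkpoint, batchSize);
            if (batch.isEmpty()) break;
            for (SourceDocument doc : batch) {
                fed.add(doc.id()); // a real framework would send the document to the indexer here
            }
            checkpoint = batch.get(batch.size() - 1).id();
        }
        return fed;
    }

    public static void main(String[] args) {
        InMemoryConnector connector = new InMemoryConnector();
        connector.put(new SourceDocument("doc-2", "Budget", Set.of("alice")));
        connector.put(new SourceDocument("doc-1", "Q3 plan", Set.of("alice", "bob")));
        connector.put(new SourceDocument("doc-3", "Roadmap", Set.of("bob")));
        System.out.println(traverseAll(connector, 2));            // [doc-1, doc-2, doc-3]
        System.out.println(connector.isAuthorized("bob", "doc-2")); // false
    }
}
```

Because the framework loop sees only the interface, the same traversal code would work for any repository that implements it; a production framework would presumably also persist the checkpoint so that traversal can resume after a restart.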

The open source project contains the source for the Connector Manager, the Connector SPI interfaces, associated javadocs, sample code and test suites. This is an early technical preview of the Connector Manager project and is not (yet) an officially supported feature of the Google Search Appliance. We wanted to get the word out sooner and invite the broader developer and partner community to give us feedback. Check out the Connector Manager project and let us know your thoughts on it.


Our OneBox technology has been part of our Search Appliance product since last April, and it's really taking off with our customers. We're constantly hearing about cool and interesting uses of the technology for integrating real-time data into enterprise search results, in the same way that weather forecasts can be integrated into search results. One sly Google engineer connected our internal search appliance to a database of Googlers' license plates, greatly easing the search for the sad soul who left their headlights on, blocked somebody else in, or was sideswiped by a runaway Prius. Just type "plate" followed by any series of numbers or letters, and you can immediately drop a message to the car's owner.
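The license-plate lookup comes down to two steps: recognize a trigger keyword in the query, then look up the rest of the query in a data source. The sketch below assumes an in-memory map in place of the real database, and every name in it (`PlateOneBox`, `Owner`, `handle`, `normalize`) is hypothetical. It covers only the lookup logic, not how the Search Appliance actually invokes a OneBox provider.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical OneBox-style handler: "plate <id>" -> owner contact details.
public class PlateOneBox {
    // Trigger: the keyword "plate" followed by letters, digits, spaces or hyphens.
    private static final Pattern TRIGGER =
        Pattern.compile("^\\s*plate\\s+([A-Za-z0-9 -]+)\\s*$", Pattern.CASE_INSENSITIVE);

    record Owner(String name, String email) {}

    private final Map<String, Owner> ownersByPlate; // keys are normalized plates

    PlateOneBox(Map<String, Owner> ownersByPlate) {
        this.ownersByPlate = ownersByPlate;
    }

    /** Strips spaces and punctuation and upper-cases, so "7abc-123" matches "7ABC123". */
    static String normalize(String plate) {
        return plate.replaceAll("[^A-Za-z0-9]", "").toUpperCase(Locale.ROOT);
    }

    /** Returns a result for the query, or empty if the query does not trigger this OneBox. */
    Optional<String> handle(String query) {
        Matcher m = TRIGGER.matcher(query);
        if (!m.matches()) return Optional.empty();
        String plate = normalize(m.group(1));
        if (plate.isEmpty()) return Optional.empty();
        Owner owner = ownersByPlate.get(plate);
        if (owner == null) return Optional.of("No owner found for plate " + plate);
        return Optional.of("Plate " + plate + " belongs to " + owner.name()
            + " <" + owner.email() + ">");
    }

    public static void main(String[] args) {
        PlateOneBox box = new PlateOneBox(
            Map.of("7ABC123", new Owner("Pat Example", "pat@example.com")));
        System.out.println(box.handle("plate 7abc 123").orElse("(no OneBox result)"));
        System.out.println(box.handle("weather boston").orElse("(no OneBox result)"));
    }
}
```

Normalizing both the stored keys and the query is a design choice that lets users type a plate however they remember it, with or without spaces and dashes.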


Lakehead University became our first large-scale deployment of Google Apps for Education in Canada, and it shared some truly impressive statistics with us. Lakehead transitioned 38,000 students, faculty and alumni to Google Apps for Education in just one week. We think that getting all three of these groups onto the same collaboration system should have a huge impact on learning (and on student social calendars), as well as keep alumni involved in the campus community. Users should be excited about going from 60 MB of storage on their previous email system to 2 GB with Google Apps, which eliminates the need to delete those large project files that turn out to be useful come finals time.

What's most impressive is that Shahzad Jafri, Lakehead's Chief Information Officer, estimates that Lakehead will save $2-3 million in maintenance costs annually as well as $6 million in infrastructure costs, which is a big win for us and for them! Read their press release for more information.


We came across another interesting article published by New Idea Engineering in its series: "Enterprise Search: Mapping Security Requirements to Enterprise Search". In this article, Mark discusses the importance of document-level security and the two methods of implementing it, early binding and late binding. We completely agree with Mark on the importance of supporting document-level security in enterprise search systems: anything short of fine-grained access control is no security at all. The Google Search Appliance supports document-level security across heterogeneous enterprise content stores.

While we agree with Mark on some of the benefits of early-binding security filtering, in which ACLs are captured at indexing time and results are filtered against that copy, certain limitations make it impractical (if not impossible) for most deployments today. One of the main issues with early binding is keeping it synchronized with the access control list (ACL) policies stored in the content systems. ACL policies change frequently, so a cached copy of them quickly falls out of sync with the source system. This can cause severe breaches of company security and allow sensitive IP to leak within the organization.
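The synchronization problem is easy to reproduce in a toy model. In this hypothetical Java sketch, the early-binding index copies ACLs at crawl time, while the late-binding path asks the source system at query time. The classes and the list of candidate hits are illustrative assumptions, not the Search Appliance's implementation. Revoking a user's access after the crawl shows the difference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AclBindingDemo {
    /** Stands in for the content system: the authoritative ACLs, which may change at any time. */
    static class SourceSystem {
        final Map<String, Set<String>> acls = new HashMap<>();

        boolean canRead(String user, String docId) {
            return acls.getOrDefault(docId, Set.of()).contains(user);
        }
    }

    /** Early binding: ACLs are copied into the index at crawl time and filtered there. */
    static class EarlyBindingIndex {
        private final Map<String, Set<String>> indexedAcls = new HashMap<>();

        void crawl(SourceSystem source) {
            indexedAcls.clear();
            // Snapshot: later changes in the source are NOT seen until the next crawl.
            source.acls.forEach((doc, users) -> indexedAcls.put(doc, new HashSet<>(users)));
        }

        List<String> search(String user, List<String> hits) {
            List<String> visible = new ArrayList<>();
            for (String doc : hits) {
                if (indexedAcls.getOrDefault(doc, Set.of()).contains(user)) visible.add(doc);
            }
            return visible;
        }
    }

    /** Late binding: each candidate hit is checked against the source system at query time. */
    static List<String> lateBindingSearch(SourceSystem source, String user, List<String> hits) {
        List<String> visible = new ArrayList<>();
        for (String doc : hits) {
            if (source.canRead(user, doc)) visible.add(doc);
        }
        return visible;
    }

    public static void main(String[] args) {
        SourceSystem source = new SourceSystem();
        source.acls.put("merger-plan", new HashSet<>(Set.of("ceo", "bob")));

        EarlyBindingIndex index = new EarlyBindingIndex();
        index.crawl(source);

        // Bob's access is revoked in the source system after the last crawl.
        source.acls.get("merger-plan").remove("bob");

        List<String> hits = List.of("merger-plan");
        System.out.println("early binding: " + index.search("bob", hits));              // [merger-plan] (stale)
        System.out.println("late binding:  " + lateBindingSearch(source, "bob", hits)); // []
    }
}
```

The trade-off is cost: late binding performs one authorization check against the source system per candidate result on every query, which is the price of always reflecting the current ACLs.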

The second issue is the lack of implemented standards for introspecting ACL policies. Without a standard way of reading policies from source systems, companies face difficult implementations or can provide secure results only within a homogeneous system. The new Microsoft Office SharePoint Server (MOSS) 2007 search system is a prime example: security is enforced only on content stored in SharePoint, not across other content systems, web servers, or databases.

At Google, we're working to establish a scalable, standards-driven approach to early-binding security filtering. For that to work, content systems (web servers, file servers, document management systems, portals, etc.) need implemented standards for introspecting ACL policies and for notifying consumers when those policies change. Until then, we continue to support late-binding, document-level security filtering, delivering the highest-quality, highly secure search results to tens of millions of users in companies worldwide every day.
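The paragraph above names two capabilities that content systems would need: ACL introspection and change notification. This hypothetical sketch shows how an early-binding index could stay current if a source offered both. `AclChangeListener`, `readAllAcls` and `setAcl` are invented for illustration and do not correspond to any existing standard.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AclChangeNotification {
    /** Hypothetical callback a content system would invoke whenever an ACL changes. */
    interface AclChangeListener {
        void aclChanged(String docId, Set<String> newReaders);
    }

    /** A content system that supports both introspection and change notification. */
    static class NotifyingSource {
        private final Map<String, Set<String>> acls = new HashMap<>();
        private final List<AclChangeListener> listeners = new ArrayList<>();

        void subscribe(AclChangeListener listener) { listeners.add(listener); }

        /** Introspection: a read-only snapshot of every document's readers. */
        Map<String, Set<String>> readAllAcls() { return Map.copyOf(acls); }

        /** Updates an ACL and pushes the change to every subscriber. */
        void setAcl(String docId, Set<String> readers) {
            Set<String> copy = Set.copyOf(readers);
            acls.put(docId, copy);
            for (AclChangeListener listener : listeners) listener.aclChanged(docId, copy);
        }
    }

    /** Early-binding index whose ACL copy is kept current by change notifications. */
    static class SyncedIndex implements AclChangeListener {
        private final Map<String, Set<String>> indexedAcls = new HashMap<>();

        void bootstrap(NotifyingSource source) {
            indexedAcls.putAll(source.readAllAcls()); // initial introspection
            source.subscribe(this);                   // then follow changes
        }

        @Override
        public void aclChanged(String docId, Set<String> newReaders) {
            indexedAcls.put(docId, newReaders);
        }

        boolean canSee(String user, String docId) {
            return indexedAcls.getOrDefault(docId, Set.of()).contains(user);
        }
    }

    public static void main(String[] args) {
        NotifyingSource source = new NotifyingSource();
        source.setAcl("merger-plan", Set.of("ceo", "bob"));

        SyncedIndex index = new SyncedIndex();
        index.bootstrap(source);
        System.out.println(index.canSee("bob", "merger-plan")); // true

        source.setAcl("merger-plan", Set.of("ceo")); // revocation is pushed to the index
        System.out.println(index.canSee("bob", "merger-plan")); // false
    }
}
```

The sketch is single-threaded; in a concurrent system, the ordering between the initial snapshot and the subscription would need care so that a change arriving in between is not lost.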