The EILF Project
Please note: at the Creation Review, the project name will be changed to SMILA (The SeMantic Information Logistics Architecture).
Introduction
The EILF project is a proposed open source project under the Eclipse Technology Project.
According to the Eclipse Development Process, this proposal is intended to declare the intent and scope of the EILF project as well as to gather input from the Eclipse community. You are invited to comment on and/or join the project. Please send all feedback to the http://www.eclipse.org/newsportal/thread.php?group=eclipse.technology.eilf newsgroup.
Background
Enterprise information systems have mainly focused on structured information residing in databases, ERP systems, etc. Now, however, the management of unstructured information is receiving more and more attention in the market. The term Information Access Management (IAM) has been coined as a name for a more general approach that bridges the two worlds.
Some vendors address the need for IAM tools with proprietary approaches that mainly originate in their core market, such as search engine technology. In addition, attempts have been made in the open source community to address specific parts of IAM (cf. UIMA).
However, a standardized architecture covering all parts of the IAM process on an enterprise level is still missing. This includes the standardization of the overall architecture and frequently used building blocks as well as a methodology for implementing IAM systems.
With many years of project and product development experience in information logistics, search and related IAM applications, empolis GmbH and brox IT-Solutions GmbH have decided to join forces. They are now seeking further interested parties, e.g. Eclipse members, to launch the Enterprise Information Logistics Framework (EILF) project, which addresses the aspects mentioned above.
Description
The EILF project will deliver the foundation for building IAM systems ranging from medium-sized to enterprise-level installations. To do so, EILF will define and implement a standard architecture that makes intensive use of the OSGi model to support componentization and extensibility.
Mainly focusing on the server-side architecture, EILF will provide a platform enabling all relevant steps in IAM, including management of data sources, data access, pre-processing of (structured and mainly unstructured) data, linguistic analysis, indexing, search and so on.
Figure 1: Architecture Overview of EILF
Components from existing Eclipse projects will be adapted and integrated into the EILF platform in order to implement specific steps in typical IAM processes. Furthermore, EILF will deliver new components as needed for the targeted IAM processes.
As an example, the EILF platform will allow for the implementation of an enterprise information system by
- collecting documents from a variety of (external and internal) data sources
- pre-processing this data
- enriching it with semantic metadata
- indexing data and metadata
- making all information accessible either via an integrated search platform or via direct access through web services
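The five steps above can be illustrated with a small in-memory sketch. It is written in Python for brevity. The names Record, collect, preprocess, enrich, InvertedIndex and run_pipeline are hypothetical and do not describe an EILF API. The "enrichment" step is reduced to tokenization, standing in for real linguistic analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A document flowing through the pipeline (hypothetical type)."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def collect(sources):
    # Step 1: gather raw documents from every configured data source.
    for source in sources:
        for doc_id, text in source.items():
            yield Record(doc_id, text)


def preprocess(record):
    # Step 2: normalize whitespace and case before analysis.
    record.text = " ".join(record.text.split()).lower()
    return record


def enrich(record):
    # Step 3: attach metadata; here only a token list stands in for semantic analysis.
    record.metadata["tokens"] = record.text.split()
    return record


class InvertedIndex:
    # Step 4: index data and metadata as term -> set of record ids.
    def __init__(self):
        self.postings = {}
        self.records = {}

    def add(self, record):
        self.records[record.id] = record
        for token in record.metadata["tokens"]:
            self.postings.setdefault(token, set()).add(record.id)

    def search(self, query):
        # Step 5: return ids of records that contain all query terms.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


def run_pipeline(sources):
    index = InvertedIndex()
    for record in collect(sources):
        index.add(enrich(preprocess(record)))
    return index
```

For example, run_pipeline([{"a": "Hello  World"}, {"b": "hello EILF"}]).search("hello") yields both ids. In an actual platform, each function would be a separately deployable component.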
Figure 2: Architecture Overview - Service Model of EILF
To summarize, EILF will deliver a platform as well as dedicated components for implementing next-generation IAM and information logistics solutions.
Thus, EILF will allow enterprise organizations to use the latest information access and management technology in their business processes.
Scope
The EILF project will address a set of features and components that are required for implementing typical information logistics platforms. The most important aspects are briefly described below.
Standardized and Scalable Architecture
EILF will deliver a highly scalable architecture for IAM processes based on well-established standards. The objective is to serve a wide range of application scenarios, from simple, medium-sized search environments to enterprise-level business information systems. Scalability will be achieved by a design that allows additional resources (e.g. hardware) to be added easily to a running system in order to increase data throughput. For use cases that require high data throughput, the underlying infrastructure will probably rely on messaging systems and queuing, while other approaches will be needed for synchronous processes.
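The queuing approach can be sketched with an in-process queue standing in for a messaging system. This is an illustrative assumption only. The function process() is a hypothetical placeholder. In a distributed setup, a message broker would replace queue.Queue so that consumers on additional machines could join. The point of the sketch is that throughput is scaled by adding consumers, without changing the producer.

```python
import queue
import threading


def process(item):
    # Hypothetical placeholder for an I/O-bound processing step.
    return item * 2


def run_workers(items, num_workers):
    """Distribute items over a shared queue consumed by num_workers threads."""
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this worker
                tasks.task_done()
                break
            result = process(item)
            with lock:  # results list is shared between threads
                results.append(result)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real items
    tasks.join()  # wait until every item and sentinel has been handled
    for t in threads:
        t.join()
    return sorted(results)
```

For CPU-bound steps in Python, processes rather than threads would be needed. The queue-based decoupling stays the same.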
Componentization
A major focus of EILF will be on componentization of the overall system architecture. This ensures that other open source tools, products by different vendors, or even project-specific extensions can easily be plugged into the system. Componentization will be based on OSGi. The various OSGi services will need to be orchestrated by an appropriate, user-friendly mechanism that has yet to be identified.
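To illustrate what "plugging in" a component means, the following sketch shows a toy service registry. Implementations are looked up by interface, and the highest-ranked one wins, loosely mirroring the service ranking idea in OSGi. It is not the OSGi API, which is Java-based. ServiceRegistry, TextAnalyzer and both analyzer classes are hypothetical.

```python
from typing import Protocol


class TextAnalyzer(Protocol):
    """Hypothetical service interface that components can implement."""
    def analyze(self, text: str) -> list[str]: ...


class ServiceRegistry:
    """Toy registry: services are registered under an interface with a ranking."""
    def __init__(self):
        self._services = {}  # interface -> list of (ranking, service)

    def register(self, interface, service, ranking=0):
        self._services.setdefault(interface, []).append((ranking, service))

    def get(self, interface):
        # Return the highest-ranked service for the interface, or None.
        candidates = self._services.get(interface, [])
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]


class WhitespaceAnalyzer:
    # Basic built-in implementation.
    def analyze(self, text):
        return text.split()


class VendorAnalyzer:
    # Stands in for a third-party or project-specific plug-in.
    def analyze(self, text):
        return [t.lower().strip(".,") for t in text.split()]
```

Registering VendorAnalyzer with a higher ranking replaces the basic analyzer for every caller. No code that uses the TextAnalyzer interface has to change.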
In addition, we believe that configuring and orchestrating the appropriate components should not be reserved for highly technical people programming new “workflows”; an experienced and trained user should be able to do this as well. Standards such as BPEL appear to be promising candidates for this, in particular once appropriate tools become available.
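As a much simplified stand-in for BPEL-style orchestration, the sketch below lets a workflow be declared as data (here JSON) and executed against a fixed set of developer-provided steps. The step names and the JSON layout are hypothetical. The only idea shown is that a trained user edits the definition, not the code.

```python
import json

# Hypothetical step implementations, registered once by a developer.
STEPS = {
    "lowercase": lambda doc: {**doc, "text": doc["text"].lower()},
    "strip": lambda doc: {**doc, "text": doc["text"].strip()},
    "word_count": lambda doc: {**doc, "words": len(doc["text"].split())},
}


def load_workflow(config_json):
    """Parse a workflow definition that a trained user could edit without coding."""
    config = json.loads(config_json)
    unknown = [s for s in config["steps"] if s not in STEPS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")
    return config["name"], config["steps"]


def run_workflow(steps, doc):
    # Apply the configured steps in order; each step returns a new document dict.
    for step in steps:
        doc = STEPS[step](doc)
    return doc
```

Validating step names when the definition is loaded reports configuration mistakes to the user before any document is processed.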
Data Source Management
For any IAM platform, access to a variety of data sources is essential. Some of these data sources can be considered fairly standardized (e.g. when crawling Web resources), while others are highly proprietary. EILF cannot deliver ready-to-use solutions for all such data sources. The objective is therefore to provide a set of standardized connectors (crawlers) for the most relevant data sources, as well as a framework through which connectors to other data sources can easily be integrated.
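A connector framework of this kind could be as simple as a common base class plus a registry of connector types. The sketch below assumes such a design. Connector, DirectoryConnector, register_connector and create_connector are hypothetical names. It implements one connector for *.txt files in a local directory tree.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator


class Connector(ABC):
    """Hypothetical base class that every data-source connector would implement."""

    @abstractmethod
    def crawl(self) -> Iterator[dict]:
        """Yield documents as dicts with at least 'id' and 'content'."""


class DirectoryConnector(Connector):
    # Example connector for a local file share: one document per *.txt file.
    def __init__(self, root):
        self.root = Path(root)

    def crawl(self):
        for path in sorted(self.root.rglob("*.txt")):
            yield {
                "id": str(path.relative_to(self.root)),
                "content": path.read_text(encoding="utf-8"),
            }


CONNECTOR_TYPES = {}


def register_connector(name, cls):
    # New connector types are added here without touching the rest of the system.
    CONNECTOR_TYPES[name] = cls


def create_connector(name, **params):
    return CONNECTOR_TYPES[name](**params)


register_connector("directory", DirectoryConnector)
```

A connector for a proprietary system would subclass Connector and be registered under its own name. The rest of the pipeline sees only the crawl() interface.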
Operating and Management
Operation, management and monitoring of the IAM solution are particularly relevant for enterprise-level systems. EILF will address this requirement by providing tools and approaches for monitoring the current state of any EILF-based system, both as a general health check of the entire application and for individual components. This should also make it possible to check hardware resource utilization in order to anticipate problems and bottlenecks. Standards such as SNMP will be taken into account here.
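The following sketch shows how component health could be aggregated into an overall system state. It also raises a warning when a resource metric approaches a threshold. The Health levels, the disk_usage metric and the 0.8 threshold are assumptions chosen for illustration. Exposing the result via SNMP is not shown.

```python
from dataclasses import dataclass
from enum import IntEnum


class Health(IntEnum):
    OK = 0
    WARNING = 1
    FAILED = 2


@dataclass
class ComponentStatus:
    name: str
    health: Health
    disk_usage: float  # hypothetical metric: fraction of disk in use, 0.0 to 1.0


def check_resources(status, warn_threshold=0.8):
    """Downgrade a healthy component to WARNING when a resource nears exhaustion."""
    if status.health == Health.OK and status.disk_usage >= warn_threshold:
        return Health.WARNING
    return status.health


def system_health(statuses, warn_threshold=0.8):
    """Overall health is the worst health of any single component."""
    per_component = {s.name: check_resources(s, warn_threshold) for s in statuses}
    overall = max(per_component.values(), default=Health.OK)
    return overall, per_component
```

Returning both the overall state and the per-component detail covers the two views named above: a health check of the entire application and a check of the individual components.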
Enhancing the management capabilities of IAM technology in the area of backup management is another focus. A major requirement for next-generation IAM technology will be the ability to integrate into the backup and maintenance procedures of enterprises. This will become increasingly important because the required computing effort will grow as semantic technologies are used more widely.
Authentication and Authorization
Like any other enterprise application, IAM systems require authentication and authorization to be handled carefully. This includes access rights to the information available in the system, permissions to execute (or not execute) certain operations, and user profiles that may drive processes in the IAM system. Functional aspects such as single sign-on (SSO) are also very important.
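Access rights on information are commonly enforced by filtering search results against the groups a user belongs to. Operation permissions are enforced by an explicit check before the operation runs. The sketch below assumes group-based access control lists. User, Document, filter_results and require are hypothetical names, and SSO is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)
    permissions: set = field(default_factory=set)  # e.g. {"search", "reindex"}


@dataclass
class Document:
    id: str
    allowed_groups: set  # groups that may read this document


def can_read(user, doc):
    # Readable if the user shares at least one group with the document's ACL.
    return bool(user.groups & doc.allowed_groups)


def filter_results(user, docs):
    # Remove documents the user has no access rights for.
    return [d.id for d in docs if can_read(user, d)]


def require(user, operation):
    # Refuse operations the user is not permitted to execute.
    if operation not in user.permissions:
        raise PermissionError(f"{user.name} may not {operation}")
```

This sketch filters results at query time. A production system might also push the access check into the index query itself.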
Reporting
Analytics and Business Intelligence Reporting are essential parts of any IAM system. The information in such reports makes it possible not only to optimize the usage of the system but also to identify missing information via knowledge gap analysis and similar approaches.
The powerful reporting engine provided by Eclipse BIRT will very likely be used with EILF, although this aspect may be deferred to later releases of the project.
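One simple form of knowledge gap analysis is to report queries that repeatedly return no hits. The sketch below assumes a query log of (query, hit count) pairs. The log format and the min_count parameter are hypothetical.

```python
from collections import Counter


def knowledge_gaps(query_log, min_count=2):
    """Return (query, count) for queries that repeatedly found nothing, most frequent first.

    query_log: iterable of (query, hit_count) pairs (hypothetical log format).
    """
    # Normalize queries so that trivial variants are counted together.
    misses = Counter(q.strip().lower() for q, hits in query_log if hits == 0)
    return [(q, n) for q, n in misses.most_common() if n >= min_count]
```

A report built on such a list points editors to topics that users ask about but that the indexed content does not yet cover.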
Out of Scope
At present, the EILF project does not focus on the client (user interface) part of IAM systems. This limitation includes integration into existing enterprise portals or other user interface approaches. Sample clients may be provided, but mainly for demonstration purposes.
Also, the implementation of specific aspects of IAM (e.g. leading-edge linguistic technology or ontology management components) is out of scope, although integrating such components into the EILF platform should be straightforward.
In terms of completeness, EILF will provide basic working components, while commercial vendors may offer more advanced implementations of those components, for example with regard to scalability or feature sets.
All of the above are subject to change should the need arise to more tightly integrate one or more of these aspects with the current scope of EILF.
Organization
Mentors
- Wayne Beaton, Eclipse Foundation
- Markus Knauer, Innoopract
Initial Committers
The initial committers will ensure the setup of the EILF platform as well as the initial implementation of both the platform and selected components. The agile development process will follow the Eclipse Development Process standards for openness and transparency. Hence, we will actively encourage contributions to EILF.
The initial committers and contributors are:
- Georg Schmidt (brox IT-Solutions GmbH): architect, project lead and committer
- Igor Novakovic (empolis GmbH): team lead, committer
- Jürgen Schumacher (empolis GmbH): architect, committer
- Daniel Stucky (empolis GmbH): architect, contributor
- Sebastian Voigt (brox IT-Solutions GmbH): senior developer, committer
- Ralf Rausch (brox IT-Solutions GmbH): developer, contributor
- Ralf Schumann (brox IT-Solutions GmbH): developer, contributor
- Thomas Menzel (brox IT-Solutions GmbH): architect, committer
Code Contribution
empolis GmbH as well as brox IT-Solutions GmbH offer initial code contributions for some of the components based on their experience and long-standing development expertise in the information logistics area.
Interested Parties
Both empolis GmbH and brox IT-Solutions GmbH explicitly express their interest in the project, as also indicated by the list of committers and contributors above.
Furthermore, Arexera Information Technologies GmbH (represented by their CTO, Bidjan Tschaitschian) as well as project partners on behalf of Siemens and SAP have expressed their interest in the EILF project.
More generally, universities and research centers are interested in using a standardized framework for search technologies as a foundation for further research. The project also fits the interests of the Fraunhofer institutes and the DFKI (German Research Center for Artificial Intelligence), since it can be integrated into the processing chains of research projects.
Moreover, EILF will be highly interesting for commercial software vendors that want to integrate specific components and building blocks into a fully operational environment without having to build the entire infrastructure from scratch. The cooperation between brox IT-Solutions GmbH and empolis GmbH is already a very good example of this. The primary focus of brox’ technology is on accessibility and management of (unstructured) data, while the key strength of empolis’ technology lies in indexing and semantic refinement of that data. In a productive environment, however, both aspects need to be addressed simultaneously.
Last but not least, the standardization departments of enterprises will focus on standardizing the building blocks used in their companies. The EILF project is the first attempt to deliver such a standardized approach for IAM technology. Therefore, enterprises also have a major interest in the outcome of the EILF project.
Relationships to other Eclipse projects
The following projects currently hosted by Eclipse might potentially become relevant in the course of EILF (in alphabetical order):
- BIRT (http://www.eclipse.org/birt/) provides reporting features potentially relevant for the collection of information about usage of EILF components.
- BPEL (http://www.eclipse.org/bpel/) will very likely be used to orchestrate services as part of EILF.
- Connectivity (http://www.eclipse.org/datatools/project_connectivity/) could deliver functionality for gathering content from diverse data sources.
- Corona (http://www.eclipse.org/corona/) could serve as a server-side runtime environment for EILF services.
- EclipseLink (http://www.eclipse.org/eclipselink/) could provide components for the persistence of data inside EILF.
- Equinox (http://www.eclipse.org/equinox/) will definitely be used to implement componentization based on OSGi.
- Higgins (http://www.eclipse.org/higgins/) addresses identity management, which will eventually become relevant for EILF.
- Rich Ajax Platform (http://www.eclipse.org/rap/) mainly addresses the client side but may nevertheless become relevant for implementing EILF clients.
- SOA Tools (http://www.eclipse.org/stp/) as well as Swordfish (http://www.eclipse.org/swordfish/) both appear relevant to the service architecture of EILF.
- Target Management (http://www.eclipse.org/dsdp/tm/) could be used to administer and monitor an EILF installation deployed on distributed servers.
- TPTP (http://www.eclipse.org/tptp/) will very likely be used for performance measurement and monitoring.
User Community
EILF aims to deliver a de facto standard in the IAM area. Consequently, an active community providing solid feedback is essential. This includes a community of developers active in different areas (architecture, IAM components, UIs) as well as potential users of EILF who implement specific IAM systems.
Tentative Plan
2008-03 Version 0.5
- Basic architecture settled and implemented
- Simple search application available at least on demonstrator level
2008-06 Version 0.6
- More components available, e.g. supporting the integration of linguistic preprocessing
- More data sources accessible
2008-10 Version 1.0 – Release 1.0
- Orchestration of components / services
- Sample application
- Documentation and related material
- Full support of the OSGi architecture model
2009-03 Version 1.5
- Improved support of Semantic Web features, ontologies etc.