Knowledge Collection in Corona Workbench [message #1048]
Mon, 27 February 2006 16:48
Glenn Everitt
My apologies, this is a long post. I'm looking for thoughts on knowledge
collection in an OSGi component-based environment. Let me know whether this
post makes sense and what problems you can foresee. If you have worked on a
knowledge base that is similar in any way, please take some time to read it.
Thanks, Glenn Everitt
Corona Project Knowledge
When we started thinking about Corona, we originally thought we would use
the project as the main object for collecting data. We later realized that
we want to be able to gather information about objects both larger and
smaller than a "project". We also argued that we would want to gather
information about ongoing operations that really shouldn't be thought of as
a project that ultimately ends. So rather than using the term project, we
decided something more abstract would be appropriate. The term IContainer
from the RCP context seemed close, and we ended up with
ICollaborationContainer. As you continue reading, think of the
CollaborationContainer as a space where users and components collaborate.
Corona CollaborationContainer Knowledge
A Corona CollaborationContainer gives us an identifiable context from which
we can create a knowledge base. We want to be able to ask interesting
questions about Corona CollaborationContainers. For example, what are the
most heavily used resources? Does my CollaborationContainer use a component
with a security flaw? We would also like to be able to ask qualitative
questions, such as which projects are the best run and who are the best
software developers working on them.
Gathering CollaborationContainer Knowledge
Corona CollaborationContainer knowledge is gathered by monitoring the events
that occur within a CollaborationContainer. These events include resource
events indicating which resources were added to or removed from a project.
Collaboration Events - ECF
The events also include collaboration events, from which we can gather
information about which groups of people are talking with one another. We
can use information from the ECF sessions to identify the people involved
and then look up their roles and work groups. We can then infer whether
groups such as software development are talking with software test.
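A minimal sketch of the inference described above, assuming each ECF session can be reduced to a list of participant IDs and that a directory lookup maps people to work groups. The `WORK_GROUPS` table, the group names, and the `group_interactions` function are hypothetical; ECF itself does not provide them.

```python
from collections import Counter
from itertools import combinations

# Hypothetical directory lookup: participant ID -> work group.
# In Corona this would come from an external directory, not a dict.
WORK_GROUPS = {
    "jim": "software-development",
    "ann": "software-development",
    "raj": "software-test",
    "lee": "documentation",
}


def group_interactions(sessions, directory):
    """Count how often each pair of distinct work groups shares a session.

    `sessions` is an iterable of participant-ID lists, one per ECF session
    (an assumed, simplified view of ECF session data).
    """
    counts = Counter()
    for participants in sessions:
        groups = {directory[p] for p in participants if p in directory}
        # Sort so each pair has one canonical key, e.g. (development, test).
        for pair in combinations(sorted(groups), 2):
            counts[pair] += 1
    return counts


sessions = [["jim", "raj"], ["jim", "ann"], ["ann", "raj", "lee"]]
interactions = group_interactions(sessions, WORK_GROUPS)
print(interactions[("software-development", "software-test")])  # 2
```

Sessions whose participants all belong to one group (the second session here) contribute no cross-group pair.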
Status Events - ALF?
What about status-update types of events: when is a project done, when is a
task complete? We could monitor ALF lifecycle events. This support would
require a common mechanism for identifying resources. If ALF lifecycle
events included a URI indicating which resource a lifecycle event affects,
we could determine what stage the project is at and add this information to
the knowledge base. I need to check with the ALF project to see whether they
are thinking about software lifecycle events the same way we are.
Process Events - BPEL (again ALF)
If we wanted to know whether a process within a project was complete, we
could also capture information about the status of BPEL processes. I think
BPEL processes started by the ALF Event Manager should also be able to send
ALF status-type events (again, verify with ALF).
Corona CollaborationContainer owners should determine what types of
information they want to collect about their collaboration space. The
Collaboration Knowledge Monitor is a Corona Component that allows a
CollaborationContainer owner to define the types of events they are
interested in. We expect event processing to use the existing OSGi event
processing. Should we define specific event topics for publish and
subscribe?
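OSGi Event Admin topics are `/`-separated strings, and a handler's topic filter may end in `*` to match every topic below a prefix. The sketch below shows how a Knowledge Monitor might apply an owner's list of topic filters. The `org/eclipse/corona/...` topic names and the `MonitorConfig` class are hypothetical, and the matcher only imitates the Event Admin wildcard convention rather than calling the OSGi API.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against a filter, imitating OSGi Event Admin wildcards.

    "*" matches every topic; "a/b/*" matches any topic below "a/b/".
    """
    if topic_filter == "*":
        return True
    if topic_filter.endswith("/*"):
        # Keep the trailing '/', so "a/b/*" does not match "a/b" or "a/bc".
        return topic.startswith(topic_filter[:-1])
    return topic == topic_filter


class MonitorConfig:
    """Hypothetical owner-defined set of topics a Knowledge Monitor records."""

    def __init__(self, topic_filters):
        self.topic_filters = list(topic_filters)

    def is_interesting(self, topic: str) -> bool:
        return any(topic_matches(f, topic) for f in self.topic_filters)


# An owner who cares only about resource changes and ECF session starts.
config = MonitorConfig([
    "org/eclipse/corona/resource/*",
    "org/eclipse/corona/collaboration/SESSION_STARTED",
])
print(config.is_interesting("org/eclipse/corona/resource/ADDED"))   # True
print(config.is_interesting("org/eclipse/corona/task/COMPLETED"))  # False
```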
Structuring CollaborationContainer Knowledge
Since we gather information from many different sources about the context
of a given CollaborationContainer, we need an organized way to save this
information. We have chosen to formally define the information structure of
a CollaborationContainer. The Resource Description Framework (RDF) and the
Web Ontology Language (OWL) allow us to define the structure we will use to
hold the information gathered for our Corona CollaborationContainers.
CollaborationContainer Knowledge Structure
We are currently using a project-based ontology to define the knowledge
structure for our exemplary CollaborationContainer. See the RDF/OWL
definition of a Corona Project. This definition is from SemanticWeb.org, and
we plan to change this ontology to add extensions for the Corona
environment. This file, project.rdf, has some minor changes from the posted
definition so that it can be parsed by the Jena 2.3 ontology repository.
Adding CollaborationContainer Knowledge
Each monitored event type will have a Knowledge Monitor event adapter. The
event adapter will pull information from an event and convert it into an
RDF triple. A triple consists of a subject, a predicate, and an object. The
subject tells us which item we are talking about. The predicate tells us
the relationship the subject has to the object. Here is an example of a
triple: ("Jim", "Works on", "ProjectX"). "Jim" is the subject, "Works on" is
the predicate, and "ProjectX" is the object. So if we received an event
indicating that Jim was added to ProjectX, the Knowledge Monitor would
create the RDF triple and add it to the Knowledge Base. We need to
investigate the performance of this type of knowledge base.
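The adapter step can be sketched in a few lines. This is a minimal sketch, assuming events arrive as dictionaries with hypothetical `person` and `container` keys; a real monitor would write into an RDF store such as Jena and would use URIs rather than plain strings.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class KnowledgeBase:
    """Toy in-memory triple store standing in for a real RDF store."""

    def __init__(self):
        self.triples = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)


def member_added_adapter(event: dict) -> Triple:
    """Hypothetical adapter for a 'member added to container' event.

    Assumes the event carries 'person' and 'container' entries.
    """
    return Triple(event["person"], "Works on", event["container"])


kb = KnowledgeBase()
kb.add(member_added_adapter({"person": "Jim", "container": "ProjectX"}))
print(Triple("Jim", "Works on", "ProjectX") in kb.triples)  # True
```

Keeping one small adapter per event type means a new event source only needs a new adapter, not changes to the knowledge base.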
Uniquely Identifying Project Resources
The RDF/OWL standards use a URI naming scheme. In an RDF triple the
predicate must be a URI, the subject is a URI or a blank node, and the
object may be a URI, a blank node, or a literal. Many URI-based vocabularies
already exist, and many of them build upon other existing vocabularies. This
approach works only because all of the vocabulary definitions are
namespaced via URIs. So, we can choose to add more detail and expand the
content held in the Corona Knowledge Base by reusing existing vocabulary
definitions for resource types and resource properties.
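To show how namespacing lets Corona reuse existing vocabularies, the sketch below expands prefixed names into full URIs and keeps literals separate. The `rdf:` and `foaf:` namespace URIs are the published ones; the `corona:` namespace, the `corona:worksOn` property, and the helper functions are hypothetical. Blank nodes are omitted for brevity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
CORONA = "http://example.org/corona/"  # hypothetical Corona namespace

PREFIXES = {"rdf": RDF, "foaf": FOAF, "corona": CORONA}


class Literal(str):
    """Marks a value as an RDF literal rather than a prefixed URI."""


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def make_triple(subject, predicate, obj):
    # Subject and predicate are always URIs here; the object may be a literal.
    o = obj if isinstance(obj, Literal) else expand(obj)
    return (expand(subject), expand(predicate), o)


triples = [
    make_triple("corona:person/jim", "rdf:type", "foaf:Person"),
    make_triple("corona:person/jim", "foaf:name", Literal("Jim")),
    make_triple("corona:person/jim", "corona:worksOn", "corona:project/ProjectX"),
]
for t in triples:
    print(t)
```

Only `corona:worksOn` is new vocabulary here; the person type and name come straight from FOAF.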
Asking Questions about Projects
We can ask questions about projects by running SPARQL queries against the
CollaborationContainer Knowledge Base. I believe that in general these will
be predefined queries. However, a SPARQL query web service could be used to
execute ad hoc queries.
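A question such as "does my CollaborationContainer use a component with a security flaw?" maps onto a SPARQL basic graph pattern. The sketch below is a toy pattern matcher, not a SPARQL engine: it shows how such a query joins triple patterns through shared variables. The predicate names and data are hypothetical, and a real deployment would send the SPARQL text (shown in the comment) to the knowledge base instead.

```python
def match_pattern(pattern, triples, binding):
    """Yield each extension of `binding` under which `pattern` matches a triple.

    Terms starting with '?' are variables; other terms must match exactly.
    """
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if extended.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break
        else:
            yield extended


def select(patterns, triples):
    """Join triple patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match_pattern(pattern, triples, old)]
    return bindings


facts = [
    ("Jim", "worksOn", "ProjectX"),
    ("Ann", "worksOn", "ProjectX"),
    ("Ann", "worksOn", "ProjectY"),
    ("ProjectX", "uses", "libfoo-1.0"),
    ("ProjectY", "uses", "libbar-2.1"),
    ("libfoo-1.0", "hasSecurityFlaw", "true"),
]

# Roughly: SELECT ?person ?project ?component
#          WHERE { ?person :worksOn ?project . ?project :uses ?component .
#                  ?component :hasSecurityFlaw true }
rows = select([("?person", "worksOn", "?project"),
               ("?project", "uses", "?component"),
               ("?component", "hasSecurityFlaw", "true")], facts)
print(sorted(r["?person"] for r in rows))  # ['Ann', 'Jim']
```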
Project Related Ontologies
Multiple ontologies will be used in the Corona environment. There will be
one ontology that describes a CollaborationContainer. This ontology will
describe very basic vocabulary and relationships regarding Tasks,
Activities, Components, Persons, etc. Closely related to the
CollaborationContainer Ontology is the Phase Ontology, which describes the
time-based vocabulary used for working with the CollaborationContainer. This
ontology will describe lifecycle events, such as milestones and the design,
development, and test phases of the project, or operational events, such as
maintenance windows and shutdowns.
Corona Component Related Ontologies
The other area of ontology usage is the description of Corona/OSGi bundle
capabilities. Each Corona Component should define its own ontology
describing what it does and the information it produces. This ontology
approach augments the Web Services Description Language (WSDL) definition.
The WSDL describes the interface to the component's services. The WSDL does
not indicate what the service does, or what artifact files it creates or
uses. The ontology also enhances the information available from the WSDM
management interface. The WSDM interface provides an API for retrieving
metadata about a web service. The API allows the specification of a
metadata description dialect. Our dialect will be explicitly defined via an
RDF/OWL ontology.
Our current thinking is that Corona will provide a very basic
CollaborationContainer Ontology. This CollaborationContainer Ontology will
be extended with more specific ontology information for a particular
company's usage. We will provide an RDF definition indicating where we think
the basic CollaborationContainer Ontology should be extended. This property
will be named something like <ontology-extension-point>; it is just an
indication of where the ontology could be extended. In this way you can
think of the ontology extensions as Knowledge Base Extension Points. The
extension ontologies allow for customization for specific types of
projects.
[#1] more here?
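One way to picture Knowledge Base Extension Points is a base ontology in which only some classes are marked as extendable. In this minimal sketch, which uses plain class names instead of RDF/OWL and a hypothetical `Ontology` class, company-specific classes may only specialise classes flagged as extension points (the role the proposed <ontology-extension-point> property would play).

```python
class Ontology:
    """Toy class hierarchy with designated extension points (hypothetical)."""

    def __init__(self):
        self.superclass = {}           # class name -> superclass name (or None)
        self.extension_points = set()  # classes that extensions may specialise

    def add_class(self, name, superclass=None, extension_point=False):
        if superclass is not None and superclass not in self.superclass:
            raise KeyError(f"unknown superclass {superclass}")
        self.superclass[name] = superclass
        if extension_point:
            self.extension_points.add(name)

    def extend(self, new_classes):
        """Add company-specific classes, but only beneath extension points."""
        # Validate everything first so a bad extension leaves nothing behind.
        for parent in new_classes.values():
            if parent not in self.extension_points:
                raise ValueError(f"{parent} is not an extension point")
        for name, parent in new_classes.items():
            self.add_class(name, parent)

    def is_a(self, name, ancestor):
        """Walk up the superclass chain looking for `ancestor`."""
        while name is not None:
            if name == ancestor:
                return True
            name = self.superclass.get(name)
        return False


base = Ontology()
base.add_class("Task", extension_point=True)
base.add_class("Activity", extension_point=True)
base.add_class("Person")

base.extend({"CodeReviewTask": "Task"})
print(base.is_a("CodeReviewTask", "Task"))  # True
```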
Relationships between Ontologies
When a project is created, the CollaborationContainer Ontology should be
available. The ontology extensions will be based upon an enterprise's
standard project definition of activities and tasks. The next ontology
added would be the Phase Ontology, which will vary based upon how the
CollaborationContainer is used. For larger projects, which have more phases
and more time aggregation definitions, the Phase Ontology could be fairly
complex.
A Project Ontology consists of the Collaboration Ontology plus extensions,
attached at its ontology extension points, that are specific to your
enterprise's project process. The classes defined in a Project Ontology will
be referenced by the Phase Ontology. The Phase Ontology will also be defined
as an ontology extension to the base Collaboration Ontology. The key to
merging ontologies is careful scoping to prevent overlapping class
definitions. We anticipate that the basic CollaborationContainer Ontology
will be general enough to allow obvious extensions while still providing
enough structure to be a useful guide to where information should be
placed.
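Careful scoping can be checked mechanically when ontologies are merged. The sketch below represents each ontology only by the set of class URIs it defines and rejects a merge in which two ontologies define the same class. Giving each ontology its own namespace keeps accidental overlap out, and the check catches a class that one ontology redefines instead of extending. The namespaces and class names are hypothetical.

```python
def merge_ontologies(*ontologies):
    """Merge (name, class URIs) pairs; refuse any class URI defined twice.

    Returns a dict mapping each class URI to the ontology that defines it.
    """
    owner = {}
    for name, class_uris in ontologies:
        for uri in class_uris:
            if uri in owner:
                raise ValueError(f"{uri} defined by both {owner[uri]} and {name}")
            owner[uri] = name
    return owner


BASE = "http://example.org/corona/base#"      # hypothetical namespaces
PROJECT = "http://example.org/acme/project#"
PHASE = "http://example.org/acme/phase#"

merged = merge_ontologies(
    ("collaboration", {BASE + "Task", BASE + "Activity", BASE + "Person"}),
    ("acme-project", {PROJECT + "CodeReviewTask"}),
    ("acme-phase", {PHASE + "Milestone", PHASE + "TestPhase"}),
)
print(len(merged))  # 6
```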
Corona Component Ontologies
As a project gets underway, Corona Components that accomplish tasks will be
added to the Corona Workbench (need def of Corona Workbench). On the client
side, Corona Component functionality is added to a project via the Nature
extension points. Natures in the Corona environment would also provide
extension ontologies that indicate where a Corona Component's capability
fits within the Project environment. They should also indicate where the
artifacts produced by the Nature fit within the Project Ontology.
When Corona Components are added to a project, the Corona Component
Ontology information for each component is merged with the Project
Ontology. This process is called Corona Component Project Registration. The
ontology information allows us to infer information about resources that
are added to the project. For example, if a Debug Nature is added to a
project, we can anticipate that the debugger will produce dump files. When
dump files appear, we can also know which Corona Component produced them and
which product they are associated with. So, information about artifacts
produced and consumed by the Corona Components can be added to the Knowledge
Base.
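A minimal sketch of the dump-file example, assuming registration can be reduced to "this component produces these artifact types for this product". The `ProjectKnowledge` class, the `producedBy` and `associatedWith` predicates, and all identifiers are hypothetical; a real implementation would express these rules in the merged RDF/OWL ontologies.

```python
class ProjectKnowledge:
    """Toy project knowledge base that learns from component registration."""

    def __init__(self):
        self.producers = {}   # artifact type -> (component URI, product URI)
        self.triples = set()

    def register_component(self, component, product, produces):
        """Corona Component Project Registration, reduced to 'what it produces'."""
        for artifact_type in produces:
            self.producers[artifact_type] = (component, product)

    def resource_added(self, resource, artifact_type):
        self.triples.add((resource, "rdf:type", artifact_type))
        if artifact_type in self.producers:  # infer provenance from registration
            component, product = self.producers[artifact_type]
            self.triples.add((resource, "producedBy", component))
            self.triples.add((resource, "associatedWith", product))


project = ProjectKnowledge()
project.register_component("corona:DebugComponent", "corona:ProductA", {"DumpFile"})
project.resource_added("file:/projectx/core.1234.dmp", "DumpFile")
for t in sorted(project.triples):
    print(t)
```

Resources of a type no registered component produces still get their type recorded, just without inferred provenance.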
When a Corona Component is used to accomplish a task in a project, the
Corona Component Ontology could define the phase (as defined by the Project
Phase Ontology) in which the component will be used.
Web Service Metadata
Corona Components can expose web services and provide a WSDM management
interface. The WSDM management interface includes an API for retrieving
metadata about a web service. This API allows the specification of a
metadata dialect. Corona Components will support three metadata dialects:
The WSDL metadata dialect will provide basic definitions of the parameters
used in the web service call. It will also return the WSDM-defined component
The ontology dialect will return the Corona Component Ontology definition.
As previously mentioned, this ontology should be defined as an extension to
the Project Ontology.
The semantic dialect will return information about how the Corona Component
is used, i.e. the context in which it can be used, the types of data it
produces and consumes, and the artifacts it produces.
Current thinking is that the Project Knowledge can be queried to retrieve
information about the context in which the component is being used.
[#2] need
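The three dialects amount to a lookup keyed by a dialect identifier. This is a minimal sketch: the dialect URIs (the WSDL 1.1 and OWL namespace URIs, plus a hypothetical Corona one) and the `ComponentMetadata` class are assumptions for illustration, not identifiers defined by the WSDM specification, and the documents are placeholder strings.

```python
# Dialect identifiers are assumptions for illustration, not WSDM-defined values.
WSDL_DIALECT = "http://schemas.xmlsoap.org/wsdl/"
ONTOLOGY_DIALECT = "http://www.w3.org/2002/07/owl#"
SEMANTIC_DIALECT = "http://example.org/corona/dialect/semantic"  # hypothetical


class ComponentMetadata:
    """Returns a component's metadata document for a requested dialect."""

    def __init__(self, wsdl, ontology, semantics):
        self._documents = {
            WSDL_DIALECT: wsdl,
            ONTOLOGY_DIALECT: ontology,
            SEMANTIC_DIALECT: semantics,
        }

    def get_metadata(self, dialect):
        try:
            return self._documents[dialect]
        except KeyError:
            raise ValueError(f"unsupported metadata dialect: {dialect}") from None


debugger = ComponentMetadata(
    wsdl="<wsdl:definitions/>",
    ontology="<rdf:RDF><!-- Corona Component Ontology --></rdf:RDF>",
    semantics="<rdf:RDF><!-- produces DumpFile --></rdf:RDF>",
)
print(debugger.get_metadata(ONTOLOGY_DIALECT))
```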
Event Information Harvesting
As a project progresses within the Corona Workbench, Corona Components
generate artifacts and events indicating state changes to the project, such
as tasks completed, resources added or removed, and test successes or
failures. These events, called Corona Collaboration Events, are published to
topics. The Corona Knowledge Monitor subscribes to these topics and
processes the events. Information from an event may be ignored or enhanced
with information from other sources. Non-Corona events can be monitored by
writing new Knowledge Monitors that listen for other event types. These
events will still be converted into RDF/OWL triples and written into the
Knowledge Base.
Decouple Event System from the Project Ontologies and Phase Ontologies
Events need a minimal amount of information to do their job. That job is to
deliver information from one component to another. So, they need a way to
express:
c.. event type (what happened)
d.. event properties (information about what happened)
e.. object identification (which objects were involved in the event)
By partitioning the information handled by an event into two classes, we
can reduce the impact of changing ontologies on the event system.
We can look to JMS as a model for this approach: it classifies information
as envelope information and content information. The envelope information
is visible to the event system, whereas the content information is not. The
event system uses the envelope information to route messages to
destinations. Only the event creator (publisher) and the event receiver
(consumer) care about the contents of the event (the data payload).
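The envelope/content split can be sketched as follows. The event system routes on the envelope (event type and object URIs) and passes the content through as opaque bytes, so ontology changes that alter the payload do not touch the routing code. Placing the event properties in the content is an assumption, and the class names, topic string, and JSON payload format are hypothetical; this is not the JMS or OSGi Event Admin API.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    topic: str      # event type (what happened)
    objects: tuple  # URIs of the objects involved


@dataclass(frozen=True)
class CollaborationEvent:
    envelope: Envelope  # visible to the event system, used for routing
    content: bytes      # opaque payload: event properties, read only by consumers


class EventBus:
    """Routes events by envelope topic without ever decoding the content."""

    def __init__(self):
        self._handlers = {}

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, event):
        for handler in self._handlers.get(event.envelope.topic, []):
            handler(event)


received = []
bus = EventBus()
# Only the consumer knows the payload is JSON.
bus.subscribe("corona/task/COMPLETED", lambda e: received.append(json.loads(e.content)))
bus.publish(CollaborationEvent(
    Envelope("corona/task/COMPLETED", ("http://example.org/corona/task/42",)),
    json.dumps({"status": "complete", "by": "Jim"}).encode("utf-8"),
))
print(received)  # [{'status': 'complete', 'by': 'Jim'}]
```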
All objects in the Corona Workbench are identified by a URI. This approach
allows components to be uniquely identified, and it works well with Web
Services, WSDM, and RDF/OWL. However, some identification schemes would
require mapping from a non-URI naming domain to the URI naming domain.
Items requiring mapping from the Ontology to the Event System
a.. Object identity
b.. Object type
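The two items above, object identity and object type, could be handled by one mapping rule per naming domain. This minimal sketch assumes hypothetical naming domains and URI prefixes (only the `mailto:` scheme and the `foaf:Person` class are real), and the `corona:` types are placeholders for classes in the Corona ontologies.

```python
from urllib.parse import quote

# Hypothetical mapping rules: naming domain -> (URI prefix, ontology type).
NAMING_DOMAINS = {
    "email": ("mailto:", "foaf:Person"),
    "bugzilla": ("http://bugs.example.org/show_bug.cgi?id=", "corona:Defect"),
    "cvs": ("http://cvs.example.org/repo/", "corona:SourceFile"),
}


def to_uri(domain, native_id):
    """Map a non-URI identifier to (URI, object type) for the event system."""
    prefix, object_type = NAMING_DOMAINS[domain]
    # Percent-encode characters not allowed in a URI, keeping '/' and '@'.
    return prefix + quote(native_id, safe="/@"), object_type


print(to_uri("email", "jim@example.org"))  # ('mailto:jim@example.org', 'foaf:Person')
print(to_uri("cvs", "src/My File.java"))
# ('http://cvs.example.org/repo/src/My%20File.java', 'corona:SourceFile')
```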