Eclipse Community Forums » Modeling » EMF "Technology" (Ecore Tools, EMFatic, etc)
Tokenizing of strings in OCL ? [message #60105] Sat, 04 November 2006 02:35
Eclipse User
Originally posted by: maiera.de.ibm.com

Hi,
I need to tokenize strings in OCL.

To begin with a simple task: In the string "CIM_ManagedElement", extract
the string before the first underscore, and the string after the first
underscore. The length of the first string is arbitrary, and there may
be more than one underscore. The character set is Unicode.

How can this be done in OCL?

Andy
Re: Tokenizing of strings in OCL ? [message #60131 is a reply to message #60105] Mon, 06 November 2006 16:14
Eclipse User
Originally posted by: cdamus.ca.ibm.com

Hi, Andy,

It's a little awkward, but feasible. Assuming that "CIM_ManagedElement" is
the 'name' property of the context element:

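let underscore : Integer =
    -- fold over the indices; c stays 0 until the first '_' is seen
    Sequence{1..name.size()}->iterate(i : Integer; c : Integer = 0 |
        if c = 0 and name.substring(i, i) = '_' then
            i
        else
            c
        endif) in
-- note: a leading or trailing '_' would give substring() an empty range,
-- which violates its lower <= upper precondition
if underscore > 0 then
    Sequence{
        name.substring(1, underscore - 1),
        name.substring(underscore + 1, name.size())}
else
    Sequence{}
endif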

Basically, we iterate over all of the character indices in the name and
select the first one whose one-character substring is '_'. Then, if we
found it, we create a sequence containing the two parts of the name.

If you want to find the last underscore instead, remove the "c = 0 and"
condition from the iteration.
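To make the index arithmetic concrete, here is a minimal Python model of the `iterate` expression (a sketch, not OCL itself). It assumes OCL's 1-based, inclusive `substring(lower, upper)` and treats the OCL 2.0 precondition `1 <= lower <= upper <= size()` as an error. The function names are hypothetical. The prefix is taken as `substring(1, underscore - 1)` so that it excludes the underscore itself, and `first_only=False` corresponds to dropping the `c = 0 and` condition.

```python
from typing import List


def ocl_substring(s: str, lower: int, upper: int) -> str:
    """Mimic OCL String::substring: 1-based, both bounds inclusive."""
    # Treat a violated precondition (1 <= lower <= upper <= size) as an error
    if not (1 <= lower <= upper <= len(s)):
        raise ValueError(f"substring({lower}, {upper}) out of range for {s!r}")
    return s[lower - 1:upper]


def underscore_position(name: str, first_only: bool = True) -> int:
    """Model of the iterate() fold: accumulator c starts at 0 (= not found)."""
    c = 0
    for i in range(1, len(name) + 1):  # Sequence{1..name.size()}
        # With first_only, the 'c = 0 and' guard stops later matches overwriting c
        if (c == 0 or not first_only) and ocl_substring(name, i, i) == "_":
            c = i
    return c


def split_at_underscore(name: str) -> List[str]:
    u = underscore_position(name)
    if u > 0:
        # The prefix excludes the '_' itself, hence u - 1
        return [ocl_substring(name, 1, u - 1), ocl_substring(name, u + 1, len(name))]
    return []


print(split_at_underscore("CIM_ManagedElement"))  # ['CIM', 'ManagedElement']
print(underscore_position("CIM_Managed_Element", first_only=False))  # 12
```

A name that starts or ends with '_' makes the model raise `ValueError`, mirroring the precondition violation such a split would run into.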

Given that you want the first underscore only, the following would actually
be a little simpler and more efficient (not having to iterate through the
rest of the string after finding the underscore):

let underscore : Integer =
    -- any() picks an index whose character is '_' (undefined if none)
    Sequence{1..name.size()}->any(i |
        name.substring(i, i) = '_') in
if not underscore.oclIsUndefined() then
    Sequence{
        name.substring(1, underscore - 1),             -- text before the '_'
        name.substring(underscore + 1, name.size())}   -- text after it
else
    Sequence{}
endif

Of course, this depends on the implementation detail of the 'any' iterator
always selecting the first match. More correct, perhaps, would be:

let underscore : Integer =
    -- select() keeps every matching index in order; first() takes the earliest
    Sequence{1..name.size()}->select(i |
        name.substring(i, i) = '_')->first() in
if not underscore.oclIsUndefined() then
    Sequence{
        name.substring(1, underscore - 1),
        name.substring(underscore + 1, name.size())}
else
    Sequence{}
endif


HTH,

Christian


Re: Tokenizing of strings in OCL ? [message #60155 is a reply to message #60131] Mon, 06 November 2006 19:31
Eclipse User
Originally posted by: cdamus.ca.ibm.com

Hi, Andy,

I should point out, of course, that the third solution I posted (using the
select iterator) has the same drawback as the first (using iterate): it
iterates through the entire string.

I think the second solution (using the 'any' iterator) is, in fact,
preferable. I had another look at the spec, and it defines the 'any'
iterator semantics by mapping it onto the 'select' iterator (which, in
turn, maps onto the primitive 'iterate'). These mappings actually specify
that, for a sequence, the 'any' iterator will always return the first
matching element.

In applying the mapping (Section 11.9.1):

source->any(iterator | body) =
source->select(iterator | body)->asSequence()->first()

to a Sequence, 'select' returns a sequence in which the selected elements
are in the same order as they appear in the original sequence.
asSequence(), when applied to a Sequence, returns that same sequence, so first()
effectively selects the first element in the original sequence for which
the body is true.

So, using the 'any' formulation should provide you with a portable (between
OCL implementations) solution that performs optimally on implementations
that short-circuit on finding the first result, as does ours.
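The difference between the two formulations can be sketched with a Python model that counts how many times the body is evaluated. This is an illustration that assumes, as stated above for the Eclipse implementation, that `any` stops at the first match; `ocl_any` and `ocl_select_first` are hypothetical names, and the second follows the spec mapping `select(...)->asSequence()->first()` literally.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def ocl_any(source: Iterable[int],
            body: Callable[[int], bool]) -> Tuple[Optional[int], int]:
    """Short-circuiting any(): (first match or None, body evaluations)."""
    evaluations = 0
    for x in source:
        evaluations += 1
        if body(x):
            return x, evaluations  # stop as soon as the body is true
    return None, evaluations


def ocl_select_first(source: Iterable[int],
                     body: Callable[[int], bool]) -> Tuple[Optional[int], int]:
    """select(body)->asSequence()->first(): evaluates body on every element."""
    evaluations = 0
    selected: List[int] = []
    for x in source:
        evaluations += 1
        if body(x):
            selected.append(x)  # select on a Sequence keeps source order
    # asSequence() on a Sequence is the identity; first() of empty is undefined
    return (selected[0] if selected else None), evaluations


name = "CIM_ManagedElement"
indices = range(1, len(name) + 1)  # Sequence{1..name.size()}


def is_underscore(i: int) -> bool:
    return name[i - 1] == "_"  # name.substring(i, i) = '_'


print(ocl_any(indices, is_underscore))           # (4, 4)
print(ocl_select_first(indices, is_underscore))  # (4, 18)
```

Both return index 4; only the number of body evaluations differs. For a name with no underscore, both scan every index.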

Cheers,

Christian
Tokenizing of strings in OCL ? - solution [message #60179 is a reply to message #60131] Tue, 07 November 2006 15:06
Eclipse User
Originally posted by: maiera.de.ibm.com

Hi Christian,
I read both your replies, thanks for the quick and thorough answer!

Using that, I have defined the following two constraints and tested that
they work fine:


Constraint on UML Class:
"The name of a CIM class must follow the format
schemaname '_' classname"

let upos : Integer = /* position of first underscore in class name */
    Sequence { 1 .. self.base_Class.name.size() }->any( i |
        self.base_Class.name.substring( i, i) = '_')
in
    if upos.oclIsUndefined() then false
    else
        upos > 1 and
        upos < self.base_Class.name.size()
    endif
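For checking the logic outside an OCL engine, a rough Python equivalent of this first constraint is sketched below. `None` stands in for OCL's undefined value, and the function names are hypothetical.

```python
from typing import Optional


def first_underscore(name: str) -> Optional[int]:
    """1-based position of the first '_' in name, or None (OCL: undefined)."""
    for i in range(1, len(name) + 1):  # Sequence{1 .. name.size()}->any(...)
        if name[i - 1] == "_":
            return i
    return None


def follows_cim_format(name: str) -> bool:
    """schemaname '_' classname: the first '_' is neither first nor last."""
    upos = first_underscore(name)
    if upos is None:
        return False
    return 1 < upos < len(name)


# Prints True, False, False, False
for n in ["CIM_ManagedElement", "_ManagedElement", "CIM_", "ManagedElement"]:
    print(n, follows_cim_format(n))
```

A name such as `CIM_Managed_Element` also passes: only the first underscore is tested, so the class-name part may contain further underscores.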


Constraint on UML Class:
"The schema name of a CIM class must appear in its package path"

let upos : Integer = /* position of first underscore in class name */
    Sequence { 1 .. self.base_Class.name.size() }->any( i |
        self.base_Class.name.substring( i, i) = '_')
in
    if upos.oclIsUndefined() then false
    else
        let schema : String = /* schema name surrounded by delimiters */
            '::'.concat(self.base_Class.name.substring( 1, upos-1).concat('::'))
        in
        let pkgpos : Integer = /* position of schema name in package path */
            Sequence { 1 .. self.base_Class.qualifiedName.size() -
                            schema.size() + 1 }->any( i |
                self.base_Class.qualifiedName.substring(
                    i, i + schema.size() - 1) = schema)
        in
            not pkgpos.oclIsUndefined()
    endif
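The second constraint is a sliding-window substring search. A Python model of the same steps (hypothetical names; `None` again stands in for undefined) makes the window bounds explicit:

```python
from typing import Optional


def first_underscore(name: str) -> Optional[int]:
    """1-based position of the first '_' in name, or None (OCL: undefined)."""
    idx = name.find("_")
    return idx + 1 if idx >= 0 else None


def schema_in_package_path(name: str, qualified_name: str) -> bool:
    upos = first_underscore(name)
    if upos is None:
        return False
    # '::'.concat(name.substring(1, upos-1).concat('::'))
    schema = "::" + name[: upos - 1] + "::"
    m = len(schema)
    # Sequence{1 .. qualifiedName.size() - schema.size() + 1}: window starts
    for i in range(1, len(qualified_name) - m + 2):
        # qualifiedName.substring(i, i + schema.size() - 1) = schema
        if qualified_name[i - 1 : i - 1 + m] == schema:
            return True
    return False
```

Because the pattern carries a leading `::`, a schema package that is the first segment of the qualified name (for example `CIM::CIM_ManagedElement`) is not found, whereas one nested under another package (for example a hypothetical `Model::CIM::CIM_ManagedElement`) is. Each window also costs up to `schema.size()` character comparisons, so the work grows roughly with the product of the two lengths, which is where the performance concern comes from.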


However, I think searching for strings with any() and substring() is
rather heavy for something like OCL, especially when I imagine running
this against the 1600+ classes in the CIM Schema.

How would I go about proposing additional OCL functions to improve this,
e.g. a function that returns the first substring matching a pattern, or a
function that tests a string against a pattern? That is, would the
starting point for a discussion be an implementation that tests the
concept, or the working group defining the OCL standard?
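As an illustration of what such pattern operations might buy, here is a Python sketch using the standard `re` module. `first_match` and `matches` are hypothetical names chosen to mirror the two proposed functions; they are not operations of the OCL Standard Library, and the pattern syntax shown is Python's.

```python
import re
from typing import Optional


def first_match(s: str, pattern: str) -> Optional[str]:
    """Hypothetical: first substring of s matching pattern, else None."""
    m = re.search(pattern, s)
    return m.group(0) if m else None


def matches(s: str, pattern: str) -> bool:
    """Hypothetical: does the whole of s match pattern?"""
    return re.fullmatch(pattern, s) is not None


def cim_format_by_pattern(name: str) -> bool:
    # Constraint 1: non-empty schema, '_', non-empty remainder
    return matches(name, r"[^_]+_.+")


def schema_in_path_by_pattern(name: str, qualified_name: str) -> bool:
    # Constraint 2: text before the first '_' must appear as '::schema::'
    schema = first_match(name, r"^[^_]*(?=_)")
    return schema is not None and ("::" + schema + "::") in qualified_name
```

Each constraint collapses to one call, and the matching runs inside the regex engine rather than through an iterator that calls substring() at every index.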

Is anything planned in this direction?

Andy

Re: Tokenizing of strings in OCL ? - solution [message #60256 is a reply to message #60179] Tue, 07 November 2006 19:59
Eclipse User
Originally posted by: cdamus.ca.ibm.com

Hi, Andy,

I'm glad that this is working for you! However, I share your concern about
the performance ...

I answered your other posting about extending the OCL environment to add
this capability. As for extending the OCL Standard Library itself,
influencing the specification requires participating in the OCL working
group. I don't think a reference implementation is a requirement. You can
submit issues by following the link from the OMG's website:

http://www.omg.org/technology/documents/formal/ocl.htm

for consideration by the revision task force (the 2.0 version of the
specification was recently officially published). I don't know of any
current plan to enhance the standard library.

Cheers,

Christian

