Home » Modeling » EMF "Technology" (Ecore Tools, EMFatic, etc) » Tokenizing of strings in OCL ?
Tokenizing of strings in OCL ? [message #60105]
Sat, 04 November 2006 02:35
Eclipse User
Originally posted by: maiera.de.ibm.com
Hi,
I need to tokenize strings in OCL.
To begin with a simple task: in the string "CIM_ManagedElement", extract
the string before the first underscore and the string after the first
underscore. The length of the first string is arbitrary, and there may
be more than one underscore. The character set is Unicode.
How can this be done in OCL?
Andy

Re: Tokenizing of strings in OCL ? [message #60131 is a reply to message #60105]
Mon, 06 November 2006 16:14
Eclipse User
Originally posted by: cdamus.ca.ibm.com
Hi, Andy,
It's a little awkward, but feasible. Assuming that "CIM_ManagedElement" is
the 'name' property of the context element:
let underscore : Integer =
    Sequence{1..name.size()}->iterate(i : Integer; c : Integer = 0 |
        if c = 0 and name.substring(i, i) = '_' then
            i
        else
            c
        endif) in
if underscore > 0 then
    Sequence{
        name.substring(1, underscore - 1),
        name.substring(underscore + 1, name.size())}
else
    Sequence{}
endif
Basically, you have to iterate over all of the character indices in the name
and select the first one at which the one-character substring is '_'. Then,
if we found it, we can create a sequence containing the two parts of the name.
If you want to find the last underscore instead, remove the "c = 0 and"
condition from the iteration.
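For readers who want to check the index arithmetic outside an OCL engine, here is a rough Python model of the iterate() fold. It is a sketch, not OCL: the helper names are hypothetical, and it assumes OCL's substring(lower, upper) is 1-based and inclusive at both ends, so the part before the underscore ends at position underscore - 1.

```python
from functools import reduce


def ocl_substring(s: str, lower: int, upper: int) -> str:
    """Model of OCL String::substring: 1-based, both bounds inclusive.
    (Python slicing is more lenient than OCL about out-of-range bounds.)"""
    return s[lower - 1:upper]


def first_underscore(name: str) -> int:
    """Fold over 1..size; once c is non-zero it is never replaced (0 = not found).
    Like iterate(), reduce() still visits every index after the first match."""
    return reduce(
        lambda c, i: i if c == 0 and ocl_substring(name, i, i) == "_" else c,
        range(1, len(name) + 1),
        0,
    )


def last_underscore(name: str) -> int:
    """Same fold without the 'c = 0' guard, so later matches overwrite earlier ones."""
    return reduce(
        lambda c, i: i if ocl_substring(name, i, i) == "_" else c,
        range(1, len(name) + 1),
        0,
    )


def split_at(name: str, pos: int) -> list:
    """The two parts around position pos, excluding the underscore itself."""
    if pos > 0:
        return [ocl_substring(name, 1, pos - 1),
                ocl_substring(name, pos + 1, len(name))]
    return []


print(split_at("CIM_Foo_Bar", first_underscore("CIM_Foo_Bar")))  # ['CIM', 'Foo_Bar']
print(split_at("CIM_Foo_Bar", last_underscore("CIM_Foo_Bar")))   # ['CIM_Foo', 'Bar']
```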
Given that you want the first underscore only, the following would actually
be a little simpler and more efficient, because it does not have to iterate
through the rest of the string after finding the underscore:
let underscore : Integer =
    Sequence{1..name.size()}->any(i |
        name.substring(i, i) = '_') in
if not underscore.oclIsUndefined() then
    Sequence{
        name.substring(1, underscore - 1),
        name.substring(underscore + 1, name.size())}
else
    Sequence{}
endif
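The short-circuiting behaviour can be modeled in Python with next() over a generator, which stops consuming indices at the first match. Again this is a sketch rather than OCL; ocl_any is a hypothetical name, and None stands in for OCL's undefined result.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def ocl_any(source: Iterable[T], body: Callable[[T], bool]) -> Optional[T]:
    """Return the first element satisfying body, or None; later elements are never examined."""
    return next((x for x in source if body(x)), None)


def split_first_underscore(name: str) -> List[str]:
    # 1-based index of the first '_' (None when there is none)
    underscore = ocl_any(range(1, len(name) + 1), lambda i: name[i - 1] == "_")
    if underscore is not None:
        # name[:underscore - 1] corresponds to substring(1, underscore - 1)
        # name[underscore:] corresponds to substring(underscore + 1, name.size())
        return [name[:underscore - 1], name[underscore:]]
    return []


print(split_first_underscore("CIM_ManagedElement"))  # ['CIM', 'ManagedElement']
```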
Of course, this depends on the implementation detail of the 'any' iterator
always selecting the first match. More correct, perhaps, would be:
let underscore : Integer =
    Sequence{1..name.size()}->select(i |
        name.substring(i, i) = '_')->first() in
if not underscore.oclIsUndefined() then
    Sequence{
        name.substring(1, underscore - 1),
        name.substring(underscore + 1, name.size())}
else
    Sequence{}
endif
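The select()->first() formulation builds the full list of matches before picking one. A minimal Python model (hypothetical helper names; None again stands in for an undefined first() of an empty sequence) makes the intermediate result visible:

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def ocl_select(source: Sequence[T], body: Callable[[T], bool]) -> List[T]:
    """select() on a Sequence: every matching element, in the original order."""
    return [x for x in source if body(x)]


def ocl_first(seq: Sequence[T]) -> Optional[T]:
    """first() of an empty sequence has no value; modeled as None."""
    return seq[0] if seq else None


def underscore_positions(name: str) -> List[int]:
    # The whole string is scanned, even if an early character is already '_'.
    return ocl_select(range(1, len(name) + 1), lambda i: name[i - 1] == "_")


def first_underscore(name: str) -> Optional[int]:
    return ocl_first(underscore_positions(name))


print(underscore_positions("CIM_Foo_Bar"))  # [4, 8]
print(first_underscore("CIM_Foo_Bar"))      # 4
```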
HTH,
Christian
Andreas Maier wrote:
<snip>

Re: Tokenizing of strings in OCL ? [message #60155 is a reply to message #60131]
Mon, 06 November 2006 19:31
Eclipse User
Originally posted by: cdamus.ca.ibm.com
Hi, Andy,
I should point out, of course, that the third solution I posted (using the
select iterator) has the same drawback as the first (using iterate): it
iterates through the entire string.
I think the second solution (using the 'any' iterator) is, in fact,
preferable. I had another look at the spec, and it defines the 'any'
iterator semantics by mapping it onto the 'select' iterator (which, in
turn, maps onto the primitive 'iterate'). These mappings actually specify
that, for a sequence, the 'any' iterator will always return the first
matching element.
Applying the mapping from Section 11.9.1,
    source->any(iterator | body) =
        source->select(iterator | body)->asSequence()->first()
to a Sequence: 'select' returns a sequence in which the selected elements
are in the same order as they appear in the original sequence, and
asSequence(), when applied to a Sequence, results in that same sequence, so
first() effectively selects the first element of the original sequence for
which the body is true.
So, using the 'any' formulation should provide you with a portable (between
OCL implementations) solution that performs optimally on implementations
that short-circuit on finding the first result, as does ours.
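A small Python experiment (a model, not an OCL engine; all names are hypothetical) illustrates both points: on an ordered sequence the two sides of the mapping pick the same element, but only a short-circuiting evaluator stops evaluating the body after the first match.

```python
from typing import Callable, List, Optional


def any_short_circuit(source: List[int], body: Callable[[int], bool]) -> Optional[int]:
    """An evaluator that stops at the first element for which body is true."""
    for x in source:
        if body(x):
            return x
    return None


def any_via_select(source: List[int], body: Callable[[int], bool]) -> Optional[int]:
    """Literal reading of source->select(body)->asSequence()->first()."""
    selected = [x for x in source if body(x)]  # select keeps the original order
    as_sequence = list(selected)               # asSequence on a Sequence: same elements
    return as_sequence[0] if as_sequence else None


class CountingBody:
    """Iterator body that tests name.substring(i, i) = '_' and counts its evaluations."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls = 0

    def __call__(self, i: int) -> bool:
        self.calls += 1
        return self.name[i - 1] == "_"


name = "CIM_ManagedElement"
indices = list(range(1, len(name) + 1))
fast, slow = CountingBody(name), CountingBody(name)
print(any_short_circuit(indices, fast), fast.calls)  # 4 4
print(any_via_select(indices, slow), slow.calls)     # 4 18
```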
Cheers,
Christian

Tokenizing of strings in OCL ? - solution [message #60179 is a reply to message #60131]
Tue, 07 November 2006 15:06
Eclipse User
Originally posted by: maiera.de.ibm.com
Hi Christian,
I read both your replies, thanks for the quick and thorough answer!
Using that, I have defined the following two constraints and tested that
they work fine:
Constraint on UML Class:
"The name of a CIM class must follow the format
schemaname '_' classname"
let upos : Integer = /* position of first underscore in class name */
    Sequence { 1 .. self.base_Class.name.size() }->any( i |
        self.base_Class.name.substring( i, i) = '_')
in
    if upos.oclIsUndefined() then false
    else
        upos > 1 and
        upos < self.base_Class.name.size()
    endif
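Outside OCL, the same rule can be cross-checked with a few lines of Python, for example against a list of class names exported from the model (a hypothetical workflow). This sketch uses str.find in place of the any() scan; the function name is made up.

```python
def follows_cim_name_format(name: str) -> bool:
    """schemaname '_' classname: the first '_' exists and is neither first nor last."""
    upos = name.find("_") + 1  # 1-based position of the first '_', 0 if there is none
    if upos == 0:
        return False
    return 1 < upos < len(name)


for n in ["CIM_ManagedElement", "_ManagedElement", "CIM_", "ManagedElement"]:
    print(n, follows_cim_name_format(n))
```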
Constraint on UML Class:
"The schema name of a CIM class must appear in its package path"
let upos : Integer = /* position of first underscore in class name */
    Sequence { 1 .. self.base_Class.name.size() }->any( i |
        self.base_Class.name.substring( i, i) = '_')
in
    if upos.oclIsUndefined() then false
    else
        let schema : String = /* schema name surrounded by delimiters */
            '::'.concat(self.base_Class.name.substring( 1, upos-1).concat('::'))
        in
            let pkgpos : Integer = /* position of schema name in package path */
                Sequence { 1 .. self.base_Class.qualifiedName.size() -
                        schema.size() + 1 }->any( i |
                    self.base_Class.qualifiedName.substring(
                        i, i + schema.size() - 1) = schema)
            in
                not pkgpos.oclIsUndefined()
    endif
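A Python model of this second constraint (hypothetical function name; qualified names are plain strings here) shows what the window scan over qualifiedName computes: a test of whether '::' + schema + '::' occurs anywhere in the package path.

```python
def schema_in_package_path(name: str, qualified_name: str) -> bool:
    """Model of the second constraint: '::' + schema + '::' occurs in qualified_name."""
    upos = name.find("_") + 1  # 1-based position of the first '_', 0 if absent
    if upos == 0:
        return False
    schema = "::" + name[:upos - 1] + "::"  # substring(1, upos - 1) between delimiters
    # Window start positions 1 .. qualifiedName.size() - schema.size() + 1, as in the OCL range
    last_start = len(qualified_name) - len(schema) + 1
    return any(
        qualified_name[i - 1:i - 1 + len(schema)] == schema
        for i in range(1, last_start + 1)
    )
```

The window scan gives the same answer as Python's `schema in qualified_name`; it is spelled out only to mirror the OCL. Note that, as written, the delimiters require a package segment on both sides of the schema name, so a qualified name that begins with the schema package (for example a hypothetical `CIM::CIM_ManagedElement`) does not match.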
However, I think searching for strings with any() and substring() is a bit
heavy for something like OCL, especially when I imagine running this
against the 1600+ classes in the CIM Schema.
How would I go about proposing additional OCL functions that improve on
that, e.g. a function that returns the first substring matching a pattern,
or a function that tests a string for a match against a pattern? That is,
would the starting point for a discussion be an implementation to test the
concept, or the WG defining the OCL standard?
Is anything planned in this direction?
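To make the request concrete: with a pattern-matching primitive, both constraints shrink to one match plus a substring test. The sketch below uses Python's re module purely as an illustration of such a primitive; it is not meant to describe any existing or proposed OCL operation, and the names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for a pattern operation: a non-empty schema name without '_',
# then '_', then a non-empty class name. fullmatch() anchors both ends of the string.
CIM_NAME = re.compile(r"([^_]+)_(.+)", re.DOTALL)


def parse_cim_name(name: str) -> Optional[Tuple[str, str]]:
    """Split at the first '_' if the name has the schemaname_classname shape."""
    m = CIM_NAME.fullmatch(name)
    return (m.group(1), m.group(2)) if m else None


def schema_in_path(name: str, qualified_name: str) -> bool:
    """Second constraint: the schema name, between '::' delimiters, is in the path."""
    parsed = parse_cim_name(name)
    return parsed is not None and f"::{parsed[0]}::" in qualified_name
```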
Andy
Christian W. Damus wrote:
<snip>

Re: Tokenizing of strings in OCL ? - solution [message #60256 is a reply to message #60179]
Tue, 07 November 2006 19:59
Eclipse User
Originally posted by: cdamus.ca.ibm.com
Hi, Andy,
I'm glad that this is working for you! However, I share your concern about
the performance ...
I answered your other posting about extending the OCL environment to add
this capability. Extending the OCL Standard Library itself, on the other
hand, requires participation in the OCL working group to influence the
development of the specification. I don't think that having a reference
implementation is a requirement. You can submit issues for consideration by
the revision task force by following the link from the OMG's website:
http://www.omg.org/technology/documents/formal/ocl.htm
(the 2.0 version of the specification was recently officially published).
I don't know of any current plan to enhance the standard library.
Cheers,
Christian
Andreas Maier wrote:
<snip>