Re: [geomesa-users] Question about min and max times in indexing
Hi David,
I think I see the problem - we added a secondary z-index to our 
attribute indices a while back. However, that code doesn't check to see 
if a table is disabled or not. I've created a ticket here for the issue:
https://geomesa.atlassian.net/browse/GEOMESA-1784
I believe as a work-around you could just edit the attributes row and 
completely remove the 'geomesa.index.dtg' entry - that should 
effectively make it ignore your date field. (You would need to bounce 
your application as we cache simple feature types fairly aggressively). 
I will test that out to verify.
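
For illustration only, here is a minimal sketch of what the edited 'attributes' value would look like once the 'geomesa.index.dtg' entry is removed. It is plain string handling, not GeoMesa code: the class and method names are hypothetical, and the key='value' layout after the ';' is assumed from the rows quoted later in this thread. The work-around itself is untested, as noted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GeoMesa.
final class AttributesRowSketch {
    // Assumed layout of each user-data entry: key='value'
    private static final Pattern USER_DATA_ENTRY = Pattern.compile("([\\w.]+)='([^']*)'");

    // Returns the spec with the given user-data key dropped; attributes before ';' are untouched.
    static String withoutUserDataKey(String spec, String key) {
        int split = spec.indexOf(';');
        if (split < 0) {
            return spec; // no user-data section present
        }
        String attributes = spec.substring(0, split);
        Matcher m = USER_DATA_ENTRY.matcher(spec.substring(split + 1));
        List<String> kept = new ArrayList<>();
        while (m.find()) {
            if (!m.group(1).equals(key)) {
                kept.add(m.group()); // keep the whole key='value' entry
            }
        }
        return attributes + ";" + String.join(",", kept);
    }

    public static void main(String[] args) {
        String row = "AffiliationStart:Date,*GeoLocation:Point;"
            + "geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',"
            + "geomesa.indices='z2:3:3,records:2:3,attr:4:3'";
        System.out.println(withoutUserDataKey(row, "geomesa.index.dtg"));
    }
}
```

The resulting string would still have to be written back to the catalog table by hand, and the application restarted.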
Thanks,
Emilio
On 04/21/2017 05:24 PM, David Boyd wrote:
Emilio:
  So I found part of the problem.
I misspelled geomesa.indexes.enabled (had geomeas).
So now at least my metadata attributes do not have the Z3 entry:
ActorRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,Name:String:cardinality=high:index=full,Type:String:cardinality=high:index=full,NameMetaphone:String:cardinality=high:index=full,Country:String:cardinality=high:index=full,AffiliationTo:String:cardinality=high:index=full,AffiliationStart:Date:cardinality=high:index=full,AffiliationEnd:Date:cardinality=high:index=full,Aliases:String:cardinality=high:index=full,GeoCountryCode:String:cardinality=high:index=full,*GeoLocation:Point;geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',geomesa.indices='z2:3:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0002'
These are the tables I now have in accumulo:
CoalesceSearch
CoalesceSearch_attr_v4
CoalesceSearch_queries
CoalesceSearch_records_v2
CoalesceSearch_stats
CoalesceSearch_z2_v3
But I am still getting the validation error.   It is picking the 
AffiliationStart field as the first one for the default date index.  
But that field is one that has the dates before the Epoch.
This is causing me some real issues.  I don't want to have to 
clutter up my data with a dummy date field. I already have to create 
dummy geometry fields for records with no location information.
If the Z3 indices are turned off why am I still getting validation 
errors for the date?  It should never be used.
On 4/21/17 4:48 PM, Emilio Lahr-Vivaz wrote:
Ok, since by default feature types will share a table, you can expect 
to still see the _z3 table. I think somehow the user data is not 
getting set right before the call to createSchema. If you look at the 
'attributes' row, you should see something like:
...geomesa.indices='z2:3:3,records:2:3,attr:4:3'...
(it shouldn't include the z3 entry).
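
A quick way to check a scanned 'attributes' value is to pull the index names out of the geomesa.indices entry. The sketch below is plain string parsing, not a GeoMesa API: the class and method names are hypothetical, and it assumes only that each entry starts with the index name followed by ':' as in the sample above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GeoMesa.
final class IndexListSketch {
    private static final Pattern INDICES = Pattern.compile("geomesa\\.indices='([^']*)'");

    // Returns the index names listed in geomesa.indices, or an empty list if the entry is absent.
    static List<String> indexNames(String attributesRow) {
        List<String> names = new ArrayList<>();
        Matcher m = INDICES.matcher(attributesRow);
        if (m.find()) {
            for (String entry : m.group(1).split(",")) {
                if (!entry.isEmpty()) {
                    names.add(entry.split(":")[0]); // text before the first ':' is the name
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String row = "...geomesa.indices='z2:3:3,records:2:3,attr:4:3'...";
        System.out.println(indexNames(row));                // [z2, records, attr]
        System.out.println(indexNames(row).contains("z3")); // false
    }
}
```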
You can try remote debugging to figure out what's wrong, this is the 
line that should be handling it:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStore.scala#L185 
As a work-around, you can edit the 'attributes' row through the 
accumulo shell to remove the z3 reference - that will cause it to 
stop writing and reading from z3.
Thanks,
Emilio
On 04/21/2017 04:34 PM, David Boyd wrote:
Emilio:
   There are three feature types defined.
ActorRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,Name:String:cardinality=high:index=full,Type:String:cardinality=high:index=full,NameMetaphone:String:cardinality=high:index=full,Country:String:cardinality=high:index=full,AffiliationTo:String:cardinality=high:index=full,AffiliationStart:Date:cardinality=high:index=full,AffiliationEnd:Date:cardinality=high:index=full,Aliases:String:cardinality=high:index=full,GeoCountryCode:String:cardinality=high:index=full,*GeoLocation:Point;geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0002'
ActorRecordset~id : []    \x02
ActorRecordset~stats-date : []    2017-04-21T20:17:01.572Z
ActorRecordset~table.attr.v4 : []    CoalesceSearch_attr_v4
ActorRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ActorRecordset~table.z2.v3 : []    CoalesceSearch_z2_v3
ActorRecordset~table.z3.v4 : []    CoalesceSearch_z3_v4
ICEWSArtifactRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,SourceFileName:String:cardinality=high:index=full,RawText:String:cardinality=high:index=full,Md5Sum:String:cardinality=high:index=full,DateIngested:Date:cardinality=high:index=full,ArtifactDate:Date:cardinality=high:index=full,*theWorld:Polygon;geomesa.index.dtg='DateIngested',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0003'
ICEWSArtifactRecordset~id : []    \x03
ICEWSArtifactRecordset~stats-date : [] 2017-04-21T20:20:58.054Z
ICEWSArtifactRecordset~table.attr.v4 : [] CoalesceSearch_attr_v4
ICEWSArtifactRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ICEWSArtifactRecordset~table.xz2.v1 : [] CoalesceSearch_xz2
ICEWSArtifactRecordset~table.xz3.v1 : [] CoalesceSearch_xz3
Linkages~attributes : [] 
objectKey:String:cardinality=high:index=full,entity1Key:String,entity1Name:String,entity1Source:String,entity1Version:String,entity1Key:String:cardinality=high:index=full,entity1Name:String,entity1Source:String,entity1Version:String,lastModified:Date:cardinality=high:index=full,label:String:cardinality=low:index=full,linkType:String:cardinality=low:index=full,*theWorld:Polygon;geomesa.index.dtg='lastModified',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0001'
Linkages~id : []    \x01
Linkages~stats-date : []    2017-04-21T20:16:02.269Z
Linkages~table.attr.v4 : []    CoalesceSearch_attr_v4
Linkages~table.records.v2 : []    CoalesceSearch_records_v2
Linkages~table.xz2.v1 : []    CoalesceSearch_xz2
Linkages~table.xz3.v1 : []    CoalesceSearch_xz3
On 4/21/17 4:28 PM, Emilio Lahr-Vivaz wrote:
We will always set a default date field for indexing, so that is 
why you see the date validation message. However, it seems like
you are setting the hints correctly. It is odd though, because 
there shouldn't ever be a situation where we create both the XZ3 
and Z3 index for a single feature type. Do you have other feature 
types in the same catalog table? Can you scan the catalog table and 
reply with the result of the 'attributes' row?
Thanks,
Emilio
On 04/21/2017 04:20 PM, David Boyd wrote:
Emilio:
   Some more information.  I am getting this message:
2017-04-21 16:17:01,484 |  WARN | [main] | 
(GeoMesaSchemaValidator.scala:90) - geomesa.index.dtg is not 
valid or defined for simple feature type SimpleFeatureTypeImpl 
http://www.opengis.net/gml:ActorRecordset identified extends 
Feature(objectKey:objectKey,entityName:entityName,entitySource:entitySource,entityTitle:entityTitle,recordKey:recordKey,Name:Name,Type:Type,NameMetaphone:NameMetaphone,Country:Country,AffiliationTo:AffiliationTo,AffiliationStart:AffiliationStart,AffiliationEnd:AffiliationEnd,Aliases:Aliases,GeoCountryCode:GeoCountryCode,GeoLocation:GeoLocation). 
However, the following attribute(s) can be used in GeoMesa's 
temporal index: AffiliationStart, AffiliationEnd. GeoMesa will 
now point geomesa.index.dtg to the first temporal attribute 
found: AffiliationStart
I get this now when I create my schemas, despite specifically disabling 
those indexes and not specifying a time field for geomesa.index.dtg.
I have also tried adding:
feature.getUserData().put("geomesa.index.dtg",null);
To my code.  Same result.
On 4/21/17 4:04 PM, David Boyd wrote:
Emilio:
   Thanks for the detailed explanation.
I am trying to disable the Z3 index.   I have added the following 
to my code:
    final String indexes = "z2,records,id,attr";
    SimpleFeatureType feature = tb.buildFeatureType();
    // index recordkey, cardinality is high because there is only one record per key.
    feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("index", "full");
    feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("cardinality", "high");
    feature.getUserData().put("geomeas.indexes.enabled",indexes);
I then create other attribute indexes, then call createSchema with 
the feature.
I am still getting the exception:
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -432000.0
    at scala.Predef$.require(Predef.scala:224)
    at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
    at org.locationtech.geomesa.curve.NormalizedTime.normalize(NormalizedDimension.scala:33)
When I look at my accumulo tables I still have:
CoalesceSearch_xz3
CoalesceSearch_z3_v4
I dropped all my tables before this was run.
What am I missing?
On 4/21/17 10:02 AM, Emilio Lahr-Vivaz wrote:
Yeah, that error is a bit obtuse but it's coming from converting 
the date into an index value. I believe that currently if a 
feature fails to validate for any index, it will not be stored 
at all. This is to prevent partial indexing, where your query 
results might differ based on which index it uses. Previously we 
allowed partial indexing, and I think at this point we'd like to 
support both based on a configuration property, but haven't 
implemented it yet.
We haven't really had any use-cases so far for storing data that 
old, so we don't currently support it. However, there are a 
couple things you could do (off the top of my head):
* Add another date field for indexing, or disable the z3 index. 
If the date isn't part of the primary z index, then it won't 
cause any problems. You can still filter on it as normal, it 
just won't use the date in the primary range planning so queries 
will be slower. To alleviate that, you could add an attribute 
index on the date field - that does not have the same 
restrictions on date range, but it is not a composite index so 
query planning will use either date *or* geometry but not both.
* Offset dates by some fixed amount to bring them into an 
indexable range, and add some logic in your client to transform 
queries and results. This may be fairly complicated...
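
As a rough sketch of the second option, the client could shift every date by a fixed amount before writing and undo the shift on results. Everything below is hypothetical and uses only java.time: the class name, the method names, and the size of the offset are illustrative choices. The offset must also keep the shifted dates under the index's upper bound, which depends on the indexing interval.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side helper, not part of GeoMesa.
final class DateOffsetSketch {
    // Illustrative shift of 60,000 whole weeks (roughly 1,150 years).
    static final Duration OFFSET = Duration.ofDays(7L * 60_000);

    // Apply before writing a feature or building a query bound.
    static Instant toIndexable(Instant real) {
        return real.plus(OFFSET);
    }

    // Apply to dates read back from query results.
    static Instant fromIndexable(Instant stored) {
        return stored.minus(OFFSET);
    }

    public static void main(String[] args) {
        Instant real = Instant.parse("1968-01-01T05:00:00Z");
        Instant stored = toIndexable(real);
        System.out.println(stored.getEpochSecond() >= 0);       // true
        System.out.println(fromIndexable(stored).equals(real)); // true
    }
}
```

Query filters would need the same toIndexable shift applied to their date bounds, which is where most of the complication comes from.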
From a technical perspective I don't think there is any reason 
we couldn't store dates before the epoch, it just hasn't been 
implemented.
Thanks,
Emilio
On 04/20/2017 10:13 PM, David Boyd wrote:
Emilio:
   Thanks.  I puzzled it out in the end.
How would one date-index historical data?  The data I have has 
numerous dates before the Epoch.   The exception I am getting is 
below.  Does this mean my feature did not get stored, or just that 
the date was not indexed?    If the latter, how would this data 
behave on a query including the date?
2017-04-20 17:11:12,306 | WARN | [Thread-7] | (ICEWS_EntityExtractor.java:240) - StartDateString: 1968-01-01 StartDate: 1968-01-01T00:00:00.000-05:00 EndDateString: 1996-08-31 EndDate: 1996-08-31T00:00:00.000-04:00
2017-04-20 17:11:12,306 |  INFO | [Thread-7] | (ICEWS_EntityExtractor.java:300) - Persisting 2 ICEWS records.
2017-04-20 17:11:12,556 | ERROR | [Thread-7] | (AccumuloPersistor.java:1073) - requirement failed: Value out of bounds ([0.0 604800.0]): -241200.0
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -241200.0
    at scala.Predef$.require(Predef.scala:224)
    at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
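
The -241200.0 in the trace is consistent with splitting the logged StartDate into whole weeks since the Unix epoch plus a seconds-within-week remainder using truncating division. That would explain the [0.0 604800.0] bound, since 604800 is the number of seconds in a week. The sketch below only reproduces that arithmetic with java.time. It is an assumption about how the value arises, not GeoMesa's actual code, and the class and method names are hypothetical. The floored variant is shown purely for contrast.

```java
import java.time.OffsetDateTime;

// Hypothetical illustration of the arithmetic, not GeoMesa code.
final class WeekBinSketch {
    static final long SECONDS_PER_WEEK = 7L * 24 * 60 * 60; // 604800

    // Remainder with truncating division: negative for dates before the epoch.
    static long truncatedOffset(long epochSeconds) {
        return epochSeconds % SECONDS_PER_WEEK;
    }

    // Remainder with floored division: always within [0, 604800).
    static long flooredOffset(long epochSeconds) {
        return Math.floorMod(epochSeconds, SECONDS_PER_WEEK);
    }

    public static void main(String[] args) {
        long t = OffsetDateTime.parse("1968-01-01T00:00:00.000-05:00").toEpochSecond();
        System.out.println(t);                  // -63140400
        System.out.println(truncatedOffset(t)); // -241200
        System.out.println(flooredOffset(t));   // 363600
    }
}
```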
On 4/20/17 6:07 PM, Emilio Lahr-Vivaz wrote:
Hi David,
I don't believe that this is in our documentation, but it's 
commented in our source code. The min date will always be the 
Unix epoch, and the max date depends on the indexing interval 
of your z-curve (the default interval is a week):
https://github.com/locationtech/geomesa/blob/master/geomesa-z3/src/main/scala/org/locationtech/geomesa/curve/BinnedTime.scala#L15-L39 
Thanks,
Emilio
On 04/20/2017 04:45 PM, David Boyd wrote:
All:
   Haven't found this in the documents yet so I thought I 
would ask.
I have two fields in my data representing a startTime and 
an endTime.
Values for those string fields are normally dates but can 
also be "beginning of time" and
"end of time" respectively.
I originally tried setting beginning of time to be 
01/01/1111 but I would get an
index out of range error (I assume it is because this was 
before the standard Unix epoch).
That error was down in the XZ3 index creation.
I then tried using new DateTime(Long.MIN_VALUE) and new 
DateTime(Long.MAX_VALUE) but the max
now throws errors in Joda-Time.
So what are the min and max times supported by GeoMesa in the 
indexes?
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or 
unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users