Re: [geomesa-users] Bug with geomesa.index.dtg and not allowing dates before the Epoch
Emilio:
  Thanks for the fast turn.  I will try to find time to test.
On 4/24/17 6:38 PM, Emilio Lahr-Vivaz wrote:
Hi David,
I've put up a PR here to allow for disabling indexing of any dates:
https://github.com/locationtech/geomesa/pull/1489
When you create your schema, set 'geomesa.ignore.dtg=true'.
Feel free to check it out if you're so inclined.
Thanks,
Emilio
On 04/24/2017 10:06 AM, Emilio Lahr-Vivaz wrote:
It does look like we re-add the default date even if it is deleted 
from the metadata :/ I'll open a ticket for that as well.
For now, I think your best bet is to just go with the dummy date field.
Thanks,
Emilio
On 04/24/2017 09:30 AM, Emilio Lahr-Vivaz wrote:
Hi David,
I opened a ticket to support dates before the epoch here:
https://geomesa.atlassian.net/browse/GEOMESA-1785
In general we assume that indicating a date field to index shouldn't 
be mandatory, as users are likely to miss that step and suffer 
degraded performance. It does seem useful to have some way to 
indicate explicitly to not index any date though.
I believe that modifying the 'attributes' entry in the metadata 
table should work though - try removing the entire 
'geomesa.index.dtg' entry, and make sure to bounce your application 
as we cache feature types. I'll test it out to verify.
Thanks,
Emilio
On 04/21/2017 08:07 PM, David Boyd wrote:
I would like to report this as a bug. I am using 1.3.1 of Geomesa.
Here is the situation:
1. I have Date fields in my data with times before the start of the 
Epoch.
2. I do not necessarily need them indexed within the Z3 indexes.
I set the hint to turn off the Z3 indexes - That works.
If I don't set  geomesa.index.dtg or set it to null, GEOMESA picks 
a field automatically to index.
The problem is it picks my fields with Dates before the Epoch.
So when I go to store the record it errors out because of a date 
out of range (basically any date before the Epoch
will have a negative number ) and the range check fails.
I think at least one of the following fixes is needed:
1. Accept dates for indexing from before the Epoch (preferred, as that 
would let historical data be time indexed).
2. Accept null for geomesa.index.dtg and don't use a date (in the 
code, if that attribute is null it will use 0 for the
time, except that geomesa.index.dtg gets overwritten before 
that, when the schema is created).
3. If the z3 indexes are disabled stop doing the range check on the 
date.
I have tried changing the <FeatureName>~Attributes record in the 
metadata table to set the geomesa.index.dtg to null but
that seemed to be ignored.
I have hacked around it by adding a dummy date field to my data 
(not ideal by any means).
I am not a scala developer, and if I had time on my project I would 
attempt #2 or #3 above.  But #1 would seem to be the right
fix.
Or am I missing something?  This issue has cost me at least two 
full days.
On 4/21/17 5:24 PM, David Boyd wrote:
Emilio:
  So I found part of the problem.
I misspelled geomesa.indexes.enabled (had geomeas).
So now at least my metadata attributes do not have the Z3 entry:
ActorRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,Name:String:cardinality=high:index=full,Type:String:cardinality=high:index=full,NameMetaphone:String:cardinality=high:index=full,Country:String:cardinality=high:index=full,AffiliationTo:String:cardinality=high:index=full,AffiliationStart:Date:cardinality=high:index=full,AffiliationEnd:Date:cardinality=high:index=full,Aliases:String:cardinality=high:index=full,GeoCountryCode:String:cardinality=high:index=full,*GeoLocation:Point;geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',geomesa.indices='z2:3:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0002'
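Since a misspelled user-data key is silently ignored, a small check before calling createSchema can catch this kind of mistake. The following is only a sketch in plain Java: the class name, the method name and the list of known keys are assumptions made for illustration (the keys are simply the ones that appear in this thread, not an exhaustive or authoritative list from GeoMesa).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; nothing here is part of GeoMesa or GeoTools.
class UserDataKeyCheck {
    // Assumed, non-exhaustive list: only keys mentioned in this thread.
    static final Set<String> KNOWN_KEYS = new HashSet<>(Arrays.asList(
            "geomesa.index.dtg",
            "geomesa.indexes.enabled",
            "geomesa.ignore.dtg",
            "geomesa.table.sharing"));

    // Returns the keys that are not in KNOWN_KEYS, sorted for stable output.
    static Set<String> unknownKeys(Map<?, ?> userData) {
        Set<String> unknown = new TreeSet<>();
        for (Object key : userData.keySet()) {
            String name = String.valueOf(key);
            if (!KNOWN_KEYS.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        Map<Object, Object> userData = new LinkedHashMap<>();
        userData.put("geomeas.indexes.enabled", "z2,records,id,attr"); // the typo
        System.out.println("Unrecognized keys: " + unknownKeys(userData));
    }
}
```

In an application, the map returned by feature.getUserData() could be passed to unknownKeys, and a non-empty result logged or rejected before the schema is created.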
These are the tables I now have in accumulo:
CoalesceSearch
CoalesceSearch_attr_v4
CoalesceSearch_queries
CoalesceSearch_records_v2
CoalesceSearch_stats
CoalesceSearch_z2_v3
But I am still getting the validation error.   It is picking the 
AffiliationStart field as the first one for the default date 
index.  But that field is one that has the dates before the Epoch.
This is causing me some real issues.  I don't want to have to 
clutter up my data with a dummy date field. I already have to 
create dummy geometry fields for records with no location 
information.
If the Z3 indices are turned off why am I still getting validation 
errors for the date?  It should never be used.
On 4/21/17 4:48 PM, Emilio Lahr-Vivaz wrote:
Ok, since by default feature types will share a table, you can 
expect to still see the _z3 table. I think somehow the user data 
is not getting set right
before the call to createSchema. If you look at the 'attributes' 
row, you should see something like:
...geomesa.indices='z2:3:3,records:2:3,attr:4:3'...
(it shouldn't include the z3 entry).
You can try remote debugging to figure out what's wrong, this is 
the line that should be handling it:
https://github.com/locationtech/geomesa/blob/master/geomesa-accumulo/geomesa-accumulo-datastore/src/main/scala/org/locationtech/geomesa/accumulo/data/AccumuloDataStore.scala#L185 
As a work-around, you can edit the 'attributes' row through the 
accumulo shell to remove the z3 reference - that will cause it to 
stop writing and reading from z3.
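For the manual edit, the new value of the geomesa.indices entry can be derived from the old one. This is a minimal sketch in plain Java, assuming only what the examples in this thread show: entries are comma-separated and each one starts with the index name followed by a colon. The class and method names are hypothetical, and writing the value back through the accumulo shell is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for preparing the edited 'geomesa.indices' value.
class IndicesEntrySketch {
    // Drops every entry whose name (the text before the first ':') equals indexName.
    static String removeIndex(String indices, String indexName) {
        List<String> kept = new ArrayList<>();
        for (String entry : indices.split(",")) {
            String name = entry.split(":", 2)[0].trim();
            if (!name.equals(indexName)) {
                kept.add(entry.trim());
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String before = "z3:4:3,z2:3:3,records:2:3,attr:4:3";
        System.out.println(removeIndex(before, "z3"));
    }
}
```

The match is on the exact index name, so an xz3 entry is left alone when z3 is removed.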
Thanks,
Emilio
On 04/21/2017 04:34 PM, David Boyd wrote:
Emilio:
   There are three feature types defined.
ActorRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,Name:String:cardinality=high:index=full,Type:String:cardinality=high:index=full,NameMetaphone:String:cardinality=high:index=full,Country:String:cardinality=high:index=full,AffiliationTo:String:cardinality=high:index=full,AffiliationStart:Date:cardinality=high:index=full,AffiliationEnd:Date:cardinality=high:index=full,Aliases:String:cardinality=high:index=full,GeoCountryCode:String:cardinality=high:index=full,*GeoLocation:Point;geomesa.index.dtg='AffiliationStart',geomesa.table.sharing='true',geomesa.indices='z3:4:3,z2:3:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0002' 
ActorRecordset~id : []    \x02
ActorRecordset~stats-date : [] 2017-04-21T20:17:01.572Z
ActorRecordset~table.attr.v4 : [] CoalesceSearch_attr_v4
ActorRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ActorRecordset~table.z2.v3 : [] CoalesceSearch_z2_v3
ActorRecordset~table.z3.v4 : [] CoalesceSearch_z3_v4
ICEWSArtifactRecordset~attributes : [] 
objectKey:String,entityName:String,entitySource:String,entityTitle:String,recordKey:String:cardinality=high:index=full,SourceFileName:String:cardinality=high:index=full,RawText:String:cardinality=high:index=full,Md5Sum:String:cardinality=high:index=full,DateIngested:Date:cardinality=high:index=full,ArtifactDate:Date:cardinality=high:index=full,*theWorld:Polygon;geomesa.index.dtg='DateIngested',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0003'
ICEWSArtifactRecordset~id : []    \x03
ICEWSArtifactRecordset~stats-date : [] 2017-04-21T20:20:58.054Z
ICEWSArtifactRecordset~table.attr.v4 : [] CoalesceSearch_attr_v4
ICEWSArtifactRecordset~table.records.v2 : [] CoalesceSearch_records_v2
ICEWSArtifactRecordset~table.xz2.v1 : [] CoalesceSearch_xz2
ICEWSArtifactRecordset~table.xz3.v1 : [] CoalesceSearch_xz3
Linkages~attributes : [] 
objectKey:String:cardinality=high:index=full,entity1Key:String,entity1Name:String,entity1Source:String,entity1Version:String,entity1Key:String:cardinality=high:index=full,entity1Name:String,entity1Source:String,entity1Version:String,lastModified:Date:cardinality=high:index=full,label:String:cardinality=low:index=full,linkType:String:cardinality=low:index=full,*theWorld:Polygon;geomesa.index.dtg='lastModified',geomesa.table.sharing='true',geomesa.indices='xz3:1:3,xz2:1:3,records:2:3,attr:4:3',geomesa.table.sharing.prefix='\\u0001'
Linkages~id : []    \x01
Linkages~stats-date : []    2017-04-21T20:16:02.269Z
Linkages~table.attr.v4 : [] CoalesceSearch_attr_v4
Linkages~table.records.v2 : [] CoalesceSearch_records_v2
Linkages~table.xz2.v1 : []    CoalesceSearch_xz2
Linkages~table.xz3.v1 : []    CoalesceSearch_xz3
On 4/21/17 4:28 PM, Emilio Lahr-Vivaz wrote:
We will always set a default date field for indexing, so that 
is why you see the date validation message. However, it seems like
you are setting the hints correctly. It is odd though, because 
there shouldn't ever be a situation where we create both the 
XZ3 and Z3 index for a single feature type. Do you have other 
feature types in the same catalog table? Can you scan the 
catalog table and reply with the result of the 'attributes' row?
Thanks,
Emilio
On 04/21/2017 04:20 PM, David Boyd wrote:
Emilio:
   Some more information.  I am getting this message:
2017-04-21 16:17:01,484 | WARN | [main] | (GeoMesaSchemaValidator.scala:90) - geomesa.index.dtg is not valid or defined for simple feature type SimpleFeatureTypeImpl http://www.opengis.net/gml:ActorRecordset identified extends Feature(objectKey:objectKey,entityName:entityName,entitySource:entitySource,entityTitle:entityTitle,recordKey:recordKey,Name:Name,Type:Type,NameMetaphone:NameMetaphone,Country:Country,AffiliationTo:AffiliationTo,AffiliationStart:AffiliationStart,AffiliationEnd:AffiliationEnd,Aliases:Aliases,GeoCountryCode:GeoCountryCode,GeoLocation:GeoLocation). However, the following attribute(s) can be used in GeoMesa's temporal index: AffiliationStart, AffiliationEnd. GeoMesa will now point geomesa.index.dtg to the first temporal attribute found: AffiliationStart
This happens when I create my schemas, despite specifically 
disabling those indexes and not specifying a time field for 
geomesa.index.dtg.
I have also tried adding:
feature.getUserData().put("geomesa.index.dtg",null);
To my code.  Same result.
On 4/21/17 4:04 PM, David Boyd wrote:
Emilio:
   Thanks for the detailed explanation.
I am trying to disable the Z3 index.   I have added the 
following to my code:
    final String indexes = "z2,records,id,attr";
    SimpleFeatureType feature = tb.buildFeatureType();
    // index recordkey, cardinality is high because there is only one record per key.
    feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("index", "full");
    feature.getDescriptor(ENTITY_RECORD_KEY_COLUMN_NAME).getUserData().put("cardinality", "high");
    feature.getUserData().put("geomeas.indexes.enabled",indexes);
I then create other attribute indexes, then call createSchema 
with the feature.
I am still getting the exception:
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -432000.0
    at scala.Predef$.require(Predef.scala:224)
    at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
    at org.locationtech.geomesa.curve.NormalizedTime.normalize(NormalizedDimension.scala:33)
When I look at my accumulo tables I still have:
CoalesceSearch_xz3
CoalesceSearch_z3_v4
I dropped all my tables before this was run.
What am I missing?
On 4/21/17 10:02 AM, Emilio Lahr-Vivaz wrote:
Yeah, that error is a bit obtuse but it's coming from 
converting the date into an index value. I believe that 
currently if a feature fails to validate for any index, it 
will not be stored at all. This is to prevent partial 
indexing, where your query results might differ based on 
which index it uses. Previously we allowed partial indexing, 
and I think at this point we'd like to support both based on 
a configuration property, but haven't implemented it yet.
We haven't really had any use-cases so far for storing data 
that old, so we don't currently support it. However, there 
are a couple things you could do (off the top of my head):
* Add another date field for indexing, or disable the z3 
index. If the date isn't part of the primary z index, then 
it won't cause any problems. You can still filter on it as 
normal, it just won't use the date in the primary range 
planning so queries will be slower. To alleviate that, you 
could add an attribute index on the date field - that does 
not have the same restrictions on date range, but it is not 
a composite index so query planning will use either date 
*or* geometry but not both.
* Offset dates by some fixed amount to bring them into an 
indexable range, and add some logic in your client to 
transform queries and results. This may be fairly 
complicated...
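The second option could be sketched roughly as follows in plain Java (java.time only). Everything here is an assumption for illustration: the class and method names are hypothetical, and the size of the shift is an arbitrary choice. The shifted dates must also stay below the maximum date of the index interval in use, which this sketch does not check.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side helper; nothing here is part of GeoMesa.
class DateShiftSketch {
    // Assumed shift: 146097 days is one full 400-year Gregorian cycle, so the
    // shifted value keeps the same month, day and weekday as the original.
    static final Duration SHIFT = Duration.ofDays(146_097);

    // Apply before writing a feature or building a query filter.
    static Instant toIndexable(Instant actual) {
        return actual.plus(SHIFT);
    }

    // Apply to dates read back from query results.
    static Instant fromIndexable(Instant stored) {
        return stored.minus(SHIFT);
    }

    public static void main(String[] args) {
        Instant actual = Instant.parse("1968-01-01T05:00:00Z");
        Instant stored = toIndexable(actual);
        System.out.println(actual + " is stored as " + stored);
        System.out.println("round trip ok: " + fromIndexable(stored).equals(actual));
    }
}
```

Every date bound in a query filter has to go through toIndexable, and every date in a result through fromIndexable, which is where the complexity mentioned above comes from.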
From a technical perspective I don't think there is any 
reason we couldn't store dates before the epoch, it just 
hasn't been implemented.
Thanks,
Emilio
On 04/20/2017 10:13 PM, David Boyd wrote:
Emilio:
   Thanks.  I puzzled it out in the end.
How would one date-index historical data? The data I have 
has numerous dates before the Epoch. The exception I am
getting is below.  Does this mean my feature did not get 
stored, or just that the date was not indexed? If the latter, 
how would
this data behave on a query including the date?
2017-04-20 17:11:12,306 | WARN | [Thread-7] | (ICEWS_EntityExtractor.java:240) - StartDateString: 1968-01-01 StartDate: 1968-01-01T00:00:00.000-05:00 EndDateString: 1996-08-31 EndDate: 1996-08-31T00:00:00.000-04:00
2017-04-20 17:11:12,306 |  INFO | [Thread-7] | (ICEWS_EntityExtractor.java:300) - Persisting 2 ICEWS records.
2017-04-20 17:11:12,556 | ERROR | [Thread-7] | (AccumuloPersistor.java:1073) - requirement failed: Value out of bounds ([0.0 604800.0]): -241200.0
java.lang.IllegalArgumentException: requirement failed: Value out of bounds ([0.0 604800.0]): -241200.0
    at scala.Predef$.require(Predef.scala:224)
    at org.locationtech.geomesa.curve.NormalizedDimension$class.normalize(NormalizedDimension.scala:17)
On 4/20/17 6:07 PM, Emilio Lahr-Vivaz wrote:
Hi David,
I don't believe that this is in our documentation, but 
it's commented in our source code. The min date will 
always be the unix epoch, and the max date depends on the 
indexing interval of your z-curve (the default interval is 
week):
https://github.com/locationtech/geomesa/blob/master/geomesa-z3/src/main/scala/org/locationtech/geomesa/curve/BinnedTime.scala#L15-L39 
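To make the lower bound concrete, here is a minimal sketch in plain Java (java.time only) of week-based binning. It is not the GeoMesa code: the class and method names are hypothetical, and the truncating division is an assumption, chosen because it reproduces the negative offset in the error log above.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

// Hypothetical illustration only; not the GeoMesa BinnedTime code.
class WeekBinSketch {
    // Seconds in one week: the upper bound in the "[0.0 604800.0]" error message.
    static final long SECONDS_PER_WEEK = 7L * 24 * 60 * 60;

    // Whole weeks since the Unix epoch, truncated toward zero (assumed behaviour).
    static long weekBin(Instant t) {
        return t.getEpochSecond() / SECONDS_PER_WEEK;
    }

    // Seconds into the week; negative for instants before the epoch.
    static long offsetSeconds(Instant t) {
        return t.getEpochSecond() % SECONDS_PER_WEEK;
    }

    // True only when both the bin and the offset are non-negative.
    static boolean indexable(Instant t) {
        return weekBin(t) >= 0 && offsetSeconds(t) >= 0;
    }

    public static void main(String[] args) {
        Instant start = OffsetDateTime.parse("1968-01-01T00:00:00-05:00").toInstant();
        System.out.println("bin=" + weekBin(start)
                + " offset=" + offsetSeconds(start)
                + " indexable=" + indexable(start));
    }
}
```

For the StartDate in the log above (1968-01-01T00:00:00-05:00), the offset comes out as -241200 seconds, the same value the range check rejects.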
Thanks,
Emilio
On 04/20/2017 04:45 PM, David Boyd wrote:
All:
   Haven't found this in the documents yet so I thought I 
would ask.
I have two fields in my data representing a startTime 
and an endTime.
Values for those string fields are normally dates but can 
also be "beginning of time" and
"end of time" respectively.
I originally tried setting beginning of time to be 
01/01/1111 but I would get an
index out of range error (I assume it is because this was 
before the standard Unix epoch).
That error was down in the XZ3 index creation.
I then tried using new DateTime(Long.MIN_VALUE) and new 
DateTime(Long.MAX_VALUE) but the max
now throws errors in Joda-Time.
So what are the min and max Times supported by Geomesa in 
the indexes?
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, 
or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users
--
========= mailto:dboyd@xxxxxxxxxxxxxxxxx ============
David W. Boyd
VP,  Data Solutions
10432 Balls Ford, Suite 240
Manassas, VA 20109
office:   +1-703-552-2862
cell:     +1-703-402-7908
============== http://www.incadencecorp.com/ ============
ISO/IEC JTC1 WG9, editor ISO/IEC 20547 Big Data Reference Architecture
Chair ANSI/INCITS TC Big Data
Co-chair NIST Big Data Public Working Group Reference Architecture
First Robotic Mentor - FRC, FTC - www.iliterobotics.org
Board Member- USSTEM Foundation - www.usstem.org