Emilio, Anthony, Jim,
Thanks for your suggestions. With regard to overwriting a GeoMesa feature with the same ID:
In my use case I would conceptually be using the same feature ID, location, and time; however, the location portion has double-precision
data in it and thus may not be exactly equal to a previously indexed entry. Because of the double-precision data within my default geometry attribute, I think it is best not to assume equality between my previously indexed {id, location, time} entry
and my new {id, location, time} entry.
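To illustrate the concern about double-precision coordinates, here is a minimal sketch in plain Java. The class name, the helper nearlyEqual(), and the tolerance value are hypothetical choices for illustration; none of them come from GeoMesa or GeoTools.

```java
// Hypothetical illustration: exact equality on computed doubles is fragile.
final class CoordinateCompare
{
    // Assumed tolerance in degrees; choose one that matches the data's real precision.
    static final double TOLERANCE_DEGREES = 1e-9;

    // True when two coordinate values differ by no more than the tolerance.
    static boolean nearlyEqual(double a, double b)
    {
        return Math.abs(a - b) <= TOLERANCE_DEGREES;
    }

    public static void main(String[] args)
    {
        double stored = 0.3;            // value as originally indexed
        double recomputed = 0.1 + 0.2;  // the "same" value derived by arithmetic

        System.out.println(stored == recomputed);            // false
        System.out.println(nearlyEqual(stored, recomputed)); // true
    }
}
```

A row key derived from the exact coordinate value would differ between the two writes in this situation, which is why treating the new entry as a distinct one seems the safer assumption.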
Jim suggested that a pure lookup by feature ID is slow. Since I also know the geometry and time, will adding a geospatial-temporal
component to the Filter ease the slowness concerns?
Below is some Java code for modification. The code seems to functionally work, but:
A) Does combining the geospatialtemporalfilter with the idfilter offer speed improvements over a simple idfilter for larger data sets?
B) Is combining the geospatialtemporalfilter with the idfilter safe? According to
http://docs.geotools.org/stable/userguide/library/opengis/filter.html, “Formally this style of Id matching is not supposed to be mixed with the traditional attribute based
evaluation (such as a bounding box filter).” This statement gives me cause for concern because I am mixing the id matching with a bounding box and time filter.
public boolean modifyFeatureByID(String featureName, String featureID, List<String> attributes, List<Object> values,
        String timeAttribute, Date startTime, Date endTime, String geometryAttribute, double startLat, double startLon,
        double endLat, double endLon)
{
    boolean wasSuccessful = true;

    // queryString is of the form:
    // "( ( NOT (myTime AFTER 2014-06-20T15:23:03.952Z)) AND ( NOT (myTime BEFORE 2012-06-20T15:23:03.952Z)) ) AND BBOX(myGeometry,123.2,40.1,123.6,49.9)"
    String queryString = constructBoundedBoxAndTimeFrameQueryString(timeAttribute, startTime, endTime,
            geometryAttribute, startLat, startLon, endLat, endLon);
    Filter geospatialtemporalfilter = queryStringToFilter(queryString);

    logger.debug("Modifying feature with id: '" + featureID + "' from the '" + featureName + "' feature store");

    if (attributes == null && values != null)
    {
        throw new IllegalArgumentException("if attributes list is null, values list must also be null");
    }
    if (attributes != null && values == null)
    {
        throw new IllegalArgumentException("if values list is null, attributes list must also be null");
    }

    if (attributes != null && values != null)
    {
        int numAttributes = attributes.size();
        if (numAttributes != values.size())
        {
            throw new IllegalArgumentException("attributes and values lists must be the same size");
        }

        if (numAttributes > 0)
        {
            Name[] attributeNames = new NameImpl[numAttributes];
            Object[] attributeValues = new Object[numAttributes];
            for (int k = 0; k < numAttributes; k++)
            {
                attributeNames[k] = new NameImpl(attributes.get(k));
                attributeValues[k] = values.get(k);
            }

            DataStore dataStore = createDataStore();
            FeatureStore<SimpleFeatureType, SimpleFeature> featureStore = createFeatureStore(dataStore,
                    featureName);

            FilterFactory2 ff = CommonFactoryFinder.getFilterFactory2();
            Filter idfilter = ff.id(Collections.singleton(ff.featureId(featureID)));
            Filter filter = ff.and(Arrays.asList(geospatialtemporalfilter, idfilter));

            try
            {
                featureStore.modifyFeatures(attributeNames, attributeValues, filter);
                logger.debug("Feature with id: '" + featureID + "' has been successfully modified within the '"
                        + featureName + "' feature store");
            }
            catch (Exception e)
            {
                wasSuccessful = false;
                logger.error("Problem modifying feature with id: '" + featureID + "' within the '"
                        + featureName + "' feature store", e);
            }

            dataStore.dispose();
        }
        else
        {
            logger.debug("modifyFeatureByID() invoked with empty attributes/values inputs, no features "
                    + "were modified for feature with id: " + featureID);
        }
    }
    else
    {
        logger.debug("modifyFeatureByID() invoked with null attributes/values inputs, no features were modified for "
                + "feature with id: " + featureID);
    }

    return wasSuccessful;
}
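The method above calls constructBoundedBoxAndTimeFrameQueryString() without showing it. The following is only a guess at what such a helper might look like, written to reproduce the query-string form quoted in the comment. The class and method names are hypothetical, and the longitude-before-latitude ordering is inferred from the sample values (123.2 cannot be a latitude), so it should be checked against the axis order of the actual data.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical stand-in for the helper that the message references but does not show.
final class QueryStrings
{
    static String boundedBoxAndTimeFrame(String timeAttribute, Date startTime, Date endTime,
            String geometryAttribute, double startLat, double startLon, double endLat, double endLon)
    {
        // ISO-8601 timestamps in UTC with milliseconds, e.g. 2014-06-20T15:23:03.952Z
        SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'", Locale.ROOT);
        iso.setTimeZone(TimeZone.getTimeZone("UTC"));

        // Assumed BBOX ordering: minX (lon), minY (lat), maxX (lon), maxY (lat).
        double minLon = Math.min(startLon, endLon);
        double maxLon = Math.max(startLon, endLon);
        double minLat = Math.min(startLat, endLat);
        double maxLat = Math.max(startLat, endLat);

        return "( ( NOT (" + timeAttribute + " AFTER " + iso.format(endTime) + ")) AND ( NOT ("
                + timeAttribute + " BEFORE " + iso.format(startTime) + ")) ) AND BBOX("
                + geometryAttribute + "," + minLon + "," + minLat + "," + maxLon + "," + maxLat + ")";
    }
}
```

Sorting the corner values with Math.min/Math.max is a design choice of this sketch; it keeps the box well-formed even if the caller passes the corners in the opposite order.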
From: geomesa-users-bounces@xxxxxxxxxxxxxxxx [mailto:geomesa-users-bounces@xxxxxxxxxxxxxxxx]
On Behalf Of Emilio Lahr-Vivaz
Sent: Friday, June 20, 2014 9:38 AM
To: geomesa-users@xxxxxxxxxxxxxxxx
Subject: Re: [geomesa-users] Feature writing issue...
Hi all,
A couple caveats:
In GeoMesa, the row key contains the temporal and spatial data for the feature. So if you try to overwrite a feature by keeping the same feature ID, but you change the location and/or time, it will create a new row and not overwrite the existing entry.
If you do overwrite the exact same row, Accumulo by default will apply a VersioningIterator so that you only see the latest entry. You can change this behavior if you want:
http://accumulo.apache.org/1.5/accumulo_user_manual.html#_versioning_iterators_and_timestamps
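The two caveats above can be pictured with a toy model in plain Java. This is not Accumulo's or GeoMesa's API: the class, the "~"-separated row key layout, and the cell and time-bucket names are all invented for illustration. It only shows the two effects described: a changed location yields a second row for the same ID, while a rewrite of the identical row shows only the newest value.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: NOT Accumulo's or GeoMesa's API, and the key layout is invented.
final class ToyVersionedTable
{
    // row key -> (timestamp -> value), timestamps sorted ascending
    private final TreeMap<String, TreeMap<Long, String>> rows = new TreeMap<>();

    // Hypothetical row key that embeds a spatial cell and a time bucket next to the ID.
    static String rowKey(String cell, String timeBucket, String featureId)
    {
        return cell + "~" + timeBucket + "~" + featureId;
    }

    void write(String rowKey, long timestamp, String value)
    {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(timestamp, value);
    }

    // Mimics a keep-one-version view: only the newest value per row is returned.
    Map<String, String> latestView()
    {
        Map<String, String> view = new TreeMap<>();
        for (Map.Entry<String, TreeMap<Long, String>> row : rows.entrySet())
        {
            view.put(row.getKey(), row.getValue().lastEntry().getValue());
        }
        return view;
    }

    public static void main(String[] args)
    {
        ToyVersionedTable table = new ToyVersionedTable();
        table.write(rowKey("cellA", "2014-06-20", "id-1"), 1L, "v1");
        table.write(rowKey("cellA", "2014-06-20", "id-1"), 2L, "v2"); // same row: hides v1
        table.write(rowKey("cellB", "2014-06-20", "id-1"), 3L, "v3"); // moved: second row for id-1
        System.out.println(table.latestView().size()); // 2
    }
}
```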
I would encourage people to check out the GeoMesa quick start tutorial:
http://geomesa.github.io/2014/05/28/geomesa-quickstart/
It lets you write and read just a few features at a time through the API. You can test out your exact scenario and determine what works for you.
Thanks,
- Emilio
On 06/20/2014 08:13 AM, Anthony Fox wrote:
If you overwrite records with the same id, then you'll observe multiple records until Accumulo performs a table compaction or you set a versioning iterator. Since this sounds like a valid use case, we can by
default set a versioning iterator so you only see the most recent version of your record. Table compactions will periodically remove stale data.
-Anthony
On Thu, Jun 19, 2014 at 6:27 PM, Jim Hughes <jnh5y@xxxxxxxx> wrote:
Hi Beau,
Initially, I wanted to say that #1 is the intended behavior. I wanted to check things out before responding, and unfortunately, it has taken me a bit longer than I expected.
The first thing to point out is how GeoTools handles feature IDs. Since several folks could be writing to a FeatureStore, you have to state that your feature IDs should be used for a given feature being written to the database. For example...
feature.getUserData.put(Hints.USE_PROVIDED_FID, true)
(I might have the exact syntax off since I'm bouncing between Scala and Java.) Without that hint, GeoMesa will pick a random UUID for the feature id.
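In Java, the hint call would presumably read feature.getUserData().put(Hints.USE_PROVIDED_FID, Boolean.TRUE). The sketch below does not use GeoTools at all; it is a toy model of the behavior just described, with a hypothetical class name and a plain string standing in for the real hint key, to show the two outcomes side by side.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Toy model of the described behavior; not GeoTools or GeoMesa code.
final class ToyFidAssigner
{
    // Plain-string stand-in for the GeoTools hint key Hints.USE_PROVIDED_FID.
    static final String USE_PROVIDED_FID = "USE_PROVIDED_FID";

    // Returns the caller's ID only when the hint is set to true; otherwise a random UUID.
    static String assignId(String providedId, Map<Object, Object> userData)
    {
        if (Boolean.TRUE.equals(userData.get(USE_PROVIDED_FID)))
        {
            return providedId;
        }
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args)
    {
        Map<Object, Object> userData = new HashMap<>();
        System.out.println(assignId("track-42", userData)); // a random UUID

        userData.put(USE_PROVIDED_FID, Boolean.TRUE);
        System.out.println(assignId("track-42", userData)); // track-42
    }
}
```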
Just to make sure that #1 isn't (currently) possible, I tried writing two distinct features with the same id, and I ended up with two records with the same FID.
As for the other two approaches, in the current implementation, looking up features by ID involves a table scan and hence generally is a bad idea. We do have some work in progress which will make such queries faster/sane. The last note along these lines
is to point out that to support this fully, we'll likely need to implement/override the DataStore's function called getFeatureWriter.
I mention that because this is the GeoTools way of doing #2. At the minute, we are using an abstract implementation of this, and it should work correctly. The filtering is done entirely on the client side, so it'll be slow. If your data is small (say, a
few thousand records), this sort of thing might be tenable.
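To make the cost argument concrete, here is a toy model in plain Java in which a sorted map stands in for a table ordered by row key. The class, the "<cell>~<featureId>" key layout, and the method names are hypothetical and are not GeoMesa's schema or query planner. The sketch only shows why an ID-only lookup reads every row when the ID is not a key prefix, and why knowing the spatial cell as well could shrink the range that has to be read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a sorted map stands in for a table sorted by row key.
final class ToyScanCost
{
    // Hypothetical key layout "<cell>~<featureId>"; not GeoMesa's real schema.
    final TreeMap<String, String> table = new TreeMap<>();
    int rowsRead = 0;

    // ID-only lookup: the ID is not a key prefix, so every row is read and filtered.
    List<String> findByIdFullScan(String featureId)
    {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet())
        {
            rowsRead++;
            if (row.getKey().endsWith("~" + featureId))
            {
                hits.add(row.getValue());
            }
        }
        return hits;
    }

    // ID plus a known cell: only the rows under that key prefix are read.
    List<String> findByIdInCell(String cell, String featureId)
    {
        String from = cell + "~";
        String to = cell + (char) ('~' + 1); // first possible key after all "<cell>~..." keys
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : table.subMap(from, to).entrySet())
        {
            rowsRead++;
            if (row.getKey().equals(from + featureId))
            {
                hits.add(row.getValue());
            }
        }
        return hits;
    }
}
```

With 100 cells of 10 features each, the first method reads all 1000 rows and the second reads 10. Whether a real store gets a comparable benefit depends on whether its planner actually uses the spatial and temporal bounds when they are combined with an ID filter.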
I hope that helps clarify the matter; let me know if you have other questions.
Jim
On 06/18/2014 04:37 PM, Beau Lalonde wrote:
Jim, Others,
If we do use the same ID, can we count on the previous value getting “overwritten”/replaced?
In other words, if I actually intend to overwrite/replace a feature with a specific ID (if it
exists, otherwise create a new feature), which of the following is the best option:
1. Act as if I am adding the feature, counting on any existing feature with the same ID to be overwritten/replaced
2. Query GeoMesa for the existence of a feature with the specific ID, modify feature if it exists, add feature if it doesn’t exist
3. Blindly attempt to remove the feature with the specific ID, add a new feature with the same ID
Any suggestions for a recommended approach would be helpful.
Thanks,
Beau
Hi Adnan,
Great question! GeoMesa uses the feature id as a unique identifier. It sounds like you might be using the id field to identify/name a thing which is moving/changing shape/varying attributes through time. If that's the use case, I'd suggest putting the information
that identifies the object into a different field like 'name' or 'identifier'.
As for documentation, I'd suggest checking out
http://geomesa.github.io/ and looking through the tutorials. We've integrated with GeoTools, so I'd also point to their documentation about DataStores/FeatureStores (http://docs.geotools.org/latest/userguide/library/api/datastore.html
and
http://docs.geotools.org/stable/userguide/library/data/featuresource.html).*
Let us know what other questions we can help with,
Jim
* In particular, I believe that you would see the same behavior with (most) other GeoTools FeatureStores.
On 06/18/2014 06:14 AM, Adnan Yaqoob wrote:
Hello Everybody,
I am new to GeoMesa and am trying its API. I have a question: how can I store features with the same id and geometry but different timestamps and attribute values? I tried to write a feature
with the same id but different attributes, and it overwrote the previous feature. I am stuck on this point; please help me understand.
Is there any documentation for the GeoMesa API?
_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
http://www.locationtech.org/mailman/listinfo/geomesa-users