
Re: [geomesa-users] problem with stats

The problem remains.

I put the geomesa-accumulo-distributed-runtime jar inside the main Accumulo lib directory:

[g.rinchin@netris-cassandra-stage60-04 lib]$ pwd
/opt/accumulo/lib
[g.rinchin@netris-cassandra-stage60-04 lib]$ ls | grep geomesa
geomesa-accumulo-distributed-runtime_2.12-3.2.2.jar
After this, I can load the class org.locationtech.geomesa.accumulo.data.stats.StatsCombiner correctly:

root@accumulo> setiter -t examples.runners -p 10 -scan -minc -majc -n decStats -class org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
Combiners apply reduce functions to multiple versions of values with otherwise equal keys
----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: 

I recreated the namespace like this:
root@accumulo> deletenamespace -f myNamespace
root@accumulo> createnamespace myNamespace
root@accumulo> grant NameSpace.CREATE_TABLE -ns myNamespace -u root
root@accumulo> config -ns myNamespace -s table.classpath.context=myNamespace

Then I ran the application, and it created the GeoMesa tables.

Here is the config of my myNamespace.geomesa_stats table:
root@accumulo> config -t myNamespace.geomesa_stats
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SCOPE      | NAME                                                        | VALUE
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
default    | table.balancer ............................................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default    | table.bloom.enabled ....................................... | false
default    | table.bloom.error.rate .................................... | 0.5%
default    | table.bloom.hash.type ..................................... | murmur
default    | table.bloom.key.functor ................................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default    | table.bloom.load.threshold ................................ | 1
default    | table.bloom.size .......................................... | 1048576
default    | table.cache.block.enable .................................. | false
default    | table.cache.index.enable .................................. | true
default    | table.classpath.context ................................... |
namespace  |    @override .............................................. | myNamespace
default    | table.compaction.major.everything.idle .................... | 1h
default    | table.compaction.major.ratio .............................. | 3
default    | table.compaction.minor.idle ............................... | 5m
default    | table.compaction.minor.logs.threshold ..................... | 3
default    | table.compaction.minor.merge.file.size.max ................ | 0
table      | table.constraint.1 ........................................ | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default    | table.durability .......................................... | sync
default    | table.failures.ignore ..................................... | false
default    | table.file.blocksize ...................................... | 0B
default    | table.file.compress.blocksize ............................. | 100K
default    | table.file.compress.blocksize.index ....................... | 128K
default    | table.file.compress.type .................................. | gz
default    | table.file.max ............................................ | 15
default    | table.file.replication .................................... | 0
default    | table.file.summary.maxSize ................................ | 256K
default    | table.file.type ........................................... | rf
default    | table.formatter ........................................... | org.apache.accumulo.core.util.format.DefaultFormatter
default    | table.groups.enabled ...................................... |
default    | table.interepreter ........................................ | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table      | table.iterator.majc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.majc.stats-combiner.opt.all ................ | true
table      | table.iterator.majc.stats-combiner.opt.sep ................ | ~
table      | table.iterator.majc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.majc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.majc.vers.opt.maxVersions .................. | 1
table      | table.iterator.minc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.minc.stats-combiner.opt.all ................ | true
table      | table.iterator.minc.stats-combiner.opt.sep ................ | ~
table      | table.iterator.minc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.minc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.minc.vers.opt.maxVersions .................. | 1
table      | table.iterator.scan.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.scan.stats-combiner.opt.all ................ | true
table      | table.iterator.scan.stats-combiner.opt.sep ................ | ~
table      | table.iterator.scan.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.scan.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.scan.vers.opt.maxVersions .................. | 1
default    | table.majc.compaction.strategy ............................ | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default    | table.replication ......................................... | false
default    | table.sampler ............................................. |
default    | table.scan.dispatcher ..................................... | org.apache.accumulo.core.spi.scan.SimpleScanDispatcher
default    | table.scan.max.memory ..................................... | 512K
default    | table.security.scan.visibility.default .................... |
default    | table.split.endrow.size.max ............................... | 10K
default    | table.split.threshold ..................................... | 1G
default    | table.suspend.duration .................................... | 0s
default    | table.walog.enabled ....................................... | true
-----------+-------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
root@accumulo> 

But the statistics are still not gathered correctly for the first iteration.

I wrote 1000 geo coordinates. The total stats-count and the stats-count by cam return:
 ✘  ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder    
Estimated count: 1000
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 866

866 is the size of the last batch of events saved by my code, as the code log shows:
16.02.2022 12:16:21.199 INFO  [pool-3-thread-4] r.netris.gps.sampler.GeoEventSampler - Saved 866 of 866 events 

Events added after that are correctly added to the count stats:
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 1866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 2866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 2866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 3866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 4866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 5866
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 6866
 
To work around this in code, I first write the first event on its own and then all the other events, something like this:
private Integer writeDataInternalTest(List<GeoEvent> events) throws IOException {
  if (events == null || events.isEmpty()) {
    return 0;
  }

  int count = 0;
  // Take the first event without mutating the caller's list
  // (events.remove(0) changes the input and throws on immutable lists).
  GeoEvent firstEvent = events.get(0);
  List<GeoEvent> otherEvents = events.subList(1, events.size());

  // First writer: writes only the first event.
  try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
      SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT)) {

    SimpleFeature feature = SimpleFeatureUtils.toSimpleFeature(firstEvent);
    String eventId = feature.getID();
    if (!eventId.contains(firstEvent.getCam())) {
      log.info("event id does not contain camId");
    }
    SimpleFeature toWrite = writer.next();
    toWrite.setAttributes(feature.getAttributes());
    // keep the "<cam>-<time>" ID generated in toSimpleFeature
    toWrite.getUserData().put(Hints.PROVIDED_FID, eventId);
    toWrite.getUserData().putAll(feature.getUserData());

    writer.write();
    count++;
    log.info("Event id = {}, for event = {}", eventId, firstEvent);

  } catch (Exception e) {
    log.error("Geomesa write error", e);
  }

  // Second writer: writes all remaining events.
  try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
      SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT)) {

    for (GeoEvent event : otherEvents) {
      SimpleFeature feature = SimpleFeatureUtils.toSimpleFeature(event);
      String eventId = feature.getID();
      if (!eventId.contains(event.getCam())) {
        log.info("event id does not contain camId");
      }
      SimpleFeature toWrite = writer.next();
      toWrite.setAttributes(feature.getAttributes());
      toWrite.getUserData().put(Hints.PROVIDED_FID, eventId);
      toWrite.getUserData().putAll(feature.getUserData());

      writer.write();
      count++;
      log.info("Event id = {}, for event = {}", eventId, event);
    }

  } catch (Exception e) {
    log.error("Geomesa write error", e);
  }
  return count;
}
Then the count stat after writing the first 1000 geo events is 999:
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"            
Estimated count: 999


But if I run stats-analyze, it still resets the count by cam to 0:

 ✘  ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                              
INFO  Running stat analysis for feature type SignalBuilder...
INFO  Stats analyzed:
  Total features: 1000
  Bounds for geo: [ 37.598174, 55.736823, 37.681424, 55.820073 ] cardinality: 981
  Bounds for time: [ 2022-02-27T08:26:42.000Z to 2022-02-27T09:00:00.000Z ] cardinality: 973
  Bounds for cam: [ 0000c1fe-a727-4a86-9eee-5b99d21038ea to 0000c1fe-a727-4a86-9eee-5b99d21038ea ] cardinality: 1
INFO  Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 0


Thanks.


On Tue, 15 Feb 2022 at 23:35, Rinchin Gomboev <gomboev.rinchin@xxxxxxxxx> wrote:
Another thing that confused me: when I try to scan the table, it says it can't determine the type of file:

root@accumulo> scan -t myNamespace.geomesa
2022-02-15 23:30:20,171 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
2022-02-15 23:30:20,172 [shell.Shell] ERROR: Could not load the specified formatter. Using the DefaultFormatter
2022-02-15 23:30:20,191 [commands.ShellPluginConfigurationCommand] ERROR: Error: Could not determine the type of file "hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar".
SignalBuilder~attributes : []    *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double;geomesa.stats.enable='true',geomesa.feature.expiry='time(30 days)',geomesa.z.splits='4',geomesa.table.partition='time',geomesa.index.dtg='time',geomesa.attr.splits='4',geomesa.indices='z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam:time',geomesa.z3.interval='week'
SignalBuilder~stats-date : []    2022-02-15T14:12:53.243Z
SignalBuilder~table.attr.cam.time.v8.02720 : []    myNamespace.geomesa_SignalBuilder_attr_cam_time_v8_02720
SignalBuilder~table.attr.time.v8.02720 : []    myNamespace.geomesa_SignalBuilder_attr_time_v8_02720
SignalBuilder~table.z2.geo.v5.02720 : []    myNamespace.geomesa_SignalBuilder_z2_geo_v5_02720
SignalBuilder~table.z3.geo.time.v7.02720 : []    myNamespace.geomesa_SignalBuilder_z3_geo_time_v7_02720
root@accumulo> 

When I look in the Hadoop cluster, I can find that file:

[g.rinchin@netris-cassandra-stage60-04 bin]$ sudo ./hadoop fs -ls /accumulo/classpath/myNamespace
Found 1 items
-rw-r--r--   1 root supergroup   46078040 2022-02-01 18:37 /accumulo/classpath/myNamespace/geomesa-accumulo-distributed-runtime_2.12-3.2.2.jar


I created the namespace like this:
root@accumulo> createnamespace myNamespace
root@accumulo> grant NameSpace.CREATE_TABLE -ns myNamespace -u root
root@accumulo> config -s general.vfs.context.classpath.myNamespace=hdfs://10.200.217.27:9000/accumulo/classpath/myNamespace/[^.].*.jar
root@accumulo> config -ns myNamespace -s table.classpath.context=myNamespace
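The last path segment of that classpath URL, "[^.].*.jar", is meant as a regular expression rather than a literal file name. Assuming it is evaluated with Java regex semantics (an assumption about Accumulo's VFS classloader, not verified here), this small standalone check (ClasspathRegexCheck is a hypothetical name) shows that it matches the jar listed in HDFS and skips dot-files. Note that the unescaped "." before "jar" matches any character.

```java
import java.util.regex.Pattern;

// Checks which file names the last path segment of the VFS classpath URL,
// "[^.].*.jar", would match if treated as a Java regular expression.
class ClasspathRegexCheck {

    static final Pattern JAR_PATTERN = Pattern.compile("[^.].*.jar");

    static boolean matches(String fileName) {
        // matches() requires the whole name to match, not just a substring.
        return JAR_PATTERN.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "geomesa-accumulo-distributed-runtime_2.12-3.2.2.jar", // the jar found in HDFS
            ".hidden.jar",                                          // dot-files are excluded by [^.]
            "notes.txt"                                             // wrong extension
        };
        for (String name : names) {
            System.out.println(name + " -> " + matches(name));
        }
    }
}
```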

Thanks.

On Tue, 15 Feb 2022 at 23:10, Rinchin Gomboev <gomboev.rinchin@xxxxxxxxx> wrote:
I tried to look at the configuration of the _stats table:

root@accumulo> config -t myNamespace.geomesa_stats
-----------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SCOPE      | NAME                                                        | VALUE
-----------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
default    | table.balancer ............................................ | org.apache.accumulo.server.master.balancer.DefaultLoadBalancer
default    | table.bloom.enabled ....................................... | false
default    | table.bloom.error.rate .................................... | 0.5%
default    | table.bloom.hash.type ..................................... | murmur
default    | table.bloom.key.functor ................................... | org.apache.accumulo.core.file.keyfunctor.RowFunctor
default    | table.bloom.load.threshold ................................ | 1
default    | table.bloom.size .......................................... | 1048576
default    | table.cache.block.enable .................................. | false
default    | table.cache.index.enable .................................. | true
default    | table.classpath.context ................................... |
namespace  |    @override .............................................. | myNamespace
default    | table.compaction.major.everything.idle .................... | 1h
default    | table.compaction.major.ratio .............................. | 3
default    | table.compaction.minor.idle ............................... | 5m
default    | table.compaction.minor.logs.threshold ..................... | 3
default    | table.compaction.minor.merge.file.size.max ................ | 0
table      | table.constraint.1 ........................................ | org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
default    | table.durability .......................................... | sync
default    | table.failures.ignore ..................................... | false
default    | table.file.blocksize ...................................... | 0B
default    | table.file.compress.blocksize ............................. | 100K
default    | table.file.compress.blocksize.index ....................... | 128K
default    | table.file.compress.type .................................. | gz
default    | table.file.max ............................................ | 15
default    | table.file.replication .................................... | 0
default    | table.file.summary.maxSize ................................ | 256K
default    | table.file.type ........................................... | rf
default    | table.formatter ........................................... | org.apache.accumulo.core.util.format.DefaultFormatter
default    | table.groups.enabled ...................................... |
default    | table.interepreter ........................................ | org.apache.accumulo.core.util.interpret.DefaultScanInterpreter
table      | table.iterator.majc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.majc.stats-combiner.opt.all ................ | true
table      | table.iterator.majc.stats-combiner.opt.sep ................ | ~
table      | table.iterator.majc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.majc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.majc.vers.opt.maxVersions .................. | 1
table      | table.iterator.minc.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.minc.stats-combiner.opt.all ................ | true
table      | table.iterator.minc.stats-combiner.opt.sep ................ | ~
table      | table.iterator.minc.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.minc.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.minc.vers.opt.maxVersions .................. | 1
table      | table.iterator.scan.stats-combiner ........................ | 10,org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
table      | table.iterator.scan.stats-combiner.opt.all ................ | true
table      | table.iterator.scan.stats-combiner.opt.sep ................ | ~
table      | table.iterator.scan.stats-combiner.opt.sft-SignalBuilder .. | *geo:Point,time:Date,cam:String:keep-stats=true,imei:String,dir:Double,alt:Double,vlc:Double,sl:Integer,ds:Integer,dir_y:Double,poi_azimuth_x:Double,poi_azimuth_y:Double
table      | table.iterator.scan.vers .................................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
table      | table.iterator.scan.vers.opt.maxVersions .................. | 1
default    | table.majc.compaction.strategy ............................ | org.apache.accumulo.tserver.compaction.DefaultCompactionStrategy
default    | table.replication ......................................... | false
default    | table.sampler ............................................. |
default    | table.scan.dispatcher ..................................... | org.apache.accumulo.core.spi.scan.SimpleScanDispatcher
default    | table.scan.max.memory ..................................... | 512K
default    | table.security.scan.visibility.default .................... |
default    | table.split.endrow.size.max ............................... | 10K
default    | table.split.threshold ..................................... | 1G
default    | table.suspend.duration .................................... | 0s
default    | table.walog.enabled ....................................... | true
-----------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

But when I try to set it on a table, it really can't load the class: class not found. How can I fix this issue?

root@accumulo> createnamespace examples
root@accumulo> createtable examples.runners
root@accumulo examples.runners> setiter -t examples.runners -p 10 -scan -minc -majc -n decStats -class org.apache.accumulo.examples.combiner.StatsCombiner
2022-02-15 23:08:18,098 [shell.Shell] ERROR: org.apache.accumulo.shell.ShellCommandException: Command could not be initialized (Unable to load org.apache.accumulo.examples.combiner.StatsCombiner; class not found.)
root@accumulo examples.runners>
root@accumulo examples.runners>
root@accumulo examples.runners> setiter -t examples.runners -p 10 -scan -minc -majc -n decStats -class org.locationtech.geomesa.accumulo.data.stats.StatsCombiner
2022-02-15 23:09:07,538 [shell.Shell] ERROR: org.apache.accumulo.shell.ShellCommandException: Command could not be initialized (Unable to load org.locationtech.geomesa.accumulo.data.stats.StatsCombiner; class not found.)
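To check outside the Accumulo shell whether a given jar really contains a class, and whether that class's dependencies resolve, a small standalone program like this hypothetical ClassLoadCheck can help. It uses only the JDK's URLClassLoader. GeoMesa's StatsCombiner presumably builds on Accumulo's iterator classes, so the Accumulo jars would likely need to be passed as well; otherwise the expected result is "missing dependency" rather than "loaded".

```java
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

// Standalone check: can a class be loaded from the given jars (plus the JDK)?
class ClassLoadCheck {

    static String check(String className, File... jars) {
        URL[] urls = new URL[jars.length];
        try {
            for (int i = 0; i < jars.length; i++) {
                urls[i] = jars[i].toURI().toURL();
            }
        } catch (MalformedURLException e) {
            return "bad path: " + e.getMessage();
        }
        try (URLClassLoader loader = new URLClassLoader(urls, ClassLoadCheck.class.getClassLoader())) {
            // initialize=false: locate and define the class without running static initializers.
            Class.forName(className, false, loader);
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "not found";
        } catch (NoClassDefFoundError e) {
            // The class itself was found, but something it needs (e.g. its superclass) was not.
            return "missing dependency: " + e.getMessage();
        } catch (IOException e) {
            // Only thrown when closing the loader.
            return "error closing loader: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Usage: java ClassLoadCheck <className> <jar> [<jar> ...]
        if (args.length < 2) {
            System.err.println("usage: ClassLoadCheck <className> <jar>...");
            return;
        }
        File[] jars = new File[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            jars[i - 1] = new File(args[i]);
        }
        System.out.println(args[0] + ": " + check(args[0], jars));
    }
}
```

For example, pass org.locationtech.geomesa.accumulo.data.stats.StatsCombiner followed by the path of the distributed runtime jar and the Accumulo jars from /opt/accumulo/lib.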

Thanks.

On Tue, 15 Feb 2022 at 22:01, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
I think you probably have a misconfiguration in your accumulo tables. The _stats table created by GeoMesa needs an Accumulo combiner configured on it - if the combiner is not configured or can't be loaded, then reading stats won't work properly. Can you try to verify through the Accumulo shell that there is a StatsCombiner configured on the table, and that you can load that class through the shell?
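A toy model may help show why a missing or unloadable combiner can produce exactly the "count equals the last batch" symptom. The _stats table config shown earlier in this thread has StatsCombiner at priority 10 and a VersioningIterator with maxVersions=1 at priority 20. The sketch below assumes, as a simplification, that each writer session stores its partial count as a separate version of the same stat key. It is plain Java, not Accumulo or GeoMesa code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of reading one key from a stats table. This is NOT Accumulo code:
// it only mimics a combiner (priority 10) running before a
// VersioningIterator with maxVersions=1 (priority 20).
class StatsTableSimulation {

    /** One stored version of a stat: write timestamp plus the partial count it carries. */
    static final class Version {
        final long timestamp;
        final long count;

        Version(long timestamp, long count) {
            this.timestamp = timestamp;
            this.count = count;
        }
    }

    /** Returns the count a reader would see for a key holding the given versions. */
    static long read(List<Version> versions, boolean combinerLoaded) {
        if (versions.isEmpty()) {
            return 0;
        }
        List<Version> stack = new ArrayList<>(versions);
        // Versions of the same key are visited newest first (descending timestamp).
        stack.sort(Comparator.comparingLong((Version v) -> v.timestamp).reversed());
        if (combinerLoaded) {
            // The combiner reduces all versions into one value (here: a sum of counts).
            long sum = 0;
            for (Version v : stack) {
                sum += v.count;
            }
            stack = List.of(new Version(stack.get(0).timestamp, sum));
        }
        // maxVersions=1: only the newest remaining version survives.
        return stack.get(0).count;
    }

    public static void main(String[] args) {
        // Hypothetical split: two writer sessions, the last one writing 866 features.
        List<Version> versions = List.of(new Version(1L, 134L), new Version(2L, 866L));
        System.out.println("combiner loaded: " + read(versions, true));  // 1000
        System.out.println("no combiner:     " + read(versions, false)); // 866
    }
}
```

With the combiner applied, the versions are summed before versioning; without it, only the newest version (the last batch) survives. The 134/866 split is invented to match the 1000 total and the last batch of 866 reported elsewhere in this thread.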

Thanks,

Emilio

On 2/15/22 1:38 PM, Rinchin Gomboev wrote:
When I write the events to GeoMesa, the Accumulo client saves them in batches. The estimated count from a stats-count query by cam equals the number of events written in the last operation, between opening the writer and closing it.

Thanks.
--
Gomboev Rinchin 

try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
        SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT))

On Tue, 15 Feb 2022 at 17:47, Rinchin Gomboev <gomboev.rinchin@xxxxxxxxx> wrote:
Yes, maybe I misunderstand how the statistics work. Why does stats-analyze remove the statistics by cam?

I thought that statistics are kept for all indexes, and that when data is persisted, new statistics are added to the existing ones instead of replacing them.

I need the count of coordinates by a combination of area, time and cam in order to sample coordinates and return a correct count. But the estimate is very poor, and stats-analyze erases the statistics for a query by cam.
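For context on the "Estimated count" numbers: GeoMesa's documentation describes its attribute frequency stat as a count-min sketch (stated from memory; worth checking against the GeoMesa stats docs). The generic Java sketch below is not GeoMesa's implementation; its hashing is deliberately simple and the class is hypothetical. It illustrates two properties relevant here: for values the sketch has seen, the estimate is never lower than the true count, and two sketches of the same size merge by adding counters cell by cell.

```java
import java.nio.charset.StandardCharsets;

// Generic count-min sketch (not GeoMesa's implementation): a depth x width grid of
// counters; each value increments one counter per row, and the estimate is the row minimum.
class CountMinSketch {

    private final long[][] table;
    private final int width;

    CountMinSketch(int depth, int width) {
        this.table = new long[depth][width];
        this.width = width;
    }

    // Simple row-specific hash: FNV-1a over the UTF-8 bytes, seeded per row.
    private int bucket(String value, int row) {
        long h = 0xcbf29ce484222325L ^ (row * 0x9E3779B97F4A7C15L);
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return (int) Math.floorMod(h, (long) width);
    }

    void add(String value, long count) {
        for (int row = 0; row < table.length; row++) {
            table[row][bucket(value, row)] += count;
        }
    }

    /** Never lower than the true count of the values added to this sketch. */
    long estimate(String value) {
        long min = Long.MAX_VALUE;
        for (int row = 0; row < table.length; row++) {
            min = Math.min(min, table[row][bucket(value, row)]);
        }
        return min;
    }

    /** Sketches with the same dimensions merge by adding counters cell by cell. */
    void merge(CountMinSketch other) {
        for (int row = 0; row < table.length; row++) {
            for (int col = 0; col < width; col++) {
                table[row][col] += other.table[row][col];
            }
        }
    }

    public static void main(String[] args) {
        String cam = "0000c1fe-a727-4a86-9eee-5b99d21038ea";
        CountMinSketch first = new CountMinSketch(4, 1024);
        CountMinSketch last = new CountMinSketch(4, 1024);
        first.add(cam, 134); // hypothetical earlier writer session
        last.add(cam, 866);  // last writer session
        System.out.println(last.estimate(cam));  // 866: only the last session's sketch
        first.merge(last);
        System.out.println(first.estimate(cam)); // 1000: merged sketches
    }
}
```

If the same properties hold for GeoMesa's stat, an estimate of 0, or of only the last batch, would point at stats that are missing or were never merged, rather than at sketch inaccuracy.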

On Tue, 15 Feb 2022 at 17:40, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Oh, I misunderstood your problem. Is the main issue that after running stats-analyze, you get an estimate of zero when you expect approximately 1000?

Thanks,

Emilio

On 2/15/22 9:17 AM, Rinchin Gomboev wrote:
This is my function for converting a Java object to a SimpleFeature:

package ru.netris.gps.geo;

import lombok.extern.slf4j.Slf4j;
import org.geotools.feature.simple.SimpleFeatureBuilder;
import org.geotools.feature.simple.SimpleFeatureTypeBuilder;
import org.geotools.geometry.jts.JTSFactoryFinder;
import org.geotools.referencing.crs.DefaultGeographicCRS;
import org.geotools.util.SimpleInternationalString;
import org.locationtech.geomesa.utils.geotools.SimpleFeatureTypes;
import org.locationtech.jts.geom.Coordinate;
import org.locationtech.jts.geom.GeometryFactory;
import org.locationtech.jts.geom.Point;
import org.opengis.feature.simple.SimpleFeature;
import org.opengis.feature.simple.SimpleFeatureType;
import ru.netris.gps.model.Azimuth;
import ru.netris.gps.model.Latitude;
import ru.netris.gps.model.Longitude;
import ru.netris.gps.model.events.BasicEvent;
import ru.netris.gps.model.events.GeoEvent;
import ru.netris.gps.model.events.VsvnEvent;

@Slf4j
public final class SimpleFeatureUtils {

  public static final SimpleFeatureType TYPE = createFeatureType();
  public static final GeometryFactory GEOMETRY_FACTORY = JTSFactoryFinder.getGeometryFactory();

  public static SimpleFeature toSimpleFeature(GeoEvent event) {
    log.info("save event {}", event);

    long time =
        event.getTime() > (long) Integer.MAX_VALUE ? event.getTime() : event.getTime() * 1000L;
    String EventID = event.getCam() + "-" + time /*event.time*/;

    // if altitude > 0, add altitude to coordinate ?
    Point point = GEOMETRY_FACTORY.createPoint(
        new Coordinate(event.getLon().getValue(), event.getLat().getValue())); // Point(X, Y)
    SimpleFeature f = new SimpleFeatureBuilder(TYPE).buildFeature(EventID); // SimpleFeature

    f.setAttribute("geo", point);
    f.setAttribute("time", new java.util.Date(time)); // (event.time));

    f.setAttribute("cam", event.getCam());
    f.setAttribute("imei", event.getImei());
    f.setAttribute("dir", event.getDir());
    f.setAttribute("alt", event.getAlt());
    f.setAttribute("vlc", event.getVlc());

    f.setAttribute("sl", event.getSl());
    f.setAttribute("ds", event.getDs().toInt());
    f.setAttribute("dir_y", event instanceof VsvnEvent ? ((VsvnEvent) event).getDirV() : null);

    f.setAttribute("poi_azimuth_x",
                   event instanceof VsvnEvent ? ((VsvnEvent) event).getPoi() : null);
    f.setAttribute("poi_azimuth_y",
                   event instanceof VsvnEvent ? ((VsvnEvent) event).getPoiV() : null);
//         log.info("SimpleFeature: " + f);
    return f;
  }

  public static BasicEvent toBasicEvent(SimpleFeature sf) {
    Point pt = (Point) sf.getAttribute("geo");
    return new BasicEvent(
        (String) sf.getAttribute("cam"),                        // camId
        ((java.util.Date) sf.getAttribute("time")).getTime(),   // dateTime
        new Latitude(pt.getY()),                                      // lat (Y)
        new Longitude(pt.getX())                                     // lon (X)
    );
  }

  public static SimpleFeatureType createFeatureType() {
    SimpleFeatureTypeBuilder builder = new SimpleFeatureTypeBuilder(); // SimpleFeatureTypeBuilder
    builder.setName("SignalBuilder");
    builder.setCRS(DefaultGeographicCRS.WGS84);    // <- Coordinate reference system
    builder.setDescription(new SimpleInternationalString("GPS event type"));
    // index attributes
    builder.setDefaultGeometry("geo");
    builder.add("geo", Point.class);
    builder.add("time", java.util.Date.class);  // date/time
    // user attributes
    builder.add("cam", java.lang.String.class); // String (!) UUID camId
    builder.add("imei", java.lang.String.class); // String IMEI
    builder.add("dir", java.lang.Double.class); // direction
    builder.add("alt", java.lang.Double.class); // altitude
    builder.add("vlc", java.lang.Double.class); // velocity
    // extra
    builder.add("sl", java.lang.Integer.class); // signal level
    builder.add("ds", java.lang.Integer.class); // data status
    builder.add("dir_y", java.lang.Double.class); // directionY

    builder.add("poi_azimuth_x", java.lang.Double.class); // camera poi x (camera direction)
    builder.add("poi_azimuth_y", java.lang.Double.class); // camera poi y (camera direction)
    return SimpleFeatureTypes.immutable(makeSFT(builder), null);
  }

  // build the type
  private static SimpleFeatureType makeSFT(SimpleFeatureTypeBuilder builder) {
    SimpleFeatureType sft = builder.buildFeatureType();
    sft.getUserData().put("geomesa.feature.expiry",
                          "time(30 days)"); // Age-off filter by "time" field
    sft.getUserData().put("geomesa.indices.enabled", "z3,z2,attr:time,attr:cam:time");

    sft.getUserData().put("geomesa.z3.interval", "week");
    sft.getUserData().put("geomesa.table.partition", "time");
    sft.getUserData().put("geomesa.index.dtg", "time");
    sft.getUserData().put("geomesa.z.splits", "4");
    sft.getUserData().put("geomesa.attr.splits", "4");

    sft.getDescriptor("cam").getUserData().put("index", "true");
    sft.getDescriptor("time").getUserData().put("index", "true");
    sft.getDescriptor("geo").getUserData().put("index", "true");

    sft.getDescriptor("cam").getUserData().put("keep-stats", "true");

    return sft;
  }
}
where the domain classes are:
@Getter
public class VsvnEvent extends GeoEvent {

  private final Double dirV;
  private final Double poi;
  private final Double poiV;
  // ... (constructor omitted)
}

@Getter
public class GeoEvent extends BasicEvent {

  private final Double alt;
  private String imei;
  private final Double vlc;
  private Double dir;
  private final Integer sl;
  private final DataStatus ds;
  // ... (constructor omitted)
}

@Getter
public class BasicEvent {

  protected final String cam;
  protected final Long time; // date/time in millis
  protected final Latitude lat;
  protected final Longitude lon;
  // ... (constructor omitted)
}

I can't find any error. Thanks.
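One way to sanity-check the conversion is to isolate the timestamp heuristic and the ID scheme from toSimpleFeature in a standalone class (EventTimes is a hypothetical name). Values up to Integer.MAX_VALUE are treated as epoch seconds and anything larger as milliseconds. Since Integer.MAX_VALUE seconds is 2038-01-19T03:14:07Z, and Integer.MAX_VALUE milliseconds falls in late January 1970, the heuristic holds for realistic dates. The ID scheme also means that two events from the same cam with the same timestamp would get the same feature ID.

```java
import java.time.Instant;

// Standalone copy of the timestamp heuristic and ID scheme used in toSimpleFeature.
class EventTimes {

    // Values up to Integer.MAX_VALUE are treated as epoch seconds, larger ones as millis.
    static long toMillis(long time) {
        return time > (long) Integer.MAX_VALUE ? time : time * 1000L;
    }

    // Feature ID scheme from toSimpleFeature: "<cam>-<epoch millis>".
    static String featureId(String cam, long time) {
        return cam + "-" + toMillis(time);
    }

    public static void main(String[] args) {
        System.out.println(toMillis(1645000000L));    // seconds, converted to millis
        System.out.println(toMillis(1645000000000L)); // already millis, unchanged
        // Upper bound of the "seconds" interpretation.
        System.out.println(Instant.ofEpochSecond(Integer.MAX_VALUE));
    }
}
```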

On Tue, 15 Feb 2022 at 16:50, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
So it seems like you've written all your features with the same cam value. I don't see anything wrong with the way you're writing features, but you should check your input data and your conversion to simple features to see if you're incorrectly copying the same cam value.

Thanks,

Emilio
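One way to follow the advice above and check the input data is to log how many events in each batch carry each cam value before anything is written. The sketch below assumes nothing about GeoMesa; `CamHistogram` and its methods are hypothetical helpers written for illustration, not part of the original application.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: counts how many events in a batch carry each cam value,
// so a batch that collapses to a single camera id is easy to spot before writing.
final class CamHistogram {

  private CamHistogram() {}

  // Groups the batch by the extracted cam value; TreeMap keeps keys sorted.
  // Note: Collectors.groupingBy throws NullPointerException for a null cam.
  static <E> Map<String, Long> countByCam(List<E> events, Function<E, String> camOf) {
    return events.stream()
        .collect(Collectors.groupingBy(camOf, TreeMap::new, Collectors.counting()));
  }

  // True when a batch of more than one event contains only one distinct cam value.
  static <E> boolean looksCollapsed(List<E> events, Function<E, String> camOf) {
    return events.size() > 1 && countByCam(events, camOf).size() == 1;
  }
}
```

Inside `writeDataInternal` it could be called as `log.info("cam histogram: {}", CamHistogram.countByCam(events, GeoEvent::getCam));`, relying on the Lombok-generated `getCam()` that the existing code already uses.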

On 2/15/22 8:43 AM, Rinchin Gomboev wrote:
Thank you very much for the fast reply.

The command returns:
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo export -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder --attributes cam --no-header | sort -u
INFO  Running export - please wait...
INFO  Feature export complete to standard out for 1000 features in 3199ms
0000c1fe-a727-4a86-9eee-5b99d21038ea

On Tue, 15 Feb 2022 at 16:10, Emilio Lahr-Vivaz <elahrvivaz@xxxxxxxx> wrote:
Hello,

Are you sure that you're writing distinct cam values for each feature? You could try running:

./geomesa-accumulo export -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder --attributes cam --no-header | sort -u

and see how many unique cam values come back that way.

Thanks,

Emilio
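The `sort -u` step above can also be done in Java when piping the export output into a small program. This is a minimal sketch; the class name `UniqueCams` is hypothetical, and it assumes the `INFO` log lines go to stderr (the transcript suggests so, since they were not sorted together with the cam value), so only cam values arrive on stdin.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Stream;

// Minimal sketch: reads export output with one cam value per line
// (--attributes cam --no-header) and reports the distinct values, like `sort -u`.
final class UniqueCams {

  private UniqueCams() {}

  // Collects distinct, non-blank lines in sorted order.
  static SortedSet<String> distinct(Stream<String> lines) {
    SortedSet<String> unique = new TreeSet<>();
    lines.map(String::trim).filter(l -> !l.isEmpty()).forEach(unique::add);
    return unique;
  }

  public static void main(String[] args) throws IOException {
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
      SortedSet<String> unique = distinct(in.lines());
      System.out.println(unique.size() + " distinct cam value(s)");
      unique.forEach(System.out::println);
    }
  }
}
```

With Java 11 or later it can be run without a separate compile step by appending `| java UniqueCams.java` to the export command.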

On 2/15/22 7:32 AM, Rinchin Gomboev wrote:
Hello, everyone.

I am trying to write an application using GeoMesa with Accumulo, and the stats are not being gathered.
I start with an empty namespace in Accumulo and create the schema from Java code like this:
// build the type
private static SimpleFeatureType makeSFT(SimpleFeatureTypeBuilder builder) {
  SimpleFeatureType sft = builder.buildFeatureType();
  sft.getUserData().put("geomesa.feature.expiry",
                        "time(30 days)"); // Age-off filter by "time" field
  sft.getUserData().put("geomesa.indices.enabled", "z3,z2,attr:time,attr:cam:time");

  sft.getUserData().put("geomesa.z3.interval", "week");
  sft.getUserData().put("geomesa.table.partition", "time");
  sft.getUserData().put("geomesa.index.dtg", "time");
  sft.getUserData().put("geomesa.z.splits", "4");
  sft.getUserData().put("geomesa.attr.splits", "4");

  sft.getDescriptor("cam").getUserData().put("index", "true");
  sft.getDescriptor("time").getUserData().put("index", "true");
  sft.getDescriptor("geo").getUserData().put("index", "true");

  sft.getDescriptor("cam").getUserData().put("keep-stats", "true");

  return sft;
}
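The `geomesa.feature.expiry` value `time(30 days)` is commented as an age-off filter on the `time` field. The sketch below only illustrates that cutoff arithmetic, assuming a feature becomes eligible for age-off once its `time` is more than 30 days before the current instant; GeoMesa applies the real filter on the server, and whether the boundary instant itself counts as expired is an assumption here. `ExpiryCheck` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

// Illustration only: mirrors the intent of geomesa.feature.expiry = "time(30 days)".
// A feature whose "time" is strictly older than now - 30 days is treated as expired.
final class ExpiryCheck {

  static final Duration TTL = Duration.ofDays(30);

  private ExpiryCheck() {}

  // timeMillis is the event's "time" field (epoch millis, as in BasicEvent.time).
  static boolean isExpired(long timeMillis, Instant now) {
    Instant cutoff = now.minus(TTL);
    return Instant.ofEpochMilli(timeMillis).isBefore(cutoff);
  }
}
```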

 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo describe-schema -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                            
INFO  Describing attributes of feature 'SignalBuilder'
geo           | Point   (Spatio-temporally indexed) (Spatially indexed)
time          | Date    (Spatio-temporally indexed) (Attribute indexed)
cam           | String  (Attribute indexed)
imei          | String  
dir           | Double  
alt           | Double  
vlc           | Double  
sl            | Integer
ds            | Integer
dir_y         | Double  
poi_azimuth_x | Double  
poi_azimuth_y | Double  

User data:
  geomesa.attr.splits     | 4
  geomesa.feature.expiry  | time(30 days)
  geomesa.index.dtg       | time
  geomesa.indices         | z3:7:3:geo:time,z2:5:3:geo,attr:8:3:time,attr:8:3:cam:time
  geomesa.stats.enable    | true
  geomesa.table.partition | time
  geomesa.z.splits        | 4
  geomesa.z3.interval     | week
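The `geomesa.indices` line above packs one entry per index, separated by commas, with colon-separated fields. Reading the fields as name, version, mode and indexed attributes is an assumption based on the shape of the output (for example `attr:8:3:cam:time` lines up with the `attr:cam:time` entry in `geomesa.indices.enabled`); the hypothetical `IndexIds` parser below is not a GeoMesa API and only makes that layout explicit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the geomesa.indices value printed by describe-schema.
// Assumed layout per entry: name:version:mode:attr1[:attr2...]
final class IndexIds {

  static final class IndexId {
    final String name;
    final int version;
    final int mode; // assumed to be an index mode flag; meaning not verified
    final List<String> attributes;

    IndexId(String name, int version, int mode, List<String> attributes) {
      this.name = name;
      this.version = version;
      this.mode = mode;
      this.attributes = attributes;
    }

    @Override
    public String toString() {
      return name + " v" + version + " mode=" + mode + " on " + attributes;
    }
  }

  private IndexIds() {}

  static List<IndexId> parse(String value) {
    List<IndexId> ids = new ArrayList<>();
    for (String entry : value.split(",")) {
      String[] parts = entry.trim().split(":");
      if (parts.length < 4) {
        throw new IllegalArgumentException("Unexpected index entry: " + entry);
      }
      // Everything after the third field is the list of indexed attributes.
      List<String> attrs = Arrays.asList(Arrays.copyOfRange(parts, 3, parts.length));
      ids.add(new IndexId(parts[0], Integer.parseInt(parts[1]),
          Integer.parseInt(parts[2]), attrs));
    }
    return ids;
  }
}
```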

Then I write 1000 geocoordinates like this:
  private Integer writeDataInternal(List<GeoEvent> events) throws IOException {

    if (events == null || events.isEmpty()) {
      return 0;
    }

    int count = 0;

    // write to GeoMesa
    try (FeatureWriter<SimpleFeatureType, SimpleFeature> writer = dataStore.getFeatureWriterAppend(
        SimpleFeatureUtils.TYPE.getTypeName(), Transaction.AUTO_COMMIT)) {

      for (GeoEvent event : events) {
        SimpleFeature feature = SimpleFeatureUtils.toSimpleFeature(event);
        String event_id = feature.getID();
        if (!event_id.contains(event.getCam())) {
          log.info("event not contain camId");
        }
        SimpleFeature toWrite = writer.next();
        toWrite.setAttributes(feature.getAttributes());
        toWrite.getUserData().put(Hints.PROVIDED_FID, event_id);
        toWrite.getUserData().putAll(feature.getUserData());

        writer.write();
        count++;
        log.info("Event id = {}, for event = {}", event_id, event);
      }

    } catch (Exception e) {
      log.error("Geomesa write error", e);
    }
    return count;
  }
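The loop above already logs when a feature ID does not contain the event's cam. A pre-write check could also flag IDs that repeat within one batch, since each feature is meant to be distinct. This is a hypothetical sketch in plain Java (`FidCheck` is not part of the original code) that works on the derived IDs and cam values before they reach the `FeatureWriter`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-write validation mirroring the event_id.contains(event.getCam())
// check in writeDataInternal, extended to also flag IDs repeated within one batch.
final class FidCheck {

  private FidCheck() {}

  // ids.get(i) is the feature ID derived for the event whose cam is cams.get(i).
  static List<String> problems(List<String> ids, List<String> cams) {
    if (ids.size() != cams.size()) {
      throw new IllegalArgumentException("ids and cams must be the same length");
    }
    List<String> problems = new ArrayList<>();
    Set<String> seen = new HashSet<>();
    for (int i = 0; i < ids.size(); i++) {
      String id = ids.get(i);
      if (!id.contains(cams.get(i))) {
        problems.add("id " + id + " does not contain cam " + cams.get(i));
      }
      if (!seen.add(id)) {
        problems.add("duplicate id " + id);
      }
    }
    return problems;
  }
}
```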



The result:

 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                                
Estimated count: 1000
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 950

After running stats-analyze, the statistics for that cam value are gone:
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-analyze -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder                                              
INFO  Running stat analysis for feature type SignalBuilder...
INFO  Stats analyzed:
  Total features: 1000
  Bounds for geo: [ 37.598174, 55.736823, 37.681424, 55.820073 ] cardinality: 981
  Bounds for time: [ 2022-02-22T11:46:42.000Z to 2022-02-22T12:20:00.000Z ] cardinality: 957
  Bounds for cam: [ 0000c1fe-a727-4a86-9eee-5b99d21038ea to 0000c1fe-a727-4a86-9eee-5b99d21038ea ] cardinality: 1
INFO  Use 'stats-histogram', 'stats-top-k' or 'stats-count' commands for more details
 ~/bin/geomesa-accumulo_2.12-3.2.2/bin  ./geomesa-accumulo stats-count -c myNamespace.geomesa -z 10.200.217.27 -i accumulo -u root -p qweasd123 -f SignalBuilder -q "cam='0000c1fe-a727-4a86-9eee-5b99d21038ea'"
Estimated count: 0

Could the reason be that Spark is not installed on the Accumulo server?

How can I get the statistics? Thank you.

--
Rinchin Gomboev


_______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxx
To unsubscribe from this list, visit https://dev.eclipse.org/mailman/listinfo/geomesa-users