| Hi Ben, 
 If there is a geometry and a date field in your simple feature type,
    then GeoMesa will use them for indexing. If you have more than one
    date or geometry field, you can indicate which ones you want to be
    used for the index - more below. If nothing is indicated, I believe
    GeoMesa will default to the first one declared. You can check which
    fields are indexed by scanning the GeoMesa metadata table in the
    Accumulo shell. The exact entries will vary by version, but you can
    probably figure it out - if not, please reply with the scan
    output and we can parse it for you.
 
 Exactly how you indicate the defaults depends on how you are
    creating your simple feature type. For dates, the end result should
    be that there is a user-data entry for "geomesa.index.dtg" set to
    the name of your date field. For geometries, the default should be
    returned by simpleFeatureType.getGeometryDescriptor().
 
 If you just have lat/lon, you will need to turn them into an actual
    geometry type in your simple feature type.
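
 As a rough illustration of the lat/lon point above, here is a minimal,
 library-free sketch that turns a latitude/longitude pair into a WKT point
 string, which WKT parsers such as the one in JTS can read into a Point
 geometry. The class and method names (LatLonWkt, toWktPoint) are
 hypothetical, and it assumes the geometry attribute is a Point in
 EPSG:4326, where x is the longitude and y is the latitude.

```java
// Hypothetical helper, not part of GeoMesa or GeoTools.
final class LatLonWkt {

    private LatLonWkt() {}

    // Builds a WKT point from a lat/lon pair. WKT puts x (longitude)
    // first and y (latitude) second, which is easy to get backwards.
    static String toWktPoint(double lat, double lon) {
        // The negated range checks also reject NaN values.
        if (!(lat >= -90.0 && lat <= 90.0)) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        if (!(lon >= -180.0 && lon <= 180.0)) {
            throw new IllegalArgumentException("longitude out of range: " + lon);
        }
        return "POINT (" + lon + " " + lat + ")";
    }
}
```

 For example, LatLonWkt.toWktPoint(51.57, -1.3) gives "POINT (-1.3 51.57)".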
 
 During ingestion, you can then set the indexed fields to whatever
    values you want (sys time, provided time, etc).
 
 Thanks,
 
 Emilio
 
 
 On 02/14/2017 10:35 AM, Benjamin Weaver
      wrote:
 
      
      
      
        This email (and any attachments) may contain confidential
      information and is intended solely for the recipient(s) to whom
      the email is addressed. If you received this email in error,
      please inform us immediately and delete the email and all
      attachments without further using, copying or disclosing the
      information. This email and any attachments are believed to be,
      but cannot be guaranteed to be, secure or virus-free. Satellite
      Applications Catapult Limited is registered in England &
      Wales. Company Number: 7964746. Registered office: Electron
      Building, Fermi Avenue, Harwell Oxford, Didcot, Oxfordshire OX11
      0QR.
 Thanks, Emilio, 
 
 There is a lot of very valuable information here. 
 Two questions just to clarify (you were clear in your
          answers--the lack of clarity is in my understanding of
          things): 
 1. How would we index in GeoMesa on latitude, longitude, and
          a time we provide from our own data, i.e. not a
          system-generated timestamp?
 
          
 Hi Ben, 
            1. The key used to write in GeoMesa depends on the
            particular index, but it will always include the feature ID,
            so if the feature ID changes you will get a duplicate
            record.
             
            2. If you're using our converter framework, we do have some
            methods to use an MD5 of the values as the feature ID, which
            will prevent duplicates. If not, you can do the same thing
            by generating the feature ID yourself and setting the
            PROVIDED_FID or USE_PROVIDED_FID hint. We also have a
            pluggable SPI interface for generating feature IDs when they
            aren't set. See
            http://www.geomesa.org/documentation/user/datastores/runtime_config.html#geomesa-feature-id-generator .
            By default we generate a UUID that includes parts of the Z3
            index, so that features grouped in space-time will also be
            grouped in Accumulo. Note that the feature ID is a string
            and has no inherent restrictions on form.
             
            3. The Z3 index uses the default date attribute to index
            records, not the insertion time.
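
To make point 2 concrete, here is a minimal sketch of generating a deterministic feature ID yourself from an MD5 of the attribute values, using only the Java standard library. It is not the converter framework's implementation; the class and method names (DeterministicFid, md5Fid) and the separator byte are assumptions made for illustration. The resulting string would be set as the feature ID, together with the provided-FID hint mentioned above, so that re-ingesting the same values yields the same ID rather than a new one.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of GeoMesa.
final class DeterministicFid {

    private DeterministicFid() {}

    // Returns a 32-character hex MD5 digest of the given values.
    // Identical inputs always produce the identical ID.
    static String md5Fid(String... values) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (String value : values) {
                md5.update(value.getBytes(StandardCharsets.UTF_8));
                // Separator byte (an assumption of this sketch) so that
                // ("ab", "c") and ("a", "bc") hash differently.
                md5.update((byte) 0x1f);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The values passed in should be the raw attribute values of the record (for example the original text fields), formatted consistently, since any difference in formatting produces a different ID. Null values are not handled in this sketch.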
             
            Let me know if anything isn't clear!
             
            Thanks,
             
            Emilio
            
             On 02/12/2017 03:28 PM,
              Benjamin Weaver wrote:
 
              
 Hi all, 
 If we ingest, say, the same line of text data twice
                  (by mistake) in GeoMesa 1.2.1 we end up with duplicate
                  data in our Accumulo (1.7.2) database. We are
                  ingesting using GeoMesa-generated featureIDs (setting
                  our featureBuilder.setFeatureID to NULL without the
                  use of Hints). 
 A colleague asked me, why are duplicates generated in
                  this case? I realized I did not know. 
 1. How, exactly, in our
                    configuration of GeoMesa + Accumulo, is a GeoMesa row,
                    or record, made unique? I know the importance of
                    Accumulo logical rows, but in this case of
                    identical data we would want to ensure insertion
                    of only one GeoMesa record, namely, one
                    instance of our GeoMesa SimpleFeature.
 
 1a. Are duplicate GeoMesa rows added because the time
                  at insertion differs? Or because different featureIDs
                  are randomly generated on each insertion? 
 
 Potentially related questions: 
 2. How are featureIDs generated by GeoMesa? I thought
                  randomly, but I read a comment somewhere
                  suggesting that featureIDs were created out of an MD5
                  hash of all the values in the feature. But a colleague
                  points out that even if this is so, a featureID does
                  not resemble an MD5 hash, so it must be composed at least
                  partially by other means. 
 3. A potentially related question: can we create a z3
                  index by using a data-derived timestamp--not the
                  insertion timestamp-- as the time dimension? 
 All comments and perspectives are appreciated and
                  welcome! 
 Ben Weaver 
 
 
 
 
 
 _______________________________________________
geomesa-users mailing list
geomesa-users@xxxxxxxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.locationtech.org/mailman/listinfo/geomesa-users 
 
 |