[cdt-patch] Indexer Requirements for 3.0
Here is a list of indexer requirements
for the 3.0 release. The overall focus for the indexer for 3.0 is on improving
the scalability and overall usability of the indexer service for all clients.
Feedback is always appreciated - especially if you think there are missing
requirements. A more detailed design doc is to follow.
Thanks,
Bogdan
Indexer Requirements for 3.0
This document describes the proposed work items for the indexer for the
CDT 3.0 release.
Author: Bogdan Gheorghe
Revision Date: 11/29/2004 - Version: 0.1.0
               01/10/2005 - Version: 0.1.1
Change History: 0.1.0 - Document Creation
                0.1.1 - Revision
The Indexer has been around since CDT 1.2 and currently provides
support for Search, Navigation, and Refactoring. Its main purpose is to
provide rapid access to a complete database of code elements and to
manage this database in an efficient and non-intrusive manner. As the
CDT has evolved, so has the indexer: it has added more elements to the
index, refined job scheduling, and provided feedback mechanisms for
indexing. Although the indexer is sufficiently developed to provide most
requested information to clients, it has become clear that the next
step in the indexer's evolution will have to address its ability to
handle very large projects efficiently.
Having the indexer work well on large-scale projects requires
architectural changes that minimize the time spent indexing, reuse
existing indexes wherever possible, and provide users with mechanisms
to extend the index framework.
This document addresses the main requirements on the indexer for CDT
3.0.
1.0 Definitions
Resource | A project, folder or file within the Eclipse workspace
Index profiles | Separate indexes that are created for different
configurations of include paths/symbols
1.1 Current Architecture
Quick overview of the indexing architecture:
- The indexer responds to resource events from the workbench. These
events occur whenever a resource gets created, modified or deleted.
- The indexer will create index jobs based on the resource events.
These jobs might schedule other jobs (such as in the case of indexing
an entire project) but most index jobs eventually boil down to an
AddCompilationUnitToIndex job.
- The indexer creates a new parser, passes in the current include
paths and symbol definitions and parses the file and any other files
included by the file in full parse mode (which generates cross
reference information).
- The index gets created as the parser returns information about
the elements in the file - the index is stored in memory and at certain
intervals gets merged with the persisted index on disk.
Currently the indexes have the following structure:
- Index Version Number
- Summary Block Location
- File Blocks [1 ... N]:
  - Each full file block is 8k
  - File block entries associate the path of a file with a unique ID
- Word Blocks [1 ... N]:
  - Each full word block is 8k
  - Word block entries encode the element in a special format and
    add the referring file number
- Include Blocks [1 ... N]:
  - Each full include block is 8k
  - Include block entries associate a file with the initial file
    that was parsed to get to the current file
- Summary Block:
  - Keeps track of the total number of words, files and include
    entries in the index
  - Keeps track of the first file block number, the first word
    block number and the first include block number
  - Keeps track of the first file for every File Block, the first
    word entry for every Word Block, and the first include for every
    Include Block
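To illustrate why the summary block records the first entry of every
block, here is a minimal sketch of how a reader could locate the file
block for a given path without scanning all blocks. It is not the CDT
on-disk format: the class name IndexSummarySketch, the assumption that
entries are sorted by path, and the assumption that file blocks are
numbered consecutively are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical in-memory view of part of the summary block; not the CDT format. */
final class IndexSummarySketch {
    private final int firstFileBlock;         // block number of the first file block
    private final String[] firstFilePerBlock; // first file path in each file block (assumed sorted)

    IndexSummarySketch(int firstFileBlock, String[] firstFilePerBlock) {
        this.firstFileBlock = firstFileBlock;
        this.firstFilePerBlock = firstFilePerBlock.clone();
    }

    /**
     * Returns the number of the file block whose range would hold the path,
     * or -1 if the path sorts before the first entry of every block.
     */
    int fileBlockFor(String path) {
        int pos = Arrays.binarySearch(firstFilePerBlock, path);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the candidate
        // block is the one just before the insertion point.
        int slot = pos >= 0 ? pos : -pos - 2;
        return slot < 0 ? -1 : firstFileBlock + slot;
    }
}
```

Under these assumptions only the one candidate block has to be read
from disk to confirm whether the file is present in the index.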
1.2 Constraints
Number | Description
C1 | Multiple indexer profiles are not possible until a similar build
configuration notion appears
In order to get an accurate index, the indexer depends on being able to
pass the relevant includes/symbols to the parser. As long as there is
no standard representation for build configurations in the core model
for all build types (both standard and managed), it isn't possible to
enable index profiles.
Bug 25682: Indexer Profiles
The requirements for the indexer are classified into the following
categories:
2.1 Index Management Requirements
Number | Priority | Description
R1 | P1 | Indexer must provide different types of indexing services
It has become apparent that the one-size-fits-all indexing approach
does not meet all of our users' needs. Most clients with existing legacy
projects want some form of search/navigation, but some don't want the
hassle of waiting for an entire project to index fully before being
able to use search/navigation. Thus, in order to accommodate both
groups of users (those who are willing to wait for a full index to
complete and those who just want the absolute bare minimum index) we
need to offer the following indexer options:
- Full Index: this is the "regular" index mode which uses the
  CDT parser (include paths and symbol definitions need to be set up
  properly); everything gets indexed
- Quick Index with no setup: this will result in a "best
  effort" bare-bones index that should enable some navigation/search
These indexer options are per project, so it is possible to have
different indexers for different projects.
Bug 69078: C/C++ indexer too slow
R2 | P1 | Indexes should be shareable between team members
As part of streamlining the indexing process for large projects, it
should be possible to share indexes between users working on the same
project. All index entries should make use of path variables so that
an index can be relocated to a different workspace location.
(See Scanner Config Correctness Enhancement FDS for more details about
path variables.)
Bug 79661: All Index Entries should make use of Path Variables
Bug 79518: Path/Variable Manager support service in the core (string
substitution)
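As an illustration of the path-variable idea in R2, the following is a
minimal sketch of the string substitution involved. It is not the
Path/Variable Manager proposed in Bug 79518: the class
PathVariableSketch, the `${NAME}` syntax and the variable name
PROJECT_ROOT are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical path-variable table; names and ${...} syntax are assumptions. */
final class PathVariableSketch {
    private final Map<String, String> variables = new LinkedHashMap<>();

    void define(String name, String absolutePrefix) {
        variables.put(name, absolutePrefix);
    }

    /** Replaces the longest matching prefix with ${NAME}; unmatched paths are returned as-is. */
    String toPortable(String absolutePath) {
        String bestName = null;
        String bestPrefix = "";
        for (Map.Entry<String, String> e : variables.entrySet()) {
            String prefix = e.getValue();
            boolean matches = absolutePath.equals(prefix)
                    || absolutePath.startsWith(prefix + "/");
            if (matches && prefix.length() > bestPrefix.length()) {
                bestName = e.getKey();
                bestPrefix = prefix;
            }
        }
        return bestName == null
                ? absolutePath
                : "${" + bestName + "}" + absolutePath.substring(bestPrefix.length());
    }

    /** Expands a leading ${NAME} using this workspace's definitions. */
    String toAbsolute(String portablePath) {
        if (!portablePath.startsWith("${")) {
            return portablePath;
        }
        int end = portablePath.indexOf('}');
        if (end < 0) {
            return portablePath;
        }
        String prefix = variables.get(portablePath.substring(2, end));
        return prefix == null ? portablePath : prefix + portablePath.substring(end + 1);
    }
}
```

An index written with toPortable on one machine could then be read with
toAbsolute on another, where the same variable points at a different
directory.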
R3 | P1 | Indexer should be able to index a project offline
For features that require a complete index, it would be ideal to have
the index created separately and imported at a later date. This is
especially true for medium to large projects that need a long time to
index.
Bug 74433: Offline Indexing/Index Hierarchy
R4 | P1 | Indexer should be able to merge indexes
With all of the new options for indexing, it is conceivable that any
project might have several sources to look at for a single index. The
indexer should be able to merge new index information into an existing
index (through a user action), provided that both index formats are
alike.
Bug 52126: Indexer should maintain per project indices
Bug 74433: Offline Indexing/Index Hierarchy
R5 | P2 | Indexer should allow user to specify indexer settings
Currently the indexer uses the same settings for all projects. This
might not suit all users. Indexer settings that should be customizable
include:
- Indexing policy: (default normal)
  - normal (always up to date)
  - manual (index only when manually requested)
  - static (don't update current index)
  - after build (don't index until build)
- Index Progress Bar displayed: (Checkbox, default displayed)
- Indexer default setting for new projects:
  - Currently the indexer is always on when creating a new
    project. This should be changed to allow users to set the default
    index behaviour when creating a new project
Bug 75884: Allow C/C++ Indexing to be set on or off as a default setting
R6 | P1 | Index Manager must improve job scheduling smarts
The indexer currently has limited smarts when it comes to job
scheduling: it only prevents the same job from being queued twice. There
are other circumstances in which being more aware of what is in the
job queue would solve numerous problems, including the ever-recurring
double index and source folder changes. Essentially, more information
about the event that created each job needs to be added to individual
job requests so that the index manager can make smart choices about how
best to schedule the jobs.
The index manager should also run at the minimum priority.
Bug 60084: Indexer should reduce its priority when running in background
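One possible reading of R6 is sketched below: each job carries the
event that created it, and the queue refuses jobs already covered by a
queued job while dropping queued file-level jobs once a whole-project
job arrives. This is an illustration, not the Index Manager design;
IndexJobQueueSketch, Job, Cause and the coverage rule are hypothetical.
Only Thread.MIN_PRIORITY is a real JDK constant.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical job queue that uses per-job context to avoid redundant indexing. */
final class IndexJobQueueSketch {
    enum Cause { RESOURCE_CHANGED, PATH_OR_SYMBOL_CHANGED, SOURCE_FOLDER_CHANGED, USER_REQUEST }

    static final class Job {
        final String project;
        final String file;  // null means "index the whole project"
        final Cause cause;  // the event that created the job

        Job(String project, String file, Cause cause) {
            this.project = project;
            this.file = file;
            this.cause = cause;
        }

        /** A project-level job covers every file job of that project; a file job covers itself. */
        boolean covers(Job other) {
            return project.equals(other.project)
                    && (file == null || file.equals(other.file));
        }
    }

    private final Deque<Job> queue = new ArrayDeque<>();

    /** Returns false if an already queued job makes the new one redundant. */
    boolean submit(Job job) {
        for (Job queued : queue) {
            if (queued.covers(job)) {
                return false;
            }
        }
        // A new whole-project job makes queued file-level jobs redundant.
        queue.removeIf(job::covers);
        queue.addLast(job);
        return true;
    }

    int size() {
        return queue.size();
    }

    /** Creates a background worker thread at the lowest priority. */
    static Thread newWorker(Runnable work) {
        Thread t = new Thread(work, "indexer-sketch");
        t.setDaemon(true);
        t.setPriority(Thread.MIN_PRIORITY);
        return t;
    }
}
```

The Cause field is not used by the coverage rule here; it is carried so
that a fuller scheduler could, for example, treat a source folder change
differently from a single file edit.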
R7 | P1 | Indexer must try to keep as much as possible from all failed
index attempts
The indexer needs to persist a list of all files that are to be indexed
as part of an index job. As each merge happens, the list is updated. In
the event of a crash, all work that has been merged to disk shall be
considered as sane and the indexer will restart on the remaining files
on the next startup.
Bug 62366: [Index] Need ability to read a partial index and resume
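The bookkeeping described in R7 could be sketched as follows. This is a
minimal illustration under assumptions, not the planned implementation:
the class PendingFilesSketch, the one-path-per-line text file and the
write-then-rename step are all hypothetical choices.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical persisted list of files still waiting to be indexed. */
final class PendingFilesSketch {
    private final Path store;
    private final Set<String> pending = new LinkedHashSet<>();

    /** Loads whatever an earlier (possibly crashed) session left behind. */
    PendingFilesSketch(Path store) {
        this.store = store;
        if (Files.exists(store)) {
            try {
                pending.addAll(Files.readAllLines(store, StandardCharsets.UTF_8));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Records every file of a new index job before any parsing starts. */
    void startJob(Collection<String> files) {
        pending.addAll(files);
        save();
    }

    /** Called after a merge to disk: these files no longer need indexing. */
    void mergedToDisk(Collection<String> files) {
        pending.removeAll(files);
        save();
    }

    /** Files that still have to be indexed, in submission order. */
    List<String> remaining() {
        return new ArrayList<>(pending);
    }

    private void save() {
        try {
            // Write to a sibling file first so a crash never leaves a half-written list.
            Path tmp = store.resolveSibling(store.getFileName() + ".tmp");
            Files.write(tmp, pending, StandardCharsets.UTF_8);
            Files.move(tmp, store, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On startup, a non-empty remaining() list would tell the indexer which
files to schedule again, while everything already merged is left alone.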
R8 | P1 | Indexer needs to deal with new path/symbol addition/deletion
gracefully
The indexer currently reindexes the entire project whenever a
path/symbol is added or deleted. With the new per-file scanner
settings, we should only index the range of files that are affected.
(See Scanner Config Correctness Enhancement FDS for more details about
per-file scanner settings.)
R9 | P1 | Indexer needs to handle Source Folder changes gracefully
The indexer currently does a brute-force reindex whenever anything
changes in the source folder view. This needs to change to a more
scalable solution.
R10 | P4 | Resources can be added manually to the index
Users can request a new index on any resource in the workspace
(including files that are included from external directories) by
selecting one or more files and choosing the appropriate option
from the context menu.
Bug 71821: Action on the indexer to parse file/folder
R11 | P2 | FileType changes should trigger indexes
Changing the file type settings on a project might introduce new file
types as extensions that have not been indexed yet. The indexer
should react in accordance with the user's settings (as defined in R5).
Bug 72396: The indexer is not aware of the File Type (ResolverModel)
R12 | P1 | Indexer Extension point
If none of the current versions of the indexer meets the users'
requirements, the CDT should provide an extension point that will allow
users to write their own indexer that will populate the index. Provided
the information is complete, all index-based CDT features should work
as normal.
R13 | P4 | Index Manager should allow for ongoing search while indexing
The indexer should be able to compare incoming index entries with
any pending search queries and return the matches.
Bug 72803: [Performance][Usability][Indexer/Search] Waiting policy for
index results
Bug 53792: [Scalability] Prioritizing the indexing when searching a
working set
R14 | P1 | Index should provide an interface to allow clients to
determine what features are available with the current indexes
With the prospect of having various levels of detail available in a
project's indexes, the Index Manager needs to provide an interface that
tells any client whether there is sufficient information in the index
to run the client's service. If the necessary index detail is missing,
it would be up to the client to ask the user if they wish to schedule a
new index.
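To make R14 concrete, here is a minimal sketch of what such a query
could look like. None of these names exist in the CDT: the class
IndexCapabilitiesSketch, the Feature and IndexerKind enums, and in
particular the assumption that a quick index provides only declarations
are invented for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical capability query; the feature list per indexer is an assumption. */
final class IndexCapabilitiesSketch {
    enum Feature { DECLARATIONS, REFERENCES, INCLUDES, OFFSETS }

    enum IndexerKind { FULL, QUICK }

    /** What each kind of index is assumed to contain. */
    static Set<Feature> provided(IndexerKind kind) {
        return kind == IndexerKind.FULL
                ? EnumSet.allOf(Feature.class)
                : EnumSet.of(Feature.DECLARATIONS);
    }

    /** Features the client needs that the current index cannot supply. */
    static Set<Feature> missing(IndexerKind kind, Set<Feature> required) {
        Set<Feature> result = EnumSet.noneOf(Feature.class);
        result.addAll(required);
        result.removeAll(provided(kind));
        return result;
    }

    /** True if the client's service can run against the current index. */
    static boolean canRun(IndexerKind kind, Set<Feature> required) {
        return missing(kind, required).isEmpty();
    }
}
```

A client such as search would call canRun before starting and, on a
false result, could use missing to explain to the user why a new index
is needed.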
2.2 Index Content Requirements
Number | Priority | Description
R15 | P1 | Indexer will provide enough information to run searches
without a second parse
Currently the index stores just the location information for the
entries, which causes search to require two separate parses:
one for the initial index and another to determine the offset
information. The indexer should store this offset information in the
initial index and thus avoid the second parse. Enough information must
be available in the index to answer all of the possible MatchLocator
queries. Additional info that needs to be added to the index includes:
- function/method parameters
- whether a variable has an initializer clause, extern specifier
  or linkage specification (needed to determine if it is a definition)
- whether a field is static (needed to see if we need to
  check for a definition)
Bug 74427: Indexer needs to store more info
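The extra information listed for R15 might be carried on each index
entry roughly as sketched below. This is an assumption-laden
illustration, not the CDT index entry format: IndexEntrySketch and all
of its fields are hypothetical, and the only rule encoded is the
standard C/C++ one that an extern variable without an initializer is a
declaration rather than a definition. The linkage-specification flag is
stored but deliberately not interpreted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical index entry carrying offsets and the extra R15 details. */
final class IndexEntrySketch {
    enum Kind { FUNCTION, METHOD, VARIABLE, FIELD }

    final String name;
    final Kind kind;
    final int fileId;                  // ID of the file, as in the file blocks
    final int offset;                  // character offset of the name in the file
    final List<String> parameterTypes; // empty for variables and fields
    boolean hasInitializer;
    boolean hasExternSpecifier;
    boolean inLinkageSpecification;    // stored only; not interpreted in this sketch
    boolean isStatic;

    IndexEntrySketch(String name, Kind kind, int fileId, int offset, String... parameterTypes) {
        this.name = name;
        this.kind = kind;
        this.fileId = fileId;
        this.offset = offset;
        this.parameterTypes = Collections.unmodifiableList(Arrays.asList(parameterTypes.clone()));
    }

    /** An extern variable without an initializer only declares the variable. */
    boolean isDeclarationOnlyVariable() {
        return kind == Kind.VARIABLE && hasExternSpecifier && !hasInitializer;
    }

    /** Static fields are the case where a separate definition has to be looked for. */
    boolean needsDefinitionCheck() {
        return kind == Kind.FIELD && isStatic;
    }
}
```

With the offset stored on the entry, a search match could be reported
directly from the index instead of triggering a second parse of the
file.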
R16 | P3 | References in the index should be tied into their
declarations
We need to be able to match up references to their declarations
(possible requirement for refactoring support).
Bug 69606: [Search] Match locator has to make sure that the reference
belongs to the specified declaration
R17 | P1 | New Indexer needs to be written for new AST
As the CDT switches over to the new AST, the indexer will have to be
rewritten to extract information from the AST.
2.3 Problem Markers
3.1 UI enhancements
The following UI enhancements are planned to support the feature:
Number | Description
UE1 | Indexer Options Preference Page
This page will allow users to make changes to the Indexer
that affect the entire workspace:
- Index Progress Bar displayed
- Indexer New Project Default Setting
UE2 | Indexer Project Properties Page
This page will allow users to set indexer settings per project:
- Indexer to use: if a number of indexers are available (i.e.
  SourceIndexer, SourceIndexer2, CTagsIndexer), users can specify which
  indexer should be used for the project.
- Indexing policy to use
- Scanner Config Correctness Enhancement FDS
Last Modified on Monday, January 10, 2005