Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
[stellation-res] Revised version of Jonathan Gossage's workspace audit

This note contains a revised version of the workspace code audit
from Jonathan Gossage (jgossage@xxxxxxxx). I greatly appreciate the
care and attention to detail he demonstrated in preparing his audit.

The original version contains about 213 items, over 130 of which note
that the Javadoc is not consistent with the code. Many others contain
a request to name return values from methods or to provide explicit
names for string literals.

I knew Javadoc was out-of-date when I posted the code, but wanted to
give others a chance to see the form the code had taken. I have
already started a line-by-line review of all the workspace code, with
a principal goal of updating the Javadoc comments.

I will address the issue of 'using return codes' in a separate thread.

I will also address the issue of 'naming literals' in a separate
thread.

The following revision contains the remaining points, renumbered to
assist in further discussion. I will post a reply to some of these
points in a followup note. I do plan to address all of them
eventually.

dave


org.eclipse.stellation.workspace.Change

1. Should use the default platform directory separator instead of a hard
coded '/'.


2. The method "fileNotFound" should log through log4J. If the user
needs this output on sydout than use the appropriate appender. Logging
directly to stdout is problamatic since stdout will not be usefully
available in all environments (such as within Eclipse).

3. The method "inform(String") needs a Javadoc entry.

Is stdout an appropriate destination if this method invoked from
within Eclipse? Would an optional callback mechanism make sense?

4.After discovering and reporting that the file does not exist in the
workspace should either an exception be thrown or a silent exit
taken. It does not make sense to proceed.

5. The method "synchronize" should use symbolic return codes. It
should also use the return codes from the lower level "rm" and "add"
calls. At the least it should accurately return the overall status of
the operation.

5A.Should the synchronize operation terminate if errors are encountered? 
If so it should rollback any changes and I don't see that the current
environment has the bookeeping to do this.


5B.This method does not honour return codes from lower level
methods.Should it do so and should a rollback mechanism be
implemented?

org.eclipse.stellation.workspace.Checkin

6. The method "markExecutables" is now a nop and has only one
invocation from in this class. Should this both be removed?

org.eclipse.stellation.workspace.Checkout

7. The call to the Unix utility "sleep" seems a bit heavy-weight when
you can easily use the Java Object.Wait(time) calls. Incidentally
Windows has the same problem with file modified time, so this should
not be Unix specific anyway.

org.eclipse.stellation.workspace.Configuration

8. In method "addTypes" the Javadoc parameters have a spelling error
in "patterm".


9. In method "defaultSuffixes" there are a number of common Windows
suffixes that should be added, including the following:

  The following should be data types
  "exe", "sys", "lib"

  The following should be ignore type
  "obj"

  The following should be text type
  "cxx", "hpp", "bat", "bas", "xml" (maybe, not sure of the implications of                                      embedded CDATA)
  "txt", "def"

10. The method "defaultTypes" should use symbolic names for default
pattern names.  These names are used elsewhere.


org.eclipse.stellation.workspace.Configure


11. Also is the character '/' in the constant "/.*" intended to be
interpreted as a path component separator? If so it should be
represented as the default platorm separator.


org.eclipse.stellation.workspace.Context

12. The "getComment" method needs updated Javadoc with @param, @return
and @throws specs.

If you prompt for a comment in command line mode, you should tell the
user how to end the comment. Many users have forgotten or do not know
how to type an EOF from the keyboard. As a quick check I asked our
Unix developers where I work and not one could remember how to do
this. I couldn't either!


org.eclipse.stellation.workspace.Create

13. "create" method Should this not be made a static method of
workspace.Project? As it stands it simply looks like a kuldgy way of
getting a non-class method.

Given that all the necessary functionality is in Project is there
really any need for a "do nothing" class just to support the create
command. If you were going to add the ability to create more types of
objects than simply projects than this class would make sense.


org.eclipse.stellation.workspace.Differ

14. The "command" method should use symbolic names for options.

15. The Javadoc comment for the "compare" method is a little
misleading. It should make clear that what is being compared is the
text artifacts in the current workspace with a branch of the same
project in the repository. Part of the confusion rises from the fact
that a project object represents a branch and it's materialization in
a workspace, not an entire project. Also it is missing a @param spec
for the second project.


16. The method "getRevisionProject" looks as if it belongs in the
Project class. It offers no functionality that is specific to the
differ command but simply implements some functionality that uses the
Project class.

17. Method "getStateProject" nedds a Javadoc update - needs @throws
spec.  Again this method really looks like it belongs in the Project
class.


org.eclipse.stellation.workspace.Export

18. Should use platform default dpath component separator rather than hard
coded constant.


org.eclipse.stellation.workspace.Files

19. Javadoc comment for "closeContentDirectories" is misleading as
perhaps is the method name. The content directores are being deleted,
not closed.

20. The two file copy methods will benefit significantly from an
increase in the buffer size used from 4096 to 65536. The reason for
this has to do with the caching strategies used with both Windows and
Linux will get a strong hint to read ahead aggressively which is
exactly what you want when doing file copies. The only cost is a
larger tan necessary buffer with small files, but that cost is less
than the overhaed involved in discovering the file size and tuning the
buffer size.

21. In the method "emptyArchive" should it be an error to encounter a
sub-directory in the archive directory? If not is it intentional to
leave any discovered sub-directories alone.

22. The method "find" is not going to work properly on Windows systems
unless Cygwin and the GNU utilities are installed. For the moment,
this is acceptable, but over the long term it seems a bit much to have
to install Cygwin simply to get access to "find" and "diff", and at
some point alternates for these utilities should be found or
developed.

23. The method "findFileInfo" can be given an alternative
implementation using Java IO only in a Windows environment. The reason
for this is that in Windows you are only interested in regular files
and directories and do not have to worry about executable files or
symbolic links. This would remove the need for the "find" utility in
Windows since the only use Stellation seems to make of it is to
classify the workspace files. Of course, if you are willing to allow
JNI, then there is no need for the "find" utility in a Unix
environment either.


24. The method "getCanonicalPath" will not work correctly as it stands
since it will not properly normalize all Windows paths. The problem is
that in Windows both '\' and '/' are allowed as path component
separators in all places except the command line and can be intermixed
in the same path. The easiest solution would be to add a first pass
that converts all instances of path separators into the default path
separator for the platform. This will also fix numerous downstream
problems where other methods try to parse path strings to identify
directories.

This routine also needs to deal with UNC prefixes and/or drive letters
on Windows

25. The method "getContentDirectory" can easily be modified to not
require looping to determine the next directory name to
allocate. Simply get a list of the files and directories in the
context root directory and use the list size as the file name for the
next directory. This will be more efficient than testing files one at
a time.


26. Should the method "getCurrentDirectory" throw an exception if it
is not defined. This represents a serious software error and I doubt
that any of this methods consumers are prepared to handle a null
pointer.

27. The method "getExecutables" should return an empty StringSet
immediately on Windows since the Unix concept of executable file
attribute is not applicable. I don't think that it is necessary to
simulate this because even if a Windows executable gets stored on a
Unix file system it is not likely to be executed!

28. The method "getLinkTarget" should return a null immediately on a
Windows platform since soft links are not available at application
level.

29. The method "getMembers" should check against the project path
symbolically rather than using hard coded path.

30. The method "getName" is not guaranteed to work on Windows unless
the path is in the form of an absolute canonical path. This is because
if the path has not been normalized you cannot guarantee what path
separator is present. For safety's sake all paths should be normalized
to use the default path component separator for the active platform.

31. The method "getParentName" is not guaranteed to work on Windows
unless the path is in the form of an absolute canonical path. This is
because if the path has not been normalized you cannot guarantee what
path separator is present. For safety's sake all paths should be
normalized to use the default path component separator for the active
platform.

32. The method getTime can be refactored a bit to eliminate redundant
code when the Java File class can be used.

33. The ASCII tests in "getTypeFromContents" are only valid for basic
ASCII chaacter sets. Ultimately, Stellation should be able to deal
with files that use any encoding and any character set, but this is a
substantial area that is better dealt with later.


34. In the method "getWorkspaceFiles" the direcctory name for the
project directory should not be hard coded.


The Perl compiler should be configured to generate case insensitive
patterns on a Windows platform.


35. "isWorkspacePath" will only work reliably under Windows if path
component separators have been normalized to the platform
default. Javadoc needs upgrading to add @param spec.


35A.If the call "renameTo" fails should revert to doing a file copy/delete
sequence. There are too many legitimate cases on both Windows and Unix
to give up without a final try.

36. The method "normalizeText" neds the Javadoc entry revised to
reflect new implementation.

I have serious doubts about the usefulness of this operation. It is
costly and given that all internal operations are done on text
artifacts where line termination characters have been removed during
the construction process I don't see the benefit.

37. The method "removeFile" returns one of a set of small integers as
a return code. This seems to be a fairly common idiom in
Stellation. Should these values not be given symbolic names and used
consistently throughout the code base?


org.eclipse.stellation.workspace.GetProperty


38. The method "getProperty" should use symbolic values for exit codes.
Also the comments indicate an uncertainty over the return code set
when a named property is not present. This should be resolved but
given that this method is called from the command line it is probably
better to leave things as they are so external scripts can know what
is happening.


org.eclipse.stellation.workspace.List


org.eclipse.stellation.workspace.Location

39. The constructor Javadoc should be updated with @param spec. Also,
database names should be referenced symbolically and other location
option names should also be symbolic. For the longer term this method
probably needs to be initialized with static tables defining protocol
specific options that may be present in the location string. This can
be done during database protocol loading.

40. The Javadoc for method "getHost" needs updating since this call is
valid for some local protocols such as MySQL. However, it is probably
better to replace the parameter specific calls "getHost" or "getPort"
with generic calls that take the name of the option as a parameter and
always return a string. This mechanism will fit with the table driven
parameter identification mechanism.

41. A specific variable to identify a local protocol is now required
since some local protocols may use socket connections and need host
and port parameters.


org.eclipse.stellation.workspace.Merge


42. The method "cleanCompound" should use the platform default path
component separator in the logging statements for tidiness and
consistency when normalized path names are used.

org.eclipse.stellation.workspace.Project


43. The method "buildPaths(String)" needs a Javadoc entry.  The
variable _pathMap is problamatic as it stands. Since it is going to
contain file paths, it needs a custom comparator to ensure that
compares on the paths are case sensitive on Unix systems and case
insensitive on Windows systems.


44. The method "getProjectSignature" need the Javadoc updated with a
@throws spec. It should also use symbolic names for attribute
keys. There is a problem with the computation of the project
signature. It uses the Java algorithm to hash a string. The problem
with this is that hash production algorithms are only required to
provide a reasonable expectation that a collision will not occur. In
other places you use the MD5 hash which has much better behaviour in
this respect. For strong one-way hash algorithms, the probability that
two input streams will generate the same hash is roughly equivalent to
1 in 2^number of hash bits. So an MD5 hash will give you odds of 1 in
2^64. Probably the best hash algorithm currently available is the
SHA-1 hash. It uses a 20 byte hash and so yields a 1 in 2^160
probability of generating a collision. The donside of course is that
it requires 20 bytes of storage as opposed to 8 bytes for your current
hashes.

If you are interested in using this algorithm, there is an excellent
open source implementation in the Bouncy Castle cryptographic library.

This method should also use symbolic names for attribute keys.


45. The method "setProjectBranch" has an editing errorcheckError" in
the Javadoc.

46. The method "setProperty" needs the Javadoc updated for a changed
calling sequence.  Also would it not be prudent to add a second to the
"propertyModificationTime" to ensure that it is always different to
the oroginal time.

47. In the method "update" the discussion on hashes in point 26 for
this class is relevant. Note that the code for signature computation
can be significantly simplified if you use SHA_1 which is guaranteed
to produce a non-zero hash.

Also the discussion in the Files class point 37 on the need for text
file normalization is relevant.

48. in method "write(OutputStream)" there is a comment about
PrintWriter error handling. Although PrintWriter does not throw
exceptions it does provide a "checkError" method that allows errors to
be detected and if so then an exception can be thrown.


org.eclipse.stellation.workspace.Svc


49. Also there is a "TODO" comment on one statement in this method. Is
this still relevant and if so what needs to be done?

50. The method "main" breaks conventional usage in the Javadoc
entry. The method does not return a value, instead it calls a system
function to set the exit code. This should be documented but in the
description, not in a @returns tag.


51. The Javadoc description is misleading for method
"printException". The method operates on an exception period. It knows
nothing about throw or catch operations in other methods and in fact
may be called before an exception is first thrown in some
circumstances.

52. Also there seems to be a bit of fragility in the use of the
"simple" option type. Currently the only option that has this type is
"help". The semantics of the type are not specified but the code uses
the type as a signal to issue the help message rather than directly
recognizing the option. If the "simple" type has real meaning than it
is possible that other options might be added in the future with this
type and the existing code would then break;


org.eclipse.stellation.workspace.Workspace

53. In method "getProjectDocumentFile" the XML file suffix should be
symbolic.


-- 
Dave Shields, IBM Research, shields@xxxxxxxxxxxxxx. 


Back to the top