RE: [platform-vcm-dev] Question to provider writers: Text/Binary default

Please note, I am not a provider but an end user.

The text/binary choice is indeed a hard dilemma.  One of the driving forces
behind eclipse was the ability to produce an environment where every aspect
of software development could be combined into a single platform.  This
includes things like writing documentation.  Many of our documents are
written in Word.  Word documents are obviously binary data and generally
they have a .doc extension.  However, there is no absolute requirement for
such an extension.  We save our documentation in our vcm package
(ClearCase).  Other development tools may also have binary formatted data
that needs to be stored in the vcm.

If a choice needs to be made between text vs. binary for unknown types, I
would vote for binary simply because incorrectly identifying a binary file
as text can lead to destruction of data.  I would rather ensure that my
data is correct than worry about whether unconverted eol characters cause my
compare to show all lines as changed.  Recovering from the first may be
impossible.  Recovering from the second would not be.
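
To illustrate why the first failure may be unrecoverable, here is a minimal
sketch, assuming a provider that treats a file as text and rewrites line
endings in both directions.  The class and method names are hypothetical and
do not come from Eclipse or any provider.  The sample data is the 8-byte PNG
file signature, which contains both a CR LF pair and a lone LF.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical demo class; not part of Eclipse or any repository provider.
final class EolCorruptionDemo {

    // The 8-byte PNG signature: 89 50 4E 47 0D 0A 1A 0A.
    static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Text-mode conversion towards LF: drop every CR that precedes an LF.
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf =
                    in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Text-mode conversion towards CR LF: put a CR in front of every LF.
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == '\n') {
                out.write('\r');
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] stripped = crlfToLf(PNG_SIGNATURE);  // 7 bytes
        byte[] restored = lfToCrlf(stripped);       // 9 bytes
        // The round trip does not give back the original 8 bytes.
        System.out.println(Arrays.equals(restored, PNG_SIGNATURE)); // false
    }
}
```

Once the CR has been stripped, nothing records which LF bytes originally had
a CR in front of them, so the reverse conversion cannot restore the file.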

I would think that an argument could be made to add a third return type of
unknown to the getType() method.  This would allow the providers to use
their own built-in defaults.  This could still lead to misidentifying files.
However, if the choice is made by the provider, it would be consistent both
inside and outside eclipse.
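
A rough sketch of what I mean, with made-up names (this is not the actual
Team API, only an illustration of a three-valued result and a provider-side
fallback):

```java
// Hypothetical types for illustration only; the real Team API may differ.
final class UnknownTypeSketch {

    // A third value lets Team say "I don't know" instead of guessing.
    enum FileType { TEXT, BINARY, UNKNOWN }

    // The provider applies its own built-in default only when Team
    // reports UNKNOWN; a definite answer from Team is passed through.
    static FileType resolve(FileType fromTeam, FileType providerDefault) {
        return fromTeam == FileType.UNKNOWN ? providerDefault : fromTeam;
    }
}
```

A provider that defaults to binary would call
`resolve(teamAnswer, FileType.BINARY)`, so an unregistered file is handled
the same way it would be by that provider's own tools outside eclipse.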

Dave Wegener

-----Original Message-----
From: platform-vcm-dev-admin@xxxxxxxxxxx
[mailto:platform-vcm-dev-admin@xxxxxxxxxxx]On Behalf Of
Kevin_McGuire@xxxxxxx
Sent: Friday, May 17, 2002 5:41 PM
To: platform-vcm-dev@xxxxxxxxxxx
Subject: [platform-vcm-dev] Question to provider writers: Text/Binary
default


Dear repository providers,

As you know, Team supports API available to all providers which will tell
you if a file is believed to be text or binary.  This determination is
based on a table of file types, some of which we contribute, and the rest
which other plugins would contribute.
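
As a simplified sketch of such a table (the class and method names here are
hypothetical and this is not the real Team implementation; it also assumes
the table is keyed by file extension and that an unregistered extension is
reported as unknown):

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for a file type table.
final class FileTypeTable {

    enum FileType { TEXT, BINARY, UNKNOWN }

    private final Map<String, FileType> byExtension = new HashMap<>();

    // Called by Team itself and by any plugin that contributes a type.
    void contribute(String extension, FileType type) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), type);
    }

    // Looks up the type by extension; anything not in the table,
    // including names with no extension at all, comes back as UNKNOWN.
    FileType getType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FileType.UNKNOWN;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return byExtension.getOrDefault(ext, FileType.UNKNOWN);
    }
}
```

With only `java` registered as text, `getType("Foo.java")` is TEXT while
`getType("design.doc")` and `getType("README")` are both UNKNOWN.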

At present, Team is agnostic wrt. whether files of unknown type should be
considered text or binary.  We believed it was incorrect of us to assume
for a provider how this should be handled.  We are now questioning this
assumption, though.

For CVS we've assumed binary because we're concerned about errant EOL
conversion on GIFs, etc. This is a very bad failure because it results in
corruption of data and potentially lost work/data.

The counter argument is that for the most part people only version control
text files.  Furthermore, our support for marking files as derived and not
version controlling them means we catch a lot of the binary cases.
Generally speaking the remaining set of known binary file types to be
version controlled is relatively small and we could probably reliably list
most as defaults in Team.  By contrast, it's much harder to come up with a
list of known text file types.


Problem 1:

For CVS users this is tedious and they must ensure they've updated the list
of text files, otherwise they don't get EOL conversion.  Presumably this
will be true for other providers too.

Problem 2:

The problem becomes more interesting with code that reads/writes files.
Because we (CVS) don't convert EOL on unknown file types (assumed binary),
files generated using the platform encoding will show up in compare as
having every line in conflict, unless the person thinks to turn on ignoring
whitespace.
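
A small sketch of the effect, assuming a line-by-line comparison that splits
on LF only (the class and method are hypothetical and are not the actual
compare implementation):

```java
// Hypothetical line comparison; not the Eclipse compare implementation.
final class EolCompareSketch {

    // Counts the line positions at which the two texts differ.
    // Lines are split on LF only, so a CR left in front of an LF
    // stays attached to the end of its line.
    static int differingLines(String left, String right,
                              boolean ignoreTrailingWhitespace) {
        String[] a = left.split("\n", -1);
        String[] b = right.split("\n", -1);
        // Lines present on one side only count as differences.
        int diffs = Math.abs(a.length - b.length);
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            String x = a[i];
            String y = b[i];
            if (ignoreTrailingWhitespace) {
                x = x.replaceAll("\\s+$", "");
                y = y.replaceAll("\\s+$", "");
            }
            if (!x.equals(y)) {
                diffs++;
            }
        }
        return diffs;
    }
}
```

For a repository copy `"a\nb\nc\n"` and a locally generated copy
`"a\r\nb\r\nc\r\n"`, this reports 3 differing lines with the flag off and 0
with it on, even though the visible content is identical.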

Problem 3:

When someone introduces a new file type and writes code that generates
content, they *must* always add that new file type to the Team global list.
They can't assume that it will either be interpreted as text or binary.
Worse, they may make assumptions about the default based on how their
provider interprets unknown file types, which could be different when used
against a different provider.

Thus the list of text/binary files must be complete.  It is unreasonable to
expect plugin writers to be such good Team citizens.  If the default were
known, and were text, then it's more believable that someone generating
binary files that aren't derived would think to add them to the Team type
list, although the failure case is still there.

Problem 4:

Our (Team's) current default list only has the text files, and this is
wrong since Team is agnostic for unknown types.  That is, we made the exact
error described in problem #3.

My question to you:

Q1: Should Team.getType(IFile) return a hardcoded "text" or "binary" for
unknown files?

Q2:  If yes, should it be "text"?


I believe #1 should be "yes".  I think #2 should be yes (text).

This discussion is occurring much later in the cycle than we would like, but
we've only recently fully understood the problem.  If we make any changes
we need to do them next week.

Thanks for your time,
The Team team

_______________________________________________
platform-vcm-dev mailing list
platform-vcm-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-vcm-dev

