RE: [platform-vcm-dev] Question to provider writers: Text/Binary

From: "Mark C. Chu-Carroll" <mcc@xxxxxxxxxxxxxx>
Date: Sat, 18 May 2002 15:53:38 -0400 (EDT)

> Please note, I am not a provider but an end user.
>
> The text/binary choice is indeed a hard dilemma.  One of the driving
> forces behind eclipse was the ability to produce an environment where
> every aspect of software development could be combined into a single
> platform.  This includes things like writing documentation.  Many of
> our documents are written in Word.  Word documents are obviously binary
> data and generally they have a .doc extension.  However, there is no
> absolute requirement for such an extension.  We save our documentation
> in our VCM package (ClearCase).  Other development tools may also have
> binary-formatted data that needs to be stored in the VCM.
>
> If a choice needs to be made between text and binary for unknown types,
> I would vote for binary, simply because incorrectly identifying a binary
> file as text can destroy data.  I would rather ensure that my data is
> correct than worry about whether stripped EOL characters cause my
> compare to show all lines as changed.  Recovering from the first may be
> impossible; recovering from the second would not be.
>
> I would think that an argument could be made to add a third return type
> of unknown to the getType() method.  This would allow the providers to
> use their own built-in defaults.  This could still lead to
> misidentifying files. However, if the choice is made by the provider,
> it would be consistent both inside and outside eclipse.

This last option would be my preference.

For our system, Stellation, we have a mechanism for identifying file types
by naming heuristics, and when those fail, we examine the file's contents
to try to determine whether it's text. It ends up being a pretty effective
system that usually gets the type right.
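A minimal sketch of the two-stage approach described above (naming heuristics first, then a look at the file's contents), written in Java since the Team API is Java. The extension table, the NUL-byte check, and the 10% control-character threshold are assumptions for illustration, not Stellation's actual rules.

```java
import java.util.Locale;
import java.util.Map;

public class FileTypeGuesser {
    public enum Kind { TEXT, BINARY }

    // Hypothetical extension table; the real Stellation table is not shown in this thread.
    private static final Map<String, Kind> BY_EXTENSION = Map.of(
        "java", Kind.TEXT, "txt", Kind.TEXT, "xml", Kind.TEXT,
        "gif", Kind.BINARY, "doc", Kind.BINARY, "jar", Kind.BINARY);

    /** Guess a file's kind from its name, falling back to its first bytes. */
    public static Kind guess(String fileName, byte[] head) {
        // Stage 1: naming heuristic (file extension, case-insensitive).
        int dot = fileName.lastIndexOf('.');
        if (dot >= 0 && dot < fileName.length() - 1) {
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            Kind known = BY_EXTENSION.get(ext);
            if (known != null) {
                return known;
            }
        }
        // Stage 2: content sniffing (assumed rule). Any NUL byte means binary;
        // otherwise, more than 10% non-whitespace control characters means binary.
        int suspicious = 0;
        for (byte b : head) {
            int c = b & 0xFF;
            if (c == 0) {
                return Kind.BINARY;
            }
            if (c < 0x20 && c != '\n' && c != '\r' && c != '\t' && c != '\f') {
                suspicious++;
            }
        }
        return (head.length > 0 && suspicious * 10 > head.length) ? Kind.BINARY : Kind.TEXT;
    }
}
```

A Word document saved without a `.doc` extension, the case Dave raises, would typically miss the extension table and then be classified as binary by the content check, because its header contains NUL bytes.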

When Eclipse knows the type of something, I'd like us to be able to
take advantage of that. So I'd like to know when Eclipse really believes
that it *knows* that this file is text or binary. And when Eclipse
isn't sure, I'd like to fall back to our heuristics. To be able to
do that, we need a way for Eclipse to say that it doesn't really
know.
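One way to express "Eclipse doesn't really know" is a three-valued answer that each provider maps onto its own storage modes, deferring to its own heuristics only in the unknown case. The names `TeamType`, `StorageMode`, and `resolve` are hypothetical; the Team API under discussion returned only text or binary.

```java
import java.util.function.Supplier;

public final class TypeResolution {
    // Hypothetical tri-state answer from Eclipse/Team (the real API had no UNKNOWN).
    public enum TeamType { TEXT, BINARY, UNKNOWN }

    // How the provider will actually store the file.
    public enum StorageMode { TEXT, BINARY }

    private TypeResolution() {}

    /**
     * Trust Eclipse when it claims to know the type; otherwise fall back
     * to the provider's own heuristic, which is evaluated only in that case.
     */
    public static StorageMode resolve(TeamType eclipseAnswer,
                                      Supplier<StorageMode> providerHeuristic) {
        switch (eclipseAnswer) {
            case TEXT:
                return StorageMode.TEXT;
            case BINARY:
                return StorageMode.BINARY;
            default:
                return providerHeuristic.get();
        }
    }
}
```

Passing the heuristic as a `Supplier` keeps it lazy: a provider whose fallback reads the file's contents only pays that cost when Eclipse answers `UNKNOWN`, and the provider's default stays the same inside and outside Eclipse.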

       -Mark

>
> Dave Wegener
>
> -----Original Message-----
> From: platform-vcm-dev-admin@xxxxxxxxxxx
> [mailto:platform-vcm-dev-admin@xxxxxxxxxxx]On Behalf Of
> Kevin_McGuire@xxxxxxx
> Sent: Friday, May 17, 2002 5:41 PM
> To: platform-vcm-dev@xxxxxxxxxxx
> Subject: [platform-vcm-dev] Question to provider writers: Text/Binary default
>
>
> Dear repository providers,
>
> As you know, Team provides an API, available to all providers, that
> tells you whether a file is believed to be text or binary.  This
> determination is based on a table of file types, some of which we
> contribute and the rest of which other plugins would contribute.
>
> At present, Team is agnostic with respect to whether files of unknown
> type should be considered text or binary.  We believed it was incorrect
> of us to decide for a provider how this should be handled.  We are now
> questioning this assumption, though.
>
> For CVS we've assumed binary because we're concerned about errant EOL
> conversion on GIFs, etc.  This is a very bad failure because it
> corrupts data and can lose work.
>
> The counterargument is that, for the most part, people only version
> control text files.  Furthermore, our support for marking files as
> derived and not version controlling them means we catch a lot of the
> binary cases.  Generally speaking, the remaining set of known binary
> file types that need to be version controlled is relatively small, and
> we could probably reliably list most of them as defaults in Team.  By
> contrast, it's much harder to come up with a list of known text file
> types.
>
>
> Problem 1:
>
> For CVS users this is tedious: they must ensure they've updated the
> list of text files, otherwise they don't get EOL conversion.
> Presumably this will be true for other providers too.
>
> Problem 2:
>
> The problem becomes more interesting with code that reads/writes files.
> Because we (CVS) don't convert EOLs on unknown file types (assumed
> binary), files generated using the platform encoding will show up in
> compare as having every line in conflict, unless the person thinks to
> turn on the ignore-whitespace option.
>
> Problem 3:
>
> When someone introduces a new file type and writes code that generates
> content, they *must* always add that new file type to the Team global
> list.  They can't assume that it will be interpreted as either text or
> binary.  Worse, they may make assumptions about the default based on
> how their provider interprets unknown file types, which could differ
> when used against a different provider.
>
> Thus the list of text/binary files must be complete, yet it is
> unreasonable to expect plugin writers to be such good Team citizens.
> If the default were known, and were text, then it's more plausible that
> someone generating binary files that aren't derived would think to add
> them to the Team type list, although the failure case is still there.
>
> Problem 4:
>
> Our (Team's) current default list only has the text files, and this is
> wrong since Team is agnostic for unknown types.  That is, we made the
> exact error described in problem #3.
>
> My question to you:
>
> Q1: Should Team.getType(IFile) return a hardcoded "text" or "binary"
> for unknown files?
>
> Q2:  If yes, should it be "text"?
>
>
> I believe #1 should be "yes".  I think #2 should be yes (text).
>
> This discussion is occurring much later in the cycle than we would like,
> but we've only recently fully understood the problem.  If we make any
> changes, we need to do them next week.
>
> Thanks for your time,
> The Team team
>
> _______________________________________________
> platform-vcm-dev mailing list
> platform-vcm-dev@xxxxxxxxxxx
> http://dev.eclipse.org/mailman/listinfo/platform-vcm-dev


-- 
*** Mark Craig Chu-Carroll,  <mcc@xxxxxxxxxxxxxx>
*** IBM T.J. Watson Research Center
*** The Stellation project:
http://domino.research.ibm.com/synedra/synedra.nsf
