Re[2]: [platform-vcm-dev] Question to provider writers: Text/Binary default

Hi,

Why not use MIME types instead of text/binary? Wouldn't that help?
Then a provider can behave intelligently, and choose what to do based
on the type - it can also use wildcards, such as text/* to figure out
what type a file is.
This would probably work best if a MIME type could be associated with
every file, not just the ones under version control. Would that fix
it?
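
To make the wildcard idea concrete, here is a minimal Java sketch of
matching a file's MIME type against a pattern such as text/*. The
names MimePattern and matches() are hypothetical and only for
illustration; they are not part of the Team API.

```java
import java.util.Locale;

/** Hypothetical helper (not Team API): matches a MIME type against a pattern like "text/*". */
final class MimePattern {
    private MimePattern() {}

    /** Returns true if mimeType matches pattern; "*" is accepted as type or subtype. */
    static boolean matches(String pattern, String mimeType) {
        if (pattern == null || mimeType == null) {
            return false;
        }
        String p = pattern.trim().toLowerCase(Locale.ROOT);
        String m = mimeType.trim().toLowerCase(Locale.ROOT);

        // Drop parameters such as "; charset=utf-8" before comparing.
        int semi = m.indexOf(';');
        if (semi >= 0) {
            m = m.substring(0, semi).trim();
        }

        int pSlash = p.indexOf('/');
        int mSlash = m.indexOf('/');
        if (pSlash < 0 || mSlash < 0) {
            return false; // not of the form type/subtype
        }

        String pType = p.substring(0, pSlash);
        String pSub = p.substring(pSlash + 1);
        String mType = m.substring(0, mSlash);
        String mSub = m.substring(mSlash + 1);

        boolean typeOk = pType.equals("*") || pType.equals(mType);
        boolean subOk = pSub.equals("*") || pSub.equals(mSub);
        return typeOk && subOk;
    }

    public static void main(String[] args) {
        System.out.println(matches("text/*", "text/xml"));
        System.out.println(matches("text/*", "image/gif"));
    }
}
```

A provider could then apply EOL conversion only when the file's type
matches text/*, and leave everything else untouched.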

At the very least, please add a lot of extra default text file
extensions to the current list - files such as .xslt, .jsp, .js and
whatever else most people come across could safely be in the default
list.

Saturday, May 18, 2002, 6:23:55 AM, you wrote:

CG> I believe that discarding useful information (i.e. the difference
CG> between "unknown" and "text"/"bin") would be a mistake.

CG> I would suggest continuing to return tri-state (text, bin, unknown),
CG> and documenting that an "unknown" type MUST be treated conservatively
CG> (i.e. not be assumed to be of a specific format such as "text")
CG> unless the provider has other information about that file (e.g. by
CG> reading the content of the file and using that to infer its type,
CG> such as by finding a "magic" string in the beginning of the file).
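
A minimal sketch of the conservative handling described above,
assuming a provider that looks at the first bytes of a file. FileType,
resolve() and mayConvertLineEndings() are hypothetical names, not the
Team API, and the only signatures checked are the GIF header and an
XML declaration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch (not Team API): keep "unknown" unless the content says otherwise. */
final class ConservativeTypes {
    enum FileType { TEXT, BINARY, UNKNOWN }

    private static final byte[] GIF87 = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF89 = "GIF89a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] XML_DECL = "<?xml".getBytes(StandardCharsets.US_ASCII);

    private ConservativeTypes() {}

    /** True if data begins with the given magic bytes. */
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * A declared TEXT or BINARY type is trusted as-is. For UNKNOWN, the
     * leading bytes are inspected; without a recognized signature the
     * result stays UNKNOWN instead of defaulting to TEXT.
     */
    static FileType resolve(FileType declared, byte[] head) {
        if (declared != FileType.UNKNOWN) {
            return declared;
        }
        if (startsWith(head, GIF87) || startsWith(head, GIF89)) {
            return FileType.BINARY;
        }
        if (startsWith(head, XML_DECL)) {
            return FileType.TEXT;
        }
        return FileType.UNKNOWN;
    }

    /** Only files positively identified as text are eligible for EOL conversion. */
    static boolean mayConvertLineEndings(FileType resolved) {
        return resolved == FileType.TEXT;
    }

    public static void main(String[] args) {
        byte[] head = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolve(FileType.UNKNOWN, head));
    }
}
```

The point of the design is that UNKNOWN survives all the way to the
conversion decision, so a file nobody declared is never rewritten.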

CG> So that means that the behavior of the current CVS plugin (i.e.
CG> interpreting "unknown" as "text") is a bug, and should be fixed.
CG> Even if you had a convention that "all binary files types should be
CG> declared", inevitably, some will not be, and the information in those
CG> files would be damaged by the "unknown is text" assumption.

CG> Cheers,
CG> Geoff



CG> -----Original Message-----
CG> From: Kevin_McGuire@xxxxxxx [mailto:Kevin_McGuire@xxxxxxx]
CG> Sent: Friday, May 17, 2002 7:54 PM
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Subject: RE: [platform-vcm-dev] Question to provider writers:
CG> Text/Binary default



>>>Please note, I am not a provider but an end user.

CG> We like end users too :)

>>>I would rather ensure that my
>>>data is correct than worry about whether stripped eol characters cause my
>>>compare to show all lines as changed.  Recovering from the first may be
>>>impossible.  Recovering from the second wouldn't.

CG> Yup, this was our reasoning for going with binary, even though CVS command
CG> line clients typically assume text.

>>>I would think that an argument could be made to add a third return type
>>>of unknown to the getType() method.

CG> Actually, this is exactly what Team.getType() does!  The return value is
CG> tri-state - constants for text, bin, unknown.
CG> However, as outlined, there are issues with different providers
CG> interpreting this differently because a plugin writer doesn't know what to
CG> rely on, or may rely on the wrong thing.

CG> Thanks for the feedback,
CG> Kevin




 

CG> From: "Wegener, Dave" <Wegener@xxxxxxxx>
CG> Sent by: platform-vcm-dev-admin@eclipse.org
CG> Date: 05/17/2002 07:44 PM
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Subject: RE: [platform-vcm-dev] Question to provider writers:
CG> Text/Binary default
CG> Please respond to platform-vcm-dev


CG> Please note, I am not a provider but an end user.

CG> The text/binary choice is indeed a hard dilemma.  One of the driving forces
CG> behind Eclipse was the ability to produce an environment where every aspect
CG> of software development could be combined into a single platform.  This
CG> includes things like writing documentation.  Many of our documents are
CG> written in Word.  Word documents are obviously binary data and generally
CG> they have a .doc extension.  However, there is no absolute requirement for
CG> such an extension.  We save our documentation in our VCM package
CG> (ClearCase).  Other development tools may also have binary-formatted data
CG> that needs to be stored in the vcm.

CG> If a choice needs to be made between text vs. binary for unknown types, I
CG> would vote for binary simply because incorrectly identifying a binary file
CG> as text can lead to destruction of data.   I would rather ensure that my
CG> data is correct than worry about whether stripped eol characters cause my
CG> compare to show all lines as changed.  Recovering from the first may be
CG> impossible.  Recovering from the second wouldn't.

CG> I would think that an argument could be made to add a third return type of
CG> unknown to the getType() method.  This would allow the providers to use
CG> their own built-in defaults.  This could still lead to misidentifying
CG> files.  However, if the choice is made by the provider, it would be
CG> consistent both inside and outside Eclipse.

CG> Dave Wegener

CG> -----Original Message-----
CG> From: platform-vcm-dev-admin@xxxxxxxxxxx
CG> [mailto:platform-vcm-dev-admin@xxxxxxxxxxx]On Behalf Of
CG> Kevin_McGuire@xxxxxxx
CG> Sent: Friday, May 17, 2002 5:41 PM
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Subject: [platform-vcm-dev] Question to provider writers: Text/Binary
CG> default


CG> Dear repository providers,

CG> As you know, Team supports API available to all providers which will tell
CG> you if a file is believed to be text or binary.  This determination is
CG> based on a table of file types, some of which we contribute, and the rest
CG> which other plugins would contribute.

CG> At present, Team is agnostic wrt. whether files of unknown type should be
CG> considered text or binary.  We believed it was incorrect of us to assume
CG> for a provider how this should be handled.  We are now questioning this
CG> assumption, though.

CG> For CVS we've assumed binary because we're concerned about errant EOL
CG> conversion on GIFs, etc. This is a very bad failure because it results in
CG> data corruption and potentially lost work/data.
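
To illustrate why errant EOL conversion damages binary data, here is a
minimal sketch. It is not the CVS plugin's actual conversion code;
EolRoundTrip and its methods are hypothetical names. It converts CRLF
to LF (as on commit) and LF back to CRLF (as on checkout to a CRLF
platform) and checks whether the bytes survive.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical demo (not CVS plugin code): EOL conversion applied blindly to bytes. */
final class EolRoundTrip {
    private EolRoundTrip() {}

    /** Replaces every CR LF pair with a single LF. */
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    /** Inserts a CR before every LF. */
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length + 16);
        for (byte b : in) {
            if (b == '\n') {
                out.write('\r');
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    /** True if converting to LF and back to CRLF reproduces the original bytes. */
    static boolean survivesRoundTrip(byte[] original) {
        return Arrays.equals(original, lfToCrlf(crlfToLf(original)));
    }

    public static void main(String[] args) {
        byte[] text = "a\r\nb\r\n".getBytes(StandardCharsets.US_ASCII);
        // Arbitrary binary payload that contains both a CR LF pair and a lone LF.
        byte[] binary = {0x47, 0x49, 0x46, 0x0D, 0x0A, 0x0A, 0x00};
        System.out.println("text survives:   " + survivesRoundTrip(text));
        System.out.println("binary survives: " + survivesRoundTrip(binary));
    }
}
```

The CRLF text sample comes back byte for byte, while the binary sample
does not: the lone LF gains a CR it never had, so the file grows by one
byte and no longer matches what was committed.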

CG> The counter argument is that for the most part people only version control
CG> text files.  Furthermore, our support for marking files as derived and not
CG> version controlling them means we catch a lot of the binary cases.
CG> Generally speaking, the remaining set of known binary file types to be
CG> version controlled is relatively small and we could probably reliably list
CG> most as defaults in Team.  By contrast, it's much harder to come up with a
CG> list of known text file types.


CG> Problem 1:

CG> For CVS users this is tedious: they must ensure they've updated the list
CG> of text files, otherwise they don't get EOL conversion.  Presumably this
CG> will be true for other providers too.

CG> Problem 2:

CG> The problem becomes more interesting with code that reads/writes files.
CG> Because we (CVS) don't convert EOL on unknown file types (assumed binary),
CG> files generated using the platform encoding will show up in compare as
CG> having every line in conflict, unless the person thinks to turn on ignoring
CG> whitespace.

CG> Problem 3:

CG> When someone introduces a new file type and writes code that generates
CG> content, they *must* always add that new file type to the Team global list.
CG> They can't assume that it will be interpreted as either text or binary.
CG> Worse, they may make assumptions about the default based on how their
CG> provider interprets unknown file types, which could be different when used
CG> against a different provider.

CG> Thus the list of text/binary files must be complete.  It is unreasonable to
CG> expect plugin writers to be such good Team citizens.  If the default was
CG> known, and was text, then it's more believable that someone generating
CG> binary files that aren't derived would think to add them to the Team type
CG> list, although the failure case is still there.

CG> Problem 4:

CG> Our (Team's) current default list only has the text files, and this is
CG> wrong since Team is agnostic for unknown types.  That is, we made the exact
CG> error described in problem #3.

CG> My question to you:

CG> Q1: Should Team.getType(IFile) return a hardcoded "text" or "binary" for
CG> unknown files?

CG> Q2:  If yes, should it be "text"?


CG> I believe #1 should be "yes".  I think #2 should be yes (text).

CG> This discussion is occurring much later in the cycle than we would like, but
CG> we've only recently fully understood the problem.  If we make any changes
CG> we need to do them next week.

CG> Thanks for your time,
CG> The Team team

CG> _______________________________________________
CG> platform-vcm-dev mailing list
CG> platform-vcm-dev@xxxxxxxxxxx
CG> http://dev.eclipse.org/mailman/listinfo/platform-vcm-dev



-- 
Best regards,
 Kim Rasmussen                           mailto:kim@xxxxxxxxxx



