Re[2]: [platform-vcm-dev] Question to provider writers: Text/Binary default
Hi,
Why not use MIME types instead of text/binary? Wouldn't that help?
Then a provider can behave intelligently and choose what to do based
on the type. It can also use wildcards, such as text/*, to figure out
what kind of file it is dealing with.
That would probably work best if a MIME type could be associated with
every file, not just the ones under version control. Would that fix
it?
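
As a rough illustration of the wildcard idea, here is a minimal Java
sketch. The class name MimeTypes, its two methods, and the contents of
the extension table are all hypothetical; nothing here is part of the
Team API, and the table values are only examples.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Team: maps file names to MIME types
// and matches them against patterns such as "text/*".
final class MimeTypes {
    // Illustrative table only; a real one would be contributed by plugins.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("txt", "text/plain");
        BY_EXTENSION.put("xslt", "text/xml");
        BY_EXTENSION.put("js", "text/javascript");
        BY_EXTENSION.put("gif", "image/gif");
    }

    // Returns the MIME type for a file name, or null if the extension is
    // missing or not registered.
    static String lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return null;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(ext);
    }

    // Matches a MIME type against an exact pattern ("image/gif") or a
    // subtype wildcard ("text/*"). A null type never matches.
    static boolean matches(String pattern, String mimeType) {
        if (mimeType == null) {
            return false;
        }
        if (pattern.endsWith("/*")) {
            // Keep the trailing slash so "text/*" does not match "textual/x".
            String prefix = pattern.substring(0, pattern.length() - 1);
            return mimeType.startsWith(prefix);
        }
        return pattern.equals(mimeType);
    }
}
```

A provider could then ask MimeTypes.matches("text/*", MimeTypes.lookup(name))
and still tell "no type registered" (null) apart from "registered, but not text".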
At the very least, please add a lot of extra default text file
extensions to the current list - files such as .xslt, .jsp, .js and
whatever else most people come across could safely be in the default
list.
Saturday, May 18, 2002, 6:23:55 AM, you wrote:
CG> I believe that discarding useful information (i.e. the difference
CG> between "unknown" and "text"/"bin") would be a mistake.
CG> I would suggest continuing to return tri-state (text, bin, unknown),
CG> and documenting that an "unknown" type MUST be treated conservatively
CG> (i.e. not be assumed to be of a specific format such as "text")
CG> unless the provider has other information about that file (e.g. by
CG> reading the content of the file and using that to infer its type,
CG> such as by finding a "magic" string in the beginning of the file).
CG> So that means that the behavior of the current CVS plugin (i.e.
CG> interpreting "unknown" as "text") is a bug, and should be fixed.
CG> Even if you had a convention that "all binary file types should be
CG> declared", inevitably some will not be, and the information in those
CG> files would be damaged by the "unknown is text" assumption.
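
The conservative handling described above could look roughly like the
following Java sketch. TypeSniffer, FileType, sniff() and
convertsLineEndings() are hypothetical names, not Team API; the only
outside fact relied on is that GIF files begin with the ASCII signature
"GIF87a" or "GIF89a".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: treat "unknown" conservatively unless the file
// content itself reveals the type.
final class TypeSniffer {
    enum FileType { TEXT, BINARY, UNKNOWN }

    private static final byte[] GIF87 = "GIF87a".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] GIF89 = "GIF89a".getBytes(StandardCharsets.US_ASCII);

    // Looks for a "magic" signature at the start of the file. Returns
    // UNKNOWN when nothing is recognized; it never guesses TEXT.
    static FileType sniff(byte[] head) {
        if (startsWith(head, GIF87) || startsWith(head, GIF89)) {
            return FileType.BINARY;
        }
        return FileType.UNKNOWN;
    }

    // Decides whether a provider may convert line endings. Only a file
    // that is positively known to be text is converted; UNKNOWN falls
    // back to content sniffing and is otherwise left untouched.
    static boolean convertsLineEndings(FileType declared, byte[] head) {
        FileType effective = declared == FileType.UNKNOWN ? sniff(head) : declared;
        return effective == FileType.TEXT;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

The point of the design is that UNKNOWN never silently becomes TEXT: a
file is only converted when something has positively identified it as text.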
CG> Cheers,
CG> Geoff
CG> -----Original Message-----
CG> From: Kevin_McGuire@xxxxxxx [mailto:Kevin_McGuire@xxxxxxx]
CG> Sent: Friday, May 17, 2002 7:54 PM
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Subject: RE: [platform-vcm-dev] Question to provider writers:
CG> Text/Binary default
>>>Please note, I am not a provider but an end user.
CG> We like end users too :)
>>>I would rather ensure that my
>>>data is correct than worry about whether stripped eol characters cause my
>>>compare to show all lines as changed. Recovering from the first may be
>>>impossible. Recovering from the second wouldn't.
CG> Yup, this was our reasoning for going with binary, even though CVS command
CG> line clients typically assume text.
>>>I would think that an argument could be made to add a third return type of
>>>unknown to the getType() method.
CG> Actually, this is exactly what Team.getType() does! The return value is
CG> tri-state - constants for text, bin, unknown.
CG> However, as outlined, there are issues with different providers
CG> interpreting this differently because a plugin writer doesn't know what to
CG> rely on, or may rely on the wrong thing.
CG> Thanks for the feedback,
CG> Kevin
CG> From: "Wegener, Dave" <Wegener@xxxxxxxx>
CG> Sent by: platform-vcm-dev-admin@eclipse.org
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Date: 05/17/2002 07:44 PM
CG> Subject: RE: [platform-vcm-dev] Question to provider writers:
CG> Text/Binary default
CG> Please respond to platform-vcm-dev
CG> Please note, I am not a provider but an end user.
CG> The text/binary choice is indeed a hard dilemma. One of the driving forces
CG> behind eclipse was the ability to produce an environment where every aspect
CG> of software development could be combined into a single platform. This
CG> includes things like writing documentation. Many of our documents are
CG> written in Word. Word documents are obviously binary data and generally
CG> they have a .doc extension. However, there is no absolute requirement for
CG> such an extension. We save our documentation in our vcm package
CG> (clearcase.) Other development tools may also have binary formatted data
CG> that needs to be stored in the vcm.
CG> If a choice needs to be made between text vs. binary for unknown types, I
CG> would vote for binary simply because incorrectly identifying a binary file
CG> as text can lead to destruction of data. I would rather ensure that my
CG> data is correct than worry about whether stripped eol characters cause my
CG> compare to show all lines as changed. Recovering from the first may be
CG> impossible. Recovering from the second wouldn't.
CG> I would think that an argument could be made to add a third return type of
CG> unknown to the getType() method. This would allow the providers to use
CG> their own built-in defaults. This could still lead to misidentifying
CG> files.
CG> However, if the choice is made by the provider, it would be consistent both
CG> inside and outside eclipse.
CG> Dave Wegener
CG> -----Original Message-----
CG> From: platform-vcm-dev-admin@xxxxxxxxxxx
CG> [mailto:platform-vcm-dev-admin@xxxxxxxxxxx]On Behalf Of
CG> Kevin_McGuire@xxxxxxx
CG> Sent: Friday, May 17, 2002 5:41 PM
CG> To: platform-vcm-dev@xxxxxxxxxxx
CG> Subject: [platform-vcm-dev] Question to provider writers: Text/Binary
CG> default
CG> Dear repository providers,
CG> As you know, Team provides API, available to all providers, that tells
CG> you whether a file is believed to be text or binary. This determination is
CG> based on a table of file types, some of which we contribute and the rest
CG> of which other plugins contribute.
CG> At present, Team is agnostic about whether files of unknown type should be
CG> considered text or binary. We believed it was incorrect of us to assume
CG> for a provider how this should be handled. We are now questioning this
CG> assumption, though.
CG> For CVS we've assumed binary because we're concerned about errant EOL
CG> conversion on GIFs, etc. This is a very bad failure because it results in
CG> data corruption and potentially lost work.
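
To make that failure concrete, here is a small Java sketch of a
text-mode round trip. EolDemo and its methods are hypothetical and only
mimic, in the simplest possible way, what a client doing EOL conversion
might do; they are not the CVS plugin's actual code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of why EOL conversion is unsafe for binary data.
final class EolDemo {
    // Drops the CR of every CR LF pair, as a text-mode check-in might.
    static byte[] crlfToLf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Expands every LF to CR LF, as a text-mode check-out on a CRLF
    // platform might.
    static byte[] lfToCrlf(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : in) {
            if (b == '\n') {
                out.write('\r');
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    // Check-in followed by check-out on a CRLF platform.
    static byte[] roundTrip(byte[] in) {
        return lfToCrlf(crlfToLf(in));
    }
}
```

A CRLF text file survives roundTrip() unchanged, but binary data that
happens to contain a lone 0x0A byte comes back one byte longer, and
nothing in the result says which 0x0D bytes were inserted.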
CG> The counter argument is that for the most part people only version control
CG> text files. Furthermore, our support for marking files as derived and not
CG> version controlling them means we catch a lot of the binary cases.
CG> Generally speaking, the remaining set of known binary file types to be
CG> version controlled is relatively small and we could probably reliably list
CG> most as defaults in Team. By contrast, it's much harder to come up with a
CG> list of known text file types.
CG> Problem 1:
CG> For CVS users this is tedious: they must ensure they've updated the list
CG> of text files, otherwise they don't get EOL conversion. Presumably this
CG> will be true for other providers too.
CG> Problem 2:
CG> The problem becomes more interesting with code that reads/writes files.
CG> Because we (CVS) don't convert EOL on unknown file types (assumed binary),
CG> files generated using the platform encoding will show up in compare as
CG> having every line in conflict, unless the person thinks to turn on ignoring
CG> whitespace.
CG> Problem 3:
CG> When someone introduces a new file type and writes code that generates
CG> content, they *must* always add that new file type to the Team global list.
CG> They can't assume that it will either be interpreted as text or binary.
CG> Worse, they may make assumptions about the default based on how their
CG> provider interprets unknown file types, which could be different when used
CG> against a different provider.
CG> Thus the list of text/binary files must be complete. It is unreasonable to
CG> expect plugin writers to be such good Team citizens. If the default was
CG> known, and was text, then it's more believable that someone generating
CG> binary files that aren't derived would think to add them to the Team type
CG> list, although the failure case is still there.
CG> Problem 4:
CG> Our (Team's) current default list only has the text files, and this is
CG> wrong since Team is agnostic for unknown types. That is, we made the exact
CG> error described in problem #3.
CG> My question to you:
CG> Q1: Should Team.getType(IFile) return a hardcoded "text" or "binary" for
CG> unknown files?
CG> Q2: If yes, should it be "text"?
CG> I believe #1 should be "yes". I think #2 should be yes (text).
CG> This discussion is occurring much later in the cycle than we would like, but
CG> we've only recently fully understood the problem. If we make any changes
CG> we need to do them next week.
CG> Thanks for your time,
CG> The Team team
CG> _______________________________________________
CG> platform-vcm-dev mailing list
CG> platform-vcm-dev@xxxxxxxxxxx
CG> http://dev.eclipse.org/mailman/listinfo/platform-vcm-dev
--
Best regards,
Kim Rasmussen mailto:kim@xxxxxxxxxx