MediaWiki stores the entire contents of the file on each save. This
seems a bit excessive to me, but they must have a good reason for doing
this. This means getting the latest version is always easy. When you
want to see the changes between two revisions, MediaWiki simply runs
its custom diff engine on the two complete revisions. If the file is
not a wiki markup file, it seems to shell out to diff.
Since there is no turnkey solution for doing this, here are a couple of
solutions I can think of for Babel.
1. We use an external VCS as Antoine suggested. I think this is an
entirely valid approach, although perhaps a bit more complex.
2. We do like MediaWiki -- save the entire file on each save. If it
works for them, it will work for us. Except:
  - when a user wants to diff two versions, we could use one of the diff
    libraries Gabe pointed out. No need to go to shell or write our own.
  - older revisions could pass through the gzip lib and be stored in
    compressed format. If a user wants to diff against an older version,
    we unzip it first, then pass it through the diff engine.
  - one design consideration here would be to use many tables -- perhaps
    one translation table per language. Or, one table to contain the
    'latest' of everything and one table per language for the gzipped
    older revisions.
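To make option 2 a bit more concrete, here is a minimal sketch of the idea: older revisions are stored gzipped, and a diff is computed on demand after unzipping. It is written in Python purely for illustration (Babel itself is PHP), and the function names are hypothetical, not part of any existing Babel or MediaWiki code.

```python
import difflib
import gzip


def compress_revision(text: str) -> bytes:
    """Gzip a full revision before storing it (e.g. in a BLOB column)."""
    return gzip.compress(text.encode("utf-8"))


def decompress_revision(blob: bytes) -> str:
    """Restore the full text of a stored revision."""
    return gzip.decompress(blob).decode("utf-8")


def diff_revisions(old_text: str, new_text: str) -> str:
    """Return a unified diff between two complete revisions."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old",
        tofile="new",
    )
    return "".join(diff)


# Hypothetical example data: an older revision kept compressed,
# and the latest revision kept as plain text.
old = "Hello world\nSecond line\n"
new = "Hello world\nSecond line, edited\n"

stored_old = compress_revision(old)

# Unzip the older revision first, then pass it through the diff engine.
patch = diff_revisions(decompress_revision(stored_old), new)
print(patch)
```

The latest revision is never compressed here, so reading it stays as cheap as it is in MediaWiki; only the less frequently read history pays the unzip cost.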
On 03/19/2010 10:55 AM, Antoine Toulme wrote:
A wiki-like approach sounds good. Thanks for the
I'll agree that when dealing with entire files, a VCS
would be ideal. However, interfacing with one from PHP may add much
(unneeded) complexity. Also, our simple LAMP application would then
require the use of an external VCS.
I fail to see how using a PHP diff library is "reinventing" any
more than using an external VCS. We don't necessarily want to control
versions here -- we just want to cut down on the storage size by saving
only the differences.
I may be missing something, though, or I may be misunderstanding how
you would implement/interface with a VCS from PHP. As for the editors,
let's keep that as a separate subject. The problem we want to solve
here is storage.
The two solutions Gabe has enumerated look OK to me, but I'd have to
try them out to be sure. One thing that concerns me, however, is the
need to re-play the entire transaction set if we want to see the latest
file. This is perhaps where Antoine's VCS idea comes into play, since
this is what a VCS is designed to do.
But I'll say this again: we should be looking at how MediaWiki is
doing this, since we are doing the exact same thing. I will enlist
the help of my co-webmaster Matt, since he knows MediaWiki code quite
well.
On 03/16/2010 08:23 PM, Antoine Toulme wrote:
Wouldn't you be better off with a VCS for that?
I would recommend you also look at Bespin for an online editor
that could help.
Another alternative could be to use an editor like the one
GitHub uses so that people can make small changes online.
I'm afraid otherwise you reinvent quite a few things to get
the same result.
Now that's my 0.02 c, feel free to proceed with your plan,
On our last status call Kit asked me to look into possible solutions
for generating diffs for translations of documentation. The idea is we
could save the differences between translations as people work on a
file and only save one working copy of the most recent translated
document. The diffs would serve as an audit trail and allow for
recreating the translated document if needed.
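As a rough illustration of this idea, the sketch below keeps only the most recent copy in full and stores a small "reverse" delta for each earlier save, so an older version can be recreated by applying deltas backwards from the latest copy. It is written in Python for illustration only (not PHP), it uses the standard difflib module rather than either of the PHP libraries under discussion, and the function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(source_lines, target_lines):
    """Return the edits that turn source_lines into target_lines.

    Each edit is (start, end, replacement): replace source[start:end]
    with the replacement lines. Unchanged runs are not stored.
    """
    matcher = SequenceMatcher(a=source_lines, b=target_lines, autojunk=False)
    return [
        (i1, i2, target_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(source_lines, delta):
    """Apply a delta produced by make_delta to source_lines."""
    result = list(source_lines)
    # Apply edits from the end backwards so earlier indices stay valid.
    for start, end, replacement in reversed(delta):
        result[start:end] = replacement
    return result


def rebuild(latest_lines, reverse_deltas, index):
    """Recreate revision `index` from the latest copy and the audit trail."""
    lines = list(latest_lines)
    for delta in reversed(reverse_deltas[index:]):
        lines = apply_delta(lines, delta)
    return lines


# Hypothetical save history of one translated file, as lists of lines.
revisions = [
    ["Bonjour", "le monde"],
    ["Bonjour", "tout le monde"],
    ["Bonjour", "tout le monde", "Merci"],
]

# Only these two things would be stored: the latest copy and the deltas.
latest = revisions[-1]
# reverse_deltas[k] turns revision k+1 back into revision k.
reverse_deltas = [
    make_delta(revisions[k + 1], revisions[k])
    for k in range(len(revisions) - 1)
]

print(rebuild(latest, reverse_deltas, 0))
```

Storing the deltas in the reverse direction is a design choice of this sketch: the latest file is always available without replaying anything, and only requests for older versions pay the replay cost.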
I found two possible solutions for creating diffs in PHP. It would be
great to get some feedback from Denis on which one would be better from
a server standpoint.