Re: [rdf4j-dev] thoughts on RDFa ?

Hi Jerven,

Thanks, looks like a good test case indeed.
I'll give it a try and let you know when the code becomes somewhat stable and passes the basic tests.

Best regards

Bart

-----Original Message-----
From: Jerven Tjalling Bolleman <Jerven.Bolleman@sib.swiss> 
Sent: Tuesday 7 May 2019 21:42
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Cc: Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>
Subject: Re: [rdf4j-dev] thoughts on RDFa ?

Hi Bart,

I can test it against www.uniprot.org. It has quite a bit of RDFa (written by me) on the entry pages. The pages are not quite valid, but not too bad in terms of deviation from the spec.

I like RDFa these days and probably prefer it over JSON-LD for the schema.org markup that I need to do in the day job.

Regards,
Jerven

On 2019-05-07 20:06, Bart Hanssens (BOSA) wrote:
> FWIW, I have some initial code so I can start testing it against the 
> RDFa testsuite.
> 
> I’ve used jsoup since it is a well-maintained library with a nice API.
> 
> The only really annoying part seems to be the lack of line/column 
> indication when an error occurs.
> 
> (I guess I could first use jsoup to create well-formed X(HT)ML, and 
> then use SAX to iterate over the result,
> 
> but it seems a bit of overkill to include a new dependency 
> just to do tag balancing…)
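
To make the SAX half of that idea concrete, here is a minimal sketch using only the JDK's built-in SAX parser. It assumes the markup has already been made well-formed (for example by a tag-balancing step such as jsoup) and has no DOCTYPE or HTML-only entities. The class name RdfaSaxSketch and the helper findProperties are hypothetical, and only the RDFa "property" attribute is inspected, so this is nowhere near a full RDFa processor. It only illustrates that a SAX Locator gives a line number per element and that a SAXParseException carries the position of a syntax error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not part of RDF4J.
final class RdfaSaxSketch {

    // Collects every element carrying an RDFa "property" attribute,
    // together with the line number reported by the SAX Locator.
    static List<String> findProperties(String xhtml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Called by the parser before any other event.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String property = atts.getValue("property");
                if (property != null) {
                    found.add(qName + " property=" + property
                            + " at line " + locator.getLineNumber());
                }
            }
        };
        // The default factory is not namespace-aware, so qName is populated.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String wellFormed =
                "<html>\n<body>\n<p property=\"dc:title\">Hello</p>\n</body>\n</html>";
        findProperties(wellFormed).forEach(System.out::println);

        // Malformed input: DefaultHandler rethrows fatal errors,
        // and the exception carries the error position.
        try {
            findProperties("<p>\n<b>unclosed</p>");
        } catch (SAXParseException e) {
            System.out.println("syntax error at line " + e.getLineNumber()
                    + ", column " + e.getColumnNumber());
        }
    }
}
```

The trade-off is the one described above: the position information comes for free, but only after something else has repaired the markup, and the reported positions then refer to the repaired text rather than the original page.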
> 
> Technically, attoparser fits the bill (smaller, line/column 
> indication), but there seems to be only 1 maintainer and one other 
> contributor.
> 
> Which does not say anything about the quality of the project of course
>  😊
> 
> Best regards
> 
> Bart
> 
> FROM: Bart Hanssens (BOSA)
> SENT: Tuesday 30 April 2019 9:57
> TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
> SUBJECT: RE: [rdf4j-dev] thoughts on RDFa ?
> 
> Hi Håvard,
> 
> Well, I’m mainly looking into RDFa because of the (somewhat basic) 
> support for RDFa in Drupal CMS.
> 
> We’re running quite a few Drupal-websites, so this could come in 
> handy…
> 
> But “perfect syntax” and “website” is a rare combo, so I’ll use Jsoup 
> or attoparser 😊
> 
> Best regards
> 
> Bart
> 
> FROM: rdf4j-dev-bounces@xxxxxxxxxxx <rdf4j-dev-bounces@xxxxxxxxxxx> ON 
> BEHALF OF Håvard Ottestad
> SENT: Saturday 20 April 2019 12:26
> TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
> SUBJECT: Re: [rdf4j-dev] thoughts on RDFa ?
> 
> Hi Bart,
> 
> I have not used RDFa for anything. I do know that the metadata in 
> images is RDF, and also that Google is pushing for more JSON-LD on 
> webpages.
> 
> My experience with both SAX and jsoup is good. I usually use jsoup 
> when I need to crawl webpages, and for this it is the best library I 
> have used. Very robust and simple to use.
> 
> I use SAX in my XmlToRdf converter for performance. I can convert 100 
> MB of XML to Turtle with only 20 MB of RAM in less than 2 seconds on 
> my laptop. It even works all the way down to 3 MB of RAM, but then the 
> parsing time jumps to around 10 seconds because of GC.
> 
> I would recommend SAX for pure-XML, perfect-syntax use cases, JAXB for 
> when you want Java objects, and jsoup for everything else.
> 
> Håvard
> 
> On 18 Apr 2019, at 18:46, Bart Hanssens (BOSA) 
> <bart.hanssens@xxxxxxxxxxxx> wrote:
> 
>> Hi,
>> 
>> for scraping purposes, I'm looking into RDFa/RDFa-Lite and I'm 
>> thinking about writing a RIO parser (see also issue #512).
>> 
>> IIRC James did some experimental work on RDFa as well, but I think it 
>> was based on SAX,
>> 
>> so probably assuming that the source would be perfectly formatted 
>> XHTML... which is rarely the case
>> 
>> So currently I'm looking at using either attoparser (smaller,
>> event-driven) or jsoup (more frequently updated, DOM-interface),
>> 
>> and there is a wonderful test suite available at 
>> http://rdfa.info/test-suite/
>> 
>> So I was wondering
>> 
>> - are there other HTML parsers I should look into (Jodd Lagarto?
>> NekoHTML?)
>> 
>> - where should the testsuite go (if it gets CQ approval): I remember 
>> some emails about moving the rdf4j-testsuite back into the main repo, 
>> but I'm not sure what the conclusion was
>> 
>> Thanks
>> 
>> Bart
> 
>> _______________________________________________
>> rdf4j-dev mailing list
>> rdf4j-dev@xxxxxxxxxxx
>> To change your delivery options, retrieve your password, or 
>> unsubscribe from this list, visit 
>> https://www.eclipse.org/mailman/listinfo/rdf4j-dev

--
Jerven Tjalling Bolleman
SIB | Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet - 1211 Geneva 4
t: +41 22 379 58 85 - f: +41 22 379 58 58
Jerven.Bolleman@sib.swiss - http://www.sib.swiss

