Re: [rdf4j-dev] thoughts on RDFa ?

Hi Bart,

I can test it against www.uniprot.org. It has quite a bit of RDFa (written by me) on the entry pages. The pages are not quite valid, but they do not deviate too badly from the spec.

I like RDFa these days and probably prefer it over JSON-LD for the schema.org markup that I need to do in my day job.

Regards,
Jerven

On 2019-05-07 20:06, Bart Hanssens (BOSA) wrote:
FWIW, I have some initial code so I can start testing it against the
RDFa testsuite.

I’ve used jsoup since it is a well-maintained library with a nice
API.

The only really annoying part seems to be the lack of line/column
indication when an error occurs.

(I guess I could first use Jsoup to create well-formed X(HT)ML, and
then use SAX to iterate over the result,

but it seems to be a bit of an overkill to include a new dependency
just to only do tag balancing…)
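
A minimal sketch of the SAX half of that idea, assuming the input has already been turned into well-formed XHTML. It uses only the JDK's built-in SAX parser and its `Locator` to report a line and column for each element carrying an RDFa `property` attribute. The class name `RdfaLocatorSketch` and the method `findProperties` are hypothetical and are not part of rdf4j or of any existing RDFa parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only: not rdf4j code.
public class RdfaLocatorSketch {

    /**
     * Returns one "line:column property=value" entry for every element
     * that has a "property" attribute. Input must be well-formed XML,
     * otherwise the parser throws a SAXParseException.
     */
    public static List<String> findProperties(String xhtml) throws Exception {
        List<String> found = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser hands over a Locator before parsing starts.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String property = atts.getValue("property");
                if (property != null) {
                    // Fall back to -1 if the parser supplied no Locator.
                    int line = locator != null ? locator.getLineNumber() : -1;
                    int col = locator != null ? locator.getColumnNumber() : -1;
                    found.add(line + ":" + col + " property=" + property);
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<div vocab=\"http://schema.org/\">\n"
                + "<span property=\"name\">UniProt</span>\n"
                + "</div>";
        findProperties(xhtml).forEach(System.out::println);
    }
}
```

The positions refer to the tidied document handed to SAX, not to the original page, so they would only be approximate as error locations for the source HTML.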

Technically, attoparser fits the bill (smaller, line/column
indication), but there seems to be only 1 maintainer and one other
contributor.

Which does not say anything about the quality of the project of course
 😊

Best regards

Bart

FROM: Bart Hanssens (BOSA)
SENT: Tuesday 30 April 2019 9:57
TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
SUBJECT: RE: [rdf4j-dev] thoughts on RDFa ?

Hi Håvard,

Well, I’m mainly looking into RDFa because of the (somewhat basic)
support for RDFa in Drupal CMS.

We’re running quite a few Drupal-websites, so this could come in
handy…

But “perfect syntax” and “website” is a rare combo, so I’ll
use Jsoup or attoparser 😊

Best regards

Bart

FROM: rdf4j-dev-bounces@xxxxxxxxxxx <rdf4j-dev-bounces@xxxxxxxxxxx> ON
BEHALF OF Håvard Ottestad
SENT: Saturday 20 April 2019 12:26
TO: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
SUBJECT: Re: [rdf4j-dev] thoughts on RDFa ?

Hi Bart,

I have not used RDFa for anything. I do know that the metadata in
images is RDF, and also that Google is pushing for more JSON-LD on
webpages.

My experience with both SAX and jsoup is good. I usually use jsoup
when I need to crawl webpages, and for this it is the best library I
have used. Very robust and simple to use.

SAX I use in my XmlToRdf converter for performance. I can convert 100
MB of XML to Turtle with only 20 MB of RAM in less than 2 seconds on
my laptop. It even works all the way down to 3 MB of RAM, but then the
parsing time jumps to around 10 seconds because of GC.

I would recommend SAX for pure XML use cases with perfect syntax, JAXB
when you want Java objects, and jsoup for everything else.

Håvard

On 18 Apr 2019, at 18:46, Bart Hanssens (BOSA)
<bart.hanssens@xxxxxxxxxxxx> wrote:

Hi,

for scraping purposes, I'm looking into RDFa/RDFa-Lite and I'm
thinking about writing a Rio parser (see also issue #512).

IIRC James did some experimental work on RDFa as well, but I think
it was based on SAX,

so probably assuming that the source would be perfectly formatted
XHTML... which is rarely the case

So currently I'm looking at using either attoparser (smaller,
event-driven) or jsoup (more frequently updated, DOM-interface),

and there is a wonderful test suite available at
http://rdfa.info/test-suite/

So I was wondering

- are there other HTML parsers I should look into (Jodd Lagarto ?
NekoHTML ?)

- where should the testsuite go (if it gets CQ approval): I remember
some emails about moving the rdf4j-testsuite back into the main
repo, but I'm not sure what the conclusion was

Thanks

Bart

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or
unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/rdf4j-dev

--
Jerven Tjalling Bolleman
SIB | Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet - 1211 Geneva 4
t: +41 22 379 58 85 - f: +41 22 379 58 58
Jerven.Bolleman@sib.swiss - http://www.sib.swiss


