[rdf4j-dev] thoughts on RDFa ?

Hi,


For scraping purposes, I'm looking into RDFa / RDFa Lite and I'm thinking about writing a Rio parser (see also issue #512).


IIRC James did some experimental work on RDFa as well, but I think it was based on SAX, so it probably assumed that the source would be well-formed XHTML, which is rarely the case.


So currently I'm looking at using either attoparser (smaller, event-driven) or jsoup (more frequently updated, DOM interface), and there is a wonderful test suite available at http://rdfa.info/test-suite/


So I was wondering

- Are there other HTML parsers I should look into (Jodd Lagarto? NekoHTML?)

- Where should the test suite go (if it gets CQ approval)? I remember some emails about moving the rdf4j-testsuite back into the main repo, but I'm not sure what the conclusion was.



Thanks


Bart


