awwright/node-rdfa

ECMAScript RDFa Parser

A fairly modular, cross-platform, extensible package to extract RDF information from RDFa-enabled documents.

Features:

  • Sometimes extracts RDF statements
  • Passes all compliant XML tests
  • Walks DOM tree to generate an RDF graph
  • Extensible design for parsing supersets of RDFa and supporting new media types

Goals:

  • Consume DOM nodes or SAX events
  • Output a complete graph or "Statement" events
  • Support custom types of nodes (e.g. variables)
  • Query for DOM nodes by data they contain
  • Linting of RDFa documents for:
    • property order
    • reverse/forward incompatibility
    • shadowed definitions
    • definitions over registered URI schemes
    • non-normalized IRI references
    • invalid IRIs
    • invalid CURIEs
    • terms used without an active vocabulary

Features

Parse a DOM tree into an RDF graph

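const fs = require('fs');
const rdfa = require('rdfa');
const DOMParser = require('xmldom').DOMParser;

// A test case from the local copy of the RDFa test suite
const filepath = './test/rdfa.github.io/test-suite/test-cases/rdfa1.1-lite/xhtml1/0021.xhtml';
// Parse the file into a DOM document; the library does not parse markup itself
const document = new DOMParser().parseFromString(fs.readFileSync(filepath, 'UTF-8'), 'text/xml');
// Extract statements, resolving relative references against the given base URI
const result = rdfa.parseDOM(rdfa.RDFaXMLParser, 'http://example.com/', document);
// Print each statement of the resulting graph as Turtle
result.outputGraph.forEach(function(n){ console.log(n.toTurtle()); });
// Output:
// <http://example.com/> <http://purl.org/dc/elements/1.1/creator> "Mark Birbeck" .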

Support for standards-based XML parsers

No document parser is included. Instead, the library reads from a DOM (W3C or compatible) or from a SAX-like event stream.

Tests are written for the following packages:

Support for different media types

Plain XML, HTML, and SVG are supported out of the box, and the library is extensible to new host languages.

Supported host languages:

  • RDFaXMLParser - for plain XML documents
  • RDFaXHTMLParser - for HTML documents delivered in application/xhtml+xml
  • RDFaHTMLParser - for HTML documents delivered in text/html
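
The list above pairs each parser class with a media type, so picking a class from a Content-Type header can be sketched as below. The helper parserNameForMediaType is hypothetical and not part of the library, and treating every other XML-based type (for example image/svg+xml) as plain XML is an assumption.

```javascript
// Hypothetical helper (not part of the library): map a Content-Type header
// value to the name of one of the parser classes listed above.
function parserNameForMediaType(contentType) {
  // Drop parameters such as "; charset=utf-8" and normalise the case.
  const mediaType = String(contentType).split(';')[0].trim().toLowerCase();
  if (mediaType === 'text/html') return 'RDFaHTMLParser';
  if (mediaType === 'application/xhtml+xml') return 'RDFaXHTMLParser';
  // Assumption: any other XML-based media type is handled as plain XML.
  if (mediaType === 'application/xml' || mediaType === 'text/xml' || mediaType.endsWith('+xml')) {
    return 'RDFaXMLParser';
  }
  return null; // no known host language for this media type
}

console.log(parserNameForMediaType('text/html; charset=utf-8')); // RDFaHTMLParser
```

If the classes are exported under these names, as rdfa.RDFaXMLParser is in the example earlier, the returned name could be used to look up the class to pass to parseDOM.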

File index

  • bin/httpd.js - an HTTP server for use with the RDFa Test Suite
  • bin/rdfa-turtle.js - read a file and print extracted triples
  • defaults.json - list of RDFa Initial Context vocabulary & terms
  • index.js - the bulk of the logic
  • test/suite.js - runs the library against a local copy of the RDFa test suite

API

parseDOM(Processor, base, document, options)

Extracts RDF statements from the DOM document passed as document, assuming the URI base passed as base (any URI carried by the DOM node itself is always ignored).

Returns an RDFaParser instance (see below).

  • Processor: a reference to a subclass of RDFaParser
  • base: the URI base for the document (where the document was downloaded from)
  • document: DOM document
  • options: object with additional configuration
    • forceVersion: RDFa version to use. This is normally detected.
    • defaultLanguage: default language for when no language is specified by the document (use the language specified in the Content-Language header, if any)
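
As a rough illustration of the defaultLanguage option, the sketch below builds an options object from HTTP response headers. The helper optionsFromHeaders is hypothetical, it assumes header names are already lower-cased (as Node's http module provides them), and taking the first tag of a multi-valued Content-Language header is also an assumption.

```javascript
// Hypothetical helper (not part of the library): derive the options object
// for parseDOM from a map of lower-cased HTTP response headers.
function optionsFromHeaders(headers) {
  const options = {};
  const contentLanguage = headers['content-language'];
  if (typeof contentLanguage === 'string') {
    // Content-Language may list several tags ("en, fr").
    // Assumption: the first tag is the one to use as the default.
    const first = contentLanguage.split(',')[0].trim();
    if (first) options.defaultLanguage = first;
  }
  return options;
}

const options = optionsFromHeaders({ 'content-language': 'en, fr' });
console.log(options.defaultLanguage); // en

// The object would then be passed as the fourth argument, for example:
//   rdfa.parseDOM(rdfa.RDFaHTMLParser, base, document, options);
```

forceVersion is left unset here so that the RDFa version is detected as usual.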

RDFaParser

Maintains state during processing of a document. Pass a subclass of this to parseDOM as the first argument Processor.

RDFaXMLParser

RDFaParser with the default RDF context loaded.

RDFaXHTMLParser

RDFaParser extended with the parsing rules for application/xhtml+xml documents.

RDFaHTMLParser

RDFaParser extended with the parsing rules for text/html documents.

RDFaContext

Represents an RDFa processing context that's generated during processing of an element and passed to child elements.
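
To illustrate the idea of a context that is generated for an element and handed to its children, here is a minimal, self-contained sketch. SketchContext and its methods are hypothetical and do not reflect the actual RDFaContext implementation; it only tracks a base, prefix mappings, and a language.

```javascript
// Hypothetical sketch; this is NOT the library's RDFaContext class.
class SketchContext {
  constructor(base, prefixes, language) {
    this.base = base;         // base URI of the document
    this.prefixes = prefixes; // prefix -> IRI mappings in scope
    this.language = language; // language in scope, or null
  }

  // Derive the context for a child element. Declarations on the child are
  // layered over the inherited ones; the parent context is left unchanged.
  child(declared) {
    const d = declared || {};
    return new SketchContext(
      this.base,
      Object.assign({}, this.prefixes, d.prefixes),
      d.language !== undefined ? d.language : this.language
    );
  }

  // Expand a CURIE such as "dc:creator" using the mappings in scope.
  // Returns null when there is no colon or the prefix is not defined.
  expand(curie) {
    const i = curie.indexOf(':');
    if (i < 0) return null;
    const prefix = curie.slice(0, i);
    if (!Object.prototype.hasOwnProperty.call(this.prefixes, prefix)) return null;
    return this.prefixes[prefix] + curie.slice(i + 1);
  }
}

const root = new SketchContext('http://example.com/', {}, null);
const inner = root.child({ prefixes: { dc: 'http://purl.org/dc/elements/1.1/' } });
console.log(inner.expand('dc:creator')); // http://purl.org/dc/elements/1.1/creator
console.log(root.expand('dc:creator'));  // null
```

Creating a fresh object per element keeps a prefix declared on one element from leaking into its siblings or ancestors.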
