A fast, production-ready HTML parsing and serialization toolset for Node, compliant with the WHATWG HTML5 specification.
parse5 contains nearly everything you need to work with HTML. It is the fastest spec-compliant HTML parser for Node to date and parses HTML the way the latest version of your browser does. It is stable and used by projects such as [jsdom](https://github.com/tmpvar/jsdom), [Angular2](https://github.com/angular/angular), [Polymer](https://www.polymer-project.org/1.0/) and many more.

# Table of contents

* [Install](#install)
* [Usage](#usage)
* [API Reference](#api-reference)
* [FAQ](#faq)
* [Version history](#version-history)
* [License](#license-and-author-information)

# Install

```
$ npm install parse5
```

# Usage

```js
var parse5 = require('parse5');

var document = parse5.parse('Hi there!');
var documentHtml = parse5.serialize(document);

var fragment = parse5.parseFragment('…');
```

# API Reference

## parse5 : object
**Kind**: global namespace
* [parse5](#parse5) : object
    * [.ParserStream](#parse5+ParserStream) ⇐ stream.Writable
        * [new ParserStream(options)](#new_parse5+ParserStream_new)
        * [.document](#parse5+ParserStream+document) : `ASTNode.<document>`
        * ["script" (scriptElement, documentWrite(html), resume)](#parse5+ParserStream+event_script)
    * [.SAXParser](#parse5+SAXParser) ⇐ stream.Transform
        * [new SAXParser(options)](#new_parse5+SAXParser_new)
        * [.stop()](#parse5+SAXParser+stop)
        * ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
        * ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
        * ["comment" (text, [location])](#parse5+SAXParser+event_comment)
        * ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
        * ["text" (text, [location])](#parse5+SAXParser+event_text)
    * [.SerializerStream](#parse5+SerializerStream) ⇐ stream.Readable
        * [new SerializerStream(node, [options])](#new_parse5+SerializerStream_new)
    * [.treeAdapters](#parse5+treeAdapters)
    * [.parse(html, [options])](#parse5+parse) ⇒ `ASTNode.<Document>`
    * [.parseFragment([fragmentContext], html, [options])](#parse5+parseFragment) ⇒ `ASTNode.<DocumentFragment>`
    * [.serialize(node, [options])](#parse5+serialize) ⇒ String
### parse5.ParserStream ⇐ stream.Writable
**Kind**: instance class of [parse5](#parse5)
**Extends:** stream.Writable
* [.ParserStream](#parse5+ParserStream) ⇐ stream.Writable
    * [new ParserStream(options)](#new_parse5+ParserStream_new)
    * [.document](#parse5+ParserStream+document) : `ASTNode.<document>`
    * ["script" (scriptElement, documentWrite(html), resume)](#parse5+ParserStream+event_script)
#### new ParserStream(options)
Streaming HTML parser with scripting support.
A [Writable stream](https://nodejs.org/api/stream.html#stream_class_stream_writable).

| Param | Type | Description |
| --- | --- | --- |
| options | [ParserOptions](#ParserOptions) | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
// Fetch google.com content and obtain its <body> node
http.get('http://google.com', function(res) {
    var parser = new parse5.ParserStream();

    parser.on('finish', function() {
        var body = parser.document.childNodes[0].childNodes[1];
    });

    res.pipe(parser);
});
```
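
The index-based lookup above depends on the exact shape of the document: if the page begins with a doctype, for instance, `childNodes[0]` may be the doctype node rather than the `<html>` element. Looking the element up by tag name is less brittle. The following is a minimal sketch, not parse5 code: `findByTagName` is a hypothetical helper, it assumes the default tree format exposes a `tagName` on element nodes alongside the `childNodes` array, and the hand-built `doc` object merely stands in for `parser.document` so the snippet runs on its own.

```js
// Hypothetical helper, not part of parse5: depth-first search for the
// first element with the given tag name.
function findByTagName(node, tagName) {
    if (node.tagName === tagName)
        return node;

    // Nodes without a childNodes array are treated as leaves
    var children = node.childNodes || [];

    for (var i = 0; i < children.length; i++) {
        var found = findByTagName(children[i], tagName);

        if (found)
            return found;
    }

    return null;
}

// Hand-built stand-in for parser.document (assumed default tree format)
var doc = {
    nodeName: '#document',
    childNodes: [
        { nodeName: '#documentType', name: 'html' },
        {
            nodeName: 'html',
            tagName: 'html',
            childNodes: [
                { nodeName: 'head', tagName: 'head', childNodes: [] },
                { nodeName: 'body', tagName: 'body', childNodes: [] }
            ]
        }
    ]
};

var body = findByTagName(doc, 'body');
```

Inside a `finish` listener the same helper could be called as `findByTagName(parser.document, 'body')`.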
#### parserStream.document : ASTNode.<document>
Resulting document node.
**Kind**: instance property of [ParserStream](#parse5+ParserStream)
#### "script" (scriptElement, documentWrite(html), resume)
Raised when the parser encounters a `<script>` element.
### parse5.SAXParser ⇐ stream.Transform
**Kind**: instance class of [parse5](#parse5)
**Extends:** stream.Transform
* [.SAXParser](#parse5+SAXParser) ⇐ stream.Transform
    * [new SAXParser(options)](#new_parse5+SAXParser_new)
    * [.stop()](#parse5+SAXParser+stop)
    * ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
    * ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
    * ["comment" (text, [location])](#parse5+SAXParser+event_comment)
    * ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
    * ["text" (text, [location])](#parse5+SAXParser+event_text)
#### new SAXParser(options)
Streaming [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style HTML parser.
A [Transform stream](https://nodejs.org/api/stream.html#stream_class_stream_transform),
which means you can pipe *through* it (see the example).

| Param | Type | Description |
| --- | --- | --- |
| options | [SAXParserOptions](#SAXParserOptions) | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');
var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('text', function(text) {
    // Handle page text content
    // ...
});

http.get('http://google.com', function(res) {
    // SAXParser is a Transform stream, which means you can pipe
    // through it. So you can analyze the page content and, e.g., save it
    // to a file at the same time:
    res.pipe(parser).pipe(file);
});
```
#### saxParser.stop()
Stops parsing. Useful if you want the parser to stop consuming
CPU time once you've obtained the desired info from the input stream.
Doesn't prevent piping, so data will flow through the parser as usual.
**Kind**: instance method of [SAXParser](#parse5+SAXParser)
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');
var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();
parser.on('doctype', function(name, publicId, systemId) {
    // Process doctype info and stop parsing
    // ...
    parser.stop();
});

http.get('http://google.com', function(res) {
    // Even though parser.stop() was called, the whole
    // content of the page will be written to the file
    res.pipe(parser).pipe(file);
});
```
#### "startTag" (name, attributes, selfClosing, [location])
Raised when the parser encounters a start tag.

**Kind**: event emitted by [SAXParser](#parse5+SAXParser)

| Param | Type | Description |
| --- | --- | --- |
| name | String | Tag name. |
| attributes | Array | List of attributes in `{ key: String, value: String }` form. |
| selfClosing | Boolean | Indicates if the tag is self-closing. |
| [location] | [LocationInfo](#LocationInfo) | Start tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
#### "endTag" (name, [location])
Raised when the parser encounters an end tag.

**Kind**: event emitted by [SAXParser](#parse5+SAXParser)

| Param | Type | Description |
| --- | --- | --- |
| name | String | Tag name. |
| [location] | [LocationInfo](#LocationInfo) | End tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
#### "comment" (text, [location])
Raised when the parser encounters a comment.

**Kind**: event emitted by [SAXParser](#parse5+SAXParser)

| Param | Type | Description |
| --- | --- | --- |
| text | String | Comment text. |
| [location] | [LocationInfo](#LocationInfo) | Comment source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
#### "doctype" (name, publicId, systemId, [location])
Raised when the parser encounters a [document type declaration](https://en.wikipedia.org/wiki/Document_type_declaration).

**Kind**: event emitted by [SAXParser](#parse5+SAXParser)

| Param | Type | Description |
| --- | --- | --- |
| name | String | Document type name. |
| publicId | String | Document type public identifier. |
| systemId | String | Document type system identifier. |
| [location] | [LocationInfo](#LocationInfo) | Document type declaration source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
#### "text" (text, [location])
Raised when the parser encounters text content.

**Kind**: event emitted by [SAXParser](#parse5+SAXParser)

| Param | Type | Description |
| --- | --- | --- |
| text | String | Text content. |
| [location] | [LocationInfo](#LocationInfo) | Text content source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
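
The events above can be combined in ordinary listeners. The following is a rough sketch rather than parse5 code: `attachStats` is a hypothetical helper, and a plain `EventEmitter` stands in for a `SAXParser` instance so the snippet runs without any input stream. It relies only on the event names and argument order documented above.

```js
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: counts start tags per name and concatenates text.
function attachStats(parser) {
    var stats = { tags: Object.create(null), selfClosing: 0, text: '' };

    parser.on('startTag', function(name, attributes, selfClosing) {
        stats.tags[name] = (stats.tags[name] || 0) + 1;

        if (selfClosing)
            stats.selfClosing++;
    });

    parser.on('text', function(text) {
        stats.text += text;
    });

    return stats;
}

// Stand-in for `new parse5.SAXParser()`; events are emitted by hand here
var parser = new EventEmitter();
var stats = attachStats(parser);

parser.emit('startTag', 'p', [], false);
parser.emit('text', 'Hi ');
parser.emit('startTag', 'br', [], true);
parser.emit('text', 'there!');
parser.emit('endTag', 'p');
```

With a real parser, the `SAXParser` instance would be passed to `attachStats` before any input is piped into it.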
### parse5.SerializerStream ⇐ stream.Readable
**Kind**: instance class of [parse5](#parse5)
**Extends:** stream.Readable
#### new SerializerStream(node, [options])
Streaming AST node to HTML serializer.
[Readable stream](https://nodejs.org/api/stream.html#stream_class_stream_readable).
| Param | Type | Description |
| --- | --- | --- |
| node | ASTNode | Node to serialize. |
| [options] | [SerializerOptions](#SerializerOptions) | Serialization options. |
**Example**
```js
var parse5 = require('parse5');
var fs = require('fs');
var file = fs.createWriteStream('/home/index.html');
// Serialize the parsed document to HTML and write it to a file
var document = parse5.parse('Who is John Galt?');
var serializer = new parse5.SerializerStream(document);
serializer.pipe(file);
```
### parse5.treeAdapters
Provides built-in tree adapters which can be used for parsing and serialization.
**Kind**: instance property of [parse5](#parse5)
**Properties**
| Name | Type | Description |
| --- | --- | --- |
| default | [TreeAdapter](#TreeAdapter) | Default tree format for parse5. |
| htmlparser2 | [TreeAdapter](#TreeAdapter) | Quite popular [htmlparser2](https://github.com/fb55/htmlparser2) tree format (e.g. used by [cheerio](https://github.com/MatthewMueller/cheerio) and [jsdom](https://github.com/tmpvar/jsdom)). |
**Example**
```js
var parse5 = require('parse5');
// Use default tree adapter for parsing
var document = parse5.parse('', { treeAdapter: parse5.treeAdapters.default });
// Use htmlparser2 tree adapter with SerializerStream
var serializer = new parse5.SerializerStream(node, { treeAdapter: parse5.treeAdapters.htmlparser2 });
```
### parse5.parse(html, [options]) ⇒ ASTNode.<Document>
Parses an HTML string.

**Kind**: instance method of [parse5](#parse5)
**Returns**: `ASTNode.<Document>` - document

| Param | Type | Description |
| --- | --- | --- |
| html | string | Input HTML string. |
| [options] | [ParserOptions](#ParserOptions) | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var document = parse5.parse('Hi there!');
```
### parse5.parseFragment([fragmentContext], html, [options]) ⇒ ASTNode.<DocumentFragment>
Parses an HTML fragment.

**Kind**: instance method of [parse5](#parse5)
**Returns**: `ASTNode.<DocumentFragment>` - documentFragment

| Param | Type | Description |
| --- | --- | --- |
| [fragmentContext] | ASTNode | Parsing context element. If specified, the given fragment will be parsed as if it were assigned to the context element's `innerHTML` property. |
| html | string | Input HTML fragment string. |
| [options] | [ParserOptions](#ParserOptions) | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var documentFragment = parse5.parseFragment('