Given a DOM query selector, this module creates a new output document for each query result. It lets you set the content and/or metadata of each new document based on its query result.
Because this module parses the document content as standards-compliant HTML and outputs the formatted, post-parsed DOM, place it only after all other template processing has been performed.
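The placement advice can be illustrated with the following `config.wyam` fragment. It is a sketch only. It assumes Markdown source files and the Wyam.Markdown package, neither of which this module requires. The pipeline name "Sections" and the `*.md` pattern are illustrative. `HtmlQuery` comes after `Markdown()` so that it parses the final rendered HTML.

```csharp
#n Wyam.Markdown
#n Wyam.Html

// Illustrative pipeline: render Markdown first, then query the resulting HTML.
Pipelines.Add("Sections",
    ReadFiles("*.md"),     // illustrative input pattern
    Markdown(),            // template processing runs before HtmlQuery
    HtmlQuery("h2")        // one new output document per <h2> element
        .GetTextContent()  // text stored under the default "TextContent" key
);
```

If `HtmlQuery` were placed before `Markdown()`, it would parse raw Markdown as HTML and find no `h2` elements.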
Package
#n Wyam.Html
Usage
-
HtmlQuery(string querySelector)
Creates the module with the specified query selector.
querySelector
The query selector to use.
Fluent Methods
Chain these methods together after the constructor to modify behavior.
-
First(bool first = true)
Specifies that only the first query result should be processed. If this method is not called, all query results are processed (the default is false).
first
If set to true, only the first result is processed.
-
GetAll()
Gets all information for each query result and sets the metadata of the corresponding result document(s). This is equivalent to calling GetOuterHtml(), GetInnerHtml(), GetTextContent(), and GetAttributeValues() with default arguments.
-
GetAttributeValue(string attributeName, string metadataKey = null)
Gets the specified attribute value of each query result and sets it in the metadata of the corresponding result document(s). If the attribute is not found for a given query result, no metadata is set. If metadataKey is null, the attribute name is used as the metadata key; otherwise, the specified metadata key is used.
attributeName
Name of the attribute to get.
metadataKey
The metadata key in which to place the attribute value.
-
GetAttributeValues()
Gets the values of all attributes of each query result and sets them in the metadata of the corresponding result document(s). Each attribute's local name is used as its metadata key.
-
GetInnerHtml(string metadataKey = "InnerHtml")
Gets the inner HTML of each query result and sets it in the metadata of the corresponding result document(s) with the specified key.
metadataKey
The metadata key in which to place the inner HTML.
-
GetOuterHtml(string metadataKey = "OuterHtml")
Gets the outer HTML of each query result and sets it in the metadata of the corresponding result document(s) with the specified key.
metadataKey
The metadata key in which to place the outer HTML.
-
GetTextContent(string metadataKey = "TextContent")
Gets the text content of each query result and sets it in the metadata of the corresponding result document(s) with the specified key.
metadataKey
The metadata key in which to place the text content.
-
SetContent(Nullable<bool> outerHtml = true)
Sets the content of the result document(s) to the content of the corresponding query result, optionally specifying whether inner or outer HTML content should be used. If this method is not called, no content is added to the result documents (only metadata).
outerHtml
If set to true (the default), outer HTML content is used for the document content. If set to false, inner HTML content is used. If null, no document content is set.
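The fluent methods above can be chained. The following `config.wyam` fragment is a sketch under stated assumptions:

- The pipeline names, the `*.html` pattern, and the metadata keys "Title" and "ImageSource" are illustrative.
- `ReadFiles` comes from Wyam's core modules rather than from this package.
- The metadata methods can be combined on one module, as the chaining note suggests.

```csharp
#n Wyam.Html

// Pipeline 1: process only the first <h1> result and store its text under "Title".
Pipelines.Add("PageTitles",
    ReadFiles("*.html"),
    HtmlQuery("h1")
        .First()                  // only the first query result is processed
        .GetTextContent("Title")  // custom key instead of the default "TextContent"
);

// Pipeline 2: one document per <img>. The src value goes under a custom key,
// and GetAttributeValues() also stores every attribute (src included)
// under its own local name, for example "alt".
Pipelines.Add("Images",
    ReadFiles("*.html"),
    HtmlQuery("img")
        .GetAttributeValue("src", "ImageSource")
        .GetAttributeValues()
);

// Pipeline 3: one document per <article> whose content is the article's
// inner HTML. GetAll() adds OuterHtml, InnerHtml, TextContent, and one
// metadata key per attribute.
Pipelines.Add("Articles",
    ReadFiles("*.html"),
    HtmlQuery("article")
        .SetContent(false)        // false = inner HTML; true or no argument = outer HTML
        .GetAll()
);
```

Without a SetContent call, as in the first two pipelines, the resulting documents carry only metadata.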
Output Metadata
The metadata values listed below apply to individual documents and are created and set by the module as indicated in their descriptions.
-
HtmlKeys.InnerHtml: System.String
Contains the inner HTML of the query result (unless an alternate metadata key is specified).
-
HtmlKeys.OuterHtml: System.String
Contains the outer HTML of the query result (unless an alternate metadata key is specified).
-
HtmlKeys.TextContent: System.String
Contains the text content of the query result (unless an alternate metadata key is specified).
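Downstream modules can read these values by key. The fragment below is a sketch, not a confirmed recipe. It assumes the following:

- Wyam's core `Meta` module accepts a `(doc, ctx)` lambda.
- Documents expose an `IMetadata.String` accessor.
- `HtmlKeys` is in scope in the configuration script.

The "Heading" key and the pipeline name are hypothetical.

```csharp
#n Wyam.Html

Pipelines.Add("HeadingText",
    ReadFiles("*.html"),
    HtmlQuery("h2")
        .GetTextContent(),   // default key, which HtmlKeys.TextContent refers to
    // Copy the text content into a hypothetical "Heading" key,
    // looking it up through the HtmlKeys constant rather than a string literal.
    Meta("Heading", (doc, ctx) => doc.String(HtmlKeys.TextContent))
);
```

Referring to `HtmlKeys.TextContent` instead of the literal "TextContent" keeps downstream code aligned with the module's default key. It only applies when no alternate key was passed to GetTextContent.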