Version: 5.0
HTML: Parsing Library
The html library provides functions to read HTML documents and structures to represent them.
(read-xhtml port) and (read-html port) read (X)HTML from a port, producing an html instance.
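As a minimal sketch of reading from a port, the snippet below parses a document held in a string. It uses read-xhtml as in the example later in this document; the h: prefix, the names in, doc, and top-level-children, and the use of the html-full-content accessor are choices made for this illustration.

```racket
#lang racket/base
(require (prefix-in h: html))

;; Any input port works; a string port keeps the sketch self-contained.
(define in
  (open-input-string
   "<html><head><title>T</title></head><body><p>Hi</p></body></html>"))

;; doc is an instance of the html structure (the top-level element).
(define doc (h:read-xhtml in))

;; The children of the top-level element, via the html-full accessor.
(define top-level-children (h:html-full-content doc))
```

To read a file instead, the same call should work on a port obtained from open-input-file.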
(read-html-comments v): if v is not #f, then comments are read and returned. Defaults to #f.
(use-html-spec v): if v is not #f, then the HTML must respect the HTML specification
with regard to which elements are allowed to be the children of
other elements. For example, the top-level "<html>"
element may contain only a "<head>" and a "<body>"
element. Defaults to #f.
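Both settings can be scoped to a single read with parameterize. The sketch below assumes that the two settings described above correspond to the html library's read-html-comments and use-html-spec parameters; the names source and doc are hypothetical, and no particular shape of the resulting content is claimed.

```racket
#lang racket/base
(require (prefix-in h: html))

;; A small document with a comment in the body.
(define source
  (string-append
   "<html><head><title>T</title></head>"
   "<body><!-- a note --><p>Hi</p></body></html>"))

;; Within this parameterize, comments are kept and nesting is not
;; checked against the HTML specification. Outside it, both
;; parameters keep whatever values they had before.
(define doc
  (parameterize ([h:read-html-comments #t]
                 [h:use-html-spec #f])
    (h:read-xhtml (open-input-string source))))
```

Using parameterize rather than setting the parameters globally keeps the change local to one call, which matters when several documents are read with different settings.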
1 Example
    (module html-example racket

      ; Some of the symbols in html and xml conflict with
      ; each other and with racket/base language, so we prefix
      ; to avoid namespace conflict.
      (require (prefix-in h: html)
               (prefix-in x: xml))

      (define an-html
        (h:read-xhtml
         (open-input-string
          (string-append
           "<html><head><title>My title</title></head><body>"
           "<p>Hello world</p><p><b>Testing</b>!</p>"
           "</body></html>"))))

      ; extract-pcdata: html-content -> (listof string)
      ; Pulls out the pcdata strings from some-content.
      (define (extract-pcdata some-content)
        (cond [(x:pcdata? some-content)
               (list (x:pcdata-string some-content))]
              [(x:entity? some-content)
               (list)]
              [else
               (extract-pcdata-from-element some-content)]))

      ; extract-pcdata-from-element: html-element -> (listof string)
      ; Pulls out the pcdata strings from an-html-element.
      (define (extract-pcdata-from-element an-html-element)
        (match an-html-element
          [(struct h:html-full (attributes content))
           (apply append (map extract-pcdata content))]

          [(struct h:html-element (attributes))
           '()]))

      (printf "~s~n" (extract-pcdata an-html)))

    > (require 'html-example)
    ("My title" "Hello world" "Testing" "!")
2 HTML Structures
pcdata, entity, and attribute are defined in the xml documentation.
An html-content is either an html-element, a pcdata, or an entity.
The html-full structure extends html-element with one field:
content : (listof html-content)
Any html tag that may include content also inherits from
html-full without adding any additional fields.
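Because every content-bearing tag is an html-full, a generic traversal needs to distinguish only html-full, plain html-element, and everything else. The following is a sketch under that assumption; count-elements is a hypothetical helper, and the predicates and the html-full-content accessor are the ones implied by the structure definitions.

```racket
#lang racket/base
(require (prefix-in h: html))

;; count-elements: html-content -> exact-nonnegative-integer
;; Counts the elements in some-content, including some-content itself.
(define (count-elements some-content)
  (cond
    ;; Elements with children: count this one, then recur on the content.
    [(h:html-full? some-content)
     (apply + 1 (map count-elements (h:html-full-content some-content)))]
    ;; Elements that cannot have children.
    [(h:html-element? some-content) 1]
    ;; pcdata, entity, and anything else are not elements.
    [else 0]))

(define doc
  (h:read-xhtml
   (open-input-string
    (string-append
     "<html><head><title>My title</title></head><body>"
     "<p>Hello world</p><p><b>Testing</b>!</p>"
     "</body></html>"))))

(define element-count (count-elements doc))
```

The html-full test comes first because html-full is a subtype of html-element, so the more general predicate would otherwise match every element.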
A Contents-of-html is either
A Contents-of-head is either
A Contents-of-tr is either
A Contents-of-table is either
A Contents-of-fieldset is either
A Contents-of-select is either
A Contents-of-dl is either
A Contents-of-pre is either
A Contents-of-object-applet is either
A Map is
(make-map (listof attribute) (listof Contents-of-map))
A Contents-of-map is either
A Contents-of-a is either
A Contents-of-address is either
A Contents-of-body is either
A G12 is either
A G11 is either
A G10 is either
A G9 is either
A G8 is either
A G7 is either
A G6 is either
A G5 is either
A G4 is either
A G3 is either
A G2 is either