# The Document Object Model

{{quote {author: "Friedrich Nietzsche", title: "Beyond Good and Evil", chapter: true}

Too bad! Same old story! Once you've finished building your house you notice you've accidentally learned something that you really should have known—before you started.

quote}}

{{figure {url: "img/chapter_picture_14.jpg", alt: "Illustration showing a tree with letters, pictures, and gears hanging on its branches", chapter: "framed"}}}

{{index drawing, parsing}}

When you open a web page, your browser retrieves the page's ((HTML)) text and parses it, much like our parser from [Chapter ?](language#parsing) parsed programs. The browser builds up a model of the document's ((structure)) and uses this model to draw the page on the screen.

{{index "live data structure"}}

This representation of the ((document)) is one of the toys that a JavaScript program has available in its ((sandbox)). It is a ((data structure)) that you can read or modify. It acts as a _live_ data structure: when it's modified, the page on the screen is updated to reflect the changes.

## Document structure

{{index [HTML, structure]}}

You can imagine an HTML document as a nested set of ((box))es. Tags such as `<body>` and `</body>` enclose other ((tag))s, which in turn contain other tags or ((text)). Here's the example document from the [previous chapter](browser):

```{lang: html, sandbox: "homepage"}
<!doctype html>
<html>
  <head>
    <title>My home page</title>
  </head>
  <body>
    <h1>My home page</h1>
    <p>Hello, I am Marijn and this is my home page.</p>
    <p>I also wrote a book! Read it
      <a href="http://eloquentjavascript.net">here</a>.</p>
  </body>
</html>
```

This page has the following structure:

{{figure {url: "img/html-boxes.svg", alt: "Diagram showing an HTML document as a set of nested boxes. The outer box is labeled 'html' and contains two boxes labeled 'head' and 'body'. Inside those are further boxes, with some of the innermost boxes containing the document's text.", width: "7cm"}}}

{{indexsee "Document Object Model", DOM}}

The data structure the browser uses to represent the document follows this shape. For each box, there is an object, which we can interact with to find out things such as what HTML tag it represents and which boxes and text it contains. This representation is called the _Document Object Model_, or _((DOM))_ for short.

{{index "documentElement property", "head property", "body property", "html (HTML tag)", "body (HTML tag)", "head (HTML tag)"}}

The global binding `document` gives us access to these objects. Its `documentElement` property refers to the object representing the `<html>` tag. Since every HTML document has a head and a body, it also has `head` and `body` properties pointing at those elements.

## Trees

{{index [nesting, "of objects"]}}

Think back to the ((syntax tree))s from [Chapter ?](language#parsing) for a moment. Their structures are strikingly similar to the structure of a browser's document. Each _((node))_ may refer to other nodes, _children_, which in turn may have their own children. This shape is typical of nested structures, where elements can contain subelements that are similar to themselves.

{{index "documentElement property", [DOM, tree]}}

We call a data structure a _((tree))_ when it has a branching structure, no ((cycle))s (a node may not contain itself, directly or indirectly), and a single, well-defined _((root))_. In the case of the DOM, `document.documentElement` serves as the root.

{{index sorting, ["data structure", "tree"], "syntax tree"}}

Trees come up a lot in computer science.
In addition to representing recursive structures such as HTML documents or programs, they are often used to maintain sorted ((set))s of data because elements can usually be found or inserted more efficiently in a tree than in a flat array.

{{index "leaf node", "Egg language"}}

A typical tree has different kinds of ((node))s. The syntax tree for [the Egg language](language) had identifiers, values, and application nodes. Application nodes may have children, whereas identifiers and values are _leaves_, or nodes without children.

{{index "body property", [HTML, structure]}}

The same goes for the DOM. Nodes for _((element))s_, which represent HTML tags, determine the structure of the document. These can have ((child node))s. An example of such a node is `document.body`. Some of these children can be ((leaf node))s, such as pieces of ((text)) or ((comment)) nodes.

{{index "text node", element, "ELEMENT_NODE code", "COMMENT_NODE code", "TEXT_NODE code", "nodeType property"}}

Each DOM node object has a `nodeType` property, which contains a code (number) that identifies the type of node. Elements have code 1, which is also defined as the constant property `Node.ELEMENT_NODE`. Text nodes, representing a section of text in the document, get code 3 (`Node.TEXT_NODE`). Comments have code 8 (`Node.COMMENT_NODE`).

Another way to visualize our document ((tree)) is as follows:

{{figure {url: "img/html-tree.svg", alt: "Diagram showing the HTML document as a tree, with arrows from parent nodes to child nodes", width: "8cm"}}}

The leaves are text nodes, and the arrows indicate parent-child relationships between nodes.

{{id standard}}

## The standard

{{index "programming language", [interface, design], [DOM, interface]}}

Using cryptic numeric codes to represent node types is not a very JavaScript-like thing to do. Later in this chapter, we'll see that other parts of the DOM interface also feel cumbersome and alien.
This is because the DOM interface wasn't designed for JavaScript alone. Rather, it tries to be a language-neutral interface that can be used in other systems as well, not just for HTML but also for ((XML)), which is a generic ((data format)) with an HTML-like syntax.

{{index consistency, integration}}

This is unfortunate. Standards are often useful. But in this case, the advantage (cross-language consistency) isn't all that compelling. Having an interface that is properly integrated with the language you're using will save you more time than having a familiar interface across languages.

{{index "array-like object", "NodeList type"}}

As an example of this poor integration, consider the `childNodes` property that element nodes in the DOM have. This property holds an array-like object with a `length` property and properties labeled by numbers to access the child nodes. But it is an instance of the `NodeList` type, not a real array, so it does not have methods such as `slice` and `map`.

{{index [interface, design], [DOM, construction], "side effect"}}

Then there are issues that are simply caused by poor design. For example, there is no way to create a new node and immediately add children or ((attribute))s to it. Instead, you have to first create it and then add the children and attributes one by one, using side effects. Code that interacts heavily with the DOM tends to get long, repetitive, and ugly.

{{index library}}

But these flaws aren't fatal. Since JavaScript allows us to create our own ((abstraction))s, it is possible to design improved ways to express the operations we are performing. Many libraries intended for browser programming come with such tools.

## Moving through the tree

{{index pointer}}

DOM nodes contain a wealth of ((link))s to other nearby nodes. The following diagram illustrates these:

{{figure {url: "img/html-links.svg", alt: "Diagram that shows the links between DOM nodes. The 'body' node is shown as a box, with a 'firstChild' arrow pointing at the 'h1' node at its start, a 'lastChild' arrow pointing at the last paragraph node, and 'childNodes' arrow pointing at an array of links to all its children. The middle paragraph has a 'previousSibling' arrow pointing at the node before it, a 'nextSibling' arrow to the node after it, and a 'parentNode' arrow pointing at the 'body' node.", width: "6cm"}}}

{{index "child node", "parentNode property", "childNodes property"}}

Although the diagram shows only one link of each type, every node has a `parentNode` property that points to the node it is part of, if any. Likewise, every element node (node type 1) has a `childNodes` property that points to an ((array-like object)) holding its children.

{{index "firstChild property", "lastChild property", "previousSibling property", "nextSibling property"}}

In theory, you could move anywhere in the tree using just these parent and child links. But JavaScript also gives you access to a number of additional convenience links. The `firstChild` and `lastChild` properties point to the first and last child nodes (which may be text nodes) or have the value `null` for nodes without children. Similarly, `previousSibling` and `nextSibling` point to adjacent nodes, which are nodes with the same parent that appear immediately before or after the node itself. For a first child, `previousSibling` will be `null`, and for a last child, `nextSibling` will be `null`.

{{index "children property", "text node", element}}

There's also the `children` property, which is like `childNodes` but contains only element (type 1) children, not other types of child nodes. This can be useful when you aren't interested in text nodes.

{{index "talksAbout function", recursion, [nesting, "of objects"]}}

When dealing with a nested data structure like this one, recursive functions are often useful.
The following function scans a document for ((text node))s containing a given string and returns `true` when it has found one:

{{id talksAbout}}

```{sandbox: "homepage"}
function talksAbout(node, string) {
  if (node.nodeType == Node.ELEMENT_NODE) {
    // Recurse into every child; stop as soon as one matches.
    for (let child of node.childNodes) {
      if (talksAbout(child, string)) {
        return true;
      }
    }
    return false;
  } else if (node.nodeType == Node.TEXT_NODE) {
    return node.nodeValue.indexOf(string) > -1;
  }
}

console.log(talksAbout(document.body, "book"));
// → true
```

{{index "nodeValue property"}}

The `nodeValue` property of a text node holds the string of text that it represents.

## Finding elements

{{index [DOM, querying], "body property", "hard-coding", [whitespace, "in HTML"]}}

Navigating these ((link))s among parents, children, and siblings is often useful. But if we want to find a specific node in the document, reaching it by starting at `document.body` and following a fixed path of properties is a bad idea. Doing so bakes assumptions into our program about the precise structure of the document, a structure you might want to change later. Another complicating factor is that text nodes are created even for the whitespace between nodes. The example document's `<body>` tag has not just three children (`<h1>` and two `<p>` elements), but seven: those three, plus the spaces before, after, and between them.

{{index "search problem", "href attribute", "getElementsByTagName method"}}

If we want to get the `href` attribute of the link in that document, we don't want to say something like "Get the second child of the sixth child of the document body". It'd be better if we could say "Get the first link in the document". And we can.
```{sandbox: "homepage"}
let link = document.body.getElementsByTagName("a")[0];
console.log(link.href);
```

{{index "child node"}}

All element nodes have a `getElementsByTagName` method, which collects all elements with the given tag name that are descendants (direct or indirect children) of that node and returns them as an ((array-like object)).

{{index "id attribute", "getElementById method"}}

To find a specific _single_ node, you can give it an `id` attribute and use `document.getElementById` instead.

```{lang: html}
My ostrich Gertrude:

One
Two
Three
```

A node can exist in the document in only one place. Thus, inserting paragraph _Three_ in front of paragraph _One_ will first remove it from the end of the document and then insert it at the front, resulting in _Three_/_One_/_Two_. All operations that insert a node somewhere will, as a ((side effect)), cause it to be removed from its current position (if it has one).

{{index "insertBefore method", "replaceChild method"}}

The `replaceChild` method is used to replace a child node with another one. It takes as arguments two nodes: a new node and the node to be replaced. The replaced node must be a child of the element the method is called on. Note that both `replaceChild` and `insertBefore` expect the _new_ node as their first argument.

## Creating nodes

{{index "alt attribute", "img (HTML tag)", "createTextNode method"}}

Say we want to write a script that replaces all ((image))s (`<img>` tags)

No book can ever be finished. While working on it we learn just enough to find it immature the moment we turn away from it.

{{if book

This is what the resulting document looks like:

{{figure {url: "img/blockquote.png", alt: "Rendered picture of the blockquote with attribution", width: "8cm"}}}

if}}

## Attributes

{{index "href attribute", [DOM, attributes]}}

Some element ((attribute))s, such as `href` for links, can be accessed through a property of the same name on the element's ((DOM)) object. This is the case for most commonly used standard attributes.

{{index "data attribute", "getAttribute method", "setAttribute method", attribute}}

HTML allows you to set any attribute you want on nodes. This can be useful because it allows you to store extra information in a document. To read or change custom attributes, which aren't available as regular object properties, you have to use the `getAttribute` and `setAttribute` methods.

```{lang: html}
The launch code is 00000000.
I have two feet.
```

It is recommended to prefix the names of such made-up attributes with `data-` to ensure they do not conflict with any other attributes.

{{index "getAttribute method", "setAttribute method", "className property", "class attribute"}}

There is a commonly used attribute, `class`, which is a ((keyword)) in the JavaScript language. For historical reasons (some old JavaScript implementations could not handle property names that matched keywords), the property used to access this attribute is called `className`. You can also access it under its real name, `"class"`, with the `getAttribute` and `setAttribute` methods.

## Layout

{{index layout, "block element", "inline element", "p (HTML tag)", "h1 (HTML tag)", "a (HTML tag)", "strong (HTML tag)"}}

You may have noticed that different types of elements are laid out differently. Some, such as paragraphs (`<p>`) or headings (`<h1>`)

{{if book

Giving a paragraph a border causes a rectangle to be drawn around it.

{{figure {url: "img/boxed-in.png", alt: "Rendered picture of a paragraph with a border", width: "8cm"}}}

if}}

{{index "getBoundingClientRect method", position, "pageXOffset property", "pageYOffset property"}}

{{id boundingRect}}

The most effective way to find the precise position of an element on the screen is the `getBoundingClientRect` method. It returns an object with `top`, `bottom`, `left`, and `right` properties, indicating the pixel positions of the sides of the element relative to the upper left of the viewport (the visible part of the page). If you want pixel positions relative to the whole document, you must add the current scroll position, which you can find in the `pageXOffset` and `pageYOffset` bindings.

{{index "offsetHeight property", "getBoundingClientRect method", drawing, laziness, performance, efficiency}}

Laying out a document can be quite a lot of work. In the interest of speed, browser engines do not immediately re-layout a document every time you change it but wait as long as they can before doing so. When a JavaScript program that changed the document finishes running, the browser will have to compute a new layout to draw the changed document to the screen. When a program _asks_ for the position or size of something by reading properties such as `offsetHeight` or calling `getBoundingClientRect`, providing that information also requires computing a ((layout)).

{{index "side effect", optimization, benchmark}}

A program that repeatedly alternates between reading DOM layout information and changing the DOM forces a lot of layout computations to happen and will consequently run very slowly. The following code is an example of this. It contains two different programs that build up a line of _X_ characters 2,000 pixels wide and measures the time each one takes.

```{lang: html, test: nonumbers}
```

## Styling

{{index "block element", "inline element", style, "strong (HTML tag)", "a (HTML tag)", underline}}

We have seen that different HTML elements are drawn differently. Some are displayed as blocks, others inline. Some add styling: `<strong>` makes its content ((bold)), and `<a>` makes it blue and underlines it.

{{index "img (HTML tag)", "default behavior", "style attribute"}}

The way an `<img>`

{{index "camel case", capitalization, "hyphen character", "font-family (CSS)"}}

Some style property names contain hyphens, such as `font-family`. Because such property names are awkward to work with in JavaScript (you'd have to say `style["font-family"]`), the property names in the `style` object for such properties have their hyphens removed and the letters after them capitalized (`style.fontFamily`).

## Cascading styles

{{index "rule (CSS)", "style (HTML tag)"}}

{{indexsee "Cascading Style Sheets", CSS}}

{{indexsee "style sheet", CSS}}

The styling system for HTML is called _((CSS))_, for _Cascading Style Sheets_. A _style sheet_ is a set of rules for how to style elements in a document. It can be given inside a `<style>` tag.

```{lang: html}
<style>
  strong { font-style: italic; color: gray; }
</style>
<p>Now <strong>strong text</strong> is italic and gray.</p>
```

{{index "rule (CSS)", "font-weight (CSS)", overlay}}

The _((cascading))_ in the name refers to the fact that multiple such rules are combined to produce the final style for an element. In the example, the default styling for `<strong>` tags, which gives them `font-weight: bold`, is overlaid by the rule in the `<style>` tag.
{{hint
`Math.cos` and `Math.sin` measure angles in radians, where a full circle is 2π. For a given angle, you can get the opposite angle by adding half of this, which is `Math.PI`. This can be useful for putting the hat on the opposite side of the orbit.
hint}}
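
The angle arithmetic in the hint can be tried out independently of the DOM. The following is a minimal sketch, not the exercise's solution. The helper name `orbitPosition`, its parameters (a center point `cx`, `cy` and a `radius`), and the sample numbers are hypothetical choices made for illustration. The code computes a point on a circle from an angle in radians, then adds `Math.PI` to find the point directly opposite it.

```javascript
// Hypothetical helper (illustration only): the point at `angle`
// radians on a circle with center (cx, cy) and the given radius.
function orbitPosition(cx, cy, radius, angle) {
  // Math.cos and Math.sin expect the angle in radians.
  return {
    x: cx + Math.cos(angle) * radius,
    y: cy + Math.sin(angle) * radius
  };
}

let angle = 0.75;
let point = orbitPosition(200, 150, 100, angle);
// A full circle is 2 * Math.PI, so adding Math.PI (half a circle)
// lands on the opposite side of the center.
let opposite = orbitPosition(200, 150, 100, angle + Math.PI);

// Up to rounding, the center lies halfway between the two points.
console.log((point.x + opposite.x) / 2, (point.y + opposite.y) / 2);
```

In page coordinates the y axis points down, so increasing the angle moves the point clockwise on the screen. To animate an orbit, you would increase `angle` a little on each frame and recompute both positions.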