{{meta {load_files: ["code/jacques_journal.js", "code/chapter/04_data.js"], zip: "node/html"}}}

# Data Structures: Objects and Arrays

{{quote {author: "Charles Babbage", title: "Passages from the Life of a Philosopher (1864)", chapter: true}

On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ [...] I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

quote}}

{{index "Babbage, Charles", object, "data structure"}}

Numbers, Booleans, and strings are the atoms that ((data)) structures are built from. Many types of information require more than one atom, though. _Objects_ allow us to group values (including other objects) together to build more complex structures.

The programs we have built so far have been limited by the fact that they were operating only on simple data types. This chapter will introduce basic data structures. By the end of it, you'll know enough to start writing useful programs.

The chapter will work through a more or less realistic programming example, introducing concepts as they apply to the problem at hand. The example code will often build on functions and bindings that were introduced earlier in the text.

{{if book

The online coding ((sandbox)) for the book ([_eloquentjavascript.net/code_](http://eloquentjavascript.net/code)) provides a way to run code in the context of a specific chapter. If you decide to work through the examples in another environment, be sure to first download the full code for this chapter from the sandbox page.

if}}

## The weresquirrel

{{index "weresquirrel example", lycanthropy}}

Every now and then, usually between eight and ten in the evening, ((Jacques)) finds himself transforming into a small furry rodent with a bushy tail.

On one hand, Jacques is quite glad that he doesn't have classic lycanthropy. Turning into a squirrel does cause fewer problems than turning into a wolf.
Instead of having to worry about accidentally eating the neighbor (_that_ would be awkward), he worries about being eaten by the neighbor's cat. After two occasions where he woke up on a precariously thin branch in the crown of an oak, naked and disoriented, he has taken to locking the doors and windows of his room at night and putting a few walnuts on the floor to keep himself busy.

{{figure {url: "img/weresquirrel.png", alt: "The weresquirrel"}}}

That takes care of the cat and tree problems. But Jacques would prefer to get rid of his condition entirely. The irregular occurrences of the transformation make him suspect that they might be triggered by something. For a while, he believed that it happened only on days when he had been near oak trees. But avoiding oak trees did not stop the problem.

{{index journal}}

Switching to a more scientific approach, Jacques has started keeping a daily log of everything he does on a given day and whether he changed form. With this data he hopes to narrow down the conditions that trigger the transformations.

The first thing he needs is a data structure to store this information.

## Data sets

{{index "data structure"}}

To work with a chunk of digital data, we'll first have to find a way to represent it in our machine's ((memory)). Say, as an example, that we want to represent a ((collection)) of numbers: 2, 3, 5, 7, and 11.

{{index string}}

We could get creative with strings (after all, strings can have any length, so we can put a lot of data into them) and use `"2 3 5 7 11"` as our representation. But this is awkward. You'd have to somehow extract the digits and convert them back to numbers to access them.

{{index [array, creation], "[] (array)"}}

Fortunately, JavaScript provides a data type specifically for storing sequences of values. It is called an _array_ and is written as a list of values between ((square brackets)), separated by commas.
```
let listOfNumbers = [2, 3, 5, 7, 11];
console.log(listOfNumbers[2]);
// → 5
console.log(listOfNumbers[0]);
// → 2
console.log(listOfNumbers[2 - 1]);
// → 3
```

{{index "[] (subscript)", [array, indexing]}}

The notation for getting at the elements inside an array also uses ((square brackets)). A pair of square brackets immediately after an expression, with another expression inside of them, will look up the element in the left-hand expression that corresponds to the _((index))_ given by the expression in the brackets.

{{id array_indexing}}

{{index "zero-based counting"}}

The first index of an array is zero, not one. So the first element is read with `listOfNumbers[0]`. This convention takes some getting used to. Zero-based counting has a long tradition in technology, and in certain ways makes a lot of sense. Think of the index as the number of items to skip, counting from the start of the array.

{{id properties}}

## Properties

{{index "Math object", "Math.max function", ["length property", "for string"], [object, property], "period character"}}

We've seen a few suspicious-looking expressions like `myString.length` (to get the length of a string) and `Math.max` (the maximum function) in past examples. These are expressions that access a _((property))_ of some value. In the first case, we access the `length` property of the value in `myString`. In the second, we access the property named `max` in the `Math` object (which is a collection of mathematics-related values and functions).

{{index property, null, undefined}}

Almost all JavaScript values have properties. The exceptions are `null` and `undefined`. If you try to access a property on one of these nonvalues, you get an error.
```{test: no}
null.length;
// → TypeError: Cannot read property 'length' of null
```

{{indexsee "dot character", "period character"}}
{{index "[] (subscript)", "period character", "square brackets", "computed property"}}

The two main ways to access properties in JavaScript are with a dot and with square brackets. Both `value.x` and `value[x]` access a ((property)) on `value`, but not necessarily the same property. The difference is in how `x` is interpreted. When using a dot, the word after the dot is the literal name of the property. When using square brackets, the expression between the brackets is _evaluated_ to get the property name. Whereas `value.x` fetches the property of `value` named “x”, `value[x]` evaluates the expression `x` and uses the result as the property name.

So if you know that the property you are interested in is called “length”, you say `value.length`. If you want to extract the property named by the value held in the binding `i`, you say `value[i]`. Property names can be any string, but the dot notation works only with names that look like valid binding names. So if you want to access a property named “2” or “John Doe”, you must use square brackets: `value[2]` or `value["John Doe"]`.

The elements in an ((array)) are stored as the array's properties, using numbers as property names. Because you can't use the dot notation with numbers, and usually want to use a binding that holds the index anyway, you have to use the bracket notation to get at them.

{{index ["length property", "for array"], [array, "length of"]}}

The `length` property of an array tells us how many elements it has. This property name is a valid binding name, and we know its name in advance, so to find the length of an array, you typically write `array.length` because it is easier to write than `array["length"]`.
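As a small sketch of the difference between the two notations, the following compares dot and bracket access on an array. The bindings `words`, `property`, and `i` are made up for this illustration and are not part of the chapter's code.

```javascript
// Illustrative bindings only; none of these names come from the chapter.
let words = ["oak", "walnut", "pizza"];
let property = "length";
let i = 1;

// Dot notation: the word after the dot is the literal property name.
console.log(words.length);
// → 3
// Bracket notation: the expression is evaluated to get the property name.
console.log(words[property]);
// → 3
// There is no property literally named "property" on the array.
console.log(words.property);
// → undefined
// Numeric property names require brackets.
console.log(words[i]);
// → walnut
console.log(words[words.length - 1]);
// → pizza
```

Note that `words.property` and `words[property]` look similar but ask for different things: the first looks up a property called “property”, the second looks up the property named by the string held in the binding.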
{{id methods}}

## Methods

{{index [function, "as property"], method, string}}

Both string and array values contain, in addition to the `length` property, a number of properties that hold function values.

```
let doh = "Doh";
console.log(typeof doh.toUpperCase);
// → function
console.log(doh.toUpperCase());
// → DOH
```

{{index "case conversion", "toUpperCase method", "toLowerCase method"}}

Every string has a `toUpperCase` property. When called, it will return a copy of the string in which all letters have been converted to uppercase. There is also `toLowerCase`, going the other way.

{{index this}}

Interestingly, even though the call to `toUpperCase` does not pass any arguments, the function somehow has access to the string `"Doh"`, the value whose property we called. How this works is described in [Chapter ?](object#obj_methods).

Properties that contain functions are generally called _methods_ of the value they belong to. As in, “_toUpperCase_ is a method of a string”.

{{id array_methods}}

This example demonstrates two methods you can use to manipulate arrays:

```
let sequence = [1, 2, 3];
sequence.push(4);
sequence.push(5);
console.log(sequence);
// → [1, 2, 3, 4, 5]
console.log(sequence.pop());
// → 5
console.log(sequence);
// → [1, 2, 3, 4]
```

{{index collection, array, "push method", "pop method"}}

The `push` method adds values to the end of an array, and the `pop` method does the opposite, removing the last value in the array and returning it.

These somewhat silly names are the traditional terms for operations on a _((stack))_. A stack, in programming, is a ((data structure)) that allows you to push values into it and pop them out again in the opposite order: the thing that was added last is removed first. Stacks are common in programming. You might remember the function ((call stack)) from [the previous chapter](functions#stack), which is an instance of the same idea.
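To make the last-in, first-out behavior concrete, here is a minimal sketch that pushes three values and then pops until the array is empty. The bindings `steps` and `undone` and the string values are invented for this illustration.

```javascript
// Illustrative values only; `steps` and `undone` are hypothetical names.
let steps = [];
steps.push("locked the door");
steps.push("ate a walnut");
steps.push("climbed the bookcase");

// Popping until the stack is empty yields the values in reverse order.
let undone = [];
while (steps.length > 0) {
  undone.push(steps.pop());
}

console.log(undone);
// → ["climbed the bookcase", "ate a walnut", "locked the door"]
console.log(steps.length);
// → 0
```

The value pushed last comes out first, which is exactly the order in which a call stack returns from nested function calls.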
## Objects

{{index journal, "weresquirrel example", array, record}}

Back to the weresquirrel. A set of daily log entries can be represented as an array. But the entries do not consist of just a number or a string. Each entry needs to store a list of activities and a Boolean value that indicates whether Jacques turned into a squirrel or not. Ideally, we would like to group these together into a single value and then put those grouped values into an array of log entries.

{{index syntax, property, "curly braces", "{} (object)"}}

Values of the type _((object))_ are arbitrary collections of properties. One way to create an object is by using curly brace notation.

```
let day1 = {
  squirrel: false,
  events: ["work", "touched tree", "pizza", "running"]
};
console.log(day1.squirrel);
// → false
console.log(day1.wolf);
// → undefined
day1.wolf = false;
console.log(day1.wolf);
// → false
```

{{index [quoting, "of object properties"], "colon character"}}

Inside the curly braces, we give a list of properties separated by commas. Each property has a name followed by a colon and a value. When an object is written over multiple lines, indenting it like in the example helps readability. Properties whose names are neither valid binding names nor valid numbers have to be quoted.

```
let descriptions = {
  work: "Went to work",
  "touched tree": "Touched a tree"
};
```

This means that ((curly braces)) have _two_ meanings in JavaScript. At the start of a ((statement)), they start a ((block)) of statements. In any other position, they describe an object. Fortunately, it is almost never useful to start a statement with a curly-brace object, so ambiguity between these two uses is rare.

{{index undefined}}

Reading a property that doesn't exist will produce the value `undefined`, which is what happens in the example the first time we read the `wolf` property.

{{index [property, assignment], mutability, "= operator"}}

It is possible to assign a value to a property expression with the `=` operator.
This will replace the property's value if it already existed or create a new property on the object if it didn't.

{{index "tentacle (analogy)", [property, "model of"]}}

To briefly return to our tentacle model of ((binding))s: property bindings are similar. They _grasp_ values, but other bindings and properties might be holding onto those same values. You may think of objects as octopuses with any number of tentacles, each of which has a name tattooed on it.

{{figure {url: "img/octopus-object.jpg", alt: "Artist's representation of an object"}}}

{{index "delete operator", [property, deletion]}}

The `delete` operator cuts off a tentacle from such an octopus. It is a unary operator that, when applied to a property access expression, will remove the named property from the object. This is not a common thing to do, but it is possible.

```
let anObject = {left: 1, right: 2};
console.log(anObject.left);
// → 1
delete anObject.left;
console.log(anObject.left);
// → undefined
console.log("left" in anObject);
// → false
console.log("right" in anObject);
// → true
```

{{index "in operator", [property, "testing for"], object}}

The binary `in` operator, when applied to a string and an object, tells you whether that object has a property with that name. The difference between setting a property to `undefined` and actually deleting it is that, in the first case, the object still _has_ the property (it just doesn't have a very interesting value), whereas in the second case the property is no longer present and `in` will return `false`.

{{index "Object.keys function"}}

To find out what properties an object has, you can use the `Object.keys` function. You give it an object, and it returns an array of strings, the object's property names.

```
console.log(Object.keys({x: 0, y: 0, z: 2}));
// → ["x", "y", "z"]
```

You can use `Object.assign` to copy the properties from one object into another.
```
let objectA = {a: 1, b: 2};
Object.assign(objectA, {b: 3, c: 4});
console.log(objectA);
// → {a: 1, b: 3, c: 4}
```

{{index array, collection}}

Arrays, then, are just a kind of object specialized for storing sequences of things. If you evaluate `typeof []`, it produces `"object"`. You can see them as long, flat octopuses with all their arms in a neat row, labeled with numbers.

{{figure {url: "img/octopus-array.jpg", alt: "Artist's representation of an array"}}}

{{index journal, "weresquirrel example"}}

So we can represent Jacques’ journal as an array of objects.

```{test: wrap}
let journal = [
  {events: ["work", "touched tree", "pizza",
            "running", "television"],
   squirrel: false},
  {events: ["work", "ice cream", "cauliflower",
            "lasagna", "touched tree", "brushed teeth"],
   squirrel: false},
  {events: ["weekend", "cycling", "break", "peanuts",
            "beer"],
   squirrel: true},
  /* and so on... */
];
```

## Mutability

We will get to actual programming _real_ soon now. But first, there's one more piece of theory to understand.

{{index mutability, "side effect", number, string, Boolean, object}}

We saw that object values can be modified. The types of values discussed in earlier chapters, such as numbers, strings, and Booleans, are all _immutable_: it is impossible to change an existing value of those types. You can combine them and derive new values from them, but when you take a specific string value, that value will always remain the same. The text inside it cannot be changed. If you have a reference to a string that contains `"cat"`, it is not possible for other code to change a character in your string to make it spell `"rat"`.

With objects, on the other hand, the content of a value _can_ be modified by changing its properties.

{{index [object, identity], identity, memory, mutability}}

When we have two numbers, 120 and 120, we can consider them precisely the same number, whether or not they refer to the same physical bits.
But with objects, there is a difference between having two references to the same object and having two different objects that contain the same properties. Consider the following code:

```
let object1 = {value: 10};
let object2 = object1;
let object3 = {value: 10};

console.log(object1 == object2);
// → true
console.log(object1 == object3);
// → false

object1.value = 15;
console.log(object2.value);
// → 15
console.log(object3.value);
// → 10
```

{{index "tentacle (analogy)", [binding, "model of"]}}

The `object1` and `object2` bindings grasp the _same_ object, which is why changing `object1` also changes the value of `object2`. The binding `object3` points to a different object, which initially contains the same properties as `object1` but lives a separate life.

{{index "== operator", [comparison, "of objects"], "deep comparison"}}

JavaScript's `==` operator, when comparing objects, compares by identity: it returns `true` only if both operands are precisely the same object. Comparing different objects will return `false`, even if they have identical contents. There is no “deep” comparison operation built into JavaScript that compares objects by their contents, but it is possible to write one yourself (which will be one of the [exercises](data#exercise_deep_compare) at the end of this chapter).

## The lycanthrope's log

{{index "weresquirrel example", lycanthropy, "addEntry function"}}

So Jacques starts up his JavaScript interpreter and sets up the environment he needs to keep his ((journal)).

```{includeCode: true}
let journal = [];

function addEntry(events, squirrel) {
  journal.push({events, squirrel});
}
```

{{index "curly braces", "{} (object)"}}

Note that the object added to the journal looks a little odd. Instead of declaring properties like `events: events`, it just gives a ((property)) name. This is a shorthand that means the same thing: if a property name in object notation isn't followed by a colon, its value is the value of the binding with the same name in the current scope.
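The following sketch puts the long form and the shorthand side by side. The bindings `longForm` and `shortForm` are invented for this illustration, and the event list is arbitrary sample data.

```javascript
// Illustrative bindings; only `events` and `squirrel` mirror names in the text.
let events = ["work", "pizza"];
let squirrel = false;

// Both notations create an object with the same two properties.
let longForm = {events: events, squirrel: squirrel};
let shortForm = {events, squirrel};

console.log(Object.keys(shortForm));
// → ["events", "squirrel"]
console.log(shortForm.squirrel);
// → false
// The property holds the very same array that the binding holds.
console.log(shortForm.events == events);
// → true
// Equal contents, but two separate objects.
console.log(longForm == shortForm);
// → false
```

The last two lines tie back to identity: the shorthand stores a reference to the existing array rather than a copy, while the two object literals produce distinct objects.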
So then, every evening at ten (or sometimes the next morning, after climbing down from the top shelf of his bookcase) he records the day.

```
addEntry(["work", "touched tree", "pizza", "running",
          "television"], false);
addEntry(["work", "ice cream", "cauliflower", "lasagna",
          "touched tree", "brushed teeth"], false);
addEntry(["weekend", "cycling", "break", "peanuts",
          "beer"], true);
```

Once he has enough data points, Jacques intends to use statistics to find out which of these events may be related to the squirrelifications.

{{index correlation}}

_Correlation_ is a measure of ((dependence)) between statistical variables. A statistical variable is not quite the same as a programming variable. In statistics you typically have a set of _measurements_, and each variable is measured for every measurement. Correlation between variables is usually expressed as a value that ranges from -1 to 1. Zero correlation means the variables are not related, whereas a correlation of one indicates that the two are perfectly related: if you know one, you also know the other. Negative one also means that the variables are perfectly related but that they are opposites: when one is true, the other is false.

{{index "phi coefficient"}}

To compute the measure of correlation between two Boolean variables, we can use the "phi coefficient" (_ϕ_). This is a formula whose input is a ((frequency table)) containing the number of times the different combinations of the variables were observed. The output of the formula is a number between -1 and 1 that describes the correlation.

We could take the event of eating ((pizza)) and put that in a frequency table like this, where each number indicates the number of times that combination occurred in our measurements:

{{figure {url: "img/pizza-squirrel.svg", alt: "Eating pizza versus turning into a squirrel", width: "7cm"}}}

If we call that table _n_, we can compute _ϕ_ using the following formula:

{{if html
$$\phi = \frac{n_{11}\, n_{00} - n_{10}\, n_{01}}{\sqrt{n_{1\bullet}\, n_{0\bullet}\, n_{\bullet 1}\, n_{\bullet 0}}}$$