**Feed-Forward Networks**

All of the neural networks we'll look at in this book share some common features. The one we'll look at here is how data moves through the network from the input to the output.

Neural Network Graphs
===

Most drawings of neural nets look something like Figure [fig-net-drawing-types]. We draw a [graph](graph.md.html) composed of [nodes](node.md.html) (also called *vertices* or *elements*) and [edges](edge.md.html) (also called *arcs* or simply *lines*). Each edge has an arrow that indicates a [direction](directed-edge.md.html). Data always flows on each edge in the direction of its arrow.

![Figure [fig-net-drawing-types]: Neural network graphs. Data usually flows left to right, or bottom to top](../Images/600x200.png)

The general idea is that we start things off by putting data at the input node or nodes, and then it flows generally to the right (or up) until it reaches the output node or nodes.

Sometimes you'll see people refer to one node as being "above" another node, even if the graph is drawn left-to-right. The nodes that are "above", "to the right of", or "descendants" of some node are those that are closer to the output than that node. In the same way, nodes that are "below", "to the left of", or "ancestors" of another node are closer to the input than that node. It can be confusing to read that one node is "above" another when in the drawing it's directly to its right, but remember that a graph with data flowing rightward can be rotated so that the data is flowing upward.

Usually people leave off the arrows on the edges. The implication is that data flows from the inputs to the outputs, and never in the other direction. One consequence of this is that there are no *loops*. It is never the case that data coming out of a node can make its way back in, no matter how circuitous a path it follows.
The formal name for this kind of graph, where the edges have arrows and there are no loops, is a [directed acyclic graph](DAG.md.html), or DAG. Just to keep things confusing, sometimes people will draw a graph with the data flowing down, but that's unusual. It's possible that there are graphs of neural networks where the data flows right to left, but spotting one of those would be like finding a very rare bird.

The Flow of Data
===

If you're used to the graphs of the last section, you may take for granted just how many conventions you're implicitly applying when you interpret what a graph means. For those who are new to these graphs, or who don't immediately see how they correspond to an algorithm, we'll now see explicitly how to understand what these diagrams are telling us.

First off, though we often use the word "flow" in various forms when referring to how data moves through the graph, don't think of water going through pipes. That is a *continuous* process: there are new molecules of water flowing through the pipes at every moment. The graphs we work with (and the neural networks they represent) are *discrete*: information arrives one chunk at a time, like text messages. Also like text messages, these packets of information have *time stamps*: we know when they were sent.

Let's suppose that we have a group of people who are responsible for maintaining the health of a national forest. The people are arranged in a hierarchy, with rangers at the bottom and a series of supervisors over them, ending with a forest administrator at the top. Each ranger is responsible for a circular area of the forest. To make sure that everything is inspected, their areas of responsibility overlap, as shown in Figure [fig-forest-geometry].
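The no-loops rule has a nice property: a program can check it mechanically. Here's a minimal sketch of a feed-forward graph stored as a dictionary of arrows, with a standard depth-first search that reports whether following the arrows can ever lead back to a node. The node names (`input`, `hidden1`, and so on) are invented for illustration.

```python
# A tiny feed-forward graph: each node maps to the nodes its arrows point at.
edges = {
    "input":   ["hidden1", "hidden2"],
    "hidden1": ["output"],
    "hidden2": ["output"],
    "output":  [],
}

def has_cycle(edges):
    """Return True if following the arrows can ever lead back to a node."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited, in progress, finished
    color = {node: WHITE for node in edges}

    def visit(node):
        color[node] = GRAY
        for nxt in edges[node]:
            if color[nxt] == GRAY:        # we found a way back in: a loop
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(visit(n) for n in edges if color[n] == WHITE)

print(has_cycle(edges))    # False: this graph is a DAG
```

Add a single backwards edge, say from `output` to `input`, and `has_cycle` returns `True`: the graph is no longer a DAG, and the feed-forward story below no longer applies.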
![Figure [fig-forest-geometry]: The overlapping regions assigned to the forest rangers](../Images/600x200.png)

Each of the supervisors in the system also has a circular region to manage, so their domains also overlap as in Figure [fig-forest-geometry]. At the top, the forest administrator's circle encompasses all those of her subordinates. Figure [fig-tree-net] shows the graph for this pretend workforce. I've drawn this in the left-to-right form.

![Figure [fig-tree-net]: A network graph for our forest inspection team](../Images/600x200.png)

This graph is like a typical organizational chart, where the nodes represent people and the lines show their relationships. Only here the lines don't show who manages whom. Instead, they represent the flow of reports. At the far left, the rangers create reports on what they see. Those reports flow to the right, reaching one or more supervisors. Each supervisor waits until all the incoming reports have arrived, and then produces a summary report. That report flows to the next person up the hierarchy, and so on.

There are two essential things that make this work that aren't explicitly present in the graph. The first is that every report is time-stamped. No supervisor can prepare a summary report until current versions of all of the incoming reports have arrived. There are many ways that could be implemented. For specificity, here's one. Each supervisor maintains a list of all the reports that she is expecting. When the first full set of reports has arrived, the supervisor writes down the current time; let's call this the *moment of fulfillment*. At that moment, the supervisor writes her summary report. Now the supervisor waits. When she has received a report from each subordinate that has a time stamp later than the moment of fulfillment, she updates the moment of fulfillment to that moment, and prepares a new report.
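The moment-of-fulfillment rule can be sketched in a few lines of code. This is only a toy illustration of the procedure just described, with invented names (`Supervisor`, `receive`): the supervisor remembers the latest time-stamped report from each subordinate, produces a summary when she first holds a full set, and afterwards only when every report is newer than the last moment of fulfillment.

```python
class Supervisor:
    def __init__(self, expected):
        self.expected = set(expected)   # the subordinates we expect reports from
        self.reports = {}               # latest (time_stamp, text) from each sender
        self.fulfillment = None         # the current moment of fulfillment

    def receive(self, sender, time_stamp, text):
        """Record one report; return a summary if one is due, else None."""
        self.reports[sender] = (time_stamp, text)

        # Wait until we hold a report from every expected sender.
        if set(self.reports) != self.expected:
            return None

        times = [t for t, _ in self.reports.values()]
        # After the first summary, also wait until every report is
        # newer than the last moment of fulfillment.
        if self.fulfillment is not None and min(times) <= self.fulfillment:
            return None

        self.fulfillment = max(times)
        texts = [text for _, text in self.reports.values()]
        return (self.fulfillment, "summary of: " + "; ".join(texts))
```

For example, a supervisor expecting two rangers returns `None` after the first report arrives, produces a time-stamped summary once both have reported, and then stays quiet until both rangers have sent fresher reports.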
That report from the supervisor is itself time-stamped, and the next person up the line applies the same procedure.

The other thing that's not explicit in the diagram is that the reports are *discrete*. That is, they're not changing all the time, like the temperature. Instead, time-stamped reports appear at the output of each node, and they basically sit there, unchanging, until they're replaced by a newer report.

The Graph In Practice
---

Attaching a time stamp to every report was useful for the discussion above, but we often don't need it in a computer implementation. Traditionally, we enforce this rule simply by how we structure the code.

Strictly as a thought device, you can imagine a giant clock ticking in the background. On each tick, new information is fed into the input nodes, and new outputs are computed. We choose the clock speed so that we're sure that every node will finish its work and produce a new value within a single clock tick. Then on the next tick, the next nodes up the graph take the outputs from all the nodes that feed into them, and produce new values. In essence, on every tick, information moves one step through the graph, until it reaches the output.

This scheme would completely fall apart if the graph allowed any node to contribute to an earlier one, so that's one of the reasons many neural network architectures specifically disallow that kind of connection.

You might object that this is all backwards: surely the programs should be written to do whatever is best for training our neural networks, rather than placing limits on the networks because they'd cause inefficient programs. In practice, this kind of give-and-take between theory and practice happens all the time, in all fields. Theorists dream up ideas, practitioners make those ideas real, and both groups adjust and modify what they're doing to make the process and results as useful as possible.
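In code, we usually don't simulate the clock at all. Because the graph has no loops, we can sort the nodes so that every node comes after all the nodes that feed it, then compute each value exactly once, in that order. Here's a minimal sketch using Python's standard-library `graphlib` module; the three-node graph and its per-node functions are invented for illustration.

```python
from graphlib import TopologicalSorter   # standard library, Python 3.9+

# inputs[node] lists the nodes whose outputs this node consumes.
inputs = {
    "x": [],            # an input node: its value is fed in from outside
    "h": ["x"],
    "y": ["h", "x"],
}

# Each node's work: combine its incoming values into one output.
compute = {
    "x": lambda: 2.0,
    "h": lambda x: x + 1.0,
    "y": lambda h, x: h * x,
}

# static_order() yields every node after all of the nodes feeding it,
# so by the time we reach a node, its inputs are already computed.
values = {}
for node in TopologicalSorter(inputs).static_order():
    args = [values[src] for src in inputs[node]]
    values[node] = compute[node](*args)

print(values["y"])   # 6.0: (2.0 + 1.0) * 2.0
```

This is the "structure the code" trick mentioned above: the ordering itself guarantees that no node runs before its inputs are ready, with no time stamps and no clock. If the graph had a loop, `static_order()` would raise an exception instead, which is exactly the failure the no-loops rule prevents.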
Although I've presented the above picture as a "typical" approach, there are of course neural network architectures that do allow such loops, such as the [recurrent neural networks](RNN.md.html) we'll examine later. They still follow the general idea from above, but they use a more complicated scheme for handling information flow and working out which values should be used when.

The basic scheme described above changes a bit when we run our algorithms on [GPUs](GPU.md.html), which let us process a large number of nodes simultaneously. Getting that timing right can be tricky, so that part of a machine-learning library is written by someone who is a specialist in managing the proper sequencing of many simultaneous (or [parallel](parallel.md.html)) operations.