Basics
Here we will cover the basics of TensorBuilder. To do so we will solve one of the simplest classical examples in the history of neural networks: the XOR problem.
We will assume that you have already installed TensorBuilder; if not, check the installation instructions. Remember that you must have a working installation of TensorFlow.
Setup
First we will set up our imports; you'll need to have numpy installed.
import numpy as np
import tensorflow as tf
from tensorbuilder import tb
As you can see, tb is not an alias for the tensorbuilder module; it's actually an object that we import from this library. There are several reasons behind exposing the API as an object: one is that implementing it this way reduced a lot of code internally, but it also plays better with the DSL, as you will see later.
Note:
tb is of type Applicative and all of its methods are immutable, so don't worry about "breaking" it.
Next we are going to create our data and placeholders
#TRUTH TABLE (DATA)
X = [[0.0,0.0]]; Y = [[0.0]]
X.append([1.0,0.0]); Y.append([1.0])
X.append([0.0,1.0]); Y.append([1.0])
X.append([1.0,1.0]); Y.append([0.0])
X = np.array(X)
Y = np.array(Y)
x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.float32, shape=[None, 1])
Building Networks
Now we need to construct the smallest neural network that can solve the XOR; its architecture is going to be [2 inputs, 2 sigmoid, 1 sigmoid]. To do that we will first compute the logit of the last layer, and then use it to compute 2 things:
- The activation function (sometimes denoted h), by applying the sigmoid function.
- The network's trainer, by creating a loss function and feeding it to a training algorithm.
Here is the code
logit = (
tb
.build(x)
.sigmoid_layer(2)
.linear_layer(1)
)
activation = (
logit
.sigmoid()
.tensor()
)
trainer = (
logit
.sigmoid_cross_entropy_with_logits(y) # loss
.map(tf.train.AdamOptimizer(0.01).minimize)
.tensor()
)
As you can see, TensorBuilder's API is fluent, meaning that you can keep chaining methods to build the computation.
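Notice that both activation and trainer chain off the same logit Builder; this is safe because, as the note in the Setup section says, Builder methods are immutable. Here is a minimal sketch of the same idea (the layer sizes are arbitrary and just for illustration):
base = tb.build(x).sigmoid_layer(2)   # a Builder we want to reuse
branch_a = base.linear_layer(1)       # returns a new Builder
branch_b = base.sigmoid_layer(3)      # base was not modified by the previous line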
The Builder class
The first thing we should talk about when reviewing this code is the Builder class. When we executed
tb
.build(x)
we created a Builder that holds our input Tensor x. Having our Builder, we proceeded to use the methods
.sigmoid_layer(2)
.linear_layer(1)
If the acronym "What You Read is Mostly What You Get (WYRMWYG)" were a thing, this code would be it. It's telling you that the input is connected to a layer of 2 sigmoid units, and this in turn is connected to a layer of 1 linear unit. You might be wondering where these methods come from, or what kinds of methods there are.
Method Families
TensorBuilder deliberately doesn't implement the core methods that actually deal with Tensors itself. Instead it has some class methods to register instance methods, and during import we actually include a bunch of functions from other libraries (yeah, we are basically just stealing other libraries for the greater good). Currently most of these methods come from the tensorflow library, but there are also some from tflearn. The current practice is the following
- The function tf.contrib.layers.fully_connected is a very special function that is registered as a method of this class. Its importance is due to the fact that the most fundamental operations in the creation of neural networks involve creating/connecting layers.
- If f is a function in tf or tf.nn, it will most likely be registered as a method of the Builder class. The process that registers these functions lifts them from being functions that accept a Tensor (plus some extra arguments) to functions that accept a Builder (plus some extra arguments); see the sketch after this list. Due to this, not all methods will work as expected. An obvious example is tf.placeholder: this function is automatically included but it doesn't take a Tensor as its first parameter, so it doesn't make sense as a method of this class. The current policy of which of these functions are included/excluded is a blacklist approach, so that only functions known to cause serious problems (like having the same name as basic methods) are excluded, and all the functions you are likely going to use are included.
- Based on points 1 and 2, the next set of functions is defined as follows: if f is a function in tf or tf.nn with name fname, then the method fname_layer exists in this class. These methods use fully_connected and f to create a layer with f as its activation function. While you don't REALLY need them, .softmax_layer(5) reads much better than .fully_connected(5, activation_fn=tf.nn.softmax).
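To make the lifting in point 2 concrete, here is a small sketch using tf.sigmoid (which we already used through the Builder above); both expressions should build equivalent graphs:
# plain TensorFlow: a function from Tensor to Tensor
a1 = tf.sigmoid(x)
# lifted version: the same function as a Builder method, from Builder to Builder
a2 = tb.build(x).sigmoid().tensor()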
Using the methods
So we used the methods .sigmoid_layer(2) and .linear_layer(1) to create our logit. Now, to create the activation function (or rather its Tensor) of our network, we did the following
activation = (
logit
.sigmoid()
.tensor()
)
This was basically just applying tf.sigmoid over the logit. The method .tensor allows us to actually get back the Tensor inside the Builder.
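As a quick sanity check (assuming the standard tf.Tensor class is exposed by your TensorFlow version), the object we get back is a plain Tensor that the rest of TensorFlow understands:
# activation is just a regular Tensor, usable anywhere TensorFlow expects one,
# e.g. inside sess.run as we do in the Training section below
assert isinstance(activation, tf.Tensor)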
The map method
Finally, we created our network trainer by doing the following
trainer = (
logit
.sigmoid_cross_entropy_with_logits(y) # loss
.map(tf.train.AdamOptimizer(0.01).minimize)
.tensor()
)
First we just indirectly applied the function tf.nn.sigmoid_cross_entropy_with_logits over the logit and the target placeholder y to get our loss Tensor. But then we used a custom method from the Builder class: map.
map takes any function that accepts a Tensor as its first parameter (and some extra arguments), applies that function to the Tensor inside our Builder (plus the extra arguments), and returns a Builder with the new Tensor. In this case our function was the bound method minimize of the AdamOptimizer instance (created in-line), which expects a loss Tensor and returns an operation that performs the computation that trains our network.
The thing is, given that we have map, we actually don't REALLY need most of the other methods! We could, e.g., have written the initial structure of our network like this
logit = (
tb
.build(x)
.map(tf.contrib.layers.fully_connected, 2, activation_fn=tf.nn.sigmoid)
.map(tf.contrib.layers.fully_connected, 1, activation_fn=None)
)
instead of
logit = (
tb
.build(x)
.sigmoid_layer(2)
.linear_layer(1)
)
but as you can see the latter is more compact and readable. The important thing is to understand that you can use map to naturally incorporate functions not registered in the Builder class into the computation.
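For example, supposing for a moment that dropout were not registered as a Builder method, you could still slot tf.nn.dropout (which takes a Tensor as its first argument) into the chain with map. This is just a sketch and the 0.5 keep probability is an arbitrary choice:
logit_with_dropout = (
    tb
    .build(x)
    .sigmoid_layer(2)
    .map(tf.nn.dropout, 0.5)  # applies tf.nn.dropout(tensor, 0.5) and wraps the result
    .linear_layer(1)
)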
Training
Finally, given that we have constructed the trainer and activation Tensors, let's use regular TensorFlow operations to train the network. We will train for 2000 epochs using full-batch training (given that we only have 4 training examples) and then print the prediction for each case of the XOR using the activation Tensor.
# create session
sess = tf.Session()
sess.run(tf.initialize_all_variables())
# train
for i in range(2000):
sess.run(trainer, feed_dict={x: X, y: Y})
# test
for i in range(len(X)):
print "{0} ==> {1}".format(X[i], sess.run(activation, feed_dict={x: X[i:i+1,:]}))
Congratulations! You have just solved the XOR problem using TensorBuilder. Not much of a feat for a serious Machine Learning Engineer, but you now have the basic knowledge of the TensorBuilder API.
What's Next?
In the next chapters you will learn how to create branched neural networks (important in many architectures), use scoping mechanisms to specify some attributes of the Tensors we build, and explore the Domain Specific Language (DSL), which uses all the previous knowledge to enable you to code even faster.