Basics
Here we will cover the basics of TensorBuilder. To do this we will solve one of the simplest classical examples in the history of neural networks: the XOR.
We will assume that you have already installed TensorBuilder; if not, click here. Remember that you must have a working installation of TensorFlow.
Setup
First we will set up our imports; you'll need to have numpy installed.
import numpy as np
import tensorflow as tf
from tensorbuilder import tb
As you can see, tb is not an alias for the tensorbuilder module; it's actually an object that we import from this library. There are several reasons behind exposing the API as an object: one is that implementing it this way reduced a lot of code internally, but it also plays better with the DSL, as you will see later.
Note: tb is of type Applicative and all of its methods are immutable, so don't worry about "breaking" it.
Next we are going to create our data and placeholders:
#TRUTH TABLE (DATA)
X = [[0.0,0.0]]; Y = [[0.0]]
X.append([1.0,0.0]); Y.append([1.0])
X.append([0.0,1.0]); Y.append([1.0])
X.append([1.0,1.0]); Y.append([0.0])
X = np.array(X)
Y = np.array(Y)
x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.float32, shape=[None, 1])
Building Networks
Now we need to construct the smallest neural network that can solve the XOR; its architecture is going to be [2 inputs, 2 sigmoid, 1 sigmoid]. To do that we will first calculate the logit of the last layer, and then use it to calculate 2 things:
- The activation function (sometimes denoted h) by applying the sigmoid function.
- The network's trainer by creating a loss function and feeding it to a training algorithm.
Here is the code:
logit = (
    tb
    .build(x)
    .sigmoid_layer(2)
    .linear_layer(1)
)
activation = (
    logit
    .sigmoid()
    .tensor()
)
trainer = (
    logit
    .sigmoid_cross_entropy_with_logits(y)  # loss
    .map(tf.train.AdamOptimizer(0.01).minimize)
    .tensor()
)
As you can see, TensorBuilder's API is fluent, meaning that you can keep chaining methods to build up the computation.
The Builder class
The first thing we should talk about when reviewing this code is the Builder class. When we executed tb.build(x) we created a Builder that holds our input Tensor x. Having our Builder, we proceeded to use the methods
.sigmoid_layer(2)
.linear_layer(1)
If the acronym "What You Read is Mostly What You Get" (WYRMWYG) were a thing, this code would be it. It's telling you that the input is connected to a layer of 2 sigmoid units, and that this is then connected to a layer of 1 linear unit. You might be wondering: where do these methods come from? And what kinds of methods are there?
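Before answering that, it may help to see roughly what this chain expands to in plain TensorFlow. The snippet below is our own hand-written equivalent; it uses tf.contrib.layers.fully_connected (discussed in the next section), and the variable names hidden and logit_tensor are just ours:
# rough plain-TensorFlow equivalent of .sigmoid_layer(2).linear_layer(1)
hidden = tf.contrib.layers.fully_connected(x, 2, activation_fn=tf.nn.sigmoid)    # 2 sigmoid units
logit_tensor = tf.contrib.layers.fully_connected(hidden, 1, activation_fn=None)  # 1 linear unit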
Method Families
TensorBuilder decided to become a library that doesn't implement the core methods that actually deal with Tensors. Instead it has some class methods to register instance methods, and during import we actually include a bunch of functions from other libraries (yeah, we are basically just stealing from other libraries for the greater good). Currently most of these methods come from the tensorflow library, but there are also some from tflearn. The current practice is the following:
- The function tf.contrib.layers.fully_connected is a very special function that is registered as a method of this class. Its importance is due to the fact that the most fundamental operations in the creation of neural networks involve creating/connecting layers.
- If f is a function in tf or tf.nn, it will most likely be registered as a method of the Builder class. The process that registers these functions lifts them from being functions that accept a Tensor (plus some extra arguments) to functions that accept a Builder (plus some extra arguments); see the sketch after this list. Because of this, not all methods will work as expected. An obvious example is tf.placeholder: this function is automatically included, but it doesn't take a Tensor as its first parameter, so it doesn't make sense as a method of this class. Right now the policy for which of these functions are included/excluded is a blacklist approach, so only functions that are known to cause serious problems (like having the same name as basic methods) are excluded, and all the functions you are likely to use are included.
- Based on points 1 and 2, the next set of methods is defined as: if f is a function in tf or tf.nn with name fname, then the method fname_layer exists in this class. These methods use fully_connected and f to create a layer with f as its activation function. While you don't REALLY need them, .softmax_layer(5) reads much better than .fully_connected(5, activation_fn=tf.nn.softmax).
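To make the "lifting" in the second point concrete, here is a minimal sketch of the idea. This is not TensorBuilder's actual registration code: the lift helper is hypothetical, and we assume tb.build accepts any Tensor (the sketch only relies on the documented tb.build and .tensor methods).
def lift(fn):
    def method(builder, *args, **kwargs):
        # apply fn to the Tensor held by the Builder, plus any extra arguments
        new_tensor = fn(builder.tensor(), *args, **kwargs)
        # wrap the result in a new Builder so the chain can continue
        return tb.build(new_tensor)
    return method

# e.g. a lifted tf.nn.relu would behave like a hypothetical .relu() method
relu = lift(tf.nn.relu)
hidden_builder = relu(tb.build(x).sigmoid_layer(2))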
Using the methods
So we used the methods .sigmoid_layer(2) and .linear_layer(1) to create our logit. Now, to create the activation function (or rather, Tensor) of our network, we did the following:
activation = (
    logit
    .sigmoid()
    .tensor()
)
This was basically just applying tf.sigmoid over the logit. The method .tensor allows us to actually get back the Tensor inside the Builder.
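In plain TensorFlow this amounts to something like the following, where activation_manual is just our own name for comparison:
# applies tf.sigmoid to the Tensor held by the logit Builder
activation_manual = tf.sigmoid(logit.tensor())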
The map method
Finally, we created our network trainer by doing the following:
trainer = (
    logit
    .sigmoid_cross_entropy_with_logits(y)  # loss
    .map(tf.train.AdamOptimizer(0.01).minimize)
    .tensor()
)
Initially we just indirectly applied the function tf.nn.sigmoid_cross_entropy_with_logits over the logit and the target's placeholder y to get our loss Tensor. But then we used a custom method from the Builder class: map.
map takes any function that accepts a Tensor as its first parameter (plus some extra arguments), applies that function to the Tensor inside our Builder (plus the extra arguments), and returns a Builder with the new Tensor. In this case our function was the bound method minimize of the AdamOptimizer instance (created in-line), which expects a loss Tensor and returns the operation that performs the computation that trains our network.
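Written by hand, that map call amounts to roughly the following, where loss is just our own name for the Tensor produced by sigmoid_cross_entropy_with_logits:
# what the .map(...) step above does for us
loss = logit.sigmoid_cross_entropy_with_logits(y).tensor()
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)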
The thing is, given that we have map, we actually don't REALLY need most of the other methods! We could, for example, have written the initial structure of our network like this:
logit = (
    tb
    .build(x)
    .map(tf.contrib.layers.fully_connected, 2, activation_fn=tf.nn.sigmoid)
    .map(tf.contrib.layers.fully_connected, 1, activation_fn=None)
)
instead of
logit = (
    tb
    .build(x)
    .sigmoid_layer(2)
    .linear_layer(1)
)
but as you can see, the latter is more compact and readable. The important thing is to understand that you can use map to naturally incorporate functions that are not registered in the Builder class into the computation.
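For example, suppose you wanted to splice dropout between the two layers. Whether or not a dropout method happens to be registered, map lets you pass tf.nn.dropout in directly; this variant of the network and the keep probability of 0.5 are just illustrative:
logit_with_dropout = (
    tb
    .build(x)
    .sigmoid_layer(2)
    .map(tf.nn.dropout, 0.5)  # tf.nn.dropout(tensor, 0.5): keep units with probability 0.5
    .linear_layer(1)
)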
Training
Finally, given that we have constructed the trainer and activation Tensors, let's use regular TensorFlow operations to train the network. We will train for 2000 epochs using full-batch training (given that we only have 4 training examples) and then print out the prediction for each case of the XOR using the activation Tensor.
# create session
sess = tf.Session()
sess.run(tf.initialize_all_variables())
# train
for i in range(2000):
    sess.run(trainer, feed_dict={x: X, y: Y})
# test
for i in range(len(X)):
    print("{0} ==> {1}".format(X[i], sess.run(activation, feed_dict={x: X[i:i+1, :]})))
Congratulations! You have just solved the XOR problem using TensorBuilder. Not much of a feat for a serious Machine Learning Engineer, but you now have the basic knowledge of the TensorBuilder API.
What's Next?
In the next chapters you will learn how to create branched neural networks (important in many architectures), use scoping mechanisms to specify some attributes of the Tensors we build, and explore the Domain Specific Language (DSL), using all the previous knowledge to enable you to code even faster.