Model Detail
This code snippet sets up a neural network model using Keras, a high-level neural network API that runs on top of TensorFlow and some other ML platforms.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer='l2'))
model.add(Dense(len(np.unique(y)), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
(This is an excerpt; for a breakdown of the full Python environment, see: Machine Learning Pipeline.)
Line by line discussion
Building the model
model = Sequential()
This line initializes a new sequential model. In Keras, a Sequential model is a linear stack of layers; it's called sequential because it lets you build the model one layer after another.
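As an aside, Keras also accepts the whole layer stack up front as a list passed to the Sequential constructor. A minimal sketch of that equivalent style (the layer sizes here are illustrative, not the ones used in this project):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Equivalent construction: pass the layers as a list instead of
# calling model.add() repeatedly.
model = Sequential([
    Dense(8, activation='relu'),
    Dense(3, activation='softmax'),
])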
model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
model.add()
: This method adds a layer to the model.

Dense(128, ...)
: This adds a densely-connected (also known as fully connected) neural network layer. The 128 indicates that this layer will have 128 neurons.

input_dim=len(sensor_columns)
: This is the size of the input layer (the number of input neurons), which must match the number of features in the dataset. In our case, it is 12, because there are 12 capacitive touch channels on the touch sensing board.

activation='relu'
: This is the activation function for the layer. relu stands for rectified linear unit, an activation commonly used in neural networks. It is defined as $f(x)=\max(0,x)$ (see the short sketch after this list).

kernel_regularizer='l2'
: This parameter applies L2 regularization, which helps prevent the model from overfitting by penalizing large weights. L2 regularization adds a penalty equal to the square of the magnitude of the coefficients.
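To make relu and the L2 penalty concrete, here is a small NumPy sketch (the numbers are made up for illustration):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(np.maximum(0, x))  # relu output: [0.  0.  0.  1.5 3. ]

# L2 penalty on a weight vector: factor * sum(w^2).
# Keras's 'l2' string shorthand uses a default factor of 0.01.
w = np.array([0.5, -1.0, 2.0])
print(0.01 * np.sum(w ** 2))  # 0.0525, added to the loss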
model.add(Dropout(0.3))
Dropout(0.3)
: This layer randomly sets a fraction (0.3) of the input units to 0 at each update during training, which helps prevent overfitting. The parameter here, 0.3, means 30% of the neurons are dropped per training update.
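To see this behavior directly, a Dropout layer can be called on some data with the training flag set (the input below is made up for illustration):

import tensorflow as tf

layer = tf.keras.layers.Dropout(0.3)
x = tf.ones((1, 10))
# During training, roughly 30% of the values are zeroed and the rest
# are scaled by 1/(1 - 0.3) so the expected sum stays the same.
print(layer(x, training=True))
# During inference the layer is a no-op and the input passes through.
print(layer(x, training=False))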
model.add(Dense(len(np.unique(y)), activation='softmax'))
Dense(len(np.unique(y)), ...)
: This is the output layer of the network. The number of neurons is equal to the number of unique classes in y (where y is the array of target labels). This layer outputs the probability of the input belonging to each class.

activation='softmax'
: The softmax activation function, used here on the output layer, converts the output into a probability distribution over the target classes. The class with the highest probability will be the model's prediction (a formula and a small example follow this list).
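For reference, softmax maps a vector of raw scores $z$ to probabilities via $\sigma(z)_i = e^{z_i} / \sum_j e^{z_j}$. A small NumPy sketch with made-up scores for three classes:

import numpy as np

z = np.array([2.0, 1.0, 0.1])          # raw output scores
probs = np.exp(z) / np.sum(np.exp(z))  # softmax
print(probs)             # ~[0.659 0.242 0.099], sums to 1
print(np.argmax(probs))  # 0 -- the predicted class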
Compiling
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.compile()
: This method configures the model for training.

loss='sparse_categorical_crossentropy'
: This is the loss function used when the target classes are integers. It is a variant of categorical crossentropy that takes integer labels directly instead of one-hot encoded vectors, which is more efficient (see the sketch after this list).

optimizer='adam'
: Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data.

metrics=['accuracy']
: This tells the model to evaluate its performance in terms of accuracy during training and testing. Accuracy is the fraction of predictions the model gets right.
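To illustrate the integer-label point above, here is how the same three made-up labels look in the sparse (integer) encoding versus the one-hot encoding that plain categorical crossentropy would require:

import numpy as np

# Integer labels, as expected by sparse_categorical_crossentropy:
y_sparse = np.array([0, 2, 1])

# The same labels one-hot encoded, as categorical_crossentropy
# would require:
y_onehot = np.array([[1, 0, 0],
                     [0, 0, 1],
                     [0, 1, 0]])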
Summary
Once all of this
model = Sequential()
model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer='l2'))
model.add(Dense(len(np.unique(y)), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
is put together, we have (in more easily readable terms): a neural network with three dense layers, dropout for regularization, and L2 regularization applied to the weights. It is compiled for a classification task, with one output node per unique label in y (the array of target labels). It uses the relu activation function for the hidden layers and softmax for the output layer. It is trained using the adam optimizer and sparse categorical crossentropy loss, with accuracy as the metric used to evaluate it.
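As a usage sketch, training the compiled model might look like the following (hypothetical: X and y stand for the prepared sensor-feature matrix and integer labels from the Machine Learning Pipeline page, and the epoch count, batch size, and validation split are placeholder values):

# Accuracy is reported each epoch because of metrics=['accuracy'].
history = model.fit(X, y, epochs=50, batch_size=32, validation_split=0.2)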