Model Detail

This code snippet sets up a neural network model using Keras, a high-level neural-network API that runs on top of TensorFlow (and some other ML backends).

# sensor_columns and y are defined earlier in the full pipeline (see link below)
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer='l2'))
model.add(Dense(len(np.unique(y)), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

(This is an excerpt; for a breakdown of the full Python environment, see: Machine Learning Pipeline)

Line-by-line discussion

Building the model

model = Sequential()

This line initializes a new sequential model. In Keras, a Sequential model is a linear stack of layers; it is called sequential because you build the model by adding one layer after another, in order.
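As an aside, a Sequential model can also be built by passing the layers as a list to the constructor; the add() style used in this project builds the same kind of stack one layer at a time. A tiny sketch with arbitrary layer sizes, just to show the constructor form:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

stack = Sequential([
    Dense(16, input_dim=4, activation='relu'),
    Dense(3, activation='softmax'),
])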

model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
  • model.add(): This method adds a layer to the model.
  • Dense(128, ...): This is adding a densely-connected (also known as fully connected) neural network layer. The 128 indicates that this layer will have 128 neurons.
  • input_dim=len(sensor_columns): This is the size of the input layer (the number of input neurons), which must be the same as the number of features in the dataset. In our case, it is 12, because there are 12 capacitive touch channels on the touch sensing board.
  • activation='relu': This is the activation function for the layer. relu stands for rectified linear unit, an activation commonly used in neural networks. It is defined as $f(x)=\max(0,x)$.
  • kernel_regularizer='l2': This parameter applies L2 regularization, which helps prevent the model from overfitting by penalizing large weights. L2 regularization adds a penalty proportional to the sum of the squared weights to the loss. (Both relu and this penalty are sketched just after this list.)
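To make these two ingredients concrete, here is a minimal NumPy sketch of what they compute. This is for illustration only, not how Keras implements them internally; the factor lam=0.01 below matches Keras's documented default for the 'l2' shorthand.

import numpy as np

def relu(x):
    # Element-wise f(x) = max(0, x): negatives become 0, positives pass through
    return np.maximum(0.0, x)

def l2_penalty(weights, lam=0.01):
    # Adds lam * sum(w^2) to the training loss, discouraging large weights
    return lam * np.sum(np.square(weights))

print(relu(np.array([-2.0, 0.5])))  # [0.  0.5]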
model.add(Dropout(0.3))
  • Dropout(0.3): This layer randomly sets a fraction of its input units to 0 at each update during training, which helps prevent overfitting. The parameter 0.3 means 30% of the neurons are dropped per training update; dropout is disabled at inference time. (A short demonstration follows below.)
model.add(Dense(64, activation='relu', kernel_regularizer='l2'))
  • Dense(64, ...): This is a second, smaller hidden layer with 64 neurons. Its input size is inferred from the previous layer, so only the output size needs to be specified; like the first layer, it uses relu activation and L2 regularization.
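Here is a quick demonstration of the Dropout behavior (a minimal sketch; the all-ones input is made up purely to make the effect visible):

import tensorflow as tf

layer = tf.keras.layers.Dropout(0.3)
x = tf.ones((1, 10))
# training=True: roughly 30% of units are zeroed; the survivors are
# scaled by 1/(1 - 0.3) so the expected sum is unchanged
print(layer(x, training=True).numpy())
# training=False (inference): the input passes through unchanged
print(layer(x, training=False).numpy())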
model.add(Dense(len(np.unique(y)), activation='softmax'))
  • Dense(len(np.unique(y)), ...): This is the output layer of the network. The number of neurons equals the number of unique classes in y (where y is the array of target labels), so the layer outputs one probability per class.
  • activation='softmax': The softmax activation function, used here on the output layer, converts the raw outputs into a probability distribution over the target classes. The class with the highest probability is the model's prediction.
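For reference, softmax maps a vector of raw scores $z$ to probabilities via $\sigma(z)_i = e^{z_i} / \sum_j e^{z_j}$. A minimal NumPy sketch (the scores are made up):

import numpy as np

z = np.array([2.0, 1.0, 0.1])        # raw class scores (logits)
probs = np.exp(z) / np.exp(z).sum()  # each entry in (0, 1); entries sum to 1
print(probs)                         # approximately [0.659 0.242 0.099]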

Compiling

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
  • model.compile(): configures the model for training.
  • loss='sparse_categorical_crossentropy': This is the loss function used when the target classes are given as integers. It computes the same cross-entropy as categorical crossentropy, but accepts integer class labels directly instead of one-hot encoded vectors, which saves a preprocessing step and memory.
  • optimizer='adam': Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data.
  • metrics=['accuracy']: This tells the model to evaluate its performance in terms of accuracy during training and testing. Accuracy is the fraction of predictions our model got right.
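Compiling only configures the model; the actual training and evaluation happen in separate calls. A hypothetical sketch (X_train, X_test, y_train, and y_test are placeholder names for data prepared elsewhere in the pipeline, and the epoch/batch settings are illustrative, not taken from the project):

# X_train/X_test hold sensor features; y_train/y_test hold integer class labels
history = model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
loss, acc = model.evaluate(X_test, y_test)
print(f"test accuracy: {acc:.3f}")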

Summary

Once all of this

model = Sequential()
model.add(Dense(128, input_dim=len(sensor_columns), activation='relu', kernel_regularizer='l2'))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer='l2'))
model.add(Dense(len(np.unique(y)), activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

is put together, we have (in more easily readable terms): a neural network with three dense layers, a dropout layer for regularization, and L2 regularization applied to the hidden-layer weights. It is compiled for a classification task, with one output node per unique label in y (the array of target labels). It uses the relu activation function for the hidden layers and softmax for the output layer. It is trained using the adam optimizer and sparse categorical crossentropy loss, with accuracy as the evaluation metric.
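Once trained, the softmax output can be turned into a class prediction with argmax. A hypothetical sketch (new_sample is a placeholder for a single 12-channel sensor reading, shaped (1, 12)):

import numpy as np

# new_sample: one 12-channel capacitive reading, shape (1, 12)
probs = model.predict(new_sample)  # shape (1, n_classes); rows sum to 1
predicted_class = int(np.argmax(probs, axis=1)[0])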