Neural Networks

This is not comprehensive, but it is a useful reference for me.
Adaptive and never-ending (i.e., I add to it when I come across new things I need to remember).


| Layer Type | Description | Use Case |
| --- | --- | --- |
| Dense | Fully connected | General purpose |
| Convolutional | Applies filters to spatial data | Image or video processing |
| Pooling | Reduces spatial dimensions | Downsampling features |
| Recurrent | Processes sequences | Time series or text |
| LSTM | Captures long-term dependencies | Complex sequence learning |
| GRU | Simplified LSTM | Faster sequence processing |
| Embedding | Maps discrete inputs to dense vectors | Word representations for text |
| Dropout | Reduces overfitting | Regularization |
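
To make two of these rows concrete, here is a minimal plain-Python sketch of a dense layer's forward pass and of 1D max pooling. The helper names `dense` and `max_pool_1d` are made up for illustration; real frameworks implement these as vectorized layer classes.

```python
def dense(inputs, weights, biases):
    """Fully connected layer: every output unit sees every input."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def max_pool_1d(values, pool_size):
    """Non-overlapping max pooling: keeps the largest value in each window."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


# Two inputs feeding three output units (one weight row per unit).
dense_out = dense(
    [1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    biases=[0.0, 0.0, 1.0],
)

# Six values reduced to three by taking the max of each pair.
pooled = max_pool_1d([1, 3, 2, 5, 4, 0], pool_size=2)
```

The dense output has one value per unit, while pooling halves the length of its input, which is the "reduces dimensions" behaviour noted in the table.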

Activation Functions

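| Activation | Description |
| --- | --- |
| ReLU | Adds non-linearity, cheap to compute, mitigates (but does not eliminate) vanishing gradients |
| Sigmoid | Maps values to (0, 1), so the output can be read as a probability; used for binary classification outputs |
| Tanh | Maps values between -1 and 1, used in hidden layers |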
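
As a quick reminder of what these three functions compute, here is a small sketch using only the standard library. The function names are my own; frameworks ship their own vectorized versions.

```python
import math


def relu(x):
    """Passes positive values through unchanged and clamps negatives to 0."""
    return max(0.0, x)


def sigmoid(x):
    """Squashes any real number into (0, 1); branches to avoid overflow in exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    """Squashes any real number into (-1, 1), centred on 0."""
    return math.tanh(x)


samples = [-2.0, 0.0, 2.0]
relu_out = [relu(x) for x in samples]
sigmoid_out = [sigmoid(x) for x in samples]
tanh_out = [tanh(x) for x in samples]
```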


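| Dropout Setting | Description | Purpose |
| --- | --- | --- |
| Dropout Rate | Probability that a neuron is dropped | Trades off overfitting against underfitting (higher rate means stronger regularization) |
| Spatial Dropout | Drops entire feature maps | Promotes independence between feature maps in CNNs |
| Alpha Dropout | Keeps the input mean and variance when used with SELU | For self-normalizing networks |
| Gaussian Dropout | Applies multiplicative Gaussian noise | Alternative regularization |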
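
The dropout rate is easiest to remember with a tiny example. Below is a sketch of standard "inverted" dropout in plain Python; the `dropout` helper and its arguments are illustrative names, not a library API. Kept values are scaled by 1 / (1 - rate) so the expected activation stays the same, and nothing is dropped at inference time.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: zero each value with probability `rate` while training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)  # inference: pass values through unchanged
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # seeded only so repeated runs behave the same
activations = [1.0] * 10

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
infer_out = dropout(activations, rate=0.5, training=False, rng=rng)
```

With a rate of 0.5, every training-time value is either 0.0 (dropped) or 2.0 (kept and rescaled).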


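| Regularization | Description |
| --- | --- |
| Dropout | Reduces overfitting by randomly omitting units during training |
| L1/L2 Regularization | Penalizes large weights and encourages simpler models; L1 pushes weights toward exactly zero (sparsity), L2 shrinks them smoothly |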
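
A short sketch of how the L1 and L2 penalties are added to a loss. The weights, the `strength` coefficient, and the `data_loss` value are made-up numbers for illustration only.

```python
def l1_penalty(weights, strength):
    """Sum of absolute weights: the gradient has constant size, which favours sparsity."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """Sum of squared weights: large weights are penalized much more than small ones."""
    return strength * sum(w * w for w in weights)


weights = [0.5, -2.0, 0.0]
data_loss = 1.0  # hypothetical loss from the training data alone

l1 = l1_penalty(weights, strength=0.1)  # 0.1 * (0.5 + 2.0 + 0.0)
l2 = l2_penalty(weights, strength=0.1)  # 0.1 * (0.25 + 4.0 + 0.0)
total_loss = data_loss + l1 + l2
```

Note how the weight of -2.0 contributes 8 times more to the L2 term than to the L1 term relative to the 0.5 weight, which is why L2 mainly targets large weights.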

Loss Functions

| Loss | Description |
| --- | --- |
| Cross-Entropy | For classification, measures the difference between predicted and true probability distributions |
| Mean Squared Error | For regression, measures the average squared error |
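
Both losses are short enough to write out by hand. This is a plain-Python sketch (binary cross-entropy only, with my own function names); the `eps` clipping is an assumption added to avoid taking log(0).

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Average negative log-likelihood of the true labels under predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])     # (0 + 4) / 2
bce_unsure = binary_cross_entropy([1, 0], [0.5, 0.5])  # equals ln(2)
bce_confident = binary_cross_entropy([1, 0], [0.9, 0.1])
```

A confident, correct prediction gives a lower cross-entropy than an unsure one, which is the behaviour you want from a classification loss.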


| Optimizer | Description |
| --- | --- |
| SGD | Simple, good for large datasets, needs learning rate tuning |
| Adam | Adapts learning rate per parameter, fast convergence, less tuning |
| RMSprop | Adjusts learning rate based on recent gradients, good for RNNs |
| AdaGrad | Adapts learning rate per parameter, good for sparse data |
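
To see the difference between a fixed-step and an adaptive update, here is a sketch of plain SGD and Adam minimizing a toy function f(x) = (x - 3)^2. The toy function, starting point, and step counts are my own choices; the Adam hyperparameters are the commonly quoted defaults apart from the larger learning rate.

```python
import math


def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2, which is minimized at x = 3."""
    return 2.0 * (x - 3.0)


def sgd(x, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient with a fixed learning rate."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Adam: scales each step using running averages of the gradient and its square."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean of gradients)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (mean of squared gradients)
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


x_sgd = sgd(0.0)
x_adam = adam(0.0)
```

Both end up close to 3 here. On a one-parameter toy problem like this, Adam has no real advantage; the per-parameter scaling matters when different weights see gradients of very different sizes.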


| Metric | Description |
| --- | --- |
| Accuracy | Number of correct predictions divided by the total number of predictions. |
| Precision | True positives divided by the sum of true positives and false positives. |
| Recall | True positives divided by the sum of true positives and false negatives. |
| F1 Score | Harmonic mean of precision and recall, balances the trade-off between them. |
| AUC | Area under the ROC curve, measures the ability of a classifier to distinguish between classes. |
| Mean Squared Error | Average squared difference between the estimated values and the actual values, used for regression. |
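
The classification metrics all fall out of the four confusion-matrix counts. A minimal sketch for binary labels follows (function name and sample labels are made up; returning 0.0 when a denominator is zero is my own convention, and libraries differ on this).

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this sample there are 3 true positives, 1 true negative, 1 false positive, and 1 false negative, so precision and recall are both 3/4 and accuracy is 4/6.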

Model Configuration

(Keras specific)

| API | Description |
| --- | --- |
| Sequential | For linear stacks of layers where each layer has exactly one input tensor and one output tensor. |
| Functional API | For more complex architectures; allows creation of models with non-linear topology, shared layers, and multiple inputs or outputs. |
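
A minimal sketch of the same kind of binary classifier built both ways, assuming the `keras` package is installed. The layer sizes, input shapes, and input names (`numeric`, `extra`) are arbitrary choices for illustration.

```python
import keras
from keras import layers

# Sequential: a straight stack, one input tensor and one output tensor.
seq_model = keras.Sequential([
    keras.Input(shape=(4,)),
    layers.Dense(8, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),
])

# Functional API: two inputs merged part-way through, which Sequential cannot express.
numeric_in = keras.Input(shape=(4,), name="numeric")
extra_in = keras.Input(shape=(2,), name="extra")
hidden = layers.Dense(8, activation="relu")(numeric_in)
merged = layers.Concatenate()([hidden, extra_in])
output = layers.Dense(1, activation="sigmoid")(merged)
func_model = keras.Model(inputs=[numeric_in, extra_in], outputs=output)

# Both are compiled the same way: optimizer, loss, and metrics by name.
for model in (seq_model, func_model):
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

If the model is a single chain of layers, Sequential is shorter; as soon as there is a branch, a merge, or more than one input or output, the Functional API is the one to reach for.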