# Neural Networks

A non-comprehensive but useful personal reference. Adaptive & never-ending (i.e. I add to it whenever I come across something new I need to remember).
## Layers

Layer Type | Description | Use Case |
---|---|---|
Dense | Fully connected | General purpose |
Convolutional | Filters spatial data | Image or video processing |
Pooling | Reduces dimensions | Downsamples features |
Recurrent | Processes sequences | Time series or text |
LSTM | Long-term dependencies | Complex sequence learning |
GRU | Simplified LSTM | Faster sequence processing |
Embedding | Maps discrete input | Word representations for text |
Dropout | Prevents overfitting | Regularization |
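At its core, a dense (fully connected) layer is just a matrix-vector product plus a bias, followed by an activation. A minimal pure-Python sketch (no framework assumed; the function name and shapes are illustrative):

```python
def dense_forward(x, weights, bias):
    """Fully connected layer: y_j = sum_i x[i] * weights[i][j] + bias[j]."""
    n_out = len(bias)
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(n_out)]

# 2 inputs -> 3 outputs
x = [1.0, 2.0]
W = [[0.5, -0.5, 1.0],    # weights leaving input 0
     [0.25, 0.75, -1.0]]  # weights leaving input 1
b = [0.0, 0.1, 0.2]
y = dense_forward(x, W, b)  # a length-3 activation vector
```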
## Activation Functions

Name | Usage |
---|---|
ReLU | Adds non-linearity, cheap to compute, mitigates vanishing gradients |
Sigmoid | Maps to (0, 1), for binary classification output |
Tanh | Maps values to (-1, 1), zero-centred, used in hidden layers |
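All three activations above are one-liners; a pure-Python sketch for reference:

```python
import math

def relu(x):
    # max(0, x): cheap, gradient is 1 for positive inputs (no saturation there)
    return max(0.0, x)

def sigmoid(x):
    # squashes to (0, 1); output can be read as a probability
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # squashes to (-1, 1); zero-centred, unlike sigmoid
    return math.tanh(x)
```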
## Dropout

Parameter | Description | Impact |
---|---|---|
Dropout Rate | Probability of dropping each neuron | Controls regularization strength; too high underfits, too low overfits |
Spatial Dropout | Drops entire feature maps | Decorrelates channels in CNNs |
Alpha Dropout | Preserves mean and variance, pairs with SELU | For self-normalizing networks |
Gaussian Dropout | Applies multiplicative Gaussian noise | Alternative to binary dropout |
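The standard (inverted) dropout trick can be sketched in a few lines of pure Python; surviving units are scaled by 1/(1 - rate) so the expected activation is unchanged and nothing special is needed at inference time:

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`,
    scale survivors by 1/(1 - rate) to keep the expected sum unchanged.
    At inference (training=False) it is the identity."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]
```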
## Regularization

Name | Usage |
---|---|
Dropout | Prevents overfitting by randomly omitting units during training |
L1/L2 Regularization | Penalizes large weights, encourages simpler models |
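L1/L2 penalties are simply extra terms added to the data loss before backprop; L1 pushes weights to exactly zero (sparsity), while L2 shrinks large weights smoothly. A minimal sketch (function names are illustrative):

```python
def l1_penalty(weights, lam):
    # L1: lam * sum(|w|) -- encourages sparse weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lam * sum(w^2) -- penalizes large weights quadratically
    return lam * sum(w * w for w in weights)

# Used as: total_loss = data_loss + l1_penalty(w, lam) + l2_penalty(w, lam)
```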
## Loss Functions

Name | Usage |
---|---|
Cross-Entropy | For classification, measures divergence between predicted and true probability distributions |
Mean Squared Error | For regression, measures average squared error |
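Both losses reduce to short formulas; a pure-Python sketch of the binary case (the `eps` clamp guarding log(0) is a common implementation detail, not part of the definition):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Mean of (y - y_hat)^2 over the batch."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
```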
## Optimizers

Name | Usage |
---|---|
SGD | Simple, good for large datasets, needs learning rate tuning |
Adam | Adapts learning rate per parameter, fast convergence, less tuning |
RMSprop | Adjusts learning rate based on recent gradients, good for RNNs |
AdaGrad | Adapts learning rate per parameter, good for sparse data |
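The contrast between SGD and Adam is clearest in code: SGD is just `w -= lr * grad`, while Adam keeps per-parameter running moments of the gradient and bias-corrects them. A single-parameter sketch of one Adam step (default hyperparameters as in the original paper):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.
    m, v: running first/second moment estimates; t: 1-based step count."""
    m = b1 * m + (1 - b1) * grad          # momentum-like moving average
    v = b2 * v + (1 - b2) * grad * grad   # per-parameter gradient scale
    m_hat = m / (1 - b1 ** t)             # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Because the step is divided by the gradient's running scale, the first update is roughly `lr` in magnitude regardless of the raw gradient size, which is why Adam typically needs less learning rate tuning than plain SGD.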
## Metrics

Metric | Description |
---|---|
Accuracy | Correct predictions divided by total predictions. |
Precision | True positives divided by all positive predictions: TP / (TP + FP). |
Recall | True positives divided by all actual positives: TP / (TP + FN). |
F1 Score | Harmonic mean of precision and recall; balances the trade-off between them. |
AUC | Area under the ROC curve; measures how well a classifier separates classes. |
Mean Squared Error | Average squared difference between predicted and actual values; used for regression. |
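The precision/recall/F1 definitions above come straight from the confusion-matrix counts; a pure-Python sketch for binary labels (function name is illustrative):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```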
## Model Configuration

(Keras-specific)

Type | Usage |
---|---|
Sequential | For linear stacks of layers where each layer has exactly one input tensor and one output tensor. |
Functional API | For more complex architectures: non-linear topology, shared layers, multiple inputs or outputs. |
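A minimal sketch of the two styles building the same small network (assumes TensorFlow/Keras is installed; layer sizes and the input shape are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential: one input, one output, layers stacked linearly
seq_model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

# Functional: the same network as a graph of layer calls;
# this style also supports multiple inputs/outputs and shared layers
inputs = keras.Input(shape=(20,))
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
fn_model = keras.Model(inputs=inputs, outputs=outputs)

fn_model.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy"])
```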