# Neural Networks

A non-comprehensive but useful personal reference. Adaptive & never-ending (i.e. I add to it whenever I come across something new I need to remember).
## Layers

Layer Type | Description | Use Case |
---|---|---|
Dense | Fully connected | General purpose |
Convolutional | Filters spatial data | Image or video processing |
Pooling | Reduces dimensions | Downsamples features |
Recurrent | Processes sequences | Time series or text |
LSTM | Long-term dependencies | Complex sequence learning |
GRU | Simplified LSTM | Faster sequence processing |
Embedding | Maps discrete input | Word representations for text |
Dropout | Prevents overfitting | Regularization |
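At its core, a dense (fully connected) layer is just a matrix-vector product plus a bias, followed by an activation. A minimal pure-Python sketch (no framework assumed; the function name and shapes are illustrative):

```python
def dense_forward(x, weights, bias):
    """Fully connected layer: y_j = sum_i x[i] * weights[i][j] + bias[j]."""
    n_out = len(bias)
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(n_out)]

# 2 inputs -> 3 outputs
x = [1.0, 2.0]
W = [[0.5, -0.5, 1.0],    # weights leaving input 0
     [0.25, 0.75, -1.0]]  # weights leaving input 1
b = [0.0, 0.1, 0.2]
y = dense_forward(x, W, b)  # a length-3 activation vector
```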
## Activation Functions

Name | Usage |
---|---|
ReLU | Adds non-linearity, cheap to compute, mitigates vanishing gradients |
Sigmoid | Maps to (0, 1), for binary classification output |
Tanh | Maps values to (-1, 1), zero-centred, used in hidden layers |
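All three activations above are one-liners; a pure-Python sketch for reference:

```python
import math

def relu(x):
    # max(0, x): cheap, gradient is 1 for positive inputs (no saturation there)
    return max(0.0, x)

def sigmoid(x):
    # squashes to (0, 1); output can be read as a probability
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # squashes to (-1, 1); zero-centred, unlike sigmoid
    return math.tanh(x)
```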
## Dropout

Parameter | Description | Impact |
---|---|---|
Dropout Rate | Probability of dropping each neuron | Controls regularization strength; too high underfits, too low overfits |
Spatial Dropout | Drops entire feature maps | Decorrelates channels in CNNs |
Alpha Dropout | Preserves mean and variance, pairs with SELU | For self-normalizing networks |
Gaussian Dropout | Applies multiplicative Gaussian noise | Alternative to binary dropout |
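The standard (inverted) dropout trick can be sketched in a few lines of pure Python; surviving units are scaled by 1/(1 - rate) so the expected activation is unchanged and nothing special is needed at inference time:

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`,
    scale survivors by 1/(1 - rate) to keep the expected sum unchanged.
    At inference (training=False) it is the identity."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]
```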
## Regularization

Name | Usage |
---|---|
Dropout | Prevents overfitting by randomly omitting units during training |
L1/L2 Regularization | Penalizes large weights, encourages simpler models |
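L1/L2 penalties are simply extra terms added to the data loss before backprop; L1 pushes weights to exactly zero (sparsity), while L2 shrinks large weights smoothly. A minimal sketch (function names are illustrative):

```python
def l1_penalty(weights, lam):
    # L1: lam * sum(|w|) -- encourages sparse weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # L2: lam * sum(w^2) -- penalizes large weights quadratically
    return lam * sum(w * w for w in weights)

# Used as: total_loss = data_loss + l1_penalty(w, lam) + l2_penalty(w, lam)
```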
## Loss Functions

Name | Usage |
---|---|
Cross-Entropy | For classification, measures divergence between predicted and true probability distributions |
Mean Squared Error | For regression, measures average squared error |
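Both losses reduce to short formulas; a pure-Python sketch of the binary case (the `eps` clamp guarding log(0) is a common implementation detail, not part of the definition):

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Mean of (y - y_hat)^2 over the batch."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)
```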
## Optimizers

Name | Usage |
---|---|
SGD | Simple, good for large datasets, needs learning rate tuning |
Adam | Adapts learning rate per parameter, fast convergence, less tuning |
RMSprop | Adjusts learning rate based on recent gradients, good for RNNs |
AdaGrad | Adapts learning rate per parameter, good for sparse data |
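The contrast between SGD and Adam is clearest in code: SGD is just `w -= lr * grad`, while Adam keeps per-parameter running moments of the gradient and bias-corrects them. A single-parameter sketch of one Adam step (default hyperparameters as in the original paper):

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter.
    m, v: running first/second moment estimates; t: 1-based step count."""
    m = b1 * m + (1 - b1) * grad          # momentum-like moving average
    v = b2 * v + (1 - b2) * grad * grad   # per-parameter gradient scale
    m_hat = m / (1 - b1 ** t)             # bias correction for zero init
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v
```

Because the step is divided by the gradient's running scale, the first update is roughly `lr` in magnitude regardless of the raw gradient size, which is why Adam typically needs less learning rate tuning than plain SGD.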
## Metrics

Metric | Description |
---|---|
Accuracy | Correct predictions divided by total predictions. |
Precision | True positives divided by all positive predictions: TP / (TP + FP). |
Recall | True positives divided by all actual positives: TP / (TP + FN). |
F1 Score | Harmonic mean of precision and recall; balances the trade-off between them. |
AUC | Area under the ROC curve; measures how well a classifier separates classes. |
Mean Squared Error | Average squared difference between predicted and actual values; used for regression. |
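The precision/recall/F1 definitions above come straight from the confusion-matrix counts; a pure-Python sketch for binary labels (function name is illustrative):

```python
def precision_recall_f1(y_true, y_pred):
    """Binary classification metrics from 0/1 label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```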
## Model Configuration

(Keras-specific)

Type | Usage |
---|---|
Sequential | For linear stacks of layers where each layer has exactly one input tensor and one output tensor. |
Functional API | For more complex architectures: non-linear topology, shared layers, multiple inputs or outputs. |
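A minimal sketch of the two styles building the same small network (assumes TensorFlow/Keras is installed; layer sizes and the input shape are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential: one input, one output, layers stacked linearly
seq_model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])

# Functional: the same network as a graph of layer calls;
# this style also supports multiple inputs/outputs and shared layers
inputs = keras.Input(shape=(20,))
x = layers.Dense(64, activation="relu")(inputs)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
fn_model = keras.Model(inputs=inputs, outputs=outputs)

fn_model.compile(optimizer="adam", loss="binary_crossentropy",
                 metrics=["accuracy"])
```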