Enter property
: Transfer learning: take a pretrained model & retrain the final layers to fit it to your own needs
Archway
: Overfitting: model only recognizes the one set of data it was trained on (force-fitting) and fails to generalize
Stairs up
: Learning rate: how quickly params are updated
Door
: Tensor: n-dimensional array (an n * m matrix in the 2-D case), use torch.set_default_dtype(torch.float64)
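A minimal PyTorch sketch of both points (shapes and values are illustrative):

```python
import torch

# make float64 the default dtype for new floating-point tensors
torch.set_default_dtype(torch.float64)

x = torch.randn(3, 4)       # a 3 * 4 matrix (rank-2 tensor)
print(x.shape, x.dtype)     # torch.Size([3, 4]) torch.float64
```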
2nd Door
: Backprop: calcs grad of the loss func wrt all weights in the NN (backward propagation of errors)
Main arch
: Gradient descent: iterative optimization algo for finding minima of loss func
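A minimal sketch of gradient descent fitting a line y = mx + t with PyTorch; the data, learning rate, and iteration count are made up:

```python
import torch

# toy data for y = 2x + 1 plus a little noise
x = torch.linspace(0, 1, 100)
y = 2 * x + 1 + 0.01 * torch.randn(100)

m = torch.zeros(1, requires_grad=True)
t = torch.zeros(1, requires_grad=True)
lr = 0.1                                   # learning rate

for _ in range(1000):
    loss = ((m * x + t - y) ** 2).mean()   # MSE loss
    loss.backward()                        # grads of loss wrt m and t
    with torch.no_grad():
        m -= lr * m.grad                   # step against the gradient
        t -= lr * t.grad
        m.grad.zero_()
        t.grad.zero_()

print(m.item(), t.item())                  # roughly 2 and 1
```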
Toilet
: Minibatch: random bunch of input points to update weights
Bathroom
: Epoch: one complete run through all data points
Basement door
: SGD: grad descent using mini batches
Hallway
: Architecture: math function whose params are fit (e.g. y = mx + t)
Stove
: Loss Functions: how far/close predictions are from the labels (MSE, MAE, cross-entropy loss)
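The three losses in PyTorch (a sketch; the tensors are made up):

```python
import torch
import torch.nn.functional as F

preds   = torch.tensor([0.2, 0.7, 1.5])
targets = torch.tensor([0.0, 1.0, 1.0])

mse = F.mse_loss(preds, targets)   # mean squared error
mae = F.l1_loss(preds, targets)    # mean absolute error

# cross-entropy loss takes raw logits (one row per sample) and class indices
logits = torch.tensor([[2.0, 0.5, -1.0]])
label  = torch.tensor([0])
cel = F.cross_entropy(logits, label)
```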
Oven Adjusters
: Regularization: techniques to reduce overfitting (l1 lasso, l2 ridge)
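A sketch of adding the two penalties to a loss by hand, assuming PyTorch; the model and lambda are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 1)                 # illustrative model
lam = 1e-4                               # penalty strength (made up)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = F.mse_loss(model(x), y)

l1 = sum(p.abs().sum() for p in model.parameters())    # L1 / lasso penalty
l2 = sum((p ** 2).sum() for p in model.parameters())   # L2 / ridge penalty
loss = loss + lam * l1 + lam * l2        # regularized loss
```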
Under Oven Adjusters
: Underfitting: model too simple to explain the variance, validation (and training) performance is really bad
Nicer Dicer
: Tokenization: convert a text so that each word / linguistic concept is represented by one token
Scale
: Numericalization: take the complete list of tokens, uniquify them, order them by frequency of occurrence, and map each to an integer id
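A plain-Python sketch of tokenization + numericalization (the corpus is made up):

```python
from collections import Counter

texts = ["the cat sat", "the dog sat on the mat"]        # made-up corpus

tokens = [t for text in texts for t in text.split()]     # tokenization
counts = Counter(tokens)

# vocabulary: unique tokens ordered by frequency of occurrence
vocab = [tok for tok, _ in counts.most_common()]
stoi = {tok: i for i, tok in enumerate(vocab)}

# numericalization: map every token to its integer id
ids = [[stoi[t] for t in text.split()] for text in texts]
```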
Cook book text
: WikiText-103: ~103M tokens of Wikipedia text with knowledge about the world
Cook book ingredients
: Tabular data use cases: sales forecasting, fraud detection, failure prediction, pricing
Kitchenaid
: Super Convergence (1cycle): start with low LR & high momentum, ramp the LR up while the momentum goes down, then reverse
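PyTorch ships a scheduler for this; a sketch with an illustrative model and made-up hyperparameters:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.95)

# LR ramps up to max_lr and back down; momentum is cycled in the opposite direction
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=0.1, total_steps=100, cycle_momentum=True)

for _ in range(100):
    loss = model(torch.randn(8, 10)).sum()   # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()                             # advance LR & momentum schedule
```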
Filter
: Softmax: function where all activations add to 1 & >0
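Softmax in two lines (assuming PyTorch):

```python
import torch

logits = torch.tensor([1.0, 2.0, 0.5])
probs = torch.softmax(logits, dim=0)   # all values > 0
print(probs, probs.sum())              # and they sum to 1
```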
Mash
: Weight decay: subtract constant wd * weights every time a batch is processed
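The update written out by hand (a sketch; lr and wd are made up):

```python
import torch

w = torch.randn(5)       # some weights
grad = torch.randn(5)    # their gradient for the current batch
lr, wd = 0.1, 0.01       # learning rate and weight decay (illustrative)

# per batch: ordinary gradient step plus subtracting wd * weights (scaled by lr)
w = w - lr * (grad + wd * w)
```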
Walking to living Room
: Momentum: update step is based on a running average of previous gradients, so it keeps part of the previous direction
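Momentum as a manual update rule (a sketch; 0.9 is the usual factor, the gradients are stand-ins):

```python
import torch

w = torch.randn(5)
velocity = torch.zeros(5)
lr, beta = 0.1, 0.9                  # learning rate & momentum factor

for _ in range(10):
    grad = torch.randn(5)            # stand-in for the current batch gradient
    # running average: keep most of the previous direction, add the new grad
    velocity = beta * velocity + grad
    w = w - lr * velocity
```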
Mom
: Bias: per-user / per-item value added on top of the latent factors (like a general personal preference)
Couch
: Matrix Factorization: decompose a matrix into factors of latent variables (e.g. user & item embeddings)
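A collaborative-filtering style sketch: the prediction is a dot product of latent user/item factors plus the biases from above (sizes are made up):

```python
import torch
import torch.nn as nn

n_users, n_items, n_factors = 100, 50, 5

user_factors = nn.Embedding(n_users, n_factors)   # latent user variables
item_factors = nn.Embedding(n_items, n_factors)   # latent item variables
user_bias = nn.Embedding(n_users, 1)              # general user preference
item_bias = nn.Embedding(n_items, 1)

def predict(user, item):
    dot = (user_factors(user) * item_factors(item)).sum(dim=1)
    return dot + user_bias(user).squeeze(1) + item_bias(item).squeeze(1)

rating = predict(torch.tensor([3]), torch.tensor([7]))   # predicted rating
```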
TV
: PCA: dimensionality reduction technique
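One way to run it in PyTorch (a sketch with made-up data):

```python
import torch

X = torch.randn(200, 10)                 # 200 samples, 10 dimensions
U, S, V = torch.pca_lowrank(X, q=3)      # top-3 principal components
X_reduced = (X - X.mean(0)) @ V[:, :3]   # project down to 3 dimensions
```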
Photo Books
: Dropout: randomly taking out (zeroing) a number of activations during training
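Dropout in PyTorch (p = 0.5 is illustrative):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)   # zero out ~50% of activations
x = torch.ones(8)

drop.train()
print(drop(x))   # roughly half the values are 0, the rest are scaled up

drop.eval()
print(drop(x))   # identity at evaluation time
```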
Light Switch
: Activation function: non-linearity applied to activations, e.g. ReLU or sigmoid
Dad Computer
: Code structure: data, experiments, model
Window
: Core training step (a full runnable loop is sketched at the end of these notes):
Balcony
: output_batch = model(train_batch)
Stefan Silo
: loss = loss_fn(output_batch, labels_batch)
Street
: optimizer.zero_grad() # clear prev grads
Silo
: loss.backward() # calc grads wrt params
Hall
: optimizer.step() # perform updates using calc’d grads
Bed
: model.train(): call before training; impacts dropout and batch_norm
TV
: model.eval(): call after training / before inference; dropout is disabled and batch_norm uses its running stats
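The core training step and the train/eval modes put together as one runnable sketch; the model, data, and hyperparameters are made up:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

train_batch = torch.randn(32, 10)            # made-up minibatch
labels_batch = torch.randint(0, 2, (32,))

model.train()                                # training mode (matters for dropout / batch_norm)
for epoch in range(5):
    output_batch = model(train_batch)
    loss = loss_fn(output_batch, labels_batch)
    optimizer.zero_grad()                    # clear prev grads
    loss.backward()                          # calc grads wrt params
    optimizer.step()                         # update params using calc'd grads

model.eval()                                 # inference mode for dropout / batch_norm
with torch.no_grad():
    preds = model(train_batch).argmax(dim=1)
```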