TL;DR

Hyperparameter tuning is the process of finding the combination of hyperparameters (e.g., learning rate, number of layers, batch size) that yields the best performance of a model on a validation set. Unlike model parameters (e.g., weights), hyperparameters are not learned during training and must be specified manually or found via a search procedure.

Why is it important?

The choice of hyperparameters can dramatically affect model performance. Poor choices can lead to underfitting, overfitting, or slow convergence.
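Before reaching for a dedicated library, it helps to see the simplest possible approach: exhaustive grid search. The sketch below uses a toy `validation_loss` function (a stand-in for a real training-and-evaluation run; the "best" values of lr=0.01 and batch_size=64 are purely illustrative) and tries every combination in a small grid.

import itertools

# Toy stand-in for a real validation run: pretend the optimal
# settings are lr=0.01 and batch_size=64 (illustrative only)
def validation_loss(lr, batch_size):
    return (lr - 0.01) ** 2 + (batch_size - 64) ** 2 / 1e4

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

# Evaluate every combination and keep the one with the lowest loss
best = min(
    itertools.product(grid["lr"], grid["batch_size"]),
    key=lambda combo: validation_loss(*combo),
)
print(best)  # (0.01, 64)

Grid search is easy to reason about but scales exponentially with the number of hyperparameters, which is why smarter samplers (like the ones Optuna uses) are preferred in practice.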

Tuning with Optuna

import optuna
import torch

def objective(trial):
    # Define the search space; each suggest_* call samples one hyperparameter
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)  # log scale suits learning rates
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    hidden_size = trial.suggest_int("hidden_size", 64, 512)

    # MyModel and train_and_evaluate are user-defined (not shown here)
    model = MyModel(hidden_size=hidden_size, dropout=dropout)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    val_loss = train_and_evaluate(model, optimizer)
    return val_loss  # Optuna minimizes this value

# Run 50 trials; Optuna's default TPE sampler adaptively picks promising regions
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)

print("Best params:", study.best_params)
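Once the study finishes, `study.best_params` is a plain dict, so the winning values can be unpacked directly when rebuilding the final model. A small sketch (the values shown are illustrative, and the split into model vs. optimizer keys assumes the search space above):

# study.best_params is a plain dict, e.g. (illustrative values):
best_params = {"lr": 3e-4, "dropout": 0.25, "hidden_size": 256}

# Separate architecture hyperparameters from optimizer ones before unpacking
model_keys = {"hidden_size", "dropout"}
model_kwargs = {k: v for k, v in best_params.items() if k in model_keys}
optim_kwargs = {k: v for k, v in best_params.items() if k not in model_keys}

# Final retraining would then look like:
#   model = MyModel(**model_kwargs)
#   optimizer = torch.optim.Adam(model.parameters(), **optim_kwargs)
print(model_kwargs)
print(optim_kwargs)

A common practice is to retrain on the combined train + validation data with these settings before the final test-set evaluation.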