TL;DR
Hyperparameter tuning is the process of finding the combination of hyperparameters (e.g., learning rate, number of layers, batch size) that yields the best performance of a model on a validation set. Unlike model parameters (e.g., weights), hyperparameters are not learned during training; they must be specified manually or optimized by an outer search procedure.
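The parameter/hyperparameter distinction can be shown with a minimal sketch: below, `lr` is a hyperparameter we fix up front, while `w` is a parameter that gradient descent updates (the toy loss `(w - 3)**2` is made up for illustration):

```python
# Hyperparameter: chosen before training, never updated by the training loop
lr = 0.1

# Parameter: initialized, then learned by gradient descent
w = 0.0

# Minimize the toy loss (w - 3)^2; its gradient w.r.t. w is 2 * (w - 3)
for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad  # w changes every step; lr stays exactly as we set it

print(round(w, 3))  # w converges toward 3 while lr remains 0.1
```

Tuning asks the outer question this sketch leaves open: which value of `lr` (too small: slow convergence; too large: divergence) makes training work best?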
Why is it important?
The choice of hyperparameters can dramatically affect model performance. Poor choices can lead to underfitting, overfitting, or slow convergence.
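Before reaching for a library, the core idea can be sketched as plain random search: sample hyperparameters, score each candidate on a validation metric, and keep the best. The objective `toy_val_loss` here is a made-up stand-in for a real train-and-validate run:

```python
import random

random.seed(0)  # for reproducibility of the sketch

def toy_val_loss(lr, hidden_size):
    # Stand-in for training a model and measuring validation loss;
    # pretends the optimum is near lr=1e-3, hidden_size=256
    return (lr - 1e-3) ** 2 + ((hidden_size - 256) ** 2) * 1e-8

best_loss, best_params = float("inf"), None
for _ in range(20):  # 20 random trials
    lr = 10 ** random.uniform(-5, -2)          # log-uniform over [1e-5, 1e-2]
    hidden_size = random.randint(64, 512)
    loss = toy_val_loss(lr, hidden_size)
    if loss < best_loss:
        best_loss, best_params = loss, {"lr": lr, "hidden_size": hidden_size}

print("Best params:", best_params)
```

Libraries like Optuna replace the blind sampling above with smarter strategies (e.g., Bayesian/TPE samplers and pruning of bad trials), but the trial-score-keep-best loop is the same.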
Tuning with Optuna
import optuna
import torch

def objective(trial):
    # Define the search space
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)
    hidden_size = trial.suggest_int("hidden_size", 64, 512)

    # MyModel and train_and_evaluate are assumed to be defined elsewhere
    model = MyModel(hidden_size=hidden_size, dropout=dropout)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Return the metric for Optuna to minimize
    val_loss = train_and_evaluate(model, optimizer)
    return val_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("Best params:", study.best_params)