  • I’ve never built or trained an LLM, but I did get to play around with some smaller neural networks. My neural networks professor described this phenomenon as “overfitting”: when a model is trained too long on a dataset that’s too small for it, the model sort of cheats by “memorizing” arbitrary details of the training data (flaws in the images like compression artifacts, minor correlations that appear only coincidentally in the training dataset, etc.) to improve its evaluation performance on that training dataset.

    The problem is that the model is now hung up on analyzing random details of the training dataset instead of the broad strokes that actually define the subject matter, so its evaluation results on the validation and testing datasets suffer. My understanding is that the effect is more pronounced when a model is oversized for its data, because a smaller model wouldn’t have the “resolution” to overanalyze like that in the first place.

    Here’s an example I found from my old homework of a model that started to become overfit after the 5th epoch or so:

    [chart: training vs. validation accuracy and loss by epoch]

    By the end of the training session, accuracy on the training dataset was pushing the high 90s, but validation accuracy never broke 80%. Training vs. validation loss diverged even more dramatically. It’s apparent that whatever the model learned to get that 90-something percent in training isn’t broadly applicable to the subject matter, so this model is considered “overfit” to its training data.
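
    If you want to reproduce the effect yourself, here’s a minimal sketch of the same idea (not my actual homework model; the dataset, layer sizes, and epoch count are all made up for illustration). It trains a deliberately oversized Keras MLP on pure noise with random labels, so the only way to score well in training is memorization, and the train/validation curves diverge just like above:

    ```python
    # Illustrative sketch (hypothetical data and sizes, not the original
    # homework): an oversized MLP on pure-noise inputs with random labels.
    # There is nothing generalizable to learn, so training accuracy climbs
    # while validation accuracy stays near the 50% chance level.
    import numpy as np
    import tensorflow as tf

    rng = np.random.default_rng(0)
    x_train = rng.normal(size=(200, 32)).astype("float32")  # 200 noise samples
    y_train = rng.integers(0, 2, size=200)                  # random 0/1 labels
    x_val = rng.normal(size=(200, 32)).astype("float32")
    y_val = rng.integers(0, 2, size=200)

    # Deliberately oversized for 200 samples: plenty of capacity to memorize.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32,)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                        epochs=100, verbose=0)

    # The telltale divergence: training accuracy keeps rising, validation doesn't.
    for epoch in range(9, 100, 10):
        print(f"epoch {epoch + 1:3d}  "
              f"train acc {history.history['accuracy'][epoch]:.2f}  "
              f"val acc {history.history['val_accuracy'][epoch]:.2f}")
    ```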

  • it doesn’t matter which one you choose

    That’s not really true though: every instance has its own rules and its own federation policies, not to mention the other instances that don’t federate with it.

    I’m already on Lemmy, so it’s not like I haven’t gone through this before, yet I still haven’t made a Pixelfed account, despite being interested, because I don’t want to just go for the biggest instance and I have no idea how to vet the other ones.