Written By: Dylan Sam '21 Edited By: Shailen Sampath
In the past few years, deep learning has become an increasingly popular solution to many real-world problems. As a consequence, research teams in both industry and academia have adopted deep learning approaches wherever they can find sufficiently large datasets. However, many practitioners treat deep learning as a “black box,” without fully understanding the concepts underneath. Many researchers do not consider why deep learning and neural networks predict so accurately; in fact, even machine learning and deep learning theoreticians cannot rigorously prove why deep learning models work. This is a huge, unanswered question in the field of deep learning research that many people are looking to answer.
One major issue with deep learning is overfitting. Deep learning consists of training neural networks, which are very large models, on large amounts of training data. The model then uses the knowledge it has “learned” from the training dataset to make predictions about new data. However, when new test data is introduced, the model may fail to predict well on these new data points, since they may differ from the training dataset; this failure is overfitting. In machine learning generally, one of the main causes of overfitting is using a model that is too large and too specific, a condition known as overparameterization. Yet this is one of the main attractions of deep learning: neural networks are so large and so deep that they can learn a great deal about a training dataset. From a theoretical perspective, then, deep learning models should always overfit because they are so large and specific; they should not be able to generalize well to new data. In practice, the opposite holds: neural networks make predictions on new data with remarkable accuracy.
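Overfitting is easy to see in a much simpler setting than a neural network. The sketch below (an illustrative example, not drawn from the publications discussed here) fits two polynomials to the same ten noisy samples of a sine curve: a modest cubic, and a degree-9 polynomial with as many coefficients as there are data points. The larger model drives its training error to essentially zero by memorizing the noise, but that does not translate into better predictions between the training points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training samples of an underlying sine function,
# and ten clean test points at nearby locations.
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = np.sin(2 * np.pi * x_test)

def fit_and_errors(degree):
    # Fit a polynomial of the given degree and return the mean squared
    # error on the training points and on the held-out test points.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

simple_train, simple_test = fit_and_errors(3)   # modest model
complex_train, complex_test = fit_and_errors(9) # one coefficient per data point

# The degree-9 fit passes through every training point, so its training
# error is near zero, yet its error on unseen points stays far larger.
```

The degree-9 polynomial is the analogue of an overparameterized network: it has enough capacity to fit the training data perfectly, noise and all, which is exactly why classical theory expects it to generalize poorly.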
Many theoretical researchers have tried to explain this phenomenon. A pair of researchers at CMU took a traditional approach to analyzing deep learning generalization: the idea that a model will make better predictions on new data as it is given more, and more diverse, training data. Uniform convergence is a mathematical property that many theoretical machine learning researchers use to show that models generalize well to new data when the training sample is randomly selected and sufficiently large. However, even in the simplest deep learning setting they study, “uniform convergence on this set of classifiers yields a generalization guarantee that is larger than 1 − ε and is therefore nearly vacuous.” In other words, uniform convergence fails to show that even the most basic neural network architecture can predict new data well. The publication goes on to explore many ways of invoking uniform convergence, changing or loosening restrictions and definitions along the way, but to no avail. The researchers also tried several other traditional approaches to explaining why deep learning predicts well on new data, with no success; the paper concludes that other, novel directions may be needed to produce the much-needed proof.
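For context, a uniform-convergence guarantee in its standard textbook form (stated here as background, not quoted from the paper) says that, with high probability over the random training sample, the error a model shows on its training data is close to its true error on new data, simultaneously for every model in the class:

```latex
% With probability at least 1 - \delta over a random training sample S
% of size m drawn from the data distribution \mathcal{D}:
\[
    \sup_{h \in \mathcal{H}}
    \bigl| L_{\mathcal{D}}(h) - \hat{L}_{S}(h) \bigr|
    \;\le\; \epsilon(m, \delta),
\]
% where \mathcal{H} is the class of models, L_{\mathcal{D}}(h) is the true
% error of model h on new data, \hat{L}_{S}(h) is its error on the training
% sample S, and \epsilon(m, \delta) shrinks as the sample size m grows.
```

The CMU result can be read as saying that, for the networks they analyze, the ε one obtains this way is close to 1, so the guarantee carries essentially no information.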
Another team of researchers at UC Berkeley took a completely different approach: they tried to define a different concept of generalization. Their publication constructed a new theoretical perspective on generalization and on how models remain effectively capable of learning and optimizing on their training data, even though traditional notions of generalization suggest otherwise. In their conclusion, they state that “reasons for why optimization is empirically easy must be different from the true cause of generalization”; in other words, there must be some other explanation for the high accuracy of experimental results. They propose that some factor other than traditional generalization allows neural networks to generalize.
Both publications show a similar theme: traditional approaches fail to explain the remarkable abilities of deep learning and neural networks. To get a better understanding, the second publication proposes another perspective; by looking at generalization in a different sense, we may be able to establish a clearer relationship between neural networks and accurate prediction. Although steps are being made toward better guarantees for deep learning models, researchers are still simply feeding problems into neural network architectures that produce seemingly accurate results. This is an issue, especially in unsupervised deep learning, where there is no concept of a “true” prediction; in those cases, models produce results that researchers interpret with no actual understanding. Neural networks may well be valid and, in practice, predict very well; however, I would caution against using them without fully understanding generalization, as many publications rely on technologies that only “possibly” work. Therefore, this giant question still remains: until we are able to produce another perspective on generalization, research publications will rest on this weak, uncertain foundation.
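One way to see why traditional generalization reasoning breaks down for overparameterized models is a miniature version of a randomization test, in the spirit of the Berkeley paper’s experiments (this sketch uses a simple linear model, not their actual neural network setup). With more parameters than training points, a least-squares model can fit completely random labels perfectly; since there was never any pattern to learn, its accuracy on new random labels is no better than chance.

```python
import numpy as np

rng = np.random.default_rng(0)

n_train, n_test, d = 50, 200, 100  # more parameters (d) than training points

X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = rng.choice([-1.0, 1.0], size=n_train)  # labels are pure noise
y_test = rng.choice([-1.0, 1.0], size=n_test)

# With d > n_train, a random Gaussian design matrix has full row rank,
# so some weight vector interpolates the random labels exactly; lstsq
# returns the minimum-norm such solution.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean(np.sign(X_train @ w) == y_train)
test_acc = np.mean(np.sign(X_test @ w) == y_test)

# train_acc is 1.0 (perfect memorization of noise), while test_acc
# hovers around 0.5 (chance), since there was nothing to learn.
```

The point is that a model’s ability to drive training error to zero tells us nothing by itself: the same capacity that fits real labels also fits random ones, which is why capacity-based arguments like uniform convergence struggle to separate the two cases.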
Nagarajan, V.; Kolter, Z. Uniform convergence may be unable to explain generalization in deep learning. [Internet]. 2019 [cited 2019 Mar 9]; 1-24. Available from: https://arxiv.org/abs/1902.04742
Zhang, C.; Bengio, S.; Hardt, M.; Recht, B.; Vinyals, O. Understanding Deep Learning Requires Rethinking Generalization. [Internet]. 2016 [cited 2019 Mar 9]; 1-15. Available from: https://arxiv.org/abs/1611.03530