Model
Digital Document
Publisher
Florida Atlantic University
Description
Neural network models with many tunable parameters can be trained to approximate functions that transform a source distribution, or dataset, into a target distribution of interest. In contrast to low-parameter models governed by simple equations, the dynamics of the transformations learned by deep neural networks are abstract, and the correspondence between dynamical structure and predictive function is opaque. Despite their “black box” nature, neural networks converge to functions that implement complex tasks in computer vision, Natural Language Processing (NLP), and the sciences when trained on large quantities of data. Whereas traditional machine learning approaches rely on clean datasets with appropriate features, sample densities, and label distributions to mitigate unwanted bias, modern Transformer neural networks with self-attention mechanisms use Self-Supervised Learning (SSL) to pretrain on large, unlabeled datasets scraped from the internet without concern for data quality. Models pretrained with SSL tasks have been shown to learn functions that match or outperform their supervised-learning counterparts in many fields, even without task-specific fine-tuning. The recent paradigm shift to pretraining large models with massive amounts of unlabeled data has given credibility to the hypothesis that SSL pretraining can produce functions that implement generally intelligent computations.
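As a point of reference for the self-attention mechanism named in the description, the sketch below (not taken from the work itself; the function name, projection matrices, and toy dimensions are illustrative assumptions) shows single-head scaled dot-product self-attention, the core operation of a Transformer layer, in plain NumPy.

```python
import numpy as np

def scaled_dot_product_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of token embeddings.

    x:             (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices
    Returns        (seq_len, d_head) context vectors.
    """
    q = x @ w_q                                     # queries
    k = x @ w_k                                     # keys
    v = x @ w_v                                     # values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # scaled pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ v                              # each position mixes information from all others

# Toy usage: 5 tokens, 8-dimensional embeddings, 4-dimensional attention head.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = scaled_dot_product_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (5, 4)
```

In SSL pretraining, stacks of such attention layers are trained on proxy objectives derived from the unlabeled data itself (for example, predicting masked or next tokens), so no human-provided labels are required.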