Executive Summary: Large-scale datasets are one of the driving factors behind the success of deep learning. Large, labelled datasets have enabled models to achieve close to human-level performance on applications such as visual recognition. These datasets are often carefully curated to have an equal number of training samples per class. In practice, however, many real-world computer vision tasks (e.g., autonomous vehicle navigation) have to rely on datasets that are heavily imbalanced: different classes have different numbers of samples. A few classes account for most of the training data, whereas a large number of classes contribute very few samples, leading to a long-tailed label distribution. Classes of the first kind are referred to as head or majority classes, and those of the second kind as tail or minority classes. Deep learning classifiers and generative models (e.g., Generative Adversarial Networks) trained on long-tailed datasets are often observed to generalize poorly, particularly on the minority classes: dominance of the majority classes and overfitting to the scarce minority-class samples lead to inferior performance. In summary, deep learning models trained on long-tailed datasets incur a significant gap between their performance on majority and minority classes.

The broad goal of this project is to reduce this performance gap. As part of this project, we aim to (i) study the training of deep learning classifiers and generative models on long-tailed computer vision datasets, (ii) understand the shortcomings of existing training paradigms, and (iii) based on these findings, formulate hypotheses and develop novel techniques to improve model performance. In this regard, we explore two major directions: (i) learning effective representations, and (ii) pattern sharing across classes. Note that these directions complement each other.
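To make the notion of a long-tailed label distribution concrete, the following is a minimal sketch of the exponential-decay profile commonly used in the literature to construct long-tailed versions of balanced benchmarks. The function name, parameters, and the specific decay profile are illustrative assumptions, not part of this project's stated method.

```python
# Sketch of a long-tailed per-class sample profile (assumption: an
# exponential decay from head to tail, a common construction when
# subsampling balanced benchmarks into long-tailed variants).

def long_tailed_counts(num_classes, max_samples, imbalance_ratio):
    """Per-class sample counts decaying from max_samples (head class)
    down to max_samples / imbalance_ratio (most extreme tail class)."""
    counts = []
    for c in range(num_classes):
        frac = imbalance_ratio ** (-c / (num_classes - 1))
        counts.append(int(max_samples * frac))
    return counts

# Example: 10 classes, head class with 5000 samples, imbalance ratio 100.
counts = long_tailed_counts(num_classes=10, max_samples=5000, imbalance_ratio=100)
print(counts[0], counts[-1])  # head class 5000 samples, tail class 50
```

With these counts, the head class contributes two orders of magnitude more training data than the tail class, which is exactly the regime in which majority-class dominance and minority-class overfitting arise.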
The former investigates the problem from the model-training perspective, aiming to reduce majority-class dominance. The latter explores the dataset itself for patterns shareable among classes, thereby reducing the data imbalance. Experimental evaluation requires us to work with multiple benchmark computer vision datasets tailored to be long-tailed. The effectiveness of the proposed techniques will be examined by training models on computer vision tasks such as object recognition and comparing their performance against that of existing works. Given the long-tailed nature of real-world datasets, deep learning techniques that are robust to data imbalance are of great practical importance. Deep learning is applied in diverse scientific domains, all of which are naturally prone to data imbalance; the findings of this work are therefore relevant to a broad range of practitioners. Moreover, fundamental understanding of these relatively young but sophisticated systems in the face of long-tailed training data has been limited. Hence, research and solutions in this regard are of paramount importance.
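The performance gap referred to above can be quantified by comparing average per-class accuracy on head versus tail classes. The sketch below is one simple way to do this, assuming classes are indexed in order of decreasing training frequency; the function names and the 50/50 head/tail split are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of measuring the head-vs-tail performance gap
# (assumption: class indices are ordered by training frequency,
# head classes first).

def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy computed separately for each class label."""
    accs = []
    for c in range(num_classes):
        mask = (y_true == c)
        accs.append(float((y_pred[mask] == c).mean()) if mask.any() else 0.0)
    return np.array(accs)

def head_tail_gap(y_true, y_pred, num_classes, head_fraction=0.5):
    """Mean head-class accuracy minus mean tail-class accuracy."""
    accs = per_class_accuracy(y_true, y_pred, num_classes)
    split = int(num_classes * head_fraction)
    return float(accs[:split].mean() - accs[split:].mean())
```

A model that treats all classes equally well would have a gap near zero; models trained naively on long-tailed data typically show a large positive gap, which is the quantity this project aims to reduce.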