Executive Summary: Explainability is crucial for deep learning models: post-hoc explanation methods typically provide local interpretations and cannot explain a model's global behavior. This project aims to make explainability/interpretability part of model development and to use it as a constraint or regularizer that shapes the learning process. Previous studies have not explored the relationships among words in text, which help in understanding causality and the model's decision-making process. The project plans to develop interpretable graphical models based on an annotated dataset and to optimize the task and its explanation jointly, as sketched below. A multi-task learning approach could play a significant role in training such models, since it allows multiple related tasks to benefit from each other.

The main focus is to explore the nonlinear structure of the input and to learn explanation tokens and the relations among them jointly; existing datasets contain explanation tokens but not the relations among them. To overcome the scarcity of good-quality explanation data, the project will explore learning strategies such as active learning and fidelity-weighted learning. The study aims to discover explanation tokens and apply a continuous learning setup to update the model iteratively.

Interpretability of machine learning models has multiple stakeholders: system developers need ranked lists of training examples to detect bugs and fix model errors quickly, while end users seek important features, decision paths, and the text snippets or pixels responsible for a prediction. The project will therefore design a utility function to measure the importance of training instances and a metric to quantify interpretability and the human experience.
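As a minimal sketch of how the task and its explanation could be optimized jointly, the snippet below combines a standard task loss with an explanation loss over annotated explanation tokens, weighted by a trade-off coefficient. The PyTorch setup, the choice of losses, and all names (`joint_loss`, `token_scores`, `lam`) are illustrative assumptions, not the project's actual formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(task_logits, task_labels, token_scores, token_labels, lam=0.5):
    """Hypothetical joint objective: task loss plus an explanation loss
    acting as a regularizer; `lam` balances the two terms."""
    # Standard classification loss for the end task.
    task_loss = F.cross_entropy(task_logits, task_labels)
    # Explanation loss: predicted token-importance scores vs. annotated
    # explanation tokens (binary relevance per token).
    expl_loss = F.binary_cross_entropy_with_logits(
        token_scores, token_labels.float())
    return task_loss + lam * expl_loss
```

A relation-level term over annotated token pairs could be added to the same objective in the multi-task setting, and the weight `lam` tuned or annealed per task.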