Executive Summary : | This project aims to improve the coverage of quantitative facts about existing entities in Knowledge Bases (KBs) by investigating state-of-the-art neural language modeling methods that identify quantitative attributes when a number appears nearby in the text. The main challenge lies in understanding quantities in their full context, i.e., extracting numbers together with their associated units, attributes, and surrounding context from web documents. The first step in this direction is to identify the different types of dimensions and assign an appropriate representation scheme to each. A simple rule-based method is planned for this purpose. However, similar quantities with similar units can appear with different types of entities and convey different meanings. To address this problem, the project will explore an efficient clustering method that groups numeric values and links each group to a cluster of entity-quantitative attribute pairs. Extrinsic evaluations using trivia questions and crowdsourcing will assess the effectiveness of the proposed quantity representation and the extraction module. A quantity-rich KB can support complex, analytical question-answering systems that reason with numbers. The proposed methods aim to deliver a model that comprehends language better and that can be applied across domains, such as quantity-rich medical articles. |
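As an illustration of the rule-based step mentioned in the summary, the following minimal Python sketch shows one way a number and its unit could be mapped to a dimension and a canonical representation. It is not the project's method: the unit table `UNITS`, the pattern `QUANTITY_RE`, and the function `extract_quantities` are hypothetical names introduced here, and the four units covered are an arbitrary assumption chosen to keep the example small.

```python
import re

# Hypothetical unit table (an assumption for this sketch):
# surface form -> (dimension, canonical unit, conversion factor).
UNITS = {
    "km": ("length", "m", 1000.0),
    "m": ("length", "m", 1.0),
    "kg": ("mass", "g", 1000.0),
    "g": ("mass", "g", 1.0),
}

# Try longer unit strings first so that "km" is preferred over "m".
_UNIT_ALT = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))

# A number (optionally with decimals), optional whitespace, then a known unit.
# The trailing \b stops "5 minutes" from being read as "5 m".
QUANTITY_RE = re.compile(rf"(?<![\w.])(\d+(?:\.\d+)?)\s*({_UNIT_ALT})\b")


def extract_quantities(text):
    """Return every number-unit pair in text, normalized to a canonical unit."""
    results = []
    for match in QUANTITY_RE.finditer(text):
        value = float(match.group(1))
        dimension, canonical_unit, factor = UNITS[match.group(2)]
        results.append(
            {
                "surface": match.group(0),
                "dimension": dimension,
                "value": value * factor,
                "unit": canonical_unit,
            }
        )
    return results


if __name__ == "__main__":
    sentence = "The Nile is 6650 km long; the sample weighed 2.5 kg."
    for quantity in extract_quantities(sentence):
        print(quantity)
```

A rule set like this assigns a dimension but cannot say which entity or attribute a quantity belongs to, which is the gap the clustering step described in the summary is meant to address.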