Executive Summary : | The aim of the proposed research is to design a natural-language-based retrieval system for objects in a given dataset. These objects may be generic or specific to an application. A typical application is surveillance, where persons or vehicles are the objects of interest. In particular, in many public places, both crowded and isolated, it is necessary to deploy a system that can spot a person of interest across a wide area monitored by a CCTV network. Quite often the data is analysed only after the event of interest has happened. However, searching such a vast volume of data post hoc to find the desired person is hard, more so when there are multiple cameras and long footage. In this scenario, a text description of the person can be used to identify that person in the available footage across a time span. The proposed system takes a text-based description of the person of interest and matches the feature representation of this description against the representations of the person images in the given dataset. The system then displays the images of the top-ranked matches. |
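The matching and ranking step described above can be illustrated with a minimal sketch. It assumes that the text description and every person image have already been encoded into feature vectors of the same dimension, and that cosine similarity is used as the matching score. The summary does not specify either choice, so the function names (`cosine_similarity`, `rank_matches`), the file names, and the toy vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction, treat as no match
    return dot / (norm_a * norm_b)


def rank_matches(text_features, image_features, top_k=5):
    """Return the top_k (image_id, score) pairs, best match first.

    text_features  : feature vector of the text description (assumed given)
    image_features : dict mapping image_id -> feature vector (assumed given)
    """
    scored = [
        (image_id, cosine_similarity(text_features, features))
        for image_id, features in image_features.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data only: real features would come from trained text and image encoders.
gallery = {
    "cam1_0001.jpg": [0.9, 0.1, 0.0],
    "cam2_0417.jpg": [0.1, 0.9, 0.2],
    "cam3_0099.jpg": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

for image_id, score in rank_matches(query, gallery, top_k=2):
    print(image_id, round(score, 3))
```

In a deployed system the exhaustive comparison shown here would likely be replaced by an indexed nearest-neighbour search, since multi-camera footage yields a very large gallery of person images.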