The machine learning community has recently shown increased interest in the climate and disaster-damage domain, driven by a marked rise in the occurrence of natural hazards (e.g., hurricanes, forest fires, floods, earthquakes). However, not enough attention has been devoted to mitigating probable destruction from impending natural hazards. We explore this crucial space by predicting building-level damage before the fact, which would allow state actors and non-governmental organizations to distribute resources so as to minimize or preempt losses. We introduce PreDisM, which employs an ensemble of ResNets and fully connected layers over decision trees to capture image-level and meta-level information and accurately estimate the vulnerability of man-made structures to disaster occurrences. Our model performs well, is responsive to tuning across types of disasters, and highlights the space of preemptive hazard damage modelling.
@inproceedings{anand2021predism,bibtex_show={true},abbr={NeurIPS},title={PreDisM: Pre-Disaster Modelling With CNN Ensembles for At-Risk Communities},author={<b>Anand, Vishal</b> and Miura, Yuki},booktitle={NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning},url={https://www.climatechange.ai/papers/neurips2021/53},year={2021},pdf={https://s3.us-east-1.amazonaws.com/climate-change-ai/papers/neurips2021/53/paper.pdf}}
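The abstract above describes an ensemble that pairs image-level ResNet features with fully connected layers over meta-level (tabular) features. The following is a minimal, hypothetical PyTorch sketch of how such a two-branch model could be wired; the backbone choice, feature dimensions, and late-fusion head are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: ResNet image branch + fully connected branch over
# tabular/meta features, fused into a building-level damage-risk score.
# Dimensions and the late-fusion strategy are illustrative assumptions only.
import torch
import torch.nn as nn
from torchvision import models


class DamageRiskModel(nn.Module):
    def __init__(self, num_meta_features: int = 16, num_classes: int = 2):
        super().__init__()
        # Image branch: ResNet with its classifier head removed.
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d pooled features
        self.image_branch = backbone

        # Meta branch: fully connected layers over tabular features
        # (e.g., hazard intensity, building attributes from a tree model).
        self.meta_branch = nn.Sequential(
            nn.Linear(num_meta_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )

        # Late fusion of the two branches into a damage-risk prediction.
        self.head = nn.Linear(512 + 64, num_classes)

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(image)   # (B, 512)
        meta_feat = self.meta_branch(meta)    # (B, 64)
        return self.head(torch.cat([img_feat, meta_feat], dim=1))


# Example forward pass with random tensors.
model = DamageRiskModel()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```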
Multimodal-NLP
MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding
The natural language processing community has recently shown major interest in auto-regressive [4, 13] and span-prediction based language models [7], while knowledge graphs are often referenced for common-sense reasoning and fact-checking models. In this paper, we present an equivalence representation of span-prediction based language models and knowledge graphs to better leverage recent developments in language modelling for multi-modal problem statements. Our method performed well, especially on sentiment understanding for multi-modal inputs, and uncovered potential bias in naturally occurring videos when compared with interaction understanding on movie data. We also release a dataset of an auto-generated questionnaire with ground truths, with labels spanning 120 relationships, 99 sentiments, and 116 interactions, among other labels, for finer-grained comparison of models in the community.
@inproceedings{10.1145/3474085.3479220,bibtex_show={true},abbr={Multimodal-NLP},author={<b>Anand, Vishal</b> and Ramesh, Raksha and Jin, Boshen and Wang, Ziyin and Lei, Xiaoxiao and Lin, Ching-Yung},title={MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding},year={2021},isbn={9781450386517},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3474085.3479220},booktitle={Proceedings of the 29th ACM International Conference on Multimedia},pages={4868–4872},numpages={5},pdf={https://vishalanand.net/pdf/3474085.3479220.pdf}}
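One reading of the "equivalence representation" mentioned above is that knowledge-graph triples can be verbalized as masked spans, so that a span-prediction language model scores candidate relations directly. The snippet below is a hedged, generic illustration using a pretrained masked language model; the prompt template, model choice, and candidate relation set are assumptions and not taken from the paper.

```python
# Hypothetical sketch: scoring knowledge-graph relations by casting a triple
# (subject, relation, object) as a masked span for a pretrained MLM.
# The template and candidate relation set are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")


def score_relations(subject: str, obj: str, candidate_relations: list[str]):
    """Rank candidate relations by the MLM's probability for the masked span."""
    template = f"{subject} [MASK] {obj}."
    scores = {}
    for result in fill_mask(template, targets=candidate_relations):
        scores[result["token_str"]] = result["score"]
    return sorted(scores.items(), key=lambda kv: -kv[1])


print(score_relations("the boy", "the dog", ["likes", "fears", "owns"]))
```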
Multimodal-NLP
Kinetics and Scene Features for Intent Detection
We create multi-modal fusion models to predict relational classes between entities in free-form inputs such as unseen movies. Our approach identifies information-rich features within individual sources: emotion, text attention, age, gender, and contextual background object tracking. These features are incorporated into, and contrasted against, baseline fusion architectures. These five models then highlight future research directions for the challenging problem of relational knowledge extraction from movies and free-form multi-modal input sources. We find that, in general, the Kinetics model augmented with Attributes and Objects beats the baseline models.
@inproceedings{10.1145/3395035.3425641,bibtex_show={true},abbr={Multimodal-NLP},author={Ramesh, Raksha and <b>Anand, Vishal</b> and Wang, Ziyin and Zhu, Tianle and Lyu, Wenfeng and Yuan, Serena and Lin, Ching-Yung},title={Kinetics and Scene Features for Intent Detection},year={2020},isbn={9781450380027},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3395035.3425641},doi={10.1145/3395035.3425641},booktitle={Companion Publication of the 2020 International Conference on Multimodal Interaction},pages={135–139},numpages={5},keywords={information extraction, video understanding, scene detection, computer vision, activity recognition, natural language processing, multi-modal fusion, object recognition, neural networks},location={Virtual Event, Netherlands},series={ICMI '20 Companion},pdf={https://vishalanand.net/pdf/3395035.3425641.pdf}}
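As a purely illustrative view of the fusion described above, the sketch below concatenates per-source feature vectors (e.g., Kinetics action features, attribute features, object features) and feeds them to a small classification head; the feature sources, dimensions, and two-layer head are assumed placeholders rather than the published models.

```python
# Hypothetical late-fusion sketch: concatenate per-modality features
# (Kinetics actions, attributes, objects) and classify intent/relation labels.
# Feature dimensions and the two-layer head are illustrative assumptions.
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    def __init__(self, dims: dict[str, int], num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(dims.values()), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, features: dict[str, torch.Tensor]) -> torch.Tensor:
        # Concatenate modalities in a fixed (sorted) order before classifying.
        fused = torch.cat([features[k] for k in sorted(features)], dim=-1)
        return self.head(fused)


dims = {"kinetics": 400, "attributes": 64, "objects": 128}
model = LateFusionClassifier(dims, num_classes=10)
batch = {k: torch.randn(2, d) for k, d in dims.items()}
print(model(batch).shape)  # torch.Size([2, 10])
```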
Multimodal-NLP
Story Semantic Relationships from Multimodal Cognitions
We consider the problem of building semantic relationships between unseen entities from free-form multi-modal sources. This intelligent agent understands semantic properties by (1) creating logical segments from the sources, (2) finding interacting objects, and (3) inferring their interaction actions using (4) extracted textual, auditory, visual, and tonal information. The conversational dialogue discourses are automatically mapped to interacting co-located objects and fused with their Kinetics action embeddings at each scene of occurrence. This yields a combined probability distribution over every semantic relation class for each pair of interacting entities. Using these probabilities, we create knowledge graphs capable of answering semantic queries and inferring missing properties in a given context.
@inproceedings{10.1145/3394171.3416305,bibtex_show={true},abbr={Multimodal-NLP},author={<b>Anand, Vishal</b> and Ramesh, Raksha and Wang, Ziyin and Feng, Yijing and Feng, Jiana and Lyu, Wenfeng and Zhu, Tianle and Yuan, Serena and Lin, Ching-Yung},title={Story Semantic Relationships from Multimodal Cognitions},year={2020},isbn={9781450379885},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3394171.3416305},booktitle={Proceedings of the 28th ACM International Conference on Multimedia, },pages={4650–4654},numpages={5},pdf={https://vishalanand.net/pdf/3394171.3416305.pdf}}
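A minimal sketch of the final step described above, turning per-entity-pair probability distributions over relation classes into a queryable knowledge graph; the toy data, arg-max edge policy, and networkx representation are illustrative assumptions only.

```python
# Hypothetical sketch: build a knowledge graph from per-entity-pair
# probability distributions over semantic relation classes, then answer
# simple queries. The arg-max edge policy is an illustrative assumption.
import networkx as nx

# Toy per-scene outputs: (subject, object) -> {relation: probability}
pair_distributions = {
    ("Alice", "Bob"): {"friend_of": 0.7, "sibling_of": 0.2, "colleague_of": 0.1},
    ("Bob", "Cafe"): {"works_at": 0.6, "visits": 0.4},
}

graph = nx.DiGraph()
for (subj, obj), dist in pair_distributions.items():
    relation, prob = max(dist.items(), key=lambda kv: kv[1])
    graph.add_edge(subj, obj, relation=relation, confidence=prob)

# Simple semantic query: which relations does Bob participate in?
for u, v, data in graph.edges(data=True):
    if "Bob" in (u, v):
        print(u, data["relation"], v, f"(p={data['confidence']:.2f})")
```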
LREC, NLP
MultiSeg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios
Sarioglu Kayi, Efsun *, Anand, Vishal *, and Muresan, Smaranda
In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), May 2020
Distributed word embeddings have become ubiquitous in natural language processing, as they have been shown to improve performance in many semantic and syntactic tasks. Popular models for learning cross-lingual word embeddings do not consider the morphology of words. We propose an approach to learning bilingual embeddings using parallel data and subword information expressed in various forms, i.e., character n-grams, morphemes obtained by unsupervised morphological segmentation, and byte pair encoding. We report results for three low-resource, morphologically rich languages (Swahili, Tagalog, and Somali) and a high-resource language (German) in a simulated low-resource scenario. Our results show that our method, which leverages subword information, outperforms the model without subword information in both intrinsic and extrinsic evaluations of the learned embeddings. Specifically, analogy-reasoning results show that using subwords helps capture syntactic characteristics, while word-similarity (semantic) and word-translation (intrinsic) results demonstrate superior performance over existing methods. Finally, qualitative analysis also shows better-quality cross-lingual embeddings, particularly for morphological variants in both languages.
@inproceedings{sarioglu-kayi-etal-2020-multiseg,bibtex_show={true},abbr={LREC, NLP},title={{M}ulti{S}eg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios},author={Sarioglu Kayi, Efsun * and <b>Anand, Vishal *</b> and Muresan, Smaranda},booktitle={Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), },month=may,year={2020},address={Marseille, France},publisher={European Language Resources association},url={https://aclanthology.org/2020.sltu-1.13},pages={97--105},language={English},isbn={979-10-95546-35-1},pdf={https://aclanthology.org/2020.sltu-1.13.pdf}}
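To make the subword idea concrete, the snippet below shows a common way to compose a word vector from hashed character n-gram vectors (fastText-style), so that morphological variants share representation; the bucket count, n-gram range, and averaging are generic illustrative choices, not the MultiSeg training procedure.

```python
# Hypothetical sketch: compose a word embedding from character n-gram
# embeddings (fastText-style). Bucket count, n-gram range, and averaging
# are generic illustrative choices, not the MultiSeg implementation.
import numpy as np

RNG = np.random.default_rng(0)
NUM_BUCKETS, DIM = 100_000, 100
subword_vectors = RNG.standard_normal((NUM_BUCKETS, DIM)).astype(np.float32)


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def word_vector(word: str) -> np.ndarray:
    """Average the (hashed) n-gram vectors to get a word representation."""
    # Note: Python's built-in hash() is salted per run; a stable hash
    # (e.g., FNV) would be used in practice.
    ids = [hash(g) % NUM_BUCKETS for g in char_ngrams(word)]
    return subword_vectors[ids].mean(axis=0)


# Morphological variants share n-grams, so their vectors end up close.
v1, v2 = word_vector("kitabu"), word_vector("vitabu")   # Swahili: book / books
cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cos:.3f}")
```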
2019
Patent
Systems and methods for preventing fraud
Anand, Vishal
In 152 PCT countries. Filed Aug 2017, Published Feb 2019
A computer-implemented method of fraud detection comprising receiving a user identification, a standard authentication key, and an alternative authentication key associated with a user. The method includes storing the standard and alternative authentication keys in a user profile associated with the user identification, and storing a contingent action corresponding to the alternative authentication key. The method includes receiving an authorization request including the user identification and an authentication input, and comparing the authentication input with the standard authentication key and the alternative authentication key in the user profile. The method includes determining that the authentication input matches the alternative authentication key. Based on the determination that the authentication input matches the alternative authentication key, the method includes initiating the contingent action stored in the user profile corresponding to the alternative authentication key. The method may include determining if the authorization request matches a third-party fraud alert.
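Purely as an illustration of the claimed flow, the sketch below matches an authentication input against a standard key versus an alternative key and triggers a stored contingent action on the latter; all names, the hashing scheme, and the example action are hypothetical and not part of the patent.

```python
# Hypothetical sketch of the claimed flow: a standard key authenticates
# normally, while an alternative key silently triggers a stored contingent
# action (e.g., notifying a fraud team). Names and hashing are illustrative.
import hashlib
from dataclasses import dataclass


def _digest(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


@dataclass
class UserProfile:
    user_id: str
    standard_key_hash: str
    alternative_key_hash: str
    contingent_action: str          # e.g., "notify_fraud_team"


def authorize(profile: UserProfile, authentication_input: str) -> str:
    digest = _digest(authentication_input)
    if digest == profile.standard_key_hash:
        return "approved"
    if digest == profile.alternative_key_hash:
        # Alternative key matched: initiate the stored contingent action.
        return f"approved_with_action:{profile.contingent_action}"
    return "denied"


profile = UserProfile("user-42", _digest("1234"), _digest("9999"), "notify_fraud_team")
print(authorize(profile, "1234"))   # approved
print(authorize(profile, "9999"))   # approved_with_action:notify_fraud_team
```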
2017
Patent
Security approaches for virtual reality transactions
Anand, Vishal
In 152 PCT countries. Filed Jun 2016, Published Dec 2017
One embodiment of the invention is directed to a computer-implemented method comprising receiving an indication that an avatar of a user has initiated a transaction in a virtual reality environment. The method further comprises obtaining a first biometric sample from the user interacting with the virtual reality hardware. The method further comprises generating a partial biometric template based at least in part on the first biometric sample. The method further comprises providing the partial biometric template and personal authentication information for the avatar to an authentication computer, where the personal authentication information and the partial biometric template are used to authenticate the avatar.
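As an illustration only of the claimed client/server split, the sketch below derives a partial biometric template from a sample and has an authentication routine compare just that partial representation against the enrolled template; the feature extraction, the choice of "partial" slice, and the threshold are invented placeholders, not the patented method.

```python
# Hypothetical sketch: derive a partial biometric template on the VR client
# and let an authentication server compare it against an enrolled template.
# Feature extraction, the "partial" split, and the threshold are placeholders.
import numpy as np

RNG = np.random.default_rng(7)
PARTIAL_DIMS = slice(0, 32)          # only a slice of the template leaves the device


def extract_template(biometric_sample: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor producing a fixed-length, unit-norm template."""
    return biometric_sample / (np.linalg.norm(biometric_sample) + 1e-9)


def partial_template(template: np.ndarray) -> np.ndarray:
    return template[PARTIAL_DIMS]


def authenticate(enrolled: np.ndarray, partial: np.ndarray, threshold: float = 0.9) -> bool:
    """Server-side check of the partial template against the enrolled one."""
    enrolled_part = partial_template(enrolled)
    score = float(enrolled_part @ partial) / (
        np.linalg.norm(enrolled_part) * np.linalg.norm(partial) + 1e-9)
    return score >= threshold


enrolled = extract_template(RNG.standard_normal(128))
sample = enrolled + 0.01 * RNG.standard_normal(128)       # same user, slight sensor noise
print(authenticate(enrolled, partial_template(extract_template(sample))))  # expected: True
```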
2015
Bachelor_Thesis
Learning Word Vector Representation in Multi-Task Framework
AAAI
On Optimizing Human-Machine Task Assignments
@inproceedings{veit2015optimizing,bibtex_show={true},abbr={AAAI},title={On Optimizing Human-Machine Task Assignments},author={Veit, Andreas and Wilber, Michael and Vaish, Rajan and Belongie, Serge and Davis, James and <b>Vishal Anand</b> and other authors, 38},booktitle={AAAI Conference on Human Computation and Crowdsourcing, },month=nov,year={2015},eprint={1509.07543},archiveprefix={arXiv},primaryclass={cs.HC},pdf={https://www.humancomputation.com/2015/papers/26_Paper72.pdf}}