The machine learning community has recently shown increased interest in the climate and disaster-damage domain, driven by a marked rise in the occurrence of natural hazards (e.g., hurricanes, forest fires, floods, earthquakes). However, not enough attention has been devoted to mitigating probable destruction from impending natural hazards. We explore this crucial space by predicting building-level damage before the fact, which would allow state actors and non-governmental organizations to distribute resources so as to minimize or preempt losses. We introduce PreDisM, which employs an ensemble of ResNets and fully connected layers over decision trees to capture image-level and meta-level information and accurately estimate the vulnerability of man-made structures to disaster occurrences. Our model performs well, is responsive to tuning across types of disasters, and highlights the space of preemptive hazard damage modelling.
@inproceedings{anand2021predism,bibtex_show={true},abbr={NeurIPS},title={PreDisM: Pre-Disaster Modelling With CNN Ensembles for At-Risk Communities},author={<b>Anand, Vishal</b> and Miura, Yuki},booktitle={NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning},url={https://www.climatechange.ai/papers/neurips2021/53},year={2021},pdf={https://s3.us-east-1.amazonaws.com/climate-change-ai/papers/neurips2021/53/paper.pdf}}
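The abstract above describes an ensemble that pairs image-level ResNet features with fully connected layers over meta-level (tabular) features. The following is a minimal, hypothetical PyTorch sketch of how such a two-branch model could be wired; the backbone choice, feature dimensions, and late-fusion head are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: ResNet image branch + fully connected branch over
# tabular/meta features, fused into a building-level damage-risk score.
# Dimensions and the late-fusion strategy are illustrative assumptions only.
import torch
import torch.nn as nn
from torchvision import models


class DamageRiskModel(nn.Module):
    def __init__(self, num_meta_features: int = 16, num_classes: int = 2):
        super().__init__()
        # Image branch: ResNet with its classifier head removed.
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d pooled features
        self.image_branch = backbone

        # Meta branch: fully connected layers over tabular features
        # (e.g., hazard intensity, building attributes from a tree model).
        self.meta_branch = nn.Sequential(
            nn.Linear(num_meta_features, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )

        # Late fusion of the two branches into a damage-risk prediction.
        self.head = nn.Linear(512 + 64, num_classes)

    def forward(self, image: torch.Tensor, meta: torch.Tensor) -> torch.Tensor:
        img_feat = self.image_branch(image)   # (B, 512)
        meta_feat = self.meta_branch(meta)    # (B, 64)
        return self.head(torch.cat([img_feat, meta_feat], dim=1))


# Example forward pass with random tensors.
model = DamageRiskModel()
logits = model(torch.randn(4, 3, 224, 224), torch.randn(4, 16))
print(logits.shape)  # torch.Size([4, 2])
```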
Multimodal-NLP
MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding
The natural language processing community has recently shown major interest in auto-regressive [4, 13] and span-prediction based language models [7], while knowledge graphs are often referenced for common-sense reasoning and fact-checking models. In this paper, we present an equivalence representation of span-prediction based language models and knowledge graphs to better leverage recent developments in language modelling for multi-modal problem statements. Our method performed well, especially on sentiment understanding for multi-modal inputs, and uncovered potential bias in naturally occurring videos when compared with interaction understanding on movie data. We also release a dataset of an auto-generated questionnaire with ground truths, with labels spanning 120 relationships, 99 sentiments, and 116 interactions, among other labels, for finer-grained comparison of models in the community.
@inproceedings{10.1145/3474085.3479220,bibtex_show={true},abbr={Multimodal-NLP},author={<b>Anand, Vishal</b> and Ramesh, Raksha and Jin, Boshen and Wang, Ziyin and Lei, Xiaoxiao and Lin, Ching-Yung},title={MultiModal Language Modelling on Knowledge Graphs for Deep Video Understanding},year={2021},isbn={9781450386517},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3474085.3479220},booktitle={Proceedings of the 29th ACM International Conference on Multimedia},pages={4868–4872},numpages={5},pdf={https://vishalanand.net/pdf/3474085.3479220.pdf}}
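One reading of the "equivalence representation" mentioned above is that knowledge-graph triples can be verbalized as masked spans, so that a span-prediction language model scores candidate relations directly. The snippet below is a hedged, generic illustration using a pretrained masked language model; the prompt template, model choice, and candidate relation set are assumptions and not taken from the paper.

```python
# Hypothetical sketch: scoring knowledge-graph relations by casting a triple
# (subject, relation, object) as a masked span for a pretrained MLM.
# The template and candidate relation set are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")


def score_relations(subject: str, obj: str, candidate_relations: list[str]):
    """Rank candidate relations by the MLM's probability for the masked span."""
    template = f"{subject} [MASK] {obj}."
    scores = {}
    for result in fill_mask(template, targets=candidate_relations):
        scores[result["token_str"]] = result["score"]
    return sorted(scores.items(), key=lambda kv: -kv[1])


print(score_relations("the boy", "the dog", ["likes", "fears", "owns"]))
```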
Multimodal-NLP
Kinetics and Scene Features for Intent Detection
We create multi-modal fusion models to predict relational classes between entities in free-form inputs such as unseen movies. Our approach identifies information-rich features within individual sources: emotion, text attention, age, gender, and contextual background object tracking. These features are incorporated into, and contrasted against, baseline fusion architectures. These five models then highlight future research directions for the challenging problem of relational knowledge extraction from movies and free-form multi-modal input sources. We find that, in general, the Kinetics model augmented with Attributes and Objects beats the baseline models.
@inproceedings{10.1145/3395035.3425641,bibtex_show={true},abbr={Multimodal-NLP},author={Ramesh, Raksha and <b>Anand, Vishal</b> and Wang, Ziyin and Zhu, Tianle and Lyu, Wenfeng and Yuan, Serena and Lin, Ching-Yung},title={Kinetics and Scene Features for Intent Detection},year={2020},isbn={9781450380027},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3395035.3425641},doi={10.1145/3395035.3425641},booktitle={Companion Publication of the 2020 International Conference on Multimodal Interaction},pages={135–139},numpages={5},keywords={information extraction, video understanding, scene detection, computer vision, activity recognition, natural language processing, multi-modal fusion, object recognition, neural networks},location={Virtual Event, Netherlands},series={ICMI '20 Companion},pdf={https://vishalanand.net/pdf/3395035.3425641.pdf}}
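As a purely illustrative view of the fusion described above, the sketch below concatenates per-source feature vectors (e.g., Kinetics action features, attribute features, object features) and feeds them to a small classification head; the feature sources, dimensions, and two-layer head are assumed placeholders rather than the published models.

```python
# Hypothetical late-fusion sketch: concatenate per-modality features
# (Kinetics actions, attributes, objects) and classify intent/relation labels.
# Feature dimensions and the two-layer head are illustrative assumptions.
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    def __init__(self, dims: dict[str, int], num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(sum(dims.values()), 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, features: dict[str, torch.Tensor]) -> torch.Tensor:
        # Concatenate modalities in a fixed (sorted) order before classifying.
        fused = torch.cat([features[k] for k in sorted(features)], dim=-1)
        return self.head(fused)


dims = {"kinetics": 400, "attributes": 64, "objects": 128}
model = LateFusionClassifier(dims, num_classes=10)
batch = {k: torch.randn(2, d) for k, d in dims.items()}
print(model(batch).shape)  # torch.Size([2, 10])
```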
Multimodal-NLP
Story Semantic Relationships from Multimodal Cognitions
We consider the problem of building semantic relationships between unseen entities from free-form multi-modal sources. This intelligent agent understands semantic properties by (1) creating logical segments from the sources, (2) finding interacting objects, and (3) inferring their interaction actions using (4) extracted textual, auditory, visual, and tonal information. The conversational dialogue discourses are automatically mapped to interacting co-located objects and fused with their Kinetics action embeddings at each scene of occurrence. This yields a combined probability distribution over every semantic relation class for each pair of interacting entities. Using these probabilities, we create knowledge graphs capable of answering semantic queries and inferring missing properties in a given context.
@inproceedings{10.1145/3394171.3416305,bibtex_show={true},abbr={Multimodal-NLP},author={<b>Anand, Vishal</b> and Ramesh, Raksha and Wang, Ziyin and Feng, Yijing and Feng, Jiana and Lyu, Wenfeng and Zhu, Tianle and Yuan, Serena and Lin, Ching-Yung},title={Story Semantic Relationships from Multimodal Cognitions},year={2020},isbn={9781450379885},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3394171.3416305},booktitle={Proceedings of the 28th ACM International Conference on Multimedia, },pages={4650–4654},numpages={5},pdf={https://vishalanand.net/pdf/3394171.3416305.pdf}}
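A minimal sketch of the final step described above, turning per-entity-pair probability distributions over relation classes into a queryable knowledge graph; the toy data, arg-max edge policy, and networkx representation are illustrative assumptions only.

```python
# Hypothetical sketch: build a knowledge graph from per-entity-pair
# probability distributions over semantic relation classes, then answer
# simple queries. The arg-max edge policy is an illustrative assumption.
import networkx as nx

# Toy per-scene outputs: (subject, object) -> {relation: probability}
pair_distributions = {
    ("Alice", "Bob"): {"friend_of": 0.7, "sibling_of": 0.2, "colleague_of": 0.1},
    ("Bob", "Cafe"): {"works_at": 0.6, "visits": 0.4},
}

graph = nx.DiGraph()
for (subj, obj), dist in pair_distributions.items():
    relation, prob = max(dist.items(), key=lambda kv: kv[1])
    graph.add_edge(subj, obj, relation=relation, confidence=prob)

# Simple semantic query: which relations does Bob participate in?
for u, v, data in graph.edges(data=True):
    if "Bob" in (u, v):
        print(u, data["relation"], v, f"(p={data['confidence']:.2f})")
```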
LREC, NLP
MultiSeg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios
Sarioglu Kayi, Efsun *, Anand, Vishal *, and Muresan, Smaranda
In Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), May 2020
Distributed word embeddings have become ubiquitous in natural language processing, as they have been shown to improve performance in many semantic and syntactic tasks. Popular models for learning cross-lingual word embeddings do not consider the morphology of words. We propose an approach to learning bilingual embeddings using parallel data and subword information expressed in various forms, i.e., character n-grams, morphemes obtained by unsupervised morphological segmentation, and byte pair encoding. We report results for three low-resource, morphologically rich languages (Swahili, Tagalog, and Somali) and a high-resource language (German) in a simulated low-resource scenario. Our results show that our method, which leverages subword information, outperforms the model without subword information in both intrinsic and extrinsic evaluations of the learned embeddings. Specifically, analogy-reasoning results show that using subwords helps capture syntactic characteristics, while word-similarity (semantic) and word-translation (intrinsic) results demonstrate superior performance over existing methods. Finally, qualitative analysis also shows better-quality cross-lingual embeddings, particularly for morphological variants in both languages.
@inproceedings{sarioglu-kayi-etal-2020-multiseg,bibtex_show={true},abbr={LREC, NLP},title={{M}ulti{S}eg: Parallel Data and Subword Information for Learning Bilingual Embeddings in Low Resource Scenarios},author={Sarioglu Kayi, Efsun * and <b>Anand, Vishal *</b> and Muresan, Smaranda},booktitle={Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), },month=may,year={2020},address={Marseille, France},publisher={European Language Resources association},url={https://aclanthology.org/2020.sltu-1.13},pages={97--105},language={English},isbn={979-10-95546-35-1},pdf={https://aclanthology.org/2020.sltu-1.13.pdf}}
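To make the subword idea concrete, the snippet below shows a common way to compose a word vector from hashed character n-gram vectors (fastText-style), so that morphological variants share representation; the bucket count, n-gram range, and averaging are generic illustrative choices, not the MultiSeg training procedure.

```python
# Hypothetical sketch: compose a word embedding from character n-gram
# embeddings (fastText-style). Bucket count, n-gram range, and averaging
# are generic illustrative choices, not the MultiSeg implementation.
import numpy as np

RNG = np.random.default_rng(0)
NUM_BUCKETS, DIM = 100_000, 100
subword_vectors = RNG.standard_normal((NUM_BUCKETS, DIM)).astype(np.float32)


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def word_vector(word: str) -> np.ndarray:
    """Average the (hashed) n-gram vectors to get a word representation."""
    # Note: Python's built-in hash() is salted per run; a stable hash
    # (e.g., FNV) would be used in practice.
    ids = [hash(g) % NUM_BUCKETS for g in char_ngrams(word)]
    return subword_vectors[ids].mean(axis=0)


# Morphological variants share n-grams, so their vectors end up close.
v1, v2 = word_vector("kitabu"), word_vector("vitabu")   # Swahili: book / books
cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print(f"cosine similarity: {cos:.3f}")
```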
2019
Patent
Systems and methods for preventing fraud
Anand, Vishal
In 152 PCT countries. Filed Aug 2017, Published Feb 2019
A computer-implemented method of fraud detection comprising receiving a user identification, a standard authentication key, and an alternative authentication key associated with a user. The method includes storing the standard and alternative authentication keys in a user profile associated with the user identification, and storing a contingent action corresponding to the alternative authentication key. The method includes receiving an authorization request including the user identification and an authentication input, and comparing the authentication input with the standard authentication key and the alternative authentication key in the user profile. The method includes determining that the authentication input matches the alternative authentication key. Based on the determination that the authentication input matches the alternative authentication key, the method includes initiating the contingent action stored in the user profile corresponding to the alternative authentication key. The method may include determining if the authorization request matches a third-party fraud alert.
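Purely as an illustration of the claimed flow, the sketch below matches an authentication input against a standard key versus an alternative key and triggers a stored contingent action on the latter; all names, the hashing scheme, and the example action are hypothetical and not part of the patent.

```python
# Hypothetical sketch of the claimed flow: a standard key authenticates
# normally, while an alternative key silently triggers a stored contingent
# action (e.g., notifying a fraud team). Names and hashing are illustrative.
import hashlib
from dataclasses import dataclass


def _digest(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


@dataclass
class UserProfile:
    user_id: str
    standard_key_hash: str
    alternative_key_hash: str
    contingent_action: str          # e.g., "notify_fraud_team"


def authorize(profile: UserProfile, authentication_input: str) -> str:
    digest = _digest(authentication_input)
    if digest == profile.standard_key_hash:
        return "approved"
    if digest == profile.alternative_key_hash:
        # Alternative key matched: initiate the stored contingent action.
        return f"approved_with_action:{profile.contingent_action}"
    return "denied"


profile = UserProfile("user-42", _digest("1234"), _digest("9999"), "notify_fraud_team")
print(authorize(profile, "1234"))   # approved
print(authorize(profile, "9999"))   # approved_with_action:notify_fraud_team
```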
2017
Patent
Security approaches for virtual reality transactions
Anand, Vishal
In 152 PCT countries. Filed Jun 2016, Published Dec 2017
One embodiment of the invention is directed to a computer-implemented method comprising receiving an indication that an avatar of a user has initiated a transaction in a virtual reality environment. The method further comprises obtaining a first biometric sample from the user interacting with the virtual reality hardware. The method further comprises generating a partial biometric template based at least in part on the first biometric sample. The method further comprises providing the partial biometric template and personal authentication information for the avatar to an authentication computer, where the personal authentication information and the partial biometric template are used to authenticate the avatar.
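As an illustration only of the claimed client/server split, the sketch below derives a partial biometric template from a sample and has an authentication routine compare just that partial representation against the enrolled template; the feature extraction, the choice of "partial" slice, and the threshold are invented placeholders, not the patented method.

```python
# Hypothetical sketch: derive a partial biometric template on the VR client
# and let an authentication server compare it against an enrolled template.
# Feature extraction, the "partial" split, and the threshold are placeholders.
import numpy as np

RNG = np.random.default_rng(7)
PARTIAL_DIMS = slice(0, 32)          # only a slice of the template leaves the device


def extract_template(biometric_sample: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor producing a fixed-length, unit-norm template."""
    return biometric_sample / (np.linalg.norm(biometric_sample) + 1e-9)


def partial_template(template: np.ndarray) -> np.ndarray:
    return template[PARTIAL_DIMS]


def authenticate(enrolled: np.ndarray, partial: np.ndarray, threshold: float = 0.9) -> bool:
    """Server-side check of the partial template against the enrolled one."""
    enrolled_part = partial_template(enrolled)
    score = float(enrolled_part @ partial) / (
        np.linalg.norm(enrolled_part) * np.linalg.norm(partial) + 1e-9)
    return score >= threshold


enrolled = extract_template(RNG.standard_normal(128))
sample = enrolled + 0.01 * RNG.standard_normal(128)       # same user, slight sensor noise
print(authenticate(enrolled, partial_template(extract_template(sample))))  # expected: True
```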
2015
Bachelor_Thesis
Learning Word Vector Representation in Multi-Task Framework
AAAI
On Optimizing Human-Machine Task Assignments
@inproceedings{veit2015optimizing,bibtex_show={true},abbr={AAAI},title={On Optimizing Human-Machine Task Assignments},author={Veit, Andreas and Wilber, Michael and Vaish, Rajan and Belongie, Serge and Davis, James and <b>Vishal Anand</b> and other authors, 38},booktitle={AAAI Conference on Human Computation and Crowdsourcing, },month=nov,year={2015},eprint={1509.07543},archiveprefix={arXiv},primaryclass={cs.HC},pdf={https://www.humancomputation.com/2015/papers/26_Paper72.pdf}}