Hi!

I'm a Research Fellow at Microsoft Research India, advised by Dr. Sunayana Sitaram, Dr. Satya Lokam and Dr. Navin Goyal. Currently, I am working on a) Multilingual Parameter-Efficient Finetuning of LLMs, b) Evaluation and Prevention of Catastrophic Forgetting in Finetuning of Multilingual LLMs, c) Language Adaptation and Cross-Lingual Transfer in Large Language Models, and d) Multilingual Benchmarking. My primary focus is on LLM finetuning and on extending LLM capabilities to new languages and domains using efficient and modular deep learning techniques.

Before this, I worked with Dr. Vivek Gupta from ASU, Dr. Anoop Kunchukuttan from AI4Bharat and Dr. Ashwini Vaidya from IIT Delhi on building benchmark datasets and finetuning multilingual language models for Indic languages.

I also held positions at Amex AI Labs and Builder.ai as an AI Researcher and a Data Scientist. In these roles, I focused on finetuning language models for credit and fraud risk, as well as for customer service use cases. I graduated from Delhi Technological University (formerly DCE) in 2021.

Please feel free to reach out to me by email if you have any questions about my research.

Publications

Exploring Continual Fine-Tuning for Enhancing Language Ability in Large Language Model
Divyanshu Aggarwal*, Sankarshan Damle*, Navin Goyal, Satya Lokam, Sunayana Sitaram
arXiv preprint
PDF

Improving Self Consistency in LLMs through Probabilistic Tokenization
Ashutosh Sathe*, Divyanshu Aggarwal*, Sunayana Sitaram
Accepted to NAACL 2025 Findings
PDF

Exploring Pretraining via Active Forgetting for Improving Cross Lingual Transfer for Decoder Language Models
Divyanshu Aggarwal*, Ashutosh Sathe*, Sunayana Sitaram
Accepted to NAACL 2025 Main
PDF

MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models
Divyanshu Aggarwal*, Ashutosh Sathe*, Ishaan Watts, Sunayana Sitaram
Findings of the Association for Computational Linguistics: ACL 2024
PDF

MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Sanchit Ahuja, Divyanshu Aggarwal, Varun Gumma, Ishaan Watts, Ashutosh Sathe, Millicent Ochieng, Rishav Hada, Prachi Jain, Maxamed Axmed, Kalika Bali, Sunayana Sitaram
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
PDF | Code

Evaluating Inter-Bilingual Semantic Parsing for Indian Languages
Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)
PDF | Website

XInfoTabS: Evaluating Multilingual Tabular Natural Language Inference
Bhavnick Minhas, Anant Shankhdhar, Vivek Gupta, Divyanshu Aggarwal, Shuo Zhang
Proceedings of the Fifth Fact Extraction and VERification Workshop (FEVER)
PDF | Website

IndicXNLI: Evaluating Multilingual Inference for Indian Languages
Divyanshu Aggarwal, Vivek Gupta, Anoop Kunchukuttan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
PDF | Website

A Review of Deep Learning Techniques for Protein Function Prediction
Divyanshu Aggarwal, Yasha Hasija
IEEE 2nd International Conference for Emerging Technology (INCET) 2021
PDF

Fine-tuning Distributional Semantic Models for Closely-Related Languages
Kushagra Bhatia, Divyanshu Aggarwal, Ashwini Vaidya
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
PDF

Experience

Microsoft Research Lab, India
September 2023 - Present | Full-time

-- Project VeLLM.
-- Working on finetuning multilingual LLMs, continual learning for multilingual LLMs, and language adaptation.

Amex AI Labs, India
September 2022 - September 2023 | Full-time

-- ScamBERT: finetuned a BERT model to classify call-log complaints as scam or fraud.
-- LLM Finetuning: finetuned LLaMA-2 to recommend the best response to customer support staff in web chats.

Builder.ai, India
March 2021 - May 2022 | Full-time

-- Natasha: worked on text-based recommendation and conversation orchestration for Natasha Cockpit.
-- Intent Classification: built an in-house intent classification model for identifying the intent behind customer utterances in video calls.

Updates

Jan 2025 Improving Self Consistency in LLMs through Probabilistic Tokenization has been accepted to NAACL 2025 Findings!
Jan 2025 Exploring Pretraining via Active Forgetting for Improving Cross Lingual Transfer for Decoder Language Models has been accepted to NAACL 2025 Main!
May 2024 MAPLE: Evaluating Multilingual Parameter Efficient Finetuning in Large Language Models has been accepted to Findings of ACL 2024!
Mar 2024 MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks has been accepted to NAACL 2024 Main!
Jan 2024 Preprint for MAPLE: Evaluating Multilingual Parameter Efficient Finetuning in Large Language Models is now available!
Nov 2023 Preprint for MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks is now available!
Sep 2023 I will be joining Microsoft Research India as a Research Fellow with Sunayana Sitaram!
May 2023 Our work Evaluating Inter-Bilingual Semantic Parsing for Indian Languages has been accepted to the NLP4ConvAI Workshop, co-located with ACL 2023!
Oct 2022 Our work IndicXNLI has been accepted to EMNLP 2022 Main!