Projects
Large-scale Text Summarization
Summarize the Web
Worked on Large Language Models (LLMs) for large-scale text summarization, powering products used by billions every day.
Project launched at Apple WWDC 2024.
Open-domain Question Answering
Answering All Your Questions
Developed ML/NLP models that serve the most relevant answer to your question and ensure that Siri's answers are based on the most authoritative sources.
Natural Language Understanding (NLU) for Question Answering
Developed NLU models for Question Answering in Siri
Developed NLU models for answering knowledge-seeking questions asked of Siri.
Specifically, worked on multi-task neural models for NLU.
SmartCompose
Worked on Neural Language Generation for Microsoft SmartCompose
Joint work with Chris Quirk, Peter Bailey and others at Microsoft AI Research
Developed and shipped a text-generation feature that automatically completes emails in Microsoft Outlook based on what the user has typed so far and the context of the email.
Specifically, developed neural language models for reranking and text generation, using prior context and additional signals from the email.
Search Query Entity Tagger for LinkedIn Search
Developed a CRF-based query tagger for LinkedIn Search
Before this project, LinkedIn Search used a Hidden Markov Model (HMM)-based query tagger.
I developed a vital component of the Search Query Understanding pipeline that extracts LinkedIn ecosystem entities from your search query using Conditional Random Fields (CRF). I implemented a CRF library for the LinkedIn Search Query Tagger to detect entities such as Name, Company, Title, Location, Skill, and Geo-location. To get this tagger into production, I designed and developed an end-to-end pipeline that generates the training dataset from SERP click-through chains, extracts features, trains the CRF model, and evaluates it.
These tags are used by downstream components in the Query Understanding pipeline to provide the most relevant search results to users.
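For readers unfamiliar with CRF tagging, the sketch below illustrates only the decoding step of a linear-chain CRF: given per-token emission scores and tag-to-tag transition scores, Viterbi search finds the highest-scoring tag sequence for a query. It is a minimal sketch under stated assumptions, not the production LinkedIn tagger or its CRF library. The function name `viterbi_decode`, the tag set, and all scores are hypothetical; the emission scores stand in for what a trained model would compute from extracted features, and training itself is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[i][t]      -- score of tag t at token i (hypothetical, e.g. weights . features)
    transitions[(a, b)]  -- score of moving from tag a to tag b; missing pairs score 0.0
    """
    if not emissions:
        return []

    # best[i][t]: best total score of any tag path that ends in tag t at token i
    best: List[Dict[str, float]] = [dict(emissions[0])]
    # back[i][t]: the previous tag on that best path (backpointer)
    back: List[Dict[str, str]] = [{}]

    for i in range(1, len(emissions)):
        best.append({})
        back.append({})
        for cur in tags:
            # Pick the predecessor tag that maximizes path score + transition score
            prev_tag, score = max(
                ((prev, best[i - 1][prev] + transitions.get((prev, cur), 0.0)) for prev in tags),
                key=lambda pair: pair[1],
            )
            best[i][cur] = score + emissions[i][cur]
            back[i][cur] = prev_tag

    # Start from the best final tag and follow backpointers to the first token
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical query "software engineer google" with made-up scores
    tags = ["TITLE", "COMPANY", "O"]
    emissions = [
        {"TITLE": 2.0, "COMPANY": 0.5, "O": 0.1},  # software
        {"TITLE": 2.5, "COMPANY": 0.2, "O": 0.1},  # engineer
        {"TITLE": 0.1, "COMPANY": 3.0, "O": 0.2},  # google
    ]
    transitions = {("TITLE", "TITLE"): 1.0, ("TITLE", "COMPANY"): 0.5}
    print(viterbi_decode(emissions, transitions, tags))
```

With these made-up scores the query decodes to TITLE, TITLE, COMPANY. The positive (TITLE, TITLE) transition score is one way a CRF can reward keeping multi-word titles contiguous, something a per-token classifier cannot express directly.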
Detecting Knowledge Worth Ingesting for the Bing Knowledge Graph
Developed an NLP/ML framework for Bing's Knowledge Graph that helps selectively ingest knowledge from the web
Joint work with Silviu Cucerzan at Microsoft AI Research
Worked on creating an NLP/ML framework for detecting whether information extracted from crowdsourced knowledge platforms such as Wikipedia and Reddit is worth ingesting at any given moment. This project was a crucial component of Satori, Bing's Knowledge Graph: it checked every piece of knowledge being ingested and prevented misinformation, ephemeral content, and vandalism from entering the Satori Knowledge Graph. In production, this component filters hundreds of millions of knowledge deltas, helping to selectively ingest knowledge and continuously grow the knowledge graph.
Machine-Learned Ranking for Bing and Office 365
Developed and shipped ML rankers for Bing and search in Office 365 products
You can learn more about one of these projects here:
https://blog.linkedin.com/2017/september/250/adding-linkedin_s-profile-card-on-office-365-offers-a-simple-way
CMU Never-Ending Language Learner (NELL)
Worked on a component of the NELL project
Built a Natural Language Processing framework to detect glosses in large web corpora such as Wikipedia and ClueWeb. The core of the framework consists of filters, transformations, parsers, feature extractors, samplers, and modelers in an easy-to-use, extensible design. The detected glosses enrich NELL's knowledge.
Advised by Prof. William Cohen. You can read more about this project at: http://rtw.ml.cmu.edu/rtw/
One Laptop per Child
Open-source contributor to the Sugar desktop environment for OLPC laptops
The Sugar desktop environment is developed for the One Laptop per Child project in collaboration with SugarLabs. My goal was to develop Sugar Activities that make learning fun on XO laptops.
As part of this effort, I worked on:
- Wikipedia Hindi - Wikipedia in Hindi for Sugar.
- DevelopWeb - an Activity for web development with which children can build websites using HTML, JavaScript, and other web technologies. Children can quickly learn to develop web pages step by step through examples provided for each HTML component.
- Oopsy - a Sugar Activity that lets children write, compile, and run C/C++ programs to learn, explore, and have fun!
- Project Bhagmalpur - worked with Anish Mangal, Dr. Sameer Verma, and Gonzalo Odiard to deploy an XSCE school server in Bhagmalpur, India.