Hi! 👋 I'm Arpan.

Undergraduate Researcher in NLP and GNNs


Senior in Computer Engineering (Minor in Mathematics)
University of Illinois Urbana-Champaign (UIUC)

About Me

Resume | CV | Senior Thesis | LinkedIn | GitHub | Transcript | Mail me | Google Scholar

I'm Arpandeep (Arpan) Khatua, a senior majoring in Computer Engineering with a minor in Mathematics at the University of Illinois Urbana-Champaign (UIUC). I am passionate about building systems that curb misinformation and improve healthcare. I interned at Facebook Inc. (Menlo Park, CA) in Summer 2021 and at Meta Inc. (New York City) in Summer 2022, working on NLP and full-stack development.

My research interests lie broadly in graph neural networks (GNNs) and natural language processing (NLP), with an emphasis on optimizing language models, long text generation, bias-free event understanding, and information retrieval. I completed my senior thesis on graph neural networks, advised by Prof. Wen-mei Hwu and Dr. Vikram Sharma Mailthody. I am also working on information retrieval and ranking with Prof. Kevin Chen-Chuan Chang in the Forward Data Lab.

News

  • Reviewed a paper for a conference and submitted papers! @ICWSM '23 and @KDD '23.

  • I'm joining Meta as a Software Engineer. 👨‍💻

  • Completed my senior thesis on massive GNN datasets. 🎉

Publications, Preprints and Posters

Internships 💻

Software Engineering Intern @ Meta Inc.
May 2022 - August 2022

FB Live Video Experience, managed by Mr. Gabriel Ochoa

  • Built infinite-scroll comments on WWW using React, PHP, and JavaScript, and live-poll video overviews on iOS and Android, producing a statistically significant increase in FB Watch time in A/B tests across 200M+ users.

  • Simplified the end-to-end (E2E) testing framework and documentation for internal languages used by 50+ teams. Received a "greatly exceeds expectations" (GE) performance rating.

Software Engineering Intern @ Facebook Inc.
May 2021 - August 2021

FB Shops Ranking Team, managed by Mr. Artem Zinchenko

  • Built a transformer-based product retrieval model personalized to each user's history and viewing habits. The model outperformed the production retrieval model at the time by 35%.

  • Optimized existing collection pipelines to run 600× faster using Hack, PHP, and Python. Created a new API, now used as an internal tool, that helps engineers query PyTorch models.

Research 💭

Undergraduate Research @ IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR)
August 2021 - Present

IMPACT Lab, advised by Prof. Wen-mei Hwu and Dr. Vikram Sharma Mailthody

  • Created the Illinois Graph Benchmark (IGB), the largest publicly available graph dataset, with 600M nodes and 6B edges, for supervised node classification and efficient system design, in collaboration with researchers from Amazon AWS, Deep Graph Library (DGL), NVIDIA, and IBM.

  • Combined the Microsoft Academic Graph (MAG) and Semantic Scholar databases to annotate 162× more data for supervised learning tasks, enabling emerging graph neural network (GNN) models to be tested at scale.

Undergraduate Research
January 2022 - Present

Forward Data Lab, advised by Prof. Kevin Chen-Chuan Chang

  • Created a two-stage attention-based seq2seq model that generates subtopics for a given title and outperforms state-of-the-art (SOTA) models on fine-grained attribute tagging. Filtered out noisy web-retrieved data using TextRank and trained a few-shot classifier to assign the remaining data to the dynamically generated subtopics.

IGL Fellow @ Illinois Geometry Lab
January 2022 - Present

Hildebrand Research Group, advised by Prof. AJ Hildebrand

  • Performed a large-scale statistical analysis of the first 30 billion continued fraction (CF) digits of π using chi-square tests, p-tests, and Kolmogorov–Smirnov tests, providing compelling evidence that these digits behave like those of a random real number.

  • Implemented large-scale walks in multidimensional space to compare the CF digits of π with those of random numbers, and used extreme-digit, single-digit, and z-score statistics as further evidence.

SPIN Intern @ National Center for Supercomputing Applications (NCSA)
September 2021 - May 2022

HathiTrust Digital Library, advised by Prof. J. Stephen Downie and Dr. Glen Layne-Worthey

  • Applied transfer learning to a Mask R-CNN model built with Detectron2, using the PubMed dataset, to detect five non-text classes in documents. Coupled it with a large image classification model to detect and classify over 1,000 classes of non-text objects and images in scanned documents with over 97% accuracy.

  • Used OpenCV in a preprocessing pipeline for the over 16M volumes (5B pages) in the HathiTrust Digital Library, improving runtime by 3× on a V100 GPU.

IGL Fellow @ Illinois Geometry Lab
August 2022 - Present

Social Computing Lab, advised by Prof. Jana Diesner

  • Using the Twitter API to collect 10M tweets and NLP to detect and prioritize needs during crisis events (such as the Russia–Ukraine conflict), in order to (1) extract a list of needed resources and (2) determine how those needs are fulfilled.

Undergraduate Research
October 2020 - May 2021

Koyejo Lab, advised by Prof. Sanmi Koyejo

  • Worked on a cross-departmental project using reinforcement learning (RL) to predict phenotype combinations in maize and sorghum crop genes that maximize heritability. Implemented classic local and global search algorithms, such as Particle Swarm Optimization (PSO) and Simulated Annealing (SA), as baselines.

  • Developed a multilayer perceptron (MLP) that maps wavelengths to experimental ground-truth data, improving the RL search over the multidimensional wavelength space.

Undergraduate Research @ OSF Healthcare-UIUC Jump ARCHES
June 2020 - May 2021

Health Care Engineering Systems Center (HCESC), advised by Dr. Inki Kim

  • Developed a novel operation training procedure in virtual reality (VR) using Unity, with anatomically accurate physics scripts and real-time rendering optimized to run without a GPU.

  • Implemented a reinforcement learning (RL) model to automate the operating procedure, and created a predictive model that assigns a probability of success using A*, RRT, and RRT* search over a 6-dimensional configuration space (C-space).

Undergraduate Research
September 2020 - May 2021

Caesar Lab, advised by Prof. Matthew Caesar

  • Experimented with combinations of neural network architectures for object detection (ResNets, MobileNets) and object tracking, using DeepSORT with an SSD model trained on NCSA's HAL cluster.

  • Implemented the Hungarian algorithm and a Kalman filter to track objects and predict their positions during long periods of occlusion.

Undergraduate Research @ EarthSense
October 2019 - March 2020

FRESH Lab, advised by Prof. Girish Chowdhary

  • Soldered components and worked on the power supply and circuit design for the cameras and sensors of an agricultural robot.

  • Wrote Python scripts for autonomous path planning using OpenCV. Labeled over 5,000 images and helped train a Mask R-CNN model to detect the path and its surroundings with less than 5% error.

Hackathons

Auto-grading of text extracted from PDF assignments via an OCR pipeline, using NLP in Python, with 98%+ accuracy, built in 1 week. Reduced auto-grading time by 50% using better algorithms and libraries.

Winner @ HackIllinois

Built a custom NLP model to classify text by mental health condition, plus a web page with OCR and voice-to-text functionality for easier access by patients and healthcare professionals.

Winner @ HackDuke

A job search portal that scrapes real-time information from Google and LinkedIn to help curb rising unemployment caused by COVID-19 in developing countries.

Presented @ Hex Cambridge

Education 📚

University of Illinois Urbana-Champaign


Bachelor of Science, Computer Engineering, Minor in Mathematics
GPA - 4.00/4.00 | Transcript | Senior Thesis
Dean's List, Edmund J. James Scholar

  • CS coursework: Natural Language Processing, Artificial Intelligence, Machine Learning, Deep Learning, Algorithms & Models of Computation, Data Structures Honors, Discrete Structures, Databases.

  • ECE coursework: Analog Signal Processing, Computer Systems & Programming, Digital Signal Processing, Computer Systems Engineering, Digital Systems Laboratory.

  • Math coursework: Differential Equations Plus, Fundamental Mathematics, Probability with Engineering Applications, Applied Linear Algebra, Number Theory.

Course Assistant/Staff 👨‍🏫

  • Computer Systems Engineering (ECE 391): Creating new course material and internal grading scripts, and holding office hours to help students debug their code and to provide machine problem overviews for an intensive upper-level OS kernel-building class.

  • Probability with Engineering Applications (ECE 313): Graded and provided feedback on weekly homework and exams for an upper-level probability and statistics class geared toward signal processing and control systems.

  • Analog Signal Processing (ECE 210): Graded and provided feedback on weekly homework and exams for a sophomore-level class covering circuit analysis and Fourier and Laplace transforms.

  • Intro to Electronics and Computing Honors Lab (ECE 110H/ECE 120H): Mentored freshmen and sophomores on honors-lab projects involving robotic path planning, computer vision, NLP, FPGAs, and power systems. Held office hours and discussion sections for introductory electrical and computer engineering classes.

Awards & Honors 🏆

IBM-ILLINOIS C3SR Scholar $1,000 research scholarship, 2021

The IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR) is a multi-year joint venture between IBM Research and the University of Illinois at Urbana-Champaign to solve the most pressing issues facing the new computing era of artificial intelligence (AI).

National Center for Supercomputing Applications Fellow $6,000 research scholarship, 2021

The National Center for Supercomputing Applications (NCSA) needs talented, creative Illinois undergraduate students to participate in hands-on research and development projects in areas including supercomputing, data analytics, visualization, and more. Students Pushing Innovation (SPIN) fellows work on a research project for one academic year.

O. Thomas and Martha S. Purl Scholarship $3,400 scholarship, 2022

Awarded to 2 out of 2,000 students in Electrical and Computer Engineering at UIUC. These scholarships are given by O. Thomas Purl and his wife, Martha, to recognize outstanding ECE students. Recipients are recognized at the annual spring ECE Awards and Recognition Banquet, and all student recipients are featured on the Student Honors Wall on the second floor of the ECE Building.

Illinois Engineering Achievement Scholarship $1,000 scholarship, 2022

Merit-based scholarship.

HCESC Jump ARCHES Scholar $6,000 scholarship, 2020

A collaboration among OSF HealthCare, the University of Illinois Urbana-Champaign, and the University of Illinois College of Medicine Peoria that funds interdisciplinary research between engineers, clinicians, and social scientists to improve the future of health care for all. Scholars work on a research project for one academic year.

COVID-19 Wall of Recognition in Engineering 2021

In late 2020, the Grainger College of Engineering asked its faculty and staff to nominate people who had gone “above and beyond” in their response to the COVID-19 pandemic, including students who helped in new, unexpected ways. The response was massive, just like the hours, attention, and skill the college offered in the face of a global catastrophe: hundreds of people’s names were submitted. A plaque recognizing all of them will be installed in Engineering Hall.

Illinois Geometry Lab Scholar 2022

At the lab, undergraduate students work closely with graduate students and faculty on visualization projects set forth by faculty members, and help bring mathematics to the community through school visits and other activities. IGL research is supported by the Department of Mathematics and the Office of Public Engagement at the University of Illinois at Urbana-Champaign.

PURE Scholar 2020

PURE is a student-run program that connects freshmen and sophomores with graduate students for semester-long engineering research projects.

Edmund J. James Scholar 2020

The James Scholar Honors Program in the Grainger College of Engineering recognizes the talents of academically outstanding students and promotes their curricular and co-curricular activities.

Organizations

Eta Kappa Nu Alpha Chapter (IEEE Honor Society)
Secretary and Corporate Director

  • Organized Python classes for adults at the Champaign Public Library and taught a class of 30 people. Connected with companies to host tech talks and facilitate sponsorships. Set up review sessions and office hours staffed by 40 upperclassmen to support large ECE classes.

Promoting Undergraduate Research in Engineering (PURE)
Vice President and Corporate Director

  • Promoted mentorship opportunities, attracting 3× as many graduate mentors as the previous semester. Hosted workshops to improve mentor–mentee communication and boost research output.

Recreational Interests

🎨 Painting: I enjoy painting with oil on canvas.
🎙️ Debating: I'm always down for a lively debate. I led my high school debate team and was President of the local Toastmasters International Gavel Club.
🥾 Hiking: I'm looking forward to recreating the macOS wallpaper pictures at Yosemite and Big Sur next year.
🥘 Cooking: I'm a huge food aficionado and gastronome. Unfortunately, trying too many new dishes has put a strain on my gym membership.
🍿 Watching shows: "If I can't scuba, what's this all about?"

References

Please click here for the contact information of my references.