About Me

Welcome to my professional journey! I'm Sancharika Debnath, a dynamic Data Scientist with a solid foundation in data science and backend development, and a fervent enthusiasm for leveraging cutting-edge technology to craft innovative solutions.

Graduating from Kalinga Institute of Industrial Technology University with a Bachelor of Technology in Information Technology has equipped me with a strong academic background to complement my practical skills. I'm eager to connect with professionals and organizations committed to driving meaningful change.

Bringing value that resonates: why I stand out

  • Dynamic Data Scientist with a passion for leveraging cutting-edge technology to craft innovative solutions.
  • Skilled in translating complex data into actionable insights and robust systems.
  • Led the development of event systems and custom APIs, with the event system reaching 87.4% accuracy.
  • Experienced in backend development using Python, Django, and AWS to optimize website performance and enhance user experiences.
  • Pioneered a regression invoice prediction system and conducted groundbreaking research on Hyperspectral Image Compression and Classification, achieving a remarkable accuracy of 99.8%.

Skills

Programming Languages
  • Python
  • R
  • JavaScript
  • Java
  • HTML
  • C++
  • SQL / query languages
Technical Skills
  • Machine Learning
  • Natural Language Processing (NLP)
  • Deep Learning
  • Computer Vision
  • Generative AI
  • Data Analysis
  • Autoencoder
Tools
  • Google Cloud Platform
  • Amazon Web Services
  • Microsoft Azure
  • Neo4j Graph Database
  • Tableau
  • GitHub
  • Ollama
Frameworks
  • Keras
  • TensorFlow
  • PyTorch
  • LangChain
  • Llama
  • Django
  • PySpark
  • Azure ML
  • MLflow
  • Databricks
Libraries
  • Pandas
  • NumPy
  • Scikit-Learn
  • OpenCV
  • spaCy
  • Transformers
  • Hugging Face
  • Transfer Learning
  • Pinecone
  • Cassandra

Experience

Defence Research and Development Organisation (DRDO)
January 2025 - August 2025

Graduate Apprentice (CSE)

  • Secure Offline AI Deployment: Deployed and configured Large Language Models (LLMs) in a fully offline, air-gapped environment for national security applications. Ensured complete compliance with DRDO's defense protocols and stringent data privacy regulations.
  • Manual Application-Level Setup: Installed, configured, and optimized LLMs manually at the application-folder level, bypassing cloud services and automated pipelines. Managed installation paths, dependencies, and environment configurations to ensure seamless offline operation.
  • Automated Tender Document Generation: Built a pipeline to automatically generate tender evaluation documents in LaTeX using AI models hosted locally with Ollama. Integrated OCR and PDF text-extraction tools (PyTesseract, EasyOCR, pdfplumber) to extract text from PDFs and scanned copies, processed the data through LangChain, and generated well-formatted LaTeX outputs.
  • Optimized AI for Restricted Hardware: Tuned memory allocation, inference parameters, and resource usage to run Llama 3.2 and Mistral 7B models efficiently within limited hardware constraints, without compromising accuracy or output quality.
Leapon
February 2024 - Present

Product Developer

  • Vision-Driven Product Development: Started with the idea of enhancing the end-customer experience in the service industry. Realized that empowering professionals is key to making this possible, which led to a productivity tool that helps professionals build and nurture strong business relationships.
  • System Architecture & Optimization: Designed and implemented a robust, modular system architecture that can scale effortlessly with growing user demand. Utilized advanced caching techniques and database optimization to enhance application performance and reduce latency.
  • User-Centric Development: Continuously gathered feedback through user testing and implemented iterative improvements to enhance usability. Prioritized the user experience by integrating intuitive design principles into the platform's functionality.
Giggr Technologies
June 2023 - January 2024

AI ML Engineer

  • Event System: Orchestrated the development of a dynamic event management system incorporating facial recognition for attendance tracking, real-time object identification via camera data, and GPS integration. Achieved 87.4% accuracy using OpenCV, YOLO, DeepFace, the Geolocation API, and JavaScript.
  • Neo4j Architecture Development: Collaborated with a cross-functional team to architect and implement a robust Neo4j database structure, facilitating efficient data mining operations. Leveraged Neo4j's graph database capabilities alongside REST API integration and GitHub Actions for streamlined collaboration and deployment.
  • Custom Public API Development: Spearheaded the creation of a custom public API backed by Elasticsearch, aggregating data on over 1.3 million Indian educational institutions gathered through web scraping. Used Amazon Web Services (AWS) and shell scripting to ensure scalability and performance.
  • Generative AI Stack Bot Design: Conceptualized and designed a Generative AI stack bot tailored to specific business requirements, increasing user engagement by 90%. Used Dialogflow, Firebase, and OpenAI alongside NLP and GPT for advanced conversational capabilities and Text-to-Speech integration.
  • Technological Expertise: Demonstrated proficiency across a diverse range of technologies including Amazon Web Services, Elasticsearch, shell scripting, Dialogflow, Firebase, OpenAI, and NLP, showcasing adaptability and versatility in tackling complex project requirements.
Leapon
February 2023 - June 2023

Back-end Developer

  • Customized Back-end Development: Led the development of a tailored back-end solution using Python and the Django framework, ensuring seamless functionality and performance for a website hosting over 50 advisors.
  • Django REST API Integration: Implemented a comprehensive RESTful API using Django, enabling efficient communication between the website's front-end and back-end systems. Leveraged industry-standard practices for API design to ensure interoperability, scalability, and security.
  • Continuous Integration and Continuous Deployment (CI/CD): Implemented automated testing and deployment pipelines utilizing CI/CD tools to streamline the development process. Conducted unit testing and bug fixing to optimize features, ensuring the highest level of reliability and stability throughout the development lifecycle.
  • Agile Scrum Methodology: Embraced Agile Scrum practices, including sprint planning, daily stand-ups, and iterative development cycles, to foster collaboration and adaptability within the development team. Used Git for version control and Jira for project management to facilitate efficient tracking and prioritization of tasks.
  • Feature-rich Functionality: Developed various features to enhance user experience and functionality, including:
    • Creation of digital cards for advisors, providing a personalized and professional touch to their profiles.
    • Implementation of appointment scheduling functionality similar to Google Calendar, enabling seamless booking and management of appointments.
    • Integration of mass communication/marketing email sending features, facilitating targeted outreach and engagement with clients.
    • Utilization of data analytics capabilities using Google Cloud Platform (GCP) for insightful business intelligence and decision-making.
    • Hosting the back-end infrastructure on Heroku, ensuring scalability, reliability, and security of the system.
Maersk Global Service Centres India Pvt. Ltd.
July 2022 - February 2023

Data Science Intern

  • Agile Office Seat Pre-booking Web App: Spearheaded the development of an agile web application for office seat pre-booking, built with Spring Boot and Next.js. The solution led to an 80% boost in user booking efficiency, enhancing workplace productivity and organization. Technologies used include Spring Boot for backend development, GitHub Actions for continuous integration and deployment, GitHub for version control, Anchor UI for frontend design, and PostgreSQL for database management.
  • Clustering and Estimation Models for Ports: Implemented clustering and estimation models for ports as part of Big Data analytics work. Employed techniques such as Gaussian mixture models for cost estimation, enhancing analytical capabilities for port management and optimization. Used PySpark and Databricks for large-scale data processing, alongside Microsoft Azure for cloud infrastructure and deployment.
  • Predictive Analytics for Distribution Logistics: Developed a solution approach to predict container turn time and forecast attachment ratio in distribution logistics scenarios. Leveraged quantitative metrics and machine learning techniques, with FLAML (Fast, Lightweight AutoML) identified as the optimal AutoML tool for the task. Utilized ETL (Extract, Transform, Load) processes for data preprocessing, Databricks for scalable data processing, and Azure ML for machine learning model development and deployment.
  • Technological Expertise: Demonstrated proficiency across a wide array of technologies including Spring Boot, Next.js, GitHub Actions, GitHub, Anchor UI, PostgreSQL, PySpark, Databricks, Microsoft Azure, and FLAML. This comprehensive skill set enabled the successful implementation of complex solutions spanning web development, big data analytics, and machine learning.
HighRadius Corporation
January 2022 - April 2022

Data Science Winter Intern

  • Regression Invoice Prediction System: Developed a robust regression-based invoice prediction system employing machine learning models and statistical concepts to streamline financial operations. Leveraging advanced techniques, the system accurately forecasts invoice amounts, aiding in budget planning and resource allocation.
  • Model Utilization and Accuracy: Trained the predictive model with XGBoost, a gradient boosting algorithm, achieving an accuracy of 76.15%. Additionally, used frameworks such as Keras to enhance the system's capabilities, ensuring accurate predictions and optimization of financial processes.

Projects

Featured Projects

"Genie File" is an AI-powered document assistant, a "magic lamp" for unlocking the secrets within your files. It extracts and analyzes information from diverse file formats, enabling intuitive question answering and knowledge discovery. By integrating retrieval-augmented generation techniques with vector databases and customizable knowledge graphs, it transforms raw data into actionable insights.

Generative AI, RAG, FAISS, Neo4j, LangChain

An AI-powered job hunting companion revolutionizing career advancement through resume optimization, cover letter generation, interview preparation, and personalized job recommendations.

Prompting, Hugging Face, LangChain, LLM, Generative AI

Streamlit-based conversational RAG Q&A chatbot providing real-time financial analysis of IPOs for startup companies using a vector database.

RAG, LangChain, Vector Database, Pinecone, LLM

A robust computer-aided detection system for gastrointestinal (GI) diseases utilizing a multi-model approach. Leveraging the Kvasir dataset, the system employs deep learning models for accurate detection. Additionally, the RGB images are converted to HSV and YUV color spaces to enhance performance.

Augmentation, Detectron2, Mask R-CNN, VGG19, ResNet50

Furniture classifier leveraging deep learning and transfer learning models, deployed on AWS SageMaker, achieving 83.16% accuracy in image classification.

Amazon SageMaker, S3, EC2 Instances, Deep Learning

Other Projects

A tool for generating blog content through speech recognition or text input, employing Google's generative AI models and customizable parameters for tone, style, and language.

Generative AI, Speech Recognition, Text Extraction

An advanced AI-driven summarization tool tailored for bloggers, offering effortless extraction and concise summarization of diverse document formats into engaging blog content.

Python, Streamlit, Gemini, LLM, Prompting, LangChain

Fine-tuning large language models with techniques such as LoRA for text generation tasks, enabling the models to produce coherent and contextually relevant responses to given prompts or instructions.

Quantization, Hugging Face, PEFT, LoRA, SFT Trainer

This project, part of Udacity's AI Programming with Python Nanodegree program, involves developing an image classifier using PyTorch and converting it into a command-line application. Users can then use the application to predict the class of an input image along with its probability.

PyTorch, ArgParse, Shell Scripting, VGG19, GPU

Operationalizing machine learning models on Amazon SageMaker, including setting up the initial environment, configuring S3 buckets, utilizing different SageMaker instance types for tuning, training, and hosting, employing EC2 instances for training, implementing Lambda functions for testing trained endpoints, managing permissions and IAM roles, and configuring concurrency and auto-scaling.

Amazon SageMaker, S3, EC2 Instances, Lambda Functions

Image classification using AWS SageMaker, where a pretrained CNN model (ResNet18) is fine-tuned to classify dog breeds into 133 categories based on a Dog Breeds dataset.

AWS, CNNs, ResNet, Deployment, Profiling, Hyperparameter

A basic voice assistant using speech recognition, text-to-speech conversion, and various functionalities like playing songs on YouTube, opening applications, fetching current time, and providing information about a person from Wikipedia.

Text-to-Speech, Wikipedia API, Pywhatkit, Google Speech

This Python project involves implementing various neural network models, including CNN, ANN, RBF, and a few more, from scratch to understand the intuition behind each neural network architecture.

Python, Activation Functions, NumPy, Scikit-Learn

VisionaryNet is a web application powered by ResNet50, allowing users to upload images and receive real-time identification of objects present in the image. Built with Keras and Django, the application uses predefined models trained for ImageNet classification to provide accurate object recognition.

ResNet50, Django, TensorFlow, Keras

A machine learning project focused on classifying images of cats and dogs using Convolutional Neural Networks (CNNs) implemented with the Keras Sequential API, achieving an accuracy of 98.7%.

Keras Sequential API, Tkinter, CNN, Asirra Dataset

Publication

The research paper, published in IJCISIM, proposes a novel lossy compression technique for hyperspectral images using deep convolutional networks and autoencoders, achieving superior results in compression ratio and Peak Signal-to-Noise Ratio (PSNR) compared to existing methods.

Hyperspectral Imaging, Convolutional Autoencoder, HybridSN