Role Description
We are seeking a highly experienced and innovative Data Science Lead with 8+ years of experience in core data science and at least 2 years of focused, hands-on experience in Natural Language Processing (NLP) and Generative AI (GenAI). You will lead strategic AI/ML initiatives, mentor junior data scientists, and deliver intelligent solutions that drive business value using both classical and modern machine learning techniques.
Key Responsibilities
- Lead end-to-end design and delivery of data science solutions, from problem definition to deployment.
- Design, build, and fine-tune NLP and GenAI models for tasks such as summarization, classification, question answering, translation, and chatbot applications.
- Apply statistical modeling, predictive analytics, and machine learning algorithms to structured and unstructured datasets.
- Collaborate with product, engineering, and business teams to translate high-level business problems into data science solutions.
- Ensure scalability, reproducibility, and performance optimization in all machine learning workflows.
- Work with large-scale data processing tools and frameworks in cloud-based environments.
- Mentor junior data scientists, review their work, and collaborate on research and experimentation.
- Track advancements in GenAI, LLMs, and NLP frameworks and bring innovation to enterprise AI use cases.
Qualifications
- Python: Strong proficiency in Python for data science, modeling, and scripting.
- Machine Learning: Hands-on experience with classical and ensemble models (e.g., Random Forest, XGBoost).
- NLP (2+ years): Experience with transformers, tokenization, embeddings, and sentiment analysis.
- GenAI & LLMs: Experience working with GPT-like models, fine-tuning, and prompt engineering.
- Deep Learning (PyTorch / TensorFlow): Building and training deep learning models for NLP and other domains.
- Model Deployment: Deploying models via REST APIs, Docker, or cloud-native services.
- SQL & Data Manipulation: Strong ability to query, clean, and process data.
- Statistical Analysis: Applied statistics, hypothesis testing, and A/B testing.
- Version Control (Git): Experience using Git in collaborative environments.
Optional/Nice-to-Have Skills
- Vector Databases: Experience with FAISS, Pinecone, or ChromaDB for semantic search.
- RAG Architecture: Building Retrieval-Augmented Generation pipelines.
- LLM Orchestration: LangChain, LlamaIndex, or similar frameworks.
- Cloud Platforms (Azure/GCP/AWS): Cloud-based ML workflows, pipelines, and infrastructure.
- MLOps: Model tracking, monitoring, and CI/CD with MLflow, Kubeflow, etc.
- Big Data Tools: Familiarity with Spark, Databricks, or the Hadoop ecosystem.
- Experiment Tracking: Tools like Weights & Biases and MLflow.
- Academic Research / Publications: Experience publishing whitepapers or research contributions.
- Hands-on experience with Databricks, preferably on the Azure Databricks platform.
- Hands-on experience with Delta Lake, preferably on the Azure Databricks and ADLS Gen2 platforms.
Educational Qualifications
- Master's or PhD in Computer Science, Data Science, AI/ML, Statistics, or a related field.
Certifications (preferred but not mandatory)
- Google Cloud or Azure AI Engineer / Data Scientist Associate.
- Databricks Certified Machine Learning Professional.
- DeepLearning.AI Generative AI certification.
- Hugging Face Transformers certification.