AI Engineer & Arabic NLP Researcher
ENSIA - École Nationale Supérieure d'Intelligence Artificielle
Algeria
Building production-grade NLP systems for 400M+ Arabic speakers. Focused on RAG architectures, competitive programming, and ML security. Currently researching Arabic morphological tokenization and training Algeria's national programming olympiad team.
```python
class SamirGuenchi:
    def __init__(self):
        self.role = "AI Engineer & Arabic NLP Researcher"
        self.institution = "ENSIA"
        self.location = "Algeria"
        self.focus = [
            "Arabic NLP",
            "RAG Systems",
            "Competitive Programming",
            "ML Security",
        ]

    def current_work(self):
        return {
            "research": "Arabic RAG architectures & morphological tokenization",
            "coaching": "National Programming Olympiad - Algeria",
            "building": "Production-grade NLP systems for RTL languages",
            "learning": "Advanced AI/ML at ENSIA",
            "security": "ML-powered threat detection systems",
        }

    def philosophy(self):
        return "Code that ships > Code that sits in notebooks"
```

- Arabic Natural Language Processing
- Retrieval-Augmented Generation (RAG) Systems
- Competitive Programming & Algorithm Design
- Machine Learning Security
- Medical AI & Healthcare Applications
Repository: Ministry-Regulation
Production NLP system for Arabic government documents. Implements semantic search over PDFs of 500+ pages using a RAG architecture.
Technical Implementation:
- Python + LangChain framework
- Custom Arabic embeddings and tokenization
- Vector databases for semantic retrieval
- RTL text processing pipeline
Problem Solved: Transformed information retrieval from manual PDF searching to natural language queries in Arabic.
Key Innovation: First-class Arabic language support with proper morphological handling, not post-hoc English translation.
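The retrieval step of a RAG pipeline can be illustrated without any framework. The sketch below is hypothetical and far simpler than the project's LangChain setup: sparse TF-IDF vectors stand in for the custom Arabic embeddings, the helper names (`chunk_words`, `retrieve`) are invented for illustration, and it assumes text has already been extracted from the PDFs and normalized (diacritics stripped).

```python
import math
import re
from collections import Counter

# Hypothetical sketch: sparse TF-IDF stands in for the project's custom Arabic embeddings.
TOKEN_RE = re.compile(r"\w+")  # str patterns match Unicode letters, including Arabic


def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def _weights(counts, idf):
    """Turn raw term counts into TF-IDF weights, ignoring terms unseen in the corpus."""
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def _cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """Return the k chunks most similar to the query (the 'R' in RAG)."""
    counts = [Counter(TOKEN_RE.findall(c)) for c in chunks]
    df = Counter(term for c in counts for term in c)  # document frequency per term
    n = len(chunks)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}  # smoothed IDF
    vectors = [_weights(c, idf) for c in counts]
    q = _weights(Counter(TOKEN_RE.findall(query)), idf)
    ranked = sorted(range(n), key=lambda i: _cosine(q, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:k]]
```

In a full pipeline the top-ranked chunks would be inserted into the prompt of a generator model; swapping TF-IDF for dense Arabic embeddings changes only how the vectors are built, not the ranking loop.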
Repository: Qr_Analyzer
Mobile security application with ML-based phishing detection. Analyzes QR code payloads before they are opened, to block attacks.
Technical Implementation:
- Flutter + Dart cross-platform development
- ML classification models for threat detection
- Real-time pattern recognition
- Security-first architecture
Problem Solved: Proactive threat detection before user compromise, not reactive damage control.
Key Innovation: Pre-scan analysis using machine learning to identify malicious patterns.
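Pre-scan analysis means inspecting the decoded payload without visiting it. The app itself is Flutter/Dart; the Python sketch below only illustrates the idea, and its feature set (`url_features`, `FEATURE_ORDER`) is a hypothetical example of lexical signals that phishing-URL classifiers commonly use, not Qr_Analyzer's actual features or model.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical lexical features; Qr_Analyzer's real feature set and model are not shown here.
FEATURE_ORDER = (
    "is_web_url", "uses_https", "host_is_ip", "url_length",
    "host_dot_count", "host_hyphen_count", "has_at_sign", "has_punycode",
)


def url_features(payload):
    """Extract lexical features from a decoded QR payload without opening it."""
    payload = payload.strip()
    parsed = urlparse(payload)
    host = parsed.hostname or ""  # lower-cased by urllib; "" for non-URL payloads
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "is_web_url": parsed.scheme in ("http", "https") and bool(host),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        "url_length": len(payload),
        "host_dot_count": host.count("."),
        "host_hyphen_count": host.count("-"),
        "has_at_sign": "@" in parsed.netloc,  # user-info can disguise the real host
        "has_punycode": any(label.startswith("xn--") for label in host.split(".")),
    }


def to_vector(features):
    """Fixed-order numeric vector suitable as input to a trained classifier."""
    return [float(features[name]) for name in FEATURE_ORDER]
```

The vector from `to_vector` would be scored by a trained model, and the app would warn the user before any network request is made.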
Repository: BUPA-Liver-Disorder-Analysis
End-to-end machine learning pipeline for medical diagnosis, built with research-grade methodology and a focus on clinical applications.
Technical Implementation:
- scikit-learn + pandas for data processing
- Statistical validation protocols
- Cross-validation and feature engineering
- Reproducible research methodology
Problem Solved: Healthcare AI requires reproducibility and rigorous validation, not just high accuracy scores.
Key Innovation: Production-ready medical ML with proper statistical rigor and clinically oriented evaluation.
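On a small clinical dataset, stratified cross-validation keeps each fold's class balance close to the full dataset's, which makes fold scores comparable. scikit-learn provides this as `StratifiedKFold`; the dependency-free sketch below (the name `stratified_kfold` is hypothetical, not from the repository) shows the mechanism with a fixed seed for reproducibility.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with class ratios preserved in each test fold."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)  # fixed seed -> reproducible folds
    folds = [[] for _ in range(k)]
    position = 0
    for y in sorted(by_class):  # deterministic class order
        members = by_class[y]
        rng.shuffle(members)
        for i in members:  # deal indices round-robin, continuing across classes
            folds[position % k].append(i)
            position += 1
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)
```

With scikit-learn the equivalent is `StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y)`. Either way, preprocessing such as scaling should be fitted inside each training fold only, so no information from the test fold leaks into training.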
Repository: Search_Algo
Interactive educational tool for algorithm visualization. Built for competitive programming students.
Technical Implementation:
- Python with visualization libraries
- Interactive UI/UX design
- Pathfinding algorithms (A*, Dijkstra, BFS, DFS)
- Real-time step-by-step execution
Problem Solved: Abstract algorithmic concepts become concrete through visual demonstration.
Key Innovation: Teaching-optimized design based on how competitive programmers actually learn.
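Step-by-step execution maps naturally onto a Python generator: the search yields an event each time it expands a cell, and a UI can draw one event per frame. The sketch below is an illustrative A* on a 4-connected grid, not Search_Algo's code; the grid encoding (`#` for walls) and the event tuples are assumptions.

```python
import heapq


def astar_steps(grid, start, goal):
    """A* on a 4-connected grid ('#' = wall). Yields ("expand", cell) for each
    expanded cell, then ("path", [cells]) or ("path", None) if unreachable."""
    rows, cols = len(grid), len(grid[0])

    def h(cell):  # Manhattan distance: admissible and consistent for unit-cost moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # (f = g + h, g, cell)
    best_g = {start: 0}
    parent = {start: None}
    closed = set()
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell in closed:  # stale entry left behind by a cheaper push
            continue
        closed.add(cell)
        yield ("expand", cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            yield ("path", path[::-1])
            return
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    parent[nxt] = cell
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    yield ("path", None)
```

Replacing `h` with a function that always returns 0 turns this into Dijkstra's algorithm; BFS and DFS follow the same event pattern with a FIFO queue and a stack instead of the priority queue.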
View all projects: github.com/Samir-Guenchi
Context: Arabic has around 400 million speakers, yet most NLP infrastructure is built for English and then poorly adapted.
Current Work:
- Proper tokenization for agglutinative morphology
- Embeddings that understand RTL context
- RAG systems with native diacritic handling
- Models trained on Arabic, not translated English
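A minimal preprocessing sketch makes the first and third items concrete. It is an assumption-laden illustration, not the author's tokenizer: `normalize_arabic` strips diacritics and tatweel, unifies hamzated alef forms, and uses NFKC to fold presentation-form glyphs (common in PDF-extracted Arabic), while `segment` is a naive rule-based clitic splitter with hand-picked affix lists. Rules like these will over-segment some words, which is why trained segmenters such as Farasa or CAMeL Tools exist.

```python
import re
import unicodedata

DIACRITICS = re.compile("[\u064B-\u0652\u0670]")  # tanwin, short vowels, shadda, sukun, dagger alef
TATWEEL = "\u0640"
ALEF_FORMS = str.maketrans({"\u0622": "\u0627", "\u0623": "\u0627", "\u0625": "\u0627"})  # آ أ إ -> ا

# Hypothetical affix lists. (proclitic cluster, segments), longest first;
# "لل" is li- + al- with the article's alef elided in writing.
PROCLITICS = (
    ("وال", ["و", "ال"]), ("فال", ["ف", "ال"]), ("بال", ["ب", "ال"]),
    ("كال", ["ك", "ال"]), ("لل", ["ل", "ال"]), ("ال", ["ال"]),
)
# Attached pronoun suffixes, longest first; single-letter ones are skipped as too ambiguous.
ENCLITICS = ("هما", "كما", "هم", "هن", "كم", "كن", "نا", "ها")


def normalize_arabic(text):
    text = unicodedata.normalize("NFKC", text)  # folds presentation forms, e.g. lam-alef ligatures
    text = DIACRITICS.sub("", text).replace(TATWEEL, "")
    return text.translate(ALEF_FORMS)


def segment(word, min_stem=3):
    """Naive rule-based clitic split; the stem must keep at least `min_stem` letters."""
    word = normalize_arabic(word)
    prefix = []
    for cluster, parts in PROCLITICS:
        if word.startswith(cluster) and len(word) - len(cluster) >= min_stem:
            prefix, word = parts, word[len(cluster):]
            break
    suffix = []
    for enclitic in ENCLITICS:
        if word.endswith(enclitic) and len(word) - len(enclitic) >= min_stem:
            suffix, word = [enclitic], word[:-len(enclitic)]
            break
    return prefix + [word] + suffix
```

For example, `segment("والكتاب")` splits the conjunction and the article from the stem, while the `min_stem` guard keeps `الله` whole.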
Goal: Infrastructure that treats Arabic as a first-class citizen in NLP systems.
Approach: Constraint-based thinking from competitive programming applies directly to production ML systems.
Applications:
- Transformer inference optimization
- Algorithm design for scale
- System building under real-world limitations
- Performance-critical code development
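Constraint-based thinking in practice: just as a contest solution is checked against the time limit before it is written, a transformer deployment can be checked against its memory budget before it is built. The sketch below is illustrative rather than code from these projects; it uses the standard KV-cache size formula, and the configuration values are hypothetical. It ignores model weights, activations, and framework overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size=1, bytes_per_elem=2):
    """Memory for cached keys and values during autoregressive decoding.

    The leading 2 counts one key and one value tensor per layer;
    bytes_per_elem=2 corresponds to fp16/bf16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def max_batch_size(budget_bytes, **config):
    """Largest batch whose KV cache fits in `budget_bytes` (ignores weights and activations)."""
    return budget_bytes // kv_cache_bytes(batch_size=1, **config)


GIB = 1024 ** 3
# Hypothetical decoder config: 32 layers, 32 heads of dimension 128, 4096-token context.
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(kv_cache_bytes(**cfg) / GIB)                          # 2.0 (GiB per sequence)
print(max_batch_size(16 * GIB, **cfg))                      # 8 sequences in 16 GiB
print(kv_cache_bytes(**{**cfg, "num_kv_heads": 8}) / GIB)   # 0.5 with 8 shared KV heads
```

The last line shows why grouped-query attention, which shares each key/value head across several query heads, is a common lever when the cache rather than compute is the binding constraint.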
Philosophy: Code that ships beats code that sits in notebooks.
Institution: ENSIA (École Nationale Supérieure d'Intelligence Artificielle)
Program: Computer Science & AI Engineering
Location: Algeria
Focus Areas: Arabic NLP, Machine Learning, Algorithm Design
Additional Role: National Programming Olympiad Coach
Research:
- Arabic RAG architectures
- Morphological tokenization systems
- Multilingual NLP infrastructure
Development:
- Production ML systems deployment
- Open-source Arabic language tools
- Security-focused ML applications
Education:
- Training national olympiad candidates
- Algorithm optimization techniques
- Competitive programming methodology
Seeking:
- Research collaborations in Arabic NLP
- ML/NLP engineering internships
- Open-source contribution opportunities
- Hackathons and ML competitions
Professional:
- LinkedIn: linkedin.com/in/guenchi-samir
- Email: samir.guenchi@ensia.edu.dz
- GitHub: github.com/Samir-Guenchi
Competitive Programming & ML Competitions:
- Kaggle: kaggle.com/guenchisamir
- Codeforces: codeforces.com/profile/Guenchi_Samir_ia
Last Updated: January 2025