Samir-Guenchi/README.md

Samir Guenchi

AI Engineer & Arabic NLP Researcher
ENSIA - École Nationale Supérieure d'Intelligence Artificielle, Algeria


About

Building production-grade NLP systems for 400M+ Arabic speakers. Focused on RAG architectures, competitive programming, and ML security. Currently researching Arabic morphological tokenization and training Algeria's national programming olympiad team.

class SamirGuenchi:
    """Profile summary expressed as a small Python class."""

    def __init__(self):
        self.role = "AI Engineer & Arabic NLP Researcher"
        self.institution = "ENSIA"
        self.location = "Algeria"
        self.focus = [
            "Arabic NLP",
            "RAG Systems",
            "Competitive Programming",
            "ML Security",
        ]

    def current_work(self):
        """Return current activities keyed by area."""
        return {
            "research": "Arabic RAG architectures & morphological tokenization",
            "coaching": "National Programming Olympiad - Algeria",
            "building": "Production-grade NLP systems for RTL languages",
            "learning": "Advanced AI/ML at ENSIA",
            "security": "ML-powered threat detection systems",
        }

    def philosophy(self):
        """Return the guiding principle as a single string."""
        return "Code that ships > Code that sits in notebooks"

Technical Skills

AI & Machine Learning

Python TensorFlow PyTorch scikit-learn Hugging Face LangChain

Programming Languages

C++ Python Dart SQL LaTeX

Development Tools

Linux Git Docker Jupyter VS Code

Mobile & Web Development

Flutter Firebase FastAPI Streamlit

Specializations

  • Arabic Natural Language Processing
  • Retrieval-Augmented Generation (RAG) Systems
  • Competitive Programming & Algorithm Design
  • Machine Learning Security
  • Medical AI & Healthcare Applications

Projects

Ministry Regulation RAG System

Repository: Ministry-Regulation

Production NLP system for Arabic government documents. Implements semantic search over 500+ page PDFs using RAG architecture.

Technical Implementation:

  • Python + LangChain framework
  • Custom Arabic embeddings and tokenization
  • Vector databases for semantic retrieval
  • RTL text processing pipeline

Problem Solved: Transformed information retrieval from manual PDF searching to natural language queries in Arabic.

Key Innovation: First-class Arabic language support with proper morphological handling, not post-hoc English translation.
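
The retrieval step behind natural language queries can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: it swaps the custom Arabic embeddings and vector database for a plain bag-of-words cosine similarity, and `embed`, `cosine`, `retrieve`, and the sample chunks are hypothetical names and data.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: a bag-of-words count.
    # A real system would use dense Arabic embeddings in a vector database.
    return Counter(re.findall(r"\w+", text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Illustrative chunks only; not taken from any real regulation document.
chunks = [
    "شروط التسجيل في الجامعة",
    "مواعيد الامتحانات النهائية",
    "إجراءات التسجيل للطلبة الجدد",
]
top = retrieve("التسجيل في الجامعة", chunks, k=1)
```

In a full RAG pipeline, the retrieved chunks would then be passed to a language model as context for answer generation.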


QR Analyzer

Repository: Qr_Analyzer

Mobile security application with ML-based phishing detection. Analyzes QR codes before execution to prevent attacks.

Technical Implementation:

  • Flutter + Dart cross-platform development
  • ML classification models for threat detection
  • Real-time pattern recognition
  • Security-first architecture

Problem Solved: Proactive threat detection before user compromise, not reactive damage control.

Key Innovation: Pre-scan analysis using machine learning to identify malicious patterns.
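
Pre-scan analysis of a decoded QR payload might begin with feature extraction along these lines. The app itself is written in Flutter and Dart and uses trained ML models; this Python sketch only illustrates the kind of lexical features a classifier could consume, and the function name `url_features` and the feature set are assumptions, not the project's actual design.

```python
import re
from urllib.parse import urlparse


def url_features(payload: str) -> dict:
    """Extract simple lexical features from a decoded QR payload.

    The feature set is illustrative; a trained classifier would
    consume a vector like this before the link is opened.
    """
    parsed = urlparse(payload)
    host = parsed.hostname or ""
    # A raw IPv4 address in place of a domain name.
    host_is_ip = bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))
    return {
        "is_http_url": parsed.scheme in ("http", "https"),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        # Labels in front of the registered domain (rough count).
        "subdomain_count": 0 if host_is_ip else max(host.count(".") - 1, 0),
        # "user@host" syntax can disguise the real destination.
        "has_at_symbol": "@" in parsed.netloc,
        "url_length": len(payload),
    }


features = url_features("http://192.168.0.1/login?user=a")
```

The point of the design is ordering: features are computed from the payload string alone, so nothing is fetched or opened before the classifier has given a verdict.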


BUPA Liver Disorder Analysis

Repository: BUPA-Liver-Disorder-Analysis

End-to-end machine learning pipeline for medical diagnosis. Research-grade methodology with clinical application focus.

Technical Implementation:

  • scikit-learn + pandas for data processing
  • Statistical validation protocols
  • Cross-validation and feature engineering
  • Reproducible research methodology

Problem Solved: Healthcare AI requires reproducibility and rigorous validation, not just high accuracy scores.

Key Innovation: Production-ready medical ML with proper statistical rigor and clinical-grade evaluation.
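
The validation idea can be illustrated with a stratified k-fold split, which keeps the class ratio stable across folds so that a small medical dataset is not evaluated on a lucky split. The sketch below uses only the standard library to show the mechanics; in a scikit-learn project, `StratifiedKFold` would normally do this. The name `stratified_folds` and the toy labels are hypothetical and are not BUPA data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, preserving class proportions.

    Returns a list of k lists of indices. Each fold serves once as the
    held-out test set while the remaining folds form the training set.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so every fold gets a similar class mix.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Toy labels: 10 positives and 20 negatives.
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels, k=5)
```

Fixing the seed is what makes the split reproducible: rerunning the evaluation yields the same folds and therefore comparable scores.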


Search Algorithm Visualizer

Repository: Search_Algo

Interactive educational tool for algorithm visualization. Built for competitive programming students.

Technical Implementation:

  • Python with visualization libraries
  • Interactive UI/UX design
  • Pathfinding algorithms (A*, Dijkstra, BFS, DFS)
  • Real-time step-by-step execution

Problem Solved: Abstract algorithmic concepts become concrete through visual demonstration.

Key Innovation: Teaching-optimized design based on how competitive programmers actually learn.
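
Step-by-step execution of a search algorithm can be modeled with a generator that yields one cell per expansion, so a UI can redraw between steps. The following is a minimal sketch for BFS on a grid, not the repository's code; `bfs_steps` and the grid encoding (`#` for walls) are assumptions made for illustration.

```python
from collections import deque


def bfs_steps(grid, start, goal):
    """Yield each cell as BFS expands it, stopping once the goal is expanded.

    grid is a list of strings where '#' marks a wall. Yielding cells
    one at a time lets a UI redraw after every expansion step.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    visited = {start}
    while queue:
        r, c = queue.popleft()
        yield (r, c)
        if (r, c) == goal:
            return
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "#" and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc))


grid = [
    "..#",
    "..#",
    "...",
]
steps = list(bfs_steps(grid, (0, 0), (2, 2)))
```

A visualizer would call `next()` on the generator once per animation frame instead of collecting all steps into a list.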


View all projects: github.com/Samir-Guenchi


Research Focus

The Arabic NLP Challenge

Context: 400 million Arabic speakers online, yet most NLP infrastructure is built for English and then poorly adapted to Arabic.

Current Work:

  • Proper tokenization for Arabic's root-and-pattern morphology and attached clitics
  • Embeddings that understand RTL context
  • RAG systems with native diacritic handling
  • Models trained on Arabic, not translated English

Goal: Infrastructure that treats Arabic as a first-class citizen in NLP systems.
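
One concrete piece of diacritic handling is matching vocalized and unvocalized spellings of the same word at retrieval time. The sketch below shows one possible approach, assumed here for illustration rather than taken from the author's systems: index a diacritic-free key while keeping the original text for display. The names `strip_diacritics` and `build_index` are hypothetical.

```python
import re

# Arabic diacritics (tashkeel): fathatan U+064B through sukun U+0652.
DIACRITICS = re.compile("[\u064B-\u0652]")
TATWEEL = "\u0640"  # elongation character used for justification


def strip_diacritics(text: str) -> str:
    """Remove short-vowel marks and tatweel, leaving base letters intact."""
    return DIACRITICS.sub("", text).replace(TATWEEL, "")


def build_index(words):
    """Map each diacritic-free key to the original spellings seen for it."""
    index = {}
    for word in words:
        index.setdefault(strip_diacritics(word), []).append(word)
    return index


# Code points are written as escapes so the marks are unambiguous.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kaf, ta, ba, each with fatha
plain = "\u0643\u062A\u0628"                        # the same letters, unvocalized
index = build_index([vocalized, plain])
```

Both spellings land under the same key, so a query typed without diacritics still finds fully vocalized source text, while the original form remains available for display.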

Competitive Programming Methodology

Approach: Constraint-based thinking from competitive programming applies directly to production ML systems.

Applications:

  • Transformer inference optimization
  • Algorithm design for scale
  • System building under real-world limitations
  • Performance-critical code development

Philosophy: Code that ships beats code that sits in notebooks.


Education

Institution: ENSIA (École Nationale Supérieure d'Intelligence Artificielle)
Program: Computer Science & AI Engineering
Location: Algeria
Focus Areas: Arabic NLP, Machine Learning, Algorithm Design

Additional Role: National Programming Olympiad Coach


Current Objectives

Research:

  • Arabic RAG architectures
  • Morphological tokenization systems
  • Multilingual NLP infrastructure

Development:

  • Production ML systems deployment
  • Open-source Arabic language tools
  • Security-focused ML applications

Education:

  • Training national olympiad candidates
  • Algorithm optimization techniques
  • Competitive programming methodology

Seeking:

  • Research collaborations in Arabic NLP
  • ML/NLP engineering internships
  • Open-source contribution opportunities
  • Hackathons and ML competitions

Last Updated: January 2025

Pinned

  1. Search_Algo (Public)

    A collection of AI search algorithms with interactive visualizations to demonstrate their behavior and performance.

    Python

  2. portfolio (Public)

    A fully custom, no-framework portfolio built from scratch with vanilla HTML, CSS, and JavaScript. No Bootstrap, no React, no dependencies—just clean, performant code.

    HTML 2

  3. Amal (Public)

    A Cross-Platform Multilingual LLM & Retrieval-Augmented Generation System for Drug Addiction Awareness, Prevention, and Recovery Support

    TypeScript 1

  4. Tokenized-Reward-Credential-System (Public)

    A blockchain-based system for issuing, managing, and verifying tokenized rewards and digital credentials using secure smart contracts and decentralized identity.

    TypeScript

  5. ecommerce (Public)

    A modern, interactive e-commerce system

    TypeScript