  • Santa Catarina, Brazil
arthurmgraf/README.md


About Me

AI Data Engineer with 5+ years of experience bridging the gap between data engineering and AI/ML systems. Experienced in designing LLM-based solutions, multi-agent architectures, and NLP pipelines while building robust high-throughput data infrastructure using Spark, NiFi, and Kafka. Skilled in Python, SQL, and cloud platforms (AWS/GCP), with a strong focus on delivering production-ready AI features with attention to evaluation, quality, and observability.

Location:     Brazil
Experience:   5+ years in AI & Data Engineering
Focus:        LLM Integration | Data Pipelines | MLOps | GenAI

Tech Stack

AI & LLM

Data Engineering

Databases

Cloud & Infrastructure

DevOps & IaC

Visualization & BI


Highlights

AI & LLM

  • Multi-agent orchestration with CrewAI
  • RAG architecture & vector search
  • Context engineering & prompt optimization
  • Real-time tracing with LangFuse
  • AI evaluation & guardrails
  • MCP (Model Context Protocol) integration

Data Engineering

  • Near real-time pipelines (20TB/week)
  • 93% reduction in pipeline execution time
  • 87% improvement in SQL query performance
  • Serverless pipelines on GCP
  • Observability from day one

DevOps & MLOps

  • Infrastructure as Code (Terraform/Terragrunt)
  • CI/CD with quality gates
  • Automated smoke tests
  • Auto-healing pipelines
  • Kubernetes/OpenShift deployments

Leadership

  • Delivered 48 hours of training to 40+ students
  • Built 50+ analytical dashboards
  • Served 20+ manufacturers and retail brands
  • 4x faster deliveries with AI-assisted development

Experience

Senior Data Engineer @ NTT DATA (Apr 2025 - Present)

Building high-throughput data pipelines, AI-powered automation, and multi-agent systems

Data Engineer @ Neogrid (Feb 2024 - Apr 2025)

Cut data analytics costs by 60% and built AI chatbot test suites with Google Gemini

Data Analyst @ Motorista PX (Sep 2023 - Feb 2024)

Real-time analytics apps, churn analysis, NLP-based audit analysis


Pinned

  1. mrhealth-data-platform Public

    Enterprise data warehouse on GCP Free Tier — Medallion Architecture, Kimball Star Schema, event-driven ingestion, Airflow orchestration, Terraform IaC, K8s observability stack

    Python 1

  2. claude-diagram-generator Public

    Generate professional Excalidraw diagrams from your codebase using Claude Code

  3. graphmind Public

    Autonomous Knowledge Agent Platform - Agentic RAG with Knowledge Graphs, hybrid retrieval, LangGraph agents, and MCP server

    Python 1

  4. streamflow-analytics Public

    Real-time Streaming Data Platform for E-commerce Fraud Detection — Kafka (Strimzi), Flink (PyFlink), Airflow, PostgreSQL (CloudNativePG), K3s, Terraform/Terragrunt. Medallion Architecture, 5 fraud …

    Python

  5. nifi-oilgas-monitoring Public

    Production-grade Apache NiFi 2.8 streaming platform for real-time IoT sensor monitoring of offshore oil & gas platforms. 10 NiFi Process Groups, Kafka, TimescaleDB, Grafana, Python anomaly detection.

    Python

  6. python-automation Public

    Collection of Python automation scripts for productivity, screen automation, and workflow optimization

    Python