Hi there, I'm Johanes Alexander 👋

I'm a Data & AI Architect with deep specialization in agentic AI systems and large-scale data platforms. I design end-to-end solutions at the intersection of data engineering and autonomous AI, building multi-agent architectures, real-time pipelines, and cloud-native systems that scale.


🤖 Current Focus: AI & Agentic Systems

I'm actively exploring the intersection of data engineering and AI agents, developing customized data science agents that combine traditional data processing with intelligent automation. My work focuses on:

  • Data Agents Development: Creating intelligent agents for data processing and analysis.
  • Agentic Workflows: Designing autonomous systems for data pipeline management.
  • AI-Powered Data Solutions: Integrating LLMs with traditional data engineering patterns.

🔬 Current Project: Developing production-ready AI agents for BigQuery analytics that combine ADK, MCP, and BQML with RAG-enhanced documentation retrieval.


🎀 Community Contributions & Speaking

As a thought leader in data analytics and AI, I actively contribute to the developer community through speaking engagements and knowledge sharing.

Recent Speaking Engagements

Talk: "Unleash the Power of Generative AI in BigQuery with Colab Data Science Agents and BigFrames"

Demonstrated practical applications of Generative AI in BigQuery, showing how to use Colab Data Science Agents and BigFrames for advanced data analytics workflows. Explored how AI-powered tools integrate with BigQuery so that data scientists and analysts can build intelligent data processing pipelines with natural language interfaces and automated insight generation.

Key Topics: Generative AI, BigQuery, Colab Data Science Agents, BigFrames, Data & AI workshops

Talk: "Metadata: The Key to Unlocking Data Analytics in the Agentic Era"

Presented insights on Google Cloud's latest data analytics innovations from Next '25, focusing on AI integration with BigQuery and the crucial role of metadata in enabling AI agents. Covered specialized AI agents for various user roles, AI-assisted notebooks, and the BigQuery AI Query Engine's capabilities with both structured and unstructured data.

Key Topics: BigQuery metadata, AI agents, data governance, query optimization, autonomous data processing

GDG Monthly Meetup #10 - October 24, 2024

Talk: "Harnessing Real-Time Insights: LLM Inference for Streaming Data with SQL"

Explored practical techniques for performing real-time inference on streaming data using large language models (LLMs) and SQL. Demonstrated seamless integration of LLMs into existing application workflows, enabling real-time insights, predictions, and classifications directly within familiar SQL environments.

Key Topics: Real-time data processing, LLM integration, streaming analytics, SQL-based AI inference


🛠️ Tech Stack

Go, Python, Java, Apache Spark, BigQuery, Google Cloud, PostgreSQL, Redis, Cassandra, Docker, Kafka


🚀 Featured Projects

🤖 AI & Agentic Systems

bq-agent-app - Multi-Agent BigQuery System

An AI-powered data analysis system that combines BigQuery with Google's Agent Development Kit (ADK). It orchestrates specialized sub-agents for data retrieval, data science workflows, and BQML operations, and includes RAG corpus integration for BQML documentation plus MCP support.

Tech Stack: Python, ADK, MCP, Gemini 2.5, BigQuery, Vertex AI
Key Features: Multi-agent architecture, Python code execution, Statistical analysis, BQML with RAG, MCP integration

mcp-cr - Model Context Protocol Server

A comprehensive tutorial for deploying MCP (Model Context Protocol) servers to Google Cloud Run, featuring a zoo animal database with interactive tools. Demonstrates modern AI integration patterns with cloud-native deployment.

Tech Stack: Python, FastMCP, Google Cloud Run, Docker
Key Features: MCP server implementation, Cloud Run deployment, Interactive AI tools, RESTful API

📊 Data Engineering & Analytics

mdm-gcp - Master Data Management with AI

Production-ready MDM solution with 5-strategy AI matching for batch processing and 4-strategy real-time matching for streaming. Features vector embeddings with Gemini, fuzzy matching, business rules, and AI natural language reasoning. Unified batch and streaming architecture with BigQuery and Spanner.

Tech Stack: Python, BigQuery, Spanner, Gemini, Vertex AI Vector Search
Key Features: 5-strategy AI matching, Vector embeddings, Real-time streaming, Unified batch+streaming architecture
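
The core idea of multi-strategy matching, scoring a candidate pair with several independent strategies and combining the results, can be sketched with the standard library alone. This is a hypothetical illustration, not the mdm-gcp code: the three strategies, their weights, and the record fields are assumptions, and the toy vectors stand in for real Gemini embeddings.

```python
"""Hypothetical sketch: combine several match strategies into one score."""
import math
from difflib import SequenceMatcher


def exact_score(a: dict, b: dict) -> float:
    # Strategy 1 (business rule): exact match on a normalized email.
    return 1.0 if a["email"].strip().lower() == b["email"].strip().lower() else 0.0


def fuzzy_score(a: dict, b: dict) -> float:
    # Strategy 2: fuzzy string similarity on names, between 0.0 and 1.0.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def cosine_score(a: dict, b: dict) -> float:
    # Strategy 3: cosine similarity of embedding vectors (toy vectors here).
    dot = sum(x * y for x, y in zip(a["vec"], b["vec"]))
    norm = math.sqrt(sum(x * x for x in a["vec"])) * math.sqrt(sum(y * y for y in b["vec"]))
    return dot / norm if norm else 0.0


# Hypothetical weights; a real system would tune them on labeled pairs.
WEIGHTS = {exact_score: 0.4, fuzzy_score: 0.3, cosine_score: 0.3}


def match_score(a: dict, b: dict) -> float:
    """Weighted sum of the per-strategy scores."""
    return sum(weight * strategy(a, b) for strategy, weight in WEIGHTS.items())


if __name__ == "__main__":
    r1 = {"name": "Jon Smith", "email": "JON@example.com", "vec": [0.9, 0.1, 0.0]}
    r2 = {"name": "John Smith", "email": "jon@example.com ", "vec": [0.8, 0.2, 0.0]}
    r3 = {"name": "Alice Wong", "email": "alice@example.com", "vec": [0.0, 0.1, 0.9]}
    print(round(match_score(r1, r2), 3), round(match_score(r1, r3), 3))
```

Keeping each strategy as a separate function makes it easy to add or reweight strategies (for example, an LLM-based reasoning step) without touching the combination logic.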

data-clean-room-demo - BigQuery Data Clean Rooms

Comprehensive BigQuery Data Clean Room implementation with Analytics Hub integration. Demonstrates privacy-preserving analytics, BQML collaborative ML, and secure data sharing patterns with automated setup scripts for both DCR and DCX deployments.

Tech Stack: Python, BigQuery, Analytics Hub, BQML, Vertex AI
Key Features: Privacy-preserving analytics, BQML collaborative ML, Analytics Hub automation, Data exchange patterns
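
One common privacy-preserving rule in data clean rooms is an aggregation threshold: an aggregate is released only if enough distinct entities (the privacy unit, such as a user) contribute to it. BigQuery enforces this kind of rule through analysis rules on shared views; the sketch below only illustrates the logic in plain Python, and the field names and threshold value are hypothetical.

```python
"""Hypothetical sketch of an aggregation threshold rule."""
from collections import defaultdict


def aggregate_with_threshold(rows, group_key, privacy_unit, threshold):
    """Sum 'amount' per group, dropping groups with fewer than `threshold` distinct units."""
    totals = defaultdict(float)
    units = defaultdict(set)
    for row in rows:
        group = row[group_key]
        totals[group] += row["amount"]
        units[group].add(row[privacy_unit])
    # Release only groups backed by enough distinct privacy units.
    return {g: totals[g] for g in totals if len(units[g]) >= threshold}


rows = [
    {"region": "north", "user_id": "u1", "amount": 10.0},
    {"region": "north", "user_id": "u2", "amount": 5.0},
    {"region": "north", "user_id": "u3", "amount": 7.0},
    {"region": "south", "user_id": "u4", "amount": 99.0},
]
print(aggregate_with_threshold(rows, "region", "user_id", threshold=3))
# {'north': 22.0}
```

The "south" group is suppressed because a single user's value would otherwise be exposed directly.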

random-stuff - BigQuery Analytics Toolkit

Production-ready BigQuery tools and demos covering advanced analytics patterns. Includes FinOps cost optimization, geospatial routing, Places Insights competitive analysis, row-level and column-level security (RLS/CLS) with Dataform, Firebase Analytics integration, streaming CDC pipelines, and dbt migration workflows.

Tech Stack: Python, BigQuery, Dataform, dbt, PySpark, Jupyter
Key Features: FinOps cookbook, Geospatial analysis, Places Insights, RLS/CLS security, Streaming CDC, dbt+Spark+BQ, Firebase Analytics

random-stuff/agent_stuff - AI Agent Configs & Guides

Curated collection of AI agent configurations, coding standards, and workspace architecture guides for multi-model agentic workflows. Includes OpenClaw workspace architecture guides for Anthropic and Gemini, Google-style coding standards for AI-generated code, BigQuery data science agent prompt libraries, and opencode configuration scripts.

Tech Stack: Python, OpenClaw, Anthropic Claude, Gemini, Google Cloud
Key Features: OpenClaw workspace architecture guides (Anthropic + Gemini), Google-style AI coding standards (Python/Go/Java), BQ agent prompt library, opencode config + sync scripts, dbt migration agents

spark-hybrid-compute - Advanced Spark Integration

Comprehensive solution for Spark integration with BigLake Metastore and Apache Iceberg, supporting both Dataproc and Docker-based deployments. Demonstrates hybrid cloud computing patterns for modern data lakes.

Tech Stack: Apache Spark, BigLake, Apache Iceberg, Dataproc, Docker, Jupyter
Key Features: Hybrid cloud architecture, Iceberg table management, BigQuery integration, Multiple deployment options

bigquery-antipattern-recognition - BigQuery SQL Optimization

Enhanced fork of Google Cloud Platform's utility for identifying and rewriting common anti-patterns in BigQuery SQL. Added query grouping functionality and clustering optimization patterns for improved performance analysis.

Tech Stack: Java, BigQuery, Maven, Docker, Cloud Run, Vertex AI
Key Features: 15+ antipattern detections, AI-powered SQL rewriting, Query grouping analysis, Remote UDF deployment
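
To show what "identifying an anti-pattern" means in practice, here is a deliberately naive, regex-based sketch of two frequently cited BigQuery SQL anti-patterns: `SELECT *` and `ORDER BY` without `LIMIT`. It is not the tool's implementation, which parses SQL; the checks and warning messages below are assumptions for illustration only.

```python
"""Hypothetical regex-based sketch of two BigQuery SQL anti-pattern checks."""
import re

SELECT_STAR = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)
ORDER_BY = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)
LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)


def find_antipatterns(sql: str) -> list[str]:
    """Return human-readable warnings for a single query string."""
    warnings = []
    if SELECT_STAR.search(sql):
        warnings.append("SELECT *: scans every column; list only the columns you need")
    if ORDER_BY.search(sql) and not LIMIT.search(sql):
        warnings.append("ORDER BY without LIMIT: sorts the entire result set")
    return warnings


print(find_antipatterns("SELECT * FROM ds.events ORDER BY ts"))
print(find_antipatterns("SELECT id, ts FROM ds.events ORDER BY ts LIMIT 10"))
```

Regexes cannot tell an `ORDER BY` inside a window function from a top-level one, which is exactly why a parser-based approach is more reliable for this job.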

sheets-pyspark - Google Sheets with PySpark

Integration of Google Sheets as a data source for PySpark on Dataproc Serverless. Includes Airflow demo for scheduling notebook execution with three deployment options: PythonVirtualenvOperator, Vertex AI Custom Training, and Dataproc Serverless.

Tech Stack: Python, PySpark, Dataproc Serverless, Airflow, Google Sheets API, Jupyter
Key Features: Sheets as data source, Dataproc Serverless, Airflow orchestration, Multiple execution options

🔄 Real-Time Data Pipelines

dataflow-kafka-bq-examples - Kafka to BigQuery Streaming

Comprehensive Dataflow examples for streaming Kafka data to BigQuery. Features multi-branch processing, Beam SQL aggregations, multi-stream joins, and both custom Java pipelines and Flex Templates for different deployment scenarios.

Tech Stack: Java, Apache Beam, Kafka, Dataflow, BigQuery, Beam SQL
Key Features: Multi-branch processing, Beam SQL joins, Real-time aggregations, Flex Template deployment
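
Real-time aggregations over a stream are usually computed per window; Beam SQL expresses this with windowing functions such as `TUMBLE` in the `GROUP BY` clause. The plain-Python sketch below shows only the fixed (tumbling) window arithmetic. It is a hypothetical illustration, not code from the repository, and it ignores watermarks and late data, which Beam handles for you.

```python
"""Hypothetical sketch: fixed (tumbling) window counts per key."""
from collections import Counter


def tumbling_counts(events, window_secs):
    """Count events per (window_start, key); events are (epoch_secs, key) pairs."""
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % window_secs)  # align to the window boundary
        counts[(window_start, key)] += 1
    return dict(sorted(counts.items()))


events = [(0, "click"), (30, "view"), (59, "click"), (60, "click"), (125, "view")]
print(tumbling_counts(events, window_secs=60))
# {(0, 'click'): 2, (0, 'view'): 1, (60, 'click'): 1, (120, 'view'): 1}
```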

beam-dataflow-iceberg-bqms - Beam with Iceberg Tables

Demonstration of Apache Beam with standard BigQueryIO and Managed I/O for BigQuery operations. Showcases 8 pipeline patterns including BigQuery Iceberg and BigLake Iceberg table operations with automatic schema handling.

Tech Stack: Python, Apache Beam, Dataflow, Apache Iceberg, BigQuery, BigLake
Key Features: Managed I/O, BigQuery Iceberg tables, BigLake integration, Multiple pipeline patterns

cf-pubsub-to-bq - Real-Time Data Ingestion

Complete real-time data pipeline solution from Pub/Sub to BigQuery using Cloud Run Functions. Includes data generation, streaming processing, and automated table management.

Tech Stack: Go, Pub/Sub, BigQuery, Cloud Run Functions, Dataflow
Key Features: Real-time processing, Automated data generation, Partitioned tables, End-to-end pipeline

dataflow-pubsub-to-bq-examples-py - Pub/Sub to BigQuery Streaming

Python streaming pipeline from Pub/Sub to BigQuery using BigQuery Storage Write API. Features micro-batching, Pub/Sub metadata capture, and partitioned tables with DirectRunner and DataflowRunner V2 support.

Tech Stack: Python, Apache Beam, Dataflow, Pub/Sub, BigQuery
Key Features: Storage Write API, Micro-batching, Pub/Sub metadata capture, Runner V2 support
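
Micro-batching buffers incoming messages and writes them in groups, so each write call carries many rows instead of one. The sketch below flushes when the buffer reaches a size limit or its oldest row reaches an age limit. It is a hypothetical illustration, not the repository's pipeline (Beam offers built-in transforms such as `GroupIntoBatches` for keyed batching); the `MicroBatcher` class, its thresholds, and the injectable clock are assumptions.

```python
"""Hypothetical sketch: flush a buffer when it is full or too old."""
import time


class MicroBatcher:
    def __init__(self, max_rows, max_age_secs, flush_fn, clock=time.monotonic):
        self.max_rows = max_rows
        self.max_age_secs = max_age_secs
        self.flush_fn = flush_fn  # e.g. a function that writes a list of rows to a sink
        self.clock = clock        # injectable so the age check can be tested
        self.buffer = []
        self.first_added = None

    def add(self, row):
        if not self.buffer:
            self.first_added = self.clock()
        self.buffer.append(row)
        too_old = self.clock() - self.first_added >= self.max_age_secs
        if len(self.buffer) >= self.max_rows or too_old:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []  # start a fresh list; the flushed one is left untouched
            self.first_added = None


batches = []
batcher = MicroBatcher(max_rows=3, max_age_secs=5.0, flush_fn=batches.append)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # drain the remainder, e.g. on shutdown
print([len(batch) for batch in batches])
# [3, 3, 1]
```

Note that the age limit is only checked when a row arrives; a production version would also need a timer so a quiet stream still flushes.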

dataflow-pubsub-perf-test - Dataflow/BigQuery Performance Testing

Test infrastructure for diagnosing the Dataflow/BigQuery "Noisy Neighbor" throughput degradation pattern. Six rounds of testing across Pub/Sub and Kafka sources (Python and Java SDKs) processed 2.2 billion rows (2.4 TB) at a peak of 901k rows/sec with zero errors. The tests confirmed linear scaling, identified a shared Kafka consumer group as the root cause of the production degradation, and sustained throughput above the BigQuery Storage Write API regional quota.

Tech Stack: Java, Python, Apache Beam, Dataflow, Pub/Sub, Kafka (Google Managed), BigQuery Storage Write API
Key Features: 2.2B rows / 2.4 TB scale testing, 901k rows/sec peak throughput, Noisy Neighbor root-cause diagnosis, Multi-source testing (Pub/Sub + Kafka), Python + Java SDK coverage
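
Why a shared consumer group hurts: within one Kafka consumer group, each partition is assigned to exactly one member. If two unrelated pipelines use the same `group.id`, they split the topic's partitions between them, and membership changes in either one can trigger rebalances that affect both. The simplified model below uses round-robin assignment (one of Kafka's assignment strategies); the assignor and pipeline setup in the actual tests may differ, and the pipeline names are hypothetical.

```python
"""Simplified model: partitions are split across all members of one consumer group."""


def round_robin_assign(partitions, consumers):
    """Assign each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = list(range(8))
# A pipeline alone in its group receives every partition.
print(round_robin_assign(partitions, ["pipeline_a"]))
# {'pipeline_a': [0, 1, 2, 3, 4, 5, 6, 7]}
# Two unrelated pipelines sharing a group.id each receive only half.
print(round_robin_assign(partitions, ["pipeline_a", "pipeline_b"]))
# {'pipeline_a': [0, 2, 4, 6], 'pipeline_b': [1, 3, 5, 7]}
```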

🧪 AI Experiments & Tools

gemini-cli-1c - One-Click Gemini CLI Setup

Automated one-command installation script for a complete development environment with NVM, Node.js, and Google's Gemini CLI. Streamlines developer onboarding for AI-powered workflows.

Tech Stack: Shell, Node.js, NVM, Gemini CLI
Key Features: One-command installation, Environment configuration, Developer productivity tools

vision-sandbox - Agentic Vision Tool

Agentic vision tool built as an OpenClaw skill, leveraging Gemini's native code execution sandbox for spatial grounding, visual math, and UI auditing tasks. Demonstrates OpenClaw skill architecture for vision-based agentic workflows.

Tech Stack: Python, Gemini, Google Cloud, OpenClaw
Key Features: Agentic vision analysis, Spatial grounding, Visual math, UI auditing, OpenClaw skill architecture


💼 Core Competencies

Data Architecture & Engineering

  • Big Data Processing: Apache Spark, Dataproc, distributed computing, Iceberg tables
  • Data Warehousing: BigQuery, data modeling, partitioning strategies, performance optimization
  • Real-Time Streaming: Pub/Sub, Kafka, Apache Beam, event-driven architectures
  • Database Technologies: PostgreSQL, Spanner, Redis, Cassandra
  • Master Data Management: AI-powered entity resolution, vector embeddings, multi-strategy matching

AI & Machine Learning

  • AI Agents: Multi-agent systems, agentic workflows, autonomous data processing
  • LLM Integration: Gemini AI, prompt engineering, RAG systems, AI-powered analytics
  • ML Engineering: Model deployment, MLOps, BQML, production ML systems
  • Vector Search: Semantic similarity, embeddings generation, hybrid search strategies

Cloud Architecture

  • Google Cloud Platform: Comprehensive expertise across data, AI, and compute services
  • Serverless Computing: Cloud Functions, Cloud Run, event-driven architectures
  • Infrastructure as Code: Terraform, deployment automation
  • Data Governance: Data Clean Rooms, Analytics Hub, privacy-preserving analytics

📈 GitHub Stats

[GitHub stats and top languages cards]

📫 Get in Touch


Building the future of data-driven AI systems, one agent at a time 🚀

Pinned

  1. audio-transcribe-go (Go)
  2. cf-bq-rf-gemini (Go)
  3. cf-pubsub-to-bq (Go)
  4. gemini-cli-1c (Shell)
  5. spark-hybrid-compute (Jupyter Notebook)
  6. bigquery-antipattern-recognition (Java): forked from GoogleCloudPlatform/bigquery-antipattern-recognition. Utility to identify and rewrite common anti-patterns in BigQuery SQL syntax.