An event-driven data science platform built on Google Cloud Platform (GCP) that demonstrates modern cloud engineering practices: automated data pipelines, Infrastructure as Code (IaC), and enterprise security patterns. The infrastructure is production-ready, and integration with Google's Agent Development Kit (ADK) is in progress to add conversational data science capabilities.
- 100% Infrastructure as Code - Complete automation with Terraform
- Enterprise Security - Least privilege IAM model with zero manual permissions
- Event-Driven Architecture - Real-time data processing with Cloud Functions Gen 2
- CI/CD Excellence - Automated deployments via GitHub Actions
- Production Ready - Comprehensive testing, monitoring, and validation
- AI-Ready - Prepared for Google ADK integration for intelligent workflows
The Agent Development Kit (ADK) is Google's cutting-edge framework for building intelligent AI agents that can:
- Understand Context - Process natural language requests about your data
- Take Actions - Automatically execute data operations and create ML models
- Make Decisions - Intelligently determine the best approach for each task
- Learn and Adapt - Improve responses based on interaction patterns
- Natural Language Queries: "Show me survival rates by gender and class"
- Automated Model Creation: "Create a model to predict passenger survival"
- Intelligent Insights: "What factors most influenced survival?"
- Real-time Analysis: Instant answers without writing SQL
- No Code Required: Chat interface for complex data operations
- Instant Insights: Get answers in seconds, not hours
- Automated Reports: "Generate a survival analysis report"
- Predictive Analytics: "What would happen if..."
- API Integration: Connect ADK agents to any application
- Custom Tools: Extend agent capabilities with your own functions
- Workflow Automation: Chain multiple operations intelligently
- Version Control: Track all agent interactions and decisions
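As an illustration of the "Custom Tools" item above, here is a minimal sketch of a plain Python function that an agent could call to answer "Show me survival rates by gender and class". It only builds the SQL text; running it against BigQuery and registering it with an ADK agent are left out. The table name, the `Survived` column, and the allowed grouping columns are assumptions made for the example, not names confirmed by this project.

```python
from typing import Dict, List

# Hypothetical table name; replace with the dataset the pipeline loads into.
TABLE = "your-gcp-project-id.titanic.passengers"

# Only these columns may be used for grouping, so free-form user text
# never reaches the SQL string. The column names are assumed.
ALLOWED_GROUP_COLUMNS = ("Sex", "Pclass", "Embarked")


def build_survival_rate_query(group_by: List[str]) -> Dict[str, str]:
    """Build a SQL statement for survival rates grouped by the given columns."""
    if not group_by:
        return {"status": "error", "message": "group_by must name at least one column"}
    unknown = [col for col in group_by if col not in ALLOWED_GROUP_COLUMNS]
    if unknown:
        return {"status": "error", "message": f"unsupported columns: {', '.join(unknown)}"}
    columns = ", ".join(group_by)
    sql = (
        f"SELECT {columns}, COUNT(*) AS passengers, "
        f"ROUND(AVG(Survived), 3) AS survival_rate "
        f"FROM `{TABLE}` GROUP BY {columns} ORDER BY {columns}"
    )
    return {"status": "success", "sql": sql}
```

Returning a small dictionary with a `status` field gives the agent something it can describe back to the user when a request names a column the tool does not support.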
graph LR
A[CSV Upload] --> B[Cloud Storage]
B --> C[Event Trigger]
C --> D[Cloud Function Gen 2]
D --> E[Data Processing]
E --> F[BigQuery]
F --> G[Analytics & ML]
H[GitHub Push] --> I[GitHub Actions]
I --> J[Terraform]
J --> K[GCP Infrastructure]
style A fill:#e1f5fe
style F fill:#c8e6c9
style G fill:#fff3e0
style I fill:#f3e5f5
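The "Event Trigger" step in the pipeline above can be sketched as a small filter that decides whether an uploaded object should be processed. This is a stdlib-only sketch, not the project's actual function: it assumes the Cloud Storage event payload carries `bucket` and `name` fields, and the helper name `csv_uri_from_event` is hypothetical. In a deployed Gen 2 function this logic would sit inside a CloudEvent handler, with the BigQuery load step following it.

```python
from typing import Optional


def csv_uri_from_event(event_data: dict) -> Optional[str]:
    """Return the gs:// URI of an uploaded CSV, or None if the object is skipped."""
    bucket = event_data.get("bucket")
    name = event_data.get("name")
    # Ignore malformed events that lack a bucket or object name.
    if not bucket or not name:
        return None
    # Only CSV uploads enter the pipeline; everything else is ignored.
    if not name.lower().endswith(".csv"):
        return None
    return f"gs://{bucket}/{name}"
```

Keeping the filter as a pure function makes it testable without any GCP credentials or emulators.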
graph TD
U[User] -->|Natural Language| A[ADK Agent]
A -->|Analyzes Intent| B{Decision Engine}
B -->|Data Query| C[BigQuery]
B -->|Create Model| D[AutoML]
B -->|Generate Report| E[Visualization]
C --> F[Results]
D --> F
E --> F
F -->|Natural Language| U
style A fill:#FF6F00,color:#fff
style B fill:#4285F4,color:#fff
style F fill:#34A853,color:#fff
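The "Decision Engine" node in the diagram above routes each request to one of three branches. In an ADK agent that choice would normally be made by the language model selecting a tool, so the keyword matcher below is only a deterministic stand-in to make the routing concrete. The route names and keyword lists are assumptions for illustration.

```python
# Each route lists keyword fragments that suggest that branch.
# Order matters: the first matching route wins.
ROUTES = (
    ("create_model", ("model", "predict", "train")),
    ("generate_report", ("report", "visualiz", "chart")),
)


def route(request: str) -> str:
    """Pick a branch for a natural language request; default to a data query."""
    text = request.lower()
    for target, keywords in ROUTES:
        if any(keyword in text for keyword in keywords):
            return target
    return "data_query"
```

With these rules, "Create a model to predict survival probability" goes to `create_model`, "Generate a report on factors affecting survival" goes to `generate_report`, and anything else falls through to `data_query`.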
You: "What's the average age of survivors vs non-survivors?"
ADK Agent: "Based on the Titanic dataset analysis:
- Survivors: Average age 28.3 years
- Non-survivors: Average age 30.6 years
- Younger passengers had slightly better survival rates
Would you like me to break this down by passenger class?"
You: "Create a model to predict survival probability"
ADK Agent: "I'll create a BigQuery AutoML model for you:
1. ✅ Preparing features: Age, Sex, Pclass, Fare, Embarked
2. ✅ Training model with boosted trees algorithm
3. ✅ Model accuracy: 82.7%
4. ✅ Key insights: Gender and passenger class are strongest predictors
The model is ready! Would you like to make predictions or see feature importance?"
You: "Generate a report on factors affecting survival"
ADK Agent: "I've generated a comprehensive survival analysis report:
Key Findings:
- Gender: Females had 74% survival rate vs 19% for males
- Class: 1st class 63%, 2nd class 47%, 3rd class 24%
- Age: Children under 16 had 58% survival rate
- Port: Cherbourg passengers had highest survival at 55%
Full report saved to: reports/titanic_survival_analysis_2025.pdf
Would you like me to create visualizations for any specific factor?"
# 1. Clone and setup
git clone https://github.com/JeanFraga/agentic-data-science.git
cd "agentic-data-science"
# 2. Initial configuration
.\scripts\setup.ps1 -ProjectId "your-gcp-project-id"
# 3. Configure GitHub Secrets (see GITHUB_SECRETS_SETUP.md)
# Add: GCP_PROJECT_ID, GCP_REGION, GCP_ENVIRONMENT, GCP_SERVICE_ACCOUNT_KEY
# 4. Deploy via CI/CD
git push origin main
# Install ADK
pip install google-adk
# Initialize agent
cd titanic-agent
adk init --template bigquery-analyst
# Test locally
python -m google.adk.cli web --port 8000
# Deploy to production
adk deploy --project your-project-id
- Foundation - Core infrastructure setup
- Cloud Functions - Event-driven processing
- IAM Security - Enterprise security hardening
- Standardization - Template conversion
- Gen 2 Migration - Modern architecture
- Production Ready - Public release
- ADK Setup - Agent development environment
- Conversational Interface - Natural language processing
- AutoML Integration - Automated model creation
- Vertex AI Agent Engine - Enterprise AI capabilities
- Multi-Dataset Support - Analyze multiple datasets simultaneously
- Custom ML Pipelines - Agent-orchestrated workflows
- Real-time Dashboards - Automatic visualization generation
- Collaborative Agents - Multiple specialized agents working together
- Automated Data Ingestion - CSV files automatically processed
- Data Quality Assurance - Schema validation and error handling
- Analytics Ready - Immediate SQL query capability
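The "Data Quality Assurance" item above mentions schema validation and error handling. A minimal sketch of what such a check might look like for the Titanic CSV follows, using only the standard library. The required column set (including `Survived`) and the specific rules are assumptions; the project's real validation may differ.

```python
import csv
import io
from typing import List

# Assumed schema: the feature columns named in this README plus the label.
REQUIRED_COLUMNS = {"Survived", "Pclass", "Sex", "Age", "Fare", "Embarked"}


def validate_csv(text: str) -> List[str]:
    """Return a list of error messages; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = set(reader.fieldnames or [])
    missing = sorted(REQUIRED_COLUMNS - header)
    if missing:
        # Without the expected header there is no point checking rows.
        return [f"missing columns: {', '.join(missing)}"]
    errors = []
    for row in reader:
        line = reader.line_num
        age = row["Age"]
        if age:  # a blank age is treated as unknown, not as an error
            try:
                float(age)
            except ValueError:
                errors.append(f"line {line}: Age is not numeric: {age!r}")
        if row["Survived"] not in {"0", "1"}:
            errors.append(f"line {line}: Survived must be 0 or 1: {row['Survived']!r}")
    return errors
```

Collecting every error instead of stopping at the first one lets a single upload report all of its bad rows at once.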
- Executive Dashboards - "Show me KPIs for last quarter"
- Predictive Maintenance - "Predict equipment failure probability"
- Customer Analytics - "Segment customers by behavior patterns"
- Risk Assessment - "Analyze risk factors in loan applications"
- 40+ Security Issues Resolved - Enterprise-grade security
- 48% Documentation Optimization - Streamlined experience
- 100% Automation - Zero manual management
- Production Ready - Immediate deployment
This project demonstrates:
- Cloud Architecture Expertise - Modern GCP patterns
- AI/ML Integration - Cutting-edge ADK implementation
- Security First Mindset - Enterprise IAM
- Full Stack Innovation - Infrastructure to AI agents
- Zero Trust Architecture - Least privilege access
- IAM as Code - Version-controlled permissions
- Audit Logging - Complete operation history
- Secret Management - GitHub Secrets integration
- Agent Authentication - OAuth 2.0 / Service Accounts
- Data Privacy - PII detection and masking
- Access Control - Role-based agent capabilities
- Compliance - GDPR/HIPAA ready configurations
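To make the "PII detection and masking" item above concrete, here is a deliberately small sketch that masks email addresses with a regular expression before text is shown to a user or logged. It is an assumption about one possible approach, not a description of this project's implementation; a production setup would likely cover more PII types and may rely on a managed inspection service instead.

```python
import re

# Simple pattern for common email shapes; it does not cover every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str) -> str:
    """Replace each email address in the text with a fixed placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)
```

For example, `mask_emails("Contact jane.doe@example.com today")` returns `"Contact [EMAIL] today"`.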
- IAM as Code Guide
- Cloud Function Testing
- ADK Integration Guide (Coming Soon)
We welcome contributions! See our Contributing Guidelines.
- ADK agent templates for common data science tasks
- Additional dataset integrations
- Custom visualization tools
- Performance optimizations
This project is licensed under the MIT License - see the LICENSE file for details.
Built with cutting-edge technologies:
- Google Cloud Platform - Enterprise cloud infrastructure
- Google ADK - Agent Development Kit
- Terraform - Infrastructure as Code
- GitHub Actions - CI/CD automation
- Python - Core programming language
Created by: Jean Fraga
LinkedIn: linkedin.com/in/jeanfraga
GitHub: github.com/JeanFraga
Project Journey: Evolution Timeline
This project showcases production-ready cloud engineering enhanced with AI capabilities, demonstrating the future of intelligent data science platforms. From infrastructure automation to conversational AI, it represents the complete modern data stack.
#CloudEngineering #DataScience #GCP #Terraform #InfrastructureAsCode #DevOps #CloudArchitecture #GoogleADK #AIAgents #MachineLearning #BigQuery #VertexAI