⚙️📊 Data Engineering Project with dbt 🔄📈 Welcome to the repository for this data engineering project, where we leverage dbt to transform raw source data into clean, structured, and analytics-ready models. This project focuses on data transformation, modeling, and pipeline optimization to support accurate reporting and decision-making.

Shanabunga/ssp_analytics_dataeng_project

⚙️ Simple Stack Data Engineering Project ⚙️

This repository consists of a dbt project designed to transform raw data into structured, analytics-ready models. The project follows modern data engineering best practices, ensuring efficient, scalable, and maintainable data workflows.

The goal of this project is to implement a simple and effective data architecture that can support analytics and reporting needs. It integrates data ingestion, transformation, testing, and deployment into a streamlined process, ensuring high-quality data pipelines.

The project favors minimal tooling, a clear structure, and automation, so the pipeline can grow with evolving business needs without sacrificing efficiency or data quality.

Tech Stack

To learn more about the overall architecture design & strategy, see our centralized handbook.

Sources:

Raw, unformatted data loaded directly from source systems using various data tools.

  • nba_data - The primary source of NBA statistics data captured from an API & loaded via Sling/Airbyte.
    • Schema: analytics.raw_nba_data
  • google_sheets - Internally maintained reference sheets related to the project & loaded via Sling/Airbyte.
    • Schema: analytics.raw_google_sheets

Environments:

Transformed data models built via dbt with 3 distinct environments to enable a sustainable development workflow.

  • Development
    • Schema: analytics.dev_wilson
    • One per developer to avoid conflicts or overriding changes during development.
  • CI
    • Schema: analytics.ci
    • An isolated schema created specifically for testing Pull Request changes to ensure quality.
  • Production
    • Schemas:
      • analytics.staging
      • analytics.warehouse
      • analytics.marts
    • Separation by layer for easier navigation and permission management.
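
The environment-to-schema layout above can be summarized as a small lookup. This is only an illustrative sketch, not part of the dbt project: the function name schema_for_env and the short environment names dev, ci, and prod are hypothetical, and the analytics.dev_<developer> pattern is an assumption inferred from the single example analytics.dev_wilson.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of this repo): print the schema(s) that
# an environment writes to, using the schema names listed in this README.
schema_for_env() {
  env_name="$1"
  developer="${2:-}"
  case "$env_name" in
    dev)
      # Assumption: one schema per developer, named analytics.dev_<developer>
      [ -n "$developer" ] || { echo "dev requires a developer name" >&2; return 1; }
      printf 'analytics.dev_%s\n' "$developer"
      ;;
    ci)
      # Single isolated schema used for Pull Request checks
      printf 'analytics.ci\n'
      ;;
    prod)
      # Production is split by layer
      printf 'analytics.staging\nanalytics.warehouse\nanalytics.marts\n'
      ;;
    *)
      echo "unknown environment: $env_name" >&2
      return 1
      ;;
  esac
}
```

For example, schema_for_env dev wilson prints analytics.dev_wilson, while schema_for_env prod prints one line per production layer. How dbt itself resolves these schemas depends on the project's profiles and macros, which are not shown in this README.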

How to Get Started?

  1. Create your local development environment
    • Use a local IDE (e.g., VS Code), dbt Cloud, or GitHub Codespaces
  2. Clone the current repo (or create a new one)
    • Check out the main branch and run git pull to sync changes
  3. Create a new branch for your changes
    • First, run git branch your_branch_name to create a new branch
    • Then run git checkout your_branch_name to switch to it
  4. Start developing!
    • Commit & Sync all changes to your branch during development
    • IMPORTANT - All changes should follow the team Style Guide
  5. Create Pull Request
    • When development is complete, push your branch to GitHub & open a pull request
    • Request peer reviews & confirm automated CI jobs succeed
  6. Merge changes to the main branch
    • Confirm automated post-merge jobs succeed
  7. Get latest changes in your local environment
    • Check out the main branch in your local terminal
    • Run git pull to sync the latest version of the code
  8. Continue to develop & repeat the process
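
The git commands from steps 2, 3, and 7 can be wrapped in two small shell functions, as a minimal sketch. The function names start_branch and sync_main are hypothetical and not part of this repo; sync_main assumes the default branch is named main (as in the steps above) and that a remote is already configured.

```shell
#!/bin/sh
set -eu

# Step 3: create a new branch and switch to it.
# Runs the same two commands listed above: git branch, then git checkout.
start_branch() {
  git branch "$1"
  git checkout "$1"
}

# Steps 2 and 7: switch to main and pull the latest changes.
# Assumes a configured remote; git pull fails without one.
sync_main() {
  git checkout main
  git pull
}
```

A typical cycle would be sync_main, then start_branch your_branch_name, then committing and pushing as described in steps 4 and 5. The two commands in start_branch can also be combined as git checkout -b your_branch_name.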

Notes

  • Provide any other important call-outs of platform-specific information here.

Resources:

  • Learn more about dbt in the docs
  • Check out Discourse for commonly asked questions and answers
  • Join the chat on Slack for live discussions and support
  • Check out the blog for the latest news on dbt's development and best practices
