<br>
This stack builds a **comprehensive analytics platform** that erases the line between real-time stream analytics and large-scale batch processing. It achieves this by combining the power of **Apache Flink**, enhanced by [**Flex**](https://factorhouse.io/flex) for enterprise-grade management and monitoring, with **Apache Spark** on a unified data lakehouse, enabling you to work with a single source of truth for all your data workloads.
### 📌 Description
This architecture is designed around a modern data lakehouse that serves both streaming and batch jobs from the same data. At its foundation, data is stored in Apache Iceberg tables on MinIO, an S3-compatible object store. This provides powerful features like ACID transactions, schema evolution, and time travel for your data.
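
As a minimal, hypothetical sketch (the catalog and table names such as `demo.db.orders` are placeholders, and a Spark catalog is assumed to already point at the Hive Metastore and MinIO), this is roughly what those Iceberg features look like from Spark SQL:

```sql
-- Create an Iceberg table through the shared catalog (hypothetical names).
CREATE TABLE demo.db.orders (
  order_id   BIGINT,
  customer   STRING,
  amount     DECIMAL(10, 2),
  created_at TIMESTAMP
) USING iceberg;

-- Time travel: query the table as it existed at an earlier point in time.
SELECT * FROM demo.db.orders TIMESTAMP AS OF '2024-01-01 00:00:00';
```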
A central **Hive Metastore** serves as a unified metadata catalog for the entire data ecosystem, providing essential information about the structure and location of datasets. By using a robust **PostgreSQL** database as its backend, the metastore reliably tracks all table schemas and metadata. This central catalog allows both **Apache Flink** (for low-latency streaming) and **Apache Spark** (for batch ETL and interactive analytics) to discover, query, and write to the same tables seamlessly, eliminating data silos.
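
For illustration, this is roughly how the shared catalog is used from the Flink SQL client; the catalog name and `hive-conf-dir` path below are assumptions rather than the exact values used by this stack:

```sql
-- Register the shared Hive Metastore as a Flink catalog (path is assumed).
CREATE CATALOG hive_catalog WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/opt/hive/conf'
);

USE CATALOG hive_catalog;
SHOW TABLES;  -- the same tables that Spark sees through the metastore
```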
The role of PostgreSQL is twofold: in addition to providing a durable backend for the metastore, it is configured as a high-performance transactional database ready for **Change Data Capture (CDC)**. This design allows you to stream every `INSERT`, `UPDATE`, and `DELETE` from your operational data directly into the lakehouse, keeping it perfectly synchronized in near real-time.
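
As a hedged sketch of that pattern, a CDC source table can be declared in Flink SQL; the connector, hostname, credentials, and table names below are placeholders, and the Flink CDC connector itself may need to be added as a dependency:

```sql
-- Hypothetical CDC source: every INSERT/UPDATE/DELETE on the PostgreSQL table
-- surfaces as a changelog row that Flink can stream into the lakehouse.
CREATE TABLE orders_cdc (
  order_id BIGINT,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector'     = 'postgres-cdc',
  'hostname'      = 'postgres',
  'port'          = '5432',
  'username'      = 'db_user',
  'password'      = 'db_password',
  'database-name' = 'demo_db',
  'schema-name'   = 'public',
  'table-name'    = 'orders',
  'slot.name'     = 'orders_cdc_slot'
);
```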
The platform is rounded out by enterprise-grade tooling: **Flex** simplifies Flink management and monitoring, a **Flink SQL Gateway** enables interactive queries on live data streams, and a single-node **Spark cluster** supports complex data transformations. This integrated environment is ideal for building sophisticated solutions for fraud detection, operational intelligence, and unified business analytics.

---
#### 🚀 Flex (Enterprise Flink Runtime)
- Container: **flex** (`factorhouse/flex:latest`, **enterprise**) or **flex-ce** (`factorhouse/flex-ce:latest`, **community**)
- Provides an enterprise-ready tooling solution to streamline and simplify Apache Flink management. It gathers Flink resource information, offering custom telemetry, insights, and a rich data-oriented UI. Key features include:
  - **Comprehensive Flink Monitoring & Insights:**
    - Gathers Flink resource information minute-by-minute.
    - Offers fully integrated metrics and telemetry.
    - Provides access to long-term metrics and aggregated consumption/production data, from cluster-level down to individual job-level details.
  - **Simplified Management for All User Groups:**
    - User-friendly interface and intuitive controls.
    - Aims to align business needs with Flink capabilities.
  - **Robust Authorization:** Offers Simple or fine-grained Role-Based Access Controls (RBAC).
  - **Data Policies:** Includes capabilities for masking and redaction of sensitive data (e.g., PII, Credit Card).
  - **Audit Logging:** Captures all user actions for comprehensive data governance.
  - **Secure Deployments:** Supports HTTPS and is designed for air-gapped environments (all data remains local).
  - **Powerful Flink Enhancements:**
    - **Multi-tenancy:** Advanced capabilities to manage Flink resources effectively with control over visibility and usage.
    - **Multi-Cluster Monitoring:** Manage and monitor multiple Flink clusters from a single installation.
  - **Key Integrations:**
    - **Prometheus:** Exposes endpoints for integration with preferred metrics and alerting systems.
    - **Slack:** Allows user actions to be sent to an operations channel in real-time.
- Exposes UI at `http://localhost:3001`
#### 🧠 Flink Cluster (Real-Time Engine)
Core services like Flink, Spark, and Kafka Connect are designed to be modular and do not come bundled with the specific connectors and libraries needed to communicate with other systems like the Hive Metastore, Apache Iceberg, or S3.
`setup-env.sh` automates the process of downloading all the required dependencies and organizing them into a local `deps` directory. When the services are started with docker-compose, this directory is mounted as a volume, injecting the libraries directly into each container's classpath.
<details>
<summary><b>View all downloaded dependencies</b></summary>

> By default, it is configured to deploy the Enterprise edition. See below for instructions on how to configure it to run the Community edition instead.
<details>
<summary>License file example</summary>
</details>

## Running the Platform
To get the platform running, you first need to configure your local environment. This involves setting environment variables to select the edition you want to run (Community or Enterprise) and providing the file paths to your licenses. Once these prerequisites are set, you can launch the services using `docker compose`. You have two primary options: you can start all services (Kpow, Flex, and Pinot) together for a fully integrated experience, or you can run Kpow and Flex independently for more focused use cases. When you are finished, remember to run the corresponding `down` command to stop and remove the containers, and unset the environment variables to clean up your session.