
Commit 087a31a

Document setting up grafana agent, github runners, split ingress (#244)
Documenting some more advanced workflows around setting up monitoring, ingress, and self-hosted GitHub Actions runners.
1 parent a6305b2 commit 087a31a


6 files changed, +120 −0 lines changed

pages/deployments/advanced-configuration.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
---
title: Advanced Configuration
description: Fine-tuning your Plural Console to meet your requirements
---

pages/deployments/ci-gh-actions.md

Lines changed: 35 additions & 0 deletions
@@ -29,6 +29,41 @@ plural cd services update @{cluster-handle}/{service-name} --conf {name}={value}
Feel free to run `plural cd services update --help` for more documentation as well.

## Self-Hosted Runners

Many users will want to host their console in a private network. If that's the case, a standard hosted GitHub Actions runner won't be able to reach the console API to run the `plural cd` commands. The solution is to leverage GitHub's self-hosted runners, which let you run the Actions in an adjacent network while maintaining the security posture of your console. We've added a few add-ons to make this setup trivially easy to handle; you'll want to:

- install the `github-actions-controller` add-on to set up the k8s operator that manages runners in a cluster. You likely want this installed in your management cluster for network adjacency.
- install the `plrl-github-actions-runner` add-on in that same cluster to create a runner set you can schedule jobs on.

Once both are deployed, you can create your first job; it'll likely look something like this:

```yaml
jobs:
  # some previous jobs...
  update-service:
    needs: [docker-build]
    runs-on: plrl-github-actions-runner
    env:
      PLURAL_CONSOLE_TOKEN: ${{ secrets.PLURAL_CONSOLE_TOKEN }}
      PLURAL_CONSOLE_URL: ${{ secrets.PLURAL_CONSOLE_URL }}
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Install plural
        uses: pluralsh/[email protected]
      - name: Use short sha
        run: echo ${GITHUB_SHA::7}
      - name: Update service
        run: plural cd services update @mgmt/marketing --conf tag=sha-${GITHUB_SHA::7}
```

Note that the `runs-on` job attribute is what schedules this job onto the `plrl-github-actions-runner` runner set. It's also worth looking into the control mechanisms GitHub provides to gate which repositories and workflows can use self-hosted runners, so you can manage the security tradeoffs they pose.

{% callout severity="warning" %}
GitHub recommends against using self-hosted runners on public repositories, due to the complexity required to prevent workflows from being run by pull requests from forks.
{% /callout %}

## Addendum

Since the Plural CLI is a standalone Go binary, it can easily be injected into any CI framework in much the same way: install it, then execute the appropriate CLI command to modify your service once a deployable artifact has been built.
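For illustration only, a roughly equivalent GitLab CI job might look like the sketch below; the image and the install step are placeholders (use whatever mechanism your CI provides to put the `plural` binary on the PATH), and `PLURAL_CONSOLE_URL`/`PLURAL_CONSOLE_TOKEN` are assumed to be configured as CI/CD variables:

```yaml
update-service:
  stage: deploy
  needs: ["docker-build"]   # assumes an earlier job that built and pushed the image
  image: alpine:3.19        # placeholder image
  script:
    # placeholder: install the plural binary however you prefer,
    # e.g. by baking it into a custom CI image or downloading a pinned release
    - plural cd services update @mgmt/marketing --conf tag=sha-${CI_COMMIT_SHORT_SHA}
```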

pages/deployments/monitoring-addons.md

Lines changed: 15 additions & 0 deletions
@@ -8,3 +8,18 @@ description: Set up common monitoring agents
The `datadog` add-on will automatically install Datadog's agent onto a cluster for you. You can also create a global service to automate installing the agent throughout your fleet. It will ask for your Datadog API and app keys, and automatically inject them into the agent. We'll also manage future upgrades of the agent so you don't have to.

Once the agent is installed, there are often additional features that need to be enabled to get the full Datadog ecosystem functioning. We recommend visiting their docs [here](https://docs.datadoghq.com/containers/kubernetes/installation/?tab=operator#next-steps).

## Grafana Agent

The `grafana-agent` add-on will deploy an instance of Grafana's metrics agent on a cluster in a self-serviceable way. The agent simplifies the process of configuring remote writes for Prometheus (without needing a full Prometheus database) and also integrates with the standard CoreOS `ServiceMonitor` and `PodMonitor` CRDs.
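For example, a workload that already publishes a `ServiceMonitor` should be picked up by the agent; a minimal sketch (the name, namespace, label selector, and port name below are placeholders for your own service) looks roughly like:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app          # placeholder
  namespace: my-app     # placeholder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-app   # must match the labels on your Service
  endpoints:
    - port: metrics     # the named port on the Service that serves /metrics
      interval: 30s
```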
Our configuration for the agent will ask you to:

- input the hostname and basic auth information for Prometheus
- input the hostname and basic auth information for Loki

It will then immediately start shipping metrics and logs to both on your behalf. We'd also recommend leveraging our global service setup to simplify rolling it out to your entire fleet. You'll be able to distinguish metrics via the `cluster` label in both Loki and Prometheus, which maps to the cluster handle attribute to ensure it's human-readable.

If you haven't set up Loki or Prometheus, and you created your console via Plural, we recommend using our Mimir and Loki setups in the Plural marketplace. They're completely self-serviceable and will properly configure the underlying S3/GCS/Azure Blob Storage needed to persist the metrics data. In addition, our Grafana distribution auto-integrates them as datasources, so there's no additional setup needed there.

If you set up your console via BYOK, feel free to let us know and we can help you set them up as part of our support packages.
pages/deployments/network-configuration.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
---
title: Network Configuration
description: Modifying the ingress controller and setting up public/private endpoints for your console
---

## Overview

There are a few strategies you can take to harden the network security of your console or align it with how you typically secure Kubernetes ingresses. We'll note a few of them here.

## Bringing Your Own Ingress

Our helm chart has the ability to reconfigure the ingress class for your console. This can be useful if you already have an ingress controller with CIDR ranges and WAF setups built in. The helm values change is relatively simple:

```yaml
ingress:
  ingressClass: <new-ingress-class>
  # potentially you might also want to add some annotations
  annotations:
    new.ingress.annotations: <value>

kas:
  ingress:
    ingressClass: <new-ingress-class>
```

Both KAS and the console leverage websockets for some portion of their functionality. In the console's case, the websockets are also far more performant with connection stickiness in place. Some ingress controllers have inconsistent websocket support (or require paid versions to unlock it), which is worth keeping in mind.
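If you do bring your own controller, it's worth confirming that long-lived connections and session affinity are tuned; as a reference point, on ingress-nginx the equivalent knobs are plain annotations, sketched below in the chart's `ingress.annotations` field with illustrative values:

```yaml
ingress:
  annotations:
    # keep long-lived websocket connections from being reaped by the proxy
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    # cookie-based session affinity for connection stickiness
    nginx.ingress.kubernetes.io/affinity: cookie
    nginx.ingress.kubernetes.io/session-cookie-name: CONSOLE_AFFINITY
```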
Also, we configure the ingresses with cert-manager by default. Some orgs will set a wildcard cert at the ingress level, in which case you'd want to disable the ingress-level certs.

## Public/Private Ingress

Another setup we support is splitting the console ingress between public and private. This lets you host the entirety of the console's API in a private network while exposing only the subset needed for the deployment agents to poll our APIs. These APIs are minimal; they only provide:

- read access to the services deployable to an agent
- a ping endpoint for a given cluster, sending the cluster version and a timestamp
- the ability to update the components created for a service by an agent

This is a relatively easy way to ensure network connectivity to end clusters across a pretty broad set of network topologies, though there are of course other, more advanced setups a team can attempt. The basic setup is as follows:

```yaml
ingress:
  ingressClass: internal-nginx # or another private ingress controller

externalIngress:
  hostname: console-ext.your.subdomain # or whatever you'd like to rename it
```

This will create a second, limited ingress exposing only the APIs listed above via path routing. In this world, we'd also recommend you leave the KAS service on a similar network as the external ingress.
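Concretely, combining that with the ingress-class override from earlier, a sketch of values keeping KAS reachable from your workload clusters while the console itself stays private might look like this (the class names are placeholders for whichever controllers you actually run):

```yaml
ingress:
  ingressClass: internal-nginx   # console UI/API reachable only on the private network

externalIngress:
  hostname: console-ext.your.subdomain

kas:
  ingress:
    ingressClass: nginx          # KAS stays reachable from workload clusters
```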
There are still additional tactics you can use to harden this setup. For instance, adding the CIDR ranges of the NAT gateways for all the networks your target clusters reside in can provide robust firewalling for the ingresses you've configured.
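If you're using ingress-nginx, one way to express that is an allow-list annotation containing your clusters' NAT gateway CIDRs. Treat the `externalIngress.annotations` key below as an assumption that the chart passes annotations through to the external ingress the same way it does for the main one, and the CIDRs as placeholders:

```yaml
externalIngress:
  hostname: console-ext.your.subdomain
  annotations:
    # placeholder CIDRs -- substitute the NAT gateway egress ranges of your workload clusters
    nginx.ingress.kubernetes.io/whitelist-source-range: "203.0.113.10/32,198.51.100.0/28"
```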

src/NavData.tsx

Lines changed: 10 additions & 0 deletions
@@ -80,6 +80,16 @@ const rootNavData: NavMenu = deepFreeze([
        href: '/deployments/existing-cluster',
        title: 'Set Up on your own Cluster',
      },
      {
        href: '/deployments/advanced-configuration',
        title: 'Advanced Configuration',
        sections: [
          {
            title: 'Network Configuration',
            href: '/deployments/network-configuration',
          },
        ],
      },
    ],
  },
  {

src/generated/pages.json

Lines changed: 6 additions & 0 deletions
@@ -86,6 +86,9 @@
  {
    "path": "/deployments/addons"
  },
  {
    "path": "/deployments/advanced-configuration"
  },
  {
    "path": "/deployments/architecture"
  },
@@ -152,6 +155,9 @@
  {
    "path": "/deployments/network-addons"
  },
  {
    "path": "/deployments/network-configuration"
  },
  {
    "path": "/deployments/operations"
  },
