@cmpadden
Contributor

Summary & Motivation

Adds a dg scaffold gitlab-ci command to reach feature parity with dg scaffold github-actions.

Note: consider consolidating into dg scaffold ci --{github,gitlab}

How I Tested These Changes

Changelog

  • Adds a dg scaffold gitlab-ci command for scaffolding CI for GitLab projects

),
ContainerRegistryInfo(
name="DockerHub",
match=lambda url: "docker.io" in url,

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string "docker.io" may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 days ago

The optimal fix is to parse the provided URL using Python's urlparse from the standard library, then extract the hostname for checking. This ensures we only match exact hostnames or well-defined subdomains rather than arbitrary string matches.

For the lambda functions highlighted (such as those checking for 'docker.io', 'gcr.io', 'ecr', and especially 'gitlab.com'/'registry.gitlab.com'), the match function should parse the URL, retrieve its .hostname, and then only return True if the host exactly matches the registry's domain or is a well-formed subdomain (as appropriate).

What to change:

  • In lines where match=lambda url: "docker.io" in url and similar substring checks occur, update these to parse the URL, retrieve the hostname, and check for an exact match or proper subdomain (e.g., hostname == "docker.io" or hostname.endswith(".docker.io") as needed).
  • Add from urllib.parse import urlparse at the top if not already imported in the shown snippet.
  • Do not change the calling or surrounding code, only the lambda match definitions.
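
The hostname check described above can be sketched as follows. This is a minimal illustration, not the code in gitlab_ci.py: the helper names registry_hostname and host_matches are hypothetical, and the handling of scheme-less input is an assumption about how registry URLs may be supplied. It matters because urlparse only recognizes a host after "//", so a value such as docker.io/library/python parses with hostname None.

```python
from typing import Optional
from urllib.parse import urlparse


def registry_hostname(url: str) -> Optional[str]:
    """Return the lowercased hostname of a registry URL, or None if there is none.

    Assumption: registry URLs may be given without a scheme
    (e.g. "docker.io/library/python"). urlparse only recognizes a network
    location after "//", so one is prepended when no scheme is present.
    """
    if "://" not in url:
        url = "//" + url
    return urlparse(url).hostname


def host_matches(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` itself or a subdomain of it."""
    host = registry_hostname(url)
    return host is not None and (host == domain or host.endswith("." + domain))
```

With this shape, a match entry could read match=lambda url: host_matches(url, "docker.io"). Whether subdomains should be accepted is a per-registry decision.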

Suggested changeset 1
python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
--- a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
+++ b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
@@ -1,5 +1,6 @@
 import subprocess
 from pathlib import Path
+from urllib.parse import urlparse
 from typing import Callable, NamedTuple, Optional, cast
 
 import click
@@ -77,7 +78,11 @@
 REGISTRY_INFOS = [
     ContainerRegistryInfo(
         name="ECR",
-        match=lambda url: "ecr" in url,
+        match=lambda url: (
+            urlparse(url).hostname is not None and (
+                "ecr" in urlparse(url).hostname
+            )
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml",
         secrets_hints=[
             "AWS_ACCESS_KEY_ID - Your AWS access key ID",
@@ -87,7 +92,11 @@
     ),
     ContainerRegistryInfo(
         name="DockerHub",
-        match=lambda url: "docker.io" in url,
+        match=lambda url: (
+            urlparse(url).hostname is not None and (
+                urlparse(url).hostname == "docker.io" or urlparse(url).hostname.endswith(".docker.io")
+            )
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml",
         secrets_hints=[
             "DOCKERHUB_USERNAME - Your DockerHub username",
@@ -96,7 +105,13 @@
     ),
     ContainerRegistryInfo(
         name="GitLab Container Registry",
-        match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,
+        match=lambda url: (
+            urlparse(url).hostname is not None and (
+                urlparse(url).hostname == "registry.gitlab.com" or
+                urlparse(url).hostname == "gitlab.com" or
+                urlparse(url).hostname.endswith(".gitlab.com")
+            )
+        ),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "gitlab-container-registry-login-fragment.yml",
@@ -116,7 +131,11 @@
     ),
     ContainerRegistryInfo(
         name="Google Container Registry",
-        match=lambda url: "gcr.io" in url,
+        match=lambda url: (
+            urlparse(url).hostname is not None and (
+                urlparse(url).hostname == "gcr.io" or urlparse(url).hostname.endswith(".gcr.io")
+            )
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml",
         secrets_hints=[
             "GCR_JSON_KEY - Your GCR service account JSON key",
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
),
ContainerRegistryInfo(
name="GitLab Container Registry",
match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string "registry.gitlab.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 days ago

The safest way to fix this is to parse the URL and verify that the hostname matches the allowed values, rather than checking for substring inclusion. To do this, update the match lambdas in REGISTRY_INFOS to parse the input (likely a URL) using urllib.parse.urlparse, then check whether .hostname is exactly equal to the required host or, where subdomains should be supported, ends with it. For registries that should match only the exact host, use ==; otherwise use .endswith() with a leading dot.

The following fixes should be applied within the same file:

  • At the top, import urlparse from urllib.parse.
  • For each match=lambda url: ... lambda:
    • Use urlparse(url).hostname to extract the host.
    • Change checks from substring in to hostname comparison, e.g. hostname == "registry.gitlab.com" (or .endswith(".gitlab.com") to allow subdomains, as appropriate for each registry).

Edit only the shown code regions (i.e. rework the relevant lambdas inside the REGISTRY_INFOS list).
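
One way to express the exact-versus-suffix distinction without calling urlparse twice in every lambda is a pair of small predicate factories. This is a sketch only: exact_host, host_suffix, and the match_* names are hypothetical and do not appear in the PR. It keeps the patch's behavior of treating input with no parseable hostname (including scheme-less strings) as a non-match.

```python
from typing import Callable
from urllib.parse import urlparse

Matcher = Callable[[str], bool]


def _host(url: str) -> str:
    # urlparse(...).hostname is None when no host can be parsed; normalize to "".
    return urlparse(url).hostname or ""


def exact_host(*hosts: str) -> Matcher:
    """Match only when the parsed hostname equals one of `hosts`."""
    allowed = frozenset(hosts)
    return lambda url: _host(url) in allowed


def host_suffix(*suffixes: str) -> Matcher:
    """Match when the parsed hostname ends with one of `suffixes`.

    Each suffix should start with a dot so that only real subdomains match.
    """
    return lambda url: _host(url).endswith(suffixes)


# Hypothetical usage mirroring the match= arguments in REGISTRY_INFOS.
match_dockerhub = exact_host("docker.io")
match_gitlab = exact_host("registry.gitlab.com", "gitlab.com")
match_azure = host_suffix(".azurecr.io")
```

Each factory returns a plain callable, so it could be passed as match= without touching the calling code.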

Suggested changeset 1
python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
--- a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
+++ b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
@@ -1,6 +1,7 @@
 import subprocess
 from pathlib import Path
 from typing import Callable, NamedTuple, Optional, cast
+from urllib.parse import urlparse
 
 import click
 from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config
@@ -12,7 +13,6 @@
 
 from dagster_dg_cli.cli.plus.constants import DgPlusAgentType
 from dagster_dg_cli.utils.plus.build import get_agent_type, get_dockerfile_path, merge_build_configs
-
 TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates"
 SERVERLESS_GITLAB_CI_FILE = TEMPLATES_DIR / "serverless-gitlab-ci.yml"
 HYBRID_GITLAB_CI_FILE = TEMPLATES_DIR / "hybrid-gitlab-ci.yml"
@@ -77,7 +77,7 @@
 REGISTRY_INFOS = [
     ContainerRegistryInfo(
         name="ECR",
-        match=lambda url: "ecr" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".ecr.aws") if urlparse(url).hostname else False,
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml",
         secrets_hints=[
             "AWS_ACCESS_KEY_ID - Your AWS access key ID",
@@ -87,7 +87,7 @@
     ),
     ContainerRegistryInfo(
         name="DockerHub",
-        match=lambda url: "docker.io" in url,
+        match=lambda url: (urlparse(url).hostname == "docker.io") if urlparse(url).hostname else False,
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml",
         secrets_hints=[
             "DOCKERHUB_USERNAME - Your DockerHub username",
@@ -96,7 +96,7 @@
     ),
     ContainerRegistryInfo(
         name="GitLab Container Registry",
-        match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,
+        match=lambda url: urlparse(url).hostname in {"registry.gitlab.com", "gitlab.com"} if urlparse(url).hostname else False,
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "gitlab-container-registry-login-fragment.yml",
@@ -104,7 +104,7 @@
     ),
     ContainerRegistryInfo(
         name="Azure Container Registry",
-        match=lambda url: "azurecr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io") if urlparse(url).hostname else False,
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "azure-container-registry-login-fragment.yml",
@@ -116,7 +116,7 @@
     ),
     ContainerRegistryInfo(
         name="Google Container Registry",
-        match=lambda url: "gcr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io") if urlparse(url).hostname else False,
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml",
         secrets_hints=[
             "GCR_JSON_KEY - Your GCR service account JSON key",
EOF
),
ContainerRegistryInfo(
name="GitLab Container Registry",
match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string "gitlab.com" may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 days ago

The best way to fix the problem is to match against the parsed hostname of the URL rather than substring-matching the URL string. Specifically, for each lambda in the match argument of ContainerRegistryInfo, replace unsafe substring checks like "gitlab.com" in url with a check on urlparse(url).hostname. For example, check whether the hostname equals gitlab.com or ends with .gitlab.com. Update "docker.io" in url, "ecr" in url, "azurecr.io" in url, and "gcr.io" in url in the same way, parsing the URL first and matching the relevant registry domain against the hostname.
Add the necessary import for urlparse from urllib.parse at the top of the file. This change should be made within the definition of the REGISTRY_INFOS list in python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py.
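
For ECR, a check such as "ecr." in host is still a substring test on the hostname, so a host like ecr.evil.example would pass. A stricter alternative is to match the whole hostname against the expected shape. The sketch below assumes the private-registry form <account id>.dkr.ecr.<region>.amazonaws.com; the function name is_private_ecr is hypothetical, and other ECR endpoints (for example public.ecr.aws) are deliberately not covered.

```python
import re
from urllib.parse import urlparse

# Assumed hostname shape for private ECR registries:
#   <12-digit account id>.dkr.ecr.<region>.amazonaws.com
# Other forms (public.ecr.aws, China-region or FIPS endpoints) are not handled here.
_ECR_PRIVATE_HOST = re.compile(r"\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com")


def is_private_ecr(url: str) -> bool:
    """True if the parsed hostname has the assumed private ECR shape."""
    host = urlparse(url).hostname or ""
    # fullmatch anchors the pattern at both ends of the hostname.
    return _ECR_PRIVATE_HOST.fullmatch(host) is not None
```

Using fullmatch rather than search is the important part: the pattern has to account for the entire hostname, not just appear somewhere inside it.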

Suggested changeset 1
python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
--- a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
+++ b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
@@ -1,7 +1,7 @@
 import subprocess
 from pathlib import Path
 from typing import Callable, NamedTuple, Optional, cast
-
+from urllib.parse import urlparse
 import click
 from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config
 from dagster_dg_core.context import DgContext
@@ -77,7 +77,10 @@
 REGISTRY_INFOS = [
     ContainerRegistryInfo(
         name="ECR",
-        match=lambda url: "ecr" in url,
+        match=lambda url: (
+            (lambda host: host is not None and ("ecr." in host or host.endswith(".ecr"))) 
+            (urlparse(url).hostname)
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml",
         secrets_hints=[
             "AWS_ACCESS_KEY_ID - Your AWS access key ID",
@@ -87,7 +90,9 @@
     ),
     ContainerRegistryInfo(
         name="DockerHub",
-        match=lambda url: "docker.io" in url,
+        match=lambda url: (
+            urlparse(url).hostname == "docker.io"
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml",
         secrets_hints=[
             "DOCKERHUB_USERNAME - Your DockerHub username",
@@ -96,7 +101,11 @@
     ),
     ContainerRegistryInfo(
         name="GitLab Container Registry",
-        match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,
+        match=lambda url: (
+            # Registry: registry.gitlab.com, also allow gitlab.com
+            (lambda host: host is not None and (host == "registry.gitlab.com" or host == "gitlab.com" or host.endswith(".gitlab.com"))) 
+            (urlparse(url).hostname)
+        ),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "gitlab-container-registry-login-fragment.yml",
@@ -104,7 +113,10 @@
     ),
     ContainerRegistryInfo(
         name="Azure Container Registry",
-        match=lambda url: "azurecr.io" in url,
+        match=lambda url: (
+            (lambda host: host is not None and host.endswith(".azurecr.io"))
+            (urlparse(url).hostname)
+        ),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "azure-container-registry-login-fragment.yml",
@@ -116,7 +128,10 @@
     ),
     ContainerRegistryInfo(
         name="Google Container Registry",
-        match=lambda url: "gcr.io" in url,
+        match=lambda url: (
+            (lambda host: host is not None and host.endswith(".gcr.io"))
+            (urlparse(url).hostname)
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml",
         secrets_hints=[
             "GCR_JSON_KEY - Your GCR service account JSON key",
EOF
),
ContainerRegistryInfo(
name="Azure Container Registry",
match=lambda url: "azurecr.io" in url,

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string "azurecr.io" may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 days ago

To securely check whether a URL belongs to a recognized registry domain, we must parse the URL and examine its hostname in a structured way. The fix is to use the standard urlparse function from Python's urllib.parse to parse the provided URL, extract the hostname, and only then perform matching, for example by comparing the hostname to an explicit domain name or ensuring it ends with a valid domain preceded by a dot. For public registries with well-known host patterns (like docker.io, azurecr.io, gcr.io, registry.gitlab.com), use the parsed hostname for exact or suffix matching, which prevents accidental substring matches and lookalike-domain bypasses (e.g., a host ending in .azurecr.io matches, but notazurecr.io does not).

You only need to change the match lambdas in each ContainerRegistryInfo definition; add the necessary import for urlparse from urllib.parse if not already present.
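
The difference between the three kinds of check can be seen by running them side by side on a few adversarial inputs. This is an illustrative sketch; the function names and example URLs are made up for the comparison and are not part of the PR.

```python
from urllib.parse import urlparse


def substring_check(url: str) -> bool:
    # Original behavior: the domain may appear anywhere in the string.
    return "azurecr.io" in url


def bare_suffix_check(url: str) -> bool:
    # Parses the host, but the suffix has no leading dot.
    return (urlparse(url).hostname or "").endswith("azurecr.io")


def dotted_suffix_check(url: str) -> bool:
    # Parses the host and requires a real subdomain boundary.
    return (urlparse(url).hostname or "").endswith(".azurecr.io")


CASES = [
    "https://myregistry.azurecr.io/app",               # legitimate registry host
    "https://notazurecr.io/app",                       # lookalike domain
    "https://evil.example/azurecr.io",                 # domain only in the path
    "https://myregistry.azurecr.io.evil.example/app",  # domain used as a prefix
]

for case in CASES:
    print(case, substring_check(case), bare_suffix_check(case), dotted_suffix_check(case))
```

Only the dotted suffix check accepts the first case and rejects the other three; the bare suffix check still accepts the lookalike domain.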


Suggested changeset 1
python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
--- a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
+++ b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
@@ -1,6 +1,7 @@
 import subprocess
 from pathlib import Path
 from typing import Callable, NamedTuple, Optional, cast
+from urllib.parse import urlparse
 
 import click
 from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config
@@ -77,7 +78,11 @@
 REGISTRY_INFOS = [
     ContainerRegistryInfo(
         name="ECR",
-        match=lambda url: "ecr" in url,
+        match=lambda url: (
+            (urlparse(url).hostname or "").endswith(".ecr.aws")
+            or (urlparse(url).hostname or "").endswith(".ecr.amazonaws.com")
+            or (urlparse(url).hostname or "").endswith(".ecr")
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml",
         secrets_hints=[
             "AWS_ACCESS_KEY_ID - Your AWS access key ID",
@@ -87,7 +92,7 @@
     ),
     ContainerRegistryInfo(
         name="DockerHub",
-        match=lambda url: "docker.io" in url,
+        match=lambda url: (urlparse(url).hostname or "") == "docker.io",
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml",
         secrets_hints=[
             "DOCKERHUB_USERNAME - Your DockerHub username",
@@ -96,7 +101,10 @@
     ),
     ContainerRegistryInfo(
         name="GitLab Container Registry",
-        match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,
+        match=lambda url: (
+            (urlparse(url).hostname or "") == "registry.gitlab.com"
+            or (urlparse(url).hostname or "").endswith(".gitlab.com")
+        ),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "gitlab-container-registry-login-fragment.yml",
@@ -104,7 +112,7 @@
     ),
     ContainerRegistryInfo(
         name="Azure Container Registry",
-        match=lambda url: "azurecr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io"),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "azure-container-registry-login-fragment.yml",
@@ -116,7 +124,7 @@
     ),
     ContainerRegistryInfo(
         name="Google Container Registry",
-        match=lambda url: "gcr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io"),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml",
         secrets_hints=[
             "GCR_JSON_KEY - Your GCR service account JSON key",
EOF
),
ContainerRegistryInfo(
name="Google Container Registry",
match=lambda url: "gcr.io" in url,

Check failure

Code scanning / CodeQL

Incomplete URL substring sanitization High

The string "gcr.io" may be at an arbitrary position in the sanitized URL.

Copilot Autofix

AI 3 days ago

To properly fix the issue, we should parse the input registry URL and compare the hostname portion (and potentially the scheme) of the parsed URL to known valid hosts. This means updating the match lambda functions in the ContainerRegistryInfo definitions to avoid substring searching and instead use URL parsing (e.g., via urllib.parse.urlparse). For example, instead of "gcr.io" in url, use something like hostname = urlparse(url).hostname; hostname and hostname.endswith('.gcr.io'), with the leading dot so that a host such as notgcr.io does not match. We should update all lambdas to use a consistent, correct host-matching approach.

This affects the REGISTRY_INFOS definitions (lines 78-125), specifically updating the match lambdas for ECR, DockerHub, GitLab, Azure, and Google. For this, we should import urlparse from Python's standard library and update each relevant lambda accordingly.


Suggested changeset 1
python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
--- a/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
+++ b/python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py
@@ -1,6 +1,7 @@
 import subprocess
 from pathlib import Path
 from typing import Callable, NamedTuple, Optional, cast
+from urllib.parse import urlparse
 
 import click
 from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config
@@ -12,7 +13,6 @@
 
 from dagster_dg_cli.cli.plus.constants import DgPlusAgentType
 from dagster_dg_cli.utils.plus.build import get_agent_type, get_dockerfile_path, merge_build_configs
-
 TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates"
 SERVERLESS_GITLAB_CI_FILE = TEMPLATES_DIR / "serverless-gitlab-ci.yml"
 HYBRID_GITLAB_CI_FILE = TEMPLATES_DIR / "hybrid-gitlab-ci.yml"
@@ -77,7 +77,9 @@
 REGISTRY_INFOS = [
     ContainerRegistryInfo(
         name="ECR",
-        match=lambda url: "ecr" in url,
+        match=lambda url: (
+            (urlparse(url).hostname or "").endswith(".ecr.amazonaws.com")
+        ),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml",
         secrets_hints=[
             "AWS_ACCESS_KEY_ID - Your AWS access key ID",
@@ -87,7 +89,7 @@
     ),
     ContainerRegistryInfo(
         name="DockerHub",
-        match=lambda url: "docker.io" in url,
+        match=lambda url: (urlparse(url).hostname or "") == "docker.io",
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml",
         secrets_hints=[
             "DOCKERHUB_USERNAME - Your DockerHub username",
@@ -96,7 +98,10 @@
     ),
     ContainerRegistryInfo(
         name="GitLab Container Registry",
-        match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url,
+        match=lambda url: (
+            (urlparse(url).hostname or "") == "registry.gitlab.com"
+            or (urlparse(url).hostname or "").endswith(".gitlab.com")
+        ),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "gitlab-container-registry-login-fragment.yml",
@@ -104,7 +109,7 @@
     ),
     ContainerRegistryInfo(
         name="Azure Container Registry",
-        match=lambda url: "azurecr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io"),
         fragment=TEMPLATES_DIR
         / "gitlab_registry_fragments"
         / "azure-container-registry-login-fragment.yml",
@@ -116,7 +121,7 @@
     ),
     ContainerRegistryInfo(
         name="Google Container Registry",
-        match=lambda url: "gcr.io" in url,
+        match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io"),
         fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml",
         secrets_hints=[
             "GCR_JSON_KEY - Your GCR service account JSON key",
EOF