[dg] adds `dg scaffold gitlab-ci` #32659
base: master
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | 
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
docker.io
          
            
              
                
              
            
      Copilot Autofix
AI 3 days ago
The optimal fix is to parse the provided URL using Python's urlparse from the standard library, then extract the hostname for checking. This ensures we only match exact hostnames or well-defined subdomains rather than arbitrary string matches.
For the lambda functions highlighted (such as those checking for 'docker.io', 'gcr.io', 'ecr', and especially 'gitlab.com'/'registry.gitlab.com'), the match function should parse the URL, retrieve its .hostname, and then only return True if the host exactly matches the registry's domain or is a well-formed subdomain (as appropriate).
What to change:
- In lines where `match=lambda url: "docker.io" in url` and similar substring checks occur, update these to parse the URL, retrieve the hostname, and check for an exact match or proper subdomain (e.g., `hostname == "docker.io"` or `hostname.endswith(".docker.io")` as needed).
- Add `from urllib.parse import urlparse` at the top if not already imported in the shown snippet.
- Do not change the calling or surrounding code, only the lambda match definitions.
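The exact-or-subdomain check described above can be sketched as a small helper. This is only an illustration: `host_matches` is a hypothetical name that does not appear in the PR, and the example URLs are made up. It parses the URL once rather than calling `urlparse` repeatedly inside a lambda.

```python
from typing import Optional
from urllib.parse import urlparse


def host_matches(url: str, domain: str) -> bool:
    """Return True if the URL's hostname is `domain` or a subdomain of it.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Parse once. hostname is None when the URL has no network location.
    host: Optional[str] = urlparse(url).hostname
    if host is None:
        return False
    # The "." prefix ensures the suffix starts at a label boundary.
    return host == domain or host.endswith("." + domain)


print(host_matches("https://docker.io/library/python", "docker.io"))  # True
print(host_matches("https://docker.io.example.com/x", "docker.io"))   # False
```

A lambda in the registry table could then delegate to it, for example `match=lambda url: host_matches(url, "docker.io")`.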
| @@ -1,5 +1,6 @@ | ||
| import subprocess | ||
| from pathlib import Path | ||
| from urllib.parse import urlparse | ||
| from typing import Callable, NamedTuple, Optional, cast | ||
|  | ||
| import click | ||
| @@ -77,7 +78,11 @@ | ||
| REGISTRY_INFOS = [ | ||
| ContainerRegistryInfo( | ||
| name="ECR", | ||
| match=lambda url: "ecr" in url, | ||
| match=lambda url: ( | ||
| urlparse(url).hostname is not None and ( | ||
| "ecr" in urlparse(url).hostname | ||
| ) | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "AWS_ACCESS_KEY_ID - Your AWS access key ID", | ||
| @@ -87,7 +92,11 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | ||
| match=lambda url: ( | ||
| urlparse(url).hostname is not None and ( | ||
| urlparse(url).hostname == "docker.io" or urlparse(url).hostname.endswith(".docker.io") | ||
| ) | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "DOCKERHUB_USERNAME - Your DockerHub username", | ||
| @@ -96,7 +105,13 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | ||
| match=lambda url: ( | ||
| urlparse(url).hostname is not None and ( | ||
| urlparse(url).hostname == "registry.gitlab.com" or | ||
| urlparse(url).hostname == "gitlab.com" or | ||
| urlparse(url).hostname.endswith(".gitlab.com") | ||
| ) | ||
| ), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "gitlab-container-registry-login-fragment.yml", | ||
| @@ -116,7 +131,11 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | ||
| match=lambda url: ( | ||
| urlparse(url).hostname is not None and ( | ||
| urlparse(url).hostname == "gcr.io" or urlparse(url).hostname.endswith(".gcr.io") | ||
| ) | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "GCR_JSON_KEY - Your GCR service account JSON key", | 
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | 
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
registry.gitlab.com
          
            
              
                
              
            
      Copilot Autofix
AI 3 days ago
The safest way to fix this is to parse the URL and verify that the hostname matches the allowed values, rather than checking for substring inclusion. To do this, update the match lambdas in REGISTRY_INFOS to parse the input (likely a URL) using urllib.parse.urlparse, then check whether .hostname is exactly equal to the required host or ends with it (to support subdomains, if desired). For registries that should match only the exact host, use ==; otherwise use .endswith() with a preceding dot.
The following fixes should be applied within the same file:
- At the top, import `urlparse` from `urllib.parse`.
- For each `match=lambda url: ...` lambda:
  - Use `urlparse(url).hostname` to extract the host.
  - Change checks from substring matching to hostname comparison, e.g. `hostname == "registry.gitlab.com"` (or `.endswith(".gitlab.com")` to allow subdomains, as appropriate for each registry).

Edit only the shown code regions (i.e. rework the relevant lambdas inside the REGISTRY_INFOS list).
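One caveat worth checking before adopting hostname-based matching: registry URLs are often written without a scheme (for example `docker.io/org/image`), and `urlparse` then treats the whole string as a path, so `.hostname` is `None`. The sketch below shows one possible normalization. `registry_hostname` is a hypothetical helper, and whether the scaffold command ever receives scheme-less input is an assumption not confirmed by this PR.

```python
from typing import Optional
from urllib.parse import urlparse


def registry_hostname(url: str) -> Optional[str]:
    """Extract the hostname from a registry URL with or without a scheme.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Without a leading "//", urlparse puts "docker.io/org/image" in .path
    # and leaves the network location empty. Adding "//" marks the netloc.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url).hostname


print(urlparse("docker.io/org/image").hostname)     # None
print(registry_hostname("docker.io/org/image"))     # docker.io
```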
| @@ -1,6 +1,7 @@ | ||
| import subprocess | ||
| from pathlib import Path | ||
| from typing import Callable, NamedTuple, Optional, cast | ||
| from urllib.parse import urlparse | ||
|  | ||
| import click | ||
| from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config | ||
| @@ -12,7 +13,6 @@ | ||
|  | ||
| from dagster_dg_cli.cli.plus.constants import DgPlusAgentType | ||
| from dagster_dg_cli.utils.plus.build import get_agent_type, get_dockerfile_path, merge_build_configs | ||
|  | ||
| TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" | ||
| SERVERLESS_GITLAB_CI_FILE = TEMPLATES_DIR / "serverless-gitlab-ci.yml" | ||
| HYBRID_GITLAB_CI_FILE = TEMPLATES_DIR / "hybrid-gitlab-ci.yml" | ||
| @@ -77,7 +77,7 @@ | ||
| REGISTRY_INFOS = [ | ||
| ContainerRegistryInfo( | ||
| name="ECR", | ||
| match=lambda url: "ecr" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".ecr.aws") if urlparse(url).hostname else False, | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "AWS_ACCESS_KEY_ID - Your AWS access key ID", | ||
| @@ -87,7 +87,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | ||
| match=lambda url: (urlparse(url).hostname == "docker.io") if urlparse(url).hostname else False, | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "DOCKERHUB_USERNAME - Your DockerHub username", | ||
| @@ -96,7 +96,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | ||
| match=lambda url: urlparse(url).hostname in {"registry.gitlab.com", "gitlab.com"} if urlparse(url).hostname else False, | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "gitlab-container-registry-login-fragment.yml", | ||
| @@ -104,7 +104,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Azure Container Registry", | ||
| match=lambda url: "azurecr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io") if urlparse(url).hostname else False, | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "azure-container-registry-login-fragment.yml", | ||
| @@ -116,7 +116,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io") if urlparse(url).hostname else False, | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "GCR_JSON_KEY - Your GCR service account JSON key", | 
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | 
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
gitlab.com
          
            
              
                
              
            
      Copilot Autofix
AI 3 days ago
The best way to fix the problem is to ensure that URL matching works by examining the parsed hostname of the URL, not by substring-matching the URL string. Specifically, for each lambda in the match argument of ContainerRegistryInfo, replace unsafe substring checks like "gitlab.com" in url with a safe check using urlparse(url).hostname. For example, check if the hostname equals or ends with gitlab.com or its valid subdomains. Also, update "docker.io" in url, "ecr" in url, "azurecr.io" in url, "gcr.io" in url similarly, parsing the URL first and matching the relevant registry domain in the hostname.
Add the necessary import for urlparse from urllib.parse at the top of the file. This change should be made within the definition of the REGISTRY_INFOS list in python_modules/libraries/dagster-dg-cli/dagster_dg_cli/cli/scaffold/gitlab_ci.py.
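To make the alert concrete, the following sketch compares the original substring check with a hostname check on two lookalike URLs. The URLs are invented for illustration; only the `"gitlab.com" in url` expression comes from the code under review.

```python
from urllib.parse import urlparse

# Made-up URLs that mention gitlab.com without being hosted there.
lookalikes = [
    "https://evil.example/gitlab.com/project",  # domain appears only in the path
    "https://gitlab.com.evil.example/project",  # domain is a prefix of another host
]

for url in lookalikes:
    substring_hit = "gitlab.com" in url
    host = urlparse(url).hostname or ""
    hostname_hit = host == "gitlab.com" or host.endswith(".gitlab.com")
    print(f"{url}: substring={substring_hit}, hostname={hostname_hit}")
```

Both URLs pass the substring check and fail the hostname check.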
| @@ -1,7 +1,7 @@ | ||
| import subprocess | ||
| from pathlib import Path | ||
| from typing import Callable, NamedTuple, Optional, cast | ||
|  | ||
| from urllib.parse import urlparse | ||
| import click | ||
| from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config | ||
| from dagster_dg_core.context import DgContext | ||
| @@ -77,7 +77,10 @@ | ||
| REGISTRY_INFOS = [ | ||
| ContainerRegistryInfo( | ||
| name="ECR", | ||
| match=lambda url: "ecr" in url, | ||
| match=lambda url: ( | ||
| (lambda host: host is not None and ("ecr." in host or host.endswith(".ecr"))) | ||
| (urlparse(url).hostname) | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "AWS_ACCESS_KEY_ID - Your AWS access key ID", | ||
| @@ -87,7 +90,9 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | ||
| match=lambda url: ( | ||
| urlparse(url).hostname == "docker.io" | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "DOCKERHUB_USERNAME - Your DockerHub username", | ||
| @@ -96,7 +101,11 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | ||
| match=lambda url: ( | ||
| # Registry: registry.gitlab.com, also allow gitlab.com | ||
| (lambda host: host is not None and (host == "registry.gitlab.com" or host == "gitlab.com" or host.endswith(".gitlab.com"))) | ||
| (urlparse(url).hostname) | ||
| ), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "gitlab-container-registry-login-fragment.yml", | ||
| @@ -104,7 +113,10 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Azure Container Registry", | ||
| match=lambda url: "azurecr.io" in url, | ||
| match=lambda url: ( | ||
| (lambda host: host is not None and host.endswith(".azurecr.io")) | ||
| (urlparse(url).hostname) | ||
| ), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "azure-container-registry-login-fragment.yml", | ||
| @@ -116,7 +128,10 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | ||
| match=lambda url: ( | ||
| (lambda host: host is not None and host.endswith(".gcr.io")) | ||
| (urlparse(url).hostname) | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "GCR_JSON_KEY - Your GCR service account JSON key", | 
| ), | ||
| ContainerRegistryInfo( | ||
| name="Azure Container Registry", | ||
| match=lambda url: "azurecr.io" in url, | 
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
azurecr.io
          
            
              
                
              
            
      Copilot Autofix
AI 3 days ago
To securely check if a URL belongs to a recognized registry domain, we must parse the URL and examine its hostname in a structured way. The fix is to use the standard urlparse function from Python’s urllib.parse to parse the provided URL, extract the hostname, and only then perform matching—for example, by comparing the hostname to an explicit domain name or ensuring it ends with a valid domain preceded by a dot. For public registries with well-known host patterns (like docker.io, azurecr.io, gcr.io, registry.gitlab.com), use the parsed hostname for exact or suffix matching that prevents accidental substring matches or subdomain bypasses (i.e., .azurecr.io matches, but not notazurecr.io).
You only need to change the match lambdas in each ContainerRegistryInfo definition; add the necessary import for urlparse from urllib.parse if not already present.
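The role of the leading dot in the suffix can be seen in a few lines. This is a sketch using example hostnames (`myregistry` is a made-up registry name):

```python
hosts = ["myregistry.azurecr.io", "notazurecr.io", "azurecr.io"]

for host in hosts:
    bare = host.endswith("azurecr.io")     # no label boundary required
    dotted = host.endswith(".azurecr.io")  # requires a "." before the domain
    print(f"{host}: bare={bare}, dotted={dotted}")
```

The dotted form rejects `notazurecr.io`, but it also rejects the bare domain `azurecr.io`. That is probably fine for registries that live on subdomains, but for a host such as `gcr.io`, which is itself used as a registry endpoint, a dotted suffix check alone would likely need an additional equality check.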
| @@ -1,6 +1,7 @@ | ||
| import subprocess | ||
| from pathlib import Path | ||
| from typing import Callable, NamedTuple, Optional, cast | ||
| from urllib.parse import urlparse | ||
|  | ||
| import click | ||
| from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config | ||
| @@ -77,7 +78,11 @@ | ||
| REGISTRY_INFOS = [ | ||
| ContainerRegistryInfo( | ||
| name="ECR", | ||
| match=lambda url: "ecr" in url, | ||
| match=lambda url: ( | ||
| (urlparse(url).hostname or "").endswith(".ecr.aws") | ||
| or (urlparse(url).hostname or "").endswith(".ecr.amazonaws.com") | ||
| or (urlparse(url).hostname or "").endswith(".ecr") | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "AWS_ACCESS_KEY_ID - Your AWS access key ID", | ||
| @@ -87,7 +92,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "") == "docker.io", | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "DOCKERHUB_USERNAME - Your DockerHub username", | ||
| @@ -96,7 +101,10 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | ||
| match=lambda url: ( | ||
| (urlparse(url).hostname or "") == "registry.gitlab.com" | ||
| or (urlparse(url).hostname or "").endswith(".gitlab.com") | ||
| ), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "gitlab-container-registry-login-fragment.yml", | ||
| @@ -104,7 +112,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Azure Container Registry", | ||
| match=lambda url: "azurecr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io"), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "azure-container-registry-login-fragment.yml", | ||
| @@ -116,7 +124,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io"), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "GCR_JSON_KEY - Your GCR service account JSON key", | 
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | 
Check failure
Code scanning / CodeQL
Incomplete URL substring sanitization High
gcr.io
          
            
              
                
              
            
      Copilot Autofix
AI 3 days ago
To properly fix the issue, we should parse the input registry URL and compare the hostname portion (and potentially scheme) of the parsed URL to known valid hosts. This means updating the match lambda functions in the ContainerRegistryInfo definitions to avoid substring searching and instead use URL parsing (e.g., via urllib.parse.urlparse). For example, instead of "gcr.io" in url, use something like hostname = urlparse(url).hostname; hostname and hostname.endswith('.gcr.io'). We should update all lambdas to use a consistent, correct host matching approach.
This affects the REGISTRY_INFOS definitions (lines 78-125), specifically updating the match lambdas for ECR, DockerHub, GitLab, Azure, and Google. For this, we should import urlparse from Python's standard library and update each relevant lambda accordingly.
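One way to keep the matching approach consistent across entries is to build every matcher from the same factory instead of hand-writing each lambda. The sketch below is an assumption-laden illustration: `domain_matcher` and `MATCHERS` are hypothetical names, and the domain lists are taken from the suggestions in this thread rather than from any authoritative list of registry endpoints. ECR is left out because its private hostnames typically place a region between `ecr` and `amazonaws.com`, so a plain domain suffix may not be the right fit.

```python
from typing import Callable, Dict
from urllib.parse import urlparse


def domain_matcher(*domains: str) -> Callable[[str], bool]:
    """Build a matcher accepting any of `domains` or their subdomains.

    Hypothetical factory for illustration; not part of the PR.
    """

    def match(url: str) -> bool:
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in domains)

    return match


# Illustrative table keyed by the registry names used in REGISTRY_INFOS.
MATCHERS: Dict[str, Callable[[str], bool]] = {
    "DockerHub": domain_matcher("docker.io"),
    "GitLab Container Registry": domain_matcher("gitlab.com"),
    "Azure Container Registry": domain_matcher("azurecr.io"),
    "Google Container Registry": domain_matcher("gcr.io"),
}

print(MATCHERS["Google Container Registry"]("https://us.gcr.io/project/image"))
```

Each `ContainerRegistryInfo` entry could then take `match=domain_matcher(...)`, which keeps the boundary rules in one place.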
| @@ -1,6 +1,7 @@ | ||
| import subprocess | ||
| from pathlib import Path | ||
| from typing import Callable, NamedTuple, Optional, cast | ||
| from urllib.parse import urlparse | ||
|  | ||
| import click | ||
| from dagster_dg_core.config import DgRawCliConfig, normalize_cli_config | ||
| @@ -12,7 +13,6 @@ | ||
|  | ||
| from dagster_dg_cli.cli.plus.constants import DgPlusAgentType | ||
| from dagster_dg_cli.utils.plus.build import get_agent_type, get_dockerfile_path, merge_build_configs | ||
|  | ||
| TEMPLATES_DIR = Path(__file__).parent.parent.parent / "templates" | ||
| SERVERLESS_GITLAB_CI_FILE = TEMPLATES_DIR / "serverless-gitlab-ci.yml" | ||
| HYBRID_GITLAB_CI_FILE = TEMPLATES_DIR / "hybrid-gitlab-ci.yml" | ||
| @@ -77,7 +77,9 @@ | ||
| REGISTRY_INFOS = [ | ||
| ContainerRegistryInfo( | ||
| name="ECR", | ||
| match=lambda url: "ecr" in url, | ||
| match=lambda url: ( | ||
| (urlparse(url).hostname or "").endswith(".ecr.amazonaws.com") | ||
| ), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "ecr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "AWS_ACCESS_KEY_ID - Your AWS access key ID", | ||
| @@ -87,7 +89,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="DockerHub", | ||
| match=lambda url: "docker.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "") == "docker.io", | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "dockerhub-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "DOCKERHUB_USERNAME - Your DockerHub username", | ||
| @@ -96,7 +98,10 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="GitLab Container Registry", | ||
| match=lambda url: "registry.gitlab.com" in url or "gitlab.com" in url, | ||
| match=lambda url: ( | ||
| (urlparse(url).hostname or "") == "registry.gitlab.com" | ||
| or (urlparse(url).hostname or "").endswith(".gitlab.com") | ||
| ), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "gitlab-container-registry-login-fragment.yml", | ||
| @@ -104,7 +109,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Azure Container Registry", | ||
| match=lambda url: "azurecr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".azurecr.io"), | ||
| fragment=TEMPLATES_DIR | ||
| / "gitlab_registry_fragments" | ||
| / "azure-container-registry-login-fragment.yml", | ||
| @@ -116,7 +121,7 @@ | ||
| ), | ||
| ContainerRegistryInfo( | ||
| name="Google Container Registry", | ||
| match=lambda url: "gcr.io" in url, | ||
| match=lambda url: (urlparse(url).hostname or "").endswith(".gcr.io"), | ||
| fragment=TEMPLATES_DIR / "gitlab_registry_fragments" / "gcr-login-fragment.yml", | ||
| secrets_hints=[ | ||
| "GCR_JSON_KEY - Your GCR service account JSON key", | 

Summary & Motivation
Adds a `dg scaffold gitlab-ci` command to reach feature parity with `dg scaffold github-actions`.
Note: consider consolidating into `dg scaffold ci --{github,gitlab}`.
How I Tested These Changes
Changelog
`dg scaffold gitlab-ci` command for scaffolding CI for GitLab projects