Introduction

In a Kubernetes setup where GitLab CI runners build Docker images with Kaniko, DNS resolution can fail intermittently. Builds then fail because the Kaniko pods cannot resolve the hostname of the GitLab server. This guide explores potential causes and solutions, particularly for environments behind a pfSense firewall.

Setup Overview

The configuration involves a private GitLab server that orchestrates CI/CD through GitLab CI runners deployed on Kubernetes. The runners use the Kaniko image to build Docker images. The entire infrastructure sits behind a pfSense firewall.

Problem Description

Users have reported intermittent failures in which Kaniko job pods cannot resolve the GitLab server's hostname. The Git fetch at the start of the job fails, and the build fails with it. The failure rate can be as high as 60%, which is unacceptable for production environments. Retrying a failed build often succeeds.
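To put the 60% figure in perspective, the sketch below estimates how retries compound. It assumes that each attempt fails independently with the same probability, which the reports do not establish, and the function names are illustrative only.

```python
import math


def all_attempts_fail(p_fail: float, attempts: int) -> float:
    """Probability that every one of `attempts` independent tries fails."""
    return p_fail ** attempts


def attempts_needed(p_fail: float, target_success: float) -> int:
    """Smallest number of tries whose combined success rate reaches the target."""
    return math.ceil(math.log(1.0 - target_success) / math.log(p_fail))


if __name__ == "__main__":
    # Assumed worst case from the reports: 60% of attempts fail.
    print(round(all_attempts_fail(0.6, 3), 3))
    print(attempts_needed(0.6, 0.99))
```

Under this independence assumption, three attempts in a row would still all fail about 21.6% of the time, and ten attempts would be needed to reach a 99% success rate. Retries therefore hide the problem rather than fix it.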

Environment Details

  • Kubernetes Cluster: Running on CentOS 7
  • SELinux: Disabled
  • FirewallD: Disabled
  • DNS Resolution: All hosts can resolve the GitLab server, and the issue is not isolated to specific nodes.

Observations

  • The problem does not appear to affect other pods that do not rely on DNS for connections.
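  • The GitLab runner can pull the Kaniko image from gcr.io, indicating that DNS works at least at the node level. Image pulls are performed by the node's container runtime, so this does not prove that resolution works from inside the job pod.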

Troubleshooting Steps Taken

  1. DNS Testing: Pods dedicated to DNS requests were spawned, and they successfully resolved the domain without failures.
  2. Cluster Reboot: A complete reboot of the Kubernetes cluster and GitLab instance was performed, but the issue persisted.
  3. Static DNS Configuration: Attempts were made to statically configure DNS routes in pfSense, yet the problem continued.
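Because the failure is intermittent, a single successful lookup in a test pod proves little. The following is a minimal sketch of a repeated lookup probe, assuming it runs in a pod that has Python available (the Kaniko debug image itself only ships a BusyBox shell). The function name, attempt count, and delay are illustrative choices, and the `resolver` parameter exists only so the loop can be exercised without a network.

```python
import socket
import time


def probe(hostname, attempts=50, delay=0.2, resolver=socket.getaddrinfo):
    """Resolve `hostname` repeatedly and return the number of failed lookups."""
    failures = 0
    for _ in range(attempts):
        try:
            # getaddrinfo goes through the pod's normal resolver configuration.
            resolver(hostname, 443)
        except socket.gaierror:
            failures += 1
        time.sleep(delay)
    return failures


# Example call, using the placeholder hostname from this guide:
#   print(probe("git.mydomain.com"), "of 50 lookups failed")
```

Running the probe from several nodes and comparing the counts may show whether the failures follow a particular node or the cluster DNS path as a whole.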

Example CI Configuration

Below is a sample configuration for the CI pipeline using Kaniko:

build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo "$REGISTRY_AUTH" > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $REGISTRY_URL/$REGISTRY_IMAGE:$CI_JOB_ID
  only:
    - master

Error Example

During the build process, the following error may occur:

Initialized empty Git repository in /builds/MYPROJECT/.git/
Fetching changes...
Created fresh repository.
fatal: unable to access 'https://gitlab-ci-token:[MASKED]@git.mydomain.com/MYPROJECT.git/': Could not resolve host: git.mydomain.com
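To measure how often this error actually occurs, saved job traces can be scanned for the message. The sketch below assumes the traces are already available as strings (how they are collected is left open), and the helper names are hypothetical.

```python
import re

# Matches the message Git prints when name resolution fails.
RESOLVE_ERROR = re.compile(r"Could not resolve host: (\S+)")


def unresolved_hosts(trace: str) -> list[str]:
    """Return every hostname that a job trace reports as unresolvable."""
    return RESOLVE_ERROR.findall(trace)


def dns_failure_rate(traces: list[str]) -> float:
    """Fraction of job traces containing at least one resolution error."""
    if not traces:
        return 0.0
    failed = sum(1 for trace in traces if unresolved_hosts(trace))
    return failed / len(traces)
```

Tracking this rate before and after a configuration change gives a more reliable signal than a handful of manual retries.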

Possible Solutions

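  • DNS Configuration: Ensure that the DNS settings in the GitLab runner's config.toml file point to a reliable DNS server. With the Docker executor, the dns list belongs in the [runners.docker] section. The Kubernetes executor described in this guide takes its DNS settings from dns_policy and dns_config under [runners.kubernetes] instead. For example, for the Docker executor:
[[runners]]
  name = "my-runner"
  url = "https://git.mydomain.com"
  executor = "docker"
  [runners.docker]
    # dns is a Docker executor option, so it sits under [runners.docker]
    dns = ["<YOUR_DNS_SERVER_IP>", "8.8.8.8"]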
  • Restart Docker Service: If DNS resolution keeps failing after a failed job, restarting the Docker service on the affected host may resolve the issue temporarily.
  • Network Policies: Review any Kubernetes network policies and pfSense firewall rules that may affect DNS traffic (UDP and TCP port 53), and confirm that requests from the Kubernetes cluster are allowed.

Conclusion

DNS resolution issues in GitLab CI runners are hard to diagnose when they are intermittent. Working through the troubleshooting steps above and verifying the runner's DNS configuration and the pfSense rules can mitigate these issues and keep the CI/CD pipeline stable.