Introduction
In a Kubernetes setup where GitLab CI runners build Docker images with Kaniko, DNS resolution can fail intermittently. Builds then fail because the Kaniko pods cannot resolve the hostname of the GitLab server. This guide explores potential causes and solutions for this issue, particularly in environments behind a pfSense firewall.
Setup Overview
The configuration involves a private GitLab server that orchestrates CI/CD processes through GitLab CI runners deployed on Kubernetes. The runners utilize the Kaniko image to build Docker images. The entire infrastructure is secured behind a pfSense firewall.
Problem Description
Users have reported intermittent failures where Kaniko pods cannot resolve the GitLab server's hostname, so the git fetch at the start of the job fails and the build fails with it. The failure rate can be as high as 60%, which is unacceptable for production environments. Retrying the build often succeeds, which suggests a transient resolution problem rather than a missing DNS record.
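To put the 60% figure in perspective, the arithmetic below estimates how often a job would pass if it were simply retried. It is a rough sketch that assumes every attempt fails independently with the same probability, which may not hold if failures cluster in time. The function name is illustrative and not part of any GitLab tooling.

```python
def success_within(attempts: int, failure_rate: float) -> float:
    """Probability that at least one of `attempts` tries succeeds,
    assuming independent attempts with a fixed failure rate."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0 and 1")
    # All attempts fail with probability failure_rate ** attempts.
    return 1.0 - failure_rate ** attempts


for attempts in (1, 2, 3):
    print(attempts, round(success_within(attempts, 0.6), 3))
```

Under that assumption, three attempts still leave about one job in five failing (0.6 cubed is 0.216), so retries mask the problem rather than fix it.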
Environment Details
- Kubernetes Cluster: Running on CentOS 7
- SELinux: Disabled
- FirewallD: Disabled
- DNS Resolution: All hosts can resolve the GitLab server, and the issue is not isolated to specific nodes.
Observations
- The problem does not appear to affect other pods that do not rely on DNS for connections.
- The GitLab runner can pull the Kaniko image from gcr.io, indicating DNS functionality is present.
Troubleshooting Steps Taken
- DNS Testing: Pods dedicated to DNS requests were spawned, and they successfully resolved the domain without failures.
- Cluster Reboot: A complete reboot of the Kubernetes cluster and GitLab instance was performed, but the issue persisted.
- Static DNS Configuration: Attempts were made to statically configure DNS routes in pfSense, yet the problem continued.
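The DNS test described above can be approximated with a small probe that resolves the hostname many times and reports the fraction of failed lookups. This is a minimal sketch, not the test the original reporters ran: the function name, the attempt count, and the delay are assumptions, and it needs a pod that has Python available (the Kaniko image itself does not ship it).

```python
import socket
import time
from typing import Callable


def probe(host: str, attempts: int = 50, delay: float = 0.2,
          resolve: Callable[[str], object] = socket.gethostbyname) -> float:
    """Resolve `host` repeatedly and return the fraction of failed lookups."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    failures = 0
    for _ in range(attempts):
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failures += 1
        time.sleep(delay)
    return failures / attempts
```

Calling probe("git.mydomain.com") from a pod scheduled on the same nodes, and with the same DNS policy, as the failing jobs gives a number that can be compared against the build failure rate. A probe that never fails while builds still do would suggest the problem lies in how the job pods are configured rather than in the DNS server itself.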
Example CI Configuration
Below is a sample configuration for the CI pipeline using Kaniko:
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    - echo $REGISTRY_AUTH > /kaniko/.docker/config.json
    - /kaniko/executor --context $CI_PROJECT_DIR --dockerfile $CI_PROJECT_DIR/Dockerfile --destination $REGISTRY_URL/$REGISTRY_IMAGE:$CI_JOB_ID
  only:
    - master
Error Example
During the build process, the following error may occur:
Initialized empty Git repository in /builds/MYPROJECT/.git/
Fetching changes...
Created fresh repository.
fatal: unable to access 'https://gitlab-ci-token:[MASKED]@git.mydomain.com/MYPROJECT.git/': Could not resolve host: git.mydomain.com
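To measure how often this error occurs across many jobs, saved job logs can be scanned for the "Could not resolve host" message. The helper below is a hypothetical sketch: it assumes the logs are available as plain text and that the message has the same wording as in the example above.

```python
import re
from typing import List

# Matches the message git prints for a failed DNS lookup, as in the log above.
DNS_ERROR = re.compile(r"Could not resolve host: (?P<host>\S+)")


def unresolved_hosts(log_text: str) -> List[str]:
    """Return every hostname reported as unresolvable in a job log."""
    return [match.group("host") for match in DNS_ERROR.finditer(log_text)]
```

An empty result means the log contains no DNS failure of this kind, so a job that failed for another reason is not counted by mistake.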
Possible Solutions
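- DNS Configuration: Ensure that the DNS settings in the GitLab runner's config.toml file point to a reliable DNS server. With the Docker executor, the dns option belongs in the [runners.docker] section; with the Kubernetes executor described above, the comparable settings are dns_policy and the [runners.kubernetes.dns_config] section. A public resolver such as 8.8.8.8 only helps if the GitLab hostname is publicly resolvable. For example:
  [[runners]]
    name = "my-runner"
    url = "https://git.mydomain.com"
    executor = "docker"
    [runners.docker]
      dns = ["<YOUR_DNS_SERVER_IP>", "8.8.8.8"]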
- Restart Docker Service: If DNS resolution fails after job failures, restarting the Docker service may resolve the issue temporarily.
- Network Policies: Review any network policies or firewall rules in pfSense that may be affecting DNS traffic to ensure that requests from the Kubernetes cluster are allowed.
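Before changing any of these settings, it can help to confirm which DNS servers a job pod actually uses. The sketch below parses the contents of a resolv.conf file into its nameserver, search, and options entries. The function name is illustrative and the parser handles only these three keywords, which is an intentional simplification.

```python
from typing import Dict, List


def parse_resolv_conf(text: str) -> Dict[str, List[str]]:
    """Collect nameserver, search and options entries from resolv.conf text."""
    result: Dict[str, List[str]] = {"nameserver": [], "search": [], "options": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        keyword, *values = line.split()
        if keyword == "search":
            result["search"] = values  # a later search line replaces earlier ones
        elif keyword in result:
            result[keyword].extend(values)
    return result
```

Inside a job, the input would typically come from reading /etc/resolv.conf. If the nameserver list does not contain the server you expect, the runner's DNS settings are not reaching the pod.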
Conclusion
DNS resolution issues in GitLab CI runners can be challenging, especially in complex environments. By following the troubleshooting steps outlined above and ensuring proper configuration, you can mitigate these issues and maintain a stable CI/CD pipeline.