How We Configured ingress-nginx to Prevent Search Indexing on Our Development Environment

Introduction

While preparing for the release of Gigahatch Managed Kubernetes, we ran into the issue that our staging environment could be indexed by search engines. To fix this, we wanted to serve a robots.txt file at the root of our website with the following content:

User-agent: *
Disallow: /

To do this, we can either host the file in the application itself, or we can host it somewhere in the environment outside the application. Let’s look at both approaches.

Hosting the robots.txt from the Container Image

One approach to adding the robots.txt file is embedding it within the web application container. This would require building different images for staging and production environments or managing the robots.txt file within the CD pipeline. Alternatively, we could configure the web container to serve a different robots.txt file based on the environment. However, this adds unwanted complexity to an otherwise independent application image and requires a code change every time we want to change something in this setup.

Configuring ingress-nginx to serve the file

Ideally we would be able to configure our Kubernetes ingress to serve this file directly. Then we wouldn’t have to touch the image, and we could decide per environment whether to serve the file. We could also trivially reuse the same solution for multiple applications.

We could run a minimal pod that just serves a robots.txt file and add a new path entry to the ingress. But that seems like a very complicated and inflexible setup. There doesn’t seem to be an ingress-native way to do this, so we looked for ingress-nginx-specific solutions, since that is the ingress controller we use. Fortunately, we can configure NGINX to handle this with almost zero overhead using configuration snippets.
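For a sense of how much machinery the pod-based alternative would need, here is a hypothetical sketch. All names (robots-txt and friends) are made up for illustration; this is not something we deployed:

```yaml
# Hypothetical sketch of the rejected pod-based approach.
apiVersion: v1
kind: ConfigMap
metadata:
  name: robots-txt
data:
  robots.txt: |
    User-agent: *
    Disallow: /
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robots-txt
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robots-txt
  template:
    metadata:
      labels:
        app: robots-txt
    spec:
      containers:
        - name: nginx
          image: nginx:alpine
          volumeMounts:
            # Serve the ConfigMap content as static files
            - name: robots
              mountPath: /usr/share/nginx/html
      volumes:
        - name: robots
          configMap:
            name: robots-txt
---
apiVersion: v1
kind: Service
metadata:
  name: robots-txt
spec:
  selector:
    app: robots-txt
  ports:
    - port: 80
# ...plus an extra `path: /robots.txt` entry on every Ingress
# that should route to this service.
```

Three extra resources, plus an ingress path per application, just to serve one static file. That overhead is why we kept looking for a lighter option.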

Configuration snippets are disabled by default in ingress-nginx due to security concerns, so make sure you understand the implications before enabling them. In our case this is not a problem, because we are the only users of our Kubernetes cluster.

We use Flux for our GitOps workflow, combined with Kustomize to manage environment-specific deployments. This setup makes it easy to apply the configuration selectively to specific environments using Kustomize patches. If you use different GitOps tooling, the way you apply this per environment will differ.

Here is our base ingress resource, shared across all environments. This is the yaml file the patches will be applied to.

ingress-frontend.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  labels:
    name: frontend
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  rules:
    - host: cloud.gigahatch.ch
      http:
        paths:
          - pathType: Prefix
            path: '/'
            backend:
              service:
                name: frontend
                port:
                  name: http
          # ... other paths omitted

  tls:
    - hosts:
        - cloud.gigahatch.ch
      secretName: cloud-gigahatch-ch-crt

To instruct NGINX to serve our robots.txt, we simply add the nginx.ingress.kubernetes.io/server-snippet annotation to our ingress (here using a Kustomize merge patch). You could also add this annotation directly to your resource definition if you don’t want it to change between environments.

patch-ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend
  annotations:
    nginx.ingress.kubernetes.io/server-snippet: |
      location /robots.txt {
        return 200 "User-agent: *\nDisallow: /\n";
      }
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header X-Robots-Tag none;

We also added the nginx.ingress.kubernetes.io/configuration-snippet annotation to set the X-Robots-Tag header on all locations to none, which is equivalent to noindex, nofollow. This is the recommended way to instruct search engines not to index a page: robots.txt alone does not prevent indexing if another website links to your page, whereas the X-Robots-Tag header blocks indexing outright.

If you are using Flux like us, make sure you don’t forget to add the patch-ingress.yaml file to your kustomization.yaml file, otherwise nothing will happen:

dev/kustomization.yaml

resources:
  - ../base/ingress-frontend.yaml

patches:
  - path: patch-ingress.yaml
    target:
      kind: Ingress
      name: frontend

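For completeness, Flux picks this overlay up through a Kustomization resource pointing at the dev folder. The sketch below shows roughly what that looks like; the names, namespace, and repository layout are illustrative, not our actual setup:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: frontend-dev
  namespace: flux-system
spec:
  interval: 10m
  # Path to the overlay containing kustomization.yaml (illustrative)
  path: ./dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Flux then applies the patched ingress on every reconciliation, so the robots.txt snippet stays confined to the dev environment.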
Configuring ingress-nginx

For security reasons, configuration snippets are disabled by default in ingress-nginx. Since we control and trust every ingress resource created in this cluster, we can safely enable this feature.

To allow snippets, we modified our ingress-nginx Helm chart deployment. We use the HelmRelease CRD from Flux to provision the ingress, so we only needed to set the allowSnippetAnnotations: true flag in the chart values.

ingress-nginx.yaml

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  chart:
    spec:
      chart: ingress-nginx
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
  interval: 1h0m0s
  timeout: 10m
  targetNamespace: ingress-nginx
  install:
    crds: Create
  upgrade:
    crds: CreateReplace
  values:
    controller:
      # --> Set this flag to true
      allowSnippetAnnotations: true

      config:
        # ...
      service:
        type: LoadBalancer
        annotations:
          # Depending on your cloud provider, you might need to adjust these labels.
          # In this example, we use Gigahatch Managed Kubernetes
          load-balancer.gigahatch.cloud/location: 'EUROPE_CENTRAL_1'
          load-balancer.gigahatch.cloud/use-private-ip: 'true'
          load-balancer.gigahatch.cloud/uses-proxyprotocol: 'true'

If you are using the Helm CLI rather than Flux, the following Helm command accomplishes a similar setup:

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --set controller.allowSnippetAnnotations=true
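One caveat worth checking: newer ingress-nginx releases also classify annotations by risk level, and server-snippet is rated as the highest risk. If the snippet is still rejected after enabling the flag, your controller version may additionally require raising the allowed risk level in the controller config, roughly as sketched below. Verify the exact key and behavior against the documentation for your release before relying on this:

```yaml
values:
  controller:
    allowSnippetAnnotations: true
    config:
      # May be required on newer ingress-nginx versions, which rate
      # server-snippet as "Critical" risk; check your version's docs.
      annotations-risk-level: Critical
```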

Conclusion

By using the nginx.ingress.kubernetes.io/server-snippet annotation provided by the ingress-nginx controller, we configured NGINX to serve the robots.txt file without running a separate pod. This approach helps us keep the web application image simple and manage the ingress configuration using Flux GitOps.

This solution demonstrates how we can leverage Kubernetes’ flexibility to solve problems in a clean and efficient way.


I hope you found this article helpful. If you have any questions or suggestions, please leave a comment below.
