Top DevOps Interview Questions

— ny_wk

DevOps interview questions span culture, CI/CD pipelines, Git, Docker, Kubernetes, infrastructure as code, and cloud fundamentals. This guide organizes the questions hiring teams actually ask in 2026, with clear, corrected, modern answers you can explain out loud with confidence.

Whether you are targeting a junior DevOps role, an SRE position, or a platform engineering job, the same core themes come up again and again. Below you will find representative DevOps interview questions and answers grouped by topic, each written to help you understand the why behind the answer rather than memorize a single line.

DevOps Culture and CI/CD Fundamentals

Interviews almost always open with conceptual questions to test whether you understand DevOps as a culture, not just a toolset. Strong answers connect practices to outcomes like faster delivery, fewer failures, and quicker recovery.

What is DevOps?

DevOps is a set of cultural practices and tools that shorten the software delivery lifecycle by tightly integrating development and operations. The goal is to deliver value to users faster and more reliably through automation, shared ownership, and continuous feedback. It is not a job title or a single tool; it is a way of working that breaks down the wall between people who write code and people who run it.

What is the difference between Continuous Integration, Continuous Delivery, and Continuous Deployment?

Continuous Integration (CI): Developers merge code into a shared branch frequently, and every merge triggers an automated build and test run to catch integration problems early.
Continuous Delivery (CD): Every change that passes CI is automatically packaged and kept in a deployable state, but the final push to production requires a manual approval.
Continuous Deployment: Goes one step further—every change that passes the automated pipeline is released to production with no manual gate.

A common mistake is treating the two CDs as identical. The distinction is the manual approval step before production.
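
To make the distinction concrete, here is a minimal Python sketch of the release decision. The function name, the mode labels, and the return strings are all invented for this example; real pipelines express the same gate in their own configuration.

```python
def release_decision(tests_pass: bool, mode: str, approved: bool = False) -> str:
    """Decide what happens to a change after the automated pipeline runs.

    mode is "delivery" (manual gate before production) or "deployment"
    (no manual gate). Both labels are chosen for this example only.
    """
    if mode not in ("delivery", "deployment"):
        raise ValueError(f"unknown mode: {mode}")
    if not tests_pass:
        return "rejected"               # CI failed: nothing is releasable
    if mode == "delivery" and not approved:
        return "awaiting approval"      # deployable artifact, waiting on a human
    return "released"


print(release_decision(True, "delivery"))
print(release_decision(True, "delivery", approved=True))
print(release_decision(True, "deployment"))
```

The only branch that differs between the two modes is the approval check, which is exactly the point to make in an interview.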

What are DORA metrics?

The four key DORA (DevOps Research and Assessment) metrics measure delivery performance and are frequently referenced in modern interviews:

Deployment frequency — how often you ship to production.
Lead time for changes — time from commit to running in production.
Change failure rate — percentage of deployments that cause a failure.
Mean time to recovery (MTTR) — how quickly you restore service after an incident.
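
As a rough illustration of how these four numbers fall out of deployment records, consider the Python sketch below. The `Deployment` record, its field names, and the sample data are hypothetical and not taken from any DORA tooling; teams usually derive these values from their CI/CD and incident systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape, invented for this example.
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deployments


def dora_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four metrics over one reporting period."""
    total = len(deployments)
    failures = [d for d in deployments if d.failed]
    lead_hours = [
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    ]
    restore_hours = [
        (d.restored_at - d.deployed_at).total_seconds() / 3600
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": total / period_days,
        "mean_lead_time_hours": sum(lead_hours) / total if total else 0.0,
        "change_failure_rate": len(failures) / total if total else 0.0,
        "mean_time_to_recovery_hours": (
            sum(restore_hours) / len(restore_hours) if restore_hours else 0.0
        ),
    }


start = datetime(2026, 1, 5, 9, 0)
sample = [
    Deployment(start, start + timedelta(hours=2)),
    Deployment(start, start + timedelta(hours=4)),
    Deployment(start, start + timedelta(hours=6), failed=True,
               restored_at=start + timedelta(hours=7)),
    Deployment(start, start + timedelta(hours=4)),
]
metrics = dora_metrics(sample, period_days=2)
print(metrics)
```

With this sample data the sketch reports two deployments per day, a four-hour mean lead time, a 25% change failure rate, and a one-hour recovery time.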

What is a blue-green deployment versus a canary deployment?

A blue-green deployment runs two identical environments; you deploy to the idle one (green), test it, then switch all traffic over. Rollback is instant—just switch back. A canary deployment releases the new version to a small percentage of users first, watches metrics, and gradually increases traffic if healthy. Canary limits blast radius; blue-green prioritizes fast, clean cutover and rollback.
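
The traffic-splitting idea can be sketched in a few lines of Python. This is a toy router under stated assumptions: the bucket scheme, the function names, and the user IDs are made up for illustration, and production systems normally do this in a load balancer or service mesh instead of application code.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int) -> str:
    """Return which version serves this user.

    canary_percent of 0 or 100 behaves like a blue-green switch: everyone is
    on one side. Values in between send only a slice of users to the new version.
    """
    return "new" if bucket_for(user_id) < canary_percent else "old"


users = [f"user-{i}" for i in range(1000)]
for percent in (0, 5, 50, 100):
    on_new = sum(route(u, percent) == "new" for u in users)
    print(f"{percent:>3}% canary -> {on_new} of {len(users)} users on the new version")
```

Because the bucket is derived from a hash of the user ID, a given user stays on the same side as the percentage grows, which keeps their experience consistent during the rollout.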

Version Control and Git Interview Questions

Git knowledge is non-negotiable. Expect questions that probe whether you understand the underlying model, not just memorized commands.

What is the difference between `git merge` and `git rebase`?

Both integrate changes from one branch into another. `git merge` creates a new merge commit (unless it can fast-forward) and preserves the exact history of both branches. `git rebase` replays your commits on top of the target branch, producing a linear history but rewriting commit hashes. The golden rule: never rebase commits that have already been pushed and shared, because rewriting public history leaves everyone else's clones diverged from the remote.
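
Why rebasing changes hashes is easier to see with a toy model. The Python sketch below is not how Git computes object IDs (real commits also hash the tree, author, and timestamps); it only assumes, as Git does, that a commit's ID depends on its parents.

```python
import hashlib


def commit_id(parents: tuple[str, ...], message: str) -> str:
    """Toy commit hash: depends on the parent ids and the content."""
    payload = "|".join(parents) + "\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:8]


def build_chain(base: str, messages: list[str]) -> list[str]:
    """Create a linear chain of commits on top of `base` and return their ids."""
    ids, parent = [], base
    for message in messages:
        parent = commit_id((parent,), message)
        ids.append(parent)
    return ids


feature_messages = ["feature: add login", "feature: add tests"]
root = commit_id((), "initial commit")
main = build_chain(root, ["main: fix typo"])
feature = build_chain(root, feature_messages)

# Merge: one new commit with two parents; the feature commits keep their ids.
merge_commit = commit_id((main[-1], feature[-1]), "Merge feature into main")

# Rebase: the same changes replayed on the tip of main, so every id changes.
rebased = build_chain(main[-1], feature_messages)

print("feature ids before rebase:", feature)
print("feature ids after rebase: ", rebased)
print("merge commit:             ", merge_commit)
```

Anyone who already based work on the original feature IDs now points at commits that no longer exist on the rebased branch, which is the practical reason behind the golden rule.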

What does `git fetch` do compared to `git pull`?

`git fetch` downloads new commits and refs from the remote but does not change your working branch. `git pull` is effectively `git fetch` followed by a merge (or rebase) into your current branch. Fetch is safe and read-only; pull modifies your branch.

How do you undo a commit that was already pushed?

Use `git revert <commit>`. It creates a new commit that reverses the changes and leaves history intact, which makes it safe for shared branches. Avoid `git reset --hard` followed by a force push on a shared branch, because that rewrites history and forces everyone else to recover.

Command	Use case
`git reset`	Move branch pointer; good for local, unpushed changes
`git revert`	Safely undo a pushed commit with a new commit
`git cherry-pick`	Apply a single commit from another branch
`git stash`	Temporarily shelve uncommitted work

What is a trunk-based development workflow?

Developers commit small changes to a single main branch (trunk) frequently, often behind feature flags, rather than maintaining long-lived feature branches. It pairs naturally with CI/CD because the trunk stays releasable, reducing painful merge conflicts and integration drift.
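
A feature flag can be as small as a conditional around the unfinished code path. The sketch below is a minimal illustration: the `new_checkout` flag name and the `checkout` function are hypothetical, and real projects usually read flags from a configuration service, not a hard-coded dictionary.

```python
def checkout(cart_total: float, flags: dict[str, bool]) -> str:
    """Pick a code path based on a flag; unknown flags default to off."""
    if flags.get("new_checkout", False):
        return f"new checkout flow: {cart_total:.2f}"
    return f"legacy checkout flow: {cart_total:.2f}"


# The unfinished flow is merged to trunk but stays dark in production.
production_flags = {"new_checkout": False}
staging_flags = {"new_checkout": True}

print(checkout(19.99, production_flags))
print(checkout(19.99, staging_flags))
```

The new code ships with every release, yet users only see it once the flag is turned on, so the trunk stays releasable while the feature is still in progress.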

Containers and Docker Interview Questions

Containers are core to modern DevOps. Interviewers want to confirm you understand isolation, image layering, and the difference between an image and a container.

What is the difference between a container and a virtual machine?

A virtual machine virtualizes hardware and runs a full guest operating system with its own kernel, making it heavier and slower to start. A container virtualizes the operating system and shares the host kernel, packaging only the application and its dependencies. Containers are lightweight, typically start in about a second or less, and are easy to move between hosts, but because they share a kernel they offer weaker isolation than VMs.

What is the difference between a Docker image and a Docker container?

An image is a read-only template (a snapshot of filesystem layers and metadata). A container is a running—or stopped—instance of that image with a writable layer on top. One image can spawn many containers, the same way one class can create many objects.
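
The layering can be modelled with Python's standard library. This is an analogy only, not how Docker's storage drivers work: the layer contents, file paths, and the `Container` class are invented for the example.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical image: an ordered stack of read-only layers (path -> content).
base_layer = MappingProxyType({"/etc/os-release": "debian", "/bin/sh": "shell"})
app_layer = MappingProxyType({"/app/main.py": "print('hello')"})


class Container:
    """A toy container: a private writable layer over shared image layers."""

    def __init__(self, *image_layers):
        self.writable = {}
        # Reads search the writable layer first, then upper image layers,
        # then lower ones; writes only ever touch the first mapping.
        self.fs = ChainMap(self.writable, *reversed(image_layers))


c1 = Container(base_layer, app_layer)
c2 = Container(base_layer, app_layer)
c1.fs["/tmp/cache"] = "scratch data"      # lands in c1's writable layer only
c1.fs["/etc/os-release"] = "patched"      # shadows the image file for c1 only

print(c1.fs["/etc/os-release"], c2.fs["/etc/os-release"])
print("/tmp/cache" in c2.fs)
```

Both containers share the same read-only layers, and throwing away `c1.writable` discards everything `c1` changed, which mirrors why data written inside a container does not survive its removal.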

What is the difference between `CMD` and `ENTRYPOINT` in a Dockerfile?

`ENTRYPOINT` defines the executable that always runs; it can only be replaced explicitly, for example with `docker run --entrypoint`.
`CMD` provides the default command or default arguments, which are replaced by anything passed after the image name at runtime.

A common pattern is `ENTRYPOINT` for the binary and `CMD` for default flags, so users can pass their own arguments without changing the entry command.
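
The way the two instructions combine can be expressed as a one-line rule. The Python function below is a simplified model of the exec-form behaviour, not Docker's implementation; the `ping` image contents are an assumed example.

```python
def effective_command(entrypoint: list[str], cmd: list[str], run_args: list[str]) -> list[str]:
    """Exec-form rule: arguments given at run time replace CMD, never ENTRYPOINT."""
    return entrypoint + (run_args if run_args else cmd)


# Hypothetical image: the entry command is `ping`, defaulting to `-c 3 localhost`.
entrypoint = ["ping"]
cmd = ["-c", "3", "localhost"]

print(effective_command(entrypoint, cmd, []))                # defaults apply
print(effective_command(entrypoint, cmd, ["example.com"]))   # user args replace CMD
```

In both cases the process that starts is still `ping`; only the arguments change.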

What is a multi-stage build and why use it?

A multi-stage build uses multiple `FROM` statements in one Dockerfile. You compile or build in an early stage that has all the heavy build tools, then copy only the final artifacts into a small runtime image. This produces a smaller, more secure final image with no compilers or build dependencies left behind.

How do you persist data in Docker?

Containers are ephemeral, so any data written to the container's writable layer is lost when it is removed. Use named volumes (managed by Docker, the recommended approach for databases) or bind mounts (mapping a host directory into the container, useful for local development).

Kubernetes Orchestration Interview Questions

Kubernetes is one of the most heavily tested topics. Be ready to explain its objects and how they relate to one another.

What is Kubernetes and why use it over plain Docker?

Docker runs and builds containers; Kubernetes orchestrates them at scale. It handles scheduling across many machines, self-healing (restarting failed containers), horizontal scaling, service discovery, load balancing, and rolling updates. The honest framing for an interview: Docker and Kubernetes are not competitors—Docker packages the app, Kubernetes runs many of those packaged apps reliably across a cluster.

What is a Pod?

A Pod is the smallest deployable unit in Kubernetes. It wraps one or more tightly coupled containers that share the same network namespace (same IP address and port space) and can share storage volumes. Most Pods run a single application container, sometimes alongside a helper sidecar container.

Explain the relationship between a Deployment, ReplicaSet, and Pod.

A Deployment declares the desired state—which image and how many replicas—and manages rolling updates and rollbacks.
The Deployment creates and manages a ReplicaSet, which ensures the specified number of Pod copies are running.
The ReplicaSet creates and maintains the actual Pods.

You almost always interact with the Deployment, and Kubernetes handles the layers beneath it.
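
The layers beneath a Deployment all follow the same pattern: compare desired state with actual state and close the gap. The Python sketch below imitates that reconcile loop in miniature; the pod naming scheme and the `reconcile` function are assumptions made for illustration, not Kubernetes controller code.

```python
import itertools

# Hypothetical name generator for newly created pods.
_pod_names = (f"web-{n}" for n in itertools.count(1))


def reconcile(desired_replicas: int, running_pods: list[str]) -> list[str]:
    """Return the pod list after one reconcile pass toward the desired count."""
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        pods.append(next(_pod_names))   # too few: create replacements
    while len(pods) > desired_replicas:
        pods.pop()                      # too many: remove extras
    return pods


pods = reconcile(3, [])
print("initial:           ", pods)
pods.remove("web-2")                    # simulate a crashed pod
pods = reconcile(3, pods)
print("after self-healing:", pods)
pods = reconcile(1, pods)
print("after scaling down:", pods)
```

Self-healing and scaling are the same operation here: the loop never asks why the count is wrong, it only restores it.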

What are the different Kubernetes Service types?

Service type	Purpose
`ClusterIP`	Default; exposes the service only inside the cluster
`NodePort`	Exposes the service on a static port on every node
`LoadBalancer`	Provisions an external cloud load balancer
`ExternalName`	Maps the service to an external DNS name

For HTTP routing across many services, an Ingress (or the newer Gateway API) is typically used in front of ClusterIP services instead of many LoadBalancers.

What is the difference between a ConfigMap and a Secret?

Both inject configuration into Pods. A ConfigMap stores non-sensitive plain-text config. A Secret stores sensitive data like passwords and tokens. A point many candidates get wrong: Kubernetes Secrets are only base64-encoded by default, not encrypted. For real protection, enable encryption at rest, restrict access with RBAC, and consider an external secret manager or Sealed Secrets.
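
The difference between encoding and encryption is easy to demonstrate. The short Python example below uses a made-up password and shows that base64 can be reversed by anyone, with no key involved.

```python
import base64

# What ends up in a Secret manifest's `data` field is just base64 text.
stored_value = base64.b64encode(b"s3cr3t-password").decode("ascii")
print(stored_value)

# Anyone who can read the manifest can reverse it without any key.
recovered = base64.b64decode(stored_value).decode("utf-8")
print(recovered)
```

This is why read access to Secret objects, or to a repository that contains their manifests, should be treated as access to the plaintext.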

How does a liveness probe differ from a readiness probe?

A liveness probe checks if the container is alive; if it fails, Kubernetes restarts the container.
A readiness probe checks if the container is ready to receive traffic; if it fails, the Pod is removed from the Service endpoints but is not restarted.
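
The two outcomes can be captured in a small simulation. The `PodState` fields and the handler below are hypothetical and simplified (real probes also have thresholds, periods, and timeouts); the sketch additionally assumes that a container being restarted is taken out of rotation until it is ready again.

```python
from dataclasses import dataclass


@dataclass
class PodState:
    # Hypothetical fields for this simulation only.
    name: str
    restarts: int = 0
    in_endpoints: bool = True


def handle_probe_results(pod: PodState, liveness_ok: bool, readiness_ok: bool) -> PodState:
    """Apply the two probe outcomes as described above."""
    if not liveness_ok:
        # Failed liveness: restart the container. Assumption: while it
        # restarts it is not ready, so it also leaves the endpoints.
        pod.restarts += 1
        pod.in_endpoints = False
        return pod
    # Failed readiness: no restart, the pod just stops receiving traffic.
    pod.in_endpoints = readiness_ok
    return pod


pod = PodState("web-1")
print(handle_probe_results(pod, liveness_ok=True, readiness_ok=False))
print(handle_probe_results(pod, liveness_ok=True, readiness_ok=True))
print(handle_probe_results(pod, liveness_ok=False, readiness_ok=False))
```

A slow-starting application is the classic case for a readiness probe without a restart: it is alive, just not ready yet.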

Infrastructure as Code: Terraform and Ansible

IaC questions test whether you understand declarative provisioning, state, and idempotency.

What is Infrastructure as Code?

Infrastructure as Code (IaC) means defining and managing infrastructure—servers, networks, load balancers—through machine-readable configuration files instead of manual clicks in a console. Benefits include version control, repeatability, peer review, and the ability to recreate environments reliably.

What is the difference between Terraform and Ansible?

Terraform is primarily a provisioning tool: it creates and manages cloud infrastructure declaratively and tracks it in a state file. Ansible is primarily a configuration management tool: it configures software on existing machines, typically in a more procedural style. They are often used together—Terraform builds the servers, Ansible configures what runs on them.

What is Terraform state and why does it matter?

Terraform stores a state file that maps your configuration to real-world resources. It uses this to know what already exists and what needs to change. In teams, store state remotely (for example in an S3 bucket with state locking, traditionally backed by a DynamoDB table, or in a managed backend) so that two engineers cannot apply at the same time and corrupt it. State can contain secrets, so it must be protected.

What does idempotency mean in IaC?

Idempotency means running the same configuration multiple times produces the same end state without unintended side effects. Apply the same Terraform configuration twice and the second run reports no changes. This is what makes declarative IaC safe to run repeatedly.
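
The contrast between an idempotent and a non-idempotent operation fits in a few lines of Python. The function names and the SSH-style config lines are invented for the example; configuration tools apply the same check-then-change idea to files, packages, and services.

```python
def ensure_line(lines: list[str], wanted: str) -> bool:
    """Make sure `wanted` is present. Return True only if something changed."""
    if wanted in lines:
        return False          # already in the desired state: do nothing
    lines.append(wanted)
    return True


def append_line(lines: list[str], wanted: str) -> bool:
    """Non-idempotent counterpart: blindly appends on every run."""
    lines.append(wanted)
    return True


config = ["PermitRootLogin no"]
first_run = ensure_line(config, "PasswordAuthentication no")
second_run = ensure_line(config, "PasswordAuthentication no")
print(first_run, second_run, config)

naive = ["PermitRootLogin no"]
append_line(naive, "PasswordAuthentication no")
append_line(naive, "PasswordAuthentication no")
print(naive)
```

The idempotent version reports a change on the first run and none on the second, while the naive version duplicates the line every time it runs.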

What is the difference between `terraform plan` and `terraform apply`?

`terraform plan` shows a preview of what will be created, changed, or destroyed without making any changes. `terraform apply` executes the changes against your real infrastructure: either a saved plan file you pass to it, or a fresh plan it generates and asks you to approve. Always review the plan output before applying.
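
Conceptually, planning is a diff between desired configuration and recorded state, and applying is carrying out that diff. The Python sketch below is a deliberately simplified model, not Terraform's algorithm (which also refreshes real resources and orders changes by dependency); the resource names and attributes are made up.

```python
def plan(desired: dict[str, dict], state: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired config with recorded state; change nothing."""
    return {
        "create": sorted(k for k in desired if k not in state),
        "update": sorted(k for k in desired if k in state and desired[k] != state[k]),
        "destroy": sorted(k for k in state if k not in desired),
    }


def apply(desired: dict[str, dict], state: dict[str, dict]) -> dict[str, dict]:
    """Carry out the planned actions and return the new state."""
    actions = plan(desired, state)
    new_state = {k: v for k, v in state.items() if k not in actions["destroy"]}
    for name in actions["create"] + actions["update"]:
        new_state[name] = dict(desired[name])
    return new_state


# Hypothetical resources for illustration.
desired = {"vm.web": {"size": "small"}, "bucket.logs": {"versioning": True}}
state = {"vm.web": {"size": "tiny"}, "vm.old": {"size": "small"}}

preview = plan(desired, state)
print(preview)
state = apply(desired, state)
print(plan(desired, state))   # a second plan finds nothing left to do
```

The preview step never mutates anything, so it is always safe to run; only the apply step changes the recorded state.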

CI Tools: Jenkins and GitLab CI Interview Questions

Expect questions about pipeline design, agents, and the move toward configuration-as-code.

What is a Jenkins pipeline and what is a Jenkinsfile?

A Jenkins pipeline is an automated sequence of stages—build, test, deploy. A Jenkinsfile is a text file that defines that pipeline as code and lives in your repository, so the pipeline is version-controlled alongside the application. Declarative pipeline syntax is the modern, recommended style.

What is the difference between a Jenkins agent and the controller?

The controller (formerly called master) schedules jobs, manages configuration, and serves the UI. Agents (formerly slaves) are the worker machines that actually execute the build steps. Distributing work across agents allows parallel builds and isolation.

How does GitLab CI define a pipeline?

GitLab CI uses a `.gitlab-ci.yml` file in the repository root. It defines stages and jobs; jobs in the same stage run in parallel, and stages run in sequence. GitLab Runners execute the jobs. Because the pipeline is built into the platform and the repo, there is no separate server to maintain as with classic Jenkins.

How do you keep secrets out of a pipeline?

Never hardcode credentials in pipeline files. Use the platform's built-in secrets store—Jenkins Credentials, GitLab CI/CD variables (masked and protected), or an external vault like HashiCorp Vault or a cloud secret manager—and inject them as environment variables at runtime.
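
On the consuming side, a deploy script only needs to read the injected variable and fail fast when it is missing. The sketch below assumes a variable named `DEPLOY_TOKEN`, which is a placeholder chosen for this example and not a name defined by any particular platform.

```python
import os


def require_secret(name: str) -> str:
    """Read a secret injected by the CI platform, failing fast if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value


def mask(value: str) -> str:
    """Return a fixed-width placeholder so logs reveal neither value nor length."""
    return "*" * 8


# For demonstration only: in a real pipeline the platform sets this variable.
os.environ.setdefault("DEPLOY_TOKEN", "example-only-value")

token = require_secret("DEPLOY_TOKEN")
print(f"Deploying with token {mask(token)}")
```

The script never contains the credential itself, and nothing it prints can leak it.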

Monitoring and Observability Interview Questions

Modern roles expect fluency in observability, not just basic uptime checks.

What is the difference between monitoring and observability?

Monitoring tells you whether a known problem is happening by watching predefined metrics and alerts. Observability lets you ask new questions about your system's internal state from its outputs—helping you understand why something unexpected is happening. Monitoring answers known unknowns; observability helps with unknown unknowns.

What are the three pillars of observability?

Metrics — numeric time-series data such as CPU usage or request rate.
Logs — timestamped, detailed event records.
Traces — the path of a single request as it flows across services, essential for debugging distributed systems.

What is Prometheus and how does it collect data?

Prometheus is an open-source metrics and alerting system. It works on a pull model—it scrapes metrics from HTTP endpoints exposed by your applications and exporters at regular intervals, storing them as time-series data. It is commonly paired with Grafana for dashboards and Alertmanager for routing alerts.
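
The pull model can be imitated with nothing but Python's standard library. This sketch is not the Prometheus client library: the `app_requests_total` metric, the in-process counter, and the tiny parser are assumptions for illustration, and the parser handles only simple unlabelled samples.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = {"value": 0}  # hypothetical in-process counter


class MetricsHandler(BaseHTTPRequestHandler):
    """The application side: expose current values at /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = (
            "# TYPE app_requests_total counter\n"
            f"app_requests_total {REQUEST_COUNT['value']}\n"
        ).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


def scrape(url: str) -> dict[str, float]:
    """The collector side: pull one set of samples over HTTP."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        text = response.read().decode("utf-8")
    samples = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, value = line.rsplit(" ", 1)
            samples[name] = float(value)
    return samples


server = HTTPServer(("127.0.0.1", 0), MetricsHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/metrics"

REQUEST_COUNT["value"] = 3
first_scrape = scrape(url)
REQUEST_COUNT["value"] = 7
second_scrape = scrape(url)
server.shutdown()
server.server_close()
print(first_scrape, second_scrape)
```

The application never pushes anything; it only answers when asked. Each scrape captures the counter's value at that moment, and the series of scrapes forms the time series.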

What is an SLI, SLO, and SLA?

SLI (Service Level Indicator): a measured value, such as the percentage of successful requests.
SLO (Service Level Objective): the internal target for that indicator, such as 99.9% success.
SLA (Service Level Agreement): a formal contract with customers, often with penalties if the agreed service level is missed. SLA targets are usually looser than the internal SLO.

The related concept of an error budget is the small amount of allowed failure (the gap between 100% and your SLO) that teams can spend on risk and releases.
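
The error-budget arithmetic is worth being able to do on a whiteboard. Here is a small Python sketch; the function names and the traffic figures are made up for the example.

```python
def error_budget(slo: float, total_requests: int, failed_requests: int) -> dict[str, float]:
    """Return allowed, remaining, and consumed budget for a request-based SLO."""
    allowed = (1 - slo) * total_requests
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_requests,
        "budget_consumed": failed_requests / allowed if allowed else float("inf"),
    }


def allowed_downtime_minutes(slo: float, days: int = 30) -> float:
    """Time-based view: minutes of full outage the SLO tolerates per window."""
    return (1 - slo) * days * 24 * 60


budget = error_budget(slo=0.999, total_requests=1_000_000, failed_requests=250)
print({name: round(value, 2) for name, value in budget.items()})
print(round(allowed_downtime_minutes(0.999), 1), "minutes per 30 days")
```

With a 99.9% SLO and one million requests, roughly 1,000 failures are allowed; 250 failures use about a quarter of the budget. The same SLO expressed as time tolerates about 43 minutes of full outage in a 30-day window.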

Cloud Computing Basics for DevOps

Cloud literacy underpins almost every DevOps role, so expect foundational questions regardless of provider.

What is the difference between IaaS, PaaS, and SaaS?

Model	You manage	Example
IaaS	OS, runtime, app; provider manages hardware	EC2, Compute Engine
PaaS	Just your app and data	App Engine, Heroku
SaaS	Nothing; you just use it	Gmail, Salesforce

What is auto-scaling?

Auto-scaling automatically adjusts the number of running compute resources based on demand—adding instances during traffic spikes and removing them when load drops. This improves availability while controlling cost. Horizontal scaling adds more instances; vertical scaling makes a single instance bigger.
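
One common way to decide how many instances to run is a proportional rule: scale the current count by the ratio of observed load to target load. The Python sketch below uses that rule, which is the same shape as the formula documented for the Kubernetes Horizontal Pod Autoscaler; the parameter names and limits are assumptions for this example.

```python
import math


def desired_replicas(
    current: int,
    observed_load: int,
    target_load: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Proportional horizontal scaling, clamped to configured bounds."""
    raw = math.ceil(current * observed_load / target_load)
    return max(min_replicas, min(max_replicas, raw))


# Load is average CPU percent per instance; the target is 60%.
print(desired_replicas(current=4, observed_load=90, target_load=60))   # scale out
print(desired_replicas(current=4, observed_load=30, target_load=60))   # scale in
print(desired_replicas(current=4, observed_load=300, target_load=60))  # capped at max
```

The minimum and maximum bounds matter as much as the formula: the floor protects availability and the ceiling protects the budget.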

What is the principle of least privilege?

Every user, service, and process should have only the minimum permissions required to do its job—nothing more. In the cloud this is enforced through IAM roles and policies, and it dramatically limits the damage from a compromised credential.
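
The core of least privilege is default deny: nothing is permitted unless a policy explicitly allows it. The Python sketch below is a toy check, far simpler than real IAM evaluation (which also handles explicit denies, resources, and conditions); the role contents are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(policy: list[str], action: str) -> bool:
    """Default deny: an action is allowed only if a pattern explicitly matches."""
    return any(fnmatchcase(action, pattern) for pattern in policy)


ci_role = ["s3:GetObject", "s3:PutObject"]   # hypothetical, narrowly scoped
admin_role = ["*"]                           # the kind of grant to avoid

for action in ("s3:PutObject", "s3:DeleteBucket", "iam:CreateUser"):
    print(action, is_allowed(ci_role, action), is_allowed(admin_role, action))
```

If the CI role's credential leaks, the attacker can read and write objects but cannot delete buckets or create users. A wildcard role offers no such limit.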

What is the difference between a region and an availability zone?

A region is a geographic area (for example, a country or city cluster). An availability zone is one or more isolated data centers within a region. Spreading workloads across multiple availability zones protects against the failure of a single data center, which is the foundation of high-availability architecture.

Key Takeaways

Explain the why, not just the command. Interviewers reward understanding of trade-offs over memorized syntax.
Know the boundaries between tools: Docker packages, Kubernetes orchestrates; Terraform provisions, Ansible configures.
Correct the common myths: Kubernetes Secrets are base64-encoded by default, and continuous delivery is not continuous deployment.
Tie everything back to outcomes using DORA metrics and SLO/error-budget thinking.
Security is part of every answer: least privilege, protected state files, and managed secrets stores should come up naturally.

Frequently Asked Questions

What should a junior DevOps candidate focus on first?

Master Git fundamentals, Linux basics, one CI/CD tool, and core Docker concepts. Then layer on Kubernetes and one cloud provider. Depth in a few areas beats shallow exposure to everything.

Are scenario-based DevOps questions common?

Yes. Expect prompts like "a deployment is failing in production—how do you debug it?" Walk through checking logs, metrics, recent changes, rollback options, and probes. Showing a calm, structured troubleshooting process matters as much as the answer.

Do I need to memorize exact command flags?

No. Interviewers care that you understand what a command does and when to use it. Knowing that `git revert` is safe on shared branches matters more than reciting every flag.

How is DevOps different from SRE?

DevOps is a broad culture of collaboration and automation. Site Reliability Engineering is one specific implementation of those ideas, applying software engineering to operations with formal SLOs, error budgets, and toil reduction. SRE is often described as "DevOps with concrete practices."
