Argo CD is largely stateless. All data is persisted as Kubernetes objects, which in turn are stored in Kubernetes' etcd. Redis is used only as a throw-away cache and can be lost. When it is lost, the cache is rebuilt without loss of service.
A set of HA manifests is provided for users who wish to run Argo CD in a highly available manner. These manifests run more containers and run Redis in HA mode.
NOTE: The HA installation requires at least three different nodes due to pod anti-affinity rules in the specs. Additionally, IPv6-only clusters are not supported.
argocd-repo-server is responsible for cloning the Git repository, keeping it up to date, and generating manifests using the appropriate tool.
argocd-repo-server forks and execs a config management tool to generate manifests. The fork can fail due to lack of memory or a limit on the number of OS threads. The --parallelismlimit flag controls how many manifest generations run concurrently and helps avoid OOM kills.
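As an illustration, the limit can usually be set without editing the container command, through the argocd-cmd-params-cm ConfigMap. This is a minimal sketch: the key name, the `argocd` namespace, and the value `10` are assumptions not stated in the text above, so check them against your Argo CD version.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd            # assumption: default install namespace
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Assumed key corresponding to the repo server --parallelismlimit flag.
  # "10" is a hypothetical value; tune it to the memory available per replica.
  reposerver.parallelism.limit: "10"
```

The repo server reads this value at startup, so a restart of the argocd-repo-server pods is normally needed for a change to take effect.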
argocd-repo-server ensures that the repository is in a clean state during manifest generation using config management tools such as Kustomize, Helm, or a custom plugin. As a result, Git repositories with multiple applications might affect repository server performance. Read Monorepo Scaling Considerations for more information.
argocd-repo-server clones the repository into /tmp (or the path specified in the TMPDIR env variable). The Pod might run out of disk space if it has too many repositories or if the repositories have a lot of files. To avoid this problem, mount a persistent volume.
git ls-remote is used to resolve ambiguous revisions such as HEAD, a branch, or a tag name. This operation happens frequently and might fail. To avoid failed syncs, use the ARGOCD_GIT_ATTEMPTS_COUNT environment variable to retry failed requests.
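A retry count could be set with a strategic merge patch along these lines. This is a sketch that assumes the variable is set on the argocd-repo-server container of the argocd-repo-server Deployment. The value `3` is hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
spec:
  template:
    spec:
      containers:
      - name: argocd-repo-server
        env:
        # Hypothetical value: total attempts per Git request before giving up.
        - name: ARGOCD_GIT_ATTEMPTS_COUNT
          value: "3"
```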
Every 3m (by default), Argo CD checks for changes to the app manifests. Argo CD assumes by default that manifests only change when the repo changes, so it caches the generated manifests (for 24h by default). With Kustomize remote bases or Helm patch releases, the manifests can change even though the repo has not changed. By reducing the cache time, you can get the changes without waiting for 24h. Use the argocd-repo-server flag --repo-cache-expiration duration. In low-volume environments we suggest trying '1h'. Bear in mind that setting this too low negates the benefits of caching.
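The suggested '1h' could be applied through the argocd-cmd-params-cm ConfigMap rather than by editing container arguments. In this sketch the key name and the `argocd` namespace are assumptions to verify against your installation.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd            # assumption: default install namespace
data:
  # Assumed key corresponding to the --repo-cache-expiration flag.
  # "1h" is the low-volume suggestion from the text; the default is 24h.
  reposerver.repo.cache.expiration: "1h"
```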
argocd-repo-server executes config management tools such as kustomize and enforces a 90 second timeout. This timeout can be changed by using the ARGOCD_EXEC_TIMEOUT env variable. The value should be in the Go time duration string format, for example,
argocd_git_request_total - Number of git requests. This metric provides two tags:
repo - Git repo URL;
ARGOCD_ENABLE_GRPC_TIME_HISTOGRAM - An environment variable that enables collecting RPC performance metrics. Enable it if you need to troubleshoot performance issues. Note: This metric is expensive to both query and store!
The controller uses argocd-repo-server to get generated manifests and the Kubernetes API server to get the actual cluster state.
Each controller replica uses two separate queues to process application reconciliation (milliseconds) and app syncing (seconds). The number of queue processors for each queue is controlled by the --status-processors (20 by default) and --operation-processors (10 by default) flags. Increase the number of processors if your Argo CD instance manages too many applications. For 1000 applications we use 50 for --status-processors and 25 for
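The processor counts quoted for 1000 applications could be expressed in the argocd-cmd-params-cm ConfigMap roughly as follows. This is a sketch: the key names and the `argocd` namespace are assumptions, and the pairing of 25 with the operation queue is inferred from the two flags named in this paragraph.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd            # assumption: default install namespace
data:
  # Assumed key for --status-processors (reconciliation queue, default 20).
  controller.status.processors: "50"
  # Assumed key for --operation-processors (sync queue, default 10).
  controller.operation.processors: "25"
```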
Manifest generation typically takes the most time during reconciliation. The duration of manifest generation is limited to make sure the controller refresh queue does not overflow. The app reconciliation fails with a Context deadline exceeded error if manifest generation takes too long. As a workaround, increase the value of --repo-server-timeout-seconds and consider scaling up the
The controller uses Kubernetes watch APIs to maintain a lightweight Kubernetes cluster cache. This avoids querying Kubernetes during app reconciliation and significantly improves performance. For performance reasons, the controller monitors and caches only the preferred versions of a resource. During reconciliation, the controller might have to convert cached resources from the preferred version into the version of the resource stored in Git. If kubectl convert fails because the conversion is not supported, the controller falls back to a Kubernetes API query, which slows down reconciliation. In this case, we advise using the preferred resource version in Git.
The controller polls Git every 3m by default. You can change this duration using the timeout.reconciliation setting in the argocd-cm ConfigMap. The value of timeout.reconciliation is a duration string e.g
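For illustration, a longer polling interval might be configured like this. The `argocd` namespace and the value `300s` are hypothetical. Only the ConfigMap name and the key come from the text above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd            # assumption: default install namespace
data:
  # Hypothetical value: poll Git every 5 minutes instead of the 3m default.
  timeout.reconciliation: 300s
```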
If the controller is managing too many clusters and uses too much memory, you can shard clusters across multiple controller replicas. To enable sharding, increase the number of replicas in the StatefulSet and repeat the number of replicas in the ARGOCD_CONTROLLER_REPLICAS environment variable. The strategic merge patch below demonstrates the changes required to configure two controller replicas.
- name: argocd-application-controller
- name: ARGOCD_CONTROLLER_REPLICAS
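Only two lines of that patch survive above. A fuller version might look like the following sketch. The surrounding fields are reconstructed from the standard StatefulSet schema and the description of two replicas, so treat everything except the two `name` entries as an assumption.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  replicas: 2                  # number of controller shards
  template:
    spec:
      containers:
      - name: argocd-application-controller
        env:
        # Must match spec.replicas so each replica can work out its shard.
        - name: ARGOCD_CONTROLLER_REPLICAS
          value: "2"
```

Keeping the replica count and the environment variable in one patch makes it harder for the two numbers to drift apart.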
ARGOCD_ENABLE_GRPC_TIME_HISTOGRAM - An environment variable that enables collecting RPC performance metrics. Enable it if you need to troubleshoot performance issues. Note: This metric is expensive to both query and store!
argocd_app_reconcile - Reports application reconciliation duration. It can be used to build a reconciliation duration heat map that gives a high-level picture of reconciliation performance.
argocd_app_k8s_request_total - Number of Kubernetes requests per application. It counts the fallback Kubernetes API queries and is useful for identifying which application has a resource with a non-preferred version and causes performance issues.
argocd-server is stateless and probably the least likely to cause issues. To ensure there is no downtime during upgrades, consider increasing the number of replicas to 3 or more and repeating the number in the ARGOCD_API_SERVER_REPLICAS environment variable. The strategic merge patch below demonstrates this.
- name: argocd-server
- name: ARGOCD_API_SERVER_REPLICAS
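Only the two `name` lines of the API server patch are preserved above. A possible full form, assuming the standard Deployment layout and the three replicas recommended in the text, is sketched here. Everything other than those two names is an assumption.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
spec:
  replicas: 3                  # "3 or more" per the recommendation above
  template:
    spec:
      containers:
      - name: argocd-server
        env:
        # Must match spec.replicas.
        - name: ARGOCD_API_SERVER_REPLICAS
          value: "3"
```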
The ARGOCD_API_SERVER_REPLICAS environment variable is used to divide the limit of concurrent login requests (ARGOCD_MAX_CONCURRENT_LOGIN_REQUESTS_COUNT) among the replicas.
The ARGOCD_GRPC_MAX_SIZE_MB environment variable specifies the maximum size of the server response message in megabytes. The default value is 200. You might need to increase this for an Argo CD instance that manages 3000+ applications.
argocd-dex-server uses an in-memory database, and two or more instances would have inconsistent data.
argocd-redis is pre-configured with the understanding of only three total redis servers/sentinels.
Monorepo Scaling Considerations
The Argo CD repo server maintains one local clone of each repository and uses it for application manifest generation. If manifest generation requires changing a file in the local repository clone, then only one concurrent manifest generation per server instance is allowed. This limitation might significantly slow down Argo CD if you have a monorepo with multiple applications (50+).
Enable Concurrent Processing
Argo CD determines if manifest generation might change local files in the local repository clone based on the config management tool and application settings. If the manifest generation has no side effects then requests are processed in parallel without a performance penalty. The following are known cases that might cause slowness and their workarounds:
Multiple Helm-based applications pointing to the same directory in one Git repository: ensure that your Helm chart doesn't have conditional dependencies and create a .argocd-allow-concurrency file in the chart directory.
Multiple custom-plugin-based applications: avoid creating temporary files during manifest generation and create a .argocd-allow-concurrency file in the app directory, or use the sidecar plugin option, which processes each application using a temporary copy of the repository.
Multiple Kustomize applications in the same repository with parameter overrides: sorry, no workaround for now.
Webhook and Manifest Paths Annotation
Argo CD aggressively caches generated manifests and uses the repository commit SHA as a cache key. A new commit to the Git repository invalidates the cache for all applications configured in the repository.
This can negatively affect repositories with multiple applications. You can use webhooks and the argocd.argoproj.io/manifest-generate-paths Application CRD annotation to solve this problem and improve performance.
The argocd.argoproj.io/manifest-generate-paths annotation contains a semicolon-separated list of paths within the Git repository that are used during manifest generation. The webhook compares the paths specified in the annotation with the changed files specified in the webhook payload. If no modified files match the paths specified in argocd.argoproj.io/manifest-generate-paths, then the webhook will not trigger application reconciliation and the existing cache will be considered valid for the new commit.
Installations that use a different repository for each application are not subject to this behavior and will likely get no benefit from using these annotations.
Support for the application manifest paths annotation depends on the Git provider used for the Application. It is currently supported only for GitHub, GitLab, and Gogs based repos.
- Relative path: The annotation might contain a relative path. In this case, the path is considered relative to the path specified in the application source:
# resolves to the 'guestbook' directory
- Absolute path: The annotation value might be an absolute path starting with '/'. In this case, the path is considered an absolute path within the Git repository:
- Multiple paths: It is possible to put multiple paths into the annotation. Paths must be separated with a semicolon (;):
# resolves to 'my-application' and 'shared'
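The three forms above can be sketched in a single Application manifest. The application name, namespace, repository URL, project, and destination are hypothetical placeholders. Only the annotation key and the path semantics come from the text.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-application                 # hypothetical name
  namespace: argocd                    # assumption: default install namespace
  annotations:
    # Relative paths are resolved against spec.source.path:
    #   "."         -> my-application
    #   "../shared" -> shared
    # An absolute form such as "/shared" would resolve from the repository root.
    argocd.argoproj.io/manifest-generate-paths: ".;../shared"
spec:
  project: default
  source:
    repoURL: https://example.com/org/monorepo.git   # hypothetical repository
    targetRevision: HEAD
    path: my-application
  destination:
    server: https://kubernetes.default.svc
    namespace: my-application          # hypothetical target namespace
```

With this annotation, a webhook for a commit that touches only files outside `my-application` and `shared` should leave the cached manifests for this application valid.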