|
2689 | 2689 | This metric tracks the runtime of phase2 of an Agency sync. Phase2 calculates |
2690 | 2690 | what actions to execute given the difference between the local and target state. |
2691 | 2691 |
|
| 2692 | +- name: arangodb_metadata_number_of_collections |
| 2693 | + introducedIn: "3.12.7" |
| 2694 | + help: | |
| 2695 | + Global number of collections. |
| 2696 | + unit: number |
| 2697 | + type: gauge |
| 2698 | + category: Statistics |
| 2699 | + complexity: simple |
| 2700 | + exposedBy: |
| 2701 | + - coordinator |
| 2702 | + - single |
| 2703 | + description: | |
| 2704 | + Total number of collections in the deployment (cluster or single server). |
| 2705 | + This includes system collections. |
| 2706 | + troubleshoot: | |
| 2707 | + **Configuration:** |
| 2708 | + - No global limit on collection count |
| 2709 | + - Query limit: `--query.max-collections-per-query` (default: 2048) |
| 2710 | + - Queries exceeding this limit fail with a "too many collections/shards" error |
| 2711 | + |
| 2712 | + **Impact:** |
| 2713 | + - High counts increase startup and shutdown times, memory usage, and the number of open file descriptors |
| 2714 | + - Each collection consumes memory for indexes and metadata |
| 2715 | + - Impacts backup and restore operations |
| 2716 | + |
| 2717 | + **Recommendations:** |
| 2718 | + - Remove unused or temporary collections regularly |
| 2719 | + - Consider consolidating related collections |
| 2720 | + - Review schema design to reduce collection proliferation |
| 2721 | + |
| 2722 | + **See also:** |
| 2723 | + - Query limits: https://github.com/arangodb/arangodb/issues/10787 |
| 2724 | + - Operational factors: https://docs.arango.ai/arangodb/stable/develop/operational-factors/ |
| 2725 | +
|
| 2726 | +- name: arangodb_metadata_number_of_databases |
| 2727 | + introducedIn: "3.12.7" |
| 2728 | + help: | |
| 2729 | + Global number of databases. |
| 2730 | + unit: number |
| 2731 | + type: gauge |
| 2732 | + category: Statistics |
| 2733 | + complexity: simple |
| 2734 | + exposedBy: |
| 2735 | + - coordinator |
| 2736 | + - single |
| 2737 | + description: | |
| 2738 | + Total number of databases in the deployment (cluster or single server). |
| 2739 | + troubleshoot: | |
| 2740 | + **Configuration:** |
| 2741 | + - Maximum controlled by `--database.max-databases` (default: unlimited) |
| 2742 | + - Exceeding the limit returns `TRI_ERROR_RESOURCE_LIMIT` |
| 2743 | + |
| 2744 | + **Impact:** |
| 2745 | + - High counts increase startup time, memory usage, and the number of open file descriptors |
| 2746 | + - Each database adds operational overhead |
| 2747 | + |
| 2748 | + **Recommendations:** |
| 2749 | + - Remove unused databases |
| 2750 | + |
| 2751 | + **See also:** |
| 2752 | + - Operational factors: https://docs.arango.ai/arangodb/stable/develop/operational-factors/ |
| 2753 | +
|
| 2754 | +- name: arangodb_metadata_number_of_shards |
| 2755 | + introducedIn: "3.12.7" |
| 2756 | + help: | |
| 2757 | + Global number of shards. |
| 2758 | + unit: number |
| 2759 | + type: gauge |
| 2760 | + category: Statistics |
| 2761 | + complexity: simple |
| 2762 | + exposedBy: |
| 2763 | + - coordinator |
| 2764 | + description: | |
| 2765 | + Total number of shards in the cluster, counted across all |
| 2766 | + collections. |
| 2767 | + troubleshoot: | |
| 2768 | + **Configuration:** |
| 2769 | + - Max per collection: `--cluster.max-number-of-shards` (default: 1000) |
| 2770 | + - Exceeding the limit returns `TRI_ERROR_CLUSTER_TOO_MANY_SHARDS` |
| 2771 | + - Query limit: `--query.max-collections-per-query` also applies to the total number of shards a query accesses |
| 2772 | + - Queries exceeding this limit fail with a "too many collections/shards" error |
| 2773 | + - Practical cluster limit: ~50,000 total shards across all collections |
| 2774 | + |
| 2775 | + **Impact:** |
| 2776 | + - High shard counts increase cluster coordination overhead |
| 2777 | + - Affects query performance, memory usage, leader election, and rebalancing |
| 2778 | + |
| 2779 | + **Recommendations:** |
| 2780 | + - Choose shard count based on data volume, query patterns, and DB-Server count |
| 2781 | + - Use shard rebalancing to keep shards evenly distributed across DB-Servers |
| 2782 | + |
| 2783 | + **Note:** |
| 2784 | + - Approaching 50,000 shards may cause performance degradation |
| 2785 | + |
| 2786 | + **See also:** |
| 2787 | + - Cluster limitations: https://docs.arango.ai/arangodb/stable/deploy/cluster/limitations/ |
| 2788 | + - Query limits: https://github.com/arangodb/arangodb/issues/10787 |
| 2789 | + - Operational factors: https://docs.arango.ai/arangodb/stable/develop/operational-factors/ |
| 2790 | +
|
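As a rough illustration of the shard guidance in the entry above, the following Python sketch reads a gauge from Prometheus text exposition output (for example, the body returned by the `/_admin/metrics/v2` endpoint) and compares the shard count with the approximate 50,000 practical limit. The sample text, its `role` label, and the 80% warning threshold are made up; `read_gauge` and `shard_headroom` are hypothetical helpers, not part of any ArangoDB API.

```python
from typing import Optional

PRACTICAL_SHARD_LIMIT = 50_000  # approximate cluster-wide figure from the docs
WARN_FRACTION = 0.8             # hypothetical alerting threshold

# Made-up exposition text in Prometheus text format.
SAMPLE = """\
# HELP arangodb_metadata_number_of_shards Global number of shards.
# TYPE arangodb_metadata_number_of_shards gauge
arangodb_metadata_number_of_shards{role="COORDINATOR"} 42000
"""


def read_gauge(exposition: str, name: str) -> Optional[float]:
    """Sum every sample of metric `name`; return None if it is absent."""
    total, found = 0.0, False
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or not line.startswith(name):
            continue  # blank line, HELP/TYPE comment, or another metric
        rest = line[len(name):]
        if rest.startswith("{"):
            end = rest.rfind("}")
            if end == -1:
                continue  # malformed label set
            rest = rest[end + 1:]
        elif not rest[:1].isspace():
            continue  # a longer metric name that shares this prefix
        fields = rest.split()  # sample value, then an optional timestamp
        if fields:
            total += float(fields[0])
            found = True
    return total if found else None


def shard_headroom(total_shards: float, limit: int = PRACTICAL_SHARD_LIMIT) -> float:
    """Fraction of the practical shard limit already in use."""
    return total_shards / limit


if __name__ == "__main__":
    shards = read_gauge(SAMPLE, "arangodb_metadata_number_of_shards")
    if shards is not None and shard_headroom(shards) >= WARN_FRACTION:
        print(f"warning: {shards:.0f} shards, {shard_headroom(shards):.0%} of ~50,000")
```

Because the metric is a global gauge exposed only by Coordinators, the helper should be fed the output of a single Coordinator; summing the scrapes of several Coordinators would count the same shards more than once.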
2692 | 2791 | - name: arangodb_network_connectivity_failures_coordinators_total |
2693 | 2792 | introducedIn: "3.11.4" |
2694 | 2793 | help: | |
|
5500 | 5599 | Amount of memory in bytes that is used for writing to an inverted index of |
5501 | 5600 | a collection or index of a View (`arangosearch` View link). |
5502 | 5601 |
|
| 5602 | +- name: arangodb_server_statistics_cpu_cgroup_version |
| 5603 | + introducedIn: "3.12.7" |
| 5604 | + help: | |
| 5605 | + Cgroup version detected on the system (0=none, 1=v1, 2=v2). |
| 5606 | + unit: number |
| 5607 | + type: gauge |
| 5608 | + category: Statistics |
| 5609 | + complexity: simple |
| 5610 | + exposedBy: |
| 5611 | + - coordinator |
| 5612 | + - dbserver |
| 5613 | + - agent |
| 5614 | + - single |
| 5615 | + description: | |
| 5616 | + Indicates which cgroup version was detected on the system at startup: |
| 5617 | + - 0: No cgroup support detected |
| 5618 | + - 1: cgroup v1 (legacy) detected |
| 5619 | + - 2: cgroup v2 (unified hierarchy) detected |
| 5620 | + |
| 5621 | + This metric is useful for understanding whether container resource limits |
| 5622 | + (CPU quotas) can be detected by ArangoDB. Systems with cgroup support |
| 5623 | + typically report more accurate CPU core counts when running in containers. |
| 5624 | + |
| 5625 | +
|
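The 0/1/2 encoding above can be reproduced with a common filesystem heuristic, sketched here in Python. This is an assumption for illustration, not ArangoDB's own detection code: a unified cgroup v2 hierarchy exposes `cgroup.controllers` at the mount root, while cgroup v1 mounts one directory per controller (such as `cpu` or `memory`). The mount point `/sys/fs/cgroup` is the usual default and can be overridden for testing.

```python
from pathlib import Path

CGROUP_NONE, CGROUP_V1, CGROUP_V2 = 0, 1, 2


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Classify the cgroup setup under `root` using the metric's encoding."""
    base = Path(root)
    # cgroup v2 (unified hierarchy) exposes this file at the mount root.
    if (base / "cgroup.controllers").is_file():
        return CGROUP_V2
    # cgroup v1 mounts per-controller directories; "cpu,cpuacct" is a common combined mount.
    for controller in ("cpu", "cpu,cpuacct", "memory"):
        if (base / controller).is_dir():
            return CGROUP_V1
    return CGROUP_NONE


if __name__ == "__main__":
    print(detect_cgroup_version())
```

On hybrid setups, where v1 controllers sit at the root and v2 is mounted elsewhere, this heuristic reports v1.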
5503 | 5626 | - name: arangodb_server_statistics_cpu_cores |
5504 | 5627 | introducedIn: "3.8.0" |
5505 | 5628 | help: | |
|
5518 | 5641 | environment variable `ARANGODB_OVERRIDE_DETECTED_NUMBER_OF_CORES` |
5519 | 5642 | is set. In that case, the environment variable's value is reported. |
5520 | 5643 |
|
| 5644 | +- name: arangodb_server_statistics_effective_cpu_cores |
| 5645 | + introducedIn: "3.12.7" |
| 5646 | + help: | |
| 5647 | + Number of effective CPU cores available to the arangod process. |
| 5648 | + unit: number |
| 5649 | + type: gauge |
| 5650 | + category: Statistics |
| 5651 | + complexity: simple |
| 5652 | + exposedBy: |
| 5653 | + - coordinator |
| 5654 | + - dbserver |
| 5655 | + - agent |
| 5656 | + - single |
| 5657 | + description: | |
| 5658 | + Number of effective CPU cores available to the arangod process, taking into |
| 5659 | + account container CPU limits when running in containerized environments. |
| 5660 | + |
| 5661 | + This value is determined by: |
| 5662 | + - **cgroup v1**: Reading `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` and |
| 5663 | + `/sys/fs/cgroup/cpu/cpu.cfs_period_us` to calculate the CPU quota |
| 5664 | + - **cgroup v2**: Reading `/sys/fs/cgroup/cpu.max` to get the CPU quota and period |
| 5665 | + - **No cgroups**: Falls back to the total number of CPU cores reported by the system |
| 5666 | + |
| 5667 | + When running in Docker or Kubernetes with CPU limits set (e.g., `--cpus=2`), |
| 5668 | + this metric reports the container's CPU limit rather than the host's |
| 5669 | + total CPU cores, providing a more accurate view of available CPU resources |
| 5670 | + for capacity planning and auto-scaling decisions. |
| 5671 | + |
| 5672 | + If the environment variable `ARANGODB_OVERRIDE_DETECTED_NUMBER_OF_CORES` |
| 5673 | + is set, it takes precedence over both cgroup limits and detected CPU cores. |
| 5674 | + |
| 5675 | + This metric includes a `machine_id` label to help identify the physical host |
| 5676 | + in containerized environments. |
| 5677 | + |
| 5678 | +
|
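The lookup order described for this metric (override variable, then cgroup quota, then host cores) can be sketched in Python as follows. Assumptions not stated in the metric docs: the override variable holds a plain integer, a quota larger than the host core count is capped at the host count, and the result is left fractional with no rounding. The helper names are hypothetical; the functions take file contents as strings so they can be exercised without a container.

```python
import os
from typing import Mapping, Optional

OVERRIDE_VAR = "ARANGODB_OVERRIDE_DETECTED_NUMBER_OF_CORES"


def quota_from_cgroup_v2(cpu_max: str) -> Optional[float]:
    """Parse `cpu.max` ("<quota> <period>"); a quota of "max" means no limit."""
    fields = cpu_max.split()
    if len(fields) != 2 or fields[0] == "max":
        return None  # unlimited (or unexpected content): no quota applies
    return int(fields[0]) / int(fields[1])


def quota_from_cgroup_v1(cfs_quota_us: str, cfs_period_us: str) -> Optional[float]:
    """`cpu.cfs_quota_us` is -1 when no quota is configured."""
    quota = int(cfs_quota_us)
    if quota <= 0:
        return None
    return quota / int(cfs_period_us)


def effective_cores(host_cores: int, quota: Optional[float],
                    env: Mapping[str, str] = os.environ) -> float:
    """Override first, then the cgroup quota (capped at host cores), then host cores."""
    override = env.get(OVERRIDE_VAR)
    if override:
        return float(int(override))
    if quota is None:
        return float(host_cores)
    return min(float(host_cores), quota)


if __name__ == "__main__":
    # Docker's --cpus=2 is typically written as a 200000 us quota per 100000 us period.
    print(effective_cores(os.cpu_count() or 1, quota_from_cgroup_v2("200000 100000"), env={}))
```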
| 5679 | +- name: arangodb_server_statistics_effective_physical_memory |
| 5680 | + introducedIn: "3.12.7" |
| 5681 | + help: | |
| 5682 | + Effective physical memory available to the arangod process in bytes. |
| 5683 | + unit: bytes |
| 5684 | + type: gauge |
| 5685 | + category: Statistics |
| 5686 | + complexity: simple |
| 5687 | + exposedBy: |
| 5688 | + - coordinator |
| 5689 | + - dbserver |
| 5690 | + - agent |
| 5691 | + - single |
| 5692 | + description: | |
| 5693 | + Effective physical memory available to the arangod process in bytes, |
| 5694 | + taking into account container memory limits when running in containerized |
| 5695 | + environments. |
| 5696 | + |
| 5697 | + This value is determined by: |
| 5698 | + - **cgroup v1**: Reading `/sys/fs/cgroup/memory/memory.limit_in_bytes` |
| 5699 | + - **cgroup v2**: Reading `/sys/fs/cgroup/memory.max` |
| 5700 | + - **No cgroups**: Falls back to the total physical memory of the system |
| 5701 | + |
| 5702 | + When running in Docker or Kubernetes with memory limits set, this metric |
| 5703 | + reports the container's memory limit rather than the host's total |
| 5704 | + physical memory, providing a more accurate view of available memory for |
| 5705 | + capacity planning and monitoring. |
| 5706 | + |
| 5707 | + If the environment variable `ARANGODB_OVERRIDE_DETECTED_TOTAL_MEMORY` |
| 5708 | + is set, it takes precedence over both cgroup limits and detected physical |
| 5709 | + memory. |
| 5710 | + |
| 5711 | +
|
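A Python sketch of the same precedence for memory: override variable first, then the cgroup limit, then host RAM. Assumptions for illustration only: the override variable holds a plain byte count (the metric docs do not specify its accepted format), and a cgroup limit above host RAM is treated as if no limit were set. Taking `min()` of the limit and host RAM handles the cgroup v1 case, which has no `max` keyword and instead reports a very large number when unlimited. The helper names are hypothetical.

```python
import os
from typing import Mapping, Optional

OVERRIDE_VAR = "ARANGODB_OVERRIDE_DETECTED_TOTAL_MEMORY"


def limit_from_cgroup_v2(memory_max: str) -> Optional[int]:
    """`memory.max` holds a byte count, or "max" when no limit is set."""
    value = memory_max.strip()
    return None if value == "max" else int(value)


def limit_from_cgroup_v1(limit_in_bytes: str) -> int:
    """`memory.limit_in_bytes` is always numeric; "unlimited" is a huge value."""
    return int(limit_in_bytes.strip())


def effective_memory(host_bytes: int, limit: Optional[int],
                     env: Mapping[str, str] = os.environ) -> int:
    """Override first, then the cgroup limit (capped at host RAM), then host RAM."""
    override = env.get(OVERRIDE_VAR)
    if override:
        return int(override)
    if limit is None:
        return host_bytes
    # min() also absorbs the v1 "unlimited" value, which exceeds any real RAM size.
    return min(host_bytes, limit)


if __name__ == "__main__":
    GiB = 1024 ** 3
    # A 4 GiB container limit on a 64 GiB host.
    print(effective_memory(64 * GiB, limit_from_cgroup_v2("4294967296\n"), env={}) // GiB)
```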
5521 | 5712 | - name: arangodb_server_statistics_idle_percent |
5522 | 5713 | introducedIn: "3.8.0" |
5523 | 5714 | help: | |
|
5638 | 5829 | category: Replication |
5639 | 5830 | complexity: simple |
5640 | 5831 | exposedBy: |
5641 | | - - coordinator |
5642 | 5832 | - dbserver |
5643 | | - - agent |
5644 | 5833 | description: | |
5645 | 5834 | Number of leader shards on this machine. Every shard has a leader and |
5646 | 5835 | potentially multiple followers. |
|
5668 | 5857 | category: Replication |
5669 | 5858 | complexity: simple |
5670 | 5859 | exposedBy: |
5671 | | - - coordinator |
5672 | 5860 | - dbserver |
5673 | | - - agent |
5674 | 5861 | description: | |
5675 | 5862 | Number of shards not replicated at all. This is counted for all shards |
5676 | 5863 | for which this server is currently the leader. The number is increased |
5677 | | - by one for every shards for which no follower is in sync. |
| 5864 | + by one for every shard for which no follower is in sync. |
5678 | 5865 | troubleshoot: | |
5679 | 5866 | Needless to say, such a situation is very bad for resilience, since it |
5680 | 5867 | indicates a single point of failure. So, if this number is greater than 0, |
|
5722 | 5909 | exposedBy: |
5723 | 5910 | - dbserver |
5724 | 5911 | description: | |
5725 | | - Number of leader shards not fully replicated. This is counted for all |
| 5912 | + Number of shards that are not fully replicated. This is counted for all |
5726 | 5913 | shards for which this server is currently the leader. The number is |
5727 | | - increased by one for every shards for which not all followers are in sync. |
| 5914 | + increased by one for every shard for which not all followers are in sync. |
5728 | 5915 | troubleshoot: | |
5729 | 5916 | Needless to say, such a situation is not good for resilience, since we |
5730 | 5917 | do not have as many copies of the data as the `replicationFactor` |
|
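The two per-leader counting rules in the replication entries above (one increment per shard with no follower in sync, and one per shard with not all followers in sync) can be illustrated with a small Python model. The `LeaderShard` type and both functions are hypothetical. The sketch assumes that each leader shard is described by its `replicationFactor` and its number of in-sync followers, and that shards with a `replicationFactor` of 1, which expect no followers, are left out of both counts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LeaderShard:
    name: str
    replication_factor: int  # leader plus expected followers
    in_sync_followers: int   # followers currently in sync with the leader


def shards_not_replicated(shards: Iterable[LeaderShard]) -> int:
    """+1 for every shard that expects followers but has none in sync."""
    return sum(1 for s in shards
               if s.replication_factor > 1 and s.in_sync_followers == 0)


def shards_not_fully_replicated(shards: Iterable[LeaderShard]) -> int:
    """+1 for every shard for which not all expected followers are in sync."""
    return sum(1 for s in shards
               if s.in_sync_followers < s.replication_factor - 1)
```

Under this model, every shard counted by the first rule is also counted by the second, so the "not fully replicated" count is never smaller than the "not replicated" count.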