MongooseIM metrics
By default, MongooseIM collects many metrics that describe user behaviour and general system statistics. They are managed by exometer; MongooseIM uses ESL's fork of this project.
All metrics are divided into the following groups:
- Per host type metrics: Gathered separately for every host type supported by the cluster.
  Warning: If a cluster supports many (thousands or more) host types, performance issues might occur. To avoid this, use global equivalents of the metrics with the all_metrics_are_global config option.
  - Hook metrics. They are created for every hook and incremented on every call to it.
- Global metrics: Metrics common for all host types.
  - Data metrics. These are misc. metrics related to data transfers (e.g. sent and received stanza size statistics).
  - VM metrics. Basic Erlang VM statistics.
- Backend metrics: Histograms with timings of calls to various backends.
Metrics types
spiral
This kind of metric provides 2 values: the total event count (e.g. stanzas processed) and the event count in the last 60s window (the one value).
Dividing the one value by 60 gives the average per-second value over the last minute.
Example: [{total, 1000}, {one, 20}]
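As a small illustration of the arithmetic above, the following sketch models a spiral reading as a Python list of pairs (standing in for the Erlang proplist) and derives the per-second average. The function name per_second_rate is hypothetical and is not part of MongooseIM or exometer.

```python
def per_second_rate(spiral):
    """Average events per second over the last minute for a spiral reading."""
    values = dict(spiral)      # e.g. {"total": 1000, "one": 20}
    return values["one"] / 60  # the "one" value covers a 60 s window


reading = [("total", 1000), ("one", 20)]
rate = per_second_rate(reading)
print(round(rate, 3))  # 0.333
```

The total value is not used here, because it grows for the whole lifetime of the metric and says nothing about the current rate.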
value
A simple value.
It is actually a one-element proplist: [{value, N}].
Example: [{value, 256}]
gauge
It is similar to the value type but consists of two properties:
- value
- ms_since_reset - Time in milliseconds elapsed from the last metric update.
Example: [{value, 12}, {ms_since_reset, 91761}]
proplist
A metric which is a nonstandard proplist. You can find the lists of keys in metrics descriptions.
Example: [{total,295941736}, {processes_used,263766824}, {atom_used,640435}, {binary,1513152}, {ets,3942592}, {system,32182072}]
histogram
A histogram collects values over a sliding window of 60s and exposes the following stats:
- n - A number of samples.
- mean - An arithmetic mean.
- min
- max
- median
- 50, 75, 90, 95, 99, 999 - 50th, 75th, 90th, 95th, 99th and 99.9th percentile
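To make the statistics above concrete, here is a toy model of a histogram over a 60s sliding window. It is a sketch only, not exometer's implementation: the class name SlidingHistogram, the explicit now argument and the nearest-rank percentile rule are all assumptions made for the example.

```python
import math
from collections import deque
from statistics import mean, median

WINDOW_SECONDS = 60


class SlidingHistogram:
    """Toy histogram over a 60 s sliding window (not exometer's code)."""

    def __init__(self):
        self.samples = deque()  # (timestamp_seconds, value), oldest first

    def update(self, value, now):
        self.samples.append((now, value))

    def stats(self, now):
        # Drop samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - WINDOW_SECONDS:
            self.samples.popleft()
        values = sorted(v for _, v in self.samples)
        if not values:
            return {"n": 0}

        def percentile(p):
            # Nearest-rank percentile: an assumption made for this sketch.
            rank = max(1, math.ceil(p / 100 * len(values)))
            return values[rank - 1]

        return {
            "n": len(values),
            "mean": mean(values),
            "min": values[0],
            "max": values[-1],
            "median": median(values),
            "50": percentile(50), "75": percentile(75), "90": percentile(90),
            "95": percentile(95), "99": percentile(99), "999": percentile(99.9),
        }


h = SlidingHistogram()
for t, v in [(0, 10), (30, 20), (50, 30), (70, 40)]:
    h.update(v, now=t)
result = h.stats(now=80)  # the sample taken at t=0 is older than 60 s
print(result["n"], result["min"], result["max"])  # 3 20 40
```

Because old samples leave the window, a histogram read during a quiet period can report far fewer samples than the number of operations performed since startup.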
Per host type metrics
Hook metrics
There are more hook metrics than what is listed in this table, because they are automatically created for every new hook. As a result it makes more sense to maintain a list of the most relevant or useful items, rather than keeping this table fully in sync with the code.
Name | Type | Description (when it gets incremented) |
---|---|---|
[HostType, anonymous_purge_hook] | spiral | An anonymous user disconnects. |
[HostType, disco_info] | spiral | Information about the server is requested via the Disco protocol. |
[HostType, disco_local_features] | spiral | A list of server features is gathered. |
[HostType, disco_local_identity] | spiral | A list of server identities is gathered. |
[HostType, disco_local_items] | spiral | A list of server's items (e.g. services) is gathered. |
[HostType, disco_sm_features] | spiral | A list of user's features is gathered. |
[HostType, disco_sm_identity] | spiral | A list of user's identities is gathered. |
[HostType, disco_sm_items] | spiral | A list of user's items is gathered. |
[HostType, mam_lookup_messages] | spiral | An archive lookup is performed. |
[HostType, offline_message_hook] | spiral | A message was sent to an offline user. (Except for "error", "headline" and "groupchat" message types.) |
[HostType, offline_groupchat_message_hook] | spiral | A groupchat message was sent to an offline user. |
[HostType, privacy_updated_list] | spiral | User's privacy list is updated. |
[HostType, resend_offline_messages_hook] | spiral | A list of offline messages is gathered for delivery to a user's new connection. |
[HostType, roster_get_subscription_lists] | spiral | Presence subscription lists (based on which presence updates are broadcasted) are gathered. |
[HostType, roster_in_subscription] | spiral | A presence with subscription update is processed. |
[HostType, roster_out_subscription] | spiral | A presence with subscription update is received from a client. |
[HostType, sm_broadcast] | spiral | A stanza is broadcasted to all of user's resources. |
[HostType, unset_presence_hook] | spiral | A user disconnects or sends an unavailable presence. |
Presences & rosters
Name | Type | Description (when it gets incremented) |
---|---|---|
[HostType, modPresenceSubscriptions] | spiral | Presence subscription is processed. |
[HostType, modPresenceUnsubscriptions] | spiral | Presence unsubscription is processed. |
[HostType, modRosterGets] | spiral | User's roster is fetched. |
[HostType, modRosterPush] | spiral | A roster update is pushed to a single session. |
[HostType, modRosterSets] | spiral | User's roster is updated. |
Privacy lists
Name | Type | Description (when it gets incremented) |
---|---|---|
[HostType, modPrivacyGets] | spiral | IQ privacy get is processed. |
[HostType, modPrivacyPush] | spiral | Privacy list update is sent to a single session. |
[HostType, modPrivacySets] | spiral | IQ privacy set is processed. |
[HostType, modPrivacySetsActive] | spiral | Active privacy list is changed. |
[HostType, modPrivacySetsDefault] | spiral | Default privacy list is changed. |
[HostType, modPrivacyStanzaAll] | spiral | A packet is checked against the privacy list. |
[HostType, modPrivacyStanzaDenied] | spiral | Privacy list check resulted in deny. |
[HostType, modPrivacyStanzaBlocked] | spiral | Privacy list check resulted in block. |
Other
Name | Type | Description (when it gets incremented) |
---|---|---|
[HostType, sessionAuthFails] | spiral | A client failed to authenticate. |
[HostType, sessionCount] | counter | Number of active sessions. |
[HostType, sessionLogouts] | spiral | A client session is closed. |
[HostType, sessionSuccessfulLogins] | spiral | A client session is opened. |
[HostType, xmppErrorIq] | spiral | An error IQ is sent to a client. |
[HostType, xmppErrorMessage] | spiral | An error message is sent to a client. |
[HostType, xmppErrorPresence] | spiral | An error presence is sent to a client. |
[HostType, xmppErrorTotal] | spiral | A stanza with error type is routed. |
[HostType, xmppMessageBounced] | spiral | A service-unavailable error is sent, because the message recipient is offline. |
[HostType, xmppIqSent] | spiral | An IQ is sent by a client. |
[HostType, xmppMessageSent] | spiral | A message is sent by a client. |
[HostType, xmppPresenceSent] | spiral | A presence is sent by a client. |
[HostType, xmppStanzaSent] | spiral | A stanza is sent by a client. |
[HostType, xmppIqReceived] | spiral | An IQ is sent to a client. |
[HostType, xmppMessageReceived] | spiral | A message is sent to a client. |
[HostType, xmppPresenceReceived] | spiral | A presence is sent to a client. |
[HostType, xmppStanzaReceived] | spiral | A stanza is sent to a client. |
[HostType, xmppStanzaCount] | spiral | A stanza is sent to or by a client. |
[HostType, xmppStanzaDropped] | spiral | A stanza is dropped due to an AMP rule or a filter_packet processing flow. |
Extension-specific metrics
Metrics specific to an extension, e.g. Message Archive Management, are described in the documentation page of the respective module.
Global metrics
Name | Type | Description (when it gets incremented) |
---|---|---|
[global, routingErrors] | spiral | It is not possible to route a stanza (all routing handlers failed). |
[global, nodeSessionCount] | value | A number of sessions connected to a given MongooseIM node. |
[global, totalSessionCount] | value | A number of sessions connected to a MongooseIM cluster. |
[global, uniqueSessionCount] | value | A number of unique users connected to a MongooseIM cluster (e.g. 3 sessions of the same user will be counted as 1 in this metric). |
[global, cache, unique_sessions_number] | gauge | A cached value of uniqueSessionCount. It is automatically updated when a unique session count is calculated. |
[global, nodeUpTime] | value | Node uptime. |
[global, clusterSize] | value | A number of nodes in a MongooseIM cluster seen by a given MongooseIM node (based on Mnesia). For CETS use global.cets.system.joined_nodes instead. |
[global, tcpPortsUsed] | value | A number of open TCP connections. This should relate to the number of connected sessions and databases, as well as federations and HTTP requests, in order to detect connection leaks. |
[global, processQueueLengths] | probe | The number of queued messages in the internal message queue of every Erlang process, and the internal queue of every FSM (ejabberd_s2s). This is sampled every 30 seconds asynchronously. It is a good indicator of an overloaded system: if too many messages are queued at the same time, the system is not able to process the data at the rate it was designed for. |
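The difference between totalSessionCount and uniqueSessionCount in the table above can be sketched as follows. This is only an illustration: the (user, resource) pair representation and the function name session_counts are assumptions, not MongooseIM's internal data model.

```python
def session_counts(sessions):
    """Return (total, unique) for an iterable of (user, resource) pairs."""
    sessions = list(sessions)
    total = len(sessions)
    # A user with several resources is counted only once.
    unique = len({user for user, _resource in sessions})
    return total, unique


open_sessions = [
    ("alice@example.com", "phone"),
    ("alice@example.com", "laptop"),
    ("alice@example.com", "tablet"),
    ("bob@example.com", "desktop"),
]
print(session_counts(open_sessions))  # (4, 2)
```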
Data metrics
Metric name | Type | Description |
---|---|---|
[global, data, xmpp, received, xml_stanza_size] | histogram | The size (in bytes) of a received stanza after decryption. |
[global, data, xmpp, sent, xml_stanza_size] | histogram | The size (in bytes) of a sent stanza before encryption. |
[global, data, xmpp, received, c2s, tcp] | spiral | The size (in bytes) of unencrypted data received from a client via TCP channel. |
[global, data, xmpp, sent, c2s, tcp] | spiral | The size (in bytes) of unencrypted data sent to a client via TCP channel. |
[global, data, xmpp, received, c2s, tls] | spiral | The size (in bytes) of data received from a client via TLS channel after decryption. |
[global, data, xmpp, sent, c2s, tls] | spiral | The size (in bytes) of data sent to a client via TLS channel before encryption. |
[global, data, xmpp, received, c2s, bosh] | spiral | The size (in bytes) of data received from a client via BOSH connection. |
[global, data, xmpp, sent, c2s, bosh] | spiral | The size (in bytes) of data sent to a client via BOSH connection. |
[global, data, xmpp, received, c2s, websocket] | spiral | The size (in bytes) of data received from a client via WebSocket connection. |
[global, data, xmpp, sent, c2s, websocket] | spiral | The size (in bytes) of data sent to a client via WebSocket connection. |
[global, data, xmpp, received, s2s] | spiral | The size (in bytes) of data received via TCP and TLS (after decryption) Server-to-Server connections. |
[global, data, xmpp, sent, s2s] | spiral | The size (in bytes) of data sent via TCP and TLS (before encryption) Server-to-Server connections. |
[global, data, xmpp, received, component] | spiral | The size (in bytes) of data received from an XMPP component. |
[global, data, xmpp, sent, component] | spiral | The size (in bytes) of data sent to an XMPP component. |
[HostType, data, xmpp, c2s, message, processing_time] | histogram | Processing time for incoming c2s stanzas. |
[global, data, dist] | proplist | Network stats for Erlang distributed communication. A proplist with values: recv_oct, recv_cnt, recv_max, send_oct, send_max, send_cnt, send_pend, connections. |
[global, data, rdbms, PoolName] | proplist | For every RDBMS pool defined, an instance of this metric is available. It is a proplist with values: workers, recv_oct, recv_cnt, recv_max, send_oct, send_max, send_cnt, send_pend. |
CETS system metrics
Metric name | Type | Description |
---|---|---|
[global, cets, system] | proplist | A proplist with a list of stats. Description is below. |
Stat Name | Description |
---|---|
available_nodes | Available nodes (nodes that are connected to us and have the CETS disco process started). |
unavailable_nodes | Unavailable nodes (nodes that do not respond to our pings). |
joined_nodes | Joined nodes (nodes that have our local tables running). |
discovered_nodes | Discovered nodes (nodes that are extracted from the discovery backend). |
remote_nodes_without_disco | Nodes that have more tables registered than the local node. |
remote_nodes_with_unknown_tables | Nodes with unknown tables. |
remote_unknown_tables | Unknown remote tables. |
remote_nodes_with_missing_tables | Nodes that are available, but do not host some of our local tables. |
remote_missing_tables | Local tables that some available nodes do not host. |
conflict_nodes | Nodes that replicate at least one of our local tables to a different list of nodes. |
conflict_tables | Tables that have conflicting replication destinations. |
discovery_works | Returns 1 if the last discovery attempt is successful (otherwise returns 0). |
VM metrics
Metric name | Type | Description |
---|---|---|
[global, erlang, memory] | proplist | A proplist with total, processes_used, atom_used, binary, ets and system memory stats. |
[global, erlang, system_info] | proplist | A proplist with port_count, port_limit, process_count, process_limit, ets_limit stats. |
Backend metrics
Some extension modules expose histograms with timings of calls made to their backends. Please check the documentation of modules that are enabled in your config file, in order to learn if they provide them.
All module backend metric names use the following convention: [global, backends, Module, BackendAction] and [global, backends, Module, BackendAction, count].
The former is a histogram of operation times. However, the time is not recorded if a backend operation exits with an exception.
The latter is a number of calls (spiral metric), incremented for every call (even a failed one).
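The behaviour of the two metrics can be sketched with a small wrapper. This is an illustrative model, not MongooseIM code: the dictionaries, the call_backend helper and the mod_example / lookup names are hypothetical, and durations are kept in seconds purely for the example.

```python
import time

histograms = {}  # metric name (tuple) -> list of recorded durations
counts = {}      # metric name (tuple) -> number of calls


def call_backend(module, action, fun, *args):
    """Run fun(*args) and update the two metrics described above."""
    time_metric = ("global", "backends", module, action)
    count_metric = time_metric + ("count",)
    # The count is incremented for every call, even one that fails.
    counts[count_metric] = counts.get(count_metric, 0) + 1
    start = time.perf_counter()
    result = fun(*args)  # an exception propagates, so no time is recorded
    histograms.setdefault(time_metric, []).append(time.perf_counter() - start)
    return result


def failing_lookup():
    raise RuntimeError("backend unavailable")


call_backend("mod_example", "lookup", lambda: "ok")
try:
    call_backend("mod_example", "lookup", failing_lookup)
except RuntimeError:
    pass

name = ("global", "backends", "mod_example", "lookup")
print(counts[name + ("count",)], len(histograms[name]))  # 2 1
```

A growing gap between the count and the number of histogram samples is therefore a hint that backend calls are failing with exceptions.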
Besides these, the following authentication metrics are always available:
- [HostType, backends, auth, authorize]
- [HostType, backends, auth, check_password]
- [HostType, backends, auth, try_register]
- [HostType, backends, auth, does_user_exist]
These are total times of the respective operations. One operation usually requires only a single call to an auth backend, but sometimes, e.g. with 3 backends configured, the operation may fail for the first 2 backends. In such a case, these metrics will be updated with the combined time of the 2 failed requests and the 1 successful request.
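The way a single sample combines several backend calls can be sketched as below. The function name auth_operation_time, the (succeeded, duration_ms) representation and the millisecond unit are assumptions made for illustration; the sketch also assumes that backends after the first successful one are not consulted.

```python
def auth_operation_time(attempts):
    """Combined duration recorded for one auth operation.

    attempts: list of (succeeded, duration_ms) pairs, in the order the
    configured backends were tried.
    """
    total = 0
    for succeeded, duration_ms in attempts:
        total += duration_ms
        if succeeded:
            break  # assumption: later backends are not tried
    return total


# Two backends fail, the third succeeds: one sample of 12 + 9 + 4 ms.
print(auth_operation_time([(False, 12), (False, 9), (True, 4)]))  # 25
```

This is why a slow or failing backend placed early in the list raises these timings even when the backend that finally answers is fast.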
Additionally, the RDBMS layer in MongooseIM exposes two more metrics, if RDBMS is configured:
- [global, backends, mongoose_rdbms, query] - Execution time of a "simple" (not prepared) query by a DB driver.
- [global, backends, mongoose_rdbms, execute] - Execution time of a prepared query by a DB driver.