mod_broadcast
Module Description¶
mod_broadcast lets administrators send the same XMPP message to many users in a domain.
It is intended for operational announcements (maintenance windows, policy changes, emergency notices) and similar one-to-many communication.
Recipients receive a normal <message/> stanza routed by the server.
The module itself is not user-facing: broadcasts are typically created and managed through the GraphQL Admin API.
Prerequisites¶
- RDBMS authentication is required. mod_broadcast currently works only with ejabberd_auth_rdbms, because it uses the RDBMS auth backend to iterate over a snapshot of registered users.
- RDBMS outgoing pool must be configured. Both ejabberd_auth_rdbms and the broadcast job storage use an outgoing connection pool of type rdbms (typically the default tag) defined in the outgoing_pools section. See Outgoing connections and RDBMS authentication.
- GraphQL Admin API is the recommended control plane. Broadcast management is exposed via the admin GraphQL schema (broadcast.*). See GraphQL API (Admin) for general GraphQL setup and authentication.
Warning
Broadcast content (subject/body) is stored in the database until the broadcast is deleted. Treat this as sensitive data: review your retention policy and access controls.
Options¶
modules.mod_broadcast.backend¶
- Syntax: string, currently only "rdbms"
- Default: "rdbms"
- Example: backend = "rdbms"
Backend used to store broadcast jobs and worker progress.
modules.mod_broadcast.lease_time¶
- Syntax: integer (seconds), minimum 10
- Default: 600
- Example: lease_time = 900
Lease duration in seconds for broadcast job ownership. The owner node periodically renews the lease while the job is running.
Values below 10 seconds are rejected and mod_broadcast will not start; this prevents overly frequent sync cycles.
Warning
mod_broadcast relies on the standard RDBMS auth API to retrieve the recipient count. If user count estimation is enabled, the number of recipients reported per job may be inaccurate. User count estimation is disabled by default.
Broadcast job parameters and limits¶
These are not configuration keys; they are job parameters provided when starting a broadcast (for example via GraphQL):
- Domain: broadcasts are scoped to a single XMPP domain.
- Recipient group: currently only ALL_USERS_IN_DOMAIN is supported.
- Sender JID: must be an existing account.
- Message rate: must be between 1 and 1000 messages/second.
- Content limits:
  - name: 1..250 characters
  - messageSubject: 0..1024 characters (may be empty)
  - messageBody: 1..16000 characters
- Concurrency limit: currently only one running broadcast per domain is allowed.
Warning
Be careful with high messageRate values.
Broadcasts can put load on routing, offline storage, push notifications, MAM, and external integrations (depending on your deployment).
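The limits above can be checked on the client side before a job is submitted. The following is a minimal sketch, not the module's own validation: the BroadcastParams class and both helper functions are hypothetical names, and it assumes that "characters" can be approximated by Python's len() on a string, which may differ from how the server counts.

```python
from dataclasses import dataclass


@dataclass
class BroadcastParams:
    """Hypothetical container; field names mirror the job parameters listed above."""
    name: str
    messageSubject: str
    messageBody: str
    messageRate: int


def validate_params(p: BroadcastParams) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if not 1 <= len(p.name) <= 250:
        problems.append("name must be 1..250 characters")
    if len(p.messageSubject) > 1024:
        problems.append("messageSubject must be 0..1024 characters")
    if not 1 <= len(p.messageBody) <= 16000:
        problems.append("messageBody must be 1..16000 characters")
    if not 1 <= p.messageRate <= 1000:
        problems.append("messageRate must be between 1 and 1000 messages/second")
    return problems


def estimated_duration_seconds(recipient_count: int, message_rate: int) -> float:
    """Lower bound on run time: recipients divided by messages per second."""
    return recipient_count / message_rate
```

For example, 100000 recipients at a rate of 100 messages/second need at least 1000 seconds, which can help when choosing a rate that downstream components can absorb.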
Managing broadcasts (GraphQL Admin API)¶
The admin schema exposes the following operations under the broadcast category:
- Query: getBroadcasts(domain, limit, index)
- Query: getBroadcast(domain, id)
- Mutation: startBroadcast(...)
- Mutation: abortBroadcast(domain, id)
- Mutation: deleteInactiveBroadcastsByIds(domain, ids)
- Mutation: deleteInactiveBroadcastsByDomain(domain)
Example: start a broadcast¶
Example: monitor progress¶
Example: abort a running broadcast¶
Note
A running job is aborted by contacting the Erlang node that currently owns it. If that node is down or unreachable, the abort may fail temporarily until the owner comes back online or another node takes over the job.
Message format and delivery semantics¶
Each recipient gets an XMPP <message/> stanza with:
- type="chat"
- from set to the configured sender bare JID
- to set to the recipient bare JID
- <subject/> and <body/> containing the configured content
- An origin-id element (urn:xmpp:sid:0) for traceability
Delivery to offline users depends on your deployment (for example, whether offline storage is enabled, whether push is configured, etc.).
This extension delivers messages at least once: duplicates may occur when an error forces the broadcast process to restart, because recipients are processed in batches. Clients can deduplicate messages by their IDs, which are deterministic.
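Client-side deduplication can be sketched as follows. This is an illustration under assumptions: it treats the id attribute of the origin-id element as the deterministic identifier (the same approach would apply to another ID field), and the stanza, JIDs, and id value are made up for the example.

```python
import xml.etree.ElementTree as ET

SID_NS = "urn:xmpp:sid:0"


def origin_id(stanza_xml: str):
    """Return the origin-id value of a <message/> stanza, or None if absent."""
    message = ET.fromstring(stanza_xml)
    element = message.find(f"{{{SID_NS}}}origin-id")
    return None if element is None else element.get("id")


class Deduplicator:
    """Remembers seen origin-ids and reports whether a stanza is new."""

    def __init__(self):
        self._seen = set()

    def is_new(self, stanza_xml: str) -> bool:
        oid = origin_id(stanza_xml)
        if oid is None:
            return True  # nothing to deduplicate on; let it through
        if oid in self._seen:
            return False
        self._seen.add(oid)
        return True


# Hypothetical stanza; JIDs, text, and the id value are illustrative only.
EXAMPLE = (
    '<message type="chat" from="admin@example.com" to="alice@example.com">'
    "<subject>Maintenance</subject>"
    "<body>The service restarts at 22:00 UTC.</body>"
    '<origin-id xmlns="urn:xmpp:sid:0" id="example-id-1"/>'
    "</message>"
)
```

A client would call is_new() for every incoming broadcast message and drop those for which it returns False. A long-lived client would also need to bound or persist the set of seen IDs, which this sketch omits.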
Example configuration¶
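As a rough sketch of what such a configuration could look like, the helper below renders a TOML fragment from the two documented options and applies the documented constraints first. The function name is hypothetical, and the section header is inferred from the option paths modules.mod_broadcast.backend and modules.mod_broadcast.lease_time.

```python
def render_mod_broadcast_config(backend: str = "rdbms", lease_time: int = 600) -> str:
    """Return a TOML fragment for mod_broadcast after checking documented limits."""
    if backend != "rdbms":
        raise ValueError('backend must be "rdbms"')
    if lease_time < 10:
        raise ValueError("lease_time must be at least 10 seconds")
    return (
        "[modules.mod_broadcast]\n"
        f'  backend = "{backend}"\n'
        f"  lease_time = {lease_time}\n"
    )


# Prints a three-line fragment: the section header, backend, and lease_time.
print(render_mod_broadcast_config(lease_time=900))
```

Both options have defaults, so a fragment that contains only the section header should be enough to enable the module with backend "rdbms" and a 600-second lease.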
Metrics¶
If you'd like to learn more about metrics in MongooseIM, please visit the MongooseIM metrics page.
Note
All mod_broadcast metrics are local to a single Erlang node.
In a cluster, sum (or otherwise aggregate) the metrics across all MongooseIM nodes to obtain a cluster-wide total.
Prometheus metrics carry a host_type label.
| Name | Type | Description |
|---|---|---|
| mod_broadcast_live_jobs | gauge | Number of currently running broadcast jobs on the local node. |
| mod_broadcast_jobs_started | counter | Broadcast jobs started. |
| mod_broadcast_jobs_finished | counter | Broadcast jobs finished successfully. |
| mod_broadcast_jobs_aborted_admin | counter | Broadcast jobs aborted by an administrator. |
| mod_broadcast_jobs_aborted_error | counter | Broadcast jobs aborted automatically due to an error. |
| mod_broadcast_recipients_processed | counter | Recipients processed (attempted deliveries). |
| mod_broadcast_recipients_success | counter | Successful per-recipient routes. |
| mod_broadcast_recipients_skipped | counter | Per-recipient routes that failed and were skipped. |
Since Exometer doesn't support labels, the host types, or the word global, are part of the metric names, depending on the instrumentation.exometer.all_metrics_are_global option.
| Name | Type | Description |
|---|---|---|
| [HostType, mod_broadcast_live_jobs, count] | gauge | Number of currently running broadcast jobs on the local node. |
| [HostType, mod_broadcast_jobs_started, count] | spiral | Broadcast jobs started. |
| [HostType, mod_broadcast_jobs_finished, count] | spiral | Broadcast jobs finished successfully. |
| [HostType, mod_broadcast_jobs_aborted_admin, count] | spiral | Broadcast jobs aborted by an administrator. |
| [HostType, mod_broadcast_jobs_aborted_error, count] | spiral | Broadcast jobs aborted automatically due to an error. |
| [HostType, mod_broadcast_recipients_processed, count] | spiral | Recipients processed (attempted deliveries). |
| [HostType, mod_broadcast_recipients_success, count] | spiral | Successful per-recipient routes. |
| [HostType, mod_broadcast_recipients_skipped, count] | spiral | Per-recipient routes that failed and were skipped. |
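Because these metrics are local to each node, a cluster-wide view has to be computed by the monitoring side. The sketch below sums per-node values that have already been collected; the function name, node names, and sample numbers are hypothetical, and how the values are scraped is left out.

```python
from collections import Counter


def cluster_totals(per_node: dict) -> dict:
    """Sum node-local metric values into cluster-wide totals."""
    totals = Counter()
    for node_metrics in per_node.values():
        totals.update(node_metrics)  # adds values key by key
    return dict(totals)


# Illustrative samples only; real values come from your metrics backend.
samples = {
    "mongooseim@node1": {
        "mod_broadcast_jobs_started": 3,
        "mod_broadcast_recipients_success": 1200,
    },
    "mongooseim@node2": {
        "mod_broadcast_jobs_started": 1,
        "mod_broadcast_recipients_success": 800,
    },
}
```

With Prometheus, the same aggregation is usually expressed in the query layer rather than in application code.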
Architecture overview¶
A broadcast is represented as a job persisted in the database. The job is started on the Erlang node that handled the start request. That node is the initial owner, but ownership may later move to another node if the original owner stops renewing its lease. In a cluster, this means that different broadcast runs may be owned by different nodes.
Aborting a job is routed to the node currently recorded as the owner. Retrieving broadcast information (listing jobs and reading job details) is independent of job ownership and can be served by any node.
A per-host-type manager process starts a worker that:
- Reads the job metadata and the last persisted worker state.
- Loads recipients in batches from a snapshot of registered users (so the recipient list is consistent for the duration of the job).
- Routes one message per recipient, rate-limited to the configured message rate.
- Persists progress after each batch.
If the node restarts, the manager resumes jobs that were owned by that node and were still marked as RUNNING.
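The worker steps above can be sketched as a simple loop. This is an illustrative model, not the module's Erlang implementation: run_worker, batch_size, and the route and save_progress callbacks are hypothetical, and the snapshot is represented as an in-memory list.

```python
import time


def run_worker(recipients, message_rate, batch_size, route, save_progress,
               start_index=0, sleep=time.sleep, clock=time.monotonic):
    """Route one message per recipient at no more than message_rate per second.

    recipients is the snapshot list; start_index is the last persisted position.
    Returns the index after the last processed recipient.
    """
    interval = 1.0 / message_rate
    index = start_index
    next_send = clock()
    while index < len(recipients):
        batch = recipients[index:index + batch_size]
        for recipient in batch:
            delay = next_send - clock()
            if delay > 0:
                sleep(delay)  # pace sends to the configured rate
            route(recipient)
            next_send += interval
        index += len(batch)
        save_progress(index)  # checkpoint after each batch
    return index
```

Since progress is saved only after a full batch, a worker that stops mid-batch and resumes from the last checkpoint routes the earlier recipients of that batch again. This is one way the at-least-once behaviour can arise.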
Ownership and failover¶
Broadcast ownership exists to make sure that, in a cluster, each running job has exactly one active node responsible for sending messages and updating progress. Without this mechanism, multiple nodes could continue the same job at the same time and cause unnecessary duplication.
Ownership is lease-based.
When a broadcast starts, the node that accepted the request becomes the owner for a limited time window defined by lease_time.
While the job is healthy, that node keeps renewing its lease in the database, which signals to the rest of the cluster that the job is still actively managed.
If the owner node stops, loses database access for long enough, or otherwise cannot renew the lease, the lease eventually expires. At that point, another node can take over the job and continue it from the last persisted worker state. This is how unfinished broadcasts survive node loss and other interruptions without restarting from the beginning.
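The lease mechanics described so far can be modelled with a small in-memory table. This is a sketch under assumptions: the Lease and LeaseTable names are hypothetical, time is passed in explicitly as seconds, and a real deployment would perform each check-and-update as a single atomic database operation.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    owner: str
    expires_at: float


class LeaseTable:
    """In-memory stand-in for the database rows that record job ownership."""

    def __init__(self, lease_time: float):
        self.lease_time = lease_time
        self._leases = {}

    def try_acquire(self, job_id: str, node: str, now: float) -> bool:
        """Claim the job if it is unowned, expired, or already ours."""
        lease = self._leases.get(job_id)
        if lease is None or lease.expires_at <= now or lease.owner == node:
            self._leases[job_id] = Lease(node, now + self.lease_time)
            return True
        return False

    def renew(self, job_id: str, node: str, now: float) -> bool:
        """Extend the lease only if this node still owns it and it has not expired."""
        lease = self._leases.get(job_id)
        if lease is None or lease.owner != node or lease.expires_at <= now:
            return False
        lease.expires_at = now + self.lease_time
        return True
```

With lease_time set to 600, a node that last renewed at second 500 holds the job until second 1100. Another node's try_acquire fails before that moment and succeeds from it onward, after which the former owner's renew calls return False.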
Workers may also be paused temporarily even if the node itself is still up. This happens when the local manager cannot reliably synchronize ownership state with the database. In that situation, the node stops actively progressing its local broadcast workers until synchronization succeeds again, which reduces the risk of processing a job whose ownership is uncertain. While this safety mode is active, starting new broadcasts may temporarily be unavailable.
Once synchronization recovers, the node re-checks which jobs it currently owns. Workers for jobs that are still owned by that node are resumed from their saved progress. Workers for jobs that are no longer owned stay stopped, because another node has already taken responsibility for them.
From an operator's perspective, this means a running broadcast can briefly pause during node restarts, database outages, or cluster instability, and later resume automatically. Some delivery duplication is still possible around failures, so clients should rely on the message IDs for deduplication if needed.