3.3.0 to 3.4.0
New field in Message Archive Management MUC entries: Sender ID
As a part of ensuring GDPR compliance, it is essential to be able to efficiently query MAM MUC data via sender ID (to retrieve user's personal data). Originally, the sender JID could be found only as a part of an encoded XML message element, so finding all items sent by a certain user would be extremely inefficient (or rather: anti-efficient). MongooseIM 3.4.0 uses a modified schema for MAM MUC backends which enables a more efficient extraction.
Below you may find migration instructions specific to your MAM backend.
RDBMS
Step 1
Please execute the following SQL statements on your MIM database:
MySQL
1 2 |
|
PostgreSQL
1 2 |
|
MSSQL
1 2 |
|
Step 2
Now you have a schema that is compatible with MIM 3.4.0 but isn't GDPR-compliant yet because the new column has no meaningful data.
Please pick your favourite scripting/programming language and populate the new column with the help of a dedicated script.
You'll need to iterate over the whole mam_muc_message
table with the following algorithm:
- Provide
message
column content to the script. - The script returns sender's JID as
username@server
string. You need to split it to get a separate username and server. - Select ID from
mam_server_user
by the username and server. If it doesn't exist, insert a new one (id
column is automatically incremented). - Update the
sender_id
column inmam_muc_message
with the retrieved ID.
Cassandra
Step 1
Please execute the following CQL statements on your MIM database:
1 2 3 4 |
|
Step 2
Now you have a schema that is compatible with MIM 3.4.0 but isn't GDPR-compliant yet because the new column has no meaningful data.
Please pick your favourite scripting/programming language and populate the new column with the help of a dedicated script.
You'll need to iterate over the whole mam_muc_message
table with the following algorithm:
- Extract the whole
mam_muc_message
table. Please make sure to use the paging feature of your Cassandra client, as the MAM tables tend to be very large.1
SELECT * FROM mam_muc_message;
- To make data extraction faster, MongooseIM stores 2 copies of the message in the table:
1 2 3 4 5 6
cqlsh:mongooseim> SELECT * FROM mam_muc_message WHERE id = 399582233150625537 ALLOW FILTERING; room_jid | with_nick | id | from_jid | message | nick_name -------------------------------+-----------+--------------------+----------+--------------------------------+----------- room-ad1d999b9e@muc.localhost | | 399582233150625537 | null | 0x8350000001...998de2fa8426837 | Sid room-ad1d999b9e@muc.localhost | Sid | 399582233150625537 | null | 0x8350000001...998de2fa8426837 | Sid
- The copy with an empty
with_nick
column must be updated. - Extract the sender's JID from the
message
column in the same way as described in the RDBMS migration section. By default cassandra backend uses theeterm
format. - Update the
from_jid
column with the value of the extracted sender's JID:1 2 3 4 5 6 7
cqlsh:mongooseim> UPDATE mam_muc_message SET from_jid = 'username@server' WHERE id = 399582233150625537 AND with_nick = '' AND room_jid = 'room-ad1d999b9e@muc.localhost'; cqlsh:mongooseim> SELECT * FROM mam_muc_message WHERE id = 399582233150625537 ALLOW FILTERING; room_jid | with_nick | id | from_jid | message | nick_name -------------------------------+-----------+--------------------+-----------------+--------------------------------+----------- room-ad1d999b9e@muc.localhost | | 399582233150625537 | username@server | 0x8350000001...998de2fa8426837 | Sid room-ad1d999b9e@muc.localhost | Sid | 399582233150625537 | null | 0x8350000001...998de2fa8426837 | Sid
Riak
Changes to Riak schema are backward compatible with the current MongooseIM release. This means that skipping the migration will cause only some of the new features (namely GDPR data retrival) to not work correctly.
Step 1
Please update the Riak schema:
1 2 3 4 5 6 |
|
After that we need to either reload all Riak nodes (restart them) or manually reload the schema on live nodes. Reloading the schema on live nodes requires access to Erlang Shell of one of the Riak nodes (any of them). The instruction on how to get to Riak's Erlang shell is beyond this guide, but if you manage to get to it, just call:
1 |
|
Step 2
After the schema is posted and reloaded, all "new" objects will be indexed properly as long they contain 2 new fields: msg_owner_jid
and mam_type
.
The new MongooseIM code will insert both of them for all new MAM entires, but for all existing ones need to have the fields added.
In order to do that, we need to create a migration script (just pick your favourite scripting/programming language) that will do the following for each object in each bucket of type mam_yz
(the object will be referred as obj
):
- Use this dedicated script to convert the
obj.packet_register
field value into a so called$SENDER_JID
. - If the script returns
$SENDER_JID
correctly: - set
obj.mam_type = 'muc'
- set
obj.msg_owner_jid = $SENDER_JID
- If the script returns error code
-2
- set
obj.mam_type = 'pm'
- based on
obj_yz_rk
formatted as$LOCAL_JID/$REMOTE_JID/$MSG_ID
, setobj.msg_owner_jid = $LOCAL_JID
- Save the modified
obj
ElasticSearch
Step 1
Please update the mapping for muc_messages
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
Step 2
Now you have a schema that is compatible with MIM 3.4.0 but isn't GDPR-compliant yet because the new field has no meaningful data.
Please pick your favourite scripting/programming language and populate the new column with the help of a dedicated script.
You'll need to iterate over the all muc_messages
documents with the following algorithm:
- Extract some documents (notice the
size
parameter) for conversion:1
GET muc_messages/_search/?size=100&q=!_exists_:from_jid
- Extract the sender's JID from the
message
field in the same way as described in the RDBMS migration section. Elasticsearch backend uses exclusively thexml
format. - Update the
from_jid
column with the value of the extracted sender's JID:1 2 3 4 5 6
POST localhost:9200/muc_messages/muc/%_id%/_update { "doc": { "from_jid" : "%sender's jid%" } }
- Repeat all the actions until the full conversion of the database is done.