XDCR Document Count Mismatch Troubleshooting Guide – Couchbase Support

The content of this article applies to Couchbase Server 6.6.x and 7.x.

Issue Summary:

Cross Data Center Replication (XDCR) is a Couchbase feature that replicates data between clusters, which are typically in different geographic locations. Document count mismatches between the source and target clusters can occur for various reasons, and troubleshooting them may involve examining several aspects of the Couchbase setup.

By the end of this article, you will know how to identify an XDCR document count mismatch using the xdcrDiffer tool, and what level of detail to share with Support to expedite the investigation.

A Useful Tool for Understanding the Document Count Difference:

xdcrDiffer is a diffing tool used to confirm data consistency between XDCR clusters. By default, it compares document metadata to find inconsistencies, lists any it finds on screen, and also writes them to a JSON file. This default can be changed so that it compares the document bodies instead, or both the metadata and the bodies.
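To make the difference between the comparison modes concrete, here is a conceptual Python sketch that sorts document keys into the same three categories xdcrDiffer reports (Mismatch, MissingFromSource, MissingFromTarget). It is not xdcrDiffer's implementation: the `DocMeta` record, its field names, and the `diff_clusters` function are hypothetical, and the sketch compares two small in-memory maps rather than data read from live clusters.

```python
from typing import Dict, List, NamedTuple


class DocMeta(NamedTuple):
    """Hypothetical per-document record; field names are illustrative only."""
    cas: int
    flags: int
    datatype: int
    body: bytes


def diff_clusters(source: Dict[str, DocMeta], target: Dict[str, DocMeta],
                  mode: str = "meta") -> Dict[str, List[str]]:
    """Classify document keys into the three categories of a diff report.

    mode="meta" compares metadata only (the tool's default behaviour),
    mode="body" compares document bodies only, and mode="both" compares both.
    """
    if mode not in ("meta", "body", "both"):
        raise ValueError(f"unknown mode: {mode}")
    result: Dict[str, List[str]] = {
        "Mismatch": [], "MissingFromSource": [], "MissingFromTarget": []}
    for key, src in source.items():
        tgt = target.get(key)
        if tgt is None:
            # Present on the source, absent on the target.
            result["MissingFromTarget"].append(key)
            continue
        meta_differs = (src.cas, src.flags, src.datatype) != \
                       (tgt.cas, tgt.flags, tgt.datatype)
        body_differs = src.body != tgt.body
        if (mode in ("meta", "both") and meta_differs) or \
           (mode in ("body", "both") and body_differs):
            result["Mismatch"].append(key)
    # Present on the target, absent on the source.
    result["MissingFromSource"] = [k for k in target if k not in source]
    return result
```

For example, if a key exists on both sides with identical bodies but different CAS values, the metadata mode reports it under Mismatch while the body-only mode does not, which is why the choice of comparison mode matters when interpreting results.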

Instructions to download and run xdcrDiffer:

1] On a spare node that is not part of the production cluster but can reach both the source and target clusters, clone the xdcrDiffer Git repository. If no spare node is available, consider installing the tool on a non-KV node that has sufficient free system resources.

2] Install Go, which is a prerequisite, from the official Go website.

3] Navigate to the xdcrDiffer directory and run `make deps`.

4] Once that command completes successfully, run `make`.

5] Run the following command to identify the mutation differences between the two clusters for a specific bucket:

```
./runDiffer.sh -u <username> -p <password> -h <source-cluster-node-ip>:8091 -r <XDCR-Remote-Cluster-Name> -s <Source-Bucket-Name> -t <Target-Bucket-Name>
```

After successful execution, the results are saved in the mutationDiff directory.
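If you rerun the comparison regularly, it can help to script the invocation. The following Python sketch assembles exactly the runDiffer.sh command shown above, using only the `-u`, `-p`, `-h`, `-r`, `-s`, and `-t` flags, and then checks that the mutationDiff directory exists. The helper names `build_differ_command` and `run_differ` are hypothetical, and the sketch assumes it is given the path of the xdcrDiffer checkout that contains runDiffer.sh.

```python
import subprocess
from pathlib import Path
from typing import List


def build_differ_command(username: str, password: str, source_node: str,
                         remote_cluster: str, source_bucket: str,
                         target_bucket: str, port: int = 8091) -> List[str]:
    """Assemble the documented runDiffer.sh invocation as an argument list."""
    return [
        "./runDiffer.sh",
        "-u", username,
        "-p", password,
        "-h", f"{source_node}:{port}",  # source cluster node and management port
        "-r", remote_cluster,           # name of the XDCR remote cluster reference
        "-s", source_bucket,
        "-t", target_bucket,
    ]


def run_differ(differ_dir: Path, cmd: List[str]) -> Path:
    """Run the command from the xdcrDiffer checkout and return the results directory."""
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, cwd=differ_dir, check=True)
    results = differ_dir / "mutationDiff"
    if not results.is_dir():
        raise FileNotFoundError(f"no results directory at {results}")
    return results
```

A hypothetical call would be `run_differ(Path.home() / "xdcrDiffer", build_differ_command("Administrator", "<password>", "10.0.0.1", "remote-dc", "travel-sample", "travel-sample"))`. As with the plain shell command, the password is passed on the command line and is visible in the process list while the tool runs.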

Reading the xdcrDiffer Output:

The results are written as JSON files under the mutationDiff directory. For example, to pretty-print the mutationDiffDetails file:

```
:~/xdcrDiffer/mutationDiff$ jsonpp mutationDiffDetails | head
{
  "Mismatch": {},
  "MissingFromSource": {},
  "MissingFromTarget": {
    "0": {
      "xdcrProv_C10": {
        "Value": "eyJuYW1lIjogInhkY3JQcm92X0MxMCIsICJhZ2UiOiAwLCAiaW5kZXgiOiAiMCIsICJib2R5IjoiMDAwMDAwMDAwMCJ9",
        "Flags": 0,
        "Datatype": 1,
        "Cas": 1620776636481929216
...
```

If there are no differences, the Mismatch, MissingFromSource, and MissingFromTarget sections are all empty. Otherwise, each difference is listed as a JSON entry, as in the example above.

In the example, the key "0" is the collection ID and the nested key (xdcrProv_C10) is the document ID. Under MissingFromTarget, the collection ID identifies the target collection that the document should belong to. Under MissingFromSource and Mismatch, it identifies the collection in the source bucket.
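To triage a large report, you can count the differences per category and collection ID, and decode individual document bodies. In the sample above, the `Value` field appears to be base64-encoded (which is how Go's `encoding/json` serializes byte slices); decoding it yields `{"name": "xdcrProv_C10", "age": 0, "index": "0", "body":"0000000000"}`. The Python sketch below assumes that every category uses the same collection-ID-to-document-ID nesting shown for MissingFromTarget; the helper names `load_details`, `summarize`, and `decode_value` are hypothetical.

```python
import base64
import json
from pathlib import Path
from typing import Dict, Tuple

CATEGORIES = ("Mismatch", "MissingFromSource", "MissingFromTarget")


def load_details(path: Path) -> dict:
    """Load a mutationDiffDetails file produced by xdcrDiffer."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(details: dict) -> Dict[Tuple[str, str], int]:
    """Count differing documents per (category, collection ID), skipping empty ones."""
    counts: Dict[Tuple[str, str], int] = {}
    for category in CATEGORIES:
        for collection_id, docs in details.get(category, {}).items():
            if docs:
                # Each key under a collection ID is assumed to be a document ID.
                counts[(category, collection_id)] = len(docs)
    return counts


def decode_value(entry: dict) -> bytes:
    """Decode the base64 'Value' field of a single document entry."""
    return base64.b64decode(entry["Value"])
```

For a report containing only the single entry shown above, `summarize(load_details(Path("mutationDiff") / "mutationDiffDetails"))` would return `{("MissingFromTarget", "0"): 1}`. Grouping by collection ID first makes it easy to see whether differences are concentrated in one collection, which often points at a filter or mapping rule rather than at random replication loss.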

Running xdcrDiffer after tombstones have been purged reports the affected documents as missing, which can easily be mistaken for documents that XDCR failed to replicate.

Document Count Discrepancies: Exploring Corner Cases

XDCR Filters

Issue: When XDCR filters are applied, only documents that match the filter expression are replicated, so the target cluster can legitimately hold fewer documents than the source.
Recommendation: Ensure that XDCR filters are configured appropriately. Review and validate the filter criteria to confirm that the expected documents are being replicated.
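One way to validate a filter is to estimate which source documents it should exclude and compare that with the keys reported under MissingFromTarget. The sketch below mirrors only a simple document-ID filter such as `REGEXP_CONTAINS(META().id, "^order::")`; it does not model filters on document body fields, Python's regex dialect may differ from XDCR's for advanced patterns, and `split_by_key_filter` is a hypothetical helper.

```python
import re
from typing import Iterable, List, Tuple


def split_by_key_filter(keys: Iterable[str],
                        pattern: str) -> Tuple[List[str], List[str]]:
    """Split document keys into (expected to replicate, filtered out).

    Uses re.search so that, like REGEXP_CONTAINS, the pattern may match
    anywhere in the key unless it is anchored with ^ or $.
    """
    regex = re.compile(pattern)
    replicated: List[str] = []
    excluded: List[str] = []
    for key in keys:
        (replicated if regex.search(key) else excluded).append(key)
    return replicated, excluded
```

Keys from MissingFromTarget that land in the excluded list are expected to be absent from the target; only the remaining keys need further investigation.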

Expirations

Issue: Couchbase documents can have an expiration time (TTL). Because expiration is lazy, a document whose TTL has passed may still count towards the total item count until it is accessed or removed by the expiry pager or compaction. If the expiry pager or compaction runs at different times on the source and target clusters, their document counts can differ.
Recommendation: Check the expiration settings for documents on both the source and target clusters, and consider how expiration times affect document synchronization between them. You can also run a manual compaction on both clusters and check whether the document count difference shrinks.
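To gauge how much of a gap lazy expiration could explain, you can count documents whose expiry time has already passed but which are still stored. This Python sketch assumes you have already collected each document's absolute expiry time in Unix epoch seconds, with 0 meaning the document never expires; how you gather those values is outside its scope, and `count_expired_unpurged` is a hypothetical helper.

```python
import time
from typing import Iterable, Optional


def count_expired_unpurged(expiries: Iterable[int],
                           now: Optional[float] = None) -> int:
    """Count documents that have expired but are still stored.

    expiries: absolute expiry times in Unix epoch seconds (0 = never expires).
    now: reference time; defaults to the current time.
    """
    now = time.time() if now is None else now
    return sum(1 for exp in expiries if exp != 0 and exp <= now)
```

For example, with expiry times `[0, 100, 200, 300]` and `now=250`, two documents have expired but are still counted. If one cluster still holds such documents while the other has already removed them, the item counts differ by roughly that number until the lagging cluster's expiry pager or compaction catches up.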

Transactions

Issue: If transactions are used during document updates, inserts, or deletions, inconsistencies in document count may arise, especially if transactions are not committed or rolled back uniformly across all participating nodes.
Recommendation: Review transactional operations in your application code and ensure that transactions are appropriately committed or rolled back. Monitor transaction logs and statuses to identify any discrepancies.
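Active Transaction Records (ATRs) are stored as ordinary documents, so they are included in a bucket's item count. When comparing counts, it can help to separate them from application documents. The sketch below uses the “atr” key prefix that this article associates with ATRs; treat the prefix list as an assumption to adjust for your environment (some SDK versions may use a different naming scheme, such as a `_txn:` namespace prefix), and note that a prefix match also catches any application keys that happen to start with "atr". `split_atr_keys` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

# Assumed ATR key prefixes; extend this tuple if your records are named differently.
ATR_PREFIXES: Tuple[str, ...] = ("atr",)


def split_atr_keys(keys: Iterable[str],
                   prefixes: Tuple[str, ...] = ATR_PREFIXES) -> Dict[str, int]:
    """Count keys that look like ATRs versus ordinary application documents."""
    counts = {"atr": 0, "application": 0}
    for key in keys:
        # str.startswith accepts a tuple and matches any of its prefixes.
        counts["atr" if key.startswith(prefixes) else "application"] += 1
    return counts
```

Comparing the "application" counts on the source and target gives a figure that is not skewed by transaction bookkeeping documents.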

Working with Couchbase Technical Support:

To expedite the investigation of this issue, Couchbase Technical Support needs the following information:

1] Couchbase Server logs from all XDCR clusters (i.e., both the source and target clusters).

2] The contents of the mutationDiff directory produced by running xdcrDiffer.

3] Whether the replication is unidirectional or bidirectional.

4] Whether transactions are being used. If they are, the vBuckets may contain Active Transaction Records (ATRs), which are stored as documents whose keys begin with the prefix “atr”.

5] Whether any filtering mechanism, such as XDCR filters, is being used.
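For item 2, a convenient way to share the mutationDiff directory is to pack it into a single compressed archive. This is a minimal sketch using Python's standard `tarfile` module; `archive_mutation_diff` and the archive file name are hypothetical.

```python
import tarfile
from pathlib import Path


def archive_mutation_diff(diff_dir: Path, archive_path: Path) -> Path:
    """Pack the mutationDiff directory into a gzip-compressed tar archive."""
    if not diff_dir.is_dir():
        raise FileNotFoundError(f"{diff_dir} does not exist; run xdcrDiffer first")
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps the directory name but drops the absolute path prefix.
        tar.add(diff_dir, arcname=diff_dir.name)
    return archive_path
```

For example, `archive_mutation_diff(Path("mutationDiff"), Path("mutationDiff.tar.gz"))` run from the xdcrDiffer directory produces a single file containing every result file, so nothing is lost if the directory holds more than mutationDiffDetails.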
