Product: Couchbase Server
Components: Analytics-Service, Columnar
Issue Link: MB-62713
Affects Versions: 7.2.8, 7.6.8, 8.0.0
Fix Versions: 7.2.9, 7.6.10, 8.0.1
Summary
The buffer cache in the Analytics Service pins and unpins pages as they are used. A defect can leave the pinCount of one or more pages at the invalid value -1.
This is typically triggered when a query against a collection is interrupted (usually by a timeout or cancellation) while it is fetching data from the underlying on-disk B-tree structure.
The invalid pinCount then interferes with operations on the affected collection(s) until the workaround described below is applied.
Symptoms
Not all of the following symptoms need to be present for this issue to apply.
- Specific collections are reported as unavailable.
- The Analytics Service is reported as temporarily unavailable.
- "msg": "Internal error" messages are returned following query execution.
- The logs contain numerous messages similar to the following:
Cannot execute request, cluster is RECOVERING
Internal Server Error
Triggers
- A “double unpin” of a page occurs during a B-tree search operation.
- Query interruptions (timeouts or cancellations) can cause this double unpin.
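To illustrate what a double unpin means, here is a toy model in Python. It is only a sketch: ToyPage and interrupted_search are hypothetical names, and the code does not reproduce the actual Hyracks buffer cache or B-tree implementation. It assumes a search path that releases the same page once while handling an interruption and again during cleanup.

```python
class ToyPage:
    """Toy stand-in for a cached page; NOT the Hyracks CachedPage class."""

    def __init__(self):
        self.pin_count = 0

    def pin(self):
        self.pin_count += 1

    def unpin(self):
        self.pin_count -= 1
        if self.pin_count < 0:
            # The counter stays negative after the error is raised.
            raise RuntimeError(f"Invalid pinCount: {self.pin_count}")


def interrupted_search(page):
    """Hypothetical search that releases the same page twice when interrupted."""
    page.pin()
    try:
        raise InterruptedError("query cancelled")  # simulated interruption
    except InterruptedError:
        page.unpin()  # first release, in the error-handling path
    finally:
        page.unpin()  # second release of the same pin: the double unpin
```

In this toy model the second release drives the counter to -1 and leaves it there, which mirrors why the invalid state persists until the in-memory buffer cache state is cleared.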
Verification
- Review the ns_server.analytics_warn.log diagnostics file for the timeframe of the issue.
- A warning with a stack trace similar to the following confirms the presence of this issue. Look specifically for the decrementAndGet: Invalid pinCount: -1 in page: message, together with stack frames that reference the B-tree search, such as DiskBTree.search:
{YYYY-MM-DDThh:mm:ss}+{timezone difference from UTC} WARN CBAS.nc.Task [SA:JID:2.13094:TAID:TID:ANID:ODID:1:1:8:0:0] Task failed with exception
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.IllegalStateException: decrementAndGet: Invalid pinCount: -1 in page: org.apache.hyracks.storage.common.buffercache.CachedPage@59fca1c0
at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:47) ~[hyracks-api-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:297) ~[hyracks-storage-am-common-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.flushFrame(FrameUtils.java:50) ~[hyracks-dataflow-common-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.dataflow.std.sort.AbstractExternalSortRunMerger.merge(AbstractExternalSortRunMerger.java:199) ~[hyracks-dataflow-std-8.0.0-3777.jar:8.0.0-3777]
Caused by: java.lang.IllegalStateException: decrementAndGet: Invalid pinCount: -1 in page: org.apache.hyracks.storage.common.buffercache.CachedPage@59fca1c0
at org.apache.hyracks.storage.common.buffercache.CachedPage.decrementAndGetPinCount(CachedPage.java:73) ~[hyracks-storage-common-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.storage.common.buffercache.BufferCache.unpin(BufferCache.java:589) ~[hyracks-storage-common-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.storage.am.btree.impls.DiskBTree.searchDown(DiskBTree.java:140) ~[hyracks-storage-am-btree-8.0.0-3777.jar:8.0.0-3777]
at org.apache.hyracks.storage.am.btree.impls.DiskBTree.search(DiskBTree.java:105) ~[hyracks-storage-am-btree-8.0.0-3777.jar:8.0.0-3777]
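The check above can be sketched as a small Python helper. This is a minimal sketch, not an official diagnostic tool: the function name find_pincount_warnings and the 40-line look-ahead window are assumptions chosen for illustration. It reports the 1-based line numbers where the pinCount message appears with a DiskBTree.search frame shortly after it.

```python
PIN_MARKER = "decrementAndGet: Invalid pinCount: -1 in page:"
BTREE_MARKER = "DiskBTree.search"  # also matches DiskBTree.searchDown


def find_pincount_warnings(lines, window=40):
    """Return 1-based line numbers of pinCount errors followed by a B-tree search frame.

    `window` (an assumed value) is how many subsequent lines are scanned
    for the B-tree frame belonging to the same stack trace.
    """
    lines = list(lines)
    hits = []
    for index, line in enumerate(lines):
        if PIN_MARKER not in line:
            continue
        following = lines[index + 1 : index + 1 + window]
        if any(BTREE_MARKER in frame for frame in following):
            hits.append(index + 1)
    return hits


# Example usage, with the path to the log file supplied by the caller:
#   with open(path_to_analytics_warn_log, errors="replace") as handle:
#       print(find_pincount_warnings(handle))
```

A single stack trace can produce more than one hit, because the message appears both in the top-level exception line and in the "Caused by" line.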
Workarounds
- The primary recommendation is to upgrade to a version of Couchbase Server that contains the fix for this issue: 7.2.9, 7.6.10, or 8.0.1.
- If an upgrade is not possible, call the Analytics Cluster Restart API as described in the documentation. This clears the in-memory state of the buffer cache within the Analytics Service, including the invalid pinCount.
- After the restart, the Analytics Service takes a short while to complete bootstrapping, after which queries should function as expected again.
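The restart call can be sketched in Python as follows. This is a hedged sketch rather than a verified procedure: the endpoint path /analytics/cluster/restart, the default port 8095, and the use of HTTP basic authentication are assumptions based on the Analytics Service REST API and should be checked against the documentation for your version. The helper name build_restart_request and the example credentials are hypothetical.

```python
import base64
import urllib.request


def build_restart_request(host, username, password, port=8095, scheme="http"):
    """Build (but do not send) a POST request to the assumed restart endpoint."""
    url = f"{scheme}://{host}:{port}/analytics/cluster/restart"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, data=b"", method="POST")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Example usage (sends the request; run only when a restart is intended):
#   request = build_restart_request("localhost", "Administrator", "password")
#   with urllib.request.urlopen(request, timeout=30) as response:
#       print(response.status)
```

Building the request separately from sending it makes it possible to inspect the URL and headers before triggering a restart on a production cluster.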