EDB Postgres Distributed (PGD) v4 - Commit At Most Once (2024)

The objective of the Commit At Most Once (CAMO) feature is to prevent the application from committing more than once.

Without CAMO, when a client loses connection after a COMMIT is submitted, the application might not receive a reply from the server and is therefore unsure whether the transaction committed.

The application can't easily decide between the two options of:

  • Retrying the transaction with the same data, since this can in some cases cause the data to be entered twice

  • Not retrying the transaction and risking that the data doesn't get processed at all

Either of those is a critical error with high-value data.

One way to avoid this situation is to make sure that the transaction includes at least one INSERT into a table with a unique index, but that depends on the application design and requires application-specific error-handling logic, so it isn't effective in all cases.

The CAMO feature in BDR offers a more general solution and doesn't require an INSERT. When activated by bdr.enable_camo or bdr.commit_scope, the application receives a message containing the transaction identifier, if already assigned. Otherwise, the first write statement in a transaction sends that information to the client. If the application sends an explicit COMMIT, the protocol ensures that the application receives the notification of the transaction identifier before the COMMIT is sent. If the server doesn't reply to the COMMIT, the application can handle this error by using the transaction identifier to request the final status of the transaction from another BDR node. If the prior transaction status is known, then the application can safely decide whether to retry the transaction.

CAMO works in one of two modes:

  • Pair mode
  • With Eager All-Node Replication

In pair mode, CAMO works by creating a pair of partner nodes that are two BDR master nodes from the same top-level BDR group. In this operation mode, each node in the pair knows the outcome of any recent transaction executed on the other peer and especially (for our need) knows the outcome of any transaction disconnected during COMMIT. The node that receives the transactions from the application might be referred to as "origin" and the node that confirms these transactions as "partner." However, there's no difference in the CAMO configuration for the nodes in the CAMO pair. The pair is symmetric.

When combined with Eager All-Node Replication, CAMO enables every peer (that is, a full BDR master node) to act as a CAMO partner. You don't need to configure a designated CAMO partner in this mode.

Warning

CAMO requires changes to the user's application to take advantage of the advanced error handling. Enabling a parameter isn't enough to gain protection. Reference client implementations are provided to customers on request.

Requirements

To use CAMO, an application must issue an explicit COMMIT message as a separate request (not as part of a multi-statement request). CAMO can't provide status for transactions issued from procedures or from single-statement transactions that use implicit commits.

Configuration

Assume an existing EDB Postgres Distributed cluster consists of the nodes node1 and node2. Both nodes are part of a BDR-enabled database called bdrdemo, and both are part of the same node group mygroup. You can configure the nodes to be CAMO partners for each other.

  1. Create the EDB Postgres Distributed cluster where nodes node1 and node2 are part of the mygroup node group.

  2. Run the function bdr.add_camo_pair() on one node:

    SELECT bdr.add_camo_pair('mygroup', 'node1', 'node2');

  3. Adjust the application to use the COMMIT error handling that CAMO suggests.

We don't recommend enabling CAMO at the server level, as this imposes higher latency for all transactions, even when not needed. Instead, selectively enable it for individual transactions by turning on CAMO at the session or transaction level.

To enable CAMO at the session level:

SET bdr.enable_camo = 'remote_commit_flush';

To enable CAMO for individual transactions, after starting the transaction and before committing it:

SET LOCAL bdr.enable_camo = 'remote_commit_flush';

Valid values for bdr.enable_camo are:

  • off (default, disables CAMO)
  • remote_write
  • remote_commit_async
  • remote_commit_flush or on

See the Comparison of synchronous replication modes for details about how each mode behaves. Setting bdr.enable_camo = off disables this feature, which is the default.
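
A client that enables CAMO per transaction has to send the SET LOCAL statement itself. The following C sketch shows one way to guard against typos in the mode name before building that statement. The helper names camo_mode_enables and camo_set_local_stmt are hypothetical and not part of BDR or libpq; the list of modes is taken from this section.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values of bdr.enable_camo listed in this section that turn CAMO on. */
static const char *const camo_enabling_modes[] = {
    "remote_write", "remote_commit_async", "remote_commit_flush", "on"
};

/* Returns 1 if mode is one of the CAMO-enabling values, 0 otherwise. */
int camo_mode_enables(const char *mode)
{
    size_t n = sizeof camo_enabling_modes / sizeof camo_enabling_modes[0];

    assert(mode != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(mode, camo_enabling_modes[i]) == 0)
            return 1;
    return 0;
}

/*
 * Writes "SET LOCAL bdr.enable_camo = '<mode>'" into buf.
 * Returns 0 on success, -1 if the mode doesn't enable CAMO
 * or if buf is too small to hold the statement.
 */
int camo_set_local_stmt(const char *mode, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    if (!camo_mode_enables(mode))
        return -1;
    n = snprintf(buf, buflen, "SET LOCAL bdr.enable_camo = '%s'", mode);
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

The resulting string is meant to be sent after BEGIN and before the first write statement of the transaction.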

CAMO with Eager All-Node Replication

To use CAMO with Eager All-Node Replication, no changes are required on either node. It is enough to enable the global commit scope after the start of the transaction. You don't need to set bdr.enable_camo.

The application still needs to be adjusted to use COMMIT error handling as specified but is free to connect to any available BDR node to query the transaction's status.

Failure scenarios

Different failure scenarios occur in different configurations.

Data persistence at receiver side

By default, a PGL writer operates in bdr.synchronous_commit = off mode when applying transactions from remote nodes. This holds true for CAMO as well, meaning that transactions are confirmed to the origin node possibly before reaching the disk of the CAMO partner. In case of a crash or hardware failure, it is possible for a confirmed transaction to be unrecoverable on the CAMO partner by itself. This isn't an issue as long as the CAMO origin node remains operational, as it redistributes the transaction once the CAMO partner node recovers.

This in turn means CAMO can protect against a single-node failure. This holds for local mode as well as for remote write, and for the two in combination.

To cover an outage of both nodes of a CAMO pair, you can use bdr.synchronous_commit = local to enforce a flush prior to the pre-commit confirmation. This doesn't work with either remote write or local mode and has a performance impact due to I/O requirements on the CAMO partner in the latency-sensitive commit path.

Local mode

When synchronous_replication_availability = 'async', a node (i.e., master) detects whether its CAMO partner is ready. If not, it temporarily switches to local mode. When in local mode, a node commits transactions locally until switching back to CAMO mode.

This doesn't allow COMMIT status to be retrieved, but it does let you choose availability over consistency. This mode can tolerate a single-node failure. In case both nodes of a CAMO pair fail, they might choose incongruent commit decisions to maintain availability, leading to data inconsistencies.

For a CAMO partner to switch to ready, it needs to be connected, and the estimated catchup interval needs to drop below bdr.global_commit_timeout. The current readiness status of a CAMO partner can be checked with bdr.is_camo_partner_ready, while bdr.node_replication_rates provides the current estimate of the catchup time.

The switch from CAMO protected to local mode is only ever triggered by an actual CAMO transaction, either because the commit exceeds the bdr.global_commit_timeout or because the CAMO partner is already known to be disconnected at the time of commit. This switch is independent of the estimated catchup interval. If the CAMO pair is configured to require Raft to switch to local mode, this switch requires a majority of nodes to be operational (see the require_raft flag for bdr.add_camo_pair). This can prevent a split-brain situation due to an isolated node switching to local mode. If require_raft isn't set for the CAMO pair, the origin node switches to local mode immediately.

You can configure the detection on the sending node using PostgreSQL settings controlling keep-alives and timeouts on the TCP connection to the CAMO partner. The wal_sender_timeout is the time that a node waits for a CAMO partner until switching to local mode. Additionally, the bdr.global_commit_timeout setting puts a per-transaction limit on the maximum delay a COMMIT can incur due to the CAMO partner being unreachable. It might be lower than the wal_sender_timeout, which influences synchronous standbys as well, and for which a good compromise between responsiveness and stability must be found.

The switch from local mode to CAMO mode depends on the CAMO partner node, which initiates the connection. The CAMO partner tries to reconnect at least every 30 seconds. After connectivity is reestablished, it might therefore take up to 30 seconds until the CAMO partner connects back to its origin node. Any lag that accumulated on the CAMO partner further delays the switch back to CAMO protected mode.

Unlike during normal CAMO operation, in local mode there's no additional commit overhead. This can be problematic, as it allows the node to continuously process more transactions than the CAMO pair can normally process. Even if the CAMO partner eventually reconnects and applies transactions, its lag only ever increases in such a situation, preventing reestablishing the CAMO protection. To artificially throttle transactional throughput, BDR provides the bdr.camo_local_mode_delay setting, which allows you to delay a COMMIT in local mode by an arbitrary amount of time. We recommend measuring commit times in normal CAMO mode during expected workloads and configuring this delay accordingly. The default is 5 ms, which reflects a local network and a relatively quick CAMO partner response.
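
The text above doesn't prescribe how to turn measured commit times into a delay value. As one possible heuristic (an assumption of this sketch, not a BDR recommendation), the median of commit latencies sampled in normal CAMO mode can serve as a starting point for bdr.camo_local_mode_delay. The function name median_commit_ms is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Orders two doubles ascending for qsort. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *) a;
    double y = *(const double *) b;

    return (x > y) - (x < y);
}

/*
 * Returns the median of n commit latencies given in milliseconds.
 * Sorts the samples array in place. Requires n > 0.
 */
double median_commit_ms(double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_double);
    if (n % 2 == 1)
        return samples[n / 2];
    return (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}
```

The median is less sensitive to occasional slow commits than the mean. Whether that suits a given workload is a judgment call.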

Consider the choice of whether to allow local mode in view of the architecture and the availability requirements. The following examples provide some detail.

Example: Symmetric node pair

This example considers a setup with two BDR nodes that are the CAMO partner of each other. This is the only possible configuration starting with BDR4.

This configuration enables CAMO behavior on both nodes. It's therefore suitable for workload patterns where it is acceptable to write concurrently on more than one node, such as in cases that aren't likely to generate conflicts.

With local mode

If local mode is allowed, there's no single point of failure. When one node fails:

  • The other node can determine the status of all transactions that were disconnected during COMMIT on the failed node.
  • New write transactions are allowed:
    • If the second node also fails, then the outcome of those transactions that were being committed at that time is unknown.

Without local mode

If local mode isn't allowed, then each node requires the other node for committing transactions, that is, each node is a single point of failure. When one node fails:

  • The other node can determine the status of all transactions that were disconnected during COMMIT on the failed node.
  • New write transactions are prevented until the node recovers.

Application use

Overview and requirements

CAMO relies on a retry loop and specific error handling on the client side. There are three aspects to it:

  • The result of a transaction's COMMIT needs to be checked and, in case of a temporary error, the client must retry the transaction.
  • Prior to COMMIT, the client must retrieve a global identifier for the transaction, consisting of a node id and a transaction id (both 32-bit integers).
  • If the current server fails while attempting a COMMIT of a transaction, the application must connect to its CAMO partner, retrieve the status of that transaction, and retry depending on the response.

The application must store the global transaction identifier only for the purpose of verifying the transaction status in case of disconnection during COMMIT. In particular, the application doesn't need an additional persistence layer. If the application fails, it needs only the information in the database to restart.
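
The global identifier described above can be held in a small in-memory structure. The following C sketch assumes the client receives both values as decimal strings (for example, from PQparameterStatus, which returns NULL when a parameter isn't reported) and stores them as unsigned 32-bit integers, matching the OID parameter types of bdr.logical_transaction_status. The type camo_txn_id and the function camo_txn_id_parse are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Global transaction identifier: origin node id plus transaction id. */
typedef struct {
    uint32_t node_id;
    uint32_t xid;
} camo_txn_id;

/* Parses a non-negative decimal string into a 32-bit value. */
static int parse_u32(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    /* Reject NULL, empty strings, signs, and leading whitespace. */
    if (s == NULL || !isdigit((unsigned char) s[0]))
        return -1;
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX)
        return -1;
    *out = (uint32_t) v;
    return 0;
}

/*
 * Fills *out from the two parameter strings.
 * Returns 0 on success, -1 if either string is missing or malformed.
 * On failure, *out is left unchanged.
 */
int camo_txn_id_parse(const char *node_id_str, const char *xid_str,
                      camo_txn_id *out)
{
    camo_txn_id tmp;

    assert(out != NULL);
    if (parse_u32(node_id_str, &tmp.node_id) != 0 ||
        parse_u32(xid_str, &tmp.xid) != 0)
        return -1;
    *out = tmp;
    return 0;
}
```

A failed parse before COMMIT is a signal that CAMO isn't active for the transaction, so the client can't rely on a later status query.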

Adding a CAMO pair

The function bdr.add_camo_pair() configures an existing pair of BDR nodes to work as a symmetric CAMO pair.

The require_raft option controls how and when to switch to local mode in case synchronous_replication_availability is set to async, allowing such a switch in general.

Synopsis

bdr.add_camo_pair(node_group text, left_node text, right_node text, require_raft boolean)

Note

The names left and right have no special meaning.

Note

Since BDR version 4.0, only symmetric CAMO configurations are supported, that is, both nodes of the pair act as a CAMO partner for each other.

Changing the configuration of a CAMO pair

The function bdr.alter_camo_pair() allows you to toggle the require_raft flag. You can't currently change the nodes of a pairing. You must instead use bdr.remove_camo_pair followed by bdr.add_camo_pair.

Synopsis

bdr.alter_camo_pair(node_group text, left_node text, right_node text, require_raft boolean)

Removing a CAMO pair

The function bdr.remove_camo_pair() removes a CAMO pairing of two nodes and disallows future use of CAMO transactions by bdr.enable_camo on those two nodes.

Synopsis

bdr.remove_camo_pair(node_group text, left_node text, right_node text)

Note

The names left and right have no special meaning.

CAMO partner connection status

The function bdr.is_camo_partner_connected allows checking the connection status of a CAMO partner node configured in pair mode. There currently is no equivalent for CAMO used with Eager Replication.

Synopsis

bdr.is_camo_partner_connected()

Return value

A Boolean value indicating whether the CAMO partner is currently connected to a WAL sender process on the local node and therefore can receive transactional data and send back confirmations.

CAMO partner readiness

The function bdr.is_camo_partner_ready allows checking the readiness status of a CAMO partner node configured in pair mode. Underneath, this triggers the switch to and from local mode.

Synopsis

bdr.is_camo_partner_ready()

Return value

A Boolean value indicating whether the CAMO partner can reasonably be expected to confirm transactions originating from the local node in a timely manner (before bdr.global_commit_timeout expires).

Note

This function queries the past or current state. A positive return value doesn't indicate whether the CAMO partner can confirm future transactions.

Fetch the CAMO partner

This function shows the local node's CAMO partner (configured by pair mode).

bdr.get_configured_camo_partner()

Wait for consumption of the apply queue from the CAMO partner

The function bdr.wait_for_camo_partner_queue is a wrapper of bdr.wait_for_apply_queue defaulting to query the CAMO partner node. It yields an error if the local node isn't part of a CAMO pair.

Synopsis

bdr.wait_for_camo_partner_queue()

Transaction status between CAMO nodes

This function enables a wait for CAMO transactions to be fully resolved.

bdr.camo_transactions_resolved()

Transaction status query function

To check the status of a transaction that was being committed when the node failed, the application must use this function:

bdr.logical_transaction_status(node_id OID, xid OID, require_camo_partner boolean)

With CAMO used in pair mode, use this function only on a node that's part of a CAMO pair. Along with Eager replication, you can use it on all nodes.

In both cases, you must call the function within 15 minutes after the commit was issued. The CAMO partner must regularly purge such meta-information and therefore can't provide correct answers for older transactions.
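
A client can track this 15-minute limit itself and stop querying once it has passed. The following C sketch assumes the client records the wall-clock time at which it sent COMMIT; the function name camo_status_window_open is hypothetical. Client and server clocks can differ, so treat the result as an approximation and query as early as possible.

```c
#include <assert.h>
#include <time.h>

/* Limit stated in this section: 15 minutes after the commit was issued. */
#define CAMO_STATUS_WINDOW_SECONDS (15.0 * 60.0)

/*
 * Returns 1 if a status query at time now still falls inside the
 * 15-minute window that started at commit_issued_at, 0 otherwise.
 * A now that lies before commit_issued_at is treated as invalid.
 */
int camo_status_window_open(time_t commit_issued_at, time_t now)
{
    double elapsed = difftime(now, commit_issued_at);

    assert(CAMO_STATUS_WINDOW_SECONDS > 0.0);
    return elapsed >= 0.0 && elapsed < CAMO_STATUS_WINDOW_SECONDS;
}
```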

Before querying the status of a transaction, this function waits for the receive queue to be consumed and fully applied. This prevents early negative answers for transactions that were received but not yet applied.

Despite its name, it's not always a read-only operation. If the status is unknown, the CAMO partner decides whether to commit or abort the transaction, storing that decision locally to ensure consistency going forward.

The client must not call this function before attempting to commit on the origin. Otherwise the transaction might be forced to roll back.

Synopsis

bdr.logical_transaction_status(node_id OID, xid OID, require_camo_partner boolean DEFAULT true)

Parameters

  • node_id - The node id of the BDR node the transaction originates from, usually retrieved by the client before COMMIT from the PQ parameter bdr.local_node_id.
  • xid - The transaction id on the origin node, usually retrieved by the client before COMMIT from the PQ parameter transaction_id. This requires enable_camo to be set to on, remote_write, remote_commit_async, or remote_commit_flush. See Commit At Most Once settings.
  • require_camo_partner - Defaults to true and enables configuration checks. Set to false to disable these checks and query the status of a transaction that was protected by Eager All-Node Replication.
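
With the three parameters above, the status query can be assembled on the client. The following C sketch formats the statement from numeric ids, so no quoting of user input is involved. The function name camo_status_query is hypothetical; in a real libpq client, a parameterized call is an equally valid choice.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes a call to bdr.logical_transaction_status into buf.
 * Pass require_camo_partner = 0 for transactions protected by
 * Eager All-Node Replication, as described in the parameter list.
 * Returns 0 on success, -1 if buf is too small.
 */
int camo_status_query(uint32_t node_id, uint32_t xid,
                      int require_camo_partner,
                      char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen,
                 "SELECT bdr.logical_transaction_status(%" PRIu32
                 ", %" PRIu32 ", %s)",
                 node_id, xid,
                 require_camo_partner ? "true" : "false");
    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```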

Return value

The function returns one of these results:

  • 'committed'::TEXT - The transaction was committed, is visible on both nodes of the CAMO pair, and will eventually be replicated to all other BDR nodes. No need for the client to retry it.

  • 'aborted'::TEXT - The transaction was aborted and will not be replicated to any other BDR node. The client needs to either retry it or escalate the failure to commit the transaction.

  • 'in progress'::TEXT - The transaction is still in progress on this local node and wasn't committed or aborted yet. The transaction might be in the COMMIT phase, waiting for the CAMO partner to confirm or deny the commit. The recommended client reaction is to disconnect from the origin node and reconnect to the CAMO partner to query that instead. With a load balancer or proxy in between, where the client lacks control over which node gets queried, the client can only poll repeatedly until the status switches to either 'committed' or 'aborted'.

    For Eager All-Node Replication, peer nodes yield this result for transactions that aren't yet committed or aborted. This means that even transactions not yet replicated (or not even started on the origin node) might yield an in progress result on a peer BDR node in this case. However, the client must not query the transaction status prior to attempting to commit on the origin.

  • 'unknown'::TEXT - The transaction specified is unknown, either because it's in the future, not replicated to that specific node yet, or too far in the past. The status of such a transaction is not yet or no longer known. This return value is a sign of improper use by the client.

The client must be prepared to retry the function call on error.
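
The four return values above map naturally onto a small decision table in the client. The following C sketch is one way to express it. The enum camo_action and the function camo_action_for_status are hypothetical names, and treating any unexpected string like 'unknown' is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* What the client does next, derived from the status text. */
typedef enum {
    CAMO_ACTION_DONE,     /* 'committed': don't retry */
    CAMO_ACTION_RETRY,    /* 'aborted': retry or escalate */
    CAMO_ACTION_POLL,     /* 'in progress': ask the partner or poll again */
    CAMO_ACTION_ESCALATE  /* 'unknown' or anything unexpected */
} camo_action;

/* Maps the text returned by bdr.logical_transaction_status to an action. */
camo_action camo_action_for_status(const char *status)
{
    if (status == NULL)
        return CAMO_ACTION_ESCALATE;
    if (strcmp(status, "committed") == 0)
        return CAMO_ACTION_DONE;
    if (strcmp(status, "aborted") == 0)
        return CAMO_ACTION_RETRY;
    if (strcmp(status, "in progress") == 0)
        return CAMO_ACTION_POLL;
    return CAMO_ACTION_ESCALATE;
}
```

Keeping 'unknown' separate from 'aborted' matters: retrying after 'unknown' could commit the data twice, which is exactly what CAMO is meant to prevent.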

Connection pools and proxies

The effect of connection pools and proxies needs to be considered when designing a CAMO cluster. A proxy may freely distribute transactions to all nodes in the commit group (i.e. to both nodes of a CAMO pair or to all BDR nodes in case of Eager All-Node Replication).

Care needs to be taken to ensure that the application fetches the proper node id: when using session pooling, the client remains connected to the same node, so the node id remains constant for the lifetime of the client session. However, with finer-grained transaction pooling, the client needs to fetch the node id for every transaction (as in the example given below).

A client that is not directly connected to the BDR nodes might not even notice a failover or switchover, but can always use the bdr.local_node_id parameter to determine which node it is currently connected to. In the crucial situation of a disconnect during COMMIT, the proxy must properly forward that disconnect as an error to the client applying the CAMO protocol.

For CAMO in remote_write mode, a proxy that potentially switches between the CAMO pairs must use the bdr.wait_for_camo_partner_queue function to prevent stale reads.

HARP is the only proxy that supports all of the above requirements. PgBouncer and HAProxy can work with CAMO, but do not support CAMO's remote_write mode.

Example

The following example demonstrates what a retry loop of a CAMO-aware client application should look like in C-like pseudo-code. It expects two DSNs origin_dsn and partner_dsn providing connection information. These usually are the same DSNs as used for the initial call to bdr.create_node, and can be looked up in bdr.node_summary, column interface_connstr.

    PGconn *conn = PQconnectdb(origin_dsn);

    loop {
        // start a transaction
        PQexec(conn, "BEGIN");

        // apply transactional changes
        PQexec(conn, "INSERT INTO ...");
        ...

        // store a globally unique transaction identifier
        node_id = PQparameterStatus(conn, "bdr.local_node_id");
        xid = PQparameterStatus(conn, "transaction_id");

        // attempt to commit and keep the result for inspection
        res = PQexec(conn, "COMMIT");
        if (PQresultStatus(res) == PGRES_COMMAND_OK)
            return SUCCESS;
        else if (PQstatus(conn) == CONNECTION_BAD)
        {
            // Re-connect to the partner
            conn = PQconnectdb(partner_dsn);
            // Check if successfully reconnected
            if (!connectionEstablished())
                panic();

            // Check the attempted transaction's status
            sql = "SELECT bdr.logical_transaction_status($node_id, $xid)";
            txn_status = PQexec(conn, sql);
            if (txn_status == "committed")
                return SUCCESS;
            else
                continue; // to retry the transaction on the partner
        }
        else
        {
            // The connection is intact, but the transaction failed for some
            // other reason. Differentiate between permanent and temporary
            // errors.
            if (isPermanentError())
                return FAILURE;
            else
            {
                // Determine an appropriate delay to back-off to account for
                // temporary failures due to congestion, so as to decrease
                // the overall load put on the servers.
                sleep(increasing_retry_delay);
                continue;
            }
        }
    }

This example needs to be extended with proper logic for connecting, including retries and error handling. If using a load balancer (e.g. PgBouncer), re-connecting can be implemented by simply using PQreset. Ensure that the load balancer only ever redirects a client to a CAMO partner and not any other BDR node.

In practice, an upper limit of retries is recommended. Depending on the actions performed in the transaction, other temporary errors may be possible and need to be handled by retrying the transaction depending on the error code, similarly to the best practices on deadlocks or on serialization failures while in SERIALIZABLE isolation mode.
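
The increasing retry delay and the upper limit of retries can be combined in one helper. The following C sketch uses capped exponential backoff, which is one common choice and not something this document mandates. The function name and all three constants are hypothetical and need tuning for a real workload.

```c
#include <assert.h>

#define CAMO_MAX_RETRIES   8u     /* hypothetical retry budget */
#define CAMO_BASE_DELAY_MS 100ul  /* hypothetical first delay */
#define CAMO_MAX_DELAY_MS  5000ul /* hypothetical cap per delay */

/*
 * Returns the delay in milliseconds to sleep before retry number
 * attempt (0-based), doubling each time up to CAMO_MAX_DELAY_MS.
 * Returns -1 once the retry budget is exhausted.
 */
long camo_retry_delay_ms(unsigned attempt)
{
    unsigned long delay = CAMO_BASE_DELAY_MS;

    assert(CAMO_BASE_DELAY_MS <= CAMO_MAX_DELAY_MS);
    if (attempt >= CAMO_MAX_RETRIES)
        return -1;
    /* Stop doubling as soon as the cap is reached to avoid overflow. */
    for (unsigned i = 0; i < attempt && delay < CAMO_MAX_DELAY_MS; i++)
        delay *= 2;
    if (delay > CAMO_MAX_DELAY_MS)
        delay = CAMO_MAX_DELAY_MS;
    return (long) delay;
}
```

In the retry loop, a return value of -1 is the point at which the client gives up and reports a failure instead of sleeping again.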

Interaction with DDL and global locks

Transactions protected by CAMO can contain DDL operations. However, DDL uses global locks, which already provide some synchronization among nodes. See DDL locking details for more information.

Combining CAMO with DDL imposes a higher latency and also increases the chance of global deadlocks. We therefore recommend using a relatively low bdr.global_lock_timeout, which aborts the DDL and therefore resolves a deadlock in a reasonable amount of time.

Nontransactional DDL

The following DDL operations aren't allowed in a transaction block and therefore can't benefit from CAMO protection. For these, CAMO is automatically disabled internally:

  • all concurrent index operations (CREATE, DROP, and REINDEX)
  • REINDEX DATABASE, REINDEX SCHEMA, and REINDEX SYSTEM
  • VACUUM
  • CLUSTER without any parameter
  • ALTER TABLE DETACH PARTITION CONCURRENTLY
  • ALTER TYPE [enum] ADD VALUE
  • ALTER SYSTEM
  • CREATE and DROP DATABASE
  • CREATE and DROP TABLESPACE
  • ALTER DATABASE [db] TABLESPACE

CAMO limitations

  • CAMO is designed to query the results of a recently failed COMMIT on the origin node, so in case of disconnection, code the application to immediately request the transaction status from the CAMO partner. Have as little delay as possible after the failure before requesting the status. Applications must not rely on CAMO decisions being stored for longer than 15 minutes.

  • If the application forgets the global identifier assigned, for example as a result of a restart, there's no easy way to recover it. Therefore, we recommend that applications wait for outstanding transactions to end before shutting down.

  • For the client to apply proper checks, a transaction protected by CAMO can't be a single statement with implicit transaction control. You also can't use CAMO with a transaction-controlling procedure or in a DO block that tries to start or end transactions.

  • CAMO resolves commit status but doesn't yet resolve pending notifications on commit. CAMO and Eager replication options don't allow the NOTIFY SQL command or the pg_notify() function. They also don't allow LISTEN or UNLISTEN.

  • When replaying changes, CAMO transactions may detect conflicts just the same as other transactions. If timestamp conflict detection is used, the CAMO transaction uses the timestamp of the prepare on the origin node, which is before the transaction becomes visible on the origin node itself.

  • CAMO is not currently compatible with transaction streaming. Make sure to disable transaction streaming when planning to use CAMO. You can configure this globally or in the BDR node group. See Transaction Streaming Configuration.

Performance implications

CAMO extends the Postgres replication protocol by adding a message roundtrip at commit. Applications have a higher commit latency than with asynchronous replication, mostly determined by the roundtrip time between involved nodes. Increasing the number of concurrent sessions can help to increase parallelism to obtain reasonable transaction throughput.

The CAMO partner confirming transactions must store transaction states. Compared to non-CAMO operation, this might require an additional seek for each transaction applied from the origin.

Client application testing

Proper use of CAMO on the client side isn't trivial. We strongly recommend testing the application behavior with the BDR cluster against failure scenarios such as node crashes or network outages.

CAMO versus group commit

CAMO doesn't currently work with group commit.
