โ† Back to Playground

MongoDB vs PostgreSQL

Change Data Capture Deep Dive & Comparative Analysis

🚀 Why This Matters: Real-time Data = Competitive Advantage

Modern applications require real-time responsiveness, but traditional CDC solutions are complex to implement and maintain. PostgreSQL's CDC approaches often require external tools like Debezium, complex WAL configurations, or trigger-based solutions that impact performance. Teams spend weeks setting up reliable CDC infrastructure and months debugging edge cases.

MongoDB's differentiator: Change Streams provide native, real-time change detection with zero external dependencies. What takes PostgreSQL teams weeks to implement with multiple moving parts becomes a 10-line MongoDB script with built-in fault tolerance and automatic scaling.

Real Impact: Teams reduce CDC implementation time from weeks to hours, eliminate operational overhead of external tools, and achieve sub-10ms change detection latency out of the box.

📋 What This Demo Shows


Change Data Capture (CDC) Overview

What is Change Data Capture?

Change Data Capture (CDC) is a software design pattern that identifies and captures changes made to data in a database, then delivers those changes in real-time to downstream consumers like applications, data warehouses, or analytics systems.

📊 Real-time Analytics

Enable real-time business intelligence and analytics by streaming data changes as they occur.

🔄 Data Synchronization

Keep multiple systems in sync by propagating changes across different databases and applications.

📝 Audit Trails

Maintain comprehensive audit logs by capturing all data modifications with timestamps.

⚡ Event-Driven Architecture

Build reactive systems that respond to data changes in real time.

MongoDB Approach

Change Streams: Native, real-time change detection built into the database core. Leverages the oplog (operations log) to provide a unified API for watching changes across collections, databases, or entire clusters (a sample change event follows the list below).

  • Built-in feature since MongoDB 3.6
  • Works with replica sets and sharded clusters
  • Aggregation pipeline integration
  • Resume tokens for fault tolerance
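
Regardless of which collection produced it, every change arrives as a uniformly shaped event document. The sketch below shows the typical fields of an update event (values are illustrative):

{
  _id: { _data: "8263AF..." },                  // resume token
  operationType: "update",
  clusterTime: Timestamp({ t: 1700000000, i: 1 }),
  ns: { db: "myapp", coll: "users" },
  documentKey: { _id: ObjectId("...") },
  updateDescription: {
    updatedFields: { status: "active" },        // fields changed by the update
    removedFields: []                           // fields removed by the update
  }
}
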
PostgreSQL Approach

Multiple Methods: PostgreSQL offers several CDC approaches including logical replication, triggers, LISTEN/NOTIFY, and third-party solutions like Debezium that leverage Write-Ahead Logs (WAL).

  • Logical replication (PostgreSQL 10+)
  • Trigger-based solutions
  • LISTEN/NOTIFY for simple notifications
  • WAL-based tools (Debezium, etc.)

MongoDB Change Streams Deep Dive

🌟 Key Advantages of MongoDB Change Streams

MongoDB's Change Streams provide a native, scalable, and developer-friendly approach to CDC that's deeply integrated with the database architecture.

Basic Change Stream Implementation

// Watch for changes on a collection
const changeStream = db.collection('users').watch();

changeStream.on('change', (change) => {
  console.log('Change detected:', change);

  switch (change.operationType) {
    case 'insert':
      console.log('New document:', change.fullDocument);
      break;
    case 'update':
      console.log('Updated fields:', change.updateDescription);
      break;
    case 'delete':
      console.log('Deleted document ID:', change.documentKey);
      break;
  }
});

# Python implementation
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
db = client.myapp
collection = db.users

# Watch for changes
with collection.watch() as stream:
    for change in stream:
        operation = change['operationType']
        if operation == 'insert':
            print(f"New user: {change['fullDocument']}")
        elif operation == 'update':
            print(f"User updated: {change['updateDescription']}")
        elif operation == 'delete':
            print(f"User deleted: {change['documentKey']}")

🔄 Real-time Updates

Changes are streamed in real time as they occur, with minimal latency.

📋 Operation Types

Captures insert, update, delete, replace, and invalidate operations.
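
Extending the basic listener above: replace events (e.g. from replaceOne) carry the new fullDocument, while an invalidate event (emitted when the watched collection is dropped or renamed) closes the stream. A minimal sketch of handling both:

// Additional operation types on the same stream
changeStream.on('change', (change) => {
  switch (change.operationType) {
    case 'replace':
      // Document was overwritten wholesale (e.g. replaceOne)
      console.log('Replaced document:', change.fullDocument);
      break;
    case 'invalidate':
      // Collection dropped or renamed; the stream closes after this event.
      // Re-open with startAfter and this event's token to keep watching.
      console.log('Stream invalidated; re-establish the watch');
      break;
  }
});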

Advanced Change Stream Features

Resume Tokens & Fault Tolerance
// Resume from a specific point in time
let resumeToken = null;

const watchCollection = () => {
  const pipeline = [
    { $match: { 'fullDocument.status': 'active' } }
  ];
  const options = resumeToken ? { resumeAfter: resumeToken } : {};

  const changeStream = db.collection('users').watch(pipeline, options);

  changeStream.on('change', (change) => {
    resumeToken = change._id; // Store resume token
    handleChange(change);
  });

  changeStream.on('error', (error) => {
    console.error('Change stream error:', error);
    // Implement retry logic
    setTimeout(watchCollection, 5000);
  });
};

Full Document Lookup
// Get full document on updates
const changeStream = db.collection('products').watch([], {
  fullDocument: 'updateLookup'
});

changeStream.on('change', (change) => {
  if (change.operationType === 'update') {
    // change.fullDocument contains the complete updated document.
    // Note: updateLookup fetches the document at lookup time, so it
    // may already reflect writes that happened after this update.
    console.log('Updated product:', change.fullDocument);
  }
});

Reported characteristics: 99.9% availability · <10ms latency · 1M+ ops/sec

Cluster-wide Change Streams

// Watch entire database
const dbChangeStream = db.watch();

// Watch entire cluster (MongoDB 4.0+)
const clusterChangeStream = client.watch();

// Watch multiple collections
const pipeline = [
  { $match: {
      $or: [
        { 'ns.coll': 'users' },
        { 'ns.coll': 'orders' },
        { 'ns.coll': 'products' }
      ]
  } }
];
const multiCollectionStream = db.watch(pipeline);

Sharded Cluster Support

Native Sharding Support: Change streams work seamlessly across sharded clusters, automatically handling shard key distribution and routing. Each shard contributes its changes to a unified stream.

// Sharded cluster change stream
const shardedChangeStream = mongos.watch([
  { $match: { 'fullDocument.region': { $in: ['US', 'EU'] } } }
], { fullDocument: 'updateLookup' });

Advanced Filtering with Aggregation Pipeline

// Complex filtering pipeline
const pipeline = [
  // Match specific operations and conditions
  { $match: {
      $and: [
        { operationType: { $in: ['insert', 'update'] } },
        { 'fullDocument.priority': 'high' },
        { 'fullDocument.department': 'engineering' }
      ]
  } },
  // Project only needed fields
  { $project: {
      _id: 1,
      operationType: 1,
      'fullDocument.name': 1,
      'fullDocument.email': 1,
      'fullDocument.lastModified': 1
  } }
];

const filteredStream = db.collection('employees').watch(pipeline);

Time-based Filtering
// Filter for documents whose timestamp field falls in the last hour
const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);

const timeFilteredStream = db.collection('analytics').watch([
  { $match: {
      'fullDocument.timestamp': { $gte: oneHourAgo },
      'fullDocument.eventType': 'user_action'
  } }
]);

✅ MongoDB Change Streams Advantages

  • Native database feature - no external tools needed
  • Works across replica sets and sharded clusters
  • Aggregation pipeline filtering capabilities
  • Resume tokens for fault tolerance
  • Low latency real-time streaming
  • Automatic handling of cluster topology changes
  • No impact on application performance

โš ๏ธ Considerations

  • Requires replica set (not available on standalone)
  • Limited to MongoDB 3.6+ (full features in 4.0+)
  • Resume window limited by oplog size (see the oplog check after this list)
  • Large result sets may impact performance
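
Because a resume token is only usable while its entry is still in the oplog, it is worth checking the oplog window before relying on long resume gaps. A quick mongosh check (the 16000 MB target size is just an example):

// Show oplog size and the time span it currently covers
rs.printReplicationInfo()

// Grow the oplog if the resume window is too short (size in MB)
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })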

PostgreSQL Change Data Capture Methods

Trigger-based Change Detection

-- Create audit table
CREATE TABLE user_audit (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    operation VARCHAR(10),
    old_data JSONB,
    new_data JSONB,
    changed_at TIMESTAMP DEFAULT NOW(),
    changed_by VARCHAR(100)
);

-- Create trigger function
CREATE OR REPLACE FUNCTION audit_user_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO user_audit (user_id, operation, old_data, changed_by)
        VALUES (OLD.id, 'DELETE', row_to_json(OLD)::jsonb, current_user);
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO user_audit (user_id, operation, old_data, new_data, changed_by)
        VALUES (OLD.id, 'UPDATE', row_to_json(OLD)::jsonb, row_to_json(NEW)::jsonb, current_user);
        RETURN NEW;
    ELSIF TG_OP = 'INSERT' THEN
        INSERT INTO user_audit (user_id, operation, new_data, changed_by)
        VALUES (NEW.id, 'INSERT', row_to_json(NEW)::jsonb, current_user);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Create trigger
CREATE TRIGGER user_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_user_changes();
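
Once the trigger is in place, downstream consumers read the audit table like any other table. For example, to reconstruct one user's change history (user ID 42 is illustrative):

-- Full change history for one user, newest first
SELECT operation, old_data, new_data, changed_at, changed_by
FROM user_audit
WHERE user_id = 42
ORDER BY changed_at DESC;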

✅ Trigger Advantages

  • Simple to implement and understand
  • Automatic execution on data changes
  • Can capture detailed change information
  • Works with any PostgreSQL version

โš ๏ธ Trigger Limitations

  • Performance overhead on writes
  • Synchronous execution blocks transactions
  • Difficult to scale
  • Manual setup for each table
  • Can cause cascading trigger issues

Logical Replication (PostgreSQL 10+)

-- Enable logical replication. In postgresql.conf:
--   wal_level = logical
--   max_replication_slots = 4
--   max_wal_senders = 4

-- Create publication
CREATE PUBLICATION user_changes FOR TABLE users;

-- Create replication slot (for custom consumers; note that
-- CREATE SUBSCRIPTION creates its own slot unless told otherwise)
SELECT pg_create_logical_replication_slot('user_slot', 'pgoutput');

-- Subscribe to changes (on subscriber)
CREATE SUBSCRIPTION user_subscription
    CONNECTION 'host=publisher_host dbname=mydb user=repuser'
    PUBLICATION user_changes;

# Python consumer using psycopg2's logical replication support
import psycopg2
import psycopg2.extras

def consume_logical_changes():
    conn = psycopg2.connect(
        host='localhost',
        database='mydb',
        user='repuser',
        connection_factory=psycopg2.extras.LogicalReplicationConnection
    )
    cur = conn.cursor()

    # Start streaming from the slot created earlier
    cur.start_replication(
        slot_name='user_slot',
        options={
            'publication_names': 'user_changes',
            'proto_version': '1'
        }
    )

    def process_msg(msg):
        # pgoutput payloads are binary and need decoding in a real consumer
        print(f"Change: {msg.payload}")
        # Acknowledge so the server can recycle WAL
        msg.cursor.send_feedback(flush_lsn=msg.data_start)

    cur.consume_stream(process_msg)

✅ Logical Replication Advantages

  • Native PostgreSQL feature
  • Asynchronous processing
  • Row-level filtering possible
  • Good performance characteristics

โš ๏ธ Limitations

  • Requires PostgreSQL 10+
  • Complex setup and configuration
  • Limited to table-level granularity
  • Requires careful slot management (see the monitoring queries after this list)
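
Slot management matters because an abandoned slot pins WAL on disk until it is dropped. The standard catalog views make this easy to monitor:

-- List slots and whether anything is consuming them
SELECT slot_name, plugin, active, restart_lsn
FROM pg_replication_slots;

-- How much WAL each slot is holding back
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Drop an abandoned slot so its WAL can be recycled
SELECT pg_drop_replication_slot('user_slot');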

LISTEN/NOTIFY for Simple Change Notifications

-- Create notification function
CREATE OR REPLACE FUNCTION notify_user_change()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify(
        'user_changes',
        json_build_object(
            'operation', TG_OP,
            'table', TG_TABLE_NAME,
            'data', CASE
                WHEN TG_OP = 'DELETE' THEN row_to_json(OLD)
                ELSE row_to_json(NEW)
            END
        )::text
    );
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

-- Create trigger
CREATE TRIGGER user_notify_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION notify_user_change();

// Node.js listener
const { Client } = require('pg');

const client = new Client({
  connectionString: 'postgresql://user:pass@localhost/mydb'
});

client.connect();
client.query('LISTEN user_changes');

client.on('notification', (msg) => {
  const change = JSON.parse(msg.payload);
  console.log('User change:', change);
  // Process the change
  handleUserChange(change);
});

✅ LISTEN/NOTIFY Advantages

  • Very lightweight
  • Real-time notifications
  • Simple to implement
  • Low overhead

โš ๏ธ Limitations

  • Limited payload size (8KB; see the ID-only pattern after this list)
  • No persistence - lost on disconnect
  • No guaranteed delivery
  • Single database only
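
A common workaround for both the payload limit and the lost-on-disconnect problem is to notify with only the changed row's ID and have the listener re-read the row itself. A sketch of the notifying side (the listener then re-queries users by ID):

-- Send only the changed row's ID; listeners re-query for the details
CREATE OR REPLACE FUNCTION notify_user_change_id()
RETURNS TRIGGER AS $$
BEGIN
    PERFORM pg_notify(
        'user_changes',
        (CASE WHEN TG_OP = 'DELETE' THEN OLD.id ELSE NEW.id END)::text
    );
    RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;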

WAL-based Tools (Debezium, etc.)

# Debezium PostgreSQL connector configuration
name: postgres-connector
config:
  connector.class: io.debezium.connector.postgresql.PostgresConnector
  database.hostname: localhost
  database.port: 5432
  database.user: debezium
  database.password: dbz
  database.dbname: mydb
  database.server.name: postgres-server

  # WAL configuration
  plugin.name: pgoutput
  slot.name: debezium
  publication.name: dbz_publication

  # Table whitelist
  table.include.list: public.users,public.orders

  # Output configuration
  transforms: route
  transforms.route.type: org.apache.kafka.connect.transforms.RegexRouter
  transforms.route.regex: ([^.]+)\.([^.]+)\.([^.]+)
  transforms.route.replacement: $3

-- Setup for Debezium
-- Create publication for Debezium
CREATE PUBLICATION dbz_publication FOR ALL TABLES;

-- Create replication slot
SELECT pg_create_logical_replication_slot('debezium', 'pgoutput');

-- Grant permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO debezium;
GRANT USAGE ON SCHEMA public TO debezium;
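
Each captured row change then arrives on Kafka as a Debezium envelope with before and after row images; the op codes are c (create), u (update), d (delete), and r (snapshot read). An abbreviated update event value (values illustrative):

{
  "before": { "id": 42, "email": "old@example.com" },
  "after":  { "id": 42, "email": "new@example.com" },
  "source": { "connector": "postgresql", "db": "mydb", "table": "users" },
  "op": "u",
  "ts_ms": 1700000000000
}
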
Reported pipeline characteristics: 100K+ events/sec · 99.9% delivery guarantee · <100ms end-to-end latency

✅ WAL-based Advantages

  • High throughput and scalability
  • At-least-once delivery guarantees
  • Schema evolution support
  • Integration with Kafka ecosystem
  • No application code changes needed

โš ๏ธ Limitations

  • Complex setup and configuration
  • Requires additional infrastructure
  • External dependency management
  • Learning curve for operations

Detailed Comparison: MongoDB vs PostgreSQL CDC

MongoDB Change Streams

Architecture

  • Built into database core
  • Uses oplog (operations log)
  • Native aggregation pipeline
  • Distributed across shards

Scalability

  • Horizontal scaling with sharding
  • Cluster-wide change streams
  • Automatic shard management
  • High throughput (1M+ ops/sec)

Developer Experience

  • Single API for all change types
  • Rich filtering with aggregation
  • Resume tokens for fault tolerance
  • Multiple language drivers

PostgreSQL CDC Options

Architecture

  • Multiple approaches available
  • WAL-based or trigger-based
  • External tools often required
  • Primary/standby replication model

Scalability

  • Vertical scaling primarily
  • Read replicas for scaling reads
  • Sharding requires external tools
  • High throughput with tuning

Developer Experience

  • Different APIs for different methods
  • SQL-based filtering
  • Manual fault tolerance setup
  • Mature ecosystem

Feature Comparison Matrix

Feature                | MongoDB Change Streams  | PostgreSQL (Best Option)
Real-time Streaming    | ✅ Native               | ⚠️ With tools
Fault Tolerance        | ✅ Resume tokens        | ⚠️ Manual setup
Filtering Capabilities | ✅ Aggregation pipeline | ✅ SQL WHERE clauses
Horizontal Scaling     | ✅ Native sharding      | ❌ External solutions
Setup Complexity       | ✅ Simple               | ❌ Complex
Performance Impact     | ✅ Minimal              | ⚠️ Varies by method
Multi-database Support | ✅ Cluster-wide         | ❌ Single database
Operational Overhead   | ✅ Low                  | ❌ High

Interactive Change Data Capture Demo

🚀 MongoDB Change Streams Simulator

Experience how MongoDB Change Streams behave in real time as different database operations are simulated.


⚡ Performance Comparison

See how different CDC approaches compare in terms of latency and throughput.


๐Ÿ” What This Demo Shows

This interactive demo simulates the real-world behavior of change data capture systems. In production:

  • MongoDB Change Streams provide sub-10ms latency for most operations
  • Resume tokens allow applications to recover from exactly where they left off
  • Aggregation pipeline filtering reduces network traffic and processing overhead
  • Cluster-wide streams automatically handle sharding and replica set changes

Performance Analysis & Benchmarks

Representative figures: 2.5ms change stream latency · 1.2M operations/second · 99.99% uptime · 5GB max throughput/hour

Latency Comparison

MongoDB Change Streams

  • Insert latency: 1-3ms
  • Update latency: 2-5ms
  • Delete latency: 1-2ms
  • Network overhead: Minimal
  • CPU impact: <5%

PostgreSQL Methods

  • Triggers: 10-50ms (synchronous)
  • Logical replication: 5-20ms
  • LISTEN/NOTIFY: 1-5ms
  • WAL-based tools: 10-100ms
  • CPU impact: 10-30%

Scaling Characteristics

📈 Throughput Scaling

MongoDB: Linear scaling with sharding. Each shard can handle 100K+ ops/sec independently.

PostgreSQL: Vertical scaling primarily. Triggers can become bottleneck at high write volumes.

🔄 Connection Scaling

MongoDB: Thousands of concurrent change streams with minimal overhead.

PostgreSQL: Limited by connection pool and replication slot management.

⚡ Memory Usage

MongoDB: Constant memory usage regardless of change volume.

PostgreSQL: Memory usage varies significantly by CDC method chosen.

🛡️ Reliability

MongoDB: Built-in resume capabilities, automatic failover handling.

PostgreSQL: Reliability depends on external tooling and configuration.

🎯 Performance Summary

MongoDB Change Streams consistently deliver superior performance characteristics due to their native integration with the database engine. The oplog-based approach provides predictable latency and scales horizontally without additional complexity.

Recommendations & Best Practices

๐Ÿ† When to Choose MongoDB Change Streams

MongoDB Change Streams are the optimal choice for modern applications requiring real-time data synchronization, event-driven architectures, and scalable change data capture.

🚀 Ideal Use Cases

  • Real-time analytics dashboards
  • Event-driven microservices
  • Data synchronization across systems
  • Audit logging and compliance
  • Cache invalidation strategies (see the sketch after this list)
  • Live collaborative applications
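
To illustrate the cache-invalidation item above: a change stream can evict stale entries the moment the backing document changes. A minimal sketch using an in-memory Map as a stand-in for a real cache (the key scheme is an assumption for the example):

// Evict cached entries whenever the source document changes
const cache = new Map();

const invalidationStream = db.collection('products').watch();

invalidationStream.on('change', (change) => {
  const key = change.documentKey._id.toString();

  // Any write makes the cached copy stale
  if (['update', 'replace', 'delete'].includes(change.operationType)) {
    cache.delete(key);
  }
});
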
⚙️ Implementation Best Practices

  • Use aggregation pipelines for filtering
  • Implement proper error handling
  • Store resume tokens for fault tolerance (persistence sketch after this list)
  • Monitor oplog size and retention
  • Use appropriate batch sizes
  • Test failover scenarios
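
For the resume-token item above, a common pattern is to persist the latest token in a small metadata collection so a restarted process resumes exactly where it stopped. A minimal sketch (collection and field names are illustrative, and handleChange is assumed application logic):

// Persist the latest resume token so restarts pick up where they left off
async function watchWithPersistence(db) {
  const tokens = db.collection('stream_tokens');

  // Load the last saved token, if any
  const saved = await tokens.findOne({ _id: 'users-stream' });
  const options = saved ? { resumeAfter: saved.token } : {};

  const stream = db.collection('users').watch([], options);

  stream.on('change', async (change) => {
    await handleChange(change); // application logic (assumed)

    // Save the token only after the change is fully processed
    await tokens.updateOne(
      { _id: 'users-stream' },
      { $set: { token: change._id } },
      { upsert: true }
    );
  });
}
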
📊 Monitoring & Operations

  • Track change stream lag metrics (see the lag sketch after this list)
  • Monitor memory and CPU usage
  • Set up alerting for stream failures
  • Regular resume token cleanup
  • Capacity planning for growth
  • Performance baseline establishment
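
One way to approximate the lag metric above: on MongoDB 6.0+, change events carry a wallTime Date recording when the operation was applied, so receipt time minus wallTime gives a rough end-to-end lag (assuming reasonably synchronized clocks; the 1000 ms threshold is arbitrary):

// Approximate change stream lag from the event's wallTime (MongoDB 6.0+)
changeStream.on('change', (change) => {
  const lagMs = Date.now() - change.wallTime.getTime();
  if (lagMs > 1000) {
    console.warn(`Change stream lag high: ${lagMs}ms`);
  }
});
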
🔧 Advanced Configurations

  • Custom aggregation stages
  • Multi-collection watching
  • Cluster-wide change detection
  • Time-based filtering
  • Full document lookup optimization
  • Shard key considerations

Architecture Decision Framework

Choose MongoDB When:

  • Building new applications
  • Requiring horizontal scalability
  • Need sub-10ms change latency
  • Want minimal operational overhead
  • Implementing event-driven architecture
  • Working with document/JSON data
  • Need cluster-wide change detection

Consider PostgreSQL When:

  • Existing PostgreSQL infrastructure
  • Strong ACID requirements
  • Complex relational queries needed
  • Team expertise in SQL/PostgreSQL
  • Regulatory compliance requirements
  • Budget constraints (open source)
  • Simple change detection needs

🎯 Key Takeaways

  1. MongoDB Change Streams provide the most comprehensive, scalable, and developer-friendly CDC solution available today
  2. Native integration eliminates the complexity and operational overhead of external CDC tools
  3. Real-time performance with sub-10ms latency makes MongoDB ideal for modern reactive applications
  4. Horizontal scaling with sharding ensures the solution grows with your business needs
  5. Resume tokens and fault tolerance features provide enterprise-grade reliability
// Production-ready Change Stream implementation
const mongoose = require('mongoose');

class ChangeStreamManager {
  constructor(uri, options = {}) {
    this.uri = uri;
    this.options = {
      resumeAfter: null,
      fullDocument: 'updateLookup',
      maxAwaitTimeMS: 1000,
      ...options
    };
    this.changeStream = null;
    this.isConnected = false;
  }

  async connect() {
    try {
      await mongoose.connect(this.uri);
      this.isConnected = true;
      console.log('Connected to MongoDB');
    } catch (error) {
      console.error('Connection failed:', error);
      throw error;
    }
  }

  startWatching(collection, pipeline = [], handler) {
    if (!this.isConnected) {
      throw new Error('Not connected to database');
    }

    const options = { ...this.options };
    // Don't pass a null token to watch(); only resume once we have one
    if (!options.resumeAfter) delete options.resumeAfter;

    this.changeStream = mongoose.connection.db
      .collection(collection)
      .watch(pipeline, options);

    this.changeStream.on('change', (change) => {
      // Store resume token for fault tolerance
      this.options.resumeAfter = change._id;
      // Handle the change
      handler(change);
    });

    this.changeStream.on('error', (error) => {
      console.error('Change stream error:', error);
      // Implement retry logic
      setTimeout(() => {
        this.startWatching(collection, pipeline, handler);
      }, 5000);
    });
  }

  stop() {
    if (this.changeStream) {
      this.changeStream.close();
      this.changeStream = null;
    }
  }
}

// Usage example (inside an async context)
const manager = new ChangeStreamManager('mongodb://localhost:27017/myapp');
await manager.connect();

manager.startWatching('users', [
  { $match: { 'fullDocument.active': true } }
], (change) => {
  console.log('User change detected:', change.operationType);
  // Implement your business logic here
});