<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Ny Hasinavalona Randriantsarafara]]></title><description><![CDATA[Ny Hasinavalona Randriantsarafara]]></description><link>https://blog.nyhasinavalona.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1770939686209/67823c21-b32e-49c1-a268-37ded5ffca43.png</url><title>Ny Hasinavalona Randriantsarafara</title><link>https://blog.nyhasinavalona.com</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 22 Apr 2026 18:03:30 GMT</lastBuildDate><atom:link href="https://blog.nyhasinavalona.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[DynamoDB to Aurora PostgreSQL: Migrating 43M Records with a Resumable CLI Tool]]></title><description><![CDATA[Migrated 43 million records from DynamoDB to Aurora PostgreSQL using a resumable CLI migration tool built in Node.js (TypeScript).

Final runtime: 2h15m

Rows inserted: 12.2M

Errors: 0


Idempotent design, file-level checkpointing, and predictable r...]]></description><link>https://blog.nyhasinavalona.com/dynamodb-to-aurora-postgresql-migrating-43m-records-with-a-resumable-cli-tool</link><guid isPermaLink="true">https://blog.nyhasinavalona.com/dynamodb-to-aurora-postgresql-migrating-43m-records-with-a-resumable-cli-tool</guid><category><![CDATA[DynamoDB]]></category><category><![CDATA[PostgreSQL]]></category><category><![CDATA[AWS]]></category><category><![CDATA[System Design]]></category><category><![CDATA[backend]]></category><category><![CDATA[idempotency]]></category><dc:creator><![CDATA[Ny Hasinavalona Randriantsarafara]]></dc:creator><pubDate>Mon, 16 Feb 2026 15:30:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771193449996/ee7b584f-f387-4e83-be68-2faa6ddfa26b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Migrated <strong>43 million records</strong> from DynamoDB to Aurora PostgreSQL using a resumable CLI migration tool built in Node.js (TypeScript).</p>
<ul>
<li><p><strong>Final runtime:</strong> 2h15m</p>
</li>
<li><p><strong>Rows inserted:</strong> 12.2M</p>
</li>
<li><p><strong>Errors:</strong> 0</p>
</li>
</ul>
<p>Idempotent design, file-level checkpointing, and predictable restart behavior made the system reliable under failure.</p>
<p>No scaling miracle. Just calm engineering.</p>
<hr />
<h2 id="heading-who-this-is-for">Who This Is For</h2>
<p>This guide is for engineers who need to migrate large datasets between databases and want a system that handles failure gracefully. You'll find it useful if you've dealt with partial writes, crashed processes, or the anxiety of restarting a migration at 3am.</p>
<hr />
<h2 id="heading-context-large-scale-dynamodb-to-aurora-postgresql-migration">Context: Large-Scale DynamoDB to Aurora PostgreSQL Migration</h2>
<p>I've had to run this kind of migration more than once.</p>
<p>Data moves. Architectures evolve. What was "good enough" two years ago often isn't anymore.</p>
<p>Eventually someone says: <em>"We need to migrate this."</em></p>
<p>I don't decide that part. But I'm usually the one responsible for making sure it finishes, doesn't duplicate data, and doesn't explode at 3am.</p>
<h3 id="heading-migration-context">Migration Context</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Property</th><th>Value</th></tr>
</thead>
<tbody>
<tr>
<td>Source</td><td>DynamoDB</td></tr>
<tr>
<td>Target</td><td>Aurora PostgreSQL</td></tr>
<tr>
<td>Table size</td><td>43,378,852 items</td></tr>
<tr>
<td>Data size</td><td>~8.3 GB</td></tr>
<tr>
<td>Export format</td><td>DynamoDB Export to S3</td></tr>
</tbody>
</table>
</div><p>The export produced 16 gzipped JSON files, each between 150–166 MB.</p>
<p>On paper: Read → Transform → Insert.</p>
<p>In practice, data migration is a reliability problem.</p>
<hr />
<h2 id="heading-dynamodb-to-postgresql-migration-challenges">DynamoDB to PostgreSQL Migration Challenges</h2>
<p>Long-running migrations fail in many ways:</p>
<ul>
<li><p>Process crash</p>
</li>
<li><p>Partial writes</p>
</li>
<li><p>Network interruption</p>
</li>
<li><p>DB timeouts</p>
</li>
<li><p>Infrastructure restart</p>
</li>
<li><p>Malformed rows</p>
</li>
</ul>
<p>The real questions become:</p>
<ul>
<li><p>What was written?</p>
</li>
<li><p>What wasn't?</p>
</li>
<li><p>Can I restart safely?</p>
</li>
<li><p>Will I duplicate data?</p>
</li>
</ul>
<p>After dealing with this uncertainty multiple times, I stopped treating migration as a scripting problem. <strong>It's a failure-handling problem.</strong></p>
<hr />
<h2 id="heading-why-aws-lambda-failed-for-long-running-data-migration">Why AWS Lambda Failed for Long-Running Data Migration</h2>
<p>My first attempt used AWS Lambda. It looked clean: stateless execution, easy scaling, no servers, chunk-based processing.</p>
<p>But migrations run long. Very long.</p>
<h3 id="heading-problem-30-minute-timeout">Problem: 30-Minute Timeout</h3>
<p>Lambda's timeout forced me to build logic to detect near-timeout, persist checkpoint, and reinvoke itself.</p>
<p>Technically possible. Operationally? Messy.</p>
<h3 id="heading-operational-issues">Operational Issues</h3>
<ul>
<li><p>Logs fragmented across executions</p>
</li>
<li><p>Difficult to understand global progress</p>
</li>
<li><p>Hard to correlate runs</p>
</li>
<li><p>CloudWatch streams everywhere</p>
</li>
<li><p>Monitoring became painful</p>
</li>
<li><p>More orchestration logic than migration logic</p>
</li>
</ul>
<p>Instead of solving data migration, I was building a mini scheduler. That's clever in the wrong direction.</p>
<p>So I killed it.</p>
<hr />
<h2 id="heading-final-approach-a-boring-controlled-cli-migration-tool">Final Approach: A Boring, Controlled CLI Migration Tool</h2>
<p>I switched to a long-running CLI process in Node.js (TypeScript).</p>
<p>No reinvocation logic. No distributed coordination. No orchestration complexity.</p>
<p>Just: Stream file → Parse → Transform → Batch → Upsert → Log → Repeat.</p>
<p>Sometimes boring is better.</p>
<hr />
<h2 id="heading-designing-an-idempotent-migration-system">Designing an Idempotent Migration System</h2>
<p>Before writing the tool, I defined one rule:</p>
<blockquote>
<p><strong>If I restart the migration from the beginning, the final database state must be identical.</strong></p>
</blockquote>
<p>This simplified everything. Instead of trying to eliminate duplicate work entirely, I made duplicate work harmless.</p>
<h3 id="heading-implementation">Implementation</h3>
<p>The core of idempotency relies on PostgreSQL's <a target="_blank" href="https://www.postgresql.org/docs/current/sql-insert.html#SQL-ON-CONFLICT"><code>ON CONFLICT</code> clause</a>:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">INSERT</span> <span class="hljs-keyword">INTO</span> target_table (<span class="hljs-keyword">id</span>, <span class="hljs-keyword">data</span>, updated_at)
<span class="hljs-keyword">VALUES</span> ($<span class="hljs-number">1</span>, $<span class="hljs-number">2</span>, $<span class="hljs-number">3</span>)
<span class="hljs-keyword">ON</span> CONFLICT (<span class="hljs-keyword">id</span>) <span class="hljs-keyword">DO</span> <span class="hljs-keyword">UPDATE</span> <span class="hljs-keyword">SET</span>
  <span class="hljs-keyword">data</span> = EXCLUDED.data,
  updated_at = EXCLUDED.updated_at;
</code></pre>
<p>Key elements:</p>
<ul>
<li><p><strong>Deterministic primary key</strong> derived from source ID</p>
</li>
<li><p><strong>PostgreSQL</strong> <code>ON CONFLICT DO UPDATE</code> for upsert-based writes</p>
</li>
<li><p><strong>No external state</strong> required to determine if a row was already processed</p>
</li>
</ul>
<p>That means:</p>
<ul>
<li><p>Retrying a batch is safe</p>
</li>
<li><p>Restarting the whole process is safe</p>
</li>
<li><p>Reprocessing a file is safe</p>
</li>
</ul>
<p>Even if something runs twice, the final state is consistent. That removed most of the stress.</p>
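<p>In the CLI tool, each batch goes through a single multi-row version of this statement. Below is a minimal sketch of how such a statement can be assembled for the <code>pg</code> driver; as in the snippet above, <code>target_table</code> and its columns are illustrative, not the actual schema:</p>

```typescript
type Row = { id: string; data: string; updatedAt: string };

// Build one parameterized multi-row upsert for a batch, ready to pass to
// pg's client.query(text, values). Three parameters per row.
function buildUpsertQuery(rows: Row[]): { text: string; values: string[] } {
  const placeholders = rows
    .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
    .join(", ");
  const text =
    `INSERT INTO target_table (id, data, updated_at) VALUES ${placeholders} ` +
    "ON CONFLICT (id) DO UPDATE SET data = EXCLUDED.data, updated_at = EXCLUDED.updated_at";
  const values = rows.flatMap((r) => [r.id, r.data, r.updatedAt]);
  return { text, values };
}
```

<p>One statement per batch means one round trip per 500 rows, and because the statement itself is an upsert, replaying it is harmless.</p>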
<hr />
<h2 id="heading-checkpoint-strategy-file-level-instead-of-row-level">Checkpoint Strategy: File-Level Instead of Row-Level</h2>
<p>I considered row-level checkpointing. It adds offset storage, metadata persistence, edge-case complexity, and more reasoning under failure.</p>
<p>Instead, I chose <strong>file-level checkpointing</strong>.</p>
<p>Each S3 file is:</p>
<ol>
<li><p>Processed entirely</p>
</li>
<li><p>Marked complete only after success</p>
</li>
</ol>
<p>If the process crashes halfway through a file, the file is reprocessed from the beginning. Because the system is idempotent, this is safe.</p>
<p>Less precise. Much simpler. Much more predictable.</p>
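<p>The bookkeeping for file-level checkpoints is deliberately small. Here is a sketch of the idea, assuming a local JSON file holding the list of completed export files; the path and record shape are hypothetical, not the tool's actual format:</p>

```typescript
import * as fs from "node:fs";

// Hypothetical checkpoint: a JSON array of completed S3 file keys,
// rewritten after each file finishes successfully.
function loadCompleted(checkpointPath: string): Set<string> {
  if (!fs.existsSync(checkpointPath)) return new Set();
  return new Set(JSON.parse(fs.readFileSync(checkpointPath, "utf8")) as string[]);
}

// Mark a file complete only after every batch in it has been upserted.
function markComplete(checkpointPath: string, completed: Set<string>, fileKey: string): void {
  completed.add(fileKey);
  fs.writeFileSync(checkpointPath, JSON.stringify([...completed]));
}

// On restart, only files not yet marked complete are processed, in order.
function pendingFiles(allFiles: string[], completed: Set<string>): string[] {
  return allFiles.filter((f) => !completed.has(f));
}
```

<p>A crash between batches loses nothing: the half-finished file is simply absent from the checkpoint, so the next run reprocesses it from the start and the upserts absorb the overlap.</p>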
<hr />
<h2 id="heading-handling-bad-records-during-bulk-inserts">Handling Bad Records During Bulk Inserts</h2>
<p>Large DynamoDB exports always contain problematic records.</p>
<p>If a batch insert fails:</p>
<ol>
<li><p>Split batch in half</p>
</li>
<li><p>Retry each half</p>
</li>
<li><p>Keep splitting</p>
</li>
<li><p>Isolate the bad record</p>
</li>
<li><p>Skip it</p>
</li>
</ol>
<p>Everything else continues. <strong>One bad row should not stop hours of work.</strong></p>
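<p>The isolation step is a plain binary search over the batch. A sketch of the recursion, written synchronously for clarity (the real database write is async, and <code>tryInsert</code> here stands in for it):</p>

```typescript
// Try to insert a batch; on failure, split it in half and recurse until the
// failing record is isolated, then skip it and keep going. Returns the
// records that had to be skipped.
function insertIsolating<T>(
  batch: T[],
  tryInsert: (rows: T[]) => void,
  skipped: T[] = [],
): T[] {
  if (batch.length === 0) return skipped;
  try {
    tryInsert(batch);
  } catch {
    if (batch.length === 1) {
      skipped.push(batch[0]); // one bad record: log it, skip it, move on
    } else {
      const mid = Math.floor(batch.length / 2);
      insertIsolating(batch.slice(0, mid), tryInsert, skipped);
      insertIsolating(batch.slice(mid), tryInsert, skipped);
    }
  }
  return skipped;
}
```

<p>For a batch of 500 with one bad row, isolation costs on the order of log2(500) ≈ 9 extra insert attempts, which is negligible over a multi-hour run.</p>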
<hr />
<h2 id="heading-streaming-strategy-for-large-json-exports">Streaming Strategy for Large JSON Exports</h2>
<p>Each <code>.json.gz</code> file is:</p>
<ul>
<li><p>Streamed directly from S3</p>
</li>
<li><p>Decompressed on the fly</p>
</li>
<li><p>Parsed line by line</p>
</li>
<li><p>Inserted in batches of 500</p>
</li>
</ul>
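<p>Put together, the per-file loop looks roughly like the sketch below. The S3 client wiring is omitted, and <code>onBatch</code> stands in for the transform-and-upsert step:</p>

```typescript
import { createGunzip } from "node:zlib";
import * as readline from "node:readline";

// Stream a gzipped newline-delimited JSON export: decompress on the fly,
// parse line by line, and hand off fixed-size batches without ever
// buffering the whole file in memory. Returns the number of items read.
async function processExportStream(
  body: NodeJS.ReadableStream, // e.g. the Body stream of an S3 GetObject response
  onBatch: (items: unknown[]) => Promise<void>,
  batchSize = 500,
): Promise<number> {
  const lines = readline.createInterface({
    input: body.pipe(createGunzip()),
    crlfDelay: Infinity,
  });
  let batch: unknown[] = [];
  let total = 0;
  for await (const line of lines) {
    if (!line.trim()) continue;
    batch.push(JSON.parse(line)); // one DynamoDB item per line
    total += 1;
    if (batch.length >= batchSize) {
      await onBatch(batch);
      batch = [];
    }
  }
  if (batch.length > 0) await onBatch(batch); // flush the final partial batch
  return total;
}
```

<p>Because <code>onBatch</code> is awaited inside the loop, the reader does not race ahead unboundedly while the current batch is being upserted, which keeps memory flat across 150+ MB files.</p>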
<h3 id="heading-why-not-parallelize">Why Not Parallelize?</h3>
<p>I intentionally used a single worker with sequential file processing.</p>
<p>Yes, parallel execution would increase throughput. But it complicates restart logic, pressure control, and Aurora resource stability.</p>
<p><strong>Predictability &gt; maximum throughput.</strong></p>
<hr />
<h2 id="heading-real-migration-execution-metrics">Real Migration Execution Metrics</h2>
<p>Here's one full production run:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Metric</th><th>Value</th></tr>
</thead>
<tbody>
<tr>
<td>Duration</td><td>2h15m</td></tr>
<tr>
<td>Batch size</td><td>500</td></tr>
<tr>
<td>Files processed</td><td>16 / 16</td></tr>
<tr>
<td>Items scanned</td><td>37,783,238</td></tr>
<tr>
<td>Rows inserted</td><td>12,205,544</td></tr>
<tr>
<td>Skipped</td><td>0</td></tr>
<tr>
<td>Errors</td><td>0</td></tr>
</tbody>
</table>
</div><h3 id="heading-why-37m-scanned-but-only-122m-inserted">Why 37M Scanned but Only 12.2M Inserted?</h3>
<p>Not every DynamoDB item mapped directly to a PostgreSQL row. The source data used a single-table design where multiple entity types (users, sessions, events) lived in one table with composite keys.</p>
<p>For the target relational model:</p>
<ul>
<li><p>Some rows were grouped (e.g., denormalized event sequences → single aggregate row)</p>
</li>
<li><p>Some entity types weren't needed in PostgreSQL</p>
</li>
<li><p>Only target-valid entities were inserted</p>
</li>
</ul>
<p>Everything else was ignored by design. No errors. No manual cleanup afterward.</p>
<hr />
<h2 id="heading-aurora-postgresql-performance-during-migration">Aurora PostgreSQL Performance During Migration</h2>
<p>I monitored <a target="_blank" href="https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Monitoring.html">Aurora CloudWatch metrics</a> throughout the run.</p>
<h3 id="heading-observed-behavior">Observed Behavior</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Metric</th><th>Value</th></tr>
</thead>
<tbody>
<tr>
<td>Commit throughput</td><td>~3–6 commits/sec</td></tr>
<tr>
<td>Commit latency</td><td>~200–300µs</td></tr>
<tr>
<td>Replica lag</td><td>~15–25 seconds during peak bursts</td></tr>
<tr>
<td>CPU credit usage</td><td>Stable</td></tr>
<tr>
<td>Buffer cache hit ratio</td><td>~100%</td></tr>
</tbody>
</table>
</div><p>Nothing exploded. No replication issues. No WAL pressure disaster.</p>
<p>Batch size of 500 turned out to be a good balance: high throughput with controlled database pressure.</p>
<hr />
<h2 id="heading-migration-architecture-structure">Migration Architecture Structure</h2>
<pre><code class="lang-plaintext">┌─────────────────────────────────────────────────────────┐
│                     Core Engine                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐      │
│  │ Checkpoints │  │   Retries   │  │Orchestration│      │
│  └─────────────┘  └─────────────┘  └─────────────┘      │
└─────────────────────────────────────────────────────────┘
          │                                    │
          ▼                                    ▼
┌─────────────────────┐          ┌─────────────────────────┐
│   Source Dialect    │          │     Target Dialect      │
│  ─────────────────  │          │  ─────────────────────  │
│  DynamoDB S3 Export │          │  PostgreSQL Abstraction │
│  Reader             │          │  Layer                  │
└─────────────────────┘          └─────────────────────────┘
          │                                    │
          └──────────────┬─────────────────────┘
                         ▼
              ┌─────────────────────┐
              │    Profile Layer    │
              │  ─────────────────  │
              │  Domain-specific    │
              │  transformation     │
              │  logic              │
              └─────────────────────┘
</code></pre>
<p>Clear separation of concerns:</p>
<ul>
<li><p><strong>Core Engine:</strong> Checkpoints, retries, orchestration logic</p>
</li>
<li><p><strong>Source Dialect:</strong> DynamoDB S3 export reader</p>
</li>
<li><p><strong>Target Dialect:</strong> PostgreSQL abstraction layer</p>
</li>
<li><p><strong>Profile Layer:</strong> Domain-specific transformation logic</p>
</li>
</ul>
<p>The engine knows nothing about business logic. The business logic knows nothing about checkpoint mechanics. That makes reuse possible.</p>
<hr />
<h2 id="heading-failure-scenarios-considered">Failure Scenarios Considered</h2>
<p>The system was designed assuming failure is normal:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>Failure</th><th>Recovery Behavior</th></tr>
</thead>
<tbody>
<tr>
<td>Process crash</td><td>Restart from last incomplete file</td></tr>
<tr>
<td>EC2 reboot</td><td>Same as process crash</td></tr>
<tr>
<td>Network interruption</td><td>Batch retry with exponential backoff</td></tr>
<tr>
<td>Database timeout</td><td>Connection pool refresh + retry</td></tr>
<tr>
<td>Malformed row</td><td>Isolate via binary search, skip, continue</td></tr>
<tr>
<td>Manual stop</td><td>Resume from checkpoint</td></tr>
</tbody>
</table>
</div><p>In every case, restart behavior is deterministic. That clarity mattered more than optimization.</p>
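<p>The batch retry above is standard exponential backoff. A minimal sketch, with illustrative base delay and cap rather than the tool's tuned values:</p>

```typescript
// Delay before retry attempt n (0-based): base * 2^n, capped.
function backoffDelayMs(attempt: number, baseMs = 200, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run an async operation, retrying with backoff; rethrow once attempts run out.
async function withRetry<T>(op: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

<p>Because every write is an upsert, a retry that repeats a partially applied batch is safe, which is what makes this simple loop sufficient.</p>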
<hr />
<h2 id="heading-tradeoffs-accepted">Tradeoffs Accepted</h2>
<ul>
<li><p>File-level checkpoints instead of row-level</p>
</li>
<li><p>Single worker instead of parallel</p>
</li>
<li><p>CLI instead of serverless</p>
</li>
<li><p>Continue-on-error for isolated bad records</p>
</li>
</ul>
<p>Each decision favors <strong>recovery safety over cleverness</strong>.</p>
<p>Migration systems are not where I try to be fancy. They are where I try to sleep well.</p>
<hr />
<h2 id="heading-what-i-would-improve">What I Would Improve</h2>
<ul>
<li><p>Failure-injection testing (e.g., <a target="_blank" href="https://netflix.github.io/chaosmonkey/">Chaos Monkey</a> style)</p>
</li>
<li><p>Better metrics export (Prometheus/Grafana integration)</p>
</li>
<li><p>Possibly safe parallelism with bounded concurrency</p>
</li>
</ul>
<p>The core engine does not need rewriting.</p>
<hr />
<h2 id="heading-key-lessons-from-large-scale-data-migration">Key Lessons from Large-Scale Data Migration</h2>
<p>The hardest part of migration is not moving data. It's designing a system that behaves calmly when things go wrong.</p>
<p>This reinforced:</p>
<ul>
<li><p><strong>Idempotency &gt; micro-optimization</strong></p>
</li>
<li><p><strong>Deterministic recovery &gt; precision checkpointing</strong></p>
</li>
<li><p><strong>Simple boundaries &gt; clever orchestration</strong></p>
</li>
<li><p><strong>Observability &gt; elegance</strong></p>
</li>
</ul>
<p>It's not flashy. But it's something I would trust to run overnight.</p>
<p>And that matters more.</p>
<hr />
<h2 id="heading-source-code">Source Code</h2>
<p>If you're in a similar situation and want to adapt it:</p>
<p><a target="_blank" href="https://github.com/ny-randriantsarafara/dataflux"><strong>github.com/ny-randriantsarafara/dataflux</strong></a></p>
<hr />
<p><em>Have questions or built something similar? I'd love to hear about your approach.</em></p>
]]></content:encoded></item><item><title><![CDATA[A brief introduction into the art of software craftsmanship]]></title><description><![CDATA[I had a conversation today where someone asked me, "What is software craftsmanship?" The discussion went far, inspiring me to write this post and share my thoughts on the topic.
To explain it in simple terms, I often use an analogy. Imagine creatin...]]></description><link>https://blog.nyhasinavalona.com/a-brief-introduction-into-the-art-of-software-craftsmanship</link><guid isPermaLink="true">https://blog.nyhasinavalona.com/a-brief-introduction-into-the-art-of-software-craftsmanship</guid><category><![CDATA[software craftsmanship]]></category><category><![CDATA[introduction]]></category><category><![CDATA[Code Quality]]></category><category><![CDATA[software development]]></category><category><![CDATA[software culture]]></category><dc:creator><![CDATA[Ny Hasinavalona Randriantsarafara]]></dc:creator><pubDate>Mon, 13 Nov 2023 09:00:12 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/BfrQnKBulYQ/upload/8dfae30091eef4b211d933f25adf4ead.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I had a conversation today where someone asked me, "What is software craftsmanship?" The discussion went far, inspiring me to write this post and share my thoughts on the topic.</p>
<p>To explain it in simple terms, I often use an analogy. Imagine creating a delicious croissant that your computer would relish, exclaiming, "C'est trop bon !" ("It's so good!").</p>
<p>You might be wondering, "Is he simply saying that software craftsmanship is all about writing good code?" Yes, that's what I mean. Then you might ask, "What exactly constitutes good code?" What makes code "good" is akin to what makes a croissant delicious: it depends on the baker and on individual taste, and the same goes for writing good code.</p>
<p>At its core, <strong>software craftsmanship is an approach to software development that emphasizes the developer's skills, dedication, and professionalism</strong>. It's not merely about writing code, but about crafting high-quality code. <strong>Software craftsmanship is a commitment to quality</strong>. It's about paying attention to details and delivering software that's reliable, maintainable, and efficient. It's about promoting a culture that encourages continuous learning, collaboration, and pride in delivering valuable code.</p>
<p><strong>Quality in code means it's reliable, maintainable, and efficient</strong>. <strong>The key ingredient for me is simplicity</strong>. <strong>It's about writing clean code, which is easy to understand and well-organized</strong>. This involves adhering to fundamental principles, like the single responsibility principle, keeping functions short and straightforward, and using meaningful variable names instead of vague ones like "value" or "a". It also involves crafting modular, independent, and reusable components that make the code easier to maintain and enhance. <strong>Having a comprehensive set of automated tests is also crucial to identify issues early in the software development process</strong>.</p>
<p><strong>But software craftsmanship isn't just about the code. It's about the process and the culture</strong>. <strong>It's about embracing continuous learning, as technology is always changing and evolving</strong>. It's about fostering collaboration, as the best solutions often come from collective effort and shared learning. It's about taking pride in your work because you're not just writing code, you're creating something of value.</p>
<p><strong>So, when someone asks me, "What is software craftsmanship?", I tell them it's a commitment to quality</strong>. It's about investing the time and effort to do things right, to learn and grow, and to create the best product possible. It's about being a professional in every sense of the word.</p>
<p>That is my point of view on software craftsmanship. What's yours? I hope you enjoyed this perspective; it marks the beginning of a long series exploring the art and nuances of software craftsmanship.</p>
]]></content:encoded></item></channel></rss>