Building a WordPress Multilingual Plugin with Claude Code

I installed Polylang on our production blog and immediately regretted it. So I built our own multilingual WordPress plugin in two days with Claude Code.

Some context: we wanted to start publishing Portuguese versions of our English posts on the Codeminer42 engineering blog. Polylang seemed like the obvious choice, but the moment I activated it, permalinks broke. Posts returned 404. The homepage died. I had to go to Settings > Permalinks and click “Save Changes” (without changing anything) just to get the site working again. Every reactivation, the same thing. On top of that, AI translation is locked behind their premium tier, and asking authors to manually translate 2000-word technical articles wasn’t going to scale.

So I opened Claude Code and started building.

The Plan

I described what I wanted:

A WordPress plugin for multilingual posts
AI-powered translation (Ollama, Anthropic, Gemini)
No custom database tables – use WordPress taxonomy and post meta
Language switcher widget with flags
URL prefixes (/pt-br/my-post/)
hreflang SEO tags
REST API with language filtering
Support for both Gutenberg and Classic Editor

Claude Code broke it into five phases. We started writing tests first.

The Development Environment

Before anything, a note on tooling. WordPress plugin development has a testing story that most people don't know about. wp-env is an official tool that spins up a full WordPress installation inside Docker – with a separate test instance that has PHPUnit and the WordPress test framework pre-configured.

Running tests means executing PHPUnit inside the test container:

npm run wp-env start
npm run test:unit

This matters because our test suite uses WordPress factories ($this->factory()->post->create()), real taxonomy queries, real WP_Query objects. Not mocks – actual WordPress internals running against a test database. That's how we caught bugs that mocks would have missed.

Day 1: The Architecture

The first commit landed the core architecture. The key design decision was no custom tables. Everything uses WordPress primitives:

A cm_language taxonomy to tag posts with their language
_cm_translation_group post meta with a shared UUID to link translations together
A provider pattern for AI translation backends

14 classes, 1,581 lines of PHP, zero custom database tables.
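To make the provider pattern concrete, here is a minimal sketch in plain PHP. The interface, class, and function names are hypothetical and are not the plugin's actual classes. The fake provider stands in for a real backend that would call Ollama, Anthropic, or Gemini over HTTP.

```php
<?php
// Hypothetical contract: one method, so backends stay interchangeable.
interface Sketch_Translation_Provider {
    public function translate(string $text, string $source, string $target): string;
}

// Stand-in backend for local runs; a real provider would call an HTTP API here.
final class Sketch_Fake_Provider implements Sketch_Translation_Provider {
    public function translate(string $text, string $source, string $target): string {
        return sprintf('[%s>%s] %s', $source, $target, $text);
    }
}

// The caller depends only on the interface, never on a concrete backend.
// $post is a plain array here (assumption); the plugin works with real posts.
function sketch_translate_post(Sketch_Translation_Provider $provider, array $post, string $target): array {
    return [
        'title'   => $provider->translate($post['title'], $post['lang'], $target),
        'content' => $provider->translate($post['content'], $post['lang'], $target),
        'lang'    => $target,
        'status'  => 'draft',
    ];
}
```

Swapping backends then means passing a different object to sketch_translate_post(), with no change to the calling code.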

Why no custom tables? WordPress plugins that create their own tables are a pain to maintain. Migrations, uninstall cleanup, multisite compatibility – it’s a whole category of bugs. The taxonomy system already does what we need: tag posts with metadata and query by it. The tradeoff is performance at scale – a tax_query with NOT EXISTS is slower than a direct table lookup. For a blog with ~500 posts, that’s irrelevant. For a site with 100,000 posts, we’d need to revisit. We chose simplicity over premature optimization.

About the AI providers: Ollama, Anthropic, and Gemini. Those are the ones I use. OpenAI would work too, but I don’t use it day-to-day, so it didn’t make the first release. The plugin will be released publicly after this test period on our blog, and I’ll add more providers then.

Ollama deserves its own paragraph. You can run models locally for free during development, but Ollama Cloud also gives you access to models like Minimax and Kimi K2.5 that won’t fit on a laptop. The Qwen series is great for translation too. So you get local dev without API costs and access to bigger models when you need them.

The second commit added 90 PHPUnit tests. Tests first, implementation second. Claude Code followed TDD – write the test, watch it fail, implement the minimum code to make it pass. Here's what a typical test looks like:

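public function test_default_language_query_includes_unlabeled_posts() {
    $english_post = $this->factory()->post->create();
    $this->assign_language($english_post, 'en');

    // No language assigned: stands in for a post that predates the plugin.
    $unlabeled_post = $this->factory()->post->create();

    // Build the query without running it: new WP_Query($args) executes
    // immediately, so a filter applied afterwards would change nothing.
    $query = new WP_Query();
    $query->set('post_type', 'post');
    $this->multilingual->filter_by_language($query);
    $posts = $query->get_posts();

    $this->assertContains($english_post, wp_list_pluck($posts, 'ID'));
    $this->assertContains($unlabeled_post, wp_list_pluck($posts, 'ID'));
}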

This test verifies that the default language query uses an OR condition: show posts that are tagged English OR have no language at all. That second condition is critical – when you install the plugin on a blog with 450 existing posts, none of them have a language assigned yet. Without NOT EXISTS, they'd all disappear from the homepage.
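The OR condition described above could be expressed as a tax_query array like the one below. This is a sketch, not the plugin's code: the function name is hypothetical, and matching by slug is an assumption. The cm_language taxonomy name comes from the post, and NOT EXISTS is a standard tax_query operator.

```php
<?php
// Hypothetical helper: builds the tax_query clause for one language.
function sketch_language_tax_query(string $slug, bool $is_default): array {
    $match = [
        'taxonomy' => 'cm_language',
        'field'    => 'slug',
        'terms'    => [$slug],
    ];

    // Non-default languages only show posts explicitly tagged with them.
    if (!$is_default) {
        return $match;
    }

    // Default language: tagged posts OR posts with no language term at all.
    return [
        'relation' => 'OR',
        $match,
        [
            'taxonomy' => 'cm_language',
            'operator' => 'NOT EXISTS',
        ],
    ];
}
```

Calling sketch_language_tax_query('en', true) yields the two-branch group, while sketch_language_tax_query('pt-br', false) yields a single clause that excludes unlabeled posts.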

By the end of day 1, we had:

Language configuration with default language support
Translation linking (bidirectional, UUID-based)
All three AI translation providers with HTTP mocking in tests
URL rewriting with language prefixes
Language switcher (widget, shortcode, Gutenberg block)
hreflang tag generation
REST API with ?lang= filtering
90 passing tests

Day 2: The Classic Editor and Real-World Bugs

Day 2 was about making it work in the real world. Our blog uses the Classic Editor with Markdown (via WP Githuber MD), not Gutenberg. So I asked Claude Code to add a Classic Editor meta box with the same functionality as the Gutenberg sidebar panel.

Both UIs call the same REST API endpoints. Same result, different presentation. The meta box uses vanilla JS; the sidebar uses React. The REST API doesn't care:

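// Both interfaces make identical API calls
fetch('/wp-json/cm-multilingual/v1/translate', {
  method: 'POST',
  // Without this header WordPress does not parse the body as JSON.
  // Logged-in requests also need an X-WP-Nonce header.
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    post_id: currentPostId,
    target_language: 'pt-br'
  })
})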

Then came the real-world bugs. When I deployed to production, I discovered the query filter was using the language code (pt_BR) instead of the taxonomy slug (pt-br) in the taxonomy query. This code:

// Wrong - pt_BR doesn't match the taxonomy term slug
$tax_query[] = [
    'taxonomy' => 'cm_language',
    'field' => 'slug',
    'terms' => [$language_code], // pt_BR
];

Should have been:

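// Right - look up the term by its slug, then filter by term ID
$language_term = get_term_by('slug', $language_slug, 'cm_language');
if ($language_term) { // get_term_by() returns false when no term matches
    $tax_query[] = [
        'taxonomy' => 'cm_language',
        'field' => 'term_id',
        'terms' => [$language_term->term_id],
    ];
}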

A subtle bug that only showed up when the language code and the slug differ, as with pt_BR (slug: pt-br). English worked fine because the code and slug were both en.
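One way to keep codes and slugs from drifting apart is to derive the slug from the code in a single place and test it with a region-qualified code. The helper below is a sketch: the function name is hypothetical, and the rule (lowercase, underscores to hyphens) is an assumption that matches the pt_BR / pt-br pair above, not a confirmed description of the plugin.

```php
<?php
// Hypothetical helper: derive a taxonomy slug from a locale-style code.
// Assumed rule: trim, replace underscores with hyphens, lowercase.
function sketch_language_code_to_slug(string $code): string {
    return strtolower(str_replace('_', '-', trim($code)));
}

// Codes worth covering in a test: one where code and slug match, one where they differ.
$sketch_cases = [
    'en'    => sketch_language_code_to_slug('en'),
    'pt_BR' => sketch_language_code_to_slug('pt_BR'),
];
```

A test suite that only uses codes like en cannot tell the code and the slug apart, which is why the second case matters.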

The biggest lesson came later, during production setup. Our homepage uses WordPress Query Loop blocks with a category filter (taxQuery: {"category": [21]}). My plugin was replacing the entire tax_query with the language filter – wiping the block's category filter. Dev Weekly posts started appearing on the homepage because the category exclusion was gone.

The fix: merge into the existing tax_query, never replace it:

// Get existing tax_query and merge our filter in
$existing_tax_query = $query->get('tax_query') ?: [];
$existing_tax_query[] = $language_filter;
$query->set('tax_query', $existing_tax_query);

This is the kind of bug you only find on a real site with real plugins and real block configurations. Tests against a clean WordPress install would never catch it.
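Appending works when the existing tax_query uses the default AND relation. If another plugin or block had set 'relation' => 'OR', an appended language clause would be ORed with the rest. A more defensive variant, sketched below with a hypothetical function name, nests the existing query as its own group; nested tax_query groups are supported by WordPress. This is an alternative to the fix above, not what the plugin ships.

```php
<?php
// Hypothetical helper: combine an existing tax_query with a language clause.
function sketch_merge_tax_query(array $existing, array $language_clause): array {
    if ($existing === []) {
        return [$language_clause];
    }

    // Nest the existing query as one group so its own 'relation'
    // (for example OR) keeps applying only to its own clauses.
    return [
        'relation' => 'AND',
        $existing,
        $language_clause,
    ];
}
```

Because it only manipulates arrays, this kind of function can be tested against fixtures copied from a real Query Loop block without booting WordPress.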

The Test Suite

After all the production fixes, the final numbers:

Metric	Count
Test files	12
Test methods	156
Assertions	323
Line coverage	57% (903/1,581 lines)
Method coverage	56% (69/124 methods)

Coverage by class:

Class	Lines	Methods
CM_Translations	97.6%	81.8%
CM_Provider_Ollama	98.0%	83.3%
CM_Provider_Anthropic	97.7%	75.0%
CM_Provider_Gemini	97.8%	75.0%
CM_Translation_Provider	96.6%	66.7%
CM_REST_API	89.8%	66.7%
CM_Query	80.8%	62.5%
CM_Languages	100%	100%
CM_Language	100%	100%
CM_Switcher	70.0%	88.9%
CM_Links	45.6%	42.9%
CM_Meta_Box	57.8%	16.7%
CM_Admin	10.2%	16.7%
CM_Multilingual	6.5%	25.0%

The core logic (translations, providers, query filtering, REST API) is well covered. The admin UI and orchestrator classes are lower because they depend heavily on WordPress admin hooks that are hard to unit test. The providers are at ~98% because HTTP calls are mocked via WordPress's pre_http_request filter – we test the request construction and response parsing without hitting real APIs.

The AI Translation Flow

Here's how translation works:

Author clicks "Translate with AI" on an English post
Plugin acquires a lock (transient-based, 5 min TTL) to prevent duplicate translations
Plugin sends the title and content separately to the configured AI provider
Provider returns the translations
Plugin creates a new draft post with the translated content
Posts are linked bidirectionally via _cm_translation_group UUID
Categories, tags, co-authors (Co-Authors Plus), and featured image are copied
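The lock in step 2 can be pictured with the in-memory sketch below. It is a stand-in, not the plugin's code: in WordPress the store would be get_transient() and set_transient(), and the class name, key format, and injected clock are assumptions made so the example runs on its own.

```php
<?php
// In-memory stand-in for a transient-backed lock (hypothetical names).
final class Sketch_Translation_Lock {
    /** @var array<string,int> lock key => expiry timestamp */
    private $expires = [];

    /** @var int */
    private $ttl_seconds;

    public function __construct(int $ttl_seconds = 300) {
        $this->ttl_seconds = $ttl_seconds;
    }

    private function key(int $post_id, string $target): string {
        return "cm_translate_{$post_id}_{$target}";
    }

    // Returns true if the lock was acquired, false if a live lock already exists.
    public function acquire(int $post_id, string $target, int $now): bool {
        $key = $this->key($post_id, $target);
        if (isset($this->expires[$key]) && $this->expires[$key] > $now) {
            return false;
        }
        $this->expires[$key] = $now + $this->ttl_seconds;
        return true;
    }

    // Called when the translation finishes or fails, so retries are not blocked.
    public function release(int $post_id, string $target): void {
        unset($this->expires[$this->key($post_id, $target)]);
    }
}
```

The TTL acts as a safety net: if a request dies before releasing the lock, the next attempt succeeds once five minutes have passed. Note that a check followed by a set is not atomic, so two truly simultaneous requests could still both get through.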

The prompt matters more than the provider you pick. Here’s what we actually send:

You are a professional translator. Translate the following text
from {source} to {target}. Preserve all HTML tags, markdown
formatting, and code blocks exactly as they are. Return only the
translated text, no explanations or notes.

CRITICAL: The translation must read like it was originally written
in {target} by a native speaker, not like a translated text.

That’s the short version. The full prompt also has formatting rules: don’t add em dashes that weren’t in the original, don’t use curly quotes, and preserve heading capitalization. And a list of language rules that reads like an anti-AI-writing checklist: don’t inflate importance (“is” stays “is”, not “serves as” or “stands as”), don’t add significance phrases (“testament to”, “underscores”), don’t use promotional language (“vibrant”, “groundbreaking”), don’t force the rule of three, don’t cycle synonyms.

The formatting rules are there because we’re a tech blog. Our posts have code blocks, inline backticks, Markdown headers, HTML embeds. Without those instructions, AI providers will “helpfully” convert your Markdown to HTML, merge your code blocks into paragraphs, or strip backticks from inline code. One bad translation breaks every code example in a 2000-word post.

The language rules solve a different problem. AI translation tends to produce text that reads like… AI translation. It inflates, hedges, and polishes. A post that says “this is broken” becomes “this represents a significant challenge.” Those rules keep the translated version from drifting away from what the author actually wrote.
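Filling the {source} and {target} placeholders in the short prompt quoted earlier is a plain string substitution. A minimal sketch, with a hypothetical constant and function name, and with the language names passed in as human-readable strings (an assumption):

```php
<?php
// Hypothetical template holding the short version of the prompt.
const SKETCH_PROMPT_TEMPLATE = <<<'TXT'
You are a professional translator. Translate the following text
from {source} to {target}. Preserve all HTML tags, markdown
formatting, and code blocks exactly as they are. Return only the
translated text, no explanations or notes.

CRITICAL: The translation must read like it was originally written
in {target} by a native speaker, not like a translated text.
TXT;

function sketch_build_prompt(string $source, string $target): string {
    // strtr() replaces every occurrence of each placeholder in one pass.
    return strtr(SKETCH_PROMPT_TEMPLATE, [
        '{source}' => $source,
        '{target}' => $target,
    ]);
}
```

Keeping the template in one constant means every provider sends the same instructions, so differences in output come from the model and not from prompt drift.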

What Claude Got Wrong

I promised an unfiltered story, so here's what didn't work.

The taxonomy slug bug. Claude used the language code (pt_BR) in taxonomy queries instead of the slug (pt-br). The tests passed because they used simple codes where the code and slug were the same. Only broke with Portuguese. This is the kind of edge case that only shows up in the real world, and it's the reason you deploy to a real site early.

Replacing tax_query instead of merging. Claude's first implementation of filter_by_language() did $query->set('tax_query', $new_array) – replacing the entire tax_query. On a clean wp-env install, this works fine. On production, where the homepage uses Query Loop blocks with category filters, it wiped the block's configuration. Dev Weekly posts started appearing everywhere. The fix was a one-line change (merge instead of replace), but finding it required inspecting the block editor's internal state on the live site.

OG meta tags. Claude's first approach was to output our own OG tags in wp_head. But AIOSEO outputs its tags afterward, and social crawlers use the last occurrence. Duplicate tags, wrong data. The second approach used AIOSEO's filter hooks, which worked for title and description but not for og:image (no filter exists for that). The third approach – output buffering on template_redirect to find-and-replace the image URL in the raw HTML – finally worked. Three iterations to get it right.
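The find-and-replace step of that third approach can be sketched as a pure function over the rendered HTML. The function name is hypothetical, and the pattern assumes the tag is written with property before content, which may not hold for every SEO plugin's output.

```php
<?php
// Hypothetical helper: swap the og:image URL in already-rendered HTML.
// Assumes <meta property="og:image" content="..."> attribute order.
function sketch_replace_og_image(string $html, string $new_url): string {
    $pattern = '/(<meta\s+property=["\']og:image["\']\s+content=["\'])[^"\']*(["\'])/i';
    $safe_url = htmlspecialchars($new_url, ENT_QUOTES);

    // A callback avoids "$1"-style sequences in the URL being read as backreferences.
    $result = preg_replace_callback(
        $pattern,
        static function (array $m) use ($safe_url): string {
            return $m[1] . $safe_url . $m[2];
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the original HTML.
    return $result ?? $html;
}
```

In WordPress, a function like this would presumably run inside the output buffer callback registered on template_redirect, receiving the full page HTML as its first argument.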

The pattern is consistent: Claude writes correct code for the test environment, but the test environment is too clean. Production has caching layers, SEO plugins, Query Loop blocks, cookies, CDN. Every production bug came from the gap between wp-env and the real site.

The Git Log

2 days to build, 1 day to ship to production:

March 10 - Initial commit: Core architecture, 90 tests
March 11 - Classic Editor support, provider abstractions
March 20 - Production deployment: slug fix, tax_query merge, OG tags

March 10-11 was the build. March 20 was the production deployment day: the slug bug, the tax_query merge, and the OG tags. The gap in between was intentional: I waited until I had a real use case (a Portuguese translation to publish) before deploying.

Key Takeaways

Deploy to production early. wp-env is great for TDD but it's too clean. The bugs that matter only show up on a real site with real plugins, real themes, and real caching.

Merge, don't replace. If your plugin touches tax_query, meta_query, or any WordPress query parameter, always get the existing value first and append to it. Other plugins and blocks depend on those parameters.

Test with curl, not just the browser. Social crawlers, search engines, and caching proxies see different HTML than your browser. curl -s URL -H 'User-Agent: facebookexternalhit/1.1' is your friend.

TDD works with AI. Write the test first, describe the expected behavior, let Claude implement. When something breaks in production, write a test that reproduces it before fixing. The test suite is your safety net for the next refactor.

The plugin is running on blog.codeminer42.com/pt-br/ right now. Every new post gets a Portuguese translation. We’re not doing bulk translations yet – the blog team is using it on new posts first, checking if the plugin feels right and if the translations are actually good before we go back and translate the catalog. Once they’re confident, we’ll pick the most recent posts and work backwards.

156 tests, two languages, zero custom tables. Here’s the latest one.

Thanks for reading!

Edy Silva
