AI Translation Notes | Lucida Blog

This did not begin in any especially formal way. I just wanted to make Lampson’s Hints for Computer System Design easier to read.

Over the last few days, I translated 11 classic computer science papers and essays into Chinese with Codex.

The word “translation” here does not mean pasting English into a box and copying out Chinese.

What I ended up with was a set of Chinese Markdown files that can live inside the blog’s /zh/paper/ section for long-term maintenance: page metadata, paragraph structure, code blocks, tables, citation anchors, English glosses on the first appearance of technical terms, plus Hugo build checks and browser verification.

The 11 pieces, in completion order, were:

Hints for Computer System Design, Butler W. Lampson
The Rise of “Worse Is Better”, Richard P. Gabriel
No Silver Bullet: Essence and Accidents of Software Engineering, Frederick P. Brooks, Jr.
Computer Programming as an Art, Donald E. Knuth
Reflections on Trusting Trust, Ken Thompson
Programming as Theory Building, Peter Naur
The Bitter Lesson, Rich Sutton
The Unreasonable Effectiveness of Data, Alon Halevy, Peter Norvig, Fernando Pereira
You and Your Research, Richard W. Hamming
An Axiomatic Basis for Computer Programming, C. A. R. Hoare
The UNIX Time-Sharing System, Dennis M. Ritchie, Ken Thompson

Looking back now, “AI can translate” is the least interesting part. That statement is too crude. It only says a model can turn English sentences into Chinese sentences.

What is actually worth writing down is that every paper had a different kind of trouble. Codex did not crush them all with one prompt. It handled them step by step in the filesystem: ingest the input, clean the structure, translate, repair terminology, repair citations, rebuild tables, run checks, and preview inside the blog.

[Illustration: AI paper translation factory]

I will go through them in order.

1. Hints for Computer System Design: from one experiment to a whole pipeline

The first paper was Butler Lampson’s Hints for Computer System Design.

It compresses Lampson’s system-building experience into a set of maxims about interfaces, completeness, performance, and fault tolerance. Many of the lines look like common sense until you actually try to build systems around them.

It became the prototype for the whole batch.

The initial problem was simple. I wanted to read this classic paper on system design, but the PDF was tiring. Twenty-seven pages, roughly 13,000 English words, with a lot of systems terminology, references, and one big diagram that threads the whole essay together.

The hard part was not “the English is difficult.” The hard part was the PDF.

PDF stores characters and positions on a page, not document structure. Once extracted, you get headers, footers, broken lines, page numbers, paragraphs split apart, and diagrams that are easy to mangle. If you translate that directly, all you have done is wrap bad structure in smoother language.

The first extracted Markdown was basically unreadable. I ended up making Codex stop and clean the source first: delete the headers and footers, stitch the paragraphs back together, then translate.
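The cleanup pass can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script Codex wrote: the function name `clean_extracted_pages` is hypothetical, it assumes the extractor already yields one string per page, and it treats any line repeated on three or more pages as a running header or footer.

```python
import re
from collections import Counter


def clean_extracted_pages(pages: list[str], min_repeat: int = 3) -> str:
    """Drop running headers/footers and bare page numbers, then rejoin paragraphs.

    A line counts as a header or footer if the same stripped text appears on at
    least `min_repeat` pages (a heuristic to tune per document).
    """
    counts = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})
    repeated = {ln for ln, n in counts.items() if n >= min_repeat}

    kept = []
    for page in pages:
        for ln in page.splitlines():
            s = ln.strip()
            if s in repeated or re.fullmatch(r"\d{1,4}", s):
                continue  # running header/footer or page number
            kept.append(s)  # blank lines survive as "" and mark paragraph breaks

    paragraphs, current = [], []
    for s in kept + [""]:  # trailing "" flushes the last paragraph
        if s:
            current.append(s)
        elif current:
            paragraphs.append(current)
            current = []

    def join(lines: list[str]) -> str:
        text = lines[0]
        for nxt in lines[1:]:
            if text.endswith("-") and nxt[:1].islower():
                text = text[:-1] + nxt  # undo end-of-line hyphenation
            else:
                text += " " + nxt
        return text

    return "\n\n".join(join(p) for p in paragraphs)
```

Because the page texts are concatenated before paragraphs are rebuilt, a paragraph split by a page break is stitched back together. The price is that a genuine compound broken at a line end loses its hyphen, which is one reason the output still needs a read-through.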

Figure 1 was essentially a text matrix, so it got rebuilt as a Markdown table instead of being left as an image. The references stayed in English, and the in-text citations were turned into navigable anchors.
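Turning a text matrix into a Markdown table is mostly mechanical once the column boundaries are known. A minimal sketch with a hypothetical function name, assuming columns are separated by runs of two or more spaces and that the first row is the header:

```python
import re


def fixed_width_to_markdown(block: str) -> str:
    """Convert a whitespace-aligned text matrix into a Markdown table.

    Assumes columns are separated by two or more spaces and the first row is the
    header. Cells containing double spaces or "|" need manual fixing.
    """
    rows = [re.split(r"\s{2,}", ln.strip()) for ln in block.splitlines() if ln.strip()]
    width = max(len(r) for r in rows)
    rows = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows

    def fmt(cells: list[str]) -> str:
        return "| " + " | ".join(cells) + " |"

    lines = [fmt(rows[0]), fmt(["---"] * width)]  # header row, then separator
    lines += [fmt(r) for r in rows[1:]]
    return "\n".join(lines)
```

Deciding where the column boundaries really are is the judgment call; the sketch only automates the step after that.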

More importantly, Codex started showing value beyond being a “translation tool.” It was not just emitting Chinese. It was maintaining a set of intermediate files in the working directory. source.md held the cleaned source text. translated.zh-CN.md held the translation. terms-candidates.tmp.md held terminology candidates. terms-annotation-report.tmp.md held the terminology annotation report.

After this paper, the rough workflow had emerged:

Clean the source first. Do not rush into translation.
References stay in English by default.
Technical terms get their own pass.
Citations must remain clickable.
The final gate is scripts, not “looks okay.”

The most important result of this paper was not the paper itself. It proved the work could be engineered.

2. The Rise of “Worse Is Better”: even short essays carry formatting noise

The second piece was Richard P. Gabriel’s The Rise of “Worse Is Better”.

It explains why “worse is better” can beat “the right thing” in the real world: systems that are simple, portable, and easy to spread often outrun designs that are more perfect but heavier.

At first I assumed it would be quick. It is much shorter than Hints for Computer System Design, so it looked like an easy one.

The first task was still cleaning formatting noise.

The source contained navigation residue from a PDF or web page: Previous, Up, Next. There were also hard line breaks, page-style breaks, PDF-style ``…’’ quotes, and extraction errors such as the-right-thing, must sacrificed, and Unix-namely.

These problems look tiny, but if you leave them alone, AI will happily translate the navigation buttons into the body. When readers see “previous page, up, next page,” they do not think “paper style.” They think the translation is contaminated with web residue.

The fix was direct: clean the source, then translate, then extract terminology.
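A minimal sketch of that cleanup for this essay's two kinds of residue, assuming the navigation links sit on lines of their own and the TeX-style quotes are written as two backticks and two ASCII apostrophes. The function name is hypothetical, and extraction errors such as must sacrificed are left for a review pass.

```python
import re

# A line consisting only of navigation words, such as "Previous Up Next".
NAV_LINE = re.compile(r"^\s*(?:Previous|Up|Next)(?:\s+(?:Previous|Up|Next))*\s*$")


def strip_web_residue(text: str) -> str:
    """Drop navigation-only lines and turn TeX-style ``quotes'' into curly quotes."""
    lines = [ln for ln in text.splitlines() if not NAV_LINE.match(ln)]
    out = "\n".join(lines)
    # ``quoted'' -> “quoted” (the pattern does not span line breaks)
    return re.sub(r"``(.*?)''", "\u201c\\1\u201d", out)
```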

Short essays do not get to skip source cleanup. The shorter the essay, the more obvious leftover noise becomes.

3. No Silver Bullet: Essence and Accidents of Software Engineering: footnotes are not footnotes, they are references

By the time I reached Frederick P. Brooks, Jr.’s No Silver Bullet, the trouble had changed shape.

The paper separates essential difficulty from accidental difficulty in software engineering. Every later wave of “new silver bullet” hype eventually has to circle back to Brooks’s question.

[Illustration: Broken PDF pipeline]

Its main problem was not the two-column PDF. It was the citation structure.

The original paper does not have a standard references section. Citation details are scattered through footnotes in the main text. At first glance that looks like a layout difference, but it matters a lot on the web: the body has footnotes, the bottom has footnote definitions, Hugo has its own footnote treatment, and readers still expect paper-style references to be navigable in one place.

Codex first did the routine cleanup: remove duplicate headers, page-break markers, and page numbers; fix obvious OCR or extraction problems such as pipes and filters, bug-type frequencies, and data flow; rebuild Figure 1 from fixed-width text into a Markdown table.

Then it did something closer to editing than translation: it lifted the reference information buried in the footnotes into a proper ## References section, turned [^n] references in the body into [n](#ref-n), and added anchors to each reference entry.
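The transformation itself is easy to state in code. A minimal sketch with a hypothetical function name, assuming numeric footnote labels, single-line footnote definitions, and a renderer that keeps raw HTML (in Hugo, Goldmark's `unsafe` renderer setting must be on for the inline `<a>` anchors to survive). Real footnotes often mix commentary with the citation, and splitting those still takes a human.

```python
import re


def footnotes_to_references(md: str) -> str:
    """Lift `[^n]: ...` footnote definitions into a References section.

    In-body markers `[^n]` become `[n](#ref-n)` links, and each entry gets an
    `<a id="ref-n"></a>` anchor to jump to.
    """
    defs: dict[str, str] = {}

    def grab(m: re.Match) -> str:
        defs[m.group(1)] = m.group(2).strip()
        return ""  # remove the definition line from the body

    body = re.sub(r"^\[\^(\d+)\]:[ \t]*(.*)\n?", grab, md, flags=re.M)
    body = re.sub(r"\[\^(\d+)\]", r"[\1](#ref-\1)", body).rstrip()
    entries = [f'<a id="ref-{n}"></a>[{n}] {text}'
               for n, text in sorted(defs.items(), key=lambda kv: int(kv[0]))]
    return body + "\n\n## References\n\n" + "\n\n".join(entries) + "\n"
```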

Without that step, citations on the web page would remain footnotes that technically existed but were awkward to navigate. If the goal were only “translation,” the footnotes could have stayed as they were. But if the goal is a paper translation that belongs in the blog, then citations need to become a navigable structure. What Codex did here was not language conversion. It was document normalization.

The mechanical check at the end reported 12 reference anchors, 12 in-body citation links, 0 errors, and 0 warnings.
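A check of that kind can be very small. This sketch is an illustration, not the script that produced the numbers above: the name `check_citations` is hypothetical, it accepts both the Markdown link form `[n](#ref-n)` and the HTML form `<a href="#ref-n">`, and it counts a citation with no matching anchor as an error and an anchored entry that is never cited as a warning.

```python
import re


def check_citations(md: str) -> dict:
    """Count reference anchors and in-body citation links, and flag mismatches."""
    anchors = set(re.findall(r'<a id="ref-(\d+)"', md))
    # Matches both [n](#ref-n) and <a href="#ref-n">.
    cited = re.findall(r'(?:\]\(#ref-|href="#ref-)(\d+)', md)
    errors = sorted({n for n in cited if n not in anchors}, key=int)  # dangling links
    warnings = sorted(anchors - set(cited), key=int)  # entries never cited
    return {"anchors": len(anchors), "links": len(cited),
            "errors": errors, "warnings": warnings}
```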

This was one of the moments where I started trusting the workflow. Not because the model was so smart, but because it could turn a broken structure like “reference data scattered through footnotes” into a structure that could actually be checked.

4. Computer Programming as an Art: two columns, Greek, and broken references

Knuth’s Computer Programming as an Art was when I really started to understand that old PDFs are not merely ugly.

This is Knuth’s Turing Award lecture. It argues that programming is both science and art, and preserves a judgment that today’s toolchains, metrics, and performance reviews often flatten out: a program is not finished just because it runs.

This paper enters truly troublesome old-paper territory.

The PDF had an extractable text layer, so OCR was not required. But the raw extraction in source.raw.md was unusable: the two text columns interleaved on the same lines, especially on the first and seventh pages. There were also footers, copyright blocks jammed into paragraphs, broken words, the Greek word τέχνη, and references mangled by OCR or extraction, including titles like Tile, Tire, and Ratioehlative.

The paper itself is also delicate. Knuth keeps discussing the English words art and science. If you translate every occurrence mechanically into their Chinese equivalents, some sentences lose the point that he is explicitly talking about those English words and their meanings.

So the handling had several stages: first cross-check the PDF against an external transcription, fix the obvious two-column extraction failures and broken references, then translate the body while preserving art and science where the essay is explicitly talking about the English terms themselves. Terminology handling also stopped being mechanical. The report explicitly noted that English glosses for art and science should remain sparse, otherwise the entire page would be interrupted by repetitive annotations.

This paper made one thing very clear: terminology management is not “the more the better.”

In technical translation, you preserve English to disambiguate, not to make the page look more professional. Terms that need annotation should be annotated. Foundational words that are discussed repeatedly often require restraint instead.

5. Reflections on Trusting Trust: code figures must be treated as code

Ken Thompson’s Reflections on Trusting Trust is short, but it cannot be translated casually.

The essay uses self-reproducing programs and compiler backdoors to show that clean source code does not automatically imply a trustworthy toolchain.

The hard part here is the code.

The PDF’s embedded text was extractable, but the two-column layout scrambled the paragraphs, and every code figure was damaged. In this paper, the code is not decoration. Self-reproducing programs, compiler escape handling, and the Trojan compiler all depend on code examples to carry the argument.

If the code breaks, the paper breaks.

Codex could not just translate OCR output. It had to reconstruct the C snippets against the scanned page images and preserve them as fenced C code blocks. The explanatory material around Figure 1 was turned into a table. In the Chinese translation, these figures are explicitly treated as code because they are not ordinary illustrations.

There was also a subtle source issue. In the scan, the captions for Figures 2.1 and 2.2 appear reversed relative to the body logic. Codex did not silently “fix” that and move on. It recorded the inconsistency in the review notes: the final version follows the body logic, where 2.1 is the basic \n escape handling, 2.2 adds \v, and 2.3 uses the decimal value 11 to teach the compiler what \v means.

At that point Codex was not behaving like a translator anymore. It was behaving more like someone I would ask to repair a technical document. A normal translation tool would turn broken code into broken Chinese narration and quietly leave. Codex can treat code as code, captions as structure, and possible source mistakes as review artifacts.

Technical translation is not the act of scrubbing English into Chinese. Its first job is to protect the technical objects.

6. Programming as Theory Building: scans, bleed-through, and philosophical terms

Peter Naur’s Programming as Theory Building came with a different kind of trouble.

Naur argues that the essence of programming is not producing program text, but building a theory in the programmer’s mind around the problem and its solution. What is really lost over time is often not the text, but that living theory.

The input was worse too. The PDF was image-only, with no extractable text. Standard extraction tools produced little more than a few page markers. The scan was two-column, with visible bleed-through or neighboring-page text. There was even readable text near page edges that did not belong to the body at all.

At that point Codex did not try to brute-force OCR to the end.

It rendered all 14 pages as images, found the Flylib web transcription as the main textual base, and then visually checked the title, page 7, the references, and the last page against the rendered images. The stray edge text was explicitly ignored. Obvious extraction issues were also corrected, including modifi-ability, page ranges like 207213 and 248274, and the final fused phrase cannotand so need notsay.
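Fused page ranges like 207213 are regular enough for a narrow rule. A hypothetical sketch that only touches six-digit runs whose second half is larger than the first; anything it changes should still be compared with the scan, since a six-digit number is not always a page range.

```python
import re


def split_fused_page_range(text: str) -> str:
    """Repair page ranges whose dash was lost, e.g. "207213" -> "207–213"."""
    def fix(m: re.Match) -> str:
        a, b = m.group(1), m.group(2)
        # Only treat it as a range if the end page is after the start page.
        return f"{a}\u2013{b}" if int(b) > int(a) else m.group(0)

    return re.sub(r"\b(\d{3})(\d{3})\b", fix, text)
```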

The real difficulty, though, was terminology.

This is not a routine engineering paper. It talks about Ryle, Popper, tacit knowledge, and program theory. program death and program revival do not mean a process exiting and restarting. They refer to the disappearance of the theory held by a team, and later attempts by new people to rebuild that theory. World 3 is not the “Third World.” It is Popper’s World 3.

Codex wrote these choices down explicitly in the terminology report:

Programming as Theory Building gets a natural Chinese rendering (literally “programming as theory building”) rather than a stiffer one closer to “programming as the construction of theory.”
program death and program revival keep the life metaphor rather than flattening it into generic maintenance language.
tacit knowledge keeps a precise Chinese rendering instead of being broadened into something looser.
World 3 stays as World 3, preventing a geopolitical misreading.

This paper showed that Codex can solve more than OCR.

It can also turn terminology choices into something discussable. In older workflows these judgments lived only in the translator’s head. Here they land in a report that can be reviewed, challenged, or changed.

7. The Bitter Lesson: the value of a short essay lies in terminology density

The seventh piece was Rich Sutton’s The Bitter Lesson.

Sutton distills a recurring lesson from decades of AI research: in the long run, methods that scale with computation through search and learning tend to beat systems stuffed with handcrafted human domain knowledge.

It is short, almost blog-length.

And yes, it was easier. The work was mostly straightforward: clean the title, author, and date, remove page markers, repair split phrases, restore the emphasis on search and learning.

But the terminology density is still high.

general methods, leverage computation, human-knowledge approach, deep search, self play, value function, hidden Markov models, and scaling computation all need consistent handling. If they are not normalized, the essay collapses into a vague AI-history reflection. It will read smoothly, but readers will no longer know which English concept each phrase actually corresponds to.

So here the emphasis shifted to terminology annotation: preserve the English on first appearance, then continue in Chinese. The report confirmed that nothing important had been missed.
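The first-appearance rule is simple to mechanize once a person has settled the terminology list. A minimal sketch, assuming a hypothetical glossary that maps each chosen Chinese rendering to its English source term, and full-width parentheses for the gloss:

```python
def gloss_first_occurrence(text: str, glossary: dict[str, str]) -> str:
    """Add the English term after the first occurrence of each Chinese rendering.

    Later occurrences stay plain Chinese. Assumes no rendering is a substring of
    another, since a shorter term would otherwise be glossed inside a longer one.
    """
    for zh, en in glossary.items():
        i = text.find(zh)
        if i == -1:
            continue  # term not used in this text; nothing to annotate
        end = i + len(zh)
        text = text[:end] + f"（{en}）" + text[end:]
    return text
```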

When the input is clean and the structure is simple, the workflow should move quickly. It did. The pipeline cannot exist only for the hardest papers. It also has to process short essays cheaply, or the tool itself becomes the next bottleneck.

8. The Unreasonable Effectiveness of Data: multi-column PDFs and repeated pull quotes

With Alon Halevy, Peter Norvig, and Fernando Pereira’s The Unreasonable Effectiveness of Data, the problem swung back to the PDF itself.

The paper explains why, in natural language and web-scale problems, huge amounts of real data plus simple scalable models often beat more elegant but smaller-scale theoretical systems.

This came from a printed, multi-column IEEE PDF with a lot of visual decoration.

The main difficulty was reading order.

The easiest way to break multi-column PDF extraction is to mix body text, headers, footers, decorative pull quotes, and section labels into the same stream. Pull quotes are especially tricky: on the page they look like design elements, but the text often just repeats a sentence from the body. If you keep them in the source, the translation makes the author sound like they are suddenly quoting themselves for no reason.

Codex used rendered page images to validate the column reading order, then removed repeated headers, footers, page numbers, and decorative section elements. Four pull quotes were explicitly omitted because they simply repeated body text and broke the reading flow.
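Verbatim repeats like these can at least be flagged automatically. A hypothetical sketch, assuming a pull quote repeats its body sentence exactly apart from case and spacing; real pull quotes are sometimes trimmed or reworded, so this finds candidates and a person confirms them.

```python
import re


def drop_repeated_pull_quotes(paragraphs: list[str], max_len: int = 300) -> list[str]:
    """Drop short paragraphs whose text appears verbatim inside a longer paragraph.

    Comparison ignores case and whitespace. Identical paragraphs of equal length
    are both kept so a human can decide; `max_len` protects long body text.
    """
    def norm(s: str) -> str:
        return re.sub(r"\s+", " ", s).strip().lower()

    normed = [norm(p) for p in paragraphs]
    kept = []
    for i, p in enumerate(paragraphs):
        repeated = len(normed[i]) <= max_len and any(
            len(other) > len(normed[i]) and normed[i] in other for other in normed)
        if not repeated:
            kept.append(p)
    return kept
```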

Two smaller risks were also documented: the phrase “Dublin Core Metadata Initiative point-encoding scheme” was kept exactly as the PDF extraction gave it, and a suspicious extra right parenthesis in one example section was treated as punctuation noise and dropped from the Chinese.

The lesson here was that fidelity does not mean preserving all visual noise.

What gets translated is the paper, not the accident scene created by the PDF layout. If the pull quotes only duplicate body text, they should disappear from the readable source. Otherwise “faithful to the original” ends up meaning faithful to the page-layout software.

9. You and Your Research: long transcripts and chunk stitching

Richard W. Hamming’s You and Your Research has no formulas, no code, no tables, and no references.

It is Hamming’s talk about how important work gets done: choosing problems, having courage, building habits, communicating well, shaping your environment, and not lying to yourself. It is not research self-help. The sharpest line in it is the question underneath everything else: do you actually want to do first-rate work?

That sounds easy enough, but it is long, and it is a lecture transcript.

The hard part of long transcripts is not structure. It is tone and consistency. Hamming’s text moves through spoken transitions, examples, and admonitions. If you translate it all at once, style drift creeps in. If you translate in chunks, chunk boundaries can leave visible seams.

Codex split the translation into six auditable blocks and then merged them into translated.zh-CN.md. After that, it checked and repaired two sentence-level problems caused by chunk boundaries. The terminology report recorded several context-sensitive choices, including attack rendered in the sense of “line of attack,” drive rendered as “driving force,” and dedicated renderings for appearance of conforming and orientation.
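Splitting only at paragraph boundaries is what keeps the blocks auditable. A minimal sketch, assuming the transcript is already a list of paragraphs; the function name is hypothetical, and size is measured in characters, which is only a rough stand-in for tokens.

```python
def split_into_chunks(paragraphs: list[str], n_chunks: int) -> list[list[str]]:
    """Split paragraphs into at most `n_chunks` contiguous blocks of similar size.

    Boundaries fall only between paragraphs, never mid-sentence, so each block
    can be translated and reviewed on its own before merging.
    """
    target = sum(len(p) for p in paragraphs) / n_chunks
    chunks, current, size = [], [], 0
    for p in paragraphs:
        # Start a new block once this one would overshoot the target size,
        # but never create more than n_chunks blocks.
        if current and size + len(p) > target and len(chunks) < n_chunks - 1:
            chunks.append(current)
            current, size = [], 0
        current.append(p)
        size += len(p)
    if current:
        chunks.append(current)
    return chunks
```

Merging is the reverse: join the translated blocks in order. The sentences on either side of each boundary are where to look first for seams.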

There was also a small but very engineering-style cleanup step. After chunk translation, Markdown often ends up with hard line breaks and neighboring single-line paragraphs. Codex did a mechanical reflow pass and kept the pre-reflow result as translated.zh-CN.before-reflow.tmp.md.
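The reflow step needs one CJK-specific rule: Chinese lines are joined without a space, English lines with one. A minimal sketch with a hypothetical function name, assuming paragraphs are separated by blank lines and that lists, tables, and code fences are handled separately (this version would flatten them). The character ranges are only a rough approximation of CJK text and full-width punctuation.

```python
import re

# Rough CJK test: CJK punctuation, unified ideographs, and full-width forms.
CJK = re.compile(r"[\u3000-\u303f\u4e00-\u9fff\uff00-\uffef]")


def reflow(markdown: str) -> str:
    """Join hard-wrapped lines inside each blank-line-separated paragraph."""
    out = []
    for para in re.split(r"\n\s*\n", markdown.strip()):
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        if not lines:
            continue
        text = lines[0]
        for nxt in lines[1:]:
            # No space across a break that touches Chinese text; one space otherwise.
            sep = "" if CJK.match(text[-1]) or CJK.match(nxt[0]) else " "
            text += sep + nxt
        out.append(text)
    return "\n\n".join(out)
```

Saving the input to a side file first, as with translated.zh-CN.before-reflow.tmp.md, means the reflow can be diffed and reverted.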

The value of this paper is that it shows the pipeline still matters even when PDFs are not torturing you.

One of the most annoying classes of writing problem is when every paragraph is individually acceptable, but the full article no longer sounds like one person speaking. AI translation has the same problem. Chunking solves context length. Merge and reflow have to follow.

10. An Axiomatic Basis for Computer Programming: image OCR, mathematical formulas, and HTML tables

C. A. R. Hoare’s An Axiomatic Basis for Computer Programming was the hardest paper in the entire batch.

This is the paper where Hoare presents the axioms and inference rules for proving program properties, the origin point of what later became Hoare logic.

The PDF was image-only. PyMuPDF reported 6 pages and no embedded text at all. In other words, normal text extraction had nothing to work with.

The moment I saw that there was no embedded text, I knew we were in for trouble. The method was to render full-page images, crop the left and right columns separately, run Apple Vision for OCR, and keep the raw OCR output. But I did not trust the OCR: the formulas, theorem notation, and tables all had to be rebuilt against the page images.

[Illustration: Formula rendering feedback loop]

The hardest parts here were the math and the tables.

Hoare’s paper has assertions, inference rules, the assignment axiom, proof steps, and several small tables. Ordinary Markdown tables are fragile for this kind of content, especially when cells contain mathematical symbols, line breaks, formulas, or proof numbering. In the final translation, Tables I, II, and III were rebuilt as structured HTML tables instead of being forced into Markdown.
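For a sense of what those formulas look like, here is the assignment axiom in Hoare's original notation, where P {Q} R means that if assertion P holds before program Q starts, R holds when Q completes. This is an illustrative transcription with a one-line instance chosen for this note, not the markup used on the translated page.

```latex
% Axiom of assignment: P_0 is P with every occurrence of x replaced by f.
\vdash P_0 \;\{x := f\}\; P

% Instance with P \equiv (x > 0) and f \equiv x + 1, so P_0 \equiv (x + 1 > 0):
\vdash (x + 1 > 0) \;\{x := x + 1\}\; (x > 0)
```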

The formulas did not render correctly on the first try either. I ended up using a clumsy but effective feedback loop: rewrite the formulas into LaTeX-flavored Markdown, then use Chrome browser MCP to inspect the actual rendering, then compare the rendered formulas against screenshots from the original paper. If they did not line up, fix them and try again. A formula was only accepted once the page rendering was basically aligned with the original.

There was also a Hugo/Goldmark trap. Inside raw HTML blocks, if nested tags are indented, Goldmark may parse them as Markdown code blocks, which breaks the tables. Codex recorded that pitfall in the review notes and left-aligned the nested HTML.
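In CommonMark, which Goldmark follows, a raw HTML block ends at the first blank line, and a following line indented four or more spaces becomes an indented code block. Keeping everything inside a table flush left sidesteps that. A minimal sketch with a hypothetical function name, assuming each table opens on a line starting with `<table` and closes on a line starting with `</table>`:

```python
def left_align_html_blocks(md: str, tags: tuple[str, ...] = ("table",)) -> str:
    """Strip leading indentation from every line inside the given raw HTML blocks.

    Lines outside those blocks, including real indented code, are left untouched.
    """
    out, depth = [], 0
    for line in md.splitlines():
        s = line.lstrip()
        if any(s.startswith(f"<{t}") for t in tags):
            depth += 1  # entering a (possibly nested) block
        out.append(s if depth else line)
        if any(s.startswith(f"</{t}>") for t in tags):
            depth = max(depth - 1, 0)  # leaving the block
    return "\n".join(out)
```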

Citations needed special treatment too. Because math rendering was enabled on the page, citation links could not use forms that might be damaged by the math renderer. The final form was ordinary anchors like <a href="#ref-8">[8]</a>, with the math renderer configured to ignore <a> tags.

At this point the work barely looked like translation anymore. It looked like a small document-engineering project.

OCR, LaTeX, HTML tables, Hugo, Goldmark, citations, terminology: all of it got packed into a six-page old paper. The paper is short, but the density is high.

11. The UNIX Time-Sharing System: two-column damage and a statistics table

The last paper was Dennis M. Ritchie and Ken Thompson’s The UNIX Time-Sharing System.

It presents early UNIX: the file system, process model, Shell, I/O abstractions, and the system’s scale. A lot of modern operating-system culture, toolchain taste, and “everything is a file” thinking can be seen here in early form.

This returned to the older CACM-style two-column PDF.

The first extracted page was a textbook failure: title, authors, abstract, and the first section interleaved with each other; The UNIX Time-Sharing System broken into strange positions; fused words like moredescribesmodernonlyandthe; [l] that had to become [1]; tile password file that had to become the password file.
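Once a human has confirmed each misreading, the fixes can live in a small, reviewable table instead of being retyped by hand. A hypothetical sketch using two of the errors listed above; the per-rule counts make it easy to notice a rule that fired more often than expected.

```python
import re

# Hypothetical per-paper correction table. Each rule is confirmed by a person
# first, because a fix like "tile" -> "the" is only safe in a specific context.
CORRECTIONS = [
    (r"\[l\]", "[1]"),                                 # lowercase L misread for digit 1
    (r"\btile password file\b", "the password file"),  # 'h' misread as 'il'
]


def apply_corrections(text: str, rules: list[tuple[str, str]] = CORRECTIONS
                      ) -> tuple[str, list[tuple[str, int]]]:
    """Apply reviewed regex fixes and report how many times each one fired."""
    report = []
    for pattern, replacement in rules:
        text, n = re.subn(pattern, replacement, text)
        report.append((pattern, n))
    return text, report
```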

Then came Section 9, which in the original paper is a table of statistics. Left as raw text in a code block, it is unpleasant to read on the blog. Rebuilt as a Markdown table, it clashes with the table style used by the other papers in the section.

Codex first cleaned the two-column extraction damage, then translated the operating-systems terminology. Terms such as image, trap, mount, link, and Shell were translated or partially preserved in a historically aware way so the system context would not be softened away.

The statistics data was first rebuilt as a table inside the translation workspace. Once moved into the blog, it was upgraded again into the same HTML-table style used for the Hoare paper, and checked in the browser for mobile overflow and console errors. At that point it was not really a translation problem anymore. I simply did not want it to look sloppier than the Hoare page.

This paper was a good ending.

It ran through the whole chain one more time: PDF cleanup, terminology, citations, statistics tables, Hugo build, browser MCP, mobile overflow checks. By then AI was no longer “translating an article.” It was maintaining content inside a publishing system.

What problem did AI actually solve?

Across these 11 papers, AI did not solve one problem. It solved a bundle of small problems tangled together. I used Codex here, but similar coding agents such as Claude Code could run the same kind of flow as long as they can read and write files, call tools, and keep intermediate artifacts.

Some problems live at the input layer: PDFs, HTML, scans, two-column layouts, multi-column layouts, navigation residue, headers and footers, broken OCR, split words, and repeated pull quotes. Some live in structure: titles, sections, footnotes, references, citation anchors, code blocks, tables, formulas, and Hugo page metadata. And some are more annoying because they are hard to see with the naked eye but keep scratching at the reading experience: should the first occurrence of a term preserve the English, does the link really jump where it should, did the formula break in the browser renderer.

None of that is covered by the sentence “translate this paper.”

What AI did not solve was final responsibility.

How a term should be translated, where fidelity matters, where web readability justifies reflow, which OCR result is wrong, which figure caption in the source may already be wrong: those still require human judgment.

What changed is that humans no longer perform every step by hand. More of the work becomes checking whether the machine’s intermediate artifacts have drifted.

If this were done the traditional way, the workload would not be limited to “translating 180,000 characters.” You would also need to clean PDFs manually, proof OCR, rebuild tables, normalize terminology, repair references, import everything into the blog, and test mobile layout. Someone fluent in computer-science papers might still need two or three months of full-time work, and that estimate only gets worse if they are also reading background material while translating, typesetting, and reviewing. In reality I spent a scattered five or six hours across two days, often while moving other work forward in parallel.

[Illustration: Token ledger and review checklist]

Time and tokens

The numbers below are rough totals pulled from Codex session records. They are not sanitized lab benchmarks. They are a ledger from actual work: sessions stay open, I switch away to do other things, and early prototype flows mix with later stable flows. Some sessions were started before I went back to writing company code, and only reviewed later.

The Chinese translation output adds up to roughly:

Metric	Value
Translation files	11
Markdown lines	2,812
`wc -m` character count	180,333
Unique recorded sessions	11
Total session log span	about 26.8 hours
Total recorded tokens	about 61.3M
Cached input tokens	about 56.1M
Non-cached input tokens	about 4.75M
Output tokens	about 500k
Reasoning output tokens	about 117k

The easiest numbers to misread here are time and tokens.

26.8 hours does not mean “I worked continuously for 26.8 hours,” and it does not mean “the machine ran continuously for 26.8 hours.” It is the log span between session start and the final token accounting, which includes offline time, task-switching, and sessions that stayed open for long stretches.

A better description at human scale would be this: the first paper, Hints for Computer System Design, took the longest because the workflow had not stabilized yet. Once the process settled down, short essays could be done in a few dozen minutes, while long PDFs or heavy table reconstruction could still take an hour or more. The time I actually spent at the keyboard pushing the process forward was about five or six scattered hours across two days.

61.3M total tokens looks alarming, but 56.1M of that is cached input. Long sessions keep carrying context, file snippets, tool output, and historical instructions, so cached input can grow very large. That number is good for describing how much context the model processed. It is not good for treating as an API bill directly.

The more meaningful numbers are these: about 4.75M non-cached input, about 500k output, and about 117k reasoning output. Even then, this is not a tiny task. It is closer to a small editing project compressed into an agent workflow.
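A quick consistency check on the ledger, assuming the roughly 117k reasoning tokens are already included in the roughly 500k output figure (the rounded totals fit that reading):

```latex
% Millions of tokens, from the table above
56.1 + 4.75 + 0.50 = 61.35 \approx 61.3

% Share of all recorded tokens: cached input vs. non-cached input
\frac{56.1}{61.3} \approx 0.915, \qquad \frac{4.75}{61.3} \approx 0.077
```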

AI did not make the work disappear. It turned large chunks of manual labor into machine execution plus human review.

The cost did not vanish. It moved from sentence-by-sentence labor into workflow design and final review.

Review

After this batch, the thing I am most certain about is that the workflow matters more than the prompt.

If all you rely on is a single prompt like “translate this paper,” you will probably get Chinese that looks acceptable but has a structure you cannot really audit. The reusable parts are the things that do not look like AI magic at all: clean the source first, handle references separately, pull out terminology candidates for human pruning, inspect formulas and tables in the browser.

That is also why the workflow looks straightforward but would still not be easy for a non-programmer to reproduce. It does not require building a complicated system, but it assumes you can break the work into phases, preserve intermediate artifacts, read diffs, and trust scripts more than vibes. source.md can be checked against the original. translated.zh-CN.md can be diffed. review-notes.tmp.md can record uncertainties and source issues. Hugo builds and browser checks can prove the publishing layer is at least not broken.

The people most exposed by AI translation may not be “people who know English.” They may be the workers whose value comes from moving English into Chinese sentence by sentence as piecework. Once draft translation, format cleanup, terminology replacement, and citation repair can be carried by an agent, the human work becomes less pleasant: choose the text, set the standards, judge terminology, resolve ambiguity, review key passages, take final responsibility.

AI cannot read the classics for me, but it can reduce the friction before reading.

I did not translate these papers in order to collect a pile of Chinese files. The real goal was to read them. Of course the English originals can be read directly, but classic papers come with extra friction: ugly PDFs, inconvenient references, terminology that has to be looked up repeatedly, and long, dense paragraphs.

What AI does is shave off that friction.

It does not explain to me why Brooks says there is no silver bullet, and it does not decide how much of Hoare’s formal ideal still shines today. But it can turn the originals into versions that I am willing to read, revise, and send to other people.

Back in grad school, I had an ambition that in hindsight was wildly overconfident: translate a batch of classic computer-science papers and organize them into a Chinese collection. That idea then sat still for years, because it had the shape of one of those grand plans you write in a notebook when you are young. It sounds right. It is also very easy to postpone forever.

This time it finally did not stay in the TODO pile.

Not because AI understands the papers better than I do, but because it pushed down the most draining parts of the work: organizing PDFs, repairing structure, moving citations, adding terminology, rebuilding tables, checking links. Translating a classic paper used to be a standalone project. Now it looks more like a build task: prepare the input, run the pipeline, inspect the diff, fix the errors, build, preview, publish.

The reason that old ambition sat untouched for so long was not that translation itself was sacred. It was that the setup cost was too high: finding versions, cleaning PDFs, repairing OCR, aligning references, standardizing terminology, and typesetting for publication. Each step is small, but together they are enough to push the whole effort into the drawer labeled “later.”

AI did not fulfill that ambition for me. It just pushed those setup costs down into a range I could actually carry.

What remains, finally, is the reading itself.