Project: Alexander Kustov Academic Website

Overview

Jekyll 4.3 academic website using the AcademicPages remote theme (Minimal Mistakes fork). Hosted on GitHub Pages. Multilingual: 12 languages (EN, ES, FR, PT, DE, IT, PL, RU, JP, KO, TR, AR).

Architecture

Languages & Directories

| Code | Language | Directory | Notes |
|------|----------|-----------|-------|
| en | English | _pages/ | Default, no prefix |
| es | Spanish | es/ | Latin American conventions |
| fr | French | fr/ | Metropolitan French |
| pt | Portuguese | pt/ | Brazilian Portuguese (pt-BR) |
| de | German | de/ | Hochdeutsch |
| it | Italian | it/ | |
| pl | Polish | pl/ | |
| ru | Russian | ru/ | Cyrillic script |
| jp | Japanese | jp/ | CJK; site uses “jp” internally, hreflang maps to “ja” for SEO |
| ko | Korean | ko/ | CJK |
| tr | Turkish | tr/ | |
| ar | Arabic | ar/ | RTL script; dir="rtl" set on <html> |

Key Directories

  • _layouts/ – Jekyll layouts (see Layouts section below)
  • _includes/ – Partials (see Key Includes section below)
  • _sass/ – SCSS (vendor/susy grid, Minimal Mistakes theme, Breakpoint, Font Awesome 5, Magnific Popup, Academicons)
  • _data/ – Data files (navigation.yml, ui-text.yml, authors.yml, carousel.yml, newsletter_posts.yml)
  • _translations/source/ – English source markdown for newsletter posts (13 files)
  • _translations/agent_{1,2,3}/{lang}/ – Translation agent outputs (working files)
  • {lang}/newsletter/ – Final translated newsletter posts (live on site)
  • assets/css/ – Custom CSS: custom.css (dark mode, RTL, layout), newsletter-post.css, academicons.css, collapse.css, main.scss
  • assets/js/ – JavaScript: main.min.js (bundles jQuery 1.12.4 + plugins), _main.js (source), collapse.js, plugins/ (jQuery plugins), vendor/ (jQuery source)
  • talkmap/ – Leaflet-based talk location maps (inactive, talkmap_link: false in config)
  • markdown_generator/ – Jupyter notebooks & Python scripts for TSV-to-markdown conversion (legacy tooling)

Layouts

  • default – Base layout (extends compress); includes masthead, footer, dark mode toggle, text-expand
  • compress – HTML compression wrapper layout
  • single – Standard page layout with sidebar
  • archive – Collection/list pages
  • archive-taxonomy – Category/tag archive pages
  • newsletter-post – Substack-inspired minimal reading layout (680px centered, serif body, hero image, footnotes, JSON-LD Article schema)
  • splash – Full-width layout without sidebar
  • talk – Specialized layout for talks/presentations

Key Includes

  • masthead.html – Top navigation with language dropdown (reads site.languages from _config.yml)
  • language-switcher.html – Inline language links (reads site.languages from _config.yml)
  • hreflang.html – SEO alternate language tags; maps jp to ja for hreflang
  • seo.html – Open Graph, Twitter cards, canonical URLs, og:locale per language
  • author-profile.html – Sidebar author section with social links
  • events-sidebar.html – Book tour/events sidebar (activated via show_events: true)
  • text-expand.html – Vanilla JS [expand]/[/expand] block expander
  • head/custom.html – Favicons, MathJax 3.x, dark mode flash prevention CSS, external CSS links
  • analytics.html + analytics-providers/custom.html – Google Analytics (gtag, measurement ID: G-J78N1YFWN8)
  • footer.html – Copyright, theme attribution, AI translation disclosure for non-EN pages
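The jp to ja mapping performed by hreflang.html can be illustrated outside Liquid. The sketch below is not the include itself: the function name, the override table, and the example.org domain are hypothetical, and it only shows the intended effect of mapping the site-internal code to the tag emitted for search engines.

```python
# Hypothetical illustration of the jp -> ja hreflang mapping. The real logic
# lives in _includes/hreflang.html (Liquid) and is not reproduced here.
from html import escape

# Site-internal language codes that differ from the tag emitted in hreflang.
HREFLANG_OVERRIDES = {"jp": "ja"}


def hreflang_tag(site_code: str, url: str) -> str:
    """Return one <link rel="alternate"> tag for a page variant."""
    code = HREFLANG_OVERRIDES.get(site_code, site_code)
    return f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'


if __name__ == "__main__":
    # example.org is a placeholder domain, not the site's real URL.
    print(hreflang_tag("jp", "https://example.org/jp/book/"))
    print(hreflang_tag("de", "https://example.org/de/book/"))
```

Codes without an override pass through unchanged, so only the Japanese directory name differs from its hreflang value.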

Dark Mode System

  • Toggle button: #dark-mode-toggle in masthead with sun/moon icons
  • Storage: localStorage.setItem('darkMode', true/false) persists the choice across pages (localStorage stores values as strings, so the saved value is "true" or "false")
  • Flash prevention: Inline CSS in head/custom.html with html.dark-mode-pending class applied before body renders
  • JS implementation: In _layouts/default.html – reads localStorage, applies body.dark-mode class, toggles on click
  • Styles: assets/css/custom.css – 400+ lines covering all components (masthead, sidebar, tables, accordions, newsletter cards, blockquotes, code blocks, events sidebar)
  • Color palette: Background #1a1a2e, text #d4d4dc, links #6eaadc, masthead #16162a

Configuration Reference (_config.yml)

  • languages: – Array of {code, label} objects defining supported languages and display order (used by masthead, language-switcher)
  • analytics.provider: "custom" – Uses custom gtag.js include
  • analytics.google.measurement_id: "G-J78N1YFWN8" – GA4 measurement ID
  • talkmap_link: false – Disables talkmap feature on talks page
  • compress_html: – HTML compression (clippings: all, ignored in development)
  • future: true – Allows future-dated posts
  • whitelist: – Plugins allowed in --safe mode (for GitHub Pages compatibility)
  • Collections defined: teaching, publications, portfolio, talks (mostly empty, inherited from theme)

Newsletter System

  • English newsletter cards (_pages/newsletter.md) link to Substack URLs
  • Non-English newsletter cards ({lang}/newsletter.md) link to local translated pages at /{lang}/newsletter/{slug}/
  • Each translated post uses layout: newsletter-post with front matter: title, subtitle, date, permalink, lang, ref, original_url, image
  • Hero images served from Substack CDN (substackcdn.com)
  • Footnotes use kramdown native syntax: [^1] with [^1]: text at end

Build & Test

# Build site
bundle exec jekyll build

# Serve locally
bundle exec jekyll serve --port 4000 --no-watch

# Build takes ~20-30 seconds with all translations

# Rebuild minified JS (requires npm/node)
npm run build:js
# This runs: uglifyjs jquery + plugins + _main.js → main.min.js

Requirements

  • Ruby (with bundler) – Gemfile specifies jekyll ~> 4.3, includes wdm gem for Windows file watching
  • Node.js (optional) – Only needed for rebuilding main.min.js via npm run build:js
  • _config.dev.yml – Development overrides (localhost URL, analytics disabled); use with bundle exec jekyll serve --config _config.yml,_config.dev.yml

Translation Best Practices

Critical: UTF-8 Diacritics

The #1 issue encountered: translation agents sometimes output ASCII-safe substitutes (e.g., “ueber” for “über”) instead of the proper accented UTF-8 characters. This MUST be caught and rejected.

Language-specific checks:

  • Spanish: Must have accents – “académicos” NOT “academicos”, “también” NOT “tambien”, “más” NOT “mas”
  • French: Must have accents – “réfugiés” NOT “refugies”, “vérités” NOT “verites”, “écrire” NOT “ecrire”
  • German: Must have umlauts/eszett – “über” NOT “ueber”, “für” NOT “fuer”, “müssen” NOT “muessen”, “Straße” NOT “Strasse”
  • Italian: Must have accents – è, é, à, ò, ù, ì
  • Polish: Must have diacritics – ą, ć, ę, ł, ń, ó, ś, ź, ż
  • Portuguese: Must have diacritics – ã, õ, ç, á, é, í, ó, ú, â, ê, ô
  • Russian: Standard UTF-8 Cyrillic
  • Japanese: Standard UTF-8 kanji/hiragana/katakana
  • Korean: Standard UTF-8 Hangul
  • Turkish: Must have special chars – ğ, ı, ö, ş, ü, ç, İ
  • Arabic: Standard UTF-8 Arabic script (RTL)

Quick validation commands:

# Check German for ASCII umlauts (grep -c prints one count per file; every count should be 0)
grep -c "ueber\|fuer\|muessen\|Laender\|aeuf\|oeff" de/newsletter/*.md

# Check French for missing accents
grep -c "verite\|refugie\|qualifie\|universite" fr/newsletter/*.md

# Check Spanish for missing accents
grep -c " mas \| tambien\|academico\|inmigracion " es/newsletter/*.md
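The grep commands above look for known bad spellings. A complementary check is to flag files that contain none of a language's expected diacritics at all. The following is a minimal sketch of that idea, assuming the {lang}/newsletter/*.md layout described earlier; the IT, PL, PT, and TR character sets come from the list above, while the ES, FR, and DE sets are assumptions extrapolated from the example words. It is a heuristic, not a spell checker.

```python
# Sketch: flag translated newsletter files that contain none of the diacritics
# expected for their language. A heuristic, not a full spell check.
import unicodedata
from pathlib import Path

# IT, PL, PT, TR sets are copied from the checks above; the ES, FR, DE sets
# are assumptions extrapolated from the example words.
EXPECTED_CHARS = {
    "es": "áéíóúñ",
    "fr": "éèêàçù",
    "de": "äöüß",
    "it": "èéàòùì",
    "pl": "ąćęłńóśźż",
    "pt": "ãõçáéíóúâêô",
    "tr": "ğıöşüçİ",
}


def has_expected_diacritics(text: str, lang: str) -> bool:
    """True if the text contains at least one expected character for lang."""
    # Normalize so that "u" + combining diaeresis counts as "ü".
    normalized = unicodedata.normalize("NFC", text)
    return any(ch in normalized for ch in EXPECTED_CHARS[lang])


def suspicious_files(root: Path, lang: str) -> list[Path]:
    """List {lang}/newsletter/*.md files with no expected diacritics at all."""
    folder = root / lang / "newsletter"
    return sorted(
        path
        for path in folder.glob("*.md")
        if not has_expected_diacritics(path.read_text(encoding="utf-8"), lang)
    )


if __name__ == "__main__":
    for lang in EXPECTED_CHARS:
        for path in suspicious_files(Path("."), lang):
            print(f"{lang}: no diacritics found in {path}")
```

A file that passes this check can still contain individual ASCII substitutions, so it supplements the grep patterns rather than replacing them.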

Translation Workflow (3-Agent Voting)

MANDATORY for ALL translation work — including small front page edits. Never skip this workflow even for single-sentence changes.

  1. Extract source content from Substack to _translations/source/
  2. Launch 3 independent translation agents per language (agent_1, agent_2, agent_3)
  3. Each agent translates all 13 posts independently
  4. Deliberation agent per language compares all 3 and picks best per post
  5. Final files written to {lang}/newsletter/

For front page / about page edits: The same 3-agent voting applies. Launch 3 agents per language, each producing an independent translation of the changed text. A deliberation agent picks the best version. This catches awkward phrasing that a single-pass translation misses.
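Before the deliberation step, it can help to confirm that all three agents actually produced a file for every post. The sketch below assumes each agent writes {slug}.md inside _translations/agent_N/{lang}/; that file naming is an assumption, since only the directory pattern is documented here, and the function name is hypothetical.

```python
# Sketch: list agent output files that are still missing for one language.
# Assumes (not confirmed by this document) that each agent writes {slug}.md.
from pathlib import Path

AGENTS = ("agent_1", "agent_2", "agent_3")


def missing_agent_outputs(root: Path, lang: str, slugs: list[str]) -> list[Path]:
    """Return expected agent output paths that do not exist yet."""
    missing = []
    for agent in AGENTS:
        for slug in slugs:
            path = root / "_translations" / agent / lang / f"{slug}.md"
            if not path.is_file():
                missing.append(path)
    return missing
```

An empty result means the deliberation agent has three candidates per post to compare.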

Translation Quality Rules

  • Natural fluency over literal accuracy: If a direct translation sounds awkward or stilted in the target language, use a simpler, more natural synonym. The translated text should read as if originally written in that language.
  • Loanwords and cognates: If a concept is commonly expressed using an English loanword in the target language (e.g., “фокус” in Russian for “focus”, “フォーカス” in Japanese), prefer the loanword over a clunky native equivalent.
  • Avoid bureaucratic/academic jargon: Prefer everyday equivalents. E.g., in Russian: “демократическую политику” (democratic politics) is better than “демократическую выработку политики” (democratic policy-making process); “поддержать” (support) is better than “принять” (accept) when the meaning is about endorsing rather than receiving.
  • Read the sentence aloud: If a translated sentence would sound unnatural spoken aloud to an educated native speaker, rephrase it.
  • Sentence structure: Don’t mirror English syntax when the target language has different natural word order. Restructure sentences to flow naturally.
  • Consistency check: After translating, compare the translated paragraph against the English source and ask: “Does this convey the same meaning with the same tone, without any phrase that would make a native speaker pause?”
  • First sentence of bio pages is LOCKED: The opening sentence of each translated front page (index.md) uses a deliberately simplified form — “professor of migration at the University of Notre Dame” in the native language. Do NOT replace this with a literal translation of the English source (which mentions the Keough School). The simplified form is intentional. Only change it if the user explicitly asks. The pattern is: [Name] [is a] professor of migration at [University of Notre Dame in native language].
    • ES: “Alexander Kustov es profesor de migración en la Universidad de Notre Dame.”
    • FR: “Alexander Kustov est professeur de migration à l’Université de Notre Dame.”
    • PT: “Alexander Kustov é professor de migração na Universidade de Notre Dame.”
    • DE: “Alexander Kustov ist Professor für Migration an der Universität Notre Dame.”
    • IT: “Alexander Kustov è professore di migrazioni presso la University of Notre Dame.”
    • PL: “Alexander Kustov jest profesorem migracji na Uniwersytecie Notre Dame.”
    • RU: “Александр Кустов — профессор миграции в Университете Нотр-Дам.”
    • JP: “アレクサンダー・クストフはノートルダム大学の移民研究の教授である。”
    • KO: “알렉산더 쿠스토프는 노트르담 대학교의 이민 연구 교수이다.”
    • TR: “Alexander Kustov, Notre Dame Üniversitesi’nde göç alanında profesördür.”
    • AR: “ألكسندر كوستوف أستاذ مشارك في كلية كيو للشؤون العالمية بجامعة نوتردام.”

Translation Rules

  • Translate title, subtitle, and body; keep all hyperlinks as English URLs
  • Translate footnote content but keep markers [^1], [^2]
  • Translate image alt text
  • Idioms: substantive equivalent, NOT literal translation
  • Name handling: “Alexander Kustov” in Latin script for all languages except Russian (Cyrillic: “Александр Кустов”, romanized “Aleksandr Kustov”) and Japanese (katakana: “アレクサンダー・クストフ”, romanized “Aleksanda Kusutofu”)
  • Preserve all markdown formatting exactly
  • Do NOT add or remove content
  • Do NOT translate proper nouns unless established translations exist
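The rule about keeping footnote markers can be checked mechanically. This sketch compares the [^n] labels in a source post and its translation as multisets, so a restructured sentence that moves a marker is still accepted; the function names are hypothetical and the regex covers only the simple kramdown footnote form shown above.

```python
# Sketch: verify that a translation keeps the same footnote markers as the
# English source. Order may change; the set and count of labels may not.
import re
from collections import Counter

# Matches [^1], [^2], [^note]; both references and definitions ([^1]: text).
FOOTNOTE_MARKER = re.compile(r"\[\^([^\]\s]+)\]")


def footnote_labels(markdown: str) -> Counter:
    """Count every footnote label that appears in the markdown text."""
    return Counter(FOOTNOTE_MARKER.findall(markdown))


def markers_preserved(source: str, translation: str) -> bool:
    """True if both texts use exactly the same labels the same number of times."""
    return footnote_labels(source) == footnote_labels(translation)
```

A False result usually means a footnote definition or reference was dropped or renumbered during translation.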

Publications & Media Translation Format

When translating article/piece titles in publications.md and media.md, use two separate <a> tags — one for the translated title, one for the English original in parentheses:

CORRECT (two links):

<a href="URL">Translated Title</a> (<a href="URL">English Title</a>)

WRONG (single link — causes BiDi rendering issues, especially in Arabic RTL):

<a href="URL">Translated Title (English Title)</a>
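As an illustration of the two-link rule, the sketch below builds the correct markup and flags the single-link form. It assumes both links share the same URL, as the pattern above suggests; the function name and the regex are hypothetical, and the detector is a heuristic that also flags any link whose text legitimately ends in a parenthetical.

```python
# Sketch: generate the two-link title markup and detect the single-link form.
import re
from html import escape


def title_links(url: str, translated: str, english: str) -> str:
    """Return the translated title and the English original as separate links."""
    href = escape(url)
    return (
        f'<a href="{href}">{escape(translated)}</a> '
        f'(<a href="{href}">{escape(english)}</a>)'
    )


# Heuristic: an <a> element whose text ends with "(...)" just before </a>.
SINGLE_LINK = re.compile(r"<a\b[^>]*>[^<]*\([^<)]*\)\s*</a>")


def has_single_link_title(html: str) -> bool:
    """True if the HTML contains a link that wraps 'Title (English Title)'."""
    return SINGLE_LINK.search(html) is not None
```

Running has_single_link_title over publications.md and media.md in each language directory gives a quick pre-commit check for the WRONG form.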

For publications:

  • Translate article titles; keep author names, journal names, volume/page numbers in English
  • Translate filter button labels and group labels
  • Translate abstract text and resource labels (“Final Draft” → “Versión Final”, etc.)
  • Keep DOIs, URLs, and all data-* attributes unchanged

For media:

  • Translate card titles with English original in parentheses (two <a> tags as above)
  • Translate filter labels (Topic/Format) and filter button text
  • Translate format tags (Interview, Op-ed, Analysis, etc.)
  • Keep outlet names, dates, favicon URLs unchanged

Arabic (RTL) Specific Rules

Arabic pages use dir="rtl" on <html>. Special CSS in assets/css/custom.css handles:

  • Sidebar: Flipped to right side (profile pic, book cover on right)
  • Events sidebar: Flipped to LEFT side with left: 0; right: auto. The content area gets margin-left: 230px via :has(.events-sidebar) to avoid overlap
  • Publication citations: direction: ltr; unicode-bidi: isolate to prevent BiDi scrambling of mixed English/Arabic text
  • Abstracts: Kept in native RTL
  • Media cards: Title in RTL, metadata isolated in LTR
  • Article-accordion (about page): Citations isolated as LTR, abstracts RTL, expand icon on left
  • Book page: Cover floats left, descriptions right-aligned, review borders on right side
  • Navigation/masthead: RTL direction

Arabic content preferences (these apply to Arabic only, NOT other languages):

  • Publications page: Arabic title on its own line above, then the regular English citation below (same format as the English publications page). The <span class="ar-title"> sits outside/before the <span class="pub-citation">.
  • About/front page: No “select articles” section. The Arabic about page has only the bio text, no article accordion.
  • Book page & press links: Use Arabic transliterations for organization names (e.g., “أمازون” not “Amazon”, “فورين أفيرز” not “Foreign Affairs”) to avoid BiDi misalignment. Reduce English title/publisher font size.
  • Ongoing research page: Arabic section headings (right-aligned), but paper titles and author names stay in English (left-aligned).
  • English content alignment: All English publication content (citations, resources, media mentions) is LEFT-aligned on Arabic pages, per W3C RTL guidelines and Arab academic journal conventions (foreign refs are left-aligned, Arabic refs are right-aligned). Arabic text (titles, abstracts, headings) remains right-aligned.
  • Name, position, and events are NOT translated into Arabic (or any language).

When adding new content to Arabic pages:

  • Use Arabic transliterations for organization/publication names wherever possible to minimize BiDi issues
  • For mixed Arabic/English text in headings, use dir="rtl" on the container and dir="ltr" on English <a> tags
  • Put email addresses on a separate line (<br>) to avoid BiDi mixing with Arabic text
  • Test rendering of mixed-direction text — periods, question marks, and parentheses can get misplaced
  • Keep the expand/collapse icon on the left side (CSS handles this via summary::after position swap)
  • ALWAYS visually verify Arabic pages before committing — BiDi issues are not visible in source code

Front Matter Template for Newsletter Posts

---
layout: newsletter-post
title: "TRANSLATED TITLE"
subtitle: "TRANSLATED SUBTITLE"
date: YYYY-MM-DD
permalink: /{lang}/newsletter/{slug}/
lang: {lang}
ref: newsletter-{slug}
original_url: https://alexanderkustov.substack.com/p/{slug}
image: https://substackcdn.com/image/fetch/...
author_profile: false
---
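The template above can be filled in programmatically. This is a minimal sketch, not the project's tooling: the function names are hypothetical, and quoting relies on the fact that a JSON string is also a valid YAML double-quoted scalar, which keeps titles containing quotation marks well-formed.

```python
# Sketch: render newsletter-post front matter from the template fields.
import json


def yaml_quoted(value: str) -> str:
    """Quote a string as a YAML double-quoted scalar (JSON string syntax)."""
    return json.dumps(value, ensure_ascii=False)


def newsletter_front_matter(
    lang: str, slug: str, title: str, subtitle: str, date: str, image: str
) -> str:
    """Return the front matter block for one translated newsletter post."""
    lines = [
        "---",
        "layout: newsletter-post",
        f"title: {yaml_quoted(title)}",
        f"subtitle: {yaml_quoted(subtitle)}",
        f"date: {date}",
        f"permalink: /{lang}/newsletter/{slug}/",
        f"lang: {lang}",
        f"ref: newsletter-{slug}",
        f"original_url: https://alexanderkustov.substack.com/p/{slug}",
        f"image: {image}",
        "author_profile: false",
        "---",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders mirroring the template.
    print(newsletter_front_matter(
        "de", "welcome-to-popular-by-design", "TRANSLATED TITLE",
        "TRANSLATED SUBTITLE", "YYYY-MM-DD",
        "https://substackcdn.com/image/fetch/...",
    ))
```

Deriving permalink, ref, and original_url from a single slug avoids the three fields drifting apart.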

CJK Read Time

The newsletter-post layout has special handling for Japanese read time using number_of_words: "cjk" with a 500 chars/min reading speed (vs 160 words/min for non-CJK).
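The arithmetic behind the two reading speeds can be sketched as follows. Only the 500 chars/min and 160 words/min figures come from the layout description; rounding up, the one-minute minimum, and ignoring whitespace when counting CJK characters are assumptions, so the layout's actual output may differ.

```python
# Sketch of the read-time arithmetic: 500 chars/min for CJK text versus
# 160 words/min otherwise. Rounding rules here are assumptions.
import math
import re

CJK_CHARS_PER_MIN = 500
WORDS_PER_MIN = 160


def read_minutes(text: str, cjk: bool) -> int:
    """Estimate reading time in whole minutes (rounded up, at least 1)."""
    if cjk:
        # CJK text has no word spacing, so count non-whitespace characters.
        units = len(re.sub(r"\s+", "", text))
        speed = CJK_CHARS_PER_MIN
    else:
        units = len(text.split())
        speed = WORDS_PER_MIN
    return max(1, math.ceil(units / speed))


if __name__ == "__main__":
    print(read_minutes("word " * 3200, cjk=False))  # 3200 words / 160 = 20
    print(read_minutes("あ" * 5000, cjk=True))       # 5000 chars / 500 = 10
```

Counting words in Japanese text would give a badly underestimated figure, which is why the character-based branch exists.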

Common Pitfalls

  1. Diacritics loss – Always verify UTF-8 characters after translation. Some agents output ASCII-safe substitutes.
  2. Jekyll _ directories – Directories starting with _ are not processed as pages by default (except predefined ones like _posts). The _translations/ directory is intentionally excluded.
  3. Substack CDN images – Hero images use Substack CDN URLs. These may break if Substack changes their CDN structure.
  4. Chrome auto-translate – When testing non-English pages locally, Chrome may auto-translate them back to English, making it appear that translations are broken when they’re actually fine. Check the actual HTML source.
  5. Language switcher – Dropdown in masthead and inline switcher both read from site.languages in _config.yml (centralized). Each page must have both lang and ref fields in front matter for the switcher to appear.
  6. hreflang mapping – The site’s _includes/hreflang.html maps jp to ja for proper SEO. The include is generic: it iterates over all pages with a matching ref.
  7. Bio style – All non-English bios use third person (“Alexander Kustov is…”); English uses first person (“I am…”).
  8. Book title format – Use “double parentheses” style: Title (Translated Title) (Publisher, Year).
  9. AI translation disclosure – Footer includes disclosure for non-EN pages. Newsletter posts also have individual disclosure.
  10. BiDi rendering – Arabic (RTL) pages require two separate <a> tags for translated+English titles. Putting both in one <a> tag causes BiDi scrambling of punctuation, spaces, and reading order.
  11. Front matter lang and ref – EVERY page (including English source pages) MUST have lang: and ref: in front matter, or the language switcher won’t appear. English pages use lang: en.
  12. Events sidebar – Only shown on pages with show_events: true in front matter (currently only about/index pages).
  13. Navigation order – Media tab comes before Ongoing Research in all language nav blocks.
  14. Pre-commit verification – ALWAYS preview RTL/Arabic pages visually (screenshot or local serve) before committing. BiDi issues are invisible in source code and can only be caught by rendering the page.
  15. Email in RTL context – English email addresses (akustov [at] nd [dot] edu) should be on a separate line in Arabic pages to prevent BiDi mixing with surrounding Arabic text.
  16. Book page RTL – Book cover image floats LEFT (not right) on Arabic pages. Review text borders go on the RIGHT side. Table alignments are flipped.
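Pitfalls 5 and 11 (missing lang or ref) are easy to catch before a build. The sketch below scans the front matter block with plain string handling rather than a YAML parser, so it only recognizes top-level key: value lines; the function names are hypothetical.

```python
# Sketch: report which of the language-switcher fields (lang, ref) are
# missing from a page's front matter. Uses a simple line scan, not a YAML parser.

REQUIRED_FIELDS = ("lang", "ref")


def front_matter_keys(text: str) -> set[str]:
    """Return top-level keys found between the opening and closing '---'."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front matter; the page body is not scanned
        # Skip nested values, list items, and comments.
        if ":" in line and not line.startswith((" ", "\t", "#", "-")):
            keys.add(line.split(":", 1)[0].strip())
    return keys


def missing_switcher_fields(text: str) -> list[str]:
    """Return the required fields that the page does not define."""
    keys = front_matter_keys(text)
    return [field for field in REQUIRED_FIELDS if field not in keys]
```

A page returning an empty list has both fields; anything else means the language switcher will not appear on that page.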

Seamless Translation Update Workflow

When the user adds or modifies content on the English website and wants translations updated:

IMPORTANT: Always use the 3-Agent Voting workflow (see above) for any translation work, including small text changes on the front page. A single-pass translation frequently produces awkward phrasing that native speakers would notice.

Step 1: Identify Changes

  • Compare the updated English file with its translated counterparts
  • Identify what’s new, modified, or removed

Step 2: Update All 11 Languages

  • Apply the same change to all {lang}/ versions
  • For new entries (e.g., new publication, new media card):
    • Translate the title and any translatable text
    • Use two separate <a> tags: <a href="URL">Translated Title</a> (<a href="URL">English Title</a>)
    • Keep author names, journal names, URLs, data-* attributes unchanged
    • Copy the exact same HTML structure as the English original

Step 3: Verify

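  • Run diacritics checks for FR, DE, ES, PL, PT (the validation commands above cover DE, FR, and ES; for PL and PT, check against the character lists under Language-specific checks)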
  • For Arabic: verify two-<a>-tag structure is used (NOT single-link)
  • Build site: bundle exec jekyll build
  • Check that line counts match across languages (media files should all be roughly the same length)
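The line-count comparison in Step 3 can be sketched as a small script. It collects the line count of one page per language and reports languages that deviate from the median; the 10% tolerance and the function names are assumptions chosen for illustration, not project settings.

```python
# Sketch: compare the line count of one page (e.g. media.md) across language
# directories and flag outliers. The 10% tolerance is an assumed threshold.
import statistics
from pathlib import Path


def line_counts(root: Path, page: str, langs: list[str]) -> dict[str, int]:
    """Return {lang: number of lines} for each language where the page exists."""
    counts = {}
    for lang in langs:
        path = root / lang / page
        if path.is_file():
            counts[lang] = len(path.read_text(encoding="utf-8").splitlines())
    return counts


def outliers(counts: dict[str, int], tolerance: float = 0.1) -> list[str]:
    """Return languages whose count differs from the median by more than tolerance."""
    if not counts:
        return []
    median = statistics.median(counts.values())
    return sorted(
        lang for lang, n in counts.items() if abs(n - median) > tolerance * median
    )
```

For example, outliers(line_counts(Path("."), "media.md", ["es", "fr", "de"])) would list any language whose media page likely missed an entry.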

Step 4: Update Navigation (if new pages added)

  • Add entries to ALL 12 main-* blocks in _data/navigation.yml
  • Add lang: and ref: to English source file front matter
  • Add show_events: true if the page should show events sidebar

Quick Reference: Pages Per Language

Each non-EN language directory ({lang}/) should contain:

  • index.md – Homepage/about (with show_events: true)
  • book.md – Book page
  • newsletter.md – Newsletter index
  • newsletter/{slug}.md – 13 individual newsletter posts
  • publications.md – Published research
  • media.md – Media engagement
  • ongoing-research.md – Ongoing research
  • cv.md – CV page
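The list above translates directly into a completeness check. This sketch builds the expected paths for one language and reports those that are missing; the function names are hypothetical, and the slugs are expected to come from the Substack Posts table.

```python
# Sketch: list pages that a language directory should contain but does not.
from pathlib import Path

TOP_LEVEL_PAGES = (
    "index.md",
    "book.md",
    "newsletter.md",
    "publications.md",
    "media.md",
    "ongoing-research.md",
    "cv.md",
)


def expected_pages(lang: str, slugs: list[str]) -> list[Path]:
    """Return relative paths of all pages expected under {lang}/."""
    pages = [Path(lang) / name for name in TOP_LEVEL_PAGES]
    pages += [Path(lang) / "newsletter" / f"{slug}.md" for slug in slugs]
    return pages


def missing_pages(root: Path, lang: str, slugs: list[str]) -> list[Path]:
    """Return expected pages that do not exist under root."""
    return [page for page in expected_pages(lang, slugs) if not (root / page).is_file()]
```

With 13 slugs, each language should have 20 files: 7 top-level pages plus 13 newsletter posts.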

File Counts

  • 13 newsletter source posts in _translations/source/
  • 143 translated newsletter posts (13 posts × 11 languages) in {lang}/newsletter/
  • 11 translated newsletter index pages in {lang}/newsletter.md
  • 12 homepage variants (EN + 11 translations)
  • 12 book page variants
  • 11 publications pages, 11 media pages, 11 ongoing-research pages, 11 CV pages
  • Navigation entries in _data/navigation.yml for all 12 languages

Substack Posts (13 total)

| Slug | Title | ~Words |
|------|-------|--------|
| academics-need-to-wake-up-on-ai-part | Academics Need to Wake Up on AI, Part II | 3,500 |
| academics-need-to-wake-up-on-ai | Academics Need to Wake Up on AI | 4,000 |
| western-countries-do-not-need-immigration | Western Countries Do Not “Need” Immigration | 3,200 |
| student-migration-is-popularuntil | What’s the Matter with Foreign Students? | 3,000 |
| reflections-on-the-uncomfortable | Reflections on “The Uncomfortable Truths” | 3,500 |
| the-uncomfortable-truths-about-immigration | The Uncomfortable Truths About Immigration | 8,500 |
| immigration-is-not-a-thing-that-has | Immigration Is Not One Thing That Has Effects | 3,200 |
| why-japan-is-so-uncanny-uncannily | Why Japan Is So Uncanny… Uncannily Normal | 3,500 |
| the-immigration-substack-universe | The Immigration Substack Universe | 2,500 |
| do-people-like-refugees-more-than | Do People Like Refugees more than Economic Immigrants? | 3,000 |
| why-dont-you-house-them-yourself | “Why Don’t You House Them Yourself?” | 3,000 |
| why-skilled-migration-is-popular | Why Skilled Migration Is Popular | 3,000 |
| welcome-to-popular-by-design | Welcome to “Popular by Design” | 800 |