SEO Engineering on a Static Site — Structured Data, Social Cards, and Crawler Signals

holas.pl scores 100 in Lighthouse's SEO category. What that actually checks: meta title is present, meta description is present, canonical URL is set, links are crawlable, the page is mobile-friendly. These are baseline checks: the things that hurt indexing, or how a result is presented, when they are missing.

What Lighthouse SEO doesn't check: whether your structured data is complete, how your page renders as a social card, what feed readers see when they subscribe, whether Google can find and index your images without crawling every page.

This post covers the implementation layer beneath that score.

Structured Data

Structured data is JSON-LD in a <script type="application/ld+json"> block. It tells search engines what a page is, not just what it says. holas.pl uses four schema types.

WebSite + SearchAction

Every page carries a WebSite schema identifying the site and its search endpoint:

{
    "@context": "https://schema.org",
    "@type": "WebSite",
    "name": "holas.pl",
    "url": "https://holas.pl",
    "author": {
        "@type": "Person",
        "name": "Paweł Holik",
        "url": "https://holas.pl"
    },
    "potentialAction": {
        "@type": "SearchAction",
        "target": {
            "@type": "EntryPoint",
            "urlTemplate": "https://holas.pl/search/?q={search_term_string}"
        },
        "query-input": "required name=search_term_string"
    }
}

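The potentialAction was originally the markup behind the Google sitelinks search box, a search input shown directly in the Google result for the site. Google retired that feature in November 2024, so the field no longer changes how the result looks there. It is still valid schema.org, it maps to the Pagefind-powered search at /search/, and it costs one extra field on an existing schema.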
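As a rough illustration of how a schema like this can be generated rather than hand-written, here is a minimal Python sketch. The site itself renders this in Twig; the function names website_schema and json_ld_script are hypothetical and exist only for this example.

```python
import json


def website_schema(site_url: str, name: str, author: str,
                   search_path: str = "/search/") -> dict:
    """Build a WebSite schema with a SearchAction for the site search."""
    base = site_url.rstrip("/")
    return {
        "@context": "https://schema.org",
        "@type": "WebSite",
        "name": name,
        "url": base,
        "author": {"@type": "Person", "name": author, "url": base},
        "potentialAction": {
            "@type": "SearchAction",
            "target": {
                "@type": "EntryPoint",
                # {search_term_string} is a literal placeholder for the
                # consumer to fill in, not a Python format field.
                "urlTemplate": base + search_path + "?q={search_term_string}",
            },
            "query-input": "required name=search_term_string",
        },
    }


def json_ld_script(schema: dict) -> str:
    """Serialize a schema into a script element that is safe to inline."""
    payload = json.dumps(schema, ensure_ascii=False, indent=4)
    # Escape "</" so a value containing "</script>" cannot close the element
    # early. "\/" is a valid JSON escape for "/".
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


print(json_ld_script(website_schema("https://holas.pl", "holas.pl", "Paweł Holik")))
```

Building the dict first and serializing once keeps quoting and escaping out of the template, which is where most broken JSON-LD tends to come from.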

BlogPosting

Blog posts carry the richest schema. Beyond headline, description, url, and datePublished, several fields matter for how Google represents the content:

inLanguage — "en" or "pl", needed for multilingual indexing
wordCount — computed at parse time by ContentItem::wordCount() (strips HTML tags, counts tokens)
articleSection — the post category
keywords — the post tags as a comma-separated string
image — a nested ImageObject with url, width, and height

{
    "@type": "BlogPosting",
    "headline": "Post title",
    "inLanguage": "en",
    "wordCount": 842,
    "articleSection": "tutorials",
    "keywords": "seo, symfony, static-site",
    "image": {
        "@type": "ImageObject",
        "url": "https://holas.pl/media/post-dir/featured.webp",
        "width": 1280,
        "height": 720
    }
}

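Google accepts a plain image URL in this field, but a nested ImageObject states the dimensions explicitly instead of leaving them to be discovered after a fetch. Large preview cards in Google Discover and Search also depend on the max-image-preview:large robots setting covered below and on the image being big enough; Google recommends at least 1200 px wide.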
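To make the wordCount and keywords fields concrete, here is a small Python sketch of the same idea: strip tags, count tokens, join tags with commas. It is not the site's ContentItem::wordCount() (that is PHP); the post dict and its keys are assumptions made for the example.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def word_count(html: str) -> int:
    """Strip HTML tags, then count whitespace-separated tokens."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Joining with spaces keeps "</p><p>" from fusing two words into one.
    return len(" ".join(collector.parts).split())


def blog_posting_schema(post: dict) -> dict:
    """Build a BlogPosting schema from a post dict (hypothetical keys)."""
    return {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": post["title"],
        "inLanguage": post["lang"],
        "wordCount": word_count(post["html"]),
        "articleSection": post["category"],
        "keywords": ", ".join(post["tags"]),
        "image": {
            "@type": "ImageObject",
            "url": post["image_url"],
            "width": post["image_width"],
            "height": post["image_height"],
        },
    }
```

The width and height come from the post data rather than constants, so the schema cannot drift from the actual file as long as the frontmatter is right.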

BreadcrumbList

Google can replace the raw URL in search results with breadcrumb navigation — "Home / Blog / tutorials / Post Title". This requires BreadcrumbList schema.

It's rendered in breadcrumb.html.twig alongside the HTML nav. Each crumb is a ListItem with a position, a name, and an item (the URL). The last item, the current page, has a name but no URL:

{
    "@type": "BreadcrumbList",
    "itemListElement": [
        { "@type": "ListItem", "position": 1, "name": "Home", "item": "https://holas.pl/" },
        { "@type": "ListItem", "position": 2, "name": "Blog", "item": "https://holas.pl/blog/" },
        { "@type": "ListItem", "position": 3, "name": "tutorials", "item": "https://holas.pl/blog/tutorials/" },
        { "@type": "ListItem", "position": 4, "name": "Post Title" }
    ]
}
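The position numbering and the "no URL on the last crumb" rule are easy to get wrong by hand, so here is a minimal Python sketch of generating the list from a trail of (name, url) pairs. The function name and the tuple shape are assumptions for illustration; the site does this in Twig.

```python
from typing import List, Optional, Tuple


def breadcrumb_schema(crumbs: List[Tuple[str, Optional[str]]]) -> dict:
    """Build a BreadcrumbList; a crumb with url=None is emitted without "item"."""
    items = []
    for position, (name, url) in enumerate(crumbs, start=1):
        entry = {"@type": "ListItem", "position": position, "name": name}
        if url is not None:
            entry["item"] = url
        items.append(entry)
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


trail = [
    ("Home", "https://holas.pl/"),
    ("Blog", "https://holas.pl/blog/"),
    ("tutorials", "https://holas.pl/blog/tutorials/"),
    ("Post Title", None),  # current page: name only
]
schema = breadcrumb_schema(trail)
```

Deriving positions from enumerate means inserting or removing a level never leaves a gap in the numbering.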

CollectionPage + ItemList

Category, tag, and archive listing pages carry CollectionPage with a nested ItemList. Each entry has a position and url. The schema is only rendered when the listing has posts, so an empty category page doesn't get it.

{
    "@type": "CollectionPage",
    "name": "tutorials | holas.pl",
    "mainEntity": {
        "@type": "ItemList",
        "numberOfItems": 5,
        "itemListElement": [
            { "@type": "ListItem", "position": 1, "url": "https://holas.pl/blog/post-name/" }
        ]
    }
}
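The "skip it when the listing is empty" rule can be sketched in a few lines of Python. The function name is hypothetical, and returning None to mean "render nothing" is one possible convention, not necessarily what the Twig template does.

```python
from typing import List, Optional


def collection_page_schema(title: str, post_urls: List[str]) -> Optional[dict]:
    """Build CollectionPage + ItemList, or None when there is nothing to list."""
    if not post_urls:
        return None  # empty listing: emit no schema at all
    return {
        "@context": "https://schema.org",
        "@type": "CollectionPage",
        "name": title,
        "mainEntity": {
            "@type": "ItemList",
            # Derived from the list so the count can never disagree with it.
            "numberOfItems": len(post_urls),
            "itemListElement": [
                {"@type": "ListItem", "position": position, "url": url}
                for position, url in enumerate(post_urls, start=1)
            ],
        },
    }
```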

Social Sharing

OpenGraph Image Dimensions

Without og:image:width and og:image:height, platforms like LinkedIn and Slack must fetch the image before rendering the preview card. With them, the card renders immediately:

<!-- blog post (WebP, 1280×720) -->
<meta property="og:image:width" content="1280">
<meta property="og:image:height" content="720">
<meta property="og:image:type" content="image/webp">

<!-- other pages (default og:image, JPG, 1200×630) -->
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<meta property="og:image:type" content="image/jpeg">

The conditional is in base.html.twig: if a content object with an image is defined (blog post or page with featured image), use the WebP dimensions; otherwise use the defaults for og-default.jpg. The JPG exception exists because og:image is consumed by external crawlers that don't reliably support WebP.
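The conditional described above boils down to "use the content's image metadata if there is one, otherwise the defaults". A minimal Python sketch of that logic follows; OgImage and og_image_meta are names invented for the example, and the default values are the ones quoted above for og-default.jpg.

```python
from html import escape
from typing import NamedTuple, Optional


class OgImage(NamedTuple):
    width: int
    height: int
    mime_type: str


# Dimensions and type of the site-wide fallback image (og-default.jpg).
DEFAULT_OG_IMAGE = OgImage(1200, 630, "image/jpeg")


def og_image_meta(featured: Optional[OgImage]) -> str:
    """Render the og:image dimension tags, falling back to the defaults."""
    image = featured if featured is not None else DEFAULT_OG_IMAGE
    props = {
        "og:image:width": str(image.width),
        "og:image:height": str(image.height),
        "og:image:type": image.mime_type,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}">'
        for name, value in props.items()
    )


print(og_image_meta(OgImage(1280, 720, "image/webp")))  # blog post
print(og_image_meta(None))                              # any other page
```

Keeping width, height, and type in one object makes it hard to update one of them and forget the others.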

Twitter/X Card

The base twitter:card type was already present. Three explicit fields were added:

<meta name="twitter:title" content="...">
<meta name="twitter:description" content="...">
<meta name="twitter:image" content="...">

Without them, Twitter/X falls back to OG properties. Explicit meta removes that dependency — if OG processing has any issue, Twitter Card still has the correct values.

article:* Meta

Blog posts get article-specific OG meta in post.html.twig's og_article_meta block:

<meta property="article:published_time" content="2026-06-21T00:00:00+00:00">
<meta property="article:modified_time" content="2026-06-21T00:00:00+00:00">
<meta property="article:author" content="Paweł Holik">
<meta property="article:section" content="tutorials">
<meta property="article:tag" content="seo">
<meta property="article:tag" content="symfony">
<meta property="article:tag" content="static-site">

article:tag is one element per tag — not comma-separated. The Open Graph spec requires separate elements for multi-value properties.
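The one-element-per-tag rule is simple to express in code. This Python sketch is only an illustration with a hypothetical article_meta function; it builds the fixed properties first and then appends one article:tag pair per tag.

```python
from datetime import datetime, timezone
from html import escape
from typing import List


def article_meta(published: datetime, modified: datetime, author: str,
                 section: str, tags: List[str]) -> str:
    """Render article:* meta tags, one article:tag element per tag."""
    pairs = [
        ("article:published_time", published.isoformat()),
        ("article:modified_time", modified.isoformat()),
        ("article:author", author),
        ("article:section", section),
    ]
    # Multi-value property: repeat the element instead of joining with commas.
    pairs.extend(("article:tag", tag) for tag in tags)
    return "\n".join(
        f'<meta property="{prop}" content="{escape(value, quote=True)}">'
        for prop, value in pairs
    )


day = datetime(2026, 6, 21, tzinfo=timezone.utc)
print(article_meta(day, day, "Paweł Holik", "tutorials",
                   ["seo", "symfony", "static-site"]))
```

Using timezone-aware datetimes is what produces the +00:00 offset in the output; a naive datetime would serialize without it.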

Feed Readers

content:encoded

The default RSS <description> contains only the post excerpt — first paragraph, HTML stripped. content:encoded carries the full post HTML in a CDATA block:

<content:encoded><![CDATA[<p>Full post content...</p>]]></content:encoded>

This requires xmlns:content="http://purl.org/rss/1.0/modules/content/" on the root <rss> element. Feed readers like NetNewsWire, Reeder, and Feedbin render content:encoded inline — subscribers read the full article without leaving their reader.
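One detail worth handling when wrapping post HTML in CDATA: the sequence "]]>" inside the content would end the section early. A common workaround is to split it across two CDATA sections. The Python sketch below shows that technique; the helper names are made up for the example, and whether the site's feed template guards against this case is not something the post states.

```python
def cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting any "]]>" it contains."""
    # "]]>" cannot appear inside CDATA, so end the section after "]]" and
    # start a new one before ">". A parser joins the two parts back together.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def content_encoded(post_html: str) -> str:
    """Render the full post HTML as a content:encoded element."""
    return "<content:encoded>" + cdata(post_html) + "</content:encoded>"


print(content_encoded("<p>Full post content...</p>"))
```

A post about XML or JavaScript can easily contain "]]>" in a code sample, which is why the split is worth the extra line.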

category and media:content

Each RSS item gets <category> elements for the post category and each tag:

<category>tutorials</category>
<category>seo</category>
<category>symfony</category>

media:content attaches the featured image as a typed media attachment:

<media:content url="https://holas.pl/media/post-dir/featured.webp"
               medium="image" type="image/webp" width="1280" height="720"/>

Feed readers that render inline images (Feedly, Inoreader) use this for the post thumbnail in the feed list. This requires xmlns:media="http://search.yahoo.com/mrss/" on the <rss> element.
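For comparison with the template approach, here is a short Python sketch that adds the same elements with the standard library's ElementTree, which takes care of the namespace declaration. The function name add_item_extras is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

MEDIA_NS = "http://search.yahoo.com/mrss/"
# Ask ElementTree to serialize this namespace with the "media" prefix.
ET.register_namespace("media", MEDIA_NS)


def add_item_extras(item: ET.Element, category: str, tags: List[str],
                    image_url: str, width: int, height: int,
                    mime_type: str) -> None:
    """Append <category> elements and a media:content element to an RSS item."""
    for label in [category, *tags]:
        ET.SubElement(item, "category").text = label
    ET.SubElement(item, "{%s}content" % MEDIA_NS, {
        "url": image_url,
        "medium": "image",
        "type": mime_type,
        "width": str(width),
        "height": str(height),
    })


item = ET.Element("item")
add_item_extras(item, "tutorials", ["seo", "symfony"],
                "https://holas.pl/media/post-dir/featured.webp",
                1280, 720, "image/webp")
xml = ET.tostring(item, encoding="unicode")
print(xml)
```

ElementTree writes the xmlns:media declaration on whichever element is the root of the serialized tree, so in a full feed it ends up on the <rss> element as required.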

Crawling Signals

max-image-preview:large

The default robots behavior limits image previews in Google Search and Discover to standard size. max-image-preview:large opts into full-size previews. Combined with max-snippet:-1 (no restriction on text snippet length), this is the default robots meta on every page:

<meta name="robots" content="max-image-preview:large, max-snippet:-1">

This is implemented as the default {% block robots %} in base.html.twig. Child templates override the block for pages that shouldn't be indexed: coming-soon pages use noindex, nofollow, and the search page uses noindex.
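The default-plus-override pattern looks like this when written as a lookup instead of template blocks. This is a Python sketch only; the page-kind keys "coming-soon" and "search" are labels invented for the example, not identifiers from the site.

```python
from html import escape

DEFAULT_ROBOTS = "max-image-preview:large, max-snippet:-1"

# Hypothetical page kinds mapped to the overrides described above.
ROBOTS_OVERRIDES = {
    "coming-soon": "noindex, nofollow",
    "search": "noindex",
}


def robots_meta(page_kind: str = "") -> str:
    """Render the robots meta tag, using the default unless overridden."""
    content = ROBOTS_OVERRIDES.get(page_kind, DEFAULT_ROBOTS)
    return f'<meta name="robots" content="{escape(content, quote=True)}">'


print(robots_meta())          # regular page
print(robots_meta("search"))  # search page
```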

Image Sitemap

The standard sitemap lists page URLs. The image sitemap adds <image:image> blocks, giving Google direct visibility into image locations without crawling every page first. Each block can also carry a title taken from the image's alt text:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://holas.pl/blog/post-name/</loc>
        <image:image>
            <image:loc>https://holas.pl/media/post-dir/featured.webp</image:loc>
            <image:title>Alt text from image_alt frontmatter</image:title>
        </image:image>
    </url>
</urlset>

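image:title comes from the image_alt frontmatter field, the same text used in the HTML alt attribute. Google dropped support for image:title (along with image:caption, image:geo_location, and image:license) in 2022 and now reads only image:loc from this extension, so the title is harmless but carries no weight there. Both the xmlns:image namespace and the image:image block are in sitemap.xml.twig.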
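Here is a minimal Python sketch of producing the same structure with ElementTree, which guarantees well-formed output and correct namespace prefixes. The function name and the (page URL, image URL, alt text) tuple shape are assumptions for the example; the site generates its sitemap from a Twig template.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
# Serialize the sitemap namespace as the default and images as "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)

Page = Tuple[str, Optional[str], Optional[str]]  # page URL, image URL, alt text


def image_sitemap(pages: List[Page]) -> str:
    """Build a urlset where pages with a featured image get an image:image block."""
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page_url, image_url, alt in pages:
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        ET.SubElement(url, "{%s}loc" % SITEMAP_NS).text = page_url
        if image_url is None:
            continue  # page without a featured image: plain <url> entry
        image = ET.SubElement(url, "{%s}image" % IMAGE_NS)
        ET.SubElement(image, "{%s}loc" % IMAGE_NS).text = image_url
        if alt:
            # Optional: mirrors the template's image:title field.
            ET.SubElement(image, "{%s}title" % IMAGE_NS).text = alt
    return ET.tostring(urlset, encoding="unicode")


xml = image_sitemap([
    ("https://holas.pl/blog/post-name/",
     "https://holas.pl/media/post-dir/featured.webp",
     "Alt text from image_alt frontmatter"),
    ("https://holas.pl/search/", None, None),
])
print(xml)
```

Text content is escaped automatically, so an alt text containing "&" or "<" cannot break the sitemap.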

What Changed

The Lighthouse SEO score was 100 before these changes. It's still 100 after them. That score measures the technical floor: crawlability, meta tags, mobile-friendliness.

The changes above operate at a different level. Structured data shapes how search engines represent content in rich results. Explicit social meta ensures correct rendering without relying on platform fallback logic. RSS extensions let subscribers read full posts in their reader. The image sitemap gives Google image visibility without requiring a crawl of every page.

None of it is architecturally complex — most of it is Twig template additions and namespace declarations. The constraint is discipline: every field needs a real value from frontmatter, not a placeholder.

The architecture that makes all of this straightforward is covered in Part 2 of this series — the static generation pipeline that produces complete HTML for every page.
