Libsyn Directory

Analysing Robots.txt at scale with HTTP Archive and BigQuery - transcript

Search Off the Record

In this episode of Search Off the Record, Martin and Gary turn a simple robots.txt question into a data‑driven deep dive using HTTP Archive, WebPageTest, custom JavaScript metrics, and BigQuery. They explore how millions of real robots.txt files are actually written in 2025–2026, which directives and user‑agents are most common, and what that means for modern crawling and AI bots. Aimed at beginner to mid‑level developers and SEOs, the episode shows how large‑scale web measurement works (HTTP Archive, Chrome UX Report, Web Almanac), and how to turn raw crawl data into actionable SEO...

Are websites getting “fat”? Page weight, HTML size & Googlebot limits explained - transcript

Search Off the Record

In this episode of Search Off the Record, Gary and Martin dig into what “page size” and “page weight” actually mean for developers, users, and search engines. They discuss how web page sizes have ballooned (the 2025 Web Almanac puts the median mobile homepage at 2.3 MB, up 3x from 2015), how page weight is defined, Googlebot's crawl limits, HTML bloat from structured data and images, and why size still hurts UX on slow connections despite faster networks. If you build or maintain websites, this conversation will help you rethink how much data your pages ship, where bloat really...

Google crawlers behind the scenes - transcript

Search Off the Record

Developers often talk about Googlebot as if it were a single program you could just run as “googlebot.exe”, but that is not how Google’s crawling actually works. In this episode of Search Off the Record, Martin and Gary from the Search Relations team unpack how Google’s crawling infrastructure is really built and operated. They cover why “Googlebot” is a misnomer and how it relates to a central crawling software-as-a-service used by many Google products, how crawl behavior is controlled centrally to avoid overwhelming sites (throttling, handling 503s, and “don’t break the...

How Browsers Really Parse HTML (and What That Means for SEO) - transcript

Search Off the Record

Martin and Gary unpack how HTML parsing really works, why the HTML standard is so lenient, and how messy markup can silently break key SEO signals like hreflang and rel=canonical. They revisit validators and cross‑browser hacks from the Netscape/IE days, and discuss whether semantic HTML and strict validity truly matter for search. You’ll also hear when link hints like preload, prefetch, and DNS prefetch help performance (and indirectly SEO), and where meta and link tags really belong. Resources: HTML Living Standard → https://html.spec.whatwg.org/ Episode transcript →...

Do You Still Need a Website in 2026? (Transcript)

Search Off the Record

In this episode of Search Off the Record, Martin and Gary from the Google Search Relations team tackle a deceptively simple question: do you still need a website in 2026? Starting from the recurring industry claim that “the web is dead,” they explore how the web has evolved through the rise of apps, AI chatbots, and social platforms, and why the answer almost always ends up being “it depends.” Tune in for an engaging discussion on how websites remain relevant and what it means for content creation and discovery.
