Schema markup is a set of hidden labels you add to your website that tell machines — search engines, AI assistants, browsers — what your content actually means. Without it, a machine reading your page sees words. With it, it sees structure: this is a business, this is a person, this is a question and its answer. llms.txt is a plain-text file you put at the root of your site that serves as an index for AI engines — a curated list of your most important pages and what each one covers. Together, these two tools make your site dramatically easier for AI systems to read, understand, and cite.
Neither one is magic. But both are the kind of quiet infrastructure that separates a site an AI will confidently reference from one it will ignore. This post explains how each works, what types of schema actually matter for a small business, and what "AI-readable" really means in practice.
What schema markup is
Think of schema as a second layer of meaning on your page — one written for machines, not people. Your visitors read your homepage and understand you run a landscaping company in Austin. A machine reading raw HTML just sees text. Schema bridges that gap by wrapping your content in a standardized vocabulary, maintained by Schema.org, that every major search engine and AI platform has agreed to understand.
In practice, schema lives inside a <script type="application/ld+json"> block, usually in your page's head. It's a small JSON object that describes the entities on your page. It doesn't change what your visitors see. It changes what machines conclude about what they're looking at.
The payoff is concrete. Schema markup is the reason some Google results show star ratings, FAQ accordions, and breadcrumb trails while others show plain blue links. (Since 2023, Google has limited FAQ rich results mostly to well-known government and health sites, but the markup still tells any machine that reads it exactly where each question and answer begins.) It's also why some businesses appear in Google's Knowledge Panel (the box on the right side of a search result) and why AI tools can extract a clean, accurate summary of your business instead of guessing. The machine has to work much harder without it, and machines that have to guess will often guess wrong or skip you entirely.
Getting into Google's Knowledge Graph — the database of real-world entities that powers the Knowledge Panel and feeds AI answers — generally benefits from around 30 corroborating sources referencing the same entity. Wikidata, the open knowledge base, now feeds the Knowledge Graph directly, which is why having a Wikidata entry for your business or organization is worth more than it looks. Schema on your site is one signal in that web of corroboration; it's not sufficient on its own, but it's necessary.
The schema that matters for a small business
Schema.org defines hundreds of types. Most of them don't apply to a service business. Here are the five worth knowing:
- Organization. The foundational type — establishes that your site represents a real company with a name, URL, logo, and contact information. Everything else you add will link back to this.
- LocalBusiness (or a more specific subtype like Plumber or HVACBusiness). Adds your physical or service area, hours, and the signals Google needs to rank you in local results and the map pack.
- Article (or BlogPosting). Tells machines that a page is editorial content: who wrote it, when it was published, and what it's part of. These are the signals that make content citable.
- FAQPage. Marks up your questions and answers explicitly, so search engines can display them as a rich result where they support it, and AI engines can extract them cleanly without having to guess where the question ends and the answer begins.
- Person. Establishes a named individual — an author, a founder — with credentials and a consistent identity across the web. Strong authorship signals are increasingly tied to how AI tools evaluate source credibility.
A minimal Organization block looks like this:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://example.com/#organization",
  "name": "Summit Services",
  "url": "https://example.com"
}
</script>
That's a starting point: five fields, thirty seconds to fill in. In production you'd add your logo, address, sameAs links to your Google Business Profile and social accounts, and connect it to your LocalBusiness entity. But the principle is the same: a small, well-formed JSON object that lets machines confidently identify and describe you.
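To make the "in production" version more concrete, here is a minimal sketch in Python that builds a fuller LocalBusiness block and prints the script tag for it. Every value in it (address, phone number, logo path, profile URLs) is a placeholder, the missing_fields helper is a hypothetical convenience, and parentOrganization is only one reasonable way to tie the block back to the Organization entity.

```python
import json

# All values below are placeholders for illustration, not real business data.
local_business = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "@id": "https://example.com/#localbusiness",
    "name": "Summit Services",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    "telephone": "+1-512-555-0100",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "100 Example Street",
        "addressLocality": "Austin",
        "addressRegion": "TX",
        "postalCode": "78701",
        "addressCountry": "US",
    },
    "openingHoursSpecification": [
        {
            "@type": "OpeningHoursSpecification",
            "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
            "opens": "08:00",
            "closes": "17:00",
        }
    ],
    # Profiles that describe the same business elsewhere on the web.
    "sameAs": [
        "https://www.facebook.com/example",
        "https://www.linkedin.com/company/example",
    ],
    # Points at the Organization block by its @id instead of repeating it.
    "parentOrganization": {"@id": "https://example.com/#organization"},
}


def missing_fields(node, wanted=("name", "url", "address", "telephone", "sameAs")):
    """List the fields from `wanted` that the node omits or leaves empty."""
    return [key for key in wanted if not node.get(key)]


if __name__ == "__main__":
    print('<script type="application/ld+json">')
    print(json.dumps(local_business, indent=2))
    print("</script>")
```

Running missing_fields on a draft block before publishing is a cheap way to catch a forgotten address or an empty sameAs list.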
The real leverage comes from linking these types together. When your BlogPosting nodes reference an author that resolves to a Person node, and that person is connected to your Organization, you're building a graph — a web of corroborated facts — that AI systems treat as far more trustworthy than isolated, unconnected claims. A machine reading a chained graph can say "this article was written by this person who works for this verified organization." A machine reading a bare page can only guess.
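The chain described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed layout: the author name, post URL, headline, and date are hypothetical, and the two helper functions are made up for this example. The point is the pattern, where each node carries an @id and other nodes refer to it by that @id.

```python
import json

SITE = "https://example.com"  # placeholder domain

# Each node gets a stable @id so other nodes can point at it.
organization = {
    "@type": "Organization",
    "@id": f"{SITE}/#organization",
    "name": "Summit Services",
    "url": SITE,
}

person = {
    "@type": "Person",
    "@id": f"{SITE}/#jane-doe",  # hypothetical author
    "name": "Jane Doe",
    "worksFor": {"@id": organization["@id"]},
}

post = {
    "@type": "BlogPosting",
    "@id": f"{SITE}/blog/example-post/#article",  # hypothetical URL
    "headline": "Example post",
    "datePublished": "2025-01-15",
    "author": {"@id": person["@id"]},
    "publisher": {"@id": organization["@id"]},
}

graph = {"@context": "https://schema.org", "@graph": [organization, person, post]}


def dangling_references(doc):
    """Return every {"@id": ...} pointer that no node in the graph defines."""
    defined = {node["@id"] for node in doc["@graph"]}
    missing = []
    for node in doc["@graph"]:
        for value in node.values():
            if isinstance(value, dict) and set(value) == {"@id"}:
                if value["@id"] not in defined:
                    missing.append(value["@id"])
    return missing


def to_script_tag(doc):
    """Serialize the graph into the script block a page would embed."""
    # Escaping "</" keeps a stray "</script>" inside a string from ending the tag.
    body = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    assert dangling_references(graph) == []
    print(to_script_tag(graph))
```

If dangling_references returns anything, a node is pointing at an entity the page never defines, which is exactly the kind of broken link that turns a chained graph back into isolated claims.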
What llms.txt is and why it appeared
llms.txt is a convention that emerged in late 2024 as AI web crawlers became common. The idea is simple: you create a plain-text file at yourdomain.com/llms.txt that lists your most important pages, with a one-line description of each. AI engines that respect the convention use it as a starting point — a curated map of your content — rather than having to crawl and infer your site structure on their own.
It's modeled loosely on robots.txt (the file that tells search crawlers what to skip) and sitemap.xml (the file that lists all your pages for indexing), but its purpose is different. Where robots.txt controls access and sitemaps list everything, llms.txt is an editorial curation — you're telling AI engines which pages best represent your thinking, your expertise, and your answers. It's a signal of intent: here are the pages worth reading if you want to understand us.
There's no governing body behind llms.txt, no penalty for omitting it, and no guarantee that every AI crawler reads it. What it does is reduce the work an AI system has to do to extract a coherent picture of your business. Less work for the machine usually means more accurate, more favorable coverage. For a file that takes thirty minutes to create, the upside is disproportionately large.
The format is deliberately minimal. The proposal describes a short Markdown file: a title, a brief summary, then a heading for each section followed by a list of links with a short plain-sentence description of each. There is nothing to learn beyond headings and links. Some sites also publish a companion file at /llms-full.txt that includes the full text of their key pages, for engines that want to download content directly rather than follow links.
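For a sense of what the file looks like, here is a small Python sketch that renders one from a list of pages. It follows the layout in the published llms.txt proposal (a title line, a one-line summary in a blockquote, then sections of links). The Page class, the render_llms_txt function, and all page titles and URLs are hypothetical stand-ins for your own content.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str


def render_llms_txt(site_name, summary, sections):
    """Render an llms.txt body: title, blockquote summary, then link lists."""
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{p.title}]({p.url}): {p.note}" for p in pages]
    return "\n".join(lines) + "\n"


# Hypothetical site content, for illustration only.
sections = {
    "Services": [
        Page("Lawn care", "https://example.com/services/lawn-care",
             "What the weekly service includes and what it costs."),
    ],
    "Guides": [
        Page("Watering schedule", "https://example.com/blog/watering-schedule",
             "Month-by-month watering guidance for new lawns."),
    ],
}

if __name__ == "__main__":
    text = render_llms_txt(
        "Summit Services", "Landscaping company in Austin, Texas.", sections
    )
    print(text, end="")
```

Writing the file by hand works just as well. A script only starts to pay off once the list of pages changes often enough that the file drifts out of date.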
How they make you AI-readable
AI answers are built on extraction and corroboration. An AI engine looking for a source on, say, service business operations has to do two things: find pages that contain the right information, and decide which of those pages to trust enough to cite. Schema and llms.txt each address one half of that problem.
Schema makes extraction clean. When a machine can read your FAQPage markup and see that a specific block of text is a structured answer to a specific question, it doesn't have to guess. It knows what the text is, who wrote it, when it was last updated, and what organization stands behind it. That certainty is the difference between content that gets cleanly lifted into an AI answer and content that gets paraphrased into something unrecognizable — or skipped.
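As an illustration of how explicit that structure is, the sketch below builds FAQPage markup from plain question-and-answer pairs in Python. The faq_page function and the sample questions are hypothetical. The nesting (FAQPage, then Question, then acceptedAnswer) follows the Schema.org vocabulary.

```python
import json


def faq_page(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical questions, for illustration only.
pairs = [
    ("Do you offer free estimates?", "Yes, estimates are free for local jobs."),
    ("What areas do you serve?", "We serve Austin and the surrounding suburbs."),
]

if __name__ == "__main__":
    print(json.dumps(faq_page(pairs), indent=2))
```

Each question sits in its own node with its answer attached, so a machine never has to infer where one ends and the next begins.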
llms.txt makes discovery and prioritization easier. Instead of crawling your entire site and trying to infer what matters, an AI agent can read your index and go directly to the pages that represent your best work. That's particularly important for smaller sites that haven't accumulated enough inbound links to signal authority through traditional means. llms.txt lets you say "trust these pages" without waiting for the world to say it for you.
lift in how often a source was surfaced in AI answers when it added clear statistics to its content — formatting is leverage.
That stat points to something broader. Format is a signal. A page with clear statistics, explicit structure, and marked-up answers is easier for a machine to process — and machines favor what they can process cleanly. Schema is the technical layer of that signal. llms.txt is the navigational layer. Together they tell an AI system: we've done the work of organizing this content for you, and here's where to find it.
The practical implication is that adding schema and llms.txt to an existing site doesn't require rewriting your content. It requires wrapping what you've already said in a structure that machines can follow. If your content is solid — accurate, specific, well-organized — schema and llms.txt are what allow AI engines to actually use it.
For a deeper look at how to become the source an AI assistant names by name, see how to get cited in ChatGPT and AI search. Schema and llms.txt are two of the five systems covered there. And for the broader picture of why AI visibility is now part of the same infrastructure problem as traditional search, see how customers find businesses now.
See it live
The best way to understand what these files look like in production is to read a real one. Lyfework publishes its own llms.txt — you can see it at /llms.txt. It lists the pages and resources we think best represent how we work and what we know: this blog, the service pages, the tools. No jargon, no auto-generated sitemap dump — just a curated list written for an AI reader.
The schema on this page follows the same logic. Every article in this blog is connected to a BlogPosting node that references a Person author node that connects to the Lyfework Organization node. The FAQPage below is marked up explicitly so the questions and answers can be extracted without interpretation. If you open the page source and search for application/ld+json, you'll see exactly what an AI crawler sees.
You don't need to be a developer to audit this. Every major browser has a developer tools panel where you can inspect the source. Google's Rich Results Test will tell you whether your schema is valid and what rich features you've unlocked. Schema.org's documentation describes every type in plain language. The barrier to entry is genuinely low. The bigger obstacle is usually not knowing these tools exist, which is the whole reason this post exists.
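If you are comfortable running a script, the same audit can be automated. The sketch below uses only Python's standard library to pull every application/ld+json block out of a page's HTML and list the types it declares. The class and function names are made up for this example, and SAMPLE_HTML stands in for a page source you would save or fetch yourself.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            # json.loads raises ValueError on malformed markup, a useful signal.
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_jsonld(html):
    """Return the parsed JSON-LD blocks found in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def list_types(blocks):
    """Return the @type of every node, looking inside @graph when present."""
    types = []
    for block in blocks:
        nodes = block if isinstance(block, list) else block.get("@graph", [block])
        types.extend(node.get("@type") for node in nodes)
    return types


# Stand-in page source, for illustration only.
SAMPLE_HTML = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Summit Services"}
</script>
</head><body><script>var x = 1;</script></body></html>"""

if __name__ == "__main__":
    print(list_types(extract_jsonld(SAMPLE_HTML)))
```

This only confirms that the markup parses and shows which types it declares. Whether a page qualifies for rich features is still a question for Google's Rich Results Test.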
The systems that make a business findable — schema, llms.txt, a clean site structure, a steady content engine — are not glamorous. They're infrastructure. They don't show up in an ad or a press release. But they're the reason one business shows up when a customer asks an AI assistant a question, and another doesn't. Build the infrastructure, and visibility follows. Skip it, and you're betting on luck.