AI Visibility Report

carbonremote.com

How visible is Carbon Remote to AI agents, LLM-powered search engines, and automated research tools? A comprehensive audit of agent readiness, content signals, and discoverability gaps.

📅 14 May 2026 🔗 www.carbonremote.com 🏗️ Webflow

Agent Readiness Score: 40/100

Carbon Remote has solid traditional SEO foundations and an unusually smart robots.txt. But it's missing the three signals that actually get you surfaced in LLMs: an agent discovery file, structured data, and markdown-negotiable content.

1. Signal-by-Signal Scorecard

Every signal that AI agents, crawlers, and LLM search engines use to discover and understand your site.

| Signal | Status | Impact | Finding |
| --- | --- | --- | --- |
| Agent discovery file | ❌ Missing | Critical | No /llms.txt or /.well-known/llms.txt file. This is the front door for AI models to discover your content. |
| Full content endpoint | ❌ 404 | High | No dedicated markdown content endpoint for bulk ingestion by AI systems. |
| Markdown content negotiation | ❌ Returns HTML | High | Accept: text/markdown requests receive the full JS-heavy HTML page. AI crawlers cannot parse this efficiently. |
| JSON-LD structured data | ❌ None | High | Zero JSON-LD blocks on homepage. No Schema.org entity descriptions for the company, services, or articles. |
| Semantic heading structure | ⚠️ Minimal | Medium | Homepage: 1 h1, 4 h2, 3 h3. Functional but minimal. Most content is laid out via Webflow divs, not semantic elements. |
| robots.txt (AI crawlers) | ✅ Excellent | High | Explicitly allows GPTBot, PerplexityBot, Anthropic, MetaAI, DeepSeekBot, Google-Extended, and Applebot. |
| robots.txt (Scrapers) | ✅ Well-configured | Low | Blocks 100+ known scraping/spam bots aggressively. |
| Sitemap.xml | ✅ Present | Medium | Multiple sitemaps found covering blog posts, service pages, and core site pages. |
| OpenGraph / Twitter Cards | ✅ Configured | Medium | Title, description, and image tags present for both OG and Twitter. |
| Canonical URL | ✅ Clean | Low | Set to www.carbonremote.com — no duplicate issues. |
| Analytics (GTM + GA4) | ✅ Present | - | Both Google Tag Manager and GA4 tracking installed. |
| Google Search Console | ✅ Verified | - | Site verification meta tag present. |
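The machine-checkable signals above can be reproduced with a short script. A minimal sketch using only Python's standard library — the function names, thresholds, and heuristics are illustrative, not part of any formal audit tooling:

```python
import urllib.error
import urllib.request


def has_json_ld(html: str) -> bool:
    """True if the page embeds at least one JSON-LD block."""
    return "application/ld+json" in html


def looks_like_markdown(content_type: str, body: str) -> bool:
    """True if a response to Accept: text/markdown actually returned markdown."""
    return content_type.startswith("text/markdown") and "<html" not in body.lower()


def fetch(url: str, accept: str = "*/*") -> tuple[int, str, str]:
    """Return (status, content_type, body); 4xx/5xx are reported, not raised."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "visibility-audit", "Accept": accept}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except urllib.error.HTTPError as e:
        return e.code, "", ""


def audit(base: str = "https://www.carbonremote.com") -> dict[str, bool]:
    """Probe the three critical signals: discovery file, negotiation, JSON-LD."""
    root_status, _, _ = fetch(base + "/llms.txt")
    wk_status, _, _ = fetch(base + "/.well-known/llms.txt")
    md_status, md_type, md_body = fetch(base + "/", accept="text/markdown")
    _, _, home = fetch(base + "/")
    return {
        "agent discovery file": root_status == 200 or wk_status == 200,
        "markdown negotiation": md_status == 200 and looks_like_markdown(md_type, md_body),
        "JSON-LD on homepage": has_json_ld(home),
    }
```

For Carbon Remote today, all three checks would come back False, matching the scorecard.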

2. The Robots.txt Story

This is the standout finding. Carbon Remote's robots.txt is unusually well-configured for the AI era — it blocks known scraping bots aggressively while explicitly welcoming every major AI crawler. The configuration shows real awareness of the AI landscape.

✅ Allowed AI Crawlers

GPTBot (OpenAI/ChatGPT) · PerplexityBot · anthropic-ai (Claude) · MetaAI · DeepSeekBot · Google-Extended · Applebot

🚫 Blocked Scrapers

100+ known bad actors, including BLEXBot, dotbot, EmailCollector, HTTrack, and dozens more. Well-maintained blocklist.
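The policy pattern is simple to express in robots.txt. An abridged sketch — bot names are taken from the findings above, but the directives shown are illustrative of the pattern rather than a copy of Carbon's actual file:

```text
# Welcome major AI crawlers
User-agent: GPTBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: anthropic-ai
Allow: /

# Block known scrapers (the real file lists 100+)
User-agent: BLEXBot
Disallow: /

User-agent: dotbot
Disallow: /

User-agent: HTTrack
Disallow: /
```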

💡 The Catch

AI crawlers have permission, but when they arrive there's nothing structured for them to consume. The front door is open — the library is in a language they can't read.

"Carbon's robots.txt is in the top 5% of sites I've audited for AI crawler configuration. Most companies are either blocking everything or haven't thought about it. Carbon got the policy right — now they need to give the crawlers something useful to read."

— Audit observation, May 2026

3. Content & Site Architecture

Platform & Structure

Built on Webflow (last published 24 Feb 2026). Single-page sections cover Solutions, About, AI, The Carbon Way, Success Stories, and Blog. JS-heavy: animations powered by Webflow's native animation engine and Lottie, with Crisp for live chat.

Page Inventory

Key pages from sitemap analysis:

Core Pages

Home · About Us · How It Works · Success Stories · Contact Us · Careers

Services

Team Augmentation · Talent Hubs · Product Studio · Artificial Intelligence

Content

Blog (44+ articles) · Covering offshoring, engineering productivity, BOT models, talent strategy, digital transformation

Legal

Privacy Policy · Cookies Policy · Terms & Conditions · Legal Notice

Blog Quality Assessment

Carbon's blog is a genuine asset. Articles are long-form (1,000–2,500 words), properly structured with h1/h2 headings, and reference real frameworks like DORA metrics, the SPACE framework, Atlassian research, and GitHub's State of Distributed Development report. These are real thought leadership pieces, not AI-generated filler.

"Engineering Productivity Across Distributed Teams in 2025" — 7 h2 sections, methodology grounded in published research, directly relevant to their ICP (CTOs, VPs of Engineering, technical founders). This is exactly the kind of content AI models should be citing.

Structural Problems

| Issue | Severity | Why It Matters |
| --- | --- | --- |
| Webflow JS shell | High | Navigation, animations, and dynamic content all depend on JavaScript execution. AI crawlers may only read the raw HTML — missing large portions of the rendered content. |
| No semantic outlines | Medium | Content is laid out visually via Webflow divs. LLMs parsing raw HTML see a flat structure with no clear document hierarchy beyond the headings that exist. |
| No blog structured data | Medium | None of the 44+ blog articles have JSON-LD Article markup. No author, datePublished, or about schema — all signals AI models use to assess authority. |
| Single HTML response type | Medium | Every request returns the same JS-heavy HTML regardless of Accept header. No content negotiation for AI-native formats. |

4. Competitive Context

Carbon Remote competes in the remote engineering talent / staff augmentation space. This section maps the competitive landscape for AI visibility.

| Signal | Carbon Remote | Industry Benchmark | Leaders |
| --- | --- | --- | --- |
| llms.txt | ❌ Missing | Rare (under 5%) | Pioneers: smaller agencies adopting early |
| JSON-LD | ❌ None | Mixed (~40%) | Larger platforms with dedicated SEO teams |
| AI crawler allowlist | ✅ Excellent | ~20% have explicit AI policy | Carbon is ahead of most here |
| Blog authority | ✅ Strong | Varies widely | Content quality is a differentiator |
| Agent readiness | 40/100 | 45–55 typical for SMBs | 65+ for tech-forward companies |

Key takeaway: Carbon is in the middle of the pack. The robots.txt configuration puts them ahead of most competitors on policy, but the absence of structured data and agent-native content formats pulls them back to average. The blog quality is a real asset most competitors don't have — it's just not discoverable by AI yet.

5. Recommended Actions (Priority Order)

These are ordered by impact-to-effort ratio. Items 1-3 are high-impact wins that can be implemented in days, not months.

PRIORITY 1

Create llms.txt with blog content indexing (Low Effort)

One text file at the site root. List your key pages with descriptions, then link to your full-content markdown endpoint and blog index. This is the single highest-impact change — it's how ChatGPT, Claude, Perplexity, and Google AI discover what matters on your site. Carbon's 44+ blog articles are the perfect content to expose here.
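The file is plain markdown. A sketch of what Carbon's could look like — page names come from the sitemap inventory above, but the URL paths and one-line descriptions are illustrative placeholders, not verified links:

```markdown
# Carbon Remote

> Remote engineering teams for scaling companies: team augmentation,
> talent hubs, and a product studio.

## Services

- [Team Augmentation](https://www.carbonremote.com/services/team-augmentation): Embedded remote engineers
- [Talent Hubs](https://www.carbonremote.com/services/talent-hubs): Dedicated offshore teams
- [Product Studio](https://www.carbonremote.com/services/product-studio): End-to-end product delivery
- [Artificial Intelligence](https://www.carbonremote.com/services/artificial-intelligence): AI engineering services

## Blog

- [Blog index](https://www.carbonremote.com/blog): 44+ long-form articles on offshoring,
  engineering productivity, BOT models, and talent strategy

## Optional

- [Full content](https://www.carbonremote.com/llms-full.txt): All key content as a single markdown file
```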

PRIORITY 2

Serve markdown versions of blog content (Medium Effort)

Add content negotiation: when an AI crawler requests a blog article with Accept: text/markdown, return a clean markdown version. This turns your blog from invisible to indexable overnight. The articles are already well-structured — extraction from the Webflow CMS should be straightforward.
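Webflow cannot negotiate content types natively, so this logic would sit in a reverse proxy or edge worker in front of the site. A minimal sketch of the decision itself — the function name and the simplified Accept parsing (q-values ignored) are assumptions, not a prescribed implementation:

```python
def negotiate(accept_header: str, html_body: str, markdown_body: str) -> tuple[str, str]:
    """Pick a response body from the request's Accept header.

    Returns (content_type, body). Prefers markdown whenever the client
    explicitly lists text/markdown; q-values are ignored, which is good
    enough for AI crawlers that send it as their primary type.
    """
    # "text/markdown;q=0.9, text/html" -> {"text/markdown", "text/html"}
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    if "text/markdown" in accepted:
        return "text/markdown", markdown_body
    return "text/html", html_body
```

Browsers keep getting the normal Webflow page; only clients that ask for markdown get the lightweight version.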

PRIORITY 3

Add JSON-LD structured data (Medium Effort)

Start with Article schema on blog posts (author, datePublished, headline, about) and Organization schema on the homepage (name, description, sameAs links). This gives AI models machine-readable entity descriptions — not just raw HTML to parse.
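A sketch of both blocks, dropped into page `<head>` sections via Webflow's custom-code embeds. The headline comes from the article cited in section 3; the datePublished, description, and sameAs values are placeholders to be replaced with real ones:

```html
<!-- Organization schema on the homepage -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Carbon Remote",
  "url": "https://www.carbonremote.com",
  "description": "Remote engineering teams: team augmentation, talent hubs, and product studio.",
  "sameAs": ["https://example.com/replace-with-real-profiles"]
}
</script>

<!-- Article schema on each blog post -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Engineering Productivity Across Distributed Teams in 2025",
  "author": { "@type": "Organization", "name": "Carbon Remote" },
  "datePublished": "2026-01-01",
  "about": ["engineering productivity", "distributed teams"]
}
</script>
```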

PRIORITY 4

Add full-content markdown endpoint (Medium Effort)

A single /full.txt or /llms-full.txt containing all key content in one markdown file. This is the bulk ingestion endpoint that lets AI models consume everything at once. Link it from your llms.txt.
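Once Priority 2's markdown export exists, building this file is a concatenation step. A sketch assuming each article has already been exported as a .md file; the function name and the filename-to-title convention are illustrative:

```python
from pathlib import Path


def build_llms_full(articles_dir: str, header: str) -> str:
    """Concatenate exported markdown articles into one llms-full.txt body.

    Each article's filename (e.g. bot-models.md) becomes its section
    title; articles are separated by horizontal rules for easy chunking.
    """
    parts = [header.rstrip(), ""]
    for md_file in sorted(Path(articles_dir).glob("*.md")):
        title = md_file.stem.replace("-", " ").title()
        parts += ["---", "", f"# {title}", "", md_file.read_text(encoding="utf-8").strip(), ""]
    return "\n".join(parts)
```

The output is a static file, so it can be regenerated on each publish and served from the site root.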

PRIORITY 5

Improve semantic HTML on landing pages (Higher Effort)

Since the site runs on Webflow, this means working within the Webflow designer to replace div-based layouts with proper section/article/nav elements and expand the heading hierarchy. Lower priority because it affects LLM parsing less than the items above — but it improves everything: accessibility, SEO, and AI readability.
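The kind of substitution involved, sketched on a generic section — class names and content are illustrative, not taken from Carbon's actual markup:

```html
<!-- Before: visual-only divs, invisible to outline parsers -->
<div class="section">
  <div class="heading-large">Success Stories</div>
  <div class="card">...</div>
</div>

<!-- After: semantic landmarks LLMs and screen readers can outline -->
<section aria-labelledby="success-stories">
  <h2 id="success-stories">Success Stories</h2>
  <article>...</article>
</section>
```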

6. What This Means for Carbon Remote

Carbon's ICP — CTOs, VPs of Engineering, technical founders — are exactly the buyers who use AI search to research vendors. Queries like "best nearshore engineering teams," "build-operate-transfer staffing model," or "Eastern European tech talent" are high-intent and increasingly answered by LLMs rather than traditional search.

🟢 Strengths

Excellent robots.txt policy · Strong blog with genuine thought leadership · Clean OpenGraph/social metadata · Well-structured sitemap · Analytics infrastructure in place

🟡 Opportunities

Be first-mover on llms.txt in the remote staffing space · Blog content is AI-ready in quality, just not format · Webflow CMS can support structured data additions

🔴 Risks

Competitors adopting AI visibility faster · Blog content invisible when prospects use AI search · Webflow JS dependency limits what AI crawlers can extract today

The gap is not content quality — it's content format. Carbon has the hardest part (genuine expert content) already done. The remaining work is technical plumbing: making that content available in the formats AI systems expect. This is a two-week fix, not a six-month content program.

— Summary assessment, May 2026