The Scanner That Refuses to Run Your Code: Inside Perplexity's Bumblebee

Syed Moinuddin · 7 min read
Security · Dev Tools

Most supply-chain scanners run your package manager to check for poison — which is exactly how poison spreads. Perplexity's open-source Bumblebee takes the opposite bet: it never runs anything.

Most supply-chain scanners have a dirty secret: to check whether you've installed a poisoned package, they run your package manager — which is exactly how the poison spreads. Perplexity's new open-source scanner takes the opposite bet. It never runs anything.

On May 22, 2026, Perplexity open-sourced Bumblebee, a read-only scanner it already uses internally to protect the developer machines behind Perplexity Search, the Comet browser, and its Computer agent. S1, S2 It's on GitHub under Apache 2.0, written entirely in Go with no dependencies outside the standard library, and ships as a single static binary. S1

The pitch is narrow on purpose. Bumblebee answers one question, fast: when an advisory names a compromised package, version, or extension — which of my developer machines have it on disk right now? S1

That sounds simple. In practice, it's the question most security tooling can't answer in the minutes that matter.

The gap nobody else fills

When a supply-chain advisory drops at 2 a.m., your existing tools each tell you something adjacent to what you need:

  • SBOMs tell you what shipped in a build artifact.
  • EDR tells you what ran or touched the network.
  • Endpoint inventory tools track installed applications.

None of them look at the messy local state where the risk actually lives: lockfiles, package-manager install metadata, editor extensions, browser add-ons, and AI agent configs scattered across hundreds of laptops. S1, S3 That's the layer attackers increasingly target, because a developer's machine is a soft entry point into everything they can push, deploy, or sign. S3

Bumblebee turns that scattered on-disk state into structured records, then flags exact matches against a catalog of known-bad packages. S1

[Figure: Bumblebee maps scattered developer-machine state (lockfiles, extensions, MCP configs) into a single exposure view.]

Why "read-only" is the whole point

Here's the part every dev should internalize.

npm install doesn't just copy files. Packages can declare lifecycle scripts such as preinstall and postinstall, which npm runs automatically during installation. S4 That's how many modern supply-chain worms propagate: Shai-Hulud-style self-replication, credential theft on install, the works.

So a scanner that "checks" your exposure by invoking npm ls, pip show, or go list has a fatal flaw: it can trigger the very attack it was sent to find. S4
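To see why reading is safe where running is not, compare how a lifecycle hook looks on disk. Below is a minimal Go sketch, assuming a plain package.json: installHooks is a hypothetical helper, not Bumblebee's code, and it lists the preinstall, install, and postinstall scripts purely by decoding JSON. The command string is just data here; only a package manager would execute it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// lifecycleHooks are npm script names that run automatically on install.
var lifecycleHooks = []string{"preinstall", "install", "postinstall"}

// installHooks decodes package.json bytes and returns the install-time
// scripts it declares. Nothing is executed; the commands stay as strings.
func installHooks(manifest []byte) (map[string]string, error) {
	var pkg struct {
		Scripts map[string]string `json:"scripts"`
	}
	if err := json.Unmarshal(manifest, &pkg); err != nil {
		return nil, err
	}
	found := map[string]string{}
	for _, hook := range lifecycleHooks {
		if cmd, ok := pkg.Scripts[hook]; ok {
			found[hook] = cmd
		}
	}
	return found, nil
}

func main() {
	// Hypothetical manifest for illustration.
	manifest := []byte(`{
  "name": "example-pkg",
  "version": "1.0.0",
  "scripts": {"test": "node test.js", "postinstall": "node setup.js"}
}`)
	hooks, err := installHooks(manifest)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(hooks))
	for name := range hooks {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output order
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, hooks[name])
	}
}
```

Decoding the manifest is the read-only move; handing the same directory to a package manager is where the danger starts.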

Bumblebee refuses to play that game. It:

  • Never executes install scripts or lifecycle hooks.
  • Never invokes a package manager (npm, pnpm, bun, pip, …).
  • Never reads your source files.

It only parses metadata that's already sitting on disk — package-lock.json, pnpm-lock.yaml, go.sum, *.dist-info/METADATA, extension manifests, MCP JSON configs, and so on. S1 The scan can't become the breach. There's even a quiet privacy detail: MCP host configs often stash credentials in their env blocks, and Bumblebee parses those configs for the server inventory it needs without emitting the secret values in its records. S1

What it actually covers

One binary, eleven surfaces (counting npm, pnpm, Yarn, and Bun separately): the kind of coverage that normally takes three or four separate tools. S3

| Surface | Emitted ecosystem | Reads (examples) |
|---|---|---|
| npm / pnpm / Yarn / Bun | npm | package-lock.json, pnpm-lock.yaml, yarn.lock, bun.lock |
| PyPI | pypi | *.dist-info/METADATA, *.egg-info/PKG-INFO |
| Go modules | go | go.sum, go.mod |
| RubyGems | rubygems | Gemfile.lock, installed *.gemspec |
| Composer | packagist | composer.lock, vendor/composer/installed.json |
| MCP configs | mcp | mcp.json, claude_desktop_config.json, Gemini CLI settings, … |
| Editor extensions | editor-extension | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | browser-extension | Chromium manifest.json, Firefox extensions.json |

(All four JS package managers normalize to the npm ecosystem, since that's where the registry risk lives.) S1
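To illustrate that normalization, here is a minimal Go sketch that reads the "packages" map of a lockfileVersion 2 or 3 package-lock.json and emits npm records without invoking npm. The component type and parsePackageLock are hypothetical names, not Bumblebee's internals; yarn.lock, pnpm-lock.yaml, and bun.lock would each need their own parser feeding the same record shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// component is a hypothetical normalized record.
type component struct {
	Ecosystem string
	Name      string
	Version   string
}

// parsePackageLock reads the "packages" map of a v2/v3 package-lock.json.
// Keys look like "node_modules/@scope/name"; the root project uses "".
func parsePackageLock(raw []byte) ([]component, error) {
	var lock struct {
		Packages map[string]struct {
			Version string `json:"version"`
		} `json:"packages"`
	}
	if err := json.Unmarshal(raw, &lock); err != nil {
		return nil, err
	}
	var out []component
	for path, meta := range lock.Packages {
		i := strings.LastIndex(path, "node_modules/")
		if i < 0 || meta.Version == "" {
			continue // root project, workspace entry, or no version recorded
		}
		name := path[i+len("node_modules/"):] // innermost package, scope included
		out = append(out, component{Ecosystem: "npm", Name: name, Version: meta.Version})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Name != out[b].Name {
			return out[a].Name < out[b].Name
		}
		return out[a].Version < out[b].Version
	})
	return out, nil
}

func main() {
	raw := []byte(`{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "my-app", "version": "1.0.0"},
    "node_modules/lodash": {"version": "4.17.21"},
    "node_modules/@babel/core": {"version": "7.24.0"},
    "node_modules/a/node_modules/lodash": {"version": "3.10.1"}
  }
}`)
	comps, err := parsePackageLock(raw)
	if err != nil {
		panic(err)
	}
	for _, c := range comps {
		fmt.Printf("%s %s@%s\n", c.Ecosystem, c.Name, c.Version)
	}
}
```

Nested installs such as node_modules/a/node_modules/lodash surface as separate records, so two versions of the same package on one machine both show up.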

Three profiles, one job

Bumblebee is a one-shot scanner — each run scans once and exits. Cadence is your runner's job (cron, launchd, systemd, MDM). S1 You pick a profile based on how wide you need to look:

  • baseline: standard global/user package roots, toolchains, editor and browser extensions, MCP configs. For routine, lightweight fleet inventory. S1, S5
  • project: targeted dev directories like ~/code or ~/work. For known workspaces. S1, S5
  • deep: explicit --root paths, up to and including $HOME. For active incident sweeps. S1, S5

A safety rail worth noting: baseline and project refuse bare-home roots — only deep is allowed to walk your entire home directory. S1

```shell
# Install (Go 1.25+, into $GOBIN)
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

# Routine global inventory
bumblebee scan --profile baseline > inventory.ndjson

# On-demand exposure scan against a published advisory
bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./catalog.json \
  --findings-only \
  --max-duration 10m
```

Output is NDJSON — one component record per line, with a scan_summary record at the end so receivers know whether to trust the run. S1 Each match carries a confidence of high, medium, or low, so you can tell "exact name+version from canonical metadata" apart from "a path reference that might mean it's installed." S1

The AI-assisted feedback loop

The detection logic isn't magic — it's an exposure catalog: minimal JSON doing exact (ecosystem, name, version) matching. S1 Where it gets interesting is how Perplexity keeps that catalog fresh.
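Exact matching reduces to a set lookup on the (ecosystem, name, version) triple. A minimal Go sketch under an assumed catalog shape: the JSON layout, catalogEntry, key, and loadCatalog are hypothetical, and example-pkg is a placeholder rather than a real advisory. Bumblebee's actual catalog format lives in its repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catalogEntry is a HYPOTHETICAL schema used only for illustration.
type catalogEntry struct {
	Ecosystem string   `json:"ecosystem"`
	Name      string   `json:"name"`
	Versions  []string `json:"versions"`
}

// key is the exact (ecosystem, name, version) triple used for lookups.
type key struct{ ecosystem, name, version string }

// loadCatalog flattens entries into a set keyed by the exact triple.
func loadCatalog(raw []byte) (map[key]bool, error) {
	var entries []catalogEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	set := make(map[key]bool)
	for _, e := range entries {
		for _, v := range e.Versions {
			set[key{e.Ecosystem, e.Name, v}] = true
		}
	}
	return set, nil
}

func main() {
	catalog, err := loadCatalog([]byte(`[
  {"ecosystem": "npm", "name": "example-pkg", "versions": ["1.2.3", "1.2.4"]}
]`))
	if err != nil {
		panic(err)
	}
	// No normalization or ranges: anything not listed verbatim is a miss.
	for _, v := range []string{"1.2.3", "1.2.5", "v1.2.3"} {
		fmt.Printf("npm example-pkg@%s exposed=%v\n", v, catalog[key{"npm", "example-pkg", v}])
	}
}
```

Running it prints exposed=true only for 1.2.3; the unlisted 1.2.5 and the v-prefixed string both miss, which is exactly why a catalog gap becomes a blind spot.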

Their internal loop runs five steps: a threat signal comes in → Perplexity Computer drafts a structured catalog entry (ecosystem, name, affected versions, source links) → it opens a GitHub PR → a human reviews and merges → Bumblebee re-scans every endpoint against the updated catalog and routes findings to the security team. S5, S3 The repo ships maintained catalogs in threat_intel/, assembled with Computer and updated via PRs as new campaigns surface. S1
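One way to keep the agent-drafted step honest is a mechanical check on the PR before the human reviewer sees it. A minimal Go sketch, assuming a draft with exactly the four fields the article lists: draftEntry, validateDraft, and the https-only source rule are hypothetical, and the ecosystem labels come from the coverage table above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// draftEntry is a HYPOTHETICAL shape for an agent-drafted catalog entry.
type draftEntry struct {
	Ecosystem string
	Name      string
	Versions  []string
	Sources   []string
}

// knownEcosystems lists the emitted-ecosystem labels from the coverage table.
var knownEcosystems = map[string]bool{
	"npm": true, "pypi": true, "go": true, "rubygems": true, "packagist": true,
	"mcp": true, "editor-extension": true, "browser-extension": true,
}

// validateDraft collects every structural problem with a draft. It is a
// pre-review gate only; a human still decides whether to merge.
func validateDraft(d draftEntry) error {
	var errs []error
	if !knownEcosystems[d.Ecosystem] {
		errs = append(errs, fmt.Errorf("unknown ecosystem %q", d.Ecosystem))
	}
	if d.Name == "" {
		errs = append(errs, errors.New("missing package name"))
	}
	if len(d.Versions) == 0 {
		errs = append(errs, errors.New("no affected versions listed"))
	}
	if len(d.Sources) == 0 {
		errs = append(errs, errors.New("no source links"))
	}
	for _, s := range d.Sources {
		if u, err := url.Parse(s); err != nil || u.Scheme != "https" || u.Host == "" {
			errs = append(errs, fmt.Errorf("source %q is not an https URL", s))
		}
	}
	return errors.Join(errs...) // nil when no problems were found
}

func main() {
	good := draftEntry{Ecosystem: "npm", Name: "example-pkg", Versions: []string{"1.2.3"},
		Sources: []string{"https://example.com/advisory"}}
	bad := draftEntry{Ecosystem: "cargo", Sources: []string{"not a url"}}
	fmt.Println("good:", validateDraft(good))
	fmt.Println("bad:", validateDraft(bad))
}
```

A check like this catches malformed drafts early; it does not judge whether the advisory is real, which remains the reviewer's job.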

It's a clean example of a pattern that keeps recurring in agentic tooling: let the agent draft, keep the human on the merge button. The AI never ships a detection rule on its own.

Should you run it? A quick checklist

Bumblebee is sharp, not universal. It's a fit if:

  • ✅ You run a fleet of developer machines and need fast "who's exposed?" answers during incidents.
  • ✅ You're on macOS or Linux (no Windows support today). S6
  • ✅ You already have MDM/fleet tooling to schedule and collect runs.
  • ✅ You want detection logic you can read, fork, and tune to your own stack.

It's not the right tool if:

  • ❌ You want remediation — Bumblebee finds, it doesn't fix or quarantine. S6
  • ❌ You need deep SCA / transitive vuln analysis — that's still Snyk/Socket/Phylum territory. S6
  • ❌ You expect fuzzy matching — it's exact-match only by design, which means a catalog gap is a blind spot.

One honest caveat the analysts flagged: because the detection logic is public, a motivated attacker can study it to engineer evasion. S6 Open detection is a tradeoff, not a free lunch.

The takeaway

Bumblebee won't replace your SBOM, your EDR, or your SCA scanner, and it isn't trying to. It does one unglamorous thing exceptionally well: it answers "which machines have package X@Y right now?" in seconds, without ever becoming the attack it's hunting. S1, S3

For a v0.1.1 release shipped as a single dependency-free binary, it's a remarkably clean idea, and one worth keeping in your incident-response toolbox.

FAQ

  1. What is Bumblebee, in one line?

    A read-only, open-source scanner that inventories on-disk package, extension, and dev-tool metadata on macOS/Linux developer machines and flags matches against a catalog of known-compromised software.

  2. Why is read-only such a big deal?

    Because running a package manager to check exposure can auto-execute malicious postinstall scripts — triggering the breach you were trying to detect. Bumblebee only parses metadata, never executes anything.

  3. What ecosystems does it cover?

    npm, pnpm, Yarn, Bun, PyPI, Go modules, RubyGems, and Composer, plus MCP agent configs, editor extensions (VS Code, Cursor, Windsurf, VSCodium), and browser extensions (Chromium and Firefox).

  4. How do I install and run it?

    With Go 1.25+: go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest, then bumblebee scan --profile baseline. There's also a bumblebee selftest for a fast post-install smoke check.

  5. What's the difference between the baseline, project, and deep profiles?

    baseline scans standard global roots for routine inventory, project targets specific workspace directories, and deep walks explicit roots (including $HOME) for incident response. Only deep is allowed to scan a bare home directory.

  6. Does it detect vulnerabilities automatically?

    No. It matches against an exposure catalog you supply — exact (ecosystem, name, version) matches only. Perplexity ships maintained catalogs in the repo, but there's no built-in CVE database or fuzzy matching.

  7. Is it a replacement for Snyk, Socket, or an SBOM?

    No. SBOMs describe build artifacts and SCA tools do deep dependency analysis; Bumblebee fills the gap they leave — fast, fleet-wide visibility into local developer state. Treat it as a complement.

  8. Does it run on Windows or fix the problems it finds?

    Neither today. It's macOS/Linux only and is detection-only — no remediation, quarantine, or auto-removal. Pair it with your existing response workflow.

  9. Is it actually free?

    Yes — Apache 2.0 on GitHub. You can read, fork, self-host, and build private catalogs tuned to your stack at no cost.

Sources

  1. S1: Perplexity, Bumblebee GitHub repository (README, coverage, profiles, usage), github.com/perplexityai/bumblebee (v0.1.1, May 22, 2026)
  2. S2: TestingCatalog, "Perplexity open-sources Bumblebee security scanner"
  3. S3: MarkTechPost, "Perplexity Open-Sources Bumblebee: A Read-Only Supply-Chain Scanner for Developer Endpoints"
  4. S4: Perplexity Hub, "Perplexity Is Open-Sourcing Bumblebee" (read-only rationale, postinstall scripts)
  5. S5: OpenTools, "Perplexity Open-Sources Bumblebee to Scan Developer Machines for Supply-Chain Threats" (scan profiles, workflow)
  6. S6: AI Weekly, "Perplexity open-sources Bumblebee supply-chain scanner" (limitations, evasion tradeoff, vendor landscape)
  7. S7: explainX, "Bumblebee: Perplexity's Open-Source Supply Chain Security Scanner for Developer Endpoints (2026)"

Written by

Syed Moinuddin

Full Stack Engineer writing about AI tooling, agentic systems, and the frontend/backend craft. Follow along for more deep dives on the tools changing how we ship software.
