Code Indexing

How Codeteel reads, understands, and indexes your codebase.

How it works

1. Fetch file list
   Codeteel fetches your repository tree from GitHub and keeps only files in supported languages (30+ languages).
2. Generate summaries
   Each file is sent to your LLM, which generates a concise summary: what the file does, its key functions, and notable patterns.
3. Create embeddings
   The summary is converted to a 1536-dimensional vector using your embedding provider. This enables semantic search.
4. Store results
   File content, summary, and embedding are saved to the database. Your codebase is now searchable.
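
To make the flow concrete, steps 2 to 4 can be sketched as a small loop. This is a minimal illustration, not Codeteel's actual implementation: `IndexedFile`, `index_files`, and the `summarize` and `embed` callables are hypothetical stand-ins for your LLM, your embedding provider, and the database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

EMBEDDING_DIM = 1536  # vector size stated in the text


@dataclass
class IndexedFile:
    """Hypothetical record for one stored file (step 4)."""
    path: str
    content: str
    summary: str
    embedding: List[float]


def index_files(
    files: Dict[str, str],
    summarize: Callable[[str, str], str],
    embed: Callable[[str], List[float]],
) -> List[IndexedFile]:
    """Run steps 2-4 for every file that survived the filter in step 1.

    `summarize` and `embed` are assumed wrappers around the LLM and the
    embedding provider; a plain list stands in for the database.
    """
    store: List[IndexedFile] = []
    for path, content in files.items():
        summary = summarize(path, content)  # step 2: LLM summary
        vector = embed(summary)             # step 3: embed the summary, not the raw file
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(
                f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}"
            )
        store.append(IndexedFile(path, content, summary, vector))  # step 4
    return store
```

Note that the embedding is computed from the summary rather than from the raw file content, which is what the steps above describe.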

Progress & controls

Indexing runs in your browser, so you see real-time progress: the file count, a percentage bar, and the file currently being processed. You can:

  • Pause/Resume — stop indexing and continue later from where you left off
  • Start Fresh — re-index all files (ignores previous progress)
  • Cancel — stop indexing entirely
Warning: Don't close the browser tab during indexing; progress is driven by the browser.
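
The controls above behave like a small state machine over the list of files. The sketch below is one way to model that, under assumptions: the class name `IndexingRun`, its fields, and the state names are hypothetical, and whether Cancel keeps or discards earlier progress is not stated in this page (here it is kept).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexingRun:
    """Hypothetical model of one indexing run; not Codeteel's actual code."""
    files: List[str]
    done: int = 0            # files processed so far
    state: str = "running"   # "running", "paused", "cancelled", or "finished"

    def step(self) -> Optional[str]:
        """Process the next file if the run is active and return its path."""
        if self.state != "running" or self.done >= len(self.files):
            return None
        current = self.files[self.done]
        self.done += 1
        if self.done == len(self.files):
            self.state = "finished"
        return current

    def percent(self) -> int:
        """Value for the percentage bar."""
        if not self.files:
            return 100
        return round(100 * self.done / len(self.files))

    def pause(self) -> None:
        if self.state == "running":
            self.state = "paused"

    def resume(self) -> None:
        if self.state == "paused":
            self.state = "running"   # continues from self.done

    def start_fresh(self) -> None:
        self.done = 0                # ignore previous progress
        self.state = "running"

    def cancel(self) -> None:
        self.state = "cancelled"     # assumption: progress so far is kept
```

Because the loop that calls `step()` lives in the browser tab, closing the tab ends the loop, which is why the run cannot advance without it.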

Large file handling

Files over 2,000 lines or 8,000 characters are automatically chunked:

  • Chunk size: 500 lines with a 50-line overlap, or 6,000 characters with a 500-character overlap
  • Each chunk is summarized separately
  • Chunk summaries are combined with a final LLM call
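
The overlapping chunking described above can be sketched as a sliding window. The thresholds come from this page; the function names `windows` and `chunk_file` are hypothetical, and the rule for choosing between line-based and character-based chunking is an assumption (line-based when the line limit is exceeded, otherwise character-based).

```python
from typing import List, Sequence

MAX_LINES, MAX_CHARS = 2_000, 8_000    # chunking thresholds from the text
CHUNK_LINES, LINE_OVERLAP = 500, 50    # line-based chunking
CHUNK_CHARS, CHAR_OVERLAP = 6_000, 500  # character-based chunking


def windows(seq: Sequence, size: int, overlap: int) -> list:
    """Slice `seq` into windows of `size` items that share `overlap` items."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    out = []
    start = 0
    while True:
        out.append(seq[start:start + size])
        if start + size >= len(seq):  # this window reached the end
            break
        start += step
    return out


def chunk_file(text: str) -> List[str]:
    """Return the text as one chunk, or as overlapping chunks if it is large.

    Assumption: line-based chunking takes precedence over character-based.
    """
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        return ["\n".join(w) for w in windows(lines, CHUNK_LINES, LINE_OVERLAP)]
    if len(text) > MAX_CHARS:
        return windows(text, CHUNK_CHARS, CHAR_OVERLAP)
    return [text]
```

With these numbers each new chunk starts 450 lines (or 5,500 characters) after the previous one, so code that straddles a boundary appears in both neighbouring chunks before the summaries are combined.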

Supported languages

JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, C/C++, C#, Ruby, PHP, Swift, Scala, Shell, Vue, Svelte, Astro, SQL, GraphQL, YAML, TOML, Terraform, HCL, Docker, Makefile, Markdown, and JSON config files.

What gets skipped

Lock files (package-lock.json, yarn.lock), build directories (dist/, build/), node_modules, minified files (*.min.js), type definitions (*.d.ts), source maps, binary assets, .env files, and any file over 100KB.
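
The skip rules above amount to a path-and-size filter. The sketch below is illustrative only: `should_skip` is a hypothetical name, the binary extension list is a small sample rather than Codeteel's real list, "100KB" is assumed to mean 100 * 1024 bytes, and treating `.env.*` variants as .env files is also an assumption.

```python
from pathlib import PurePosixPath

MAX_BYTES = 100 * 1024  # assumption: "100KB" means 100 * 1024 bytes
LOCK_FILES = {"package-lock.json", "yarn.lock"}
SKIPPED_DIRS = {"dist", "build", "node_modules"}
SKIPPED_SUFFIXES = (".min.js", ".d.ts", ".map")
# Illustrative sample of binary assets, not an exhaustive list.
BINARY_SUFFIXES = (".png", ".jpg", ".gif", ".ico", ".woff", ".pdf", ".zip")


def should_skip(path: str, size_bytes: int) -> bool:
    """Return True if a repository path matches one of the skip rules."""
    p = PurePosixPath(path)
    name = p.name
    if size_bytes > MAX_BYTES:
        return True
    if name in LOCK_FILES:
        return True
    if name == ".env" or name.startswith(".env."):
        return True
    # Only directory components count, so a file named build.py is kept.
    if any(part in SKIPPED_DIRS for part in p.parts[:-1]):
        return True
    return name.endswith(SKIPPED_SUFFIXES + BINARY_SUFFIXES)
```

For example, `should_skip("src/app.ts", 1200)` is false, while `should_skip("node_modules/react/index.js", 500)` is true.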

Re-indexing

Codeteel tracks content hashes, so unchanged files are skipped on re-index. GitHub webhooks can detect pushes and PR merges and flag the files that need re-indexing.
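
Hash-based skipping can be sketched as a comparison between stored hashes and the current file contents. This page does not say which hash function Codeteel uses; SHA-256 is an assumption here, and `content_hash` and `files_to_reindex` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def content_hash(content: str) -> str:
    """Hash file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def files_to_reindex(
    current: Dict[str, str],
    known_hashes: Dict[str, str],
) -> List[str]:
    """Return paths whose content is new or differs from the stored hash.

    `current` maps path -> file content; `known_hashes` maps path -> hash
    saved during the previous indexing run.
    """
    return sorted(
        path
        for path, content in current.items()
        if known_hashes.get(path) != content_hash(content)
    )
```

A short usage example: with a stored hash for an unchanged `a.py` and a changed `b.py`, plus a new `c.py`, only the last two are returned.

```python
known = {"a.py": content_hash("x = 1"), "b.py": content_hash("y = 2")}
current = {"a.py": "x = 1", "b.py": "y = 3", "c.py": "z = 0"}
print(files_to_reindex(current, known))  # ['b.py', 'c.py']
```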