Code Indexing

How Codeteel reads, understands, and indexes your codebase.

How it works

Fetch file list

Codeteel fetches your repository tree from GitHub and filters files (30+ languages supported).

Generate summaries

Each file is sent to your configured LLM, which generates a concise summary covering what the file does, its key functions, and notable patterns.

Create embeddings

Your embedding provider converts the summary into a 1536-dimensional vector, which enables semantic search.
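To make "semantic search" concrete, here is a minimal sketch of ranking files by how close their vectors are to a query vector. Cosine similarity is an assumption (this page does not say which metric Codeteel uses), and cosine_similarity and rank_files are hypothetical names chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_files(query_vec, file_vecs):
    """Return file paths ordered from most to least similar to the query.

    file_vecs maps a file path to its embedding (hypothetical layout).
    """
    scored = [(cosine_similarity(query_vec, vec), path)
              for path, vec in file_vecs.items()]
    scored.sort(reverse=True)
    return [path for _, path in scored]
```

In a real setup both the query and the summaries would be 1536-dimensional vectors produced by the same embedding provider; the short vectors in a quick experiment behave the same way.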

Store results

File content, summary, and embedding are saved to the database. Your codebase is now searchable.
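The four steps above can be sketched as a single loop. This is an illustration only, not Codeteel's actual code: IndexedFile, index_files, and the summarize, embed, and store callables are hypothetical stand-ins for your LLM, your embedding provider, and the database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class IndexedFile:
    """One stored record: content, summary, and embedding (assumed shape)."""
    path: str
    content: str
    summary: str
    embedding: List[float]


def index_files(
    files: Iterable[Tuple[str, str]],
    summarize: Callable[[str, str], str],
    embed: Callable[[str], List[float]],
    store: Callable[[IndexedFile], None],
) -> int:
    """Summarize, embed, and store each (path, content) pair.

    Returns the number of files indexed.
    """
    count = 0
    for path, content in files:
        summary = summarize(path, content)   # step 2: LLM summary
        embedding = embed(summary)           # step 3: vector from the summary
        store(IndexedFile(path, content, summary, embedding))  # step 4
        count += 1
    return count
```

Note that the embedding is computed from the summary rather than from the raw file content, which matches the description above.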

Progress & controls

Indexing runs in your browser, and you see real-time progress: a file count, a percentage bar, and the file currently being processed. You can:

Pause/Resume — stop indexing and continue later from where you left off
Start Fresh — re-index all files (ignores previous progress)
Cancel — stop indexing entirely

Keep the browser tab open during indexing, because the browser drives the process.
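One way to picture pause, resume, and start-fresh behavior is a run object that remembers which files are done. The sketch below is an assumption about how such state could be modeled, not Codeteel's implementation; IndexingRun and its methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class IndexingRun:
    files: List[str]
    done: Set[str] = field(default_factory=set)
    paused: bool = False
    current: str = ""

    def pending(self) -> List[str]:
        """Files not yet processed, in their original order."""
        return [f for f in self.files if f not in self.done]

    def step(self, process: Callable[[str], None]) -> bool:
        """Process one pending file. Returns False if paused or finished."""
        if self.paused:
            return False
        todo = self.pending()
        if not todo:
            return False
        self.current = todo[0]
        process(self.current)
        self.done.add(self.current)
        return True

    def percent(self) -> int:
        """Progress as a whole-number percentage."""
        if not self.files:
            return 100
        return round(100 * len(self.done) / len(self.files))

    def start_fresh(self) -> None:
        """Forget previous progress so every file is indexed again."""
        self.done.clear()
        self.paused = False
```

Because finished files are recorded in done, clearing the paused flag lets the next step continue from the first unprocessed file rather than from the beginning.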

Large file handling

Files over 2,000 lines or 8,000 characters are automatically chunked:

Chunk size: 500 lines (50-line overlap) or 6,000 chars (500-char overlap)
Each chunk is summarized separately
Chunk summaries are combined into a single file summary by a final LLM call
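The line-based rule above can be sketched as a sliding window: with 500-line chunks and a 50-line overlap, each new chunk starts 450 lines after the previous one. The function names below are hypothetical, and the sketch covers only the line-based variant; the 6,000-character variant would apply the same windowing to characters. This page does not say how Codeteel chooses between the two.

```python
def needs_chunking(text, max_lines=2000, max_chars=8000):
    """True if the file exceeds either threshold quoted above."""
    return len(text.splitlines()) > max_lines or len(text) > max_chars


def chunk_lines(text, size=500, overlap=50):
    """Split text into chunks of `size` lines that share `overlap` lines."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than the chunk size")
    lines = text.splitlines()
    step = size - overlap  # 450 lines with the defaults
    chunks = []
    start = 0
    while start < len(lines):
        chunks.append("\n".join(lines[start:start + size]))
        if start + size >= len(lines):
            break  # this chunk already reached the end of the file
        start += step
    return chunks
```

The overlap means the last 50 lines of one chunk reappear as the first 50 lines of the next, so a function that straddles a boundary is seen whole by at least one summary call.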

Supported languages

JavaScript, TypeScript, Python, Go, Rust, Java, Kotlin, C/C++, C#, Ruby, PHP, Swift, Scala, Shell, Vue, Svelte, Astro, SQL, GraphQL, YAML, TOML, Terraform, HCL, Docker, Makefile, Markdown, and JSON config files.

What gets skipped

Codeteel skips lock files (package-lock.json, yarn.lock), build directories (dist/, build/), node_modules, minified files (*.min.js), type definitions (*.d.ts), source maps, binary assets, .env files, and any file over 100 KB.
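The skip rules can be approximated with a small path filter. This is a sketch under several assumptions: source maps are matched by a .map suffix, .env files include names such as .env.local, 100 KB is taken as 100 × 1024 bytes, and binary-asset detection is left out because this page does not define it. should_skip is a hypothetical name.

```python
LOCK_FILES = {"package-lock.json", "yarn.lock"}
SKIPPED_DIRS = {"dist", "build", "node_modules"}
SKIPPED_SUFFIXES = (".min.js", ".d.ts", ".map")  # ".map" is an assumption
MAX_BYTES = 100 * 1024  # assumes 1 KB = 1024 bytes


def should_skip(path, size_bytes):
    """Return True if a repository path matches one of the skip rules."""
    parts = path.split("/")
    name = parts[-1]
    if name in LOCK_FILES:
        return True
    # Only directory components are checked, so "src/builder.py" is kept.
    if any(part in SKIPPED_DIRS for part in parts[:-1]):
        return True
    if name.endswith(SKIPPED_SUFFIXES):
        return True
    if name == ".env" or name.startswith(".env."):
        return True
    return size_bytes > MAX_BYTES
```

For example, should_skip("dist/app.js", 10) is true because of the directory rule, while should_skip("src/app.ts", 10) is false.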

Re-indexing

Codeteel tracks a content hash for each file, so unchanged files are skipped on re-index. GitHub webhooks can report pushes and PR merges, flagging the files that need re-indexing.
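Hash-based skipping can be sketched as a comparison between stored hashes and freshly computed ones. SHA-256 is an assumption here (this page does not name the hash function), and content_hash and files_to_reindex are hypothetical names.

```python
import hashlib


def content_hash(content):
    """Hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def files_to_reindex(current, stored_hashes):
    """Return paths whose content is new or differs from the stored hash.

    current maps path -> file content; stored_hashes maps path -> hash
    recorded at the previous indexing run.
    """
    return sorted(
        path
        for path, content in current.items()
        if stored_hashes.get(path) != content_hash(content)
    )
```

A file that is missing from stored_hashes is treated as changed, so newly added files are picked up along with edited ones.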
