Easy Retrieval with Codex CLI and Bash | by George Campbell

Building retrieval agents and tuning their tools is usually a pain. OpenAI is really good at it. So why reinvent the wheel?

Codex is OpenAI’s flagship agentic coding product, much like Anthropic’s Claude Code. Designed to be used in a CLI, Codex can be spun up in any directory on your computer (or in the cloud) and left to autonomously edit and execute code. OpenAI equipped Codex with a range of tools it can wield to achieve its goals, as well as the ability to connect to MCP servers for further extension.

Typically, devs will clone a GitHub repo, spawn Codex in the resulting directory, and have the agent explain the code in language accessible to the user. These abilities, however, extend beyond code files. Codex can navigate through any directory, read any text file, and use those same shell commands to analyze non-technical content.

My use case right here is the ever present: “we now have a bunch of workplace docs that we want particular data out of and we don’t wish to spend man-hours on it anymore.”

The standard approach, which you’re undoubtedly familiar with, is RAG: chunk the documents, generate embeddings, throw them into a vector database, bounce incoming queries against it, and use the results to generate an answer. This fails when your top-k chunks miss crucial information, or when the database returns multiple seemingly relevant hits without the context to determine which is the winner (e.g. surfacing a deadline from a stale court document while the correct date sits outside your top-k results). Conventional wisdom tells you to crank up the complexity next, but that doesn’t always work. Framework-based multi-agent systems take a while to set up and can be brittle. GraphRAG often doesn’t provide performance gains with small to medium document collections.

Here’s a simpler approach I’ve been using with small but complex document sets: Microsoft’s MarkItDown library converts nearly any commonly found office document to markdown. Codex’s shell tools happily parse that markdown, just like they do your code repos. If we preprogram a headless Codex instance with what to look for, there may be no need for an embedding pass at all.

Setup is simple. The steps below work on a single contract document. Prerequisites are an existing (authenticated) Codex CLI install and the markitdown[pdf] Python package installed in your venv:

1. Convert doc to markdown:

 markitdown contract.pdf > contract.md

2. Create prompt:

cat > contact_prompt.md <<'EOF'
You are a contract analyzer.
Look through the files in this directory using shell commands.
Print a list of the parties involved and their contact information.
EOF

3. Run agent:

echo "Analyze contract.md" | \
codex -c 'experimental_instructions_file="contact_prompt.md"' \
exec --skip-git-repo-check - -o result.md

This is close to the simplest possible implementation of the technique. If you run these commands in a directory containing contract.pdf, you will get a report of the parties’ contact information from Codex.
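To go from one contract to a folder of mixed office documents, the conversion step can be wrapped in a loop. The following is a minimal sketch, assuming `markitdown` is on your PATH. The function name `convert_all` and the list of extensions are illustrative choices, not part of MarkItDown, and formats other than PDF will likely need the matching MarkItDown extras installed.

```bash
#!/usr/bin/env bash
# Sketch: convert every supported document in a directory to markdown.
# convert_all is a hypothetical helper; the extension list is an assumption.

convert_all() {
  local dir="${1:-.}"
  local src out
  for src in "$dir"/*.pdf "$dir"/*.docx "$dir"/*.pptx "$dir"/*.xlsx; do
    # An unmatched glob stays a literal pattern, so skip anything that is not a file.
    [ -f "$src" ] || continue
    out="${src%.*}.md"
    # Skip documents whose markdown copy is newer than the source.
    if [ -f "$out" ] && [ "$out" -nt "$src" ]; then
      continue
    fi
    # Same invocation as step 1, applied to each file in turn.
    if markitdown "$src" > "$out"; then
      echo "converted: $src -> $out"
    else
      echo "failed: $src" >&2
      rm -f "$out"   # do not leave a partial markdown file behind
    fi
  done
}

# Usage: convert_all /path/to/docs
```

Note that two sources sharing a base name (say `report.pdf` and `report.docx`) would both map to `report.md` here, so a real script may want to keep the original extension in the output name.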

While the example task is elementary (though it doesn’t have to be; I use this at work to parse complex legal docs), the real magic comes from building out these scripts and incorporating them into a user-facing application. Since they’re just Bash scripts, you can execute as many in parallel (or in sequence) as you want. You can programmatically alter the instructions being passed to the agents depending on what kinds of documents are present in the directory. You can grant or restrict the agents’ access to MCP servers depending on what their task is. The list goes on.
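The parallel execution and per-document instructions described above might look like the sketch below. It assumes the documents are already converted to markdown and reuses the `codex` invocation from step 3. The helper names (`pick_prompt`, `run_agents`), the filename patterns, and every prompt file other than `contact_prompt.md` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: choose a prompt per document and run one headless agent per file.

# Map a markdown file to a prompt file based on its name (patterns are assumptions).
pick_prompt() {
  case "$(basename "$1")" in
    *contract*) echo "contact_prompt.md" ;;
    *invoice*)  echo "invoice_prompt.md" ;;   # hypothetical prompt file
    *)          echo "general_prompt.md" ;;   # hypothetical fallback
  esac
}

# Launch one agent per markdown file in the background, then wait for all of them.
run_agents() {
  local dir="${1:-.}"
  local out_dir="${2:-./results}"
  local doc name prompt
  mkdir -p "$out_dir"
  # Resolve to an absolute path because each agent runs after a cd into $dir.
  out_dir="$(cd "$out_dir" && pwd)"
  for doc in "$dir"/*.md; do
    [ -f "$doc" ] || continue
    name="$(basename "$doc")"
    # Prompt files are markdown too; they are instructions, not documents.
    case "$name" in *_prompt.md) continue ;; esac
    prompt="$(pick_prompt "$name")"
    (
      cd "$dir" || exit 1
      echo "Analyze $name" |
        codex -c "experimental_instructions_file=\"$prompt\"" \
          exec --skip-git-repo-check - -o "$out_dir/$name"
    ) &
  done
  wait   # block until every background agent has finished
}

# Usage: run_agents /path/to/docs /path/to/results
```

Writing results outside the document directory keeps one agent's report from being picked up as source material by another. With many documents you would probably also want to cap how many agents run at once.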

An additional benefit: while LLMs are traditionally best kept to one ask per API call, CLI agents do a better job of handling multiple related tasks in succession without getting confused. This lets the developer make fewer API calls, which makes setup and iteration even easier.

This isn’t to say that retrieval has been solved; rather, this approach demonstrates the raw power and useful applications CLI agents have outside of generating code. Issues remain with this implementation, including increased latency compared to traditional RAG and unpredictable costs when run at scale (any given instance may consume more or fewer tokens depending on how efficient its search was).
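One low-effort way to keep an eye on the latency side is to log how long each run takes. The wrapper below is a rough sketch: `timed_run`, the `RUN_LOG` variable, and the CSV layout are hypothetical names, not Codex features. It records wall-clock time and exit status only; token counts would have to come from Codex's own output, which is not covered here.

```bash
#!/usr/bin/env bash
# Sketch: record wall-clock time and exit status for each agent run.

timed_run() {
  local label="$1"; shift
  local log="${RUN_LOG:-runs.csv}"
  local start end status=0
  start=$(date +%s)
  # Run the wrapped command; stdin is passed through untouched.
  "$@" || status=$?
  end=$(date +%s)
  # One CSV row per run: label, elapsed seconds, exit status.
  echo "$label,$((end - start)),$status" >> "$log"
  return "$status"
}

# Usage, wrapping the command from step 3:
#   echo "Analyze contract.md" | \
#     timed_run contract codex -c 'experimental_instructions_file="contact_prompt.md"' \
#     exec --skip-git-repo-check - -o result.md
```

Comparing rows across prompts or document sets gives a rough picture of which runs are slow before you commit to running them at scale.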

With that being said, these techniques have proven useful when building out applications for internal users. I hope you find the same.

Thanks for reading. Please email me at George@ParksMC.com with any questions, comments, or suggestions.