{"id":7953,"date":"2025-10-22T21:04:53","date_gmt":"2025-10-22T21:04:53","guid":{"rendered":"https:\/\/techtrendfeed.com\/?p=7953"},"modified":"2025-10-22T21:04:53","modified_gmt":"2025-10-22T21:04:53","slug":"easy-retrieval-with-codex-cli-and-bash-by-george-campbell-oct-2025","status":"publish","type":"post","link":"https:\/\/techtrendfeed.com\/?p=7953","title":{"rendered":"Easy Retrieval with Codex CLI and Bash | by George Campbell | Oct, 2025"},"content":{"rendered":"<div>\n<p id=\"382d\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">Building retrieval agents and tuning their tools can be a pain. OpenAI is really good at it. So, why reinvent the wheel?<\/p>\n<p id=\"5306\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">Codex is OpenAI\u2019s flagship agentic coding product, similar to Anthropic\u2019s Claude Code. 
Built for the command line, Codex can be spun up in any directory on your computer (or in the cloud) and left to autonomously edit and execute code. OpenAI equipped Codex with a range of tools it can wield to achieve its goals, as well as the ability to connect to MCP servers for further extension.<\/p>\n<p id=\"0543\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">Typically, devs will clone a GitHub repo, spawn Codex in the resulting directory, and have the agent explain the code in language accessible to the user. These abilities, however, extend beyond code files. Codex can navigate any directory, read any text file, and use those same shell commands to analyze non-technical content.<\/p>\n<p id=\"87f6\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">My use case here is the ubiquitous one: \u201cwe have a bunch of office docs that we need specific information out of, and we don\u2019t want to spend man-hours on it anymore.\u201d<\/p>\n<p id=\"595e\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">The standard approach, which you\u2019re undoubtedly familiar with, is RAG: chunk the documents, generate embeddings, throw them into a vector database, bounce incoming queries against it, and use the results to generate an answer. This fails when your top-k chunks miss critical information, or when the database returns several seemingly relevant hits without the context to determine which one is correct (e.g., surfacing a deadline from a stale court document while the correct date sits outside your top-k results). Conventional wisdom says to crank up the complexity next, but that doesn\u2019t always work. 
Framework-based multi-agent systems take a while to set up and can be brittle, and GraphRAG often doesn\u2019t deliver performance gains on small to medium document collections.<\/p>\n<p id=\"1bc3\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">Here\u2019s a simpler approach I\u2019ve been using with small but complex document sets: Microsoft\u2019s <a rel=\"nofollow\" target=\"_blank\" class=\"ah nr\" href=\"https:\/\/github.com\/microsoft\/markitdown\" rel=\"noopener ugc nofollow\" target=\"_blank\">MarkItDown<\/a> library converts almost any common office document to markdown. Codex\u2019s shell tools happily parse that markdown, just as they do your code repos. If we preprogram a headless Codex instance with what to look for, there may be no need for an embedding pass at all.<\/p>\n<p id=\"6f02\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">Setup is simple. The steps below work on a single contract document. Prerequisites are an existing (authenticated) Codex CLI <a rel=\"nofollow\" target=\"_blank\" class=\"ah nr\" href=\"https:\/\/developers.openai.com\/codex\/cli\" rel=\"noopener ugc nofollow\" target=\"_blank\">installation<\/a> and the markitdown[pdf] Python package installed in your venv:<\/p>\n<ol class=\"\">\n<li id=\"95ea\" class=\"mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq ns nt nu bl\">Convert the document to markdown:<\/li>\n<\/ol>\n<pre class=\"nv nw nx ny nz oa ob oc bq od bc bl\"><span id=\"a184\" class=\"oe of hl ob b bh og oh m oi oj\">markitdown contract.pdf &gt; contract.md<\/span><\/pre>\n<p id=\"8572\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">2. 
Create the prompt:<\/p>\n<pre class=\"nv nw nx ny nz oa ob oc bq od bc bl\"><span id=\"c335\" class=\"oe of hl ob b bh og oh m oi oj\">cat &gt; contact_prompt.md &lt;&lt;'EOF'<br\/>You are a contract analyzer.<br\/>Look through the files in this directory using shell commands.<br\/>Print a list of the parties involved and their contact information.<br\/>EOF<br\/><\/span><\/pre>\n<p id=\"2fb3\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">3. Run the agent:<\/p>\n<pre class=\"nv nw nx ny nz oa ob oc bq od bc bl\"><span id=\"0f6f\" class=\"oe of hl ob b bh og oh m oi oj\">echo \"Analyze contract.md\" | \\<br\/>codex -c 'experimental_instructions_file=\"contact_prompt.md\"' \\<br\/>exec --skip-git-repo-check - -o result.md<\/span><\/pre>\n<p id=\"949d\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">This is close to the simplest possible implementation of the technique. If you run these commands in a directory containing contract.pdf, Codex will report the parties\u2019 contact information and also write its final message to result.md.<\/p>\n<p id=\"fd6f\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">While the example task is elementary (though it doesn\u2019t have to be; I use this at work to parse complex legal docs), the real magic comes from building out these scripts and incorporating them into a user-facing application. Since they\u2019re just Bash scripts, you can execute as many as you want in parallel (or in sequence). You can programmatically alter the instructions passed to the agents depending on what kinds of documents are present in the directory. You can grant or restrict the agents\u2019 access to MCP servers depending on their task. 
The list goes on.<\/p>\n<p id=\"110a\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">An additional benefit: while LLMs are traditionally best kept to one ask per API call, CLI agents do a better job of handling multiple related tasks in succession without getting confused. This lets the developer make fewer calls, which makes setup and iteration even easier.<\/p>\n<p id=\"faae\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">This isn\u2019t to say that retrieval has been solved; rather, this approach demonstrates the raw power and utility CLI agents have outside of code generation. Issues remain with this implementation, including higher latency than traditional RAG and unpredictable costs at scale (any given instance may consume more or fewer tokens depending on how efficient its search was).<\/p>\n<p id=\"026b\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\">That being said, these methods have proven useful when building out applications for internal users. I hope you find the same.<\/p>\n<p id=\"0343\" class=\"pw-post-body-paragraph mt mu hl mv b mw mx my mz na nb nc nd ne nf ng nh ni nj nk nl nm nn no np nq he bl\"><em class=\"ok\">Thanks for reading. Please email me at <\/em><a rel=\"nofollow\" target=\"_blank\" class=\"ah nr\" href=\"mailto:george@parksmc.com\" rel=\"noopener ugc nofollow\" target=\"_blank\"><em class=\"ok\">George@ParksMC.com<\/em><\/a><em class=\"ok\"> with any questions, comments, or suggestions.<\/em><\/p>\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>Building retrieval agents and tuning their tools can be a pain. 
OpenAI is really good at it. So, why reinvent the wheel? Codex is OpenAI\u2019s flagship agentic coding product, similar to Anthropic\u2019s Claude Code. Built for the command line, Codex can be spun up in any directory on your computer (or in the [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":7955,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[6043,6044,1355,2516,2619,5655,6042,4127],"class_list":["post-7953","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-machine-learning","tag-bash","tag-campbell","tag-cli","tag-codex","tag-george","tag-oct","tag-retrieval","tag-simple"],"_links":{"self":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7953","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=7953"}],"version-history":[{"count":1,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7953\/revisions"}],"predecessor-version":[{"id":7954,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/posts\/7953\/revisions\/7954"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=\/wp\/v2\/media\/7955"}],"wp:attachment":[{"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=7953"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techtrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=7953"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/tec
htrendfeed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=7953"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}