<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Niket's Dev Diary]]></title><description><![CDATA[I explore system architecture and databases through reading, experimenting, and writing. This blog documents my learning journey and technical insights on moder]]></description><link>https://blogs.niket.pro</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1746370306005/c6cfd665-9c3b-4324-95c8-0d05925fe45c.png</url><title>Niket&apos;s Dev Diary</title><link>https://blogs.niket.pro</link></image><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 17:45:43 GMT</lastBuildDate><atom:link href="https://blogs.niket.pro/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[3. Parallel Query Retrieval (Fan Out)]]></title><description><![CDATA[You might have come across this popular reel where Virat Kohli talks about Rohit Sharma’s lazy communication style.

  
  


I will describe this in English so that non Hindi speaking audience can understand. Fair warning, my mediocre English can’t j...]]></description><link>https://blogs.niket.pro/rag-parallel-query-retrieval</link><guid isPermaLink="true">https://blogs.niket.pro/rag-parallel-query-retrieval</guid><category><![CDATA[RAG ]]></category><category><![CDATA[ParallelQueryRetrieval]]></category><category><![CDATA[langchain]]></category><category><![CDATA[qdrant]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[advanced rag]]></category><dc:creator><![CDATA[Aniket Mahangare]]></dc:creator><pubDate>Tue, 20 May 2025 18:30:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747765677203/c4acdff6-677c-4fed-a146-0f09f8656d1e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You might have come across this popular reel where <strong>Virat Kohli</strong> talks about <strong>Rohit Sharma’s</strong> lazy communication style.</p>
<center>
  <iframe width="315" height="560" src="https://www.youtube.com/embed/kGMMqOQSXIs">
  </iframe>
</center>

<p>I will describe this in English so that the non-Hindi-speaking audience can understand. Fair warning, my mediocre English can’t do justice to the humor here. When you have to say, “there is a lot of traffic in Lokhandwala (a place in Mumbai)“, Rohit Sharma will say the same thing as “that place has a lot of this“. Now it’s your responsibility to figure out “what place“ and “has a lot of what“.</p>
<p>My point here is that we humans are lazy. Google has exposed us to so much convenience for so long that we generally don’t care about what we type in the search bar. We just expect Google to bring us the right results. And if you want your RAG application to get popular, then you have to make it very good at understanding what the user wants to ask.</p>
<p>In this &amp; the next couple of articles, we will try to solve this exact problem of making your RAG application understand the user’s queries better, so that it can generate better results.</p>
<p>Before we dive deep into the topic of this article, I highly recommend reading my previous articles in the RAG series. We are diving into advanced RAG topics now, so you should have your basics clear first.</p>
<ol>
<li><p><a target="_blank" href="https://blogs.niket.pro/rag-intro">Introduction to RAG</a></p>
</li>
<li><p><a target="_blank" href="https://blogs.niket.pro/implementing-rag">Implementing RAG</a></p>
</li>
</ol>
<h1 id="heading-parallel-query-retrieval">Parallel Query Retrieval</h1>
<p>So the problem at hand is: we want our RAG application to understand what the user wants to ask, given that most of the time humans are going to give bad input. You may have heard the phrase, “Garbage In, Garbage Out“. It applies perfectly to LLMs. If you give bad input to an LLM, then you will most likely get bad output from it. That means you want to improve the input you are giving to the LLM to make your RAG application “usable“ to <code>normal</code> users.</p>
<p>The Parallel Query Retrieval technique tries to generate better LLM input for the user’s queries. It does so by asking the LLM to generate multiple refined queries for any given user query. It then processes all the LLM-generated queries along with the user’s query to generate a comprehensive output. The following diagram will help you understand this better.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747578509513/6274e253-a09e-40f2-9bbf-14c170ab6123.png" alt class="image--center mx-auto" /></p>
<p>For example, let’s say you create a RAG application capable of answering programming-related questions &amp; you have ingested relevant data into your vector database (using the ingestion phase defined in the previous article). If the user asks the query “implemend goroutines golang“ (notice the spelling mistake in “implement“), then your RAG application will ask the LLM to generate queries similar to the user’s query. Let’s say the LLM returns the following queries:</p>
<ol>
<li><p>How to implement Goroutines in GoLang?</p>
</li>
<li><p>What are the various concurrency patterns in GoLang?</p>
</li>
<li><p>How to take care of thread-safety while using Goroutines in GoLang?</p>
</li>
</ol>
<p>As described in the above diagram, you:</p>
<ol>
<li><p>Generate Vector Embeddings for all the LLM generated queries &amp; the user’s query</p>
</li>
<li><p>Fetch relevant documents from your vector database using similarity search</p>
</li>
<li><p>Aggregate unique data points from similarity search results across multiple queries</p>
</li>
<li><p>Pass the user’s query along with the aggregated data points to LLM</p>
</li>
</ol>
<p>After following these steps, the response from the LLM will most likely be better than the response from the basic RAG that we coded in the previous article.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">In classic system design, the Fan-Out Pattern refers to <strong>sending a single message or event to multiple services or consumers at once</strong>. I hope you understand now why the technique we are discussing in this article comes under the Fan-Out pattern.</div>
</div>
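<p>As an aside, the classic fan-out pattern from the callout can be sketched in a few lines of Python. This is just a toy illustration (the consumer names are made up, and it is not part of our RAG code): one event is dispatched to every consumer at once, and the results are gathered back.</p>

```python
import asyncio


async def consumer(name: str, event: str) -> str:
    # each consumer processes the same event independently
    await asyncio.sleep(0)  # stand-in for real work (I/O, an API call, etc.)
    return f"{name} handled {event!r}"


async def fan_out(event: str) -> list[str]:
    # send the single event to all consumers concurrently
    consumers = ["search-service", "logging-service", "analytics-service"]
    return await asyncio.gather(*(consumer(c, event) for c in consumers))


results = asyncio.run(fan_out("user-signed-up"))
print(results)
```

<p>In our case, the “event“ is the user’s query and the “consumers“ are the similarity searches for each generated query.</p>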

<h1 id="heading-implementation-in-python">Implementation in Python</h1>
<p>Enough with the theory, let’s code this thing. As discussed before, this RAG differs from the basic RAG we built in the previous article in the <code>QUERY</code> phase. Hence, I will be reusing some components from my basic RAG implementation article. If you haven’t read it already, I highly recommend reading it first.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://blogs.niket.pro/implementing-rag">https://blogs.niket.pro/implementing-rag</a></div>
<p> </p>
<p>Let’s assume that you have ingested a PDF document about GoLang into your RAG application. Now we will discuss the changes in the query flow.</p>
<h3 id="heading-step-1-generate-multiple-queries-given-users-query">Step 1: Generate Multiple Queries Given User’s Query</h3>
<p>Our goal in this step is to use the LLM to generate multiple queries that are similar to the user’s query. At a high level, there are two ways to achieve this.</p>
<ol>
<li><p>You make multiple requests to your LLM, each one asking it to generate a query similar to the user’s query. But this is more time-consuming &amp;, most importantly, it will cost you more.</p>
</li>
<li><p>The second way is to ask the LLM to generate multiple queries within the same response. But there is a problem here. When you ask an LLM a question, it gives you the response as plain text. How do you extract queries from a plain-text response? This is where a concept called “Structured Output“ helps. Basically, modern LLMs can respond in a specific format that you define before making requests.</p>
</li>
</ol>
<p>Let’s see structured output in action using LangChain.</p>
<p><strong>Define Output Format</strong></p>
<p>We will use <code>BaseModel</code> from the <code>pydantic</code> library to create a class <code>MultipleQueries</code> that defines the output structure we are expecting from the LLM.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel

<span class="hljs-comment"># model for multiple queries</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">MultipleQueries</span>(<span class="hljs-params">BaseModel</span>):</span>
    queries: list[str]
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">You can watch <a target="_self" href="https://www.youtube.com/watch?v=XIdQ6gO3Anc">this YouTube video</a> to learn more about Pydantic.</div>
</div>

<p><strong>Instruct LLM Model to Respond in Output Format</strong></p>
<p>LangChain makes it very easy to instruct the LLM models to respond in specific format.</p>
<pre><code class="lang-python"><span class="hljs-comment"># create LLM</span>
llm = ChatOpenAI(
    model=<span class="hljs-string">"gpt-4.1"</span>,
)

<span class="hljs-comment"># llm for query generation</span>
llm_for_query_gen = llm.with_structured_output(MultipleQueries)
</code></pre>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">You can read more about Structured Output from LangChain in <a target="_self" href="https://python.langchain.com/docs/concepts/structured_outputs/">this</a> tutorial. OpenAI SDK also offers a similar functionality to specify the output format directly. You can read more about OpenAI structured outputs <a target="_self" href="https://platform.openai.com/docs/guides/structured-outputs?api-mode=responses">here</a>.</div>
</div>

<p><strong>Generate Multiple Queries for a Given User Query</strong></p>
<pre><code class="lang-python">SYSTEM_PROMPT_QUERY_GEN = <span class="hljs-string">"""
You are a helpful assistant. Your job is to generate 3 queries that are similar to the user's query.
You need to give the response in the required format. 

Example:
user_query: implement goroutines in golang

response:
[
    "how to implement goroutines in golang",
    "what is goroutine in golang",
    "how to use goroutines in golang"
]
"""</span>

<span class="hljs-comment"># generate 3 queries similar to the user's query</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_queries</span>(<span class="hljs-params">query: str</span>) -&gt; list[str]:</span>
    <span class="hljs-comment"># 1. use LLM to generate 3 queries similar to the user's query</span>
    messages = [
        {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: SYSTEM_PROMPT_QUERY_GEN},
        {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: query},
    ]

    response = llm_for_query_gen.invoke(messages)
    <span class="hljs-keyword">if</span> isinstance(response, MultipleQueries):
        result = response.queries
        print(<span class="hljs-string">f"🌀🌀🌀 Generated <span class="hljs-subst">{len(result)}</span> queries"</span>)
        <span class="hljs-keyword">for</span> i, q <span class="hljs-keyword">in</span> enumerate(result):
            print(<span class="hljs-string">f"🌀🌀🌀 <span class="hljs-subst">{i+<span class="hljs-number">1</span>}</span>. <span class="hljs-subst">{q}</span>"</span>)
        <span class="hljs-keyword">return</span> result
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Invalid response from LLM"</span>)
</code></pre>
<h3 id="heading-step-2-fetch-relevant-documents-from-vector-db-for-each-query">Step 2: Fetch Relevant Documents from Vector DB for Each Query</h3>
<p>Here, we will use the method <code>get_vector_store()</code> which we have defined in the previous article.</p>
<pre><code class="lang-python">COLLECTION_NAME = <span class="hljs-string">"golang-docs"</span>
SIMILARITY_THRESHOLD = <span class="hljs-number">0.5</span>  <span class="hljs-comment"># same threshold we used in the previous article</span>

<span class="hljs-comment"># fetch the relevant documents for the query</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">fetch_relevant_documents_for_query</span>(<span class="hljs-params">query: str</span>) -&gt; list[Document]:</span>
    <span class="hljs-comment"># 1. check if collection exists</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> collection_exists(COLLECTION_NAME):
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Collection does not exist"</span>)

    <span class="hljs-comment"># 2. get the vector store for the collection</span>
    vector_store = get_vector_store(COLLECTION_NAME)

    <span class="hljs-comment"># 3. fetch the relevant documents using similarity search</span>
    docs = vector_store.similarity_search_with_score(query, k=<span class="hljs-number">5</span>)

    <span class="hljs-comment"># 4. filter the documents based on the similarity threshold</span>
    filtered_docs = [doc <span class="hljs-keyword">for</span> doc, score <span class="hljs-keyword">in</span> docs <span class="hljs-keyword">if</span> score &gt;= SIMILARITY_THRESHOLD]

    print(<span class="hljs-string">f"🌀🌀🌀 QUERY: <span class="hljs-subst">{query}</span>. FOUND: <span class="hljs-subst">{len(filtered_docs)}</span> documents"</span>)

    <span class="hljs-keyword">return</span> filtered_docs
</code></pre>
<h3 id="heading-step-3-aggregate-unique-documents-across-queries">Step 3: Aggregate Unique Documents Across Queries</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_core.documents <span class="hljs-keyword">import</span> Document
<span class="hljs-comment"># aggregate the relevant documents</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">aggregate_relevant_documents</span>(<span class="hljs-params">queries: list[str]</span>) -&gt; list[Document]:</span>
    <span class="hljs-comment"># 1. fetch the relevant documents for each query</span>
    docs = [fetch_relevant_documents_for_query(query) <span class="hljs-keyword">for</span> query <span class="hljs-keyword">in</span> queries]

    <span class="hljs-comment"># 2. flatten the list of lists and get unique documents</span>
    flattened_docs = [doc <span class="hljs-keyword">for</span> sublist <span class="hljs-keyword">in</span> docs <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> sublist]
    unique_docs = list({doc.page_content: doc <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> flattened_docs}.values())

    print(<span class="hljs-string">f"🌀🌀🌀 Found <span class="hljs-subst">{len(unique_docs)}</span> unique documents across all the queries"</span>)

    <span class="hljs-keyword">return</span> unique_docs
</code></pre>
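<p>Note that the list comprehension above fetches documents for each query one after another. Since the similarity searches are independent, you could also fan them out concurrently, e.g. with a thread pool. Here is a self-contained sketch of that idea, where <code>fetch_one</code> is a stand-in for <code>fetch_relevant_documents_for_query</code> (it just returns fake document strings so the example runs on its own):</p>

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in for fetch_relevant_documents_for_query: the real function performs
# a similarity search against qdrant; this one returns fake "documents" so
# the sketch is self-contained.
def fetch_one(query: str) -> list[str]:
    return [f"doc-for:{query}", "shared-doc"]


def aggregate_concurrently(queries: list[str]) -> list[str]:
    # fan the independent similarity searches out across a thread pool
    with ThreadPoolExecutor(max_workers=4) as pool:
        per_query = list(pool.map(fetch_one, queries))

    # flatten the lists & deduplicate while preserving order (same dict trick as above)
    flattened = [doc for sublist in per_query for doc in sublist]
    return list({doc: doc for doc in flattened}.values())


docs = aggregate_concurrently(["query a", "query b"])
print(docs)  # "shared-doc" appears only once
```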
<h3 id="heading-step-4-query-llm-using-aggregated-documents">Step 4: Query LLM using Aggregated Documents</h3>
<pre><code class="lang-python">SYSTEM_PROMPT_ANSWER_GEN = <span class="hljs-string">"""
You are a helpful assistant. Your job is to generate an answer for the user's query based on the relevant documents provided.
"""</span>

<span class="hljs-comment"># generate the answer for the user's query</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_answer</span>(<span class="hljs-params">query: str, docs: list[Document]</span>) -&gt; str:</span>
    <span class="hljs-comment"># 1. use LLM to generate the answer for the user's query based on the relevant documents</span>
    system_prompt = SYSTEM_PROMPT_ANSWER_GEN
    <span class="hljs-keyword">for</span> doc <span class="hljs-keyword">in</span> docs:
        system_prompt += <span class="hljs-string">f"""
        Document: <span class="hljs-subst">{doc.page_content}</span>
        """</span>
    messages = [
        {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: system_prompt},
        {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: query},
    ]
    response = llm.invoke(messages)
    <span class="hljs-keyword">return</span> response.content
</code></pre>
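<p>Putting it all together, the query flow is just a composition of the functions we defined above. The sketch below shows the wiring with lightweight stand-ins for the LLM- and vector-store-backed functions, so that it runs on its own; in the real application you would use the implementations from the previous steps.</p>

```python
# Stand-ins for the article's LLM/vector-store-backed functions,
# so the pipeline wiring below is runnable on its own.
def generate_queries(query: str) -> list[str]:
    return [f"variant 1 of {query}", f"variant 2 of {query}"]


def aggregate_relevant_documents(queries: list[str]) -> list[str]:
    return [f"doc for {q}" for q in queries]


def generate_answer(query: str, docs: list[str]) -> str:
    return f"answer to {query!r} using {len(docs)} documents"


def answer_with_parallel_query_retrieval(user_query: str) -> str:
    # 1. fan out: generate similar queries, and keep the user's query too
    queries = generate_queries(user_query) + [user_query]
    # 2 & 3. fetch relevant documents per query & aggregate unique ones
    docs = aggregate_relevant_documents(queries)
    # 4. answer the user's query using the aggregated context
    return generate_answer(user_query, docs)


print(answer_with_parallel_query_retrieval("implemend goroutines golang"))
```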
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">FYI, LangChain provides <a target="_self" href="https://python.langchain.com/docs/how_to/MultiQueryRetriever/"><strong>MultiQueryRetriever</strong></a><strong> </strong>which combines steps 1 to 3 above in a single line of code 🤖. However, in my opinion, LangChain does too much abstraction, which kind of takes away the fun of building stuff.</div>
</div>

<p>As you can see below, even though I asked a question with a spelling mistake (to make the input even worse), my RAG application was able to answer it well.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747764193264/479593e7-2b1f-401f-92bd-e0b33add4b81.png" alt class="image--center mx-auto" /></p>
<hr />
<p>And that’s it, that’s how easy it is to implement <strong>Parallel Query Retrieval</strong>. In future articles in this series, I will discuss more techniques used in advanced RAG applications. Stay tuned.</p>
<p>Hope you liked this article. If you have questions/comments, please feel free to leave a comment.</p>
<p>Source Code: <a target="_blank" href="https://github.com/Niket1997/rag-tutorial/tree/main/2_parallel_query_retrieval">GitHub</a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746365342314/04ddb40d-f9f2-4471-83b2-14b4d097075d.jpeg?auto=compress,format&amp;format=webp" alt /></p>
]]></content:encoded></item><item><title><![CDATA[2. Implementing RAG]]></title><description><![CDATA[This is a second article in my series, RAG Deep Dive. The goal of this series is to dive deep into the world of RAG & understand it from the first principles by actually implementing a scalable, production ready RAG system.
In the previous article, I...]]></description><link>https://blogs.niket.pro/implementing-rag</link><guid isPermaLink="true">https://blogs.niket.pro/implementing-rag</guid><category><![CDATA[RAG ]]></category><category><![CDATA[langchain]]></category><category><![CDATA[openai]]></category><category><![CDATA[pypdf]]></category><dc:creator><![CDATA[Aniket Mahangare]]></dc:creator><pubDate>Sun, 11 May 2025 17:42:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746985239520/46dc0d1f-e435-4dfd-9f11-28b4ee7f6c22.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the second article in my series, <a target="_blank" href="https://blogs.niket.pro/series/rag-deep-dive">RAG Deep Dive</a>. The goal of this series is to dive deep into the world of RAG &amp; understand it from first principles by actually implementing a scalable, production-ready RAG system.</p>
<p>In the previous article, <a target="_blank" href="https://blogs.niket.pro/rag-intro">Introduction to RAG</a>, we discussed what RAG is &amp; how it works. In this article, we will implement the most basic &amp; simplest RAG. The goal of this article is to show you how easy it is to build a basic RAG.</p>
<h2 id="heading-set-up">Set Up</h2>
<p><strong>Python</strong></p>
<p>Make sure you have Python installed locally, preferably the latest version.</p>
<p><strong>OpenAI</strong></p>
<p>You need to create an account with OpenAI &amp; generate an API key for testing. We will store this API key in a <code>.env</code> file to be used in the code. You can refer to <a target="_blank" href="https://www.youtube.com/watch?v=gBSh9JI28UQ">this short YouTube video</a> to learn how to generate an OpenAI API key.</p>
<p><strong>Clone GitHub Repository</strong></p>
<p>GitHub Repository: <a target="_blank" href="https://github.com/Niket1997/rag-tutorial">https://github.com/Niket1997/rag-tutorial</a></p>
<p><strong>Install Dependencies</strong></p>
<p>You also need to install the required dependencies. Open the cloned repository in the IDE of your choice &amp; run the following commands to install them.</p>
<pre><code class="lang-bash"><span class="hljs-comment"># installing uv on mac</span>
brew install uv 

<span class="hljs-comment"># install dependencies</span>
uv pip install .
<span class="hljs-comment">## or alternatively, uv pip install -r pyproject.toml</span>
</code></pre>
<p><strong>Install Docker</strong></p>
<p>We will be using Docker to set up the vector database <code>qdrant</code> locally, hence you need to install Docker on your machine. Just Google it.</p>
<p><strong>Run</strong> <code>qdrant</code> <strong>locally using Docker</strong></p>
<p>To set up <code>qdrant</code> using Docker, we will use the following <code>docker-compose.yml</code> file.</p>
<pre><code class="lang-yaml"><span class="hljs-attr">services:</span>
  <span class="hljs-attr">qdrant:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">qdrant/qdrant</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"6333:6333"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">qdrant_data:/qdrant/storage</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">qdrant_data:</span>
</code></pre>
<p>You can start the <code>qdrant</code> docker container using following command.</p>
<pre><code class="lang-bash">docker compose -f docker-compose.yml up -d
</code></pre>
<p><strong>Create</strong> <code>.env</code> <strong>file</strong></p>
<p>Create a new file in the cloned repository with the name <code>.env</code> &amp; add the following contents to it.</p>
<pre><code class="lang-bash">OPENAI_API_KEY=<span class="hljs-string">"&lt;your-openai-api-key&gt;"</span>
QDRANT_URL=<span class="hljs-string">"http://localhost:6333"</span>
</code></pre>
<p>As mentioned in the previous article, a RAG system has two phases, the ingestion phase &amp; the query phase. Let’s code them one by one.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">We will be using the LangChain framework in this tutorial to build our basic RAG. LangChain is a widely used open-source framework for building applications on top of Large Language Models (LLMs). You can read more about LangChain <a target="_self" href="https://python.langchain.com/docs/introduction/">here</a>.</div>
</div>

<h2 id="heading-ingestion-phase">Ingestion Phase</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746972564577/8ca73ec5-b339-4b0b-836d-8d9bcdbdd9e3.png" alt class="image--center mx-auto" /></p>
<p>As mentioned in the Introduction to RAG article, the ingestion phase has the following steps. We will implement them one by one.</p>
<ol>
<li><p>Load Data</p>
</li>
<li><p>Chunk Data</p>
</li>
<li><p>Generate Vector Embeddings for Individual Chunks</p>
</li>
<li><p>Store Vector Embeddings for Chunks in Vector Database</p>
</li>
</ol>
<h3 id="heading-load-data">Load Data</h3>
<p>LangChain provides loaders for different types of data as described in the documentation <a target="_blank" href="https://python.langchain.com/docs/integrations/document_loaders/">here</a>. In our example, we want to load PDF data into our RAG system, hence we will be using <code>PyPDFLoader</code>. You can find its documentation <a target="_blank" href="https://python.langchain.com/docs/integrations/document_loaders/pypdfloader/">here</a>. You need the packages <code>langchain_community</code> &amp; <code>pypdf</code> for this.</p>
<p>The <code>docs</code> variable here will hold an array of pages. Each element in this array contains the contents of a particular page, in order.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_community.document_loaders <span class="hljs-keyword">import</span> PyPDFLoader

file_path = <span class="hljs-string">"./demo.pdf"</span>
loader = PyPDFLoader(file_path)
docs = loader.load()
</code></pre>
<h3 id="heading-chunk-data">Chunk Data</h3>
<p>A single page can contain a large amount of data, hence we need to chunk the data in <code>docs</code>. This can be achieved using text splitters. In our case, we will be using <code>RecursiveCharacterTextSplitter</code>. You can read more about it <a target="_blank" href="https://python.langchain.com/v0.1/docs/modules/data_connection/document_transformers/recursive_text_splitter/">here</a>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_text_splitters <span class="hljs-keyword">import</span> RecursiveCharacterTextSplitter

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_text_splitter</span>():</span>
    <span class="hljs-keyword">return</span> RecursiveCharacterTextSplitter(
        chunk_size=<span class="hljs-number">1000</span>,
        chunk_overlap=<span class="hljs-number">200</span>,
    )

text_splitter = get_text_splitter()
chunks = text_splitter.split_documents(docs)
</code></pre>
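<p>To build intuition for what <code>chunk_size</code> &amp; <code>chunk_overlap</code> mean, here is a simplified, dependency-free sketch of overlap-based chunking. Note that this is not how <code>RecursiveCharacterTextSplitter</code> works internally: the real splitter recursively tries separators like paragraph breaks, newlines &amp; spaces so that chunks end on natural boundaries. This only illustrates the sliding window the two parameters describe.</p>

```python
# A naive sliding-window chunker: NOT the real RecursiveCharacterTextSplitter,
# just an illustration of chunk_size & chunk_overlap.
def naive_chunk(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]


chunks = naive_chunk("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

<p>The overlap ensures that a sentence cut at a chunk boundary still appears intact in the neighboring chunk, which helps similarity search later.</p>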
<h3 id="heading-generate-amp-store-vector-embeddings">Generate &amp; Store Vector Embeddings</h3>
<p>We need to generate vector embeddings for each chunk. We will use OpenAI’s <code>text-embedding-3-small</code> embedding model. Refer to the previous article in this series to learn more about vector embeddings. You need the package <code>langchain-openai</code> for this.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model=<span class="hljs-string">"text-embedding-3-small"</span>,
)
</code></pre>
<p>We need to define certain functions &amp; variables that we will use to interact with <code>qdrant</code>. You need the packages <code>langchain-qdrant</code> &amp; <code>qdrant-client</code> for this.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os

<span class="hljs-keyword">from</span> qdrant_client <span class="hljs-keyword">import</span> QdrantClient
<span class="hljs-keyword">from</span> qdrant_client.models <span class="hljs-keyword">import</span> Distance, VectorParams
<span class="hljs-keyword">from</span> langchain_qdrant <span class="hljs-keyword">import</span> QdrantVectorStore

<span class="hljs-comment"># create qdrant client</span>
qdrant_client = QdrantClient(
    url=os.getenv(<span class="hljs-string">"QDRANT_URL"</span>),
)

<span class="hljs-comment"># create a collection if it doesn't exist</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">create_collection_if_not_exists</span>(<span class="hljs-params">collection_name: str</span>):</span>
    <span class="hljs-comment"># check if collection exists</span>
    <span class="hljs-keyword">if</span> <span class="hljs-keyword">not</span> collection_exists(collection_name):
        <span class="hljs-comment"># create the collection if it doesn't exist</span>
        <span class="hljs-comment"># Note, here the dimensions 1536 is corresponding to the embedding model we chose</span>
        <span class="hljs-comment"># which is text-embedding-3-small</span>
        qdrant_client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(size=<span class="hljs-number">1536</span>, distance=Distance.COSINE),
        )
        print(<span class="hljs-string">f"Collection <span class="hljs-subst">{collection_name}</span> created"</span>)
    <span class="hljs-keyword">else</span>:
        print(<span class="hljs-string">f"Collection <span class="hljs-subst">{collection_name}</span> already exists"</span>)

<span class="hljs-comment"># check if collection exists</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">collection_exists</span>(<span class="hljs-params">collection_name: str</span>):</span>
    <span class="hljs-keyword">return</span> qdrant_client.collection_exists(collection_name)

<span class="hljs-comment"># get the qdrant vector store for collection</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_vector_store</span>(<span class="hljs-params">collection_name: str</span>):</span>
    <span class="hljs-keyword">return</span> QdrantVectorStore(
        collection_name=collection_name,
        client=qdrant_client,
        embedding=embeddings,
    )

<span class="hljs-comment"># get the collection name</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_collection_name</span>(<span class="hljs-params">file_name: str</span>):</span>
    <span class="hljs-keyword">return</span> <span class="hljs-string">f"rag_collection_<span class="hljs-subst">{file_name.split(<span class="hljs-string">'/'</span>)[<span class="hljs-number">-1</span>].split(<span class="hljs-string">'.'</span>)[<span class="hljs-number">0</span>]}</span>"</span>
</code></pre>
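<p>For example, <code>get_collection_name</code> simply derives a deterministic collection name from the file path, so that re-ingesting the same PDF reuses the same collection. The helper below is the same function as above, repeated so the example runs on its own:</p>

```python
# Same helper as above: derive a qdrant collection name from a file path
# by taking the file name without its extension.
def get_collection_name(file_name: str) -> str:
    return f"rag_collection_{file_name.split('/')[-1].split('.')[0]}"


print(get_collection_name("./docs/golang-book.pdf"))  # rag_collection_golang-book
```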
<p>We will use these methods &amp; above code to generate &amp; store vector embeddings for the PDF document.</p>
<pre><code class="lang-python"><span class="hljs-comment"># get the name of the collection in qdrant db based on the file</span>
collection_name = get_collection_name(pdf_path)

<span class="hljs-comment"># create the collection in qdrant db if it does not exists</span>
create_collection_if_not_exists(collection_name=collection_name)

<span class="hljs-comment"># this will create a vector store &amp; assign the OpenAI embeddings to it</span>
vector_store = get_vector_store(collection_name=collection_name)

<span class="hljs-comment"># this will generate the embeddings for the chunks &amp; add them to the vector store</span>
vector_store.add_documents(documents=chunks)
</code></pre>
<h2 id="heading-query-phase">Query Phase</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746368612179/b60337dd-d50a-49d5-8ac0-04966244b2fc.png" alt class="image--center mx-auto" /></p>
<p>Now that we have ingested the PDF document into our <code>qdrant</code> vector database, let’s see how we can utilize it to get the relevant chunks of data using <code>SimilaritySearch</code>, or as defined in the Introduction to RAG article, <code>SemanticSearch</code>.</p>
<h3 id="heading-generate-vector-embeddings-for-query">Generate Vector Embeddings for Query</h3>
<p>Let’s begin by writing a system prompt that we will use to provide instructions to the LLM, in our case OpenAI’s <code>gpt-4.1</code> model.</p>
<pre><code class="lang-python">system_prompt = <span class="hljs-string">"""
    You are a helpful AI assistant that can answer user's questions based on the documents provided.
    If there aren't any related documents, or if the user's query is not related to the documents, then you can provide the answer based on your knowledge.
    Think carefully before answering the user's question.
    """</span>
</code></pre>
<p>Now, we will generate vector embeddings for the user’s query &amp; try to find the chunks of documents relevant to it in our vector database. Here, we first check if the collection exists in our vector database &amp; if it does, we find the chunks of data that have a similarity score of at least 0.5 out of 1 (i.e. 50%) &amp; add them to our system prompt.</p>
<pre><code class="lang-python"><span class="hljs-comment"># get only the chunks that have a similarity score of at least 0.5 out of 1</span>
SIMILARITY_THRESHOLD = <span class="hljs-number">0.5</span>

collection_name = get_collection_name(file_name)
<span class="hljs-keyword">if</span> collection_exists(collection_name):
    vector_store = get_vector_store(collection_name)
    <span class="hljs-comment"># Get documents with their similarity scores</span>
    docs = vector_store.similarity_search_with_score(query, k=<span class="hljs-number">5</span>)

    <span class="hljs-keyword">for</span> doc, score <span class="hljs-keyword">in</span> docs:
        <span class="hljs-keyword">if</span> score &gt;= SIMILARITY_THRESHOLD:
            system_prompt += <span class="hljs-string">f"""
             Document: <span class="hljs-subst">{doc.page_content}</span>
             """</span>
</code></pre>
<p>Now we will define an LLM client that communicates with OpenAI &amp; uses the above system prompt, which now contains the context most relevant to the user’s query, along with the query itself to get a more refined &amp; relevant answer.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI

llm = ChatOpenAI(
    model=<span class="hljs-string">"gpt-4.1"</span>,
)

messages = [(<span class="hljs-string">"system"</span>, system_prompt), (<span class="hljs-string">"user"</span>, query)]

response = llm.invoke(messages)

print(<span class="hljs-string">f"response: <span class="hljs-subst">{response.content}</span>"</span>)
</code></pre>
<p>And that’s all, we just built our first RAG from scratch. Just run the <code>main.py</code> file in the <code>1_implementing_basic_rag</code> directory and you can interact with the RAG.</p>
<p>I am attaching a screenshot of one run of our basic RAG application.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746984478665/4572c0b9-4d97-451b-8e49-41ed0ec6d16d.png" alt class="image--center mx-auto" /></p>
<hr />
<p>So that’s it for this one. Hope you liked this article on implementing a basic RAG from scratch! In the next set of articles, we will discuss how to optimize our RAG application to make it production-ready. There are various techniques used in production-ready RAG applications to make them performant &amp; efficient at scale. Stay tuned to learn more about them.</p>
<p>If you have questions/comments, then please feel free to comment on this article.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746365342314/04ddb40d-f9f2-4471-83b2-14b4d097075d.jpeg" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[1. Introduction to RAG]]></title><description><![CDATA[You may have observed recently that this new buzz word RAG is sprinkled all over your LinkedIn feed. Frustrated with constant bombarding of this word on my feed, I caved in and decided to understand what this word means. What I found was quite intere...]]></description><link>https://blogs.niket.pro/rag-intro</link><guid isPermaLink="true">https://blogs.niket.pro/rag-intro</guid><category><![CDATA[RAG ]]></category><category><![CDATA[rag chatbot]]></category><dc:creator><![CDATA[Aniket Mahangare]]></dc:creator><pubDate>Sun, 04 May 2025 14:28:34 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1746358584397/2dd46aca-0b39-42ca-a371-4722799fc03d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You may have observed recently that this new buzzword RAG is sprinkled all over your LinkedIn feed. Frustrated with the constant bombardment of this word on my feed, I caved in and decided to understand what it means. What I found was quite interesting, hence I decided to write a series of articles on this topic. This one is the first in the series and will introduce you to the world of RAG.</p>
<h2 id="heading-what-is-rag">What is RAG?</h2>
<p>RAG stands for Retrieval Augmented Generation. A terrifying set of words, isn’t it? Don’t worry, we will break them down in this section. For now, all you need to understand is that it’s a framework built to pass better &amp; more relevant context to large language models to get better responses. If you have used tools like ChatGPT or Google Gemini, then you know that the quality of answers from these tools improves drastically when you pass more relevant pieces of information.</p>
<p>Now, let’s break down those words.</p>
<ul>
<li><p>Retrieval → It refers to the process of retrieving/fetching the relevant pieces of information. How and from where? We will discuss that later in this article.</p>
</li>
<li><p>Augmented → In this context, Augmented means enhancing large language models by enriching them with information more relevant to the users’ queries.</p>
</li>
<li><p>Generation → This is the core capability of LLMs. Given an input prompt, generate a relevant piece of data such as answers, explanations, summaries, etc.</p>
</li>
</ul>
<h2 id="heading-semantic-search">Semantic Search</h2>
<p>Before we get into the implementation details, we must understand Semantic Search, the core principle on which RAG systems work. <strong>Semantic search</strong> is a way of finding information based on <strong>meaning</strong> rather than just matching exact words. In simple words, semantic search finds what you mean, not just what you type.</p>
<h3 id="heading-heres-how-semantic-search-works">Here’s how semantic search works:</h3>
<ol>
<li><p>Turning text into meaning vectors: A piece of text can be passed to a pre-trained model (like Sentence-BERT or OpenAI’s text embeddings) that maps the text into vectors that capture its meaning. The model converts the text into a fixed-length list of numbers (e.g. a 768-dimensional vector). Those numbers encode the text’s meaning in a high-dimensional “semantic space.”</p>
</li>
<li><p>Indexing for faster lookup: These vector embeddings are stored in a vector database. The database builds an index so it can quickly find which vectors lie closest to any given point in that space.</p>
</li>
<li><p>Querying with meaning: When you type a search query (“why is life so hard? 😔”), the system also turns it into its own vector. It then asks the vector database, “Which stored vectors are most similar to this query vector?”. If your RAG has previously stored data that can handle such queries, then the LLM’s response will be much better.</p>
</li>
</ol>
<p>The key benefit of semantic search is that even if a document doesn’t literally say “why is life so hard? 😔”, it might use synonyms (“What makes life so challenging?”, “Why do I face so many obstacles in life?”) and still be retrieved, because its vector sits near your query’s vector in the space.</p>
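<p>To make “nearness in the semantic space” concrete, here is a toy sketch using cosine similarity on hand-written 3-dimensional vectors (the numbers are made up for illustration; real embeddings have hundreds of dimensions and come from a trained model):</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Imaginary 3-d embeddings; a real model would produce e.g. 768 dimensions.
query      = [0.9, 0.1, 0.0]   # "why is life so hard?"
paraphrase = [0.8, 0.2, 0.1]   # "what makes life so challenging?"
unrelated  = [0.0, 0.1, 0.9]   # "how to bake sourdough bread"

print(cosine_similarity(query, paraphrase))  # high, despite no shared words
print(cosine_similarity(query, unrelated))   # low
```

<p>The paraphrase scores close to 1.0 against the query while the unrelated text scores near 0, which is exactly the property the vector database exploits.</p>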
<p>Semantic search works on different types of data such as text, video, audio, images, etc. As long as you have a model that maps your data (text, pixels, audio waveforms, code tokens…) into real-valued vectors that capture “meaning” in that domain, you can perform semantic search.</p>
<p><strong>Spotify</strong> uses audio embeddings to power “Fans also like” and “Discover Weekly” by finding tracks whose embeddings cluster together.</p>
<p>You can watch the following video to understand semantic search &amp; vector databases better.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/gl1r1XV0SLw?si=paNNqhkEHfzGHKnw">https://youtu.be/gl1r1XV0SLw?si=paNNqhkEHfzGHKnw</a></div>
<h2 id="heading-phases-of-rag">Phases of RAG</h2>
<p>RAG in its most basic form has two phases. Let’s understand these phases with an example. Say you have a big PDF document &amp; you want to get answers to some questions based on that document.</p>
<h3 id="heading-ingestion-phase"><strong>Ingestion Phase</strong></h3>
<p>This refers to ingesting into the RAG system the data that will be utilized to pass better context to the LLM. In our example, we upload our PDF document to the RAG, which indexes the document and stores it in such a way that it’s easy to fetch relevant information from it.</p>
<p>This phase has the following steps:</p>
<ol>
<li><p>Load Data: The first step in ingestion is loading the data. The data can be uploaded by users, or we may already have certain data on which we want to build a specialized RAG system.</p>
</li>
<li><p>Chunk Data: In this step the loaded data is split into smaller pieces called chunks. Chunking splits large documents into smaller passages that fit within the model’s context window, since the retrieved context can’t exceed that window. This also ensures that we don’t pass the whole document, in a nutshell too much context, to the LLM.</p>
</li>
<li><p>Generate Vector Embeddings: As discussed before, in this step we generate the vector embeddings for each chunk of the data. We rely on vector embedding models for this step.</p>
</li>
<li><p>Store Vector Embeddings: In this step, we store the vector embeddings of the chunks in a vector database such as Pinecone for fast &amp; efficient semantic search.</p>
</li>
</ol>
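<p>As a rough illustration of the chunking step, here is a minimal character-based splitter with overlap (a sketch with hypothetical parameters; real pipelines, such as LangChain’s text splitters, also respect sentence and paragraph boundaries):</p>

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of at most chunk_size characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping some overlap
    return chunks

doc = "RAG systems retrieve relevant context before generation. " * 20
for i, chunk in enumerate(chunk_text(doc)):
    print(i, len(chunk))
```

<p>The overlap keeps a sentence that straddles a chunk boundary from being cut off in both neighboring chunks.</p>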
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746972529783/c869611f-3041-4d3d-991d-2ec6b2c8461f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-query-phase">Query Phase</h3>
<p>This refers to fetching the data most relevant to the user’s query, which is then passed to the LLM. In our example, say you have a question about your document &amp; you ask the RAG system. It looks at the stored information and fetches the most relevant pieces of data, which are passed to the LLM to answer your question.</p>
<p>The query phase has the following steps:</p>
<ol>
<li><p>Generate Vector Embeddings for Query: In this step we generate vector embeddings for the user’s query using the same embedding model used during ingestion.</p>
</li>
<li><p>Semantic Search: In this step, we use the vector embeddings generated for the user’s query to do a similarity search on a vector database. This step returns the most relevant chunks of data corresponding to the user’s query.</p>
</li>
<li><p>Generate Response: In this step, we take the information retrieved from the vector database &amp; pass it to the LLM. Since the LLM now has the most relevant context for the user’s query, it will be able to generate good results.</p>
</li>
</ol>
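<p>The three steps above can be sketched end to end with a toy in-memory store (all names here are hypothetical; a word-overlap “embedding” stands in for a real embedding model, and a linear scan stands in for a vector database index):</p>

```python
import string

def embed(text):
    """Toy 'embedding': a set of lowercase words. A real model returns a dense vector."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return set(cleaned.split())

def similarity(a, b):
    """Jaccard word overlap, standing in for cosine similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0

store = []  # (embedding, chunk) pairs; a real system uses a vector database

def ingest(chunk):
    store.append((embed(chunk), chunk))

def retrieve(query, k=2):
    # Step 1: embed the query. Step 2: semantic search over the store.
    q = embed(query)
    ranked = sorted(store, key=lambda item: similarity(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

ingest("the event loop handles thousands of clients on one thread")
ingest("chunking splits documents to fit the context window")
ingest("vector databases index embeddings for fast lookup")

# Step 3 would pass these retrieved chunks plus the query to the LLM.
print(retrieve("how do vector databases store embeddings?"))
```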
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Steps 1 &amp; 2 here combined are called the Retrieval Phase.</div>
</div>

<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746368612179/b60337dd-d50a-49d5-8ac0-04966244b2fc.png" alt class="image--center mx-auto" /></p>
<hr />
<p>So that’s it for this one. Hope you liked this introductory article on RAG! In the next article, we will build a simple RAG system: we will upload a PDF to it &amp; ask the system questions about the PDF. The system will integrate with a vector database &amp; the OpenAI APIs. Stay tuned for the next one!</p>
<p>If you have questions/comments, then please feel free to comment on this article.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1746365342314/04ddb40d-f9f2-4471-83b2-14b4d097075d.jpeg" alt class="image--center mx-auto" /></p>
]]></content:encoded></item><item><title><![CDATA[Implementing Event Loops in Go: A Practical Approach]]></title><description><![CDATA[Ever wondered how single threaded applications, like Redis, are able to handle thousands of clients concurrently (“perceived” concurrency)? The answer is “Event Loops“. In this article, we will dive deep into how event loops work & their implementati...]]></description><link>https://blogs.niket.pro/event-loops-go</link><guid isPermaLink="true">https://blogs.niket.pro/event-loops-go</guid><category><![CDATA[Event Loop]]></category><category><![CDATA[golang]]></category><category><![CDATA[Redis]]></category><category><![CDATA[single-threaded]]></category><category><![CDATA[programming]]></category><category><![CDATA[operating system]]></category><dc:creator><![CDATA[Aniket Mahangare]]></dc:creator><pubDate>Mon, 14 Oct 2024 19:22:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728933619707/a5980cf2-6f1e-44fa-8baf-b10dcab59f3d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ever wondered how single-threaded applications, like Redis, are able to handle thousands of clients concurrently (“perceived” concurrency)? The answer is “Event Loops”. In this article, we will dive deep into how event loops work &amp; their implementation in GoLang.</p>
<h3 id="heading-event-loops"><strong>Event Loops</strong></h3>
<p>An event loop is a system that continuously listens for events (like user actions or messages) and handles each one sequentially. This allows programs to manage multiple tasks smoothly and efficiently using just a single thread.</p>
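<p>Stripped of all kernel machinery, an event loop is just a loop that dequeues events and dispatches a handler for each one. Here is a toy sketch (in Python for brevity; an in-memory queue stands in for the kernel notifications the rest of this article is about):</p>

```python
from collections import deque

handled = []

def on_connect(name):
    handled.append(f"connected: {name}")

def on_data(name):
    handled.append(f"data from: {name}")

# In a real server these events would come from the kernel (kqueue/epoll).
events = deque([
    ("client-1", on_connect),
    ("client-2", on_connect),
    ("client-1", on_data),
])

# The event loop: take events one by one and handle each sequentially,
# all on a single thread.
while events:
    name, handler = events.popleft()
    handler(name)

print(handled)
```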
<h3 id="heading-concurrency-models-for-server-architecture"><strong>Concurrency Models for Server Architecture</strong></h3>
<p>There are two basic concurrency models for server architecture:</p>
<ol>
<li><p>Thread-Per-Request: This model uses a separate thread to handle each incoming client request. When a new request arrives, the server creates a new thread (or utilizes one from a thread pool) to process it independently.</p>
</li>
<li><p>I/O Multiplexing: This model allows a single thread (or a limited number of threads) to monitor and manage multiple I/O streams (like network sockets, files, or pipes) simultaneously. Instead of dedicating a thread to each request, the server uses mechanisms to detect when I/O operations (like reading or writing data) are ready to be performed. This thread then takes actions as per events on these streams.</p>
</li>
</ol>
<p>The key challenge of the Thread-Per-Request model is that the application needs to be thread-safe, which requires locking mechanisms; these in turn increase code complexity &amp; slow down execution, as multiple threads can compete to acquire the lock for a critical section.</p>
<p>Single-threaded programs don’t need to handle thread safety, so the CPU time allocated to them can be utilized more efficiently. Single-threaded applications usually rely on I/O multiplexing to implement event loops, so that they can serve clients concurrently.</p>
<h3 id="heading-key-concepts">Key Concepts</h3>
<p><strong>User Space vs. Kernel Space</strong></p>
<ul>
<li><p><strong>User Space</strong>: User space is the environment where user-facing applications run. This includes applications such as web servers, Chrome, text editors, and command utilities. User space applications cannot directly access the system’s hardware resources. They must make system calls to the kernel to request access to these resources.</p>
</li>
<li><p><strong>Kernel Space</strong>: Kernel space is where the core of the operating system, the kernel, operates. The kernel is responsible for managing the system’s resources, such as the CPU, memory, and storage. It also provides system calls, which are interfaces that allow user space applications to interact with the kernel. The kernel has unrestricted access to the system’s hardware resources. This is necessary for the kernel to perform its essential tasks, such as scheduling processes, managing memory, and handling interrupts.</p>
<p>  <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728928543895/7194f6cd-12a3-44b5-b0cc-a196281c360c.png" alt class="image--center mx-auto" /></p>
  <div data-node-type="callout">
  <div data-node-type="callout-emoji">💡</div>
  <div data-node-type="callout-text">You can read more about the User-space, Kernel-space, and System Calls <a target="_blank" href="https://www.codeinsideout.com/blog/linux/system-call/">here</a>.</div>
  </div>


</li>
</ul>
<p><strong>Kernel Buffers</strong></p>
<ul>
<li><p><strong>Receive Buffer</strong>: When data arrives from a network or other I/O source, it's stored in a kernel-managed buffer until the application reads it.</p>
</li>
<li><p><strong>Send Buffer</strong>: Data that an application wants to send is stored in a kernel buffer before being transmitted over the network or I/O device.</p>
</li>
</ul>
<p><strong>File Descriptors (FDs)</strong></p>
<ul>
<li><p><strong>Definition</strong>: Integers that uniquely identify an open file, socket, or other I/O resource within the operating system.</p>
</li>
<li><p><strong>Usage</strong>: Applications use FDs to perform read/write operations on these resources.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I highly recommend you watch these YouTube videos on <a target="_blank" href="https://www.youtube.com/watch?v=-gP58pozNuM">File Descriptors</a> &amp; <a target="_blank" href="https://www.youtube.com/watch?v=gYpWkbm6K98">System Calls</a> in Linux. TL/DR, everything in unix/linux is a file &amp; the OS provides system calls to interact with resources.</div>
</div>

<h2 id="heading-io-multiplexing-mechanisms">I/O Multiplexing Mechanisms</h2>
<p>kqueue (on macOS) and epoll (on Linux) are kernel system calls that provide scalable I/O event notification. In simple words, you subscribe to certain kernel events and get notified when any of those events occur. These system calls are designed for scalable situations such as a web server handling thousands of concurrent connections.</p>
<p>In this article, I will focus on using <code>kqueue</code>, however, I will share the GitHub repo with code for implementation using <code>epoll</code>.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">You can read more on these system calls <a target="_blank" href="https://nima101.github.io/io_multiplexing">here</a>.</div>
</div>
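<p>For a quick feel of these mechanisms before we drop down to raw <code>kqueue</code> in Go, Python’s standard <code>selectors</code> module wraps exactly them (kqueue on macOS, epoll on Linux). This sketch multiplexes one end of a connected socket pair:</p>

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # kqueue on macOS, epoll on Linux

# A connected socket pair stands in for a real client connection.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)

# Subscribe to "readable" events on the server-side socket.
sel.register(server_side, selectors.EVENT_READ)

client_side.sendall(b"ping")

# Block until the kernel reports at least one ready file descriptor.
for key, _mask in sel.select(timeout=1):
    data = key.fileobj.recv(1024)
    print("received:", data)

sel.unregister(server_side)
server_side.close()
client_side.close()
```

<p>The Go implementation below does the same subscribe-wait-handle dance, just directly against the system calls.</p>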

<h2 id="heading-implementation-of-io-multiplexing-in-golang">Implementation of I/O Multiplexing in GoLang</h2>
<p>In Go, we can use the <a target="_blank" href="http://golang.org/x/sys/unix"><code>golang.org/x/sys/unix</code></a> package to access low-level system calls like <code>kqueue</code> on Unix-like systems.</p>
<h3 id="heading-step-1-define-server-configuration"><strong>Step 1: Define Server Configuration</strong></h3>
<p>Create a configuration struct or use variables to hold server parameters.</p>
<pre><code class="lang-go"><span class="hljs-keyword">var</span> (
    host       = <span class="hljs-string">"127.0.0.1"</span> <span class="hljs-comment">// Server IP address</span>
    port       = <span class="hljs-number">8080</span>        <span class="hljs-comment">// Server port</span>
    maxClients = <span class="hljs-number">20000</span>       <span class="hljs-comment">// Maximum number of concurrent clients</span>
)
</code></pre>
<h3 id="heading-step-2-create-the-server-socket"><strong>Step 2: Create the Server Socket</strong></h3>
<p>A new socket can be created using <code>unix.Socket</code> method. A socket can be thought of as an endpoint in a two-way communication channel. Socket routines create the communication channel, and the channel is used to send data between application programs either locally or over networks. Each socket within the network has a unique name associated with it called a socket descriptor—a full-word integer that designates a socket and allows application programs to refer to it when needed.</p>
<p>In simpler terms, a socket is like a door through which data enters and exits a program over the network. It enables inter-process communication, either on the same machine or across different machines connected via a network. Each of these sockets is assigned a file descriptor when they are created.</p>
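<p>As a quick aside, you can see that “a socket is just a file descriptor” from any language; in Python, for instance:</p>

```python
import socket

# A TCP/IPv4 socket, analogous to
# unix.Socket(unix.AF_INET, unix.SOCK_STREAM, unix.IPPROTO_TCP) in Go.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The OS hands back a small non-negative integer: the file descriptor.
fd = sock.fileno()
print("socket file descriptor:", fd)

sock.close()
```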
<pre><code class="lang-go">serverFD, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, unix.IPPROTO_TCP)
<span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"socket creation failed: %v"</span>, err)
}
<span class="hljs-keyword">defer</span> unix.Close(serverFD)
</code></pre>
<ul>
<li><p><strong>unix.AF_INET</strong>: This option specifies that the socket will use IPv4 Internet protocol.</p>
</li>
<li><p><strong>unix.SOCK_STREAM</strong>: This option provides reliable, ordered, and error-checked delivery of a stream of bytes, typically using TCP.</p>
</li>
<li><p><strong>unix.IPPROTO_TCP</strong>: This option specifies that the socket will use the TCP protocol for communication, ensuring reliable data transmission.</p>
</li>
</ul>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">Check <a target="_blank" href="https://www.ibm.com/docs/en/zos/3.1.0?topic=ncuuss-what-is-socket">this</a> documentation by IBM to read more on sockets.</div>
</div>

<h3 id="heading-step-3-set-socket-options"><strong>Step 3: Set Socket Options</strong></h3>
<h4 id="heading-set-non-blocking-mode"><strong>Set Non-blocking Mode</strong></h4>
<p>Setting a socket to <strong>non-blocking mode</strong> ensures that I/O operations return immediately without waiting. When a socket operates in this mode:</p>
<ul>
<li><p><code>accept</code>: If there are no incoming connections, it immediately returns an error (<code>EAGAIN</code> or <code>EWOULDBLOCK</code>) instead of waiting.</p>
</li>
<li><p><code>read</code>/<code>recv</code>: If there's no data to read, it immediately returns an error instead of blocking.</p>
</li>
<li><p><code>write</code>/<code>send</code>: If the socket's buffer is full and can't accept more data, it immediately returns an error instead of waiting.</p>
</li>
</ul>
<pre><code class="lang-go"><span class="hljs-keyword">if</span> err := unix.SetNonblock(serverFD, <span class="hljs-literal">true</span>); err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to set non-blocking mode: %v"</span>, err)
}
</code></pre>
<h4 id="heading-allow-address-reuse"><strong>Allow Address Reuse</strong></h4>
<p>This is particularly useful in scenarios where you need to restart a server quickly without waiting for the operating system to release the port.</p>
<pre><code class="lang-go"><span class="hljs-keyword">if</span> err := unix.SetsockoptInt(serverFD, unix.SOL_SOCKET, unix.SO_REUSEADDR, <span class="hljs-number">1</span>); err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to set SO_REUSEADDR: %v"</span>, err)
}
</code></pre>
<h3 id="heading-step-4-bind-and-listen"><strong>Step 4: Bind and Listen</strong></h3>
<h4 id="heading-bind-the-socket"><strong>Bind the Socket</strong></h4>
<p>Socket binding involves linking a socket to a specific local address and port on your computer. Essentially, it tells the operating system, "Hey, my application is ready to handle any network traffic that comes to this address and port."</p>
<p>When you're setting up a server in network programming, binding is a crucial first step. Before your server can start accepting connections or receiving data, it needs to bind its socket to a chosen address and port. This connection point is where clients will reach out to connect or send information.</p>
<pre><code class="lang-go">addr := &amp;unix.SockaddrInet4{Port: port}
<span class="hljs-built_in">copy</span>(addr.Addr[:], net.ParseIP(host).To4())

<span class="hljs-keyword">if</span> err := unix.Bind(serverFD, addr); err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to bind socket: %v"</span>, err)
}
</code></pre>
<p><strong>unix.SockaddrInet4</strong>: This <code>struct</code> holds the IP/host address (IPv4) &amp; the port of your server.</p>
<h4 id="heading-start-listening"><strong>Start Listening</strong></h4>
<pre><code class="lang-go"><span class="hljs-keyword">if</span> err := unix.Listen(serverFD, maxClients); err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to listen on socket: %v"</span>, err)
}
</code></pre>
<h3 id="heading-step-5-initialize-kqueue"><strong>Step 5: Initialize kqueue</strong></h3>
<pre><code class="lang-go">kq, err := unix.Kqueue()
<span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to create kqueue: %v"</span>, err)
}
<span class="hljs-keyword">defer</span> unix.Close(kq)
</code></pre>
<ul>
<li><strong>unix.Kqueue()</strong>: Creates a new kernel event queue and returns a file descriptor associated with this <code>kqueue</code>.</li>
</ul>
<h3 id="heading-step-6-register-server-fd-with-kqueue"><strong>Step 6: Register Server FD with kqueue</strong></h3>
<p>Register the file descriptor associated with the server socket to monitor for incoming connections. Just to reiterate, everything in Linux/Unix is a file. Basically, when clients want to establish a connection with our server, <code>kqueue</code> monitors these events &amp; notifies our application to take action accordingly.</p>
<pre><code class="lang-go">kev := unix.Kevent_t{
    Ident:  <span class="hljs-keyword">uint64</span>(serverFD),
    Filter: unix.EVFILT_READ,
    Flags:  unix.EV_ADD,
}

<span class="hljs-keyword">if</span> _, err := unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>); err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to register server FD with kqueue: %v"</span>, err)
}
</code></pre>
<ul>
<li><p><strong>Ident</strong>: The identifier (file descriptor) to watch, in this case we want to watch the file descriptor associated with our server.</p>
</li>
<li><p><strong>Filter</strong>: The type of event to watch (<code>unix.EVFILT_READ</code> for read events).</p>
</li>
<li><p><strong>Flags</strong>: Actions to perform (<code>unix.EV_ADD</code> to add the event).</p>
</li>
</ul>
<p>The <code>Kevent</code> method here is used to perform certain actions on the kernel event queue we created before. It accepts the following parameters:</p>
<ul>
<li><p>The file descriptor associated with kqueue</p>
</li>
<li><p>A slice of <code>Kevent_t</code> structs. This slice tells kqueue what changes you want to make. Here, you're adding a new event (like monitoring a socket for incoming connections).</p>
</li>
<li><p>An event list, this would be a slice where kqueue writes back any events that have occurred. We will see this in the next section.</p>
</li>
<li><p>A timeout, that defines how long <code>kevent</code> should wait for events.</p>
</li>
</ul>
<h3 id="heading-step-7-enter-the-event-loop"><strong>Step 7: Enter the Event Loop</strong></h3>
<p>Create a loop to wait for events and handle them.</p>
<pre><code class="lang-go">events := <span class="hljs-built_in">make</span>([]unix.Kevent_t, maxClients)

<span class="hljs-keyword">for</span> {
    nevents, err := unix.Kevent(kq, <span class="hljs-literal">nil</span>, events, <span class="hljs-literal">nil</span>)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">if</span> err == unix.EINTR {
            <span class="hljs-keyword">continue</span> <span class="hljs-comment">// Interrupted system call, retry</span>
        }
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"kevent error: %v"</span>, err)
    }

    <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; nevents; i++ {
        ev := events[i]
        fd := <span class="hljs-keyword">int</span>(ev.Ident)

        <span class="hljs-keyword">if</span> fd == serverFD {
            <span class="hljs-comment">// Handle new incoming connection</span>
        } <span class="hljs-keyword">else</span> {
            <span class="hljs-comment">// Handle client I/O</span>
        }
    }
}
</code></pre>
<ul>
<li><strong>unix.Kevent</strong>: This is a blocking call that waits for events until it times out (the timeout is optional). This method is used both to wait for events from <code>kqueue</code> and to alter the events monitored by <code>kqueue</code>.</li>
</ul>
<h3 id="heading-step-8-accept-new-connections"><strong>Step 8: Accept New Connections</strong></h3>
<p>As we are monitoring the file descriptor associated with the server socket, <code>kqueue</code> returns events such as a new client connection request. When the server socket is ready &amp; a client requests to connect, we accept the connection.</p>
<pre><code class="lang-go">nfd, sa, err := unix.Accept(serverFD)
<span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
    log.Printf(<span class="hljs-string">"failed to accept connection: %v"</span>, err)
    <span class="hljs-keyword">continue</span>
}
<span class="hljs-comment">// Note: do not defer-close nfd here; the client FD must stay open for</span>
<span class="hljs-comment">// future events and is closed explicitly on errors or on disconnect.</span>

<span class="hljs-comment">// Set the new socket to non-blocking mode</span>
<span class="hljs-keyword">if</span> err := unix.SetNonblock(nfd, <span class="hljs-literal">true</span>); err != <span class="hljs-literal">nil</span> {
    log.Printf(<span class="hljs-string">"failed to set non-blocking mode on client FD: %v"</span>, err)
    unix.Close(nfd)
    <span class="hljs-keyword">continue</span>
}

<span class="hljs-comment">// Register the new client FD with kqueue</span>
clientKev := unix.Kevent_t{
    Ident:  <span class="hljs-keyword">uint64</span>(nfd),
    Filter: unix.EVFILT_READ,
    Flags:  unix.EV_ADD,
}

<span class="hljs-keyword">if</span> _, err := unix.Kevent(kq, []unix.Kevent_t{clientKev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>); err != <span class="hljs-literal">nil</span> {
    log.Printf(<span class="hljs-string">"failed to register client FD with kqueue: %v"</span>, err)
    unix.Close(nfd)
    <span class="hljs-keyword">continue</span>
}

log.Printf(<span class="hljs-string">"accepted new connection from %v"</span>, sa)
</code></pre>
<ul>
<li><p><strong>unix.Accept</strong>: Method to accept new incoming connection from clients.</p>
</li>
<li><p><strong>nfd</strong>: When the server accepts a new connection, it creates a new socket for that client. <code>nfd</code> is the file descriptor associated with that client socket.</p>
</li>
<li><p><strong>sa</strong>: This is the socket address of the connecting client.</p>
</li>
<li><p><strong>Register the client FD</strong>: When the server accepts a connection from a client, we register the file descriptor associated with the client socket in <code>kqueue</code>, so that we can monitor events from the client such as <code>new data sent</code>, <code>connection terminated</code>, etc.</p>
</li>
</ul>
<h3 id="heading-step-9-handle-client-io"><strong>Step 9: Handle Client I/O</strong></h3>
<p>When clients send data to our server, <code>kqueue</code> notifies our application, and we take action accordingly.</p>
<pre><code class="lang-go">buf := <span class="hljs-built_in">make</span>([]<span class="hljs-keyword">byte</span>, <span class="hljs-number">1024</span>)
n, err := unix.Read(fd, buf)
<span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
    <span class="hljs-keyword">if</span> err == unix.EAGAIN || err == unix.EWOULDBLOCK {
        <span class="hljs-comment">// No data available right now</span>
        <span class="hljs-keyword">continue</span>
    }
    log.Printf(<span class="hljs-string">"failed to read from client FD %d: %v"</span>, fd, err)
    <span class="hljs-comment">// Remove the FD from kqueue and close it</span>
    kev := unix.Kevent_t{
        Ident:  <span class="hljs-keyword">uint64</span>(fd),
        Filter: unix.EVFILT_READ,
        Flags:  unix.EV_DELETE,
    }
    unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>)
    unix.Close(fd)
    <span class="hljs-keyword">continue</span>
}

<span class="hljs-keyword">if</span> n == <span class="hljs-number">0</span> {
    <span class="hljs-comment">// Connection closed by client</span>
    kev := unix.Kevent_t{
        Ident:  <span class="hljs-keyword">uint64</span>(fd),
        Filter: unix.EVFILT_READ,
        Flags:  unix.EV_DELETE,
    }
    unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>)
    unix.Close(fd)
    <span class="hljs-keyword">continue</span>
}

<span class="hljs-comment">// Process the data received</span>
data := buf[:n]
log.Printf(<span class="hljs-string">"received data from client FD %d: %s"</span>, fd, <span class="hljs-keyword">string</span>(data))

<span class="hljs-comment">// Echo the data back to the client (optional)</span>
<span class="hljs-keyword">if</span> _, err := unix.Write(fd, data); err != <span class="hljs-literal">nil</span> {
    log.Printf(<span class="hljs-string">"failed to write to client FD %d: %v"</span>, fd, err)
    <span class="hljs-comment">// Handle write error if necessary</span>
}
</code></pre>
<ul>
<li><p><strong>unix.Read</strong>: Reads data from the file descriptor associated with the client's socket.</p>
</li>
<li><p><strong>Handling errors</strong>: <strong>EAGAIN</strong> / <strong>EWOULDBLOCK</strong> mean no data is available right now; in non-blocking mode this is normal. You might assume that if <code>kqueue</code> says there is data to read, a subsequent read must succeed, but in rare cases that isn't true, so it's recommended to handle these errors explicitly. You can read more about this in <a target="_blank" href="https://beej.us/guide/bgnet/html/index-wide.html#:~:text=Quick%20note%20to%20all,socket%20to%20non%2Dblocking.">Beej's Guide to Network Programming</a>; here is the relevant quote from the book.</p>
</li>
</ul>
<blockquote>
<p>Quick note to all you Linux fans out there: sometimes, in rare circumstances, Linux’s <code>select()</code> can return “ready-to-read” and then not actually be ready to read! This means it will block on the <code>read()</code> after the <code>select()</code> says it won’t! Why you little—! Anyway, the workaround solution is to set the <code>O_NONBLOCK</code> flag on the receiving socket so it errors with <code>EWOULDBLOCK</code> (which you can just safely ignore if it occurs).</p>
</blockquote>
<ul>
<li><strong>n == 0</strong>: A read of zero bytes means the client closed the connection; deregister the FD from <code>kqueue</code> and close it.</li>
<li><strong>Other errors</strong>: Deregister the FD from <code>kqueue</code> and close the connection.</li>
</ul>
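<p>The quote above can be demonstrated directly. The sketch below uses the standard library's <code>syscall</code> package (whose constants mirror those in <code>golang.org/x/sys/unix</code>, and which also runs on Linux) to put a pipe's read end into non-blocking mode and read it while empty: instead of blocking, the read fails immediately with <code>EAGAIN</code>, which is exactly the recoverable condition our event loop skips over.</p>
<pre><code class="lang-go">package main

import (
    "fmt"
    "syscall"
)

func main() {
    // Create a pipe: fds[0] is the read end, fds[1] the write end.
    fds := make([]int, 2)
    if err := syscall.Pipe(fds); err != nil {
        panic(err)
    }
    defer syscall.Close(fds[0])
    defer syscall.Close(fds[1])

    // Put the read end into non-blocking mode, as we do for client sockets.
    if err := syscall.SetNonblock(fds[0], true); err != nil {
        panic(err)
    }

    // Nothing has been written yet, so the read fails with EAGAIN
    // instead of blocking the thread.
    buf := make([]byte, 16)
    _, err := syscall.Read(fds[0], buf)
    fmt.Println(err == syscall.EAGAIN || err == syscall.EWOULDBLOCK) // prints "true"
}
</code></pre>
<p>Treating <code>EAGAIN</code> as "try again later" rather than a failure is what lets a single-threaded event loop keep serving other clients.</p>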
<h3 id="heading-step-10-clean-up-resources"><strong>Step 10: Clean Up Resources</strong></h3>
<p>Ensure that all file descriptors are properly closed when they are no longer needed.</p>
<ul>
<li><p><strong>Closing Client FDs</strong>: As shown in previous steps, remove the FD from kqueue and close it.</p>
</li>
<li><p><strong>Closing Server FD and kqueue FD</strong>: Use <code>defer</code> statements to ensure they are closed when the function exits.</p>
</li>
</ul>
<pre><code class="lang-go"><span class="hljs-keyword">defer</span> unix.Close(serverFD)
<span class="hljs-keyword">defer</span> unix.Close(kq)
</code></pre>
<h2 id="heading-complete-refactored-code"><strong>Complete Refactored Code</strong></h2>
<p>Here's the complete code; you can also find it in the GitHub repository linked below.</p>
<pre><code class="lang-go"><span class="hljs-keyword">package</span> main

<span class="hljs-keyword">import</span> (
    <span class="hljs-string">"fmt"</span>
    <span class="hljs-string">"golang.org/x/sys/unix"</span>
    <span class="hljs-string">"log"</span>
    <span class="hljs-string">"net"</span>
)

<span class="hljs-keyword">var</span> (
    host       = <span class="hljs-string">"127.0.0.1"</span> <span class="hljs-comment">// Server IP address</span>
    port       = <span class="hljs-number">8080</span>        <span class="hljs-comment">// Server port</span>
    maxClients = <span class="hljs-number">20000</span>       <span class="hljs-comment">// Maximum number of concurrent clients</span>
)

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">RunAsyncTCPServerUnix</span><span class="hljs-params">()</span> <span class="hljs-title">error</span></span> {
    log.Printf(<span class="hljs-string">"starting an asynchronous TCP server on %s:%d"</span>, host, port)

    <span class="hljs-comment">// Create kqueue event objects to hold events</span>
    events := <span class="hljs-built_in">make</span>([]unix.Kevent_t, maxClients)

    <span class="hljs-comment">// Create a socket</span>
    serverFD, err := unix.Socket(unix.AF_INET, unix.SOCK_STREAM, unix.IPPROTO_TCP)
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"socket creation failed: %v"</span>, err)
    }
    <span class="hljs-keyword">defer</span> unix.Close(serverFD)

    <span class="hljs-comment">// Set the socket to non-blocking mode</span>
    <span class="hljs-keyword">if</span> err := unix.SetNonblock(serverFD, <span class="hljs-literal">true</span>); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to set non-blocking mode: %v"</span>, err)
    }

    <span class="hljs-comment">// Allow address reuse</span>
    <span class="hljs-keyword">if</span> err := unix.SetsockoptInt(serverFD, unix.SOL_SOCKET, unix.SO_REUSEADDR, <span class="hljs-number">1</span>); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to set SO_REUSEADDR: %v"</span>, err)
    }

    <span class="hljs-comment">// Bind the IP &amp; the port</span>
    addr := &amp;unix.SockaddrInet4{Port: port}
    <span class="hljs-built_in">copy</span>(addr.Addr[:], net.ParseIP(host).To4())
    <span class="hljs-keyword">if</span> err := unix.Bind(serverFD, addr); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to bind socket: %v"</span>, err)
    }

    <span class="hljs-comment">// Start listening</span>
    <span class="hljs-keyword">if</span> err := unix.Listen(serverFD, maxClients); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to listen on socket: %v"</span>, err)
    }

    <span class="hljs-comment">// Create kqueue instance</span>
    kq, err := unix.Kqueue()
    <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to create kqueue: %v"</span>, err)
    }
    <span class="hljs-keyword">defer</span> unix.Close(kq)

    <span class="hljs-comment">// Register the serverFD with kqueue</span>
    kev := unix.Kevent_t{
        Ident:  <span class="hljs-keyword">uint64</span>(serverFD),
        Filter: unix.EVFILT_READ,
        Flags:  unix.EV_ADD,
    }

    <span class="hljs-keyword">if</span> _, err := unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>); err != <span class="hljs-literal">nil</span> {
        <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to register server FD with kqueue: %v"</span>, err)
    }

    <span class="hljs-comment">// Event loop</span>
    <span class="hljs-keyword">for</span> {
        nevents, err := unix.Kevent(kq, <span class="hljs-literal">nil</span>, events, <span class="hljs-literal">nil</span>)
        <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
            <span class="hljs-keyword">if</span> err == unix.EINTR {
                <span class="hljs-keyword">continue</span> <span class="hljs-comment">// Interrupted system call, retry</span>
            }
            <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"kevent error: %v"</span>, err)
        }

        <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; nevents; i++ {
            ev := events[i]
            fd := <span class="hljs-keyword">int</span>(ev.Ident)

            <span class="hljs-keyword">if</span> fd == serverFD {
                <span class="hljs-comment">// Accept the incoming connection from client</span>
                nfd, sa, err := unix.Accept(serverFD)
                <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                    log.Printf(<span class="hljs-string">"failed to accept connection: %v"</span>, err)
                    <span class="hljs-keyword">continue</span>
                }

                <span class="hljs-comment">// Set the new socket to non-blocking mode</span>
                <span class="hljs-keyword">if</span> err := unix.SetNonblock(nfd, <span class="hljs-literal">true</span>); err != <span class="hljs-literal">nil</span> {
                    log.Printf(<span class="hljs-string">"failed to set non-blocking mode on client FD: %v"</span>, err)
                    unix.Close(nfd)
                    <span class="hljs-keyword">continue</span>
                }

                <span class="hljs-comment">// Register the new client FD with kqueue</span>
                clientKev := unix.Kevent_t{
                    Ident:  <span class="hljs-keyword">uint64</span>(nfd),
                    Filter: unix.EVFILT_READ,
                    Flags:  unix.EV_ADD,
                }

                <span class="hljs-keyword">if</span> _, err := unix.Kevent(kq, []unix.Kevent_t{clientKev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>); err != <span class="hljs-literal">nil</span> {
                    log.Printf(<span class="hljs-string">"failed to register client FD with kqueue: %v"</span>, err)
                    unix.Close(nfd)
                    <span class="hljs-keyword">continue</span>
                }

                log.Printf(<span class="hljs-string">"accepted new connection from %v"</span>, sa)
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-comment">// Handle client I/O</span>
                buf := <span class="hljs-built_in">make</span>([]<span class="hljs-keyword">byte</span>, <span class="hljs-number">1024</span>)
                n, err := unix.Read(fd, buf)
                <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                    <span class="hljs-keyword">if</span> err == unix.EAGAIN || err == unix.EWOULDBLOCK {
                        <span class="hljs-keyword">continue</span> <span class="hljs-comment">// No data available right now</span>
                    }
                    log.Printf(<span class="hljs-string">"failed to read from client FD %d: %v"</span>, fd, err)
                    <span class="hljs-comment">// Remove the FD from kqueue and close it</span>
                    kev := unix.Kevent_t{
                        Ident:  <span class="hljs-keyword">uint64</span>(fd),
                        Filter: unix.EVFILT_READ,
                        Flags:  unix.EV_DELETE,
                    }
                    unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>)
                    unix.Close(fd)
                    <span class="hljs-keyword">continue</span>
                }

                <span class="hljs-keyword">if</span> n == <span class="hljs-number">0</span> {
                    <span class="hljs-comment">// Connection closed by client</span>
                    log.Printf(<span class="hljs-string">"client FD %d closed the connection"</span>, fd)
                    kev := unix.Kevent_t{
                        Ident:  <span class="hljs-keyword">uint64</span>(fd),
                        Filter: unix.EVFILT_READ,
                        Flags:  unix.EV_DELETE,
                    }
                    unix.Kevent(kq, []unix.Kevent_t{kev}, <span class="hljs-literal">nil</span>, <span class="hljs-literal">nil</span>)
                    unix.Close(fd)
                    <span class="hljs-keyword">continue</span>
                }

                <span class="hljs-comment">// Process the data received</span>
                data := buf[:n]
                log.Printf(<span class="hljs-string">"received data from client FD %d: %s"</span>, fd, <span class="hljs-keyword">string</span>(data))

                <span class="hljs-comment">// Echo the data back to the client (optional)</span>
                <span class="hljs-keyword">if</span> _, err := unix.Write(fd, data); err != <span class="hljs-literal">nil</span> {
                    log.Printf(<span class="hljs-string">"failed to write to client FD %d: %v"</span>, fd, err)
                    <span class="hljs-comment">// Handle write error if necessary</span>
                }
            }
        }
    }
}

<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
    <span class="hljs-keyword">if</span> err := RunAsyncTCPServerUnix(); err != <span class="hljs-literal">nil</span> {
        log.Fatalf(<span class="hljs-string">"server error: %v"</span>, err)
    }
}
</code></pre>
<h2 id="heading-how-to-test-above-code">How to test the above code?</h2>
<h3 id="heading-netcat"><strong>netcat</strong></h3>
<p>netcat is a computer networking utility for reading from and writing to network connections using TCP or UDP.</p>
<ol>
<li><p>Open up two or more terminal windows.</p>
</li>
<li><p>Type <code>nc localhost 8080</code>, then type a message and hit Enter; the server will echo it back.</p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728931162296/b43c70aa-a780-4aa4-9213-375c8cd4ec47.png" alt class="image--center mx-auto" /></p>
 <div data-node-type="callout">
 <div data-node-type="callout-emoji">💡</div>
 <div data-node-type="callout-text">You can refer to <a target="_blank" href="https://www.varonis.com/blog/netcat-commands">this</a> article to learn more on netcat.</div>
 </div>

<h3 id="heading-go-client-code">Go Client Code</h3>
<pre><code class="lang-go"> <span class="hljs-keyword">package</span> main

 <span class="hljs-keyword">import</span> (
     <span class="hljs-string">"bufio"</span>
     <span class="hljs-string">"fmt"</span>
     <span class="hljs-string">"log"</span>
     <span class="hljs-string">"net"</span>
     <span class="hljs-string">"sync"</span>
     <span class="hljs-string">"time"</span>
 )

 <span class="hljs-keyword">const</span> (
     serverAddress = <span class="hljs-string">"127.0.0.1:8080"</span> <span class="hljs-comment">// Server address</span>
     numClients    = <span class="hljs-number">100</span>              <span class="hljs-comment">// Number of concurrent clients to simulate</span>
 )

 <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">main</span><span class="hljs-params">()</span></span> {
     <span class="hljs-keyword">var</span> wg sync.WaitGroup

     <span class="hljs-keyword">for</span> i := <span class="hljs-number">0</span>; i &lt; numClients; i++ {
         wg.Add(<span class="hljs-number">1</span>)
         <span class="hljs-keyword">go</span> <span class="hljs-function"><span class="hljs-keyword">func</span><span class="hljs-params">(clientID <span class="hljs-keyword">int</span>)</span></span> {
             <span class="hljs-keyword">defer</span> wg.Done()
             err := runClient(clientID)
             <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
                 log.Printf(<span class="hljs-string">"Client %d error: %v"</span>, clientID, err)
             }
         }(i)
         <span class="hljs-comment">// Optional: Sleep to stagger client connections</span>
         time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)
     }

     wg.Wait()
 }

 <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">runClient</span><span class="hljs-params">(clientID <span class="hljs-keyword">int</span>)</span> <span class="hljs-title">error</span></span> {
     <span class="hljs-comment">// Connect to the server</span>
     conn, err := net.Dial(<span class="hljs-string">"tcp"</span>, serverAddress)
     <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
         <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to connect: %v"</span>, err)
     }
     <span class="hljs-keyword">defer</span> conn.Close()

     log.Printf(<span class="hljs-string">"Client %d connected to %s"</span>, clientID, serverAddress)

     <span class="hljs-comment">// Send a message to the server</span>
     message := fmt.Sprintf(<span class="hljs-string">"Hello from client %d"</span>, clientID)
     _, err = fmt.Fprintf(conn, message+<span class="hljs-string">"\n"</span>)
     <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
         <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to send data: %v"</span>, err)
     }

     time.Sleep(<span class="hljs-number">100</span> * time.Millisecond)

     <span class="hljs-comment">// Receive a response from the server</span>
     reply, err := bufio.NewReader(conn).ReadString(<span class="hljs-string">'\n'</span>)
     <span class="hljs-keyword">if</span> err != <span class="hljs-literal">nil</span> {
         <span class="hljs-keyword">return</span> fmt.Errorf(<span class="hljs-string">"failed to read response: %v"</span>, err)
     }

     log.Printf(<span class="hljs-string">"Client %d received: %s"</span>, clientID, reply)

     <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
 }
</code></pre>
</li>
</ol>
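<p>If you just want a quick, self-contained sanity check of the echo behaviour, the round trip can also be sketched with the standard <code>net</code> package. Note that the listener here is only a stand-in for our kqueue server (so this snippet runs anywhere, including Linux); it is not the server itself:</p>
<pre><code class="lang-go">package main

import (
    "bufio"
    "fmt"
    "net"
)

func main() {
    // Stand-in echo listener on an ephemeral port (NOT the kqueue server).
    ln, err := net.Listen("tcp", "127.0.0.1:0")
    if err != nil {
        panic(err)
    }
    defer ln.Close()

    go func() {
        conn, err := ln.Accept()
        if err != nil {
            return
        }
        defer conn.Close()
        // Echo one line back, mimicking the server's behaviour.
        line, _ := bufio.NewReader(conn).ReadString('\n')
        conn.Write([]byte(line))
    }()

    // Client side: the same logic as runClient above.
    conn, err := net.Dial("tcp", ln.Addr().String())
    if err != nil {
        panic(err)
    }
    defer conn.Close()
    fmt.Fprintf(conn, "Hello from client\n")

    reply, err := bufio.NewReader(conn).ReadString('\n')
    if err != nil {
        panic(err)
    }
    fmt.Print(reply) // prints "Hello from client"
}
</code></pre>
<p>Swapping the stand-in listener for the kqueue server on port 8080 exercises the real code path with the same client logic.</p>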
<p>So that’s it for this one. I hope you liked this article! If you have questions or comments, please feel free to leave a comment.</p>
<p>You can find the implementation for <code>kqueue</code>, <code>epoll</code> &amp; the Go client in <a target="_blank" href="https://github.com/Niket1997/event-loops-go">this GitHub repository</a>. Stay tuned for the next one!</p>
<p><strong>Disclaimer:</strong> The opinions expressed here are my own and do not represent the views of my employer. This blog is intended for informational purposes only.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729005099345/1cba868e-476f-430e-b8d2-9e4f32a182af.jpeg" alt class="image--center mx-auto" /></p>
]]></content:encoded></item></channel></rss>