Packets, Prompts, and Probabilities: A Glimpse into Reinforcing Language Models
Why I Got Involved
I had a message in my LinkedIn inbox, one of those recruiter-style pings that are usually just spam. But this one wasn't targeting full-time engineers. They were looking for university students, part-timers, freelancers: anyone with good English comprehension and a technical background.
The goal was to evaluate AI-generated text for around $15 an hour. Enough to fund a couple of lattes and a loaf of decent bread. I wasn’t expecting much.
But after a seemingly never-ending onboarding, hours of reviewing bot outputs, comparing response quality, and judging whether instructions were followed, I noticed a strange sense of familiarity. Not with the content itself, but with the structure of those messages. The process felt oddly like debugging a message-passing system.
It felt like watching a communication protocol.
What Surprised Me
The whole thing operated like a layered system. Prompts weren't just casual questions; they were more like headers, carrying intent, expectations, and constraints. Model responses were payloads, but not always clean ones. Some were malformed, ambiguous, or misaligned.
The process had rubrics: truthfulness, instruction-following, helpfulness, tone & style, completeness. It reminded me of checking protocol handshakes: SYN, then ACK, then checking whether the payload matches the negotiated expectations, and so on.
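To make the analogy concrete, here's a minimal sketch of a rubric review treated like a handshake check. The axis names and the passing threshold are my own stand-ins, not the project's actual criteria.

```python
# Toy sketch: rubric review as a handshake check. Axes and threshold are
# illustrative, not the real evaluation criteria.
RUBRIC_AXES = ["truthfulness", "instruction_following", "helpfulness", "tone", "completeness"]

def review(response_scores: dict, threshold: int = 3) -> str:
    """Return 'ACK' only if every negotiated expectation (rubric axis) is met."""
    for axis in RUBRIC_AXES:
        if response_scores.get(axis, 0) < threshold:
            return f"NACK: {axis} below expectation"
    return "ACK"

# A response can be fluent and still fail the handshake on a single axis.
print(review({"truthfulness": 5, "instruction_following": 2,
              "helpfulness": 4, "tone": 5, "completeness": 4}))
```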
Some of the failures I encountered were diagnosable; I've sketched a couple of rough detectors after this list:
Prompt ambiguity looks a lot like semantic aliasing. If a prompt doesn't clearly constrain the domain, the model "samples" too broadly and aliases unrelated interpretations into the response. You ask (vaguely) for X, and the model gives you X ± whatever else fits within its training distribution.
Repetition is often a symptom of "channel narrowing". The model gets locked into a shallow loop because the prompt didn't provide enough degrees of freedom. With limited variation in input parameters, it outputs near-identical structures.
And refusals with no fallback are best explained as TCP resets with no retry: the conversational equivalent of a connection dropped at Layer 4. No handshake recovery, no error message, just a dead end.
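Nothing on the platform worked this mechanically, but the patterns are regular enough to sketch rough detectors for two of them. The phrase lists, window size, and threshold below are guesses for illustration only.

```python
# Rough heuristics for two of the failure modes above. Everything here is
# an illustrative guess, not anything the project actually used.

def looks_repetitive(text: str, window: int = 5) -> bool:
    """Flag 'channel narrowing': the same n-gram shows up again and again."""
    words = text.lower().split()
    ngrams = [tuple(words[i:i + window]) for i in range(len(words) - window + 1)]
    return len(ngrams) > 0 and len(set(ngrams)) < len(ngrams) * 0.7

def looks_like_hard_refusal(text: str) -> bool:
    """Flag a 'reset with no retry': a refusal that offers no fallback."""
    refusal_markers = ("i can't", "i cannot", "i'm unable")
    fallback_markers = ("instead", "however", "alternatively", "but i can")
    lowered = text.lower()
    return any(m in lowered for m in refusal_markers) and not any(
        m in lowered for m in fallback_markers)

print(looks_repetitive("the answer is yes " * 10))   # True: the same phrase loops
print(looks_like_hard_refusal("I can't help with that."))  # True: dead end
print(looks_like_hard_refusal("I can't do that, but I can outline the general idea instead."))  # False
```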
LLMs and Protocol Theory
Once I started seeing it through a telecom lens, everything made more sense.
If:
- a prompt is a packet,
- system instructions are the protocol version,

then:
- tokens become frame segments,
- a complete response is an ACK (and a NACK if it's inaccurate, even when it's grammatically perfect).
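If you squint, the whole exchange fits into a couple of toy structures. The fields below are my own framing of the analogy, not anything from the project:

```python
from dataclasses import dataclass, field

@dataclass
class PromptPacket:
    """A prompt as a packet: the 'header' carries intent and constraints."""
    protocol_version: str                          # system instructions
    intent: str                                    # what the user actually wants
    constraints: list = field(default_factory=list)
    payload: str = ""                              # the literal prompt text

@dataclass
class ModelResponse:
    """The return leg: frame segments (tokens) plus an ACK/NACK verdict."""
    frames: list                                   # token segments
    accurate: bool                                 # fluency alone doesn't earn the ACK

    @property
    def status(self) -> str:
        return "ACK" if self.accurate else "NACK"

reply = ModelResponse(frames=["Sure,", " here", " is", " a", " summary."], accurate=False)
print(reply.status)  # NACK: grammatically perfect, semantically off
```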
LLMs don't need impedance matching, but semantic impedance is absolutely real, and RLHF (Reinforcement Learning from Human Feedback) is how models get tuned to reduce it.
The goal is to align four elements:
- What the user intends,
- How they encode that intent (the prompt),
- How the model interprets and generates a response,
- And how a human rater evaluates whether it all landed.
The human in the loop is like the final check in a handshake, confirming that the message got through and made sense.
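That confirmation eventually gets turned into numbers. A standard way to do it, and a reasonable guess at what sits underneath projects like this one, is the pairwise (Bradley-Terry style) loss used to train reward models; the scores here are made up:

```python
import math

def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the model already
    ranks the human-preferred response higher, large when it doesn't."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A rater preferred response A over response B.
print(round(pairwise_preference_loss(2.1, 0.4), 3))  # small loss: ranking agrees with the rater
print(round(pairwise_preference_loss(0.4, 2.1), 3))  # large loss: ranking disagrees, so the update pushes harder
```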
There’s even semantic packet loss, where the model responds fluently but meaning is lost. In those cases the syntax holds up, but the intention goes missing. The idea-sausage is well-cased, but the meat's gone bad.
Human in the Loop: Not Just Ratings, But Tuning
At first, it seemed like I was just rating responses with stars. Then I realized the project was really about training a communicator.
Feedback was about shaping heuristics, teaching the model that some responses are not just true, but useful. That "correct" isn’t always enough. Sometimes the model needed to reframe, clarify, or even admit what it didn’t know.
It's like designing for a noisy channel: you're minimizing errors and optimizing for meaningful delivery under real-world conditions.
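If I had to caricature my own part of that channel as code, it would look something like the weighted aggregation below, collapsing a handful of noisy per-axis judgments into one scalar a training loop can optimize. The axes and weights are invented for the example.

```python
# Toy aggregation of rater judgments into a single feedback scalar.
# Axes and weights are invented for illustration.
WEIGHTS = {
    "truthfulness": 0.35,
    "instruction_following": 0.30,
    "helpfulness": 0.20,
    "tone": 0.05,
    "completeness": 0.10,
}

def feedback_signal(ratings: dict) -> float:
    """Collapse noisy per-axis ratings (0-5) into one scalar reward."""
    return sum(WEIGHTS[axis] * ratings.get(axis, 0.0) for axis in WEIGHTS)

print(feedback_signal({"truthfulness": 5, "instruction_following": 4,
                       "helpfulness": 3, "tone": 5, "completeness": 4}))
```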
Takeaways
I made about $150 in the end (decent coffee money). But I also learned how lossy language is, and how easily intentions get mangled in transit.
It turns out, communication isn’t just about syntax or vocabulary. It’s a cooperative effort to minimize semantic friction.
And whatever we flag with attention, ratings, or corrections teaches LLMs not just to speak, but to speak with intention.