Why We Must Keep Humans in Control

Rudi Maelbrancke
AIGENEER
Jun 18, 2026 · 4 min. read
Tags: Agentic AI, Computer Vision

Right now, we are still trying to master the art of remote control—using humans to steer robots from a distance. But from where I sit, technology is already moving far beyond that. The real shift we are seeing today is not just about moving from manual steering to automated machines. It is about the rise of "spatial agents." These are smart AI systems that can look around, make decisions, and take action all on their own inside our physical spaces.

To me, remote-controlled robots are just a stepping stone, not the final destination. As these new AI systems start working directly in our factories, labs, and hospitals, they raise a crucial question that we all need to think about: how do we keep humans in meaningful control when the AI is the one making the decisions right in front of us?

How the Tech Works: Connecting AI to the Real World

Getting a computer brain to understand a three-dimensional room is no easy task. What companies like NVIDIA are doing to solve this is highly impressive. They have built software tools that link smart glasses and headset sensors with dedicated AI computers. To keep latency low, they rely on local hardware like the NVIDIA DGX Spark, a compact desktop AI system. They also use what they call "digital twins," which are highly accurate virtual copies of real-world buildings and machines. This lets the AI practice and run tests in a virtual space that closely matches the physical one.

But that is not the only way to build these systems. I am also tracking a different approach called the XARP toolkit. What stands out to me here is how they split the workload. They separate the main computer brain from the headset itself, sending data back and forth using simple web connections.

The most important part of this setup is something called the Model Context Protocol (MCP). Think of this as a universal translator: an open standard that gives AI models one common way to discover and call external tools. It lets otherwise isolated AI tools share information and change things in a 3D space.
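
To make this a little more concrete, here is a minimal sketch of what a tool request could look like on the wire. MCP messages follow JSON-RPC 2.0, and tools are invoked with the `tools/call` method. Everything else here is my own assumption for illustration: the tool name `place_model` and its arguments are hypothetical and are not taken from the XARP toolkit.

```python
import json
from itertools import count

# Simple counter so every request gets a unique JSON-RPC id.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request that uses MCP's `tools/call` method."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool and arguments, for illustration only.
message = build_tool_call(
    "place_model",
    {"model": "desk", "position_m": [1.2, 0.0, 0.4]},
)
print(message)
```

The point of the sketch is that the headset and the AI side only have to agree on this message shape, not on each other's internals.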

The Art of Giving Orders in 3D

Commanding an AI that lives in the real world requires extreme precision. You cannot just give vague instructions; you have to match your words to actual physical limits.

For example, if you are using a room-design application built on the XARP system, you have to be very specific. A great example of a prompt is:
"Scan the current room geometry, identify open floor space, and generate a 3D model of a desk scaled to fit the empty area against the north wall."

What is interesting here is how this single sentence forces the AI to do several complex steps in a specific order. First, it has to use depth sensors to map the room's shape. Next, it must calculate the exact size of the empty space. Finally, it uses the Model Context Protocol to call up a 3D modeling tool, which creates a virtual desk that fits the space perfectly and places it on the exact spot.
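
The middle step, sizing the desk to the empty space, can be sketched in a few lines. This is a simplified illustration under my own assumptions, not how XARP does it: the room scan is reduced to a one-dimensional list of occupied intervals along the north wall, measured in meters, and every name and default value below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Desk:
    width_m: float
    depth_m: float
    offset_m: float  # distance from the wall's corner to the desk's near edge


def largest_free_span(wall_length_m, occupied):
    """Return (start, end) of the widest unoccupied stretch along the wall."""
    best = (0.0, 0.0)
    cursor = 0.0
    for start, end in sorted(occupied):
        # The gap between the previous obstacle and this one is free space.
        if start - cursor > best[1] - best[0]:
            best = (cursor, start)
        cursor = max(cursor, end)
    # Check the stretch between the last obstacle and the end of the wall.
    if wall_length_m - cursor > best[1] - best[0]:
        best = (cursor, wall_length_m)
    return best


def fit_desk(wall_length_m, occupied, max_width_m=1.6, depth_m=0.7, clearance_m=0.1):
    """Scale a desk to the widest free span, or return None if nothing fits."""
    start, end = largest_free_span(wall_length_m, occupied)
    usable = (end - start) - 2 * clearance_m
    if usable <= 0:
        return None
    width = min(max_width_m, usable)
    # Center the desk inside the free span.
    offset = start + ((end - start) - width) / 2
    return Desk(width_m=width, depth_m=depth_m, offset_m=offset)


# A 4 m wall with a shelf at 0-1 m and a radiator at 2.5-3 m.
desk = fit_desk(4.0, [(0.0, 1.0), (2.5, 3.0)])
print(desk)
```

A real system would work from a full 3D scan rather than intervals along one wall, but the idea is the same: measure the free space first, then let the measurement constrain what the modeling tool is asked to create.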

Real-World Gear and Action

We are already seeing the hardware that makes this possible. At the Augmented World Expo in Long Beach, the industry showed off some incredible tools. Snap brought out their SPECS glasses, which project AI interfaces directly into your view. XREAL showed off their super-light AURA glasses, which run on Google's Android software and Qualcomm processors. At the same time, PICO introduced Project Swan, a headset that lets you run standard web code in 3D, completely throwing out the need for old-fashioned computer screens.

These systems are already at work in serious, high-stakes jobs.

In factories, Siemens is using NVIDIA's setups to help engineers. An AI watches live video from the worker's smart glasses and projects step-by-step repair instructions directly onto the machines they are fixing.

In science labs, a startup called AutoBio created a system called LabOS. Running on headsets from Meta and VITURE, it helps researchers at top universities like Stanford and Princeton. The AI watches what the researcher is doing, guides them through complex gene-editing steps, finds physical samples, and logs the data without the scientist ever having to use their hands.

We are seeing similar upgrades in hospitals. At the University of Pittsburgh Medical Center, surgeons are using NVIDIA's tech to display patient vitals directly over their view of the operating room, without blocking their sight.

Even in everyday activities, specialized apps are making waves. One application acts as a personal fencing coach by tracking how your body moves and instantly giving you voice corrections. Another setup connects to flying systems to show drone pilots the exact path their drone will take, projecting the route right into the sky.

My Final Thoughts

This technology is moving incredibly fast, and the potential is huge. But as an observer, I believe we must proceed with caution.

When an AI is guiding a surgeon's hand, directing a factory worker, or handling delicate stem cells, the AI is the one perceiving the room and deciding what to do next. We cannot afford to just hand over the keys. As we build these amazing spatial agents, our most important challenge will be finding a way to keep humans in the driver's seat.