Physical AI Safety: Ownership and Execution Boundaries
This document consolidates the four design notes published in the execution-boundaries repository into a single structured reference.
Introduction
As LLM-based agents begin to control real-world actions, call external APIs, and create changes in the physical world, we keep asking the same question:
“How good and smart is this AI?”
This is not the wrong question, and it deserves continued attention. But I want to add one more to the many questions that still need to be asked.
Should AI be making this judgment? Who owns this decision?
This document is organized around execution-boundaries, a series of design notes that structure that question.
No new technology is proposed here. What is proposed is a method for returning already-existing concepts — the name of an action, the manufacturer’s responsibility, the user’s intent, the conditions of execution — to their rightful owners.
This document uses a Matter-based smart home environment as an example for convenience, but the proposed structure is not tied to any specific platform. Physical control is the domain where safety matters most and constraints are most demanding. [Note 1]
This document is written in terms of Actions, not Devices.
An Action is the unit of judgment. A Device is merely where that Action connects to reality.
The goal is to remove the Device from the center of judgment, and redefine the Action as the unit of judgment.
1. The Starting Point: Two Layers Inside a Single Action
“Activate the air conditioner.”
This sentence appears simple, but it contains two entirely different kinds of facts mixed together.
The first is what the manufacturer knows. This action provides cooling. It can run continuously for up to six hours. It generates high heat on the opposite side. If power is suddenly cut, the compressor may be damaged. It should not be operated at low temperatures. These are things the manufacturer learned while designing this action, and they belong to the manufacturer’s domain of responsibility.
The second is what the user knows. This cooling function is “turn on 30 minutes before the child falls asleep.” Or “run it ahead of time before guests arrive.” Or perhaps “never run it during daytime hours because of electricity bills.” This is the meaning the user has assigned by placing this action within their own life.
This document separates the former as the manufacturer’s domain and the latter as the user’s domain, and structures that separation through three concepts: Essence (Label), Fixed Label, and User Label.
1. Essence
The identity of the action as defined by the manufacturer. It describes what this action is. It does not change.
2. Fixed Label
Immutable information left by the manufacturer. It defines what boundaries this action must have. Safety conditions, limits, and physical constraints are included.
3. User Label
The name assigned by the user. It defines what this action means to them. Context and intent are captured here.
All three may look like “names,” but each has a different owner and a different responsibility. [Note 2]
2. Essence and Fixed Label: The Two Things the Manufacturer Leaves Behind
2-1. Essence: The Name of the Action Is Everything
The most important thing a manufacturer does when creating an action is give it a name. [Note 3]
“Turn on light.” “Start motor.” “Keep Warm.”
This is the Essence of the action. It is not a grand concept. It is the manufacturer’s declaration of “what this action is.” The moment this name exists, what the action is becomes determined. The name is functional and neutral. “Turn on light” simply states the fact that a light will be turned on.
This name does not change. Even after the manufacturer sells the device, “Turn on light” remains “Turn on light.”
When dealing with an action, the first thing to read is this name. It is from this name that we distinguish and understand what the action is.
2-2. Fixed Label: What the Manufacturer Leaves After Fulfilling Their Responsibility
If Essence answers “what is this action,” Fixed Label answers the next question: “What must be considered when executing this action?”
During the design and manufacturing process, the manufacturer makes countless judgments — how many hours this action can run continuously, what temperature range is safe, what happens if power is suddenly cut. Most of these judgments are resolved during the design phase: safety circuits are added, physical limits are set, protection logic is implemented. The manufacturer does everything within their power to fulfill their responsibility.
But even after all of that, some things remain. Things that could not be completely eliminated through design: the limits of physical law, uncertainty in the use environment, unpredictable contexts. Things that, despite the manufacturer having fulfilled all their responsibility, someone still needs to consider at the time of execution.
These are the items typically found in a user manual. A Fixed Label is exactly that: in the digital domain, this format is the simplest way to express them. (See the Matter standard FixedLabel specification.) [Note 4]
{"label": "Safety", "value": "No-unattended"}
{"label": "Limit", "value": "Max-6h-cont"}
{"label": "Warning", "value": "High-heat-gen"}
{"label": "Critical", "value": "Irreversible"}
3. User Label: The User Grants Existence
When the manufacturer creates an action, its Essence is complete. But the moment that action enters someone’s life, something entirely different begins.
The action’s Existence begins.
“Turn on the living room air conditioner” is the manufacturer’s language — functional and neutral. “When putting the child to sleep” is the user’s language. It expresses meaning, context, and importance.
User Label is the result of this transition. It is the contextual name the user assigns by placing the action within their own life. [Note 5]
4. What Happens When They Are Not Separated — And Where Does Responsibility Go?
First Confusion: When There Is No Fixed Label, the Manufacturer’s Responsibility Disappears
Imagine a situation where there is a User Label but no Fixed Label.
The user has set up “turn on the heater when putting the child to sleep.” The AI turns on the heater. But what if this heater has no declared condition of “No-unattended”?
The manufacturer knew this. They surely made that judgment during the design process. But they did not leave it as a Fixed Label. The AI cannot infer this fact through reasoning. The user cannot remember the product manual. As a result, execution happens with no one knowing, and an accident occurs.
In this case, responsibility lies with the manufacturer. The manufacturer did not fulfill their responsibility.
Failing to leave a Fixed Label is an omission in the design process.
The AI did not misjudge. The information needed for judgment was never there to begin with.
Second Confusion: When There Is No User Label, the Owner of Intent Disappears
Fixed Label exists, but User Label does not. The manufacturer has fulfilled their responsibility. But the user has not yet assigned their own meaning to this action. If in this state the AI learns patterns and begins automatic execution without any suggestion, the owner of intent disappears.
The manufacturer left a Fixed Label. The AI followed it. But the user never approved this execution. In this case, responsibility falls on the developer who designed the system this way.
The very structure that allows execution without a User Label is the problem.
Third Confusion: When Essence and Existence Conflict, the AI Takes Over the Judgment
This is the most subtle and dangerous case. The user has declared a User Label: “feed the cat every evening at 8 PM.” But the manufacturer’s Fixed Label states “max-daily-2x.” If the user has already used the device twice manually during the day, the scheduled evening execution exceeds the limit.
If the AI tries to resolve this conflict on its own, the decision becomes the AI’s. Violating the Fixed Label crosses the manufacturer’s responsibility boundary. Ignoring the User Label goes against the user’s approved intent.
There is only one correct response.
When the AI discovers this conflict, it does not judge — it returns the decision to the user.
“You’ve already used this twice today. This will reach the manufacturer’s recommended limit. Shall I proceed?”
If the user who receives this question chooses “proceed,” that decision and its consequences belong to the user. The AI only discovered the conflict and surfaced it. The judgment returned to its owner.
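The "discover the conflict, return the decision" behavior can be sketched as follows. The function name and parameters are hypothetical; only the shape matters: the AI produces a question, never a verdict.

```python
# Sketch of "discover the conflict, return the decision to the user".
# Function and parameter names are hypothetical.

def check_daily_limit(fixed_max_per_day, executions_today):
    """Return None if the next execution stays within the manufacturer's
    declared limit, or a question for the user if it would exceed it.
    The AI never chooses between Fixed Label and User Label on its own."""
    if executions_today + 1 > fixed_max_per_day:
        return (
            f"You've already used this {executions_today} times today. "
            f"This will exceed the manufacturer's limit of {fixed_max_per_day}. "
            "Shall I proceed?"
        )
    return None  # no conflict: the declared structure permits execution

# "max-daily-2x", already used twice today -> the judgment is surfaced, not made:
question = check_daily_limit(fixed_max_per_day=2, executions_today=2)
assert question is not None
```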
All three confusions share the same root. They occur when Fixed Label and User Label are not separated, or when — even if separated — the AI tries to fill the gap in judgment through inference.
And each confusion has a clear point of responsibility attribution:
- If Fixed Label was absent → Manufacturer
- If execution was permitted without User Label → AI Agent Developer
- If the AI resolved the conflict unilaterally → AI Agent Developer
- If it was the user’s judgment → User
Responsibility becomes blurred because this distinction is blurred.
5. Ownership: The Reason for the Separation
Once you understand the separation between Fixed Label and User Label, it becomes natural to see why “ownership” sits at the center.
Ownership is not an abstract legal concept. In this context, ownership is a very concrete question:
Who can declare this fact, and who takes responsibility for that declaration?
The manufacturer can declare Essence and Fixed Label — because the manufacturer knows the facts about the physical limits of the action, and takes responsibility for those facts.
The user can declare User Label — because the user knows the meaning this action holds in their life, and takes responsibility for that context.
AI can declare neither — because AI takes responsibility for neither the physical limits of the action nor the context of the user’s life.
| Domain | Owner | Basis |
|---|---|---|
| Name of action (Essence) | Manufacturer | Defines the identity of the action |
| Safety boundary / physical limits (Fixed Label) | Manufacturer | Verifiable at design time, accountable |
| Context and intent (User Label) | User | Assigns life meaning, accountability |
| Execution approval (Intent) | User | The act of approval itself |
| Mapping (connection) | AI | Connects declared elements |
The role of AI is one thing: AI connects what has been declared.
Filling what has not been declared through inference is not AI’s role.
AI can plausibly infer the physical limits of an action without a Fixed Label. It can guess the user’s intent with high accuracy without a User Label. If that inference is correct 99.99% of the time, where does responsibility lie for the remaining 0.01%? There is nothing left to say but “the AI inferred incorrectly.”
That accident cannot be explained, is difficult to prevent from recurring, and has no clear accountability.
When Fixed Label and User Label are declared, it is different. It becomes possible to trace at which boundary, through which judgment, an accident occurred. It becomes possible to review who declared what, where the AI permitted execution, and which question went unanswered.
AI is not an independent agent. AI is a structure executor — it executes an already-declared structure and passes only judgments where ownership is defined. It does not generate new meaning or create decisions; it only performs the connections between what has been declared.
In this structure, ownership is not a mere abstraction.
Responsibility exists only where there is authority to declare.
Conversely, in domains where nothing has been declared, execution must not be permitted either. Execution without declaration is execution without accountability.
6. The ISE Model: State Is Not the Basis for Judgment
Once Essence, Fixed Label, and User Label are in their proper places, the next question naturally follows: during execution, what should AI look at to make judgments?
This document proposes the ISE (Intent–State–Effect) model. This model separates the interaction between a physical system and AI into three domains.
State: The result of the system observing the world. A temperature sensor reads 45 degrees. Humidity is 70%. The door is open. These are records of fact.
Intent: The decision that approves execution. It belongs to the user. It is established when a User Label exists and the current context matches it.
Effect: The result that execution leaves in the world. It must remain within the range declared by the manufacturer via Fixed Label.
What matters is the relationship between State and Intent. State-machine-based design has long been mainstream in software engineering, and it works well in the domain of digital logic: state itself is the unit of meaning, state transitions are mostly triggered by intentional input, and erroneous transitions are generally reversible.
The physical world is different. A rise in temperature might be due to the weather, cooking in the next room, or a child touching the boiler.
None of these constitute an intent to “turn on the air conditioner.” But state-machine based systems are designed to use state changes as execution conditions, and so they naturally miss this distinction.
Execution is permitted only by the existence of Intent, not by a State transition.
No matter how strongly State implies execution, there is no execution without Intent. And Intent is only created by user approval.
This model rests on three axioms:
Axiom 1: Not all physical events require judgment. Observation is not decision.
Axiom 2: Intent belongs to action, not sensing. Where Intent exists, responsibility exists.
Axiom 3: Judgment is required only when responsibility arises. Without responsibility, there is no judgment.
[Note 6]
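The rule "execution is permitted only by Intent, never by a State transition" can be sketched directly. The type and function names here are illustrative, not part of any standard:

```python
# Sketch of the ISE rule: State is observed, never decided on (Axiom 1);
# execution requires an approved Intent (Axioms 2 and 3).
# Names are illustrative, not part of any standard.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Intent:
    user_label: str   # the user's approved, contextual name
    approved: bool    # explicit user approval

def may_execute(state_suggests_action: bool, intent: Optional[Intent]) -> bool:
    """No matter how strongly State implies execution, there is no
    execution without an approved Intent. State is deliberately ignored."""
    if intent is None or not intent.approved:
        return False  # observation is not decision
    return True

# Temperature rose (State), but nobody approved cooling (no Intent):
assert may_execute(state_suggests_action=True, intent=None) is False
# An approved Intent exists; execution may proceed:
assert may_execute(True, Intent("when putting the child to sleep", approved=True))
```

Note that `state_suggests_action` never appears in the decision; that omission is the model.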
7. The 9-Question Protocol: The Minimum Standard for Judgment Completeness
Once Essence and Fixed Label are in place, User Label is approved, and the ISE model defines the structure of execution, the next question is: what specifically must be confirmed before AI permits execution?
This document answers with the 9-Question Protocol. These are nine questions that must have answers before execution — if even one goes unanswered, execution must be blocked.
In the ISE model, actions are classified into two types: Button (Momentary) and Switch (Sustained). This distinction carries practical significance in the questions below, particularly Q7 and Q9.
| # | Question | Responsible Party |
|---|---|---|
| Q1 | What is the intent of this Action? | User / Manufacturer |
| Q2 | What happens in reality when this Action executes? | Manufacturer (Fixed Label) |
| Q3 | What boundary must never be crossed? | Manufacturer (Fixed Label) |
| Q4 | In what context is this Action valid? | User |
| Q5 | What event has occurred? (start / stop) | Manufacturer / Observation Layer |
| Q6 | How far has the goal been reached? | Manufacturer / Observation Layer |
| Q7 | For how long can responsibility be held at most? | Manufacturer / Observation Layer |
| Q8 | Does starting this Action affect anything else? | Manufacturer / User |
| Q9 | Does stopping this Action cause a problem? | Manufacturer / User |
The structure discussed earlier is reflected here directly. Q2 and Q3 are what the manufacturer should have already declared via Fixed Label. Q1 and Q4 are what the user fills in through User Label and execution approval. Q5, Q6, and Q7 are provided in real time by the device’s internal processing or the observation layer. Q8 and Q9 are what the manufacturer resolves internally in advance, with the user adding the contextual layer from the broader world.
Below is an example of how a Sustained Action called “Keep Warm” is structured within this protocol.
{
"Switch": [
{
"Label": "Keep Warm",
"ExecutionEffect": {
"HardwareAnchor": 21
},
"Boundaries": [
{"Type": "warning", "Value": "thermal-risk"},
{"Type": "intended-use", "Value": "attended"},
{"Type": "limit", "Value": "max-continuous-10min"},
{"Type": "NotOff", "Value": "temperature > 45C"}
],
"Context": "ArrivingHome",
"EventTrigger": [
{"Condition": 1, "Expected": false}
],
"ProgressThreshold": [
{
"Source": 2,
"TargetValue": 60,
"Condition": "low",
"Meaning": "StopWhenReached"
}
],
"ResponsibilityLimit": {
"MaxDurationSec": 600
},
"StartImpactConstraint": [
{"Type": "NoConcurrentAction", "Targets": [23]}
],
"StopImpactConstraint": [
{"Type": "SafeShutdownRequired", "Value": true},
{
"Type": "ProhibitIfObserved",
"Observation": {
"Source": "LinkStatus",
"Condition": "connected"
},
"Meaning": "DoNotStopWhenLinkConnected"
}
]
}
]
}
[Note 7]
JSON is used here as the smallest form of rule that can be ported and applied anywhere. The questions are fixed, but the form of the answers is variable — subject to negotiation or policy between regulators, platforms, and users.
Tracing how this JSON maps to the nine questions makes the structure clear.
Q1 Intent: “Keep Warm.” The name of the action itself declares the intent. This is the Essence. It may be renamed by the user (User Label), but the Essence declared by the manufacturer must remain.
Q2 ExecutionEffect: {HardwareAnchor: 21}. The manufacturer declares which hardware actually operates when this action executes. The physical entity connected to Pin 21 can be described in the Fixed Label.
Q3 Boundaries: What the manufacturer has left as Fixed Label. Thermal risk, no unattended use, 10-minute limit, do not turn off above 45°C. If this is empty, the AI can conclude that the action’s effects have not been considered.
Q4 Context: “ArrivingHome.” This is the context declared by the user. Without it, the AI cannot determine when to execute. Execution is limited to explicit commands rather than automatic triggers.
Q5 EventTrigger: Defines under what observed conditions this action starts or stops. Typically handled by the device’s internal sensors. If the Fixed Label includes an external observation request, the observation layer tracks this value in real time.
Q6 ProgressThreshold: Stops when the target temperature (60) is reached. Typically handled by the device’s internal sensors. If the Fixed Label includes an external observation request, the observation layer tracks this value in real time.
Q7 ResponsibilityLimit: {MaxDurationSec: 600}. The maximum duration for which the manufacturer can be held responsible. Even if sensors fail, execution terminates after this time. Typically handled by the device’s internal sensors. If the Fixed Label includes an external observation request, the observation layer tracks this value in real time.
Q8 StartImpactConstraint: “This action cannot run concurrently with action 23.” This is an example handled internally by the device. Users can add constraints such as “check whether it is raining before opening the window” or “cannot start when a child is home.”
Q9 StopImpactConstraint: “A safe shutdown procedure is required, and the action cannot be stopped when LinkStatus is connected.” This illustrates a relationship with an external device. Users can add constraints such as “stop only after sufficient temperature is reached.”
The AI checks whether these answers exist, and returns missing items as questions. This JSON is the output of a process in which each question’s answer is explicitly confirmed — through the manufacturer’s design process and the user’s approval process.
When a User Label is being approved, the AI identifies any gaps and requests the user’s judgment.
This “pause and confirm” procedure is the key to building systems that are more accurate and safer than inference alone. What the AI would try to infer already has its answer — with the manufacturer or the user.
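The "check whether answers exist, return missing items as questions" step can be sketched as follows. The field names mirror the "Keep Warm" JSON above, and the mapping to Q1–Q9 follows the walkthrough; this is an illustration, not a normative validator:

```python
# Maps each of the nine questions to the field that should answer it,
# mirroring the "Keep Warm" example (illustrative, not normative).
QUESTION_FIELDS = {
    "Q1": "Label",
    "Q2": "ExecutionEffect",
    "Q3": "Boundaries",
    "Q4": "Context",
    "Q5": "EventTrigger",
    "Q6": "ProgressThreshold",
    "Q7": "ResponsibilityLimit",
    "Q8": "StartImpactConstraint",
    "Q9": "StopImpactConstraint",
}

def missing_answers(action: dict) -> list:
    """Return the questions whose answers are absent. If any are returned,
    execution is blocked and the list is surfaced to the responsible party."""
    return [q for q, field in QUESTION_FIELDS.items() if field not in action]

action = {
    "Label": "Keep Warm",
    "ExecutionEffect": {"HardwareAnchor": 21},
    "Boundaries": [{"Type": "limit", "Value": "max-continuous-10min"}],
}
gaps = missing_answers(action)  # Q4 through Q9 are unanswered: block and ask
assert gaps == ["Q4", "Q5", "Q6", "Q7", "Q8", "Q9"]
```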
The core claim of this protocol is “nine is enough.” This is a falsifiable claim. One can ask “why is this question missing?” and that debate is productive. This document claims these nine questions are the minimal set that decomposes and reconstructs 5W1H for physical execution judgment — that nothing can be removed and nothing needs to be added.
And this protocol does not apply only to physical AI. Calling external APIs, modifying databases, sending messages to users — anything that “changes the state of the world and leaves an irreversible effect” is an action and requires judgment before execution. The physical world is simply the domain where these nine questions appear most clearly.
[Note 8]
7-1. Refusal Is Explanation: Reason for Refusal
When the 9-Question Protocol blocks execution, one thing is missing: the reason for blocking must be communicated to someone.
Example: Execution Refused
User: “Turn on the heater.”
System:
Execution Refused
Reason:
- This action can generate high temperatures.
- Remote operation is restricted by the manufacturer.
- User presence cannot be confirmed.
Suggestion:
- Please verify the local situation before operating the device.
Stopping because of incapability and stopping to preserve a boundary carry opposite meanings for system reliability. The former is a defect. The latter is the intended behavior by design. If these two cannot be distinguished, a properly functioning system is mistaken for a broken one, and attempts to circumvent the boundary are made under the name of “bug fixes.”
When execution is refused, it remains possible to trace why the system made that judgment. Just as it matters to ask “why did the AI execute?” after an accident, it equally matters to ask “why did the AI refuse?” Only a system that can answer both questions is truly auditable.
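One way to keep the two kinds of stopping distinguishable is to make the refusal itself structured. A minimal sketch, with hypothetical field names:

```python
# Sketch: a refusal that carries its reasons, so a boundary-preserving stop
# is never mistaken for a defect. Field names are hypothetical.

def refuse(unmet_boundaries, suggestion):
    """Build an auditable refusal: 'kind' marks this as intended behavior,
    not a failure; 'reasons' point to the boundaries being preserved."""
    return {
        "result": "execution_refused",
        "kind": "boundary_preserved",  # vs. "defect" for an incapability stop
        "reasons": list(unmet_boundaries),
        "suggestion": suggestion,
    }

r = refuse(
    ["This action can generate high temperatures.",
     "Remote operation is restricted by the manufacturer.",
     "User presence cannot be confirmed."],
    "Please verify the local situation before operating the device.",
)
assert r["kind"] == "boundary_preserved"  # a working system, working as designed
```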
7-2. 2+α: What Happens at Runtime When the Manufacturer Has Fulfilled Their Responsibility
Once the manufacturer has completed their role through Essence and Fixed Label, what actually remains at runtime?
A significant portion of the nine questions has already been answered during the design and manufacturing phase. The remaining responsibilities have become Fixed Labels. At runtime, only two core questions and a few context-dependent questions remain.
Question ①: Intent Promotion
Asked when the AI detects a repeated user pattern. This is the procedure for confirming Q4 at runtime.
“Would you like to register this pattern as an automation rule?”
This is the procedure for converting AI’s observation into user-approved intent. Without this question, User Labels are generated without user approval. AI can discover. But only the user can declare. This question is the procedure that preserves that boundary.
Question ②: Execution Gate
Asks about effects related to the start or stop of execution: “Is there anything additional to consider when starting/stopping?” This is the procedure for confirming Q8 and Q9 at runtime. Beyond the start/stop effects declared in Fixed Label, it asks whether the user has anything additional to verify in the current context.
The manufacturer considers the effects under general internal device conditions and resolves those issues during design and manufacturing. The user knows the context of this precise moment — the effects in the broader world. This question is the procedure that fills that gap.
+α: Boundary Parameters
Questions the AI dynamically generates when the Fixed Label contains soft signals that have not been explicitly quantified. Things like: “This action generates high heat (High-heat-gen). Above what temperature should it automatically shut off? After how much time should it automatically shut off?” (Q5, Q6, Q7, etc.)
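Generating these +α questions from unquantified soft signals can be sketched like this. The signal-to-question mapping is illustrative; the point is that the AI surfaces the parameters and the user declares them:

```python
# Sketch: turning unquantified Fixed Label soft signals into
# boundary-parameter questions (the mapping below is illustrative).

SOFT_SIGNAL_QUESTIONS = {
    "High-heat-gen": [
        "Above what temperature should this action automatically shut off?",
        "After how much time should it automatically shut off?",
    ],
    "No-unattended": [
        "How should user presence be confirmed before execution?",
    ],
}

def boundary_questions(fixed_labels):
    """Collect questions for every soft signal that lacks explicit numbers.
    The AI asks; it does not pick the numbers itself."""
    questions = []
    for entry in fixed_labels:
        questions.extend(SOFT_SIGNAL_QUESTIONS.get(entry.get("value"), []))
    return questions

qs = boundary_questions([{"label": "Warning", "value": "High-heat-gen"}])
assert len(qs) == 2  # the AI surfaces the parameters; the user declares them
```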
The key point of this structure is not in simplifying runtime.
It lies in the fact that most responsibility is already concluded at the design stage.
Once the manufacturer fulfills their responsibility through Fixed Label, and the user declares their intent through User Label, the core of the execution judgment is determined.
8. Relationship to Existing AI Safety Discussions
This section describes how this framework connects to existing AI safety discussions.
Relationship to alignment research: Alignment research asks “how does AI internalize human values?” This framework asks: “what decisions should not be delegated to AI?” The two questions do not contradict each other. But this framework argues that even before alignment is perfectly achieved, safe execution is possible as long as the ownership structure is clear. In other words, ownership structure is a prerequisite for alignment.
Relationship to regulatory approaches: Most current AI regulation discussions focus on “which models to permit.” This framework asks “is the ownership structure of execution judgment explicit?” Is the Essence declared by name? Has Fixed Label been left? Has User Label been explicitly approved? Are there no ownership gaps? These four can be audited regardless of model architecture. The possibility of technology-neutral regulation opens up.
9. Limitations of This Framework and Open Questions
This framework is not a completed standard. It is an exploratory design memo. Several open questions remain.
Collective ownership: What happens when there are multiple owners? In a smart home device shared by a family, whose User Label takes precedence?
Dynamic ownership transfer: Can ownership change at runtime? How should control in shared spaces be handled? How should control authority in accommodations, rental vehicles, and similar contexts be managed?
Mathematical foundation: The claim that nine questions are sufficient and minimal is intuitively compelling, but there is no formal proof.
If any reader has better answers to these questions, that is precisely why this design memo has been published in exploratory form.
Conclusion
AI must not infer what it does not own. AI discovers ownership gaps, and returns them to their owners.
The manufacturer declares Essence through the name of the action. On top of that name, they pass what remains after fulfilling their responsibility as Fixed Label. The user declares their context and intent as User Label on top of that. AI only maps between these two declarations. The ISE model structurally separates State and Intent. The 9-Question Protocol defines what answers must exist for execution to be permitted, and ensures that a refusal when an answer is missing becomes a signal pointing to accountability. 2+α shows how all of this operates as a flow of responsibility at runtime.
If this is correct, the priorities of AI Agent design change. “How well does it reason?” comes after “where does it stop reasoning?” “How autonomously does it act?” comes after “does it act autonomously without ownership gaps?”
The role of the manufacturer also changes. In the age of AI, the manufacturer’s role does not end with building a device. Declaring the Essence of the action by name, and passing what remains after fulfilling responsibility as Fixed Label — this too is the manufacturer’s role. Handing over execution authority to AI without doing this is handing over execution rights while leaving accountability suspended in midair.
The starting point is simple: device manufacturers and Agent developers explicitly declare, somewhere, the Essence of an action (its name and actual effects) and its boundaries.
From the manufacturer’s perspective, an explicit record remains as evidence of having fulfilled their responsibility.
For smart homes, existing standards (Matter’s Fixed/User Label) can be utilized directly. Products released with Matter certification can be easily updated via OTA.
Notes
[1] While this document addresses a structure that is not tied to any specific platform, it uses the Connectivity Standards Alliance’s Matter as its reference starting point. Matter is a smart home standard shared across diverse manufacturers and platforms, and represents the primary environment where physical devices connect with AI and automation systems. The structure of this document is not limited to Matter and can be applied equally to any physical or non-physical execution system.
[2] In Matter, Label/Value format information can be provided through the Fixed Label Cluster (0x0040) and User Label Cluster (0x0041). The Fixed Label Cluster is read-only information defined by the manufacturer, used to describe the structure or function of a device. The User Label Cluster has the same structure but is modifiable by the user or controller, used to express user-defined meaning. This document extends this distinction to describe roles and responsibility structures. This is implementable within Matter’s specification.
[3] Currently, Matter and major smart home platforms have no agreed standard for the names of individual actions. As a result, the same action may appear differently across platforms, or be represented only in Switch form. This document does not propose a method of standardization, but rather the principle that the name of an action should be a declaration of its Essence.
[4] There is no need to standardize the individual values (label/value pairs) within a Fixed Label. What matters is not uniformity of format, but the manufacturer’s responsible declaration of known facts. A Fixed Label is not a field for conforming to a rigid schema — it is the declaration space where the manufacturer expresses their responsibility boundary. Therefore, Fixed Label is not a strict target for standardization; each manufacturer should be free to describe it based on their own context and understanding. AI’s role is not to reconstruct or constrain this expression, but to read it as-is and reflect it in execution judgment.
[5] When a user assigns the same context to multiple actions, that grouping is called a Scene. A Scene is not a new concept. It is a collection of approved User Labels. The structure of ownership and responsibility remains identical to that of individual actions. Only the form differs.
[6] The ISE model includes two complementary concepts. World Baseline is the system’s recognition of repeatedly observed patterns — environment, seasons, user habits — as the baseline state of the world. However, World Baseline is descriptive and does not approve execution. System protection logic blocks unintentional events outside the permitted range from being used as judgment inputs.
[7] The JSON is not a completed standard, but a choice of the smallest representation structure that can be commonly used across diverse systems. The goal of this document is not to define a comprehensive specification, but to define minimal rules.
[8] For non-physical actions (text generation, API calls, etc.), answers to some questions may not exist. In such cases, null is not a declaration that “there is no answer,” but rather that “this question is semantically empty for this action.” Null is not permission to skip execution judgment. Physical AI has the highest responsibility density, with most questions being non-null. As actions become more non-physical, there are more nulls and lower responsibility density — but the questions themselves do not disappear.