Shoal Code Review — v0.5.0 (feature-fish-integration)¶
1. Architecture Summary¶
User
│
├── CLI (typer) ─── src/shoal/cli/
│ ├── session.py (new/ls/attach/fork/kill/status/logs/rename/info)
│ ├── robo.py (supervisor agent management)
│ ├── mcp.py (MCP server pool commands)
│ ├── worktree.py (git worktree lifecycle)
│ ├── watcher.py (start/stop background daemon)
│ ├── demo.py (demo environment)
│ ├── nvim.py (neovim integration)
│ └── setup.py (fish shell installer)
│
├── API (FastAPI) ─── src/shoal/api/server.py
│ ├── REST endpoints (CRUD sessions, MCP pool, status)
│ └── WebSocket (real-time status broadcast)
│
├── Dashboard (fzf popup) ─── src/shoal/dashboard/popup.py
│
└── Fish Shell Integration ─── src/shoal/integrations/fish/
├── installer.py (copies templates to ~/.config/fish/)
└── templates/ (bootstrap, completions, dashboard, quick-attach)
┌──────────────────────┐
│ Core Layer │
│ ┌────────────────┐ │
│ │ state.py │ │ Session CRUD + validation
│ │ db.py │ │ Async SQLite (WAL mode)
│ │ config.py │ │ TOML config loading
│ │ tmux.py │ │ Subprocess wrappers
│ │ git.py │ │ Subprocess wrappers
│ │ detection.py │ │ Pure pane → status function
│ │ notify.py │ │ macOS osascript
│ │ theme.py │ │ Rich/tmux styling
│ └────────────────┘ │
└──────────────────────┘
Services:
watcher.py ── polls tmux panes, updates status in DB, sends notifications
mcp_pool.py ── socat-based MCP server lifecycle (start/stop/health)
mcp_proxy.py ── stdio-to-unix-socket bridge via socat execvp
status_bar.py ── tmux status bar generator
Source of truth: SQLite DB (~/.local/state/shoal/shoal.db)
- Sessions stored as JSON blobs with id/name index
- Robo states in separate table
- WAL mode for concurrent read access
- Single-writer with no locking (read-modify-write pattern)
Control flow: CLI + API are parallel entry points into the same core layer.
Both call state.py functions which hit db.py. Tmux/git wrappers are sync
subprocess calls. The watcher is a separate process polling the same DB.
Observations:
- Fish scripts are thin (good) -- just abbreviations, keybindings, and fzf wrappers
- tmux commands are centralized in core/tmux.py (good)
- CLI and API duplicate session creation logic (lines 73-221 in session.py vs 265-353 in server.py) -- this is a maintenance risk
- No single "controller" layer -- CLI and API both directly import and orchestrate core functions
2. Top 10 Issues (Ranked by Severity)¶
1. Unauthenticated API with RCE capabilities — server.py:196-202,422-428,592¶
The API binds to 0.0.0.0:8080 with allow_origins=["*"], no authentication, and exposes POST /sessions/{id}/send which sends arbitrary keystrokes to tmux panes. This is a remote code execution vector.
# server.py:422-428 — anyone on the network can type into your agents
@app.post("/sessions/{session_id}/send")
async def send_keys_api(session_id: str, body: SendKeysRequest):
s = await get_session(session_id)
if not s:
raise HTTPException(status_code=404, detail="Session not found")
tmux.send_keys(s.tmux_session, body.keys) # arbitrary keystrokes
return {"message": "Keys sent"}
2. Command injection via socat EXEC: — mcp_pool.py:84-88¶
User-supplied command from POST /mcp is passed to socat's EXEC: which shell-evaluates it. An attacker can execute arbitrary shell commands.
# mcp_pool.py:84-88
proc = subprocess.Popen(
["socat", f"UNIX-LISTEN:{socket},fork,reuseaddr",
f"EXEC:{command},pipes"], # shell-evaluated by socat
...
)
3. Command injection via fzf --preview — quick-attach.fish:6¶
fzf's {} placeholder is shell-evaluated. If a session name contained metacharacters (currently mitigated by validate_session_name), it would be RCE. The defense is implicit, not explicit.
4. Read-modify-write race in DB — db.py:139-147¶
update_session reads the full JSON blob, modifies it in Python, and writes it back. Concurrent updates (watcher + CLI, or two API requests) silently lose data. No row-level locking.
# db.py:139-147
async def update_session(self, session_id: str, **fields):
session = await self.get_session(session_id) # read
updated = session.model_copy(update=fields) # modify in Python
await self.save_session(updated) # overwrite entire blob
5. Lossy tmux name sanitization causes silent collisions — state.py:56-61¶
Two distinct session names like my.project and my-project both produce shoal_my-project. The second tmux.new_session will fail or collide with no detection.
# state.py:56-61
def _sanitize_tmux_name(name: str) -> str:
return name.replace(".", "-").replace(":", "-").replace("/", "-") # lossy
6. No subprocess timeouts anywhere — tmux.py, git.py¶
Every subprocess.run call blocks indefinitely. A hung tmux server, unresponsive git remote, or credential prompt hangs the entire process (CLI, API server, or watcher).
# tmux.py:27-32 — no timeout parameter
return subprocess.run(["tmux", *args], capture_output=capture, text=True, check=check)
7. Startup command template injection — models/config.py:20, session.py:197-204¶
The default startup command "send-keys -t {tmux_session} '{tool_command}' Enter" embeds tool_command inside single quotes. If tool_command contains ', the tmux command breaks/injects. Additionally, cmd.format() with user-controlled values has no KeyError handling for undefined template variables.
8. Orphaned session state on fork tmux failure — session.py:412-421¶
In _fork_impl, the session is created in DB (line 413-414) before tmux.new_session (line 421). If tmux fails, the DB session is never cleaned up (unlike _add_impl which has a try/except on lines 180-191).
9. Watcher crash on single poll failure — watcher.py:47-50¶
The main loop has no per-cycle exception handling. If _poll_cycle throws (e.g., DB locked, malformed TOML, or unexpected exception type), the watcher exits entirely. All status detection stops.
# watcher.py:47-50
while self._running:
await self._poll_cycle() # if this throws, watcher dies
await asyncio.sleep(self.poll_interval)
10. Duplicate session creation logic — session.py:73-221 vs server.py:265-353¶
The CLI's _add_impl and the API's create_session_api duplicate the entire session creation flow (~100 lines each): git validation, worktree setup, name generation, tmux session creation, startup commands. They will inevitably diverge. The API version already has subtle differences (e.g., different worktree path inference, no _infer_branch_name call).
3. Reliability Hardening Plan¶
P0 — Do before sharing with other users¶
- [ ] Add subprocess timeouts. Add
timeout=30totmux._run()and git_run(). Usetimeout=120forgit.push(). Handlesubprocess.TimeoutExpired. - [ ] Wrap watcher poll cycle in try/except. In
watcher.py:48, catchExceptionaround_poll_cycle(), log it, and continue. - [ ] Add tmux cleanup on fork failure. In
session.py:421, wraptmux.new_sessionin try/except anddelete_sessionon failure (mirror_add_impl's pattern). - [ ] Detect tmux name collisions. Before
tmux.new_session, calltmux.has_session(tmux_session)and fail with a clear error if a collision is detected from lossy sanitization. - [ ] Fix
ConnectionManager.disconnectcrash. Usesetinstead oflistforactive_connections, or wrap.remove()in try/except. - [ ] Fix
broadcastto handle per-connection errors. Catch exceptions per-connection so one broken WS doesn't block others.
P1 — Do before scaling beyond a handful of sessions¶
- [ ] Make
update_sessionatomic. UseUPDATE sessions SET data = ? WHERE id = ?with SQL-level field updates instead of read-modify-write. Or useBEGIN IMMEDIATEtransactions. - [ ] Add
status is-interactive; or returntobootstrap.fish. Prevents errors in non-interactive fish contexts. - [ ] Change
set -Utoset -gin bootstrap.fish:38. Universal variable writes on every command is a performance concern. - [ ] Add stale PID/socket cleanup on watcher/mcp startup. Check for orphaned processes and clean up before starting.
- [ ] Add PID file locking for watcher. Use
fcntl.flockto prevent duplicate watcher instances.
P2 — Long-term reliability¶
- [ ] Consolidate session creation into a single service function called by both CLI and API.
- [ ] Add structured logging (JSON format option) with log rotation for watcher and API.
- [ ] Add health check endpoint that verifies DB connectivity.
- [ ] Add graceful degradation — if DB is inaccessible, CLI should show a clear error rather than a Python traceback.
4. Security Findings¶
Finding 1: Unauthenticated RCE via API¶
Impact: Critical. Anyone on the same network can execute arbitrary commands in any tmux session.
Fix (immediate):
# server.py — change default bind to localhost
if __name__ == "__main__":
uvicorn.run(app, host="127.0.0.1", port=8080) # was 0.0.0.0
# cli/__init__.py — change default host
def serve(host: str = "127.0.0.1", port: int = 8080):
Fix (proper): Add token-based auth. Generate a token on shoal init, store it in config, require it as a header.
Finding 2: Command injection via MCP pool¶
Impact: High via API, low via CLI (CLI requires local access).
Fix:
# mcp_pool.py — use shlex.quote or validate command against allowlist
import shlex
# Option A: Quote the command for socat's EXEC
safe_cmd = shlex.quote(command)
proc = subprocess.Popen(
["socat", f"UNIX-LISTEN:{socket},fork,reuseaddr",
f"EXEC:{safe_cmd},pipes"],
...
)
# Option B (better): Only allow known servers, reject arbitrary commands from API
@app.post("/mcp", ...)
async def start_mcp_server_api(data: McpCreate):
if data.command and data.name not in KNOWN_SERVERS:
raise HTTPException(400, "Custom MCP commands not allowed via API")
Finding 3: fzf preview command injection¶
Impact: Medium. Currently mitigated by validate_session_name restricting charset to [a-zA-Z0-9_/.-], but the defense is implicit.
Fix:
# quick-attach.fish — use single quotes around {} to prevent word splitting
--preview='shoal info -- {}'
# Also add -- to prevent flag injection from names starting with -
Finding 4: AppleScript injection (minor, already handled)¶
notify.py correctly escapes " and \. However, it doesn't handle other AppleScript control characters like \r, \n, or \t, which while not injection vectors, could corrupt the notification display. Low priority.
Finding 5: Config directory permissions not validated¶
If ~/.config/shoal/ or ~/.local/state/shoal/ is world-writable, any local user could inject tool configs that execute arbitrary commands via startup commands.
Fix:
# config.py — check permissions on ensure_dirs()
import stat
def ensure_dirs():
for d in (config_dir(), state_dir(), runtime_dir()):
d.mkdir(parents=True, exist_ok=True)
mode = d.stat().st_mode
if mode & (stat.S_IWGRP | stat.S_IWOTH):
import warnings
warnings.warn(f"Insecure permissions on {d}: {oct(mode)}")
Finding 6: curl | sh in CI¶
.github/workflows/ci.yml:16 pipes a remote script directly to sh.
Fix:
5. Refactors That Would Pay Off Most¶
Small (< 1 day each)¶
-
Extract session creation service. Move the shared logic from
_add_implandcreate_session_apiinto a singlecreate_session_full()function incore/or a newservices/session.py. Both CLI and API call it. This eliminates the duplicate ~100 lines and prevents future divergence. -
Replace
_initializedclass variable with instance variable.ShoalDB._initializedis a class variable but semantically belongs to the instance. Ifreset_instanceis called and a new instance is created, the old class-variable state could leak. Move it to__init__. -
Consolidate
initandcheckincli/__init__.py. Lines 85-214 are largely duplicated. Extract shared dependency/directory checks into a helper.
Medium (1-3 days each)¶
-
Make DB updates field-level atomic. Instead of the current read-modify-write with
model_copy, store session fields as SQL columns (or usejson_set()in SQLite 3.38+). This eliminates the race condition entirely. -
Add a
ShoalErrorhierarchy. Currently errors are a mix ofValueError,FileNotFoundError,RuntimeError,SystemExit, andtyper.Exit. A proper taxonomy (SessionNotFoundError,ToolConfigError,TmuxError) would make error handling consistent and testable. -
Add XDG support to fish installer. Respect
$XDG_CONFIG_HOMEinstead of hardcoding~/.config/fish.
Large (> 3 days)¶
-
Wrap all tmux operations in an async executor. While the docstring in
tmux.pyacknowledges the sync-in-async tradeoff, as session counts grow this will become a bottleneck. Useasyncio.to_thread()wrappers. -
Add a proper event bus. The current model of polling DB for changes (watcher + API poller) doesn't scale. An in-process event bus (or SQLite
UPDATEtriggers with notifications) would reduce polling and enable real-time reactivity. (I see you have anopenspec/changes/event-bus/— this is the right direction.)
6. Testing Plan¶
What to unit test (pure functions, no I/O)¶
| Module | Priority | Notes |
|---|---|---|
detection.py |
Already good | Add case-sensitivity and regex-special-char tests |
notify.py._escape_applescript_string |
High | 0 tests currently. Test ', ", \, newlines, unicode |
state.validate_session_name |
Already good | Add leading/trailing slash, ../ path traversal |
state._sanitize_tmux_name |
High | Test collision detection (e.g., a.b vs a-b) |
models/ |
Medium | Add invalid input tests, boundary values |
theme.py |
Low | Pure formatting, test tmux_status_segment edge cases |
_infer_branch_name |
Low | Already trivial |
What to integration test (with mocking)¶
| Module | Priority | Notes |
|---|---|---|
| Session creation (CLI) | High | Happy path. Currently only error cases tested |
| Session creation (API) | High | Currently SKIPPED |
kill command |
High | 0 tests |
fork command |
Medium | Only startup commands tested |
watcher._poll_cycle |
Medium | Add exception-during-poll and concurrent-update tests |
mcp_pool.start/stop |
High | 4 tests SKIPPED |
| Fish installer | Medium | Add missing-template and copy-failure paths |
demo start |
Medium | 5 tests SKIPPED |
How to test tmux/fish¶
# For tmux: mock subprocess.run at the tmux._run level
@patch("shoal.core.tmux._run")
def test_new_session_with_cwd(mock_run):
tmux.new_session("test", cwd="/tmp")
mock_run.assert_called_once_with(
["new-session", "-d", "-s", "test", "-c", "/tmp"]
)
# For fish: use `fish -n` (no-execute) to syntax-check templates
def test_fish_templates_syntax():
template_dir = Path(__file__).parent.parent / "src/shoal/integrations/fish/templates"
for f in template_dir.glob("*.fish"):
result = subprocess.run(["fish", "-n", str(f)], capture_output=True)
assert result.returncode == 0, f"{f.name}: {result.stderr}"
# For integration tests (mark with @pytest.mark.integration):
# Require real tmux, real git, real fish
# Use unique session prefixes to avoid collisions with user sessions
# Clean up in teardown (kill all shoal_test_* sessions)
Current gaps summary¶
- 8 skipped tests across demo, MCP pool, and API
- 0 tests for:
notify.py,theme.py,popup.py,nvim.py,setup.pyCLI,kill,attach,fork(full flow) - Coverage floor is 57% (
pyproject.toml:67) — reasonable for v0.5, but should be 70%+ before wider distribution
7. Quick Wins (< 2 hours each)¶
-
Bind API to
127.0.0.1by default. Two lines changed inserver.py:592andcli/__init__.py. Eliminates the most critical security issue immediately. -
Add
status is-interactive; or returntobootstrap.fish. One line at top. Prevents errors when fish sources this in non-interactive contexts. -
Change
set -Utoset -ginbootstrap.fish:38. One character change. Eliminates filesystem write on every command. -
Add subprocess timeouts. Add
timeout=30parameter totmux._run()andgit._run(). Handlesubprocess.TimeoutExpiredwith a clear error message. ~10 lines changed. -
Fix watcher crash on poll failure. Wrap
_poll_cycle()call in try/except inwatcher.py:49. ~4 lines.
while self._running:
try:
await self._poll_cycle()
except Exception:
logger.exception("Poll cycle failed, continuing")
await asyncio.sleep(self.poll_interval)
- Fix
ConnectionManagerbugs. Use asetfor connections and add per-connection error handling in broadcast. ~8 lines.
class ConnectionManager:
def __init__(self):
self.active_connections: set[WebSocket] = set()
def disconnect(self, websocket: WebSocket):
self.active_connections.discard(websocket)
async def broadcast(self, message: dict):
broken = []
for conn in list(self.active_connections):
try:
await conn.send_json(message)
except Exception:
broken.append(conn)
for conn in broken:
self.active_connections.discard(conn)
-
Add try/except around
tmux.new_sessionin_fork_impl. Mirror the pattern from_add_impl. ~6 lines. -
Fix CI to use
astral-sh/setup-uv@v5. Replacecurl | shwith the official action. 2 lines inci.yml. -
Remove unused imports in
installer.py. Deletefrom rich.console import Panelandfrom rich.table import Table. 2 lines. -
Add
fish -nsyntax check to CI. Add a CI step that validates fish template syntax. ~5 lines inci.yml.
Fish Integration Specific Issues¶
From bootstrap.fish¶
- Line 1: No
status is-interactive; or returnguard → errors in non-interactive contexts - Line 16:
\cs(Ctrl+S) conflicts with terminal flow control (XOFF signal) — users may think terminal froze - Line 38:
set -U __shoal_last_sessionwrites to disk on every command → performance issue - No guard against re-sourcing → duplicate event handlers could accumulate
From completions.fish¶
- Lines 13-20: Dead helper functions (
__shoal_mcp_servers,__shoal_robo_profiles) defined but never used - Line 29:
detachsubcommand has no session-name completion - Missing:
setup fish --forceflag completion
From quick-attach.fish¶
- Line 6:
--preview="shoal info {}"→ shell injection via fzf's{}(currently mitigated by name validation, but defense is implicit) - Line 12:
shoal attach $sessionlacks--to prevent flag injection - No fzf availability check in the function itself (only in keybinding)
From installer.py¶
- Line 24:
get_fish_config_dir()logic is broken — always returns a path (neverNone), making the error check on line 62 dead code - No XDG_CONFIG_HOME support → installs to wrong directory if user has custom XDG config
- Line 113: Bare
except Exceptionwith unhelpful error messages - No uninstall mechanism
- Lines 10-11: Unused imports (
Panel,Table)
Overall assessment: The codebase has a clean architecture with good separation of concerns. The core abstractions (db.py, state.py, tmux.py) are well-designed. The main risks are: the unauthenticated API surface, the lack of subprocess timeouts, and the read-modify-write race in the DB. The fish integration is well-structured but needs the interactive guard and the fzf preview fix. The test suite has good bones but significant coverage gaps, especially around happy paths and the 8 skipped tests. Fixing the quick wins above would materially improve both security and reliability for v0.5.0 release.