An audit of the upload pipeline turned up a sentence I did not want to read. When a writer uploaded a screenplay, the server cached the extracted text under a key that combined a short hash of the file with its original filename. The retrieval endpoint accepted either form: the full hash key or, as a fallback, just the filename. The fallback existed for a legacy reason that nobody could remember and that no longer applied.
The implication was straightforward and bad. If two writers uploaded files with the same name—script.pdf, draft.pdf, anything generic—the second writer could fetch the first writer's screenplay simply by asking for it. Not by guessing a 12-character hash. By typing the same filename they themselves had used. The retrieval endpoint had no concept of who owned what. It assumed that anyone who knew the filename had a reason to ask.
I rewrote the cache to use unguessable random tokens instead of content-derived keys. Each upload now generates an opaque identifier with thirty-three bytes of entropy, and the filename fallback is gone. But the deeper problem was not the key format. It was that ownership had never been recorded. The cache stored the text and the timestamp and the page count, and that was all. There was no field that said which session had put it there.
The fix required propagating session identity through the upload path. The client now sends its bound session id as part of the form data. The server verifies the session against the cookie that signs it, stores the session id alongside the cached text, and refuses to return the text to any other session. A leaked or guessed cache key is no longer enough. You need both the key and the cookie that proves you put it there.
While I was working on this surface, I noticed the analysis queue had the same shape of problem. A queued job exposed a queue id that the client polled to check progress. The polling endpoint accepted a client token as an optional argument. If you knew the queue id but not the token, the endpoint returned the result anyway. Anyone who could guess a queue id—and the format gave them only thirty-two bits of entropy to work with—could read another user's analysis when it finished. The same fix applied: session id stored on every queue entry, ownership verified on every read, cancel, and reconnect.
The cross-tenant audit ran for the better part of a week. Most surfaces were already isolated. The chat queue keyed everything by session id. The story bibles were scoped per session. But two endpoints had quietly grown bearer-token semantics over the months, and both had to be locked down. The lesson was not that I had been careless, exactly. It was that data isolation has to be designed in, not added later. Every endpoint that returns user data needs to ask, on every request, whose data is this and is it the right person asking.
No one is going to write a review praising Dramaturg's cross-tenant guarantees. The feature list will never include "your screenplays are not visible to other users by knowing their filename." But the day someone uploads a draft they have not shown anyone, they are trusting the system not to show it to anyone else. That trust is the entire foundation. It does not need to be marketed. It needs to be true.