Building a Supabase Driver for Keboola Connection

dev

Over the past weeks I've been working on integrating Supabase as an external data source into the Keboola Connection platform. The result is a complete driver with OAuth 2.0 authentication, automatic schema discovery, and Supabase Marketplace integration. Along the way I hit some genuinely tricky bugs -- here's the full story.

Supabase + Keboola

Supabase is an open-source Firebase alternative built on top of PostgreSQL. It offers authentication, real-time subscriptions, storage, and a full-featured PostgreSQL database. Keboola is a data pipeline platform that enables ETL processes over various data sources.

The goal was to allow users to connect their Supabase project as an external data source in Keboola and automatically read data from it for further processing in pipelines.

The Pivot: From CLI to Web Controller

The original plan was a CLI command to register Supabase OAuth credentials manually. But testing a full OAuth flow from the terminal is painful -- you have to copy URLs back and forth, handle redirects manually, and there's no visual feedback.

So I pivoted early: instead of a CLI tool, I built a web-based test harness at /supabase/connect. A simple form where you enter client_id and client_secret, it generates the redirect URL from the current host, sends you to Supabase for authorization, and handles the callback. This decision proved crucial -- the iterative debugging cycle that followed would have been impossible from a terminal.

The Stateless OAuth Puzzle

The first real challenge came from Symfony's security architecture. Routes marked with #[AsPublicAction] run on a stateless firewall -- no session available. But OAuth flows typically need to store the PKCE verifier between the authorization request and the callback. Symfony threw: "Session was used while the request was declared stateless."

The solution was to encode everything -- client credentials, PKCE verifier, redirect URI -- into the OAuth state parameter itself, signed with HMAC-SHA256 using kernel.secret. The callback decodes the state, verifies the signature, and extracts the PKCE verifier. Completely stateless, completely secure.

This pattern turned out elegant: no session dependency, no database storage, HMAC prevents tampering, and the state parameter is validated against CSRF attacks by design.

OAuth API Quirks

Two smaller issues came next:

The approval_prompt mystery. The League OAuth2 library adds approval_prompt to authorization requests by default (a Google OAuth convention). Supabase rejected it: "Unrecognized key(s) in object: 'approval_prompt'." Fix: override getAuthorizationParameters() and filter it out.

Project vs. account OAuth apps. I initially created the OAuth app at the project level (/project/{ref}/auth/oauth-apps), which returned "Unrecognized client_id." Supabase has two distinct OAuth scopes -- project-specific apps and integration apps at the account level (/account/integrations). For a marketplace integration that needs cross-project access, you need the latter.

The PKCE Double-Handling Bug

This was the trickiest bug of the entire project. After solving the stateless flow and API quirks, token exchange kept failing: "Invalid or expired OAuth authorization."

I tried everything systematically:

  1. Added detailed error logging -- same error
  2. Switched to HTTP Basic Auth for the token endpoint -- still failed
  3. Added Accept: application/json header -- still failed
  4. Wrote a completely custom exchangeCodeForTokens() method bypassing the library -- still failed

The authorization flow worked. The callback received the code. But token exchange consistently returned the same cryptic error.

The breakthrough: the League OAuth2 library has built-in PKCE support. My code was also handling PKCE manually for the stateless flow. Two correct PKCE implementations running simultaneously -- the library sent one code_verifier, my stateless code sent a different one. Supabase saw a mismatch and rejected the exchange.

The fix was one line:

protected $pkceMethod = null; // Disable library's built-in PKCE

This is a classic integration bug -- two independently correct systems conflicting when combined. The debugging process took hours because each individual piece looked right.

Two Connection Modes

The driver supports two ways to access Supabase data:

Direct PostgreSQL connection -- classic database connection via connection pooler (port 6543). Full SQL query access, suitable for larger data volumes.

REST API (PostgREST) -- connection via Supabase REST endpoint with a service_role key. Simpler setup without exposing database passwords.

Both modes support encrypted credential storage with separate encryption keys for database passwords, API keys, and OAuth tokens.

Automatic Schema Discovery

After a successful OAuth connection, a SupabaseProjectSetupJob runs in the background that:

  1. Calls ListSchemasCommand on the Supabase driver via protobuf interface
  2. Discovers available database schemas
  3. Creates an external bucket in Keboola for each schema
  4. Sets up automatic refresh every 6 hours

The protobuf-based communication is architecturally interesting -- commands are defined in a shared storage-driver-common monorepo and implemented in the storage-driver-postgres driver. Clean boundaries between components, no direct coupling.

Users see their data in Keboola immediately after connecting, with no manual configuration needed.

Supabase Management API

The integration includes a client for the Supabase Management API that can:

  • Fetch project details (region, configuration)
  • Retrieve API keys (anon, service_role)
  • Read connection pooler configuration
  • List projects and organizations

This information is used during automatic credential setup after the OAuth flow.

The Full User Flow

The end-to-end experience after all the pieces came together:

  1. User clicks "Connect Supabase" and authorizes via OAuth
  2. Callback stores tokens, redirects to a setup page
  3. User selects their Supabase project and optionally provides a PostgreSQL DSN
  4. Keboola creates an organization (using the Supabase org name) and a project with a Supabase backend
  5. Background job discovers tables, registers buckets, schedules auto-refresh
  6. User lands on their project dashboard with data already flowing

Project Scale

The entire driver was built in one week, roughly 60 hours of intensive work with Claude Code. The result is around 25,000 changes -- new classes, controllers, migrations, tests, CLI commands, and documentation. From zero to a complete integration.

The development workflow was highly iterative: define what's needed, implement it, test immediately in the browser, debug from actual error messages. Quick feedback loops made it possible to solve bugs like the PKCE double-handling issue within hours instead of days.

The AI agent significantly accelerated implementation of repetitive patterns (controllers, DTOs, tests), boilerplate code generation, and navigation across the large Keboola Connection codebase. Key architectural decisions and the security model obviously required human judgment.

Lessons Learned

  • Stateless OAuth is possible with careful state parameter design -- encode everything in HMAC-signed state, no session needed
  • Library abstractions can conflict with manual implementations -- the PKCE double-handling bug taught me to always check what the library does automatically
  • OAuth provider APIs vary significantly -- approval_prompt, auth methods, app scopes differ between providers
  • PKCE is essential for OAuth flows where the client secret cannot be fully secured (marketplace integrations, CLI tools)
  • Separate encryption schemes for different credential types significantly improve the security model
  • A web-based test harness beats CLI testing for OAuth flows by an order of magnitude
  • AI agents dramatically change development speed -- 25,000 changes in a week would be unthinkable without them

The integration is currently in pull request stage and going through code review. Looking forward to seeing it in production.

Previous Post Next Post

Related Posts