Managing Stripe as a source of truth

Eric Goldman

•

Jul 14, 2021

•

7 min read

We’re Sequin. We let you build apps and workflows on top of APIs like Stripe and GitHub using just SQL and Postgres. Changes in the API sync to your database and changes in your database sync to the API. We use Sequin to power Sequin's Stripe integration, which means all our Stripe tables are right next to our application tables.

Last updated: 11/17/2023

If you're using Stripe, there's a good chance your app's functionality depends on data stored in Stripe. Some requests are easy. A user's billing details page is often just one API call to Stripe's v1/payment_methods?customer={customerId} endpoint (and a little formatting) away.

But some requests are hard. Some require a combination of data stored in Stripe along with data in your own app.

For example, showing a re-order button on a site's order history page should have full details of both past purchases (from Stripe) and current stock levels from an inventory management system.

What are the best approaches for implementing these kinds of complex integrations with Stripe?

Practical example: WebFlicks Movie Search

A practical example will help us evaluate the pros and cons of different approaches to integrating Stripe data back into your app.

Let's say you've built a movie streaming service, WebFlicks, where your users pay for subscriptions to access different packages of shows and movies. $3/month for animated movies, $2/month for action movies, etc.

Now, you want your Movie Search feature to intelligently mark movies as available for each user. So as a user searches for content, the results consider a user's purchase and subscription history by replacing a Buy Now button with a Watch Now button.

The requirements might look something like:

A movie should be available if the user has an active subscription including that movie
A movie should be available if the user has purchased it
A movie's available flag should be immediately updated when:
A user starts a new subscription
A user's subscription is cancelled and expired
A movie is added to and removed from a genre

Now, let's say your search is based on a table called movie_search:

movie_id	subscription_groups	title	keywords
1	`action`, `adventure`	Indiana Jones	'harrison ford action archaeologist'
2	`action`, `sci-fi`	Star Wars	'harrison ford space opera'

You'll use Postgres' tsvector and tsquery for a simple Full Text Search interface. For example, the following sql clause will match any rows where the keywords field contains both harrison and ford:

to_tsvector(movie.keywords) @@ to_tsquery('harrison & ford')

A solution for running a user search, including a boolean flag (included_in_user_subscription), might look something like this:

def get_movie_results(customer_id, query):
    # somehow fetch the users's active subscriptions
    user_subscribed_groups = get_subscription_groups_for_customer(customer_id)
    query = """
      select
        movie_search.movie_id,
        movie_search.title,
        -- We'll use postgres' '&&' (overlap) operator to check if the user's subscriptions
        -- include any of the subscription_groups for the movie.
        -- The users's 'subscription_groups' are parameterized here as '$1'
        (movie.subscription_groups && $1) as included_in_user_subscription
      from movie_search
      where to_tsvector(movie.keywords) @@ to_tsquery($2)
    """
    return db.execute(query, [subscription_groups, query])

To serve these search results, we need to know the details of any subscriptions currently active for the user so that we can include those details as part of the search query. To do this, we need access to the full subscription state and subscription items to serve any request on WebFlick's search page. What is the best way to do this?

Common Approaches

There are several ways to handle this kind of problem, each with their own downsides and edge cases.

Method 1: Hammering The Stripe API

The most obvious approach is to simply query Stripe via their API whenever you need to access the data. Maybe with a short-lived cache.

One strategy for this may be to do something like the following:

List the customer's subscriptions and subscription items via Stripe's /v1/subscriptions?customer=<customerId> endpoint. The response includes subscriptionItems as a list (which may require separate pagination calls) which each contain a reference to a product.
Using the productIds from the subscriptionItems you could now use /v1/products?id=<productId1>,<productId2>,<productId3> (again, requiring pagination if you have a lot of productIds) to get that metadata [1], which you can then merge into the results from your movie search system to mark movies which are included in active subscriptions as available.

[1] Depending on your Stripe integration, a Stripe ProductId could map to a number of different objects in your system such as a specific Movie, a Genre, a Movie Studio or something like New Releases. There are multiple ways you may be storing the relationship between Stripe's concept of a product and your own, but a common pattern is to store this on the product.metadata field.

An implementation of this method might look something like the following. We'll use Stripe's Python API as an example, including its auto_paging_iter helper method that handles pagination. Note: this method may use as few as 2 api calls, but any pagination required will still count towards Stripe's API rate limits of 100 requests per second.

def get_customer_subscription_groups(customer_id: str):
    subscribed_product_ids = set()
    subscriptions = stripe.Subscriptions.list(customer=customer_id)
    for subscription in subscriptions.auto_paging_iter():
        for subscription_item in subscription.items.auto_paging_iter():
            product_id = subscription_item.price.product
            subscribed_product_ids.add(product_id)

    subscription_groups = set()
    products = stripe.Products.list(ids=list(subscribed_product_ids))
    # for each product, extract the 'subscription_group' and add it to our set
    for product in products.auto_paging_iter():
        metadata = product.metadata
        subscription_group = metadata.get('subscription_group')
        subscription_groups.add(subscription_group)
    return list(subscription_groups)

Pros:

Stripe is the ultimate source of truth for subscription/order data
It's simple
It's usually fast to onboard a new feature

Cons:

The Stripe API doesn't have a lot of filter options, so you'll probably need to paginate
The Stripe API rate limit of 100/s would allow maybe 20/25 search page requests per second due to multiple calls
If you exceed these limits you could start interfering with orders and payments!
Caching the results from the Stripe API for more than a few minutes could lead to a bad user experience if someone sees a movie which they know they should have access to but don't.

Method 2: Maintain Your Own Copy Of The Data

Having order and subscription data in your own database gives you a lot more options! A (relatively) simple SQL query can merge your realtime subscription data directly with your inventory data efficiently and flexibly. This query makes use of stripe.subscription, stripe.subscription_items and stripe.product tables which contain the same data as you'd get from the Stripe API. The approach is similar to Including Subscriptions In The Search Query, outlined above, but with the added benefit of the user's subscription database being pulled directly from the same database as part of the query.

select
    movie_search.movie_id,
    exists(
        select 1
        from stripe.subscription s
        join stripe.subscription_item si
            on s.id = si.subscription_id
        join stripe.product p
            on si.product_id = p.id
        where s.customer_id = '<Stripe_customer_id>'
        and p.metadata -> 'group' = ANY(movie.subscription_groups)
    ) as included_in_subscription
from movie_search
where to_tsvector(move.info) @@ to_tsquery('harrison & ford')

This allows a fast and scalable interface for search without having to touch the Stripe API. If you can keep these tables up to date in realtime then you have a convenient and scalable (e.g. via database replication) solution without any caching and hard rate limits.

So, what's the best way to build and maintain this database?

Option 1. Update State With Stripe's Webhooks

Stripe's webhooks mechanism allows you to register an HTTP endpoint in your own API, which Stripe will use to alert you of any changes to the data in your Stripe account. For example, a new subscription for product prod_0001 would trigger an HTTP POST request to your API with contents like:

{
  "object": "event",
  "type": "customer.subscription.created",
  ...
  "data": {
    "object": {
      "object": "subscription",
      "id": "sub_0027",
      "customer": "cus_0012",
      ...
      "items": [
        ...
        "data": [
          {
            "object": "subscription_item",
            ...
            "price": {
              "product": "prod_0001"
            }
          }
        ]
      ]
    }
  }
}

We recently wrote a pretty thorough summary of the issues with webhooks so I'll just summarize here:

Requires high uptime to reliably receive events
Hard to recover from bugs in webhook handlers
No handling of events that occurred before the webhook was integrated. You'll need to setup a backfill operation.
Even with Stripes toolings, its hard to work with in a development environment (where you likely don't have a secure public facing API to receive webhooks)
Having to manage verification of the webhook's source

Option 2. Update State After All Stripe API Operations

All actions on the Stripe API return the new (and modified) Stripe objects in the response. You can use this response data to update your own database.

class Subscripton(Model):
    id = fields.text()
    # ... more fields

    def create_subscription(customer_id: str, price: str):
        subscription = stripe.Subscription.create(
            customer=customer_id",
            items=[
                {"price": price"},
            ],
        )
        # tranform the customer object to fit your model
        # and insert-and-update
        self.create_from_stripe_object(subscription)
        return customer

Here, stripe.Subscription.create will return an object similar to the WebHook JSON payload shown above, including nested objects for subscription, subscription_item and price (amongst others). You could use this response in a function, something like Subscription.create_from_stripe_object to traverse these nested objects and insert/update the corresponding records in your own database.

This approach allows for some nice abstractions, but thorough implementation would need to manage over 2,000 fields! Yes, you could just implement tables and fields as-and-when you need them, but this brings back a problem we had with the webhook approach - you'd need a whole separate system to backfill data whenever your requirements change!

One final drawback: even the most thoroughly planned integration between your app and Stripe is likely to hit some unexpected feature limitations. Don't have an in-app way to handle changing the shipping address for an order? Someone's probably going to have to bypass your app entirely to log into the Stripe console to fix a few manually. Even worse, you may expect cancelled subscriptions to come via your users logging in clicking unsubscribe but sometimes payments are refused and bank accounts get closed - and you'll need to handle that! You'll still need webhooks or else these changes will never make it back to your app database!

Option 3. Here comes Sequin

We know how difficult/complicated maintaining a copy of the full Stripe dataset in realtime is 😄

With Sequin, you can get the accuracy and realtime benefits of directly using the Stripe API. And in addition enjoy the scalability and flexibility of having Stripe data in your application database. All without needing to build anything.

If you're interested, try us out free.