
How to sync HubSpot to Postgres

Eric Goldman
10 min read

We’re Sequin. We stream data from services like HubSpot to messaging systems like Kafka and databases like Postgres. It’s the fastest way to build high-performance integrations you don’t need to worry about. We've learned a lot about what it takes to sync HubSpot, which is what this post is about.

HubSpot is the source of truth for your customer data. It's where your Sales, Marketing, and Support teams sell, market, and support your product. But that's often the rub: you need that data in your application, too.

HubSpot integrations take many forms. For instance, when a user logs into your product for the first time, you might want to create a new customer in HubSpot. When a customer purchases a new subscription, you might want to flag their HubSpot deal for Customer Success. When a deal closes, you might want to update a user's authorization in your application. In each case, a change in one system requires a corresponding change — or several — in the other. Your HubSpot data and product data are deeply intertwined.

HubSpot's API is expansive, but its limitations make these integration stories tedious. You'll need to manage rotating tokens, meter your API calls to avoid rate limits, navigate a complex relational model, and scrounge for data to make up for missing webhooks.

Sequin's solution is to sync your HubSpot data, live, to the database already at the heart of your app.

HubSpot API limitations

Like many CRMs, HubSpot places some constraints on its API to ensure its service is reliable, secure, and proprietary. While I understand the purpose of these constraints, a quick word of caution to my fellow developers as you embark on an integration.

First, you'll need to carefully manage your API usage. Miscount and you'll see a couple (or hundreds) of 429 - RATE LIMIT errors in your logs. Even moderate API use is enough to exhaust either HubSpot's daily request limit or their ten-second maximum "burst" rate.

You'll also need to navigate HubSpot's associations model for object relationships. To update or retrieve one customer record, you may need to make five nested API calls to the search, contacts, associations, companies, and deals endpoints. Have a couple of customers to pull? Prepare for another 429!

And for many integrations, you'll want HubSpot to notify you about changes — say, when a new support ticket is filed. Too bad! HubSpot's Webhooks API is limited to Contacts, Companies, Deals, and Conversations. To detect when data changes in HubSpot and then trigger some business logic, like an email, you'll need to poll continuously for updates... and handle more 429 errors!

Of course, the API works as advertised. None of those limitations are insurmountable, given time and careful monitoring, but you can avoid them entirely by syncing your HubSpot data to your database. Let's explore this approach.

Why sync HubSpot to Postgres

Transforming the HubSpot API into a Postgres database gives you more options to build and iterate.

With a sync, you can manage your HubSpot API quota in one place, trading quota utilization for sync latency. Using /search endpoints, you can limit your requests to objects that've changed, reading their updates in pages of 100 objects at a time.

With just a third of your quota allocated to reads, you can keep 20 or more object types in sync with a minute or less of latency: polling 20 /search endpoints once per minute works out to 28,800 requests per day, well within a third of even the free tier's 250,000-request daily limit.

A sync separates the process of retrieving data from querying the data. Simple SQL queries can return a complete customer record, joined across several tables, and deliver it at a production-ready pace — much faster than the five sequential API calls that'd be required to produce the same joined record. Unsure what data you need? You're free to experiment on SQL data without impacting your API quota.

Your team can use their favorite database tools to work with HubSpot just as they'd work with any other application data. Internal tools, reports, killer features; the sky's the limit.

What's the best way to build and maintain a sync between HubSpot and your database?

Building a HubSpot to Postgres sync

Building a sync with the HubSpot API is a multi-step process. Here is a high-level overview of the system:

Authentication and Token Management

With any integration, you'll need to authenticate your requests to the HubSpot API using OAuth access tokens (since API keys are now deprecated).

You'll first need to create a developer account and an app for your sync, then generate a token with an OAuth handshake:

const hubspot = require('@hubspot/api-client');

const hubspotClient = new hubspot.Client({});

const token = "token";

try {
  const apiResponse = await hubspotClient.oauth.accessTokensApi.getAccessToken(token);
  console.log(JSON.stringify(apiResponse.body, null, 2));
} catch (e) {
  e.message === 'HTTP request failed'
    ? console.error(JSON.stringify(e.response, null, 2))
    : console.error(e)
}

HubSpot access tokens are short-lived, so you'll then need to check the token's expiration and refresh as needed:

const hubspot = require('@hubspot/api-client');

const hubspotClient = new hubspot.Client({});

const token = "token";

try {
  const apiResponse = await hubspotClient.oauth.refreshTokensApi.getRefreshToken(token);
  console.log(JSON.stringify(apiResponse.body, null, 2));
} catch (e) {
  e.message === 'HTTP request failed'
    ? console.error(JSON.stringify(e.response, null, 2))
    : console.error(e)
}

Keep in mind that generating and refreshing access tokens counts against your API rate limit.
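
When a token does need refreshing, you exchange your refresh token for a new access token via HubSpot's OAuth token endpoint. Here's a minimal sketch using Node's built-in fetch; CLIENT_ID, CLIENT_SECRET, and REFRESH_TOKEN are placeholders for your own app's credentials:

// Exchange a refresh token for a new access token (requires Node 18+ for fetch).
// CLIENT_ID, CLIENT_SECRET, and REFRESH_TOKEN are placeholders for your app's credentials.
const params = new URLSearchParams({
  grant_type: 'refresh_token',
  client_id: process.env.CLIENT_ID,
  client_secret: process.env.CLIENT_SECRET,
  refresh_token: process.env.REFRESH_TOKEN,
});

const response = await fetch('https://api.hubapi.com/oauth/v1/token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: params,
});

const { access_token, expires_in } = await response.json();
// Persist the new access token and schedule the next refresh before expires_in (seconds) elapses.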

Quota Management

The HubSpot API has both a "burst" and a "daily" API limit. These limits increase with higher product tiers or with the API limit increase add-on.

| HubSpot product tier | Requests per app per 10 seconds | Requests per account per day |
|---|---|---|
| Free or Starter | 100 | 250,000 |
| Pro or Enterprise | 150 | 500,000 |
| Any, with API Limit Increase add-on | 200 | 1,000,000 |

Keep the small print in mind as well! The /search endpoint (which you'll use throughout your sync) is subject to an entirely different rate limit of 4 requests per second per authentication token, sixty percent less than the rest of the API.

When you hit the rate limit, you'll receive a 429 error response. To keep track of your quota, check these properties in the headers of HubSpot's API responses:

| Header | Description |
|---|---|
| X-HubSpot-RateLimit-Daily | The number of API requests you are allowed to make per day. |
| X-HubSpot-RateLimit-Daily-Remaining | The number of remaining requests you can make today. |
| X-HubSpot-RateLimit-Interval-Milliseconds | The time period that is monitored to cap API bursts. A value of 10,000 ms = 10 seconds. |
| X-HubSpot-RateLimit-Max | The number of API requests allowed during the burst period. If the RateLimit-Interval is 10 seconds and the RateLimit-Max is 100, you can make at most 100 requests per 10 seconds. |
| X-HubSpot-RateLimit-Remaining | The number of remaining API requests in the current RateLimit-Interval. |

To stay within the usage limits, it's best to implement a quota management system to throttle your API calls. In addition to some helpful examples in the HubSpot developer forums, you'll find existing libraries you can use to manage your quota. Remember, you'll need to measure your /search API usage separately.
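
For instance, here's a minimal sketch of a throttled request helper that reads the rate-limit headers and backs off when the burst quota runs out. ACCESS_TOKEN is a placeholder, and a real implementation would cap retries and budget /search calls separately:

// Minimal throttling sketch: inspect HubSpot's rate-limit headers and back off on 429s.
async function hubspotRequest(url, options = {}) {
  const response = await fetch(url, {
    ...options,
    headers: { ...options.headers, authorization: `Bearer ${process.env.ACCESS_TOKEN}` },
  });

  const remaining = response.headers.get('x-hubspot-ratelimit-remaining');
  const intervalMs = Number(response.headers.get('x-hubspot-ratelimit-interval-milliseconds')) || 10000;

  if (response.status === 429 || remaining === '0') {
    // Out of burst quota: wait out the current interval, then retry.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    return hubspotRequest(url, options);
  }

  return response;
}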

Backfill: paginate each endpoint

With authentication and quota management in place, you can begin to paginate through each API endpoint to load your historic data into your database.

For instance, to backfill contacts you'll call the list endpoint:

const hubspot = require('@hubspot/api-client');

const hubspotClient = new hubspot.Client({ accessToken: "YOUR_ACCESS_TOKEN" });

const limit = 100;
const after = undefined;
const properties = undefined;
const propertiesWithHistory = undefined;
const associations = undefined;
const archived = false;

try {
  const apiResponse = await hubspotClient.crm.contacts.basicApi.getPage(limit, after, properties, propertiesWithHistory, associations, archived);
  console.log(JSON.stringify(apiResponse.body, null, 2));
} catch (e) {
  e.message === 'HTTP request failed'
    ? console.error(JSON.stringify(e.response, null, 2))
    : console.error(e)
}

As you retrieve contacts from each page, store the paging cursor (the paging.next.after value) so that if the backfill process fails, you can restart from where it left off.
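
Put together, a resumable backfill loop might look like the following sketch. The saveCursor and upsertContacts helpers are hypothetical stand-ins for your own persistence layer:

// Sketch of a resumable contacts backfill. saveCursor and upsertContacts are
// hypothetical helpers for your own persistence layer.
async function backfillContacts(hubspotClient, startingAfter) {
  let after = startingAfter; // resume from the last saved cursor, if any

  do {
    const apiResponse = await hubspotClient.crm.contacts.basicApi.getPage(100, after);
    const page = apiResponse.body ?? apiResponse; // older client versions wrap the page in .body

    await upsertContacts(page.results); // write this page of contacts to Postgres

    after = page.paging?.next?.after; // cursor for the next page, if there is one
    if (after) await saveCursor('contacts', after); // persist so a crash can resume here
  } while (after);
}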

Sync: polling and cursors

With the historic data loaded into your database, you'll then poll the search endpoint for newly created or updated HubSpot objects by using the hs_lastmodifieddate property to filter and sort records.

curl https://api.hubapi.com/crm/v3/objects/contacts/search \
  --request POST \
  --header "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "filterGroups":[
      {
        "filters":[
          {
            "propertyName": "hs_lastmodifieddate",
            "operator": "GTE",
            "value": "2023-02-28 18:12:06.469752"
          }
        ]
      }
    ]
  }'

You'll store the most recently synced hs_lastmodifieddate as a cursor for each object type to ensure you only retrieve newly created or updated contacts, deals, etc. from their respective /search endpoints.

Keep in mind that /search endpoints are rate-limited to 4 requests per second per token.
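
A single polling pass for contacts might look like this sketch, where getCursor, setCursor, and upsertContacts are hypothetical helpers that read and write your cursor and rows in Postgres. For brevity, the sketch reads a single page of up to 100 results:

// Sketch of one polling pass for contacts using the /search endpoint.
async function pollContacts(hubspotClient) {
  const since = await getCursor('contacts'); // last synced hs_lastmodifieddate

  const apiResponse = await hubspotClient.crm.contacts.searchApi.doSearch({
    filterGroups: [{
      filters: [{ propertyName: 'hs_lastmodifieddate', operator: 'GTE', value: since }],
    }],
    limit: 100,
  });
  const searchResults = (apiResponse.body ?? apiResponse).results;

  await upsertContacts(searchResults); // write the changed contacts to Postgres

  // Advance the cursor to the newest hs_lastmodifieddate we've seen.
  const newest = searchResults.map((r) => r.properties.hs_lastmodifieddate).sort().pop();
  if (newest) await setCursor('contacts', newest);
}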

Handle: eventual consistency

HubSpot's /search endpoints are (or at least appear to be) not strictly consistent. It can take several minutes for a newly created or updated object to appear in /search results, especially if that object is being rapidly updated by users in the HubSpot client (for instance, moving a deal through multiple stages in quick succession).

This means an object can be updated in HubSpot at precisely noon GMT, and a request one second later with your cursor (i.e. hs_lastmodifieddate) set to noon GMT won't return it. Instead, that object might not appear in search results for several minutes.

To avoid missing updates entirely, you'll need to account for this eventual consistency by creating a process that tries to catch and sync objects that don't immediately return in your /search query by searching further into the past. Here are the components of the process:

  1. Create another process that uses hs_lastmodifieddate as the search property.
  2. Set hs_lastmodifieddate to approximately ten minutes before the present.
  3. Retrieve the results, and identify objects and properties that were missing in your primary search job.
  4. Load these newly found objects and properties into your database.

You'll need to run this kind of "eventual consistency" job for each HubSpot object type you wish to sync.
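
A sketch of this job for contacts, again with upsertContacts as a hypothetical helper; since writes are upserts, re-processing objects that were already synced is harmless:

// Sketch of the eventual-consistency job for contacts: re-search a trailing
// ten-minute window and upsert anything the primary poll missed.
async function sweepRecentContacts(hubspotClient) {
  const tenMinutesAgo = Date.now() - 10 * 60 * 1000; // epoch milliseconds

  const apiResponse = await hubspotClient.crm.contacts.searchApi.doSearch({
    filterGroups: [{
      filters: [{ propertyName: 'hs_lastmodifieddate', operator: 'GTE', value: String(tenMinutesAgo) }],
    }],
    limit: 100,
  });
  const searchResults = (apiResponse.body ?? apiResponse).results;

  // Anything already in Postgres is simply overwritten; anything missed is filled in.
  await upsertContacts(searchResults);
}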

Detect: deletes

With your backfill, polling process, and eventual consistency job in place, every new and updated object in HubSpot will sync to your database. Now you need to detect deletes, which requires one more process in your sync architecture.

You can detect deletes on some standard objects (COMPANY, CONTACT, DEAL, LINE_ITEM, PRODUCT, and TICKET) by calling the LIST endpoint for that collection and setting the query parameter archived=true.

Here is a curl example for the contacts endpoint:

curl --request GET \
  --url 'https://api.hubapi.com/crm/v3/objects/contacts?archived=true' \
  --header 'authorization: Bearer YOUR_ACCESS_TOKEN'

This endpoint will return all the archived/deleted objects for the given object collection. To avoid wasting time (and quota) constantly sweeping through all the deleted objects, you can store a "backstop" cursor (i.e. the last-archived object you processed) and then sweep backward through time until you hit the backstop. If a sweep reveals newly archived objects, update the backstop to the latest timestamp among them.
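
Here's a sketch of that sweep for contacts. getBackstop, setBackstop, and markDeleted are hypothetical helpers; the backstop is the archivedAt timestamp of the newest archived contact you've already processed:

// Sketch of a delete-detection sweep for contacts using archived=true.
async function sweepArchivedContacts(hubspotClient) {
  const backstop = await getBackstop('contacts');
  let newest = backstop;
  let after;

  do {
    const apiResponse = await hubspotClient.crm.contacts.basicApi.getPage(
      100, after, undefined, undefined, undefined, true // archived=true returns deleted contacts
    );
    const page = apiResponse.body ?? apiResponse;

    for (const contact of page.results) {
      if (!backstop || contact.archivedAt > backstop) {
        await markDeleted('contacts', contact.id); // flag or remove the row in Postgres
        if (!newest || contact.archivedAt > newest) newest = contact.archivedAt;
      }
    }

    after = page.paging?.next?.after;
  } while (after);

  if (newest) await setBackstop('contacts', newest);
}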

While this approach will work for standard CRM objects, custom objects require another backfill job to detect and remove deleted records from your database.

Sync: relationships

Creates, updates, and deletes are now syncing. But you aren't done yet. You now need to handle associations, a.k.a. the relationships in your CRM.

When an object's associations change, the object isn't considered modified – its hs_lastmodifieddate, which surfaces objects in /search job results, doesn't budge. Moreover, even if you could use /search to detect a new association, the associations themselves aren't available in the search results.

Instead, you'll need to detect new, updated, and deleted associations by polling through every page of the LIST associations endpoint:

curl --request GET \
  --url 'https://api.hubapi.com/crm/v4/objects/contact/{contactId}/associations/{toObjectType}?limit=500' \
  --header 'authorization: Bearer YOUR_ACCESS_TOKEN'

You'll need to create a polling job of this kind for one side of each association pair.
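
For example, a job that refreshes contact-to-company associations for every contact already in your database might look like this sketch; listContactIds and replaceAssociations are hypothetical helpers, and ACCESS_TOKEN is a placeholder:

// Sketch: refresh contact-to-company associations for each contact in Postgres.
async function syncContactCompanyAssociations() {
  for (const contactId of await listContactIds()) {
    const companyIds = [];
    let after;

    do {
      const url = new URL(`https://api.hubapi.com/crm/v4/objects/contact/${contactId}/associations/company`);
      url.searchParams.set('limit', '500');
      if (after) url.searchParams.set('after', after);

      const response = await fetch(url, {
        headers: { authorization: `Bearer ${process.env.ACCESS_TOKEN}` },
      });
      const page = await response.json();

      companyIds.push(...page.results.map((result) => result.toObjectId));
      after = page.paging?.next?.after;
    } while (after);

    // Replace this contact's join rows so removed associations disappear as well.
    await replaceAssociations('contact_company', contactId, companyIds);
  }
}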

Manage: schema migrations

HubSpot supports both custom objects and custom properties. As HubSpot administrators create, update, or delete these objects, the shape of the API response will change. You'll want to handle schema migrations to accommodate these changes.
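
One approach is to periodically diff HubSpot's property definitions against your table's columns and add whatever is missing. Here's a rough sketch for contacts using node-postgres; pgClient and ACCESS_TOKEN are placeholders, and new columns are created as text for simplicity:

// Sketch: add a column for any contact property that doesn't exist in Postgres yet.
async function migrateContactColumns(pgClient) {
  const response = await fetch('https://api.hubapi.com/crm/v3/properties/contacts', {
    headers: { authorization: `Bearer ${process.env.ACCESS_TOKEN}` },
  });
  const { results: properties } = await response.json();

  const { rows } = await pgClient.query(
    "select column_name from information_schema.columns where table_name = 'contacts'"
  );
  const existing = new Set(rows.map((row) => row.column_name));

  for (const property of properties) {
    if (!existing.has(property.name)) {
      // Store every new property as text; refine the column type later if needed.
      await pgClient.query(`alter table contacts add column "${property.name}" text`);
    }
  }
}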

The complete system

You'll build several distinct services to sync HubSpot to your Postgres database:

How to sync HubSpot to Postgres - high level architecture

In the end, you have a system you can rely on to query the data you need, when you need it, with Postgres niceties like PG_NOTIFY to trigger business logic when HubSpot entities change, all while avoiding 429 status codes.
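
For example, assuming you've added a trigger on your synced tables that calls pg_notify('hubspot_changes', ...) on insert, update, or delete, a small listener built with node-postgres can kick off that business logic:

// Sketch: react to changes in synced HubSpot tables via Postgres LISTEN/NOTIFY.
const { Client } = require('pg');

const pgClient = new Client({ connectionString: process.env.DATABASE_URL });
await pgClient.connect();
await pgClient.query('LISTEN hubspot_changes');

pgClient.on('notification', (msg) => {
  const change = JSON.parse(msg.payload);
  // Trigger business logic here, e.g. email Customer Success when a deal changes stage.
  console.log('HubSpot row changed:', change);
});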

Getting a HubSpot to Postgres sync

Alternatively, skip all this hassle with Sequin.

We're a completely managed service that goes to extreme lengths to sync HubSpot to your Postgres database in real-time. We've done all the heavy lifting: managing tokens, enforcing rate limits, backfilling existing data, and syncing all new and updated objects to your database as they change.

Here are the steps for syncing HubSpot to your Postgres database using Sequin:

(1) Create a Sequin account: Go to https://app.sequin.io/ to sign up and create an account.

(2) Connect your HubSpot account: Create a new HubSpot sync, and generate a token for Sequin by entering your HubSpot credentials:

Connect to HubSpot

(3) Select the objects and properties to sync: Pick the objects and properties you want to sync — Sequin supports custom objects and properties out of the box. Importantly, you can edit and lock the column names in your database to ensure your SQL queries don't break when properties are renamed in HubSpot.

Select tables to sync

(4) Connect your database: Connect to your Postgres database (we support tunneling and VPC peering) or simply click "Start" to begin syncing your data to a Sequin-hosted demo database.

Connect your Postgres database

(5) Sync: Sequin will persistently sync your HubSpot data to your database in real time. As your integration evolves, you can return to the dashboard and click to sync additional tables and columns.

Next steps

Every API has quirks. Setting up a sync between HubSpot and your Postgres database lets you work around these limitations, but it adds some complexity of its own.

We're building Sequin to help you skip the limits of the API and skip the tedium of writing all this glue code.