We’re Sequin. We stream data from services like Salesforce, Stripe, and AWS to messaging systems like Kafka and databases like Postgres. It’s the fastest way to build high-performance integrations you don’t need to worry about. We've learned a lot about what it takes to integrate Salesforce and want to share what we've learned!
Every time Sequin integrates a new platform, we ask the same question: how do we get the most recently updated records? Salesforce's API gives us several options:
/query
runs a SOQL query (Salesforce Object Query Language)./jobs/query
runs a SOQL query asynchronously. It can return more records than the/query
endpoint, but you must circle back for your results./updated
returns the IDs of records updated in a given interval.
The /updated
endpoint seemed perfect for our use case. We've written about the importance of /updated
-style endpoints before. Unlike our recommendations in that post, Salesforce's endpoint receives a caller-defined interval – a start timestamp and an end timestamp – and returns the IDs of objects that were updated within that interval:
One thing to note in this example, the call was made on 8/30, but our end date is 9/01. This ensures we'll get any records that were updated up to the time we made the call. We only included the date because the field is required.
The response also includes latestDateCovered
. latestDateCovered
guarantees that all updated records before this timestamp will have already been returned in the specified timeframe. To avoid returning duplicate IDs in subsequent calls to /updated
, we can replace the start
timestamp with this value as our new cursor.
What changed?
Now that we have the IDs, we have to make sense of them. That means fetching the records themselves to look for fields that changed. That's problematic in its own right: not only do we need to learn the nuances of a second endpoint, but each additional call chips away at our customer's API rate limit. Luckily, Salesforce provides the /composite/sobjects
endpoint. This endpoint seems like it was made to be used with /updated
, as we can take the list of IDs from /updated
and pass it on to return a list of updated records.
This works well enough, but there were a couple of things we found that gave us pause:
- We had to provide a list of field names. If a customer had fifty fields on a record, we had to enumerate all fifty in the request. If the customer added a 51st field to the object in Salesforce, we had to remember to add that field later on! Fortunately, with the way we manage sync schemas, this was not an issue for us.
/updated
can return 600,000 records, but a POST request to/composite/sobjects
will only accept up to 2,000 object IDs. For these cases, we had to come up with a separate pagination strategy.
30-day limit
Sequin customers often pause their syncs, sometimes for months at a time. In this case, if we ended up calling /updated
, we'd get this error:
Since we can't predict when our customers will unpause their syncs, we can't rely on /updated
at all.
We also wanted more flexibility with our API calls. What if we only needed updated objects from a specific account? Or to fetch all contacts, regardless of whether they've been updated? It was time to give up on /updated
and find a better solution.
Getting recently updated records with /query
What makes the /query
endpoint so powerful is it's ability to run SOQL queries. To get results similar to the /updated
and composite/sobjects
endpoints but in only one call, we can run this query:
You can see the SOQL query nested in the q
query param. There's quite a bit to unpack, but the syntax should look familiar if you know SQL. First off, FIELDS(ALL)
is one way to return all fields without having to individually list each standard and custom field. The drawback is the maximum number of results returned will be 200 instead of 2,000. This is why LIMIT 200
is required when using FIELDS(ALL)
.
The where
and order by
clauses utilize both the LastModifiedDate
and Id
fields. This makes sure we're not missing any records that were created with the exact same timestamp. We go into this in more detail in the previously mentioned blog post.
You may also notice that we only have one date defined here. This date is equivalent to the start date in our /updated
call as the end date was only provided because it was required. If you need it, you can easily add another where
condition to specify an end date.
To set a new cursor (start date) on subsequent calls, we just grab the LastModifiedDate
from the last object in the results (another reason why we order by LastModifiedDate
). If there are no results, we can keep using the current timestamp until another object is updated.
Unleashing the power of our SOQL query
Now that we have our SOQL query, we only need to modify it slightly to repurpose it for other uses. For example, we can get all objects by simply replacing LastModifiedDate
with CreatedDate
. That's it! Now we can perform an ETL backfill on a table or run audit jobs.
Note: don't useLastModifiedDate
to query all objects, asLastModifiedDate
can change and you may end up missing records.
We're now able to get related association information per object. In this example, we added Contact.Account.Name
after FIELDS(ALL)
and received the Account
key within the results.
The last part of keeping a customer's table in sync is removing any objects that have been deleted. We could use the /deleted
endpoint (which is similar to /updated
), but our SOQL query also solves this problem. Adding isDeleted = TRUE
to a /queryAll
call gets us our list of deleted contacts.
The magic here is that even with four different types of calls, the results are all the same. We know exactly how many records are in our response, our pagination strategy will always be the same, and we know which key has our records. We did want to use /updated
, but it quickly outgrew our needs. Thankfully, /query
came to the rescue.