left arrow Back to posts

Pulling past the limits of Salesforce's /updated endpoint

Dave Nguyen
5 min read
main image of the article

We’re Sequin. We stream data from services like Salesforce, Stripe, and AWS to messaging systems like Kafka and databases like Postgres. It’s the fastest way to build high-performance integrations you don’t need to worry about. We've learned a lot about what it takes to integrate Salesforce and want to share what we've learned!

Every time Sequin integrates a new platform, we ask the same question: how do we get the most recently updated records? Salesforce's API gives us several options:

  1. /query runs a SOQL query (Salesforce Object Query Language).
  2. /jobs/query runs a SOQL query asynchronously. It can return more records than the /query endpoint, but you must circle back for your results.
  3. /updated returns the IDs of records updated in a given interval.

The /updated endpoint seemed perfect for our use case. We've written about the importance of /updated-style endpoints before. Unlike our recommendations in that post, Salesforce's endpoint receives a caller-defined interval – a start timestamp and an end timestamp – and returns the IDs of objects that were updated within that interval:

Example call to /updated
Example call to /updated

One thing to note in this example, the call was made on 8/30, but our end date is 9/01. This ensures we'll get any records that were updated up to the time we made the call. We only included the date because the field is required.

The response also includes latestDateCovered. latestDateCovered guarantees that all updated records before this timestamp will have already been returned in the specified timeframe. To avoid returning duplicate IDs in subsequent calls to /updated, we can replace the start timestamp with this value as our new cursor.

What changed?

Now that we have the IDs, we have to make sense of them. That means fetching the records themselves to look for fields that changed. That's problematic in its own right: not only do we need to learn the nuances of a second endpoint, but each additional call chips away at our customer's API rate limit. Luckily, Salesforce provides the /composite/sobjects endpoint. This endpoint seems like it was made to be used with /updated, as we can take the list of IDs from /updated and pass it on to return a list of updated records.

Example call to /composite/sobjects
Example call to /composite/sobjects

This works well enough, but there were a couple of things we found that gave us pause:

  1. We had to provide a list of field names. If a customer had fifty fields on a record, we had to enumerate all fifty in the request. If the customer added a 51st field to the object in Salesforce, we had to remember to add that field later on! Fortunately, with the way we manage sync schemas, this was not an issue for us.
  2. /updated can return 600,000 records, but a POST request to /composite/sobjects will only accept up to 2,000 object IDs. For these cases, we had to come up with a separate pagination strategy.

30-day limit

Sequin customers often pause their syncs, sometimes for months at a time. In this case, if we ended up calling /updated, we'd get this error:

Example of startDate being more than 30 days ago
Example of startDate being more than 30 days ago

Since we can't predict when our customers will unpause their syncs, we can't rely on /updated at all.

We also wanted more flexibility with our API calls. What if we only needed updated objects from a specific account? Or to fetch all contacts, regardless of whether they've been updated? It was time to give up on /updated and find a better solution.

Getting recently updated records with /query

What makes the /query endpoint so powerful is it's ability to run SOQL queries. To get results similar to the /updated and composite/sobjects endpoints but in only one call, we can run this query:

Example call to /query
Example call to /query

You can see the SOQL query nested in the q query param. There's quite a bit to unpack, but the syntax should look familiar if you know SQL. First off, FIELDS(ALL) is one way to return all fields without having to individually list each standard and custom field. The drawback is the maximum number of results returned will be 200 instead of 2,000. This is why LIMIT 200 is required when using FIELDS(ALL).

The where and order by clauses utilize both the LastModifiedDate and Id fields. This makes sure we're not missing any records that were created with the exact same timestamp. We go into this in more detail in the previously mentioned blog post.

You may also notice that we only have one date defined here. This date is equivalent to the start date in our /updated call as the end date was only provided because it was required. If you need it, you can easily add another where condition to specify an end date.

To set a new cursor (start date) on subsequent calls, we just grab the LastModifiedDate from the last object in the results (another reason why we order by LastModifiedDate). If there are no results, we can keep using the current timestamp until another object is updated.

Unleashing the power of our SOQL query

Now that we have our SOQL query, we only need to modify it slightly to repurpose it for other uses. For example, we can get all objects by simply replacing LastModifiedDate with CreatedDate. That's it! Now we can perform an ETL backfill on a table or run audit jobs.

Note: don't use LastModifiedDate to query all objects, as LastModifiedDate can change and you may end up missing records.
SOQL query using CreatedDate
SOQL query using CreatedDate

We're now able to get related association information per object. In this example, we added Contact.Account.Name after FIELDS(ALL) and received the Account key within the results.

Get account name with contact using /query
Get account name with contact using /query

The last part of keeping a customer's table in sync is removing any objects that have been deleted. We could use the /deleted endpoint (which is similar to /updated), but our SOQL query also solves this problem. Adding isDeleted = TRUE to a /queryAll call gets us our list of deleted contacts.

Get deleted records using /queryAll
Get deleted records using /queryAll

The magic here is that even with four different types of calls, the results are all the same. We know exactly how many records are in our response, our pagination strategy will always be the same, and we know which key has our records. We did want to use /updated, but it quickly outgrew our needs. Thankfully, /query came to the rescue.