Every time Sequin integrates a new platform, we ask the same question: how do we get the most recently updated records? Salesforce's API gives us several options:
/queryruns a SOQL query (Salesforce Object Query Language).
/jobs/queryruns a SOQL query asynchronously. It can return more records than the
/queryendpoint, but you must circle back for your results.
/updatedreturns the IDs of records updated in a given interval.
/updated endpoint seemed perfect for our use case. We've written about the importance of
/updated-style endpoints before. Unlike our recommendations in that post, Salesforce's endpoint receives a caller-defined interval – a start timestamp and an end timestamp – and returns the IDs of objects that were updated within that interval:
One thing to note in this example, the call was made on 8/30, but our end date is 9/01. This ensures we'll get any records that were updated up to the time we made the call. We only included the date because the field is required.
The response also includes
latestDateCovered guarantees that all updated records before this timestamp will have already been returned in the specified timeframe. To avoid returning duplicate IDs in subsequent calls to
/updated, we can replace the
start timestamp with this value as our new cursor.
Now that we have the IDs, we have to make sense of them. That means fetching the records themselves to look for fields that changed. That's problematic in its own right: not only do we need to learn the nuances of a second endpoint, but each additional call chips away at our customer's API rate limit. Luckily, Salesforce provides the
/composite/sobjects endpoint. This endpoint seems like it was made to be used with
/updated, as we can take the list of IDs from
/updated and pass it on to return a list of updated records.
This works well enough, but there were a couple of things we found that gave us pause:
- We had to provide a list of field names. If a customer had fifty fields on a record, we had to enumerate all fifty in the request. If the customer added a 51st field to the object in Salesforce, we had to remember to add that field later on! Fortunately, with the way we manage sync schemas, this was not an issue for us.
/updatedcan return 600,000 records, but a POST request to
/composite/sobjectswill only accept up to 2,000 object IDs. For these cases, we had to come up with a separate pagination strategy.
Sequin customers often pause their syncs, sometimes for months at a time. In this case, if we ended up calling
/updated, we'd get this error:
Since we can't predict when our customers will unpause their syncs, we can't rely on
/updated at all.
We also wanted more flexibility with our API calls. What if we only needed updated objects from a specific account? Or to fetch all contacts, regardless of whether they've been updated? It was time to give up on
/updated and find a better solution.
Getting recently updated records with
What makes the
/query endpoint so powerful is it's ability to run SOQL queries. To get results similar to the
composite/sobjects endpoints but in only one call, we can run this query:
You can see the SOQL query nested in the
q query param. There's quite a bit to unpack, but the syntax should look familiar if you know SQL. First off,
FIELDS(ALL) is one way to return all fields without having to individually list each standard and custom field. The drawback is the maximum number of results returned will be 200 instead of 2,000. This is why
LIMIT 200 is required when using
order by clauses utilize both the
Id fields. This makes sure we're not missing any records that were created with the exact same timestamp. We go into this in more detail in the previously mentioned blog post.
You may also notice that we only have one date defined here. This date is equivalent to the start date in our
/updated call as the end date was only provided because it was required. If you need it, you can easily add another
where condition to specify an end date.
To set a new cursor (start date) on subsequent calls, we just grab the
LastModifiedDate from the last object in the results (another reason why we order by
LastModifiedDate). If there are no results, we can keep using the current timestamp until another object is updated.
Unleashing the power of our SOQL query
Now that we have our SOQL query, we only need to modify it slightly to repurpose it for other uses. For example, we can get all objects by simply replacing
CreatedDate. That's it! Now we can perform an ETL backfill on a table or run audit jobs.
Note: don't use
LastModifiedDateto query all objects, as
LastModifiedDatecan change and you may end up missing records.
We're now able to get related association information per object. In this example, we added
FIELDS(ALL) and received the
Account key within the results.
The last part of keeping a customer's table in sync is removing any objects that have been deleted. We could use the
/deleted endpoint (which is similar to
/updated), but our SOQL query also solves this problem. Adding
isDeleted = TRUE to a
/queryAll call gets us our list of deleted contacts.
The magic here is that even with four different types of calls, the results are all the same. We know exactly how many records are in our response, our pagination strategy will always be the same, and we know which key has our records. We did want to use
/updated, but it quickly outgrew our needs. Thankfully,
/query came to the rescue.