Dataverse Long Term Retention
Long-term data retention is a new feature of Dataverse, so it is worth having an overview of it.
Why long-term data retention?
Even if it may seem easy to understand, it is not so straightforward. Long-term data retention is a long-awaited feature in the Dynamics world, but those who want to take advantage of it have to fully understand how it works.
Data retention was built to allow organizations to offload some data that should not be used in everyday work but needs to be stored for compliance or auditing reasons.
What we can deduce from this statement is that database size reduction is a side effect of this procedure, not its main goal. On the other hand, having fewer records to work on, with the performance improvement that follows, is an expected result.
How does it work?
Technically speaking, long-term retention is obtained by moving records from the Dynamics SQL database to a Microsoft-managed Azure Data Lake.
The size reduction comes from the compression the data lake applies to the stored data; it is not deterministic, but retained data typically ends up at roughly 50% of its original size.
Customers define a retention policy which, at the end of the day, is a FetchXML query that determines which records will be retained. This is done through a view that has to be created and tested. In the policy we then define the start date and the frequency (once, daily, weekly, or monthly). The policy runs during the off hours of the region the organization is registered in, so be aware that if you run a 24/7 organization, someone will experience some slowness or stale data.
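To make this concrete, here is a minimal sketch of the kind of FetchXML a retention view could be built on. The table, columns, and the two-year threshold are hypothetical examples chosen for illustration, not values prescribed by the product:

```xml
<!-- Hypothetical policy view: retain resolved cases untouched for 2+ years -->
<fetch>
  <entity name="incident">
    <attribute name="title" />
    <attribute name="modifiedon" />
    <filter type="and">
      <!-- statecode 1 = Resolved on the incident table -->
      <condition attribute="statecode" operator="eq" value="1" />
      <condition attribute="modifiedon" operator="olderthan-x-years" value="2" />
    </filter>
  </entity>
</fetch>
```

Anything this query returns on a given run is a candidate for retention, which is exactly why the view should be tested carefully before the policy is activated.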
Please be aware that data retention is a ONE-WAY TICKET: once data is retained, it cannot be brought back into live data. Testing is therefore pivotal for a successful retention policy.
More on that here.
What about developers?
Long-term data retention comes with some new messages that can be intercepted with plugins.
Here you can find all the references for developers. It is important to understand that the retention policy works in this way:
- A retention policy is validated
- When it is executed, it flags records for retention
- Then there is a bulk retention action
- Finally, the retained records are deleted from the live database
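(For reference: at the time of writing, the developer documentation surfaces these stages through messages such as ValidateRetentionConfig, Retain, BulkRetain, and PurgeRetainedContent. Treat these names as indicative and check the current docs, as they may evolve.)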
At every stage, we can register plugins to intercept events and extend the functionality. Please be aware that every part of this procedure will consume API requests from application users. I’ve written an article on the limits of application users’ requests; you can read it here.
How can we read retained data?
First of all, you shouldn’t. As we said, data is retained only for compliance and auditing purposes, so accessing that data is truly an exception. Because of that, there are very strict limits on accessing it:
- Up to five users can query and retrieve retained data at the same time.
- Up to 100 queries per day are allowed for each environment.
- Any single request from advanced find, Power Automate cloud flow, or Dataverse OData public API is considered as one query.
- Queries are allowed on one table at a time. Joins and aggregation functions aren’t allowed.
- Retained data includes lookup data. Lookup values in the table are denormalized with ID and name values.
There are multiple ways to read retained data; there is an article on that topic. My favorite, and the one that I think makes the most sense, is through a Power Automate cloud flow.
There is a well-documented procedure for that; I’m attaching it here.
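If you prefer to query retained data from code, below is a minimal sketch in FetchXML. It assumes the datasource attribute that the developer documentation describes for long-term retention queries (verify the exact value against the current docs), and the incident table is again only an example. Remember the limits above: one table per query, no joins, no aggregates:

```xml
<!-- Hypothetical query: read retained cases instead of live ones -->
<fetch datasource="retained">
  <entity name="incident">
    <attribute name="title" />
    <attribute name="modifiedon" />
  </entity>
</fetch>
```

This FetchXML can then be executed through the usual endpoints, for example as the query of a Dataverse list-rows step in a Power Automate cloud flow, which is the approach described above.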