Caching sitemaps while using querystrings

The Problem

I needed a sitemap that is essentially an index of other sitemaps. The file is XML, and it is generated every time you or a bot visits https://awebsite.com/sitemap


The index links to a separate sitemap that lists all published content except the news articles. The news articles are split out because the site has thousands of them, and generating everything as a single sitemap caused the browser to crash.

The other links in the index point to individual news sitemaps, each of which lists all the news articles for a specific year.

The links in sitemap.xml look something like this:

https://awebsite.com/sitemap.xml?excludeArticles=true
https://awebsite.com/sitemap.xml?year=2024
https://awebsite.com/sitemap.xml?year=2023
https://awebsite.com/sitemap.xml?year=2022
...
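The index itself follows the standard sitemap index format from sitemaps.org. As a rough sketch (not the site's actual generator), this is one way the links above could be produced with LINQ to XML. SitemapIndexBuilder, Build and the list of years passed in are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper: builds a sitemap index in the sitemaps.org 0.9 format,
// pointing at one "everything except news articles" sitemap plus one sitemap per news year.
public static class SitemapIndexBuilder
{
    private static readonly XNamespace Ns = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static string Build(string sitemapUrl, IEnumerable<int> newsYears)
    {
        // Root element; LINQ to XML writes the namespace as the default xmlns
        var index = new XElement(Ns + "sitemapindex",
            SitemapEntry(sitemapUrl + "?excludeArticles=true"));

        // One child sitemap per year of news articles
        foreach (int year in newsYears)
        {
            index.Add(SitemapEntry(sitemapUrl + "?year=" + year));
        }

        // XElement.ToString() indents the output and omits the <?xml ...?> declaration
        return index.ToString();
    }

    // Each child sitemap is <sitemap><loc>URL</loc></sitemap>
    private static XElement SitemapEntry(string url) =>
        new XElement(Ns + "sitemap", new XElement(Ns + "loc", url));
}
```

A nice side effect of building the XML this way is that text is escaped for you, so a loc URL containing & would come out as &amp;, which the sitemap protocol requires.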

The problem was that the original code cached the sitemap like this:

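private string GetSiteMap(UmbracoContext umbracoContext)
{
    string result = null;
    lock (LockObjectSitemap)
    {
        // Only the expiry time is checked: the query string plays no part,
        // so every URL shares this single cached sitemap
        if (DateTime.Now.CompareTo(_siteMapExpiry) >= 0)
        {
            _siteMap = CreateSiteMap(umbracoContext);
            // Keep serving this copy until the cache period runs out
            _siteMapExpiry = DateTime.Now.AddMinutes(_cacheDurationMinutes);
        }
        result = _siteMap;
    }

    return result;
}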

This worked fine when you only hit /sitemap.xml, but once the query strings were added the code treated every request as the same page. Because the cache check only looks at the expiry time, it skipped CreateSiteMap and returned whichever sitemap had been cached first, so the new sitemaps (excludeArticles and year) were never created.
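To make the behaviour concrete, here's a rough, Umbraco-free model of that single-slot cache. SingleSlotSitemapCache is a hypothetical class, and the createSiteMap delegate stands in for CreateSiteMap; unlike the real method, it is handed the query string directly so you can see which request it was built for.

```csharp
using System;

// Hypothetical, Umbraco-free model of the original cache: one cached string and
// one expiry time. createSiteMap stands in for CreateSiteMap.
public class SingleSlotSitemapCache
{
    private readonly object _lock = new object();
    private readonly Func<string, string> _createSiteMap;
    private readonly TimeSpan _duration;
    private string _siteMap;
    private DateTime _expiry = DateTime.MinValue;

    public SingleSlotSitemapCache(Func<string, string> createSiteMap, TimeSpan duration)
    {
        _createSiteMap = createSiteMap;
        _duration = duration;
    }

    public string Get(string queryString)
    {
        lock (_lock)
        {
            // Only the clock decides whether to rebuild; queryString is ignored here,
            // so the first sitemap built is returned for every URL until expiry
            if (DateTime.Now >= _expiry)
            {
                _siteMap = _createSiteMap(queryString);
                _expiry = DateTime.Now + _duration;
            }
            return _siteMap;
        }
    }
}
```

Requesting ?year=2024 and then ?year=2023 within the cache period returns the 2024 sitemap both times, which is exactly the symptom described above.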

The Solution

I fixed this by checking for a query string and using it as part of the cache key. That way each sitemap is still cached, but different XML is generated and stored for each query string.


My code ended up looking like this:

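private string GetSiteMap(UmbracoContext umbracoContext, string queryString)
{
    // Each query string (e.g. "?year=2024") gets its own cache entry
    string cacheKey = "sitemap_" + queryString;
    string result;

    lock (LockObjectSitemap)
    {
        // When the cache period ends, drop every cached sitemap, not just this one,
        // so no sitemap can be served stale for longer than one cache period
        if (DateTime.Now.CompareTo(_siteMapExpiry) >= 0)
        {
            _siteMapCache.Clear();
            _siteMapExpiry = DateTime.Now.AddMinutes(_cacheDurationMinutes);
        }

        // Build this sitemap only if it isn't already cached
        if (!_siteMapCache.TryGetValue(cacheKey, out result))
        {
            result = CreateSiteMap(umbracoContext);
            _siteMapCache[cacheKey] = result;
        }
    }

    return result;
}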
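One thing worth watching with the raw query string as the key: ?year=2024, ?YEAR=2024 and ?year=2024&utm_source=x all become separate cache entries, and every key that isn't cached yet runs CreateSiteMap. A bot sending random query strings could therefore force rebuild after rebuild and keep adding entries to the cache. Here's a rough sketch that builds the key only from the two parameters the site actually uses. SitemapCacheKey, From and the MinYear bound of 2000 are hypothetical, not part of the site's code.

```csharp
using System;

public static class SitemapCacheKey
{
    // Hypothetical earliest year for news sitemaps; adjust to the real archive
    private const int MinYear = 2000;

    // Builds a cache key from the recognised parameters only, so case differences,
    // parameter order and unrelated parameters (e.g. tracking tags) share one entry
    public static string From(string queryString)
    {
        bool excludeArticles = false;
        int? year = null;

        foreach (string pair in (queryString ?? "").TrimStart('?').Split('&', StringSplitOptions.RemoveEmptyEntries))
        {
            string[] parts = pair.Split('=', 2);
            string name = Uri.UnescapeDataString(parts[0]);
            string value = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";

            if (name.Equals("excludeArticles", StringComparison.OrdinalIgnoreCase))
            {
                excludeArticles = value.Equals("true", StringComparison.OrdinalIgnoreCase);
            }
            else if (name.Equals("year", StringComparison.OrdinalIgnoreCase)
                     && int.TryParse(value, out int y) && y >= MinYear && y <= DateTime.Now.Year)
            {
                year = y;
            }
        }

        // Anything unrecognised falls back to the index key, so the number of
        // possible keys is bounded by the number of valid years plus two
        if (year.HasValue) return "sitemap_year_" + year.Value;
        if (excludeArticles) return "sitemap_excludeArticles";
        return "sitemap_index";
    }
}
```

In GetSiteMap you would call SitemapCacheKey.From(queryString) instead of "sitemap_" + queryString. The same parsed values should also drive what CreateSiteMap builds, so a key never ends up holding XML generated for a different request.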

Interesting.....

While working on this code I also read up on lock, as I hadn't come across it before.

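The lock (LockObjectSitemap) statement ensures that the code inside the lock block is executed by only one thread at a time. This prevents concurrent modification of the cached sitemaps, and it means that when a sitemap needs rebuilding, one request builds it while the others wait and then reuse the cached result. Without the lock, it could potentially be possible to take the website down (a denial-of-service attack) by hitting the sitemap URL over and over again.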

Without a lock, multiple threads could attempt to generate the sitemap simultaneously. This could lead to excessive CPU and memory usage, as each thread would be performing the same intensive work, such as querying databases and writing files, which could overwhelm the server. Note that lock only coordinates threads within a single process: if the site runs on several servers, each server still builds its own copy.
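As a rough, self-contained sketch of what the lock buys you, the class below (LockedSitemapCache, Get and Build are hypothetical names) wraps a check-then-build in a lock, with Thread.Sleep standing in for the expensive part of CreateSiteMap. Firing twenty concurrent requests for the same key should result in exactly one build.

```csharp
using System.Collections.Generic;
using System.Threading;

public class LockedSitemapCache
{
    private readonly object _lock = new object();
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();
    private int _builds;

    // How many times the expensive build has run
    public int Builds => Volatile.Read(ref _builds);

    public string Get(string key)
    {
        // Only one thread at a time can check the cache and, if needed, build;
        // the rest wait here and then find the finished XML in the dictionary
        lock (_lock)
        {
            if (!_cache.TryGetValue(key, out string xml))
            {
                xml = Build(key);
                _cache[key] = xml;
            }
            return xml;
        }
    }

    // Stand-in for the expensive work (querying content, building the XML)
    private string Build(string key)
    {
        Interlocked.Increment(ref _builds);
        Thread.Sleep(100);
        return "<urlset><!-- " + key + " --></urlset>";
    }
}
```

The trade-off of a single lock is that a request for ?year=2023 also waits while ?year=2024 is being built. A per-key approach such as ConcurrentDictionary<string, Lazy<string>> avoids that, at the cost of a bit more code.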

Published on: 07 June 2024