
Azure Search for Sitecore design considerations

Azure Search makes up a significant part of the costs within a Sitecore on Azure setup. Apart from these costs, there are often issues with Azure Search, and clients tend to move away from it and spin up a new VM with Solr instead. But is this really necessary? This blogpost is part of a series of blogposts.

Sitecore developers are spoiled

Most Sitecore developers are very spoiled: on-premise hardware is virtually unlimited. For example, when working with Solr, there is no need to take care of any limitations. But things change when the workload moves to Azure PaaS, as Azure Search does have some limitations. But how severe are these limitations?

Azure Search limits

When looking at the Azure Search limits there are a few important ones to take care of:
                Free    Basic   S1      S2       S3       S3 HD    L1 + L2
  Indexes       3       15      50      200      200      3000     10
  Fields        1000    100     1000    1000     1000     1000     1000
  Scale units   0       3       36      36       36       36       36
  Costs (EUR)   0       62.18   206.84  827.38   1654.76  1181.35  2363.32

Indexes

Azure Search has a maximum index limit. While Sitecore comes with 13 indexes out of the box (including the xdb and xdb_rebuild indexes), this leaves very few possibilities open for custom indexes. This also means that the Free and L1/L2 tiers cannot be used by Sitecore, as they don’t support enough indexes. Another consequence is that the Basic and S1 tiers cannot host multiple Sitecore environments, if you were to share a single Azure Search service across multiple instances.

Fields

Every tier except the Basic tier supports 1000 fields. This 1000-field limit is strict and can lead to severe problems when not monitored correctly. Due to the 100-field limit on the Basic tier, that tier cannot be used without modifications to Sitecore (more on that later).

Scale units

Scale units are the number of replicas multiplied by the number of partitions. The default setting for both is 1, but when more concurrent searches or faster queries are needed, a replica or partition can be added. To explain how this works, the “telephone book analogy” can be used:

telephone book analogy

Imagine a single telephone book with 1000 telephone numbers. Just one person can use that telephone book to look up a certain telephone number. To enable multiple persons to search through that index, a replica of that telephone book has to be created. For every replica, an extra user can execute a concurrent search through the index.

While the index with telephone numbers grows, it might take longer and longer to do a proper lookup of a telephone number. An increase to 10000 telephone numbers in the index might lead to performance degradation. In this case, one or more partitions can be added: the index would be split into a partitioned phonebook with last names starting with A-M and a partition with last names starting with N-Z. To be able to look up a telephone number with multiple concurrent users, each partition needs its own replicas. In the end you end up with an ‘X’ number of partitions, each with ‘Y’ replicas, which leads to a total of X*Y books: the number of scale units in terms of search.

Getting the most out of Azure Search for Sitecore

Getting the most out of Azure Search isn’t too hard; the only thing is that you might have to unlearn some old habits. In the end it might lead to a bit more configuration, while it can save tons of money:
  • Work around the 1000 field-limit
    • Index only what you need
    • Create custom indexes
  • Cache, Cache, Cache
  • Scale your Search service (or don’t)
  • Consider an external Search solution

Work around the 1000 field-limit (aka the friends with benefits method)

Most Sitecore developers have a background with Solr, which doesn’t have too many limitations: it can host a virtually unlimited number of fields, which is not the case with Azure Search. All tiers except the Basic tier can hold a maximum of 1000 fields, due to the way Azure Search manages its memory. This limit can be reached really fast when working with small, autonomous templates, SXA and/or multiple languages. This is due to the fact that Sitecore has a default setting on its master and web indexes named “indexAllFields=’true'”.

Set ‘indexAllFields’ to ‘false’

This prevents all fields from being indexed. Sitecore does come with a configuration in which all fields required for the content editor view keep working (title, body, buckets and some other computed fields), so this greatly reduces the number of fields.
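As an illustration, a patch file along these lines disables indexAllFields on the default Azure Search index configuration. The element path is an assumption based on Sitecore.ContentSearch.Azure.DefaultIndexConfiguration.config, so verify it against your Sitecore version and search provider:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultCloudIndexConfiguration>
          <documentOptions>
            <indexAllFields>false</indexAllFields>
          </documentOptions>
        </defaultCloudIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>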

The drawback, however, is that your complete search functionality will break, as no custom field is in the index anymore.

Create a custom index for ‘web’ and ‘master’ and put your custom fields in these indexes

Using this pattern will keep the out-of-the-box index very clean and fast. Sitecore can add whatever field they want to this clean index without affecting your indexes. Add all fields that are used within your custom search functionality to the custom indexes. This initially leads to a bit of custom configuration, but it results in very small and manageable indexes as well.

Extra free benefits – faster indexing actions and faster queries

Faster indexing actions
The extra benefits of this approach are great: because just a very small subset of all data is included, a (re)index of the Sitecore content decreases from several minutes to several seconds. This helps with blue-green deployments and the uptime of your environment.

Faster and smaller search queries
When querying Azure search, all fields in the index are returned – even if they are empty. When having 900+ fields, this will lead to very large responses (900 fields per result) and long response times (as Azure Search searches through all fields).

After the optimization of having a custom index with only the fields that are really needed, only a few fields are returned for each result, and because of the small dataset, the query is much faster than before.

Move to plan B (the pricing tier) – more and faster queries for lower costs

Due to this approach, it is very likely that your Sitecore indexes will stay below 100 fields. This means that the “Azure Search S1” tier can be reduced to the “Azure Search Basic” tier. This can lead to a cost reduction of roughly 145 EUR (206.84 - 62.18) per environment, just by using a different configuration – especially for your dev, test and QA environments.

Note: I have not researched the impact of xConnect for this approach.
Note 2: Tier B has a maximum of 3 scale units. If you need more query power, you should still consider the S1 tier, but this is often not applicable for dev and test environments.

Cache, Cache, Cache

Although search is a very fast way to look up data, it still takes time. When your result set is always the same (for example, when looking up a certain set of articles of a specific category), this dataset should be cached instead of being queried on every request. An increasing load on your webserver would otherwise lead to an increasing load on your search service, which could mean that an extra replica is needed. With just one partition, this could cost an extra EUR 62 (tier B), EUR 207 (S1) or EUR 827 (S2), but when working with multiple partitions, this could lead to a significant increase in costs.

An easy approach is to just turn on your output cache, as you would normally do on your Sitecore renderings; other approaches might involve custom Sitecore caches. The key here is: be as cheap as possible.
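As a minimal, non-Sitecore-specific sketch of the idea, the result of an expensive search can be kept in an in-memory cache so repeated requests for the same category never reach the search service. The cache key, duration and search delegate are up to you; a Sitecore custom cache or plain rendering output caching achieves the same goal:

using System;
using System.Runtime.Caching;

public static class CachedSearch
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    // Returns the cached result for this key, or executes the (expensive) search once and caches it.
    public static TResult GetOrAdd<TResult>(string cacheKey, Func<TResult> search, TimeSpan duration)
        where TResult : class
    {
        if (Cache.Get(cacheKey) is TResult cached)
        {
            return cached; // served from cache: no call to the Azure Search service
        }

        var result = search(); // only executed on a cache miss
        Cache.Set(cacheKey, result, DateTimeOffset.UtcNow.Add(duration));
        return result;
    }
}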

Scale your Search

After these changes, it might be possible to scale down your search services. Make sure to execute some performance tests on your QA environment before you scale down, or you might end up with errors in your production environment. Scaling down could mean moving to a lower tier or reducing the number of replicas and partitions.

Consider an external Search solution

Yes. Although developers tend to solve everything with custom code and their own frameworks, a site that heavily depends on search might benefit from an external search solution such as Coveo. They offer a lot of functionality to content editors out of the box, for a price at which no developer could build it themselves. EUR 2000 a month sounds like a lot, but that is barely 20 hours of development for a single developer: I bet that in those 20 hours you could not deliver the deployments, bug fixes, new functionality and fully scalable solution you get from such a vendor.

Moving to an external search solution means that the vendor takes care of the performance of the frontend searches, and you only have to take care of the performance of the content management and xConnect environments.

Indications that you might need to redesign your search approach

The best indicator is the Azure Search metrics. In this overview, you can display graphs of query latency, query throttling and the number of queries. Any sign of query throttling means that those queries were not executed, which leads to problems in your portal.
[Figure: query throttling kicks in]
The issue in the picture above happened due to a misconfiguration; all other days show correct behaviour. The graph below gives more food for thought. The red lines are search queries, while the purple lines are throttled queries. An interesting fact: during the throttling almost no queries were possible – something really messed that search service up during that time!
[Figure: search queries (red) and throttled queries (purple)]

Gotchas

Moving up or down a service tier is not possible

When deciding to go with a “Tier B” search service, keep in mind that upgrading to one of the S tiers is not possible. You would have to delete your Search service and recreate it to get a service with the same name. Getting the same API keys is not possible.

SXA

SXA adds all the template fields to the index by default. By turning this off, you would have to add all the fields manually. While SXA itself would only need actions within the Sitecore environment and no configuration changes, this strategy might therefore not be a suitable solution for you.

Summary

Azure Search costs can “rise out of the pan” (a Dutch saying): they increase very fast, and an incorrectly scaled service leads to errors pretty quickly. By being “cheap on resources”, these issues can easily be overcome, which still allows you to run your Sitecore on Azure completely as a PaaS setup, instead of having to add an IaaS-based Solr instance.

Free up your local disk space – easily get rid of your (old) logs

I bet that a lot of people have this issue: having a lot of (old) Sitecore installations that you don’t want to remove, as you aren’t sure whether or not there is still some valuable configuration in them. With a default installation, these installations grow over time, as they are running by default and are (thus) generating logs. I never change the logging settings to keep logs for just one day, which means they eat up a lot of disk space – especially the xConnect ones, as they can generate log files of up to 1 GB each! The following PowerShell line can be used to delete all the logs which are older than 2 days:
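A sketch along these lines does the job. The path and file extensions are assumptions, so point it at your own Sitecore and xConnect log folders and keep -WhatIf until you trust the result:

Get-ChildItem -Path "C:\inetpub\wwwroot" -Recurse -Include *.log,*.txt -File | Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-2) } | Remove-Item -Force -WhatIf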

Sitecore analytics, cookie consent and personalization aren’t a great match – learn how to keep Sitecore functional without breaking the law!

Due to different laws (European as well as local legislation), companies have to be very conservative in how they process data and careful about how they track people. People have to give consent before they may be tracked. Within Sitecore, you might do both. This blogpost shares how to use your cookie consent strategy within Sitecore. In short: there are three levels of cookies: functional, analytics and tracking cookies. Without a response to the cookie consent prompt, only functional cookies are allowed, while analytics and tracking cookies are forbidden until a user gives approval for these kinds of functionality. Within Sitecore, this is hard to implement, due to the internal workings of Sitecore analytics and (what I think is) a Sitecore bug. This blogpost explains why this is hard and how to solve it.

PS: Different companies classify the Sitecore cookies under different levels. I have seen classifications of “Functional”, “Analytics” and “Tracking”. I won’t judge any choice, as I am not a person with a legal background and can’t judge on what all companies implement to prevent data from being collected. This is my personal view and the approach should be applicable to every level. This blogpost applies to Sitecore 9.X

Too long; Didn’t Read

When implementing logic in Sitecore to follow cookie consent settings, the Sitecore tracker needs to be stopped. Basic conditions which don’t require the tracker (the functional level) should still keep working, but they don’t: stopping the tracker prevents conditional renderings from working. The fix for this issue (override the Personalize pipeline and remove the check for Tracker.IsActive) brings another issue to the surface: when there is a condition that causes an exception (and there are loads of them), your conditions will not be evaluated anymore and the rendering will fall back to the default rendering. RuleList<T>.Run stops evaluating conditions when a condition throws an error. The fix for this issue is to create a PersonalizedRuleList<T> class which derives from RuleList<T>, override the Run logic and use that logic within the personalization pipeline. Code can be found here.

Sitecore analytics and the tracker – a functional perspective

When Sitecore analytics is enabled, various functionalities (but not limited to) are available, which all require the Sitecore Tracker (important):
  1. Personalization – out of the box functionality. Until Sitecore 9.1, in-process personalization was possible; from Sitecore 9.1 on, the XM version is needed. Tracking might be disabled while still making use of personalization rules.
  2. Analytics – out of the box functionality. User behavior is collected anonymously: IP addresses are redacted and browsing history is stored.
  3. Tracking – users may be identified and additional information might be stored in the xDB. Explicit action needs to be taken to store this information, which means this data is not stored out of the box. Using xDB to store data about people might fall under the GDPR!
Depending on how the above functionality is classified by your legal organization, out of the box functionality might or might not comply with the “Functional” level – the level which should be applied without explicit consent. If your organization classifies Personalization and Analytics under the “Functional” level: congratulations, you can stop reading, as you won’t have to take any actions to comply. Below I’ll discuss the different functionalities and why you might have a problem.

Personalization – Basic functional working of Sitecore with rules and conditions

One of the most powerful capabilities of Sitecore is using personalization rules and conditions. Although “personalization” may sound as if it personalizes the website based on an identity, a condition can be very general and not have anything to do with a person or identity. For example: show a different rendering based on a querystring parameter.
In my personal opinion, this is basic functionality which should work in any scenario, even when analytics is turned off. Which means: only conditions which require the Sitecore Tracker should stop functioning.

Sitecore Analytics – track anonymous data

In this functionality, Sitecore stores visitor information (user agent, resolution information, goals, events, pages visited and a hashed IP address). This functionality stores metadata about a certain visit, but no information about the visitor itself.

Sitecore Tracking – track visitor information

Whenever a user submits (personal) data, this data could be stored. A developer would have to take explicit action to, for example, store email addresses, name and address data, personal interests or whatever. Based on this kind of data, personalization options could be offered.

Sitecore analytics and the tracker – a technical perspective

When Sitecore analytics is enabled, the functionalities above are all available. But they rely on a single mechanism: the Sitecore Tracker. It should be active and filled with a certain context. The property “Tracker.IsActive” tells us whether analytics is turned on, while Tracker.Current should hold a tracking context for the current user. The “SC_ANALYTICS_GLOBAL_COOKIE” goes hand in hand with this mechanism: it tells us a) who this user is and b) whether the user has already been classified (as a robot or a human).

Cookie consent levels and Sitecore and their precautions

Most companies inject their cookie consent messages and choices using a tag management system like GTM or Relay42. The default choice is level 1 (functional) or no level, while visitors can explicitly choose level 1, 2 or 3 (functional, analytics or tracking – see the similarity with the Sitecore functions?).
  Sitecore analytics classified as    Precautions
  Functional                          No actions
  Analytics                           Take actions to prevent Sitecore from loading the tracker before consent has been given
  Tracking                            Take actions to prevent Sitecore from loading the tracker before consent has been given
When precautions need to be taken, based on your cookie consent levels and the Sitecore classification, the tracker needs to be stopped as soon as possible. I won’t share code for the cookie consent logic itself, but the tracker should be stopped directly after the following processor:
Sitecore.Analytics.Pipelines.StartAnalytics.CheckPreconditions
A custom processor should be implemented which, based on your cookie consent logic, calls the following code as long as no consent has been given:
Sitecore.Analytics.Tracker.Current?.EndTracking();
args.AbortPipeline();
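A minimal sketch of such a processor could look like the class below. The class name and the CookieConsentReader helper are assumptions (your own consent logic goes there); only the EndTracking/AbortPipeline calls come from the snippet above:

using Sitecore.Analytics;
using Sitecore.Pipelines;

namespace MyProject.Analytics.Pipelines
{
    // Hypothetical processor: patch it into the startAnalytics pipeline directly after
    // Sitecore.Analytics.Pipelines.StartAnalytics.CheckPreconditions.
    public class EnsureCookieConsent
    {
        public void Process(PipelineArgs args)
        {
            // Replace this with your own cookie consent check (e.g. read the consent cookie).
            var consentGiven = CookieConsentReader.IsConsentGiven(); // hypothetical helper

            if (!consentGiven)
            {
                // No consent (yet): stop tracking and abort the rest of the startAnalytics pipeline.
                Tracker.Current?.EndTracking();
                args.AbortPipeline();
            }
        }
    }
}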

The Tracker has been stopped – now it gets hard

When the tracker stops tracking, the most basic functionality stops working as well: personalization! As explained, personalization doesn’t have to rely on a person, identity or the current visit; conditions can fire on things like query strings as well. A few lines of code within the “Sitecore.Mvc.Analytics.Pipelines.Response.CustomizeRendering.Personalize” pipeline prevent the (basic) conditions from being evaluated, as there is an explicit check on “Tracker.IsActive” – and the tracker has explicitly been stopped.
A quick fix was to override this processor and remove the explicit check on the tracker. But the behavior wasn’t as obvious as it seemed.

Conditions might not evaluate

Basic conditions which do not require a tracker worked seamlessly. But when mixing them with tracker-dependent conditions, the behavior is unexpected. Take, as an example, the rule “Where the specific campaign was triggered during the current visit”, which has a dependency on the tracker. Depending on the order of the conditions, the expected outcome will or won’t appear – see the included video, as it explains the issue much better.

Conditions which require a Tracker throw an exception when it is not available

The HasCampaign condition, for example, has a few lines of code which check (through Assert statements) whether or not the Sitecore Tracker is available.
Under the hood, these Asserts throw an “InvalidOperationException”.

Conditions which throw an error cause the complete pipeline to abort

In the “RunRules” method of the personalize pipeline, the code to run the rules can be found. Its actual implementation is the default RuleList<T>.Run implementation, which is widely used within the Sitecore product. My expectation was that the Run implementation would evaluate rules which throw an exception as false. This expectation was wrong: every exception aborts the pipeline, which means that no further rule will be evaluated.

This cannot be fixed easily by Sitecore

As the RuleList<T>.Run function is widely used, changing it might alter the behavior of vital elements of Sitecore. Changing the Assert.IsNotNull implementation might lead to unexpected behavior as well, while changing each rule that requires a tracker to return false is a labour-intensive approach.

A Suitable approach

A more suitable approach is to replace the RuleList<T>.Run function with another function which is only used by the personalization engine. We decided to duplicate the pipeline and make our own implementation of the Run logic. As extension methods weren’t possible, a new class “PersonalizedRuleList” was created which implements Run logic suitable for processing the conditional renderings. The magic happens in one specific catch:
This causes the evaluation process to continue after an InvalidOperationException and allows all conditions to be evaluated.
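A simplified sketch of that idea is shown below. The member names (Rules, Evaluate, Execute) are assumptions based on the default RuleList<T>.Run behaviour, so verify them against the decompiled sources of your Sitecore version; the essential part is the catch that swallows the InvalidOperationException and moves on to the next rule:

using System;
using Sitecore.Diagnostics;
using Sitecore.Rules;

// Hypothetical PersonalizedRuleList<T>: evaluate each rule in isolation so that a
// tracker-dependent condition throwing an InvalidOperationException does not abort
// the evaluation of the remaining rules.
public class PersonalizedRuleList<T> : RuleList<T> where T : RuleContext
{
    public void RunFirstMatching(T ruleContext)
    {
        foreach (Rule<T> rule in Rules) // assumption: the rules exposed by RuleList<T>
        {
            try
            {
                if (rule.Evaluate(ruleContext))
                {
                    rule.Execute(ruleContext);
                    return; // first matching rule wins, as in the default personalization behavior
                }
            }
            catch (InvalidOperationException exception)
            {
                // Conditions such as HasCampaign assert on the (stopped) tracker;
                // treat the condition as "not met" and continue with the next rule.
                Log.Warn("Condition could not be evaluated, skipping rule", exception, this);
            }
        }
    }
}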

A Sitecore theme for windows-terminal

Microsoft recently released the Windows Terminal in the Windows Store and I must say: I love it. It is highly configurable, so I decided to create a small Sitecore theme for it. I hope you will love it! The code can be found here. The theme has been put together from existing pieces, which results in the following look and feel:
[Screenshot: the Sitecore theme in Windows Terminal]
I just grabbed everything together and put it into a repository 😉

How to run Azure DevOps hosted (Linux) build agents as private agents (and be able to scale them accordingly)

Lately, I was preparing for a talk on Azure DevOps for the Sitecore community. For this talk I wanted to talk about scaling up and scaling out of build agents and compare the performance of different sized build agents on larger projects. Due to some limitations on the hosted Azure DevOps build agents, I had to create my own build agents. This blogpost will explain why I had to create my own agents and how I did this without too much effort. TLDR: just run a packer script to create your own private build agents

Why were private build agents required?

I wanted to have a representative real-world example with quite some (legacy) frontend code and a lot of backend projects. It isn’t a very different scenario from regular ASP.NET applications with a heavy client-side oriented frontend, apart from the fact that a lot of people use a single PowerShell/Cake/gulp script to build all the client-side assets and backend code. There are three reasons why I couldn’t use the hosted build agents and had to create agents with the same software myself:

  1. To show the possibilities of scaling out agents, the hosted Azure DevOps build agents could have been used, as they are free to use for up to 10 parallel jobs, but only for open source projects. The codebase that was used is not public, so parallelism wasn’t possible.
  2. Scaling up hosted Azure DevOps build agents is not possible. All agents are based on the DS2_V3 VM, which has 2 vCPUs and 7 GB of memory. Scaling up may have a positive effect on some workloads.
  3. The hosted build agents are sufficient in terms of tooling, so why craft our own agent images from scratch when the blueprints are already available?

How to create private build agents with the configuration of the Hosted agents

My first approach was to run those agents as Docker containers. Microsoft published their Linux containers to hub.docker.com, but they are deprecated and haven’t been updated since the 22nd of January.

Building the images

After a bit of research, I found out that Microsoft open sourced the scripts to build the images. They are created using packer and can be found here. The MSDN documentation helps a bit on creating those images using packer and the azure CLI.

It turned out that it was even easier to create those images with some PowerShell scripts that are available within the GitHub repository. They can be found within the helpers directory. The VMs can be generated with the following command (choose ImageType 0 for VS2017, 1 for VS2019 or 2 for the Ubuntu VM):

GenerateResourcesAndImage -SubscriptionId "<subid>" -ResourceGroupName "resourcegroup" -ImageGenerationRepositoryRoot "C:\github\azure-pipelines-image-generation" -ImageType 1 -AzureLocation "westeurope"

Building these images takes a long time.

When the script has run successfully, you’ll eventually see the following output:

[Screenshot: output of GenerateResourcesAndImage, listing the generated template URIs]

The last templateUri has to be selected, including the query string parameters, and opened in your browser. Download the JSON and store it somewhere on your filesystem. This file contains information about your generated VHD.

Running the image

The next step is to actually instantiate the VM: create a VM based on the previously created image. Within the helpers directory, there is another script for this: CreateAzureVMFromPackerTemplate. A few parameters, like the name, username, password and TemplateFilePath, have to be specified.

After the script has finished, the last actions can be taken to finalize the configuration.

Add the Azure Devops agent

Please note: I am doing this manually; Mikael Krief wrote an article on how to automate this! When using that approach, the steps below aren’t needed.

Before the agent can be added, the firewall has to allow connections over SSH (or RDP when using Windows). I created a new Network Security Group, added the rule to this NSG, and attached the NSG to the Network Interface.


The next step is to connect over SSH to the Linux VM. I prefer to use the tool ‘putty’ for this when working on Windows. Azure DevOps tells us what to do.


The agent can be downloaded using the ‘wget’ command. ‘wget https://vstsagentpackage.azureedge.net/agent/2.155.1/vsts-agent-linux-x64-2.155.1.tar.gz’ downloads the client to the current directory.

After extracting the client, the agent can be configured following this guide, for example as shown below.
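An unattended configuration could look like this; the organization URL, personal access token and pool name are placeholders:

./config.sh --unattended --url https://dev.azure.com/<your-organization> --auth pat --token <personal-access-token> --pool Default --agent $(hostname)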

Automatically start the client

When the client has been configured, it will be shown as ‘offline’ in the agent overview. Run the commands

sudo ./svc.sh install && sudo ./svc.sh start

and the agent is configured to start automatically after startup of the VM, and it is started right away.

Conclusion

Creating one or more private build agents is not hard, especially when using the packer scripts provided by Microsoft. It just takes some time to generate the VMs.

A universal guide to the mono package deployment

In my Sitecore Symposium session “Sitecore in the enterprise: Optimize your time to deliver toward the speed of light, using all the new love in Azure DevOps” (and yes, that is a mouthful) I spent quite some time on the “mono-package” approach. This blogpost explains what a mono-package is and why it is (much) faster in terms of deployments than using multiple web deployment packages.

Disclaimer 1: In this blogpost I talk (a bit) about Sitecore, but it is applicable for any application that is being deployed using msdeploy or the app service task in Azure DevOps. The blogpost “Sitecore challenges on the mono-package approach” contains the specific challenges faced that had to be solved.

Disclaimer 2: Some people might immediately be opinionated: “how could you ever end up with multiple packages, you have a bad architecture”. I understand your pain and in some cases you might be right. But there are cases where you have to deal with external factors, such as existing platforms and organizational challenges, where a layered approach is not too weird a solution.

The layered approach – part 1

When using a layered deployment, multiple web deployment packages are needed to deploy an application. This approach could lead to increased deployment times and reduced quality. But why? To understand this, it’s important to know how msdeploy works. When having a single solution with a single web deployment package, stop reading: you won’t learn anything here.

Msdeploy 101

Msdeploy has a *lot* of different functionalities, but in the end, it comes down to the following. Let’s say there is a source (your compiled code and all the assets) and a target (the place where your code and assets need to be deployed to). A few different situations can happen:

  • The source has a file which already exists at the target and is newer or older (situations 1 and 2)
  • The source has a file which doesn’t already exist at the target (situation 3)
  • The source omits a file that exists at the target
[Table: msdeploy source/target situations]

In the first situation (1 and 2 in the table), the existing file at the target will be updated or left untouched. In the second situation, the file will be added, and in the third situation, the file *might*, based on your deployment settings, be removed. If the setting “remove additional files from target” is enabled, msdeploy works in a mode that I call “sync” mode: it removes everything that is not in the current source and updates everything that is newer. All the files that didn’t change are untouched.
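As an illustration, the difference boils down to whether the delete behaviour is disabled on the msdeploy call. The package names below are placeholders; DoNotDeleteRule is the built-in msdeploy rule that blocks deletions at the destination:

msdeploy.exe -verb:sync -source:package="Website.Baseline.zip" -dest:auto

msdeploy.exe -verb:sync -source:package="Website.ProjectX.zip" -dest:auto -enableRule:DoNotDeleteRule

The first call behaves as the “sync” mode described above; the second call only adds and updates files, which is what the follow-up packages in a layered deployment do.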

Layered approach – part two

In a layered approach, the first package is often installed with the “remove additional files from target” option enabled, which means that on every release, the application syncs back to that single package. In other words: all the files that have been deployed using another package will be removed or set back to the state as defined in the current source package. Every following package in this deployment will *not* have this option enabled, otherwise it would undo the previous deployment. These packages just add new files or update existing ones.

If the first package doesn’t have “remove additional files from target” enabled, it doesn’t remove old files at all. In other words, old stuff stays on the app service (or IIS server).

Below is an example on how such a deployment could occur:


[Figure: example of a layered deployment]

The problem lies with the growing packages. An application grows and gets new assets, so eventually the set of “generic components + projects” will grow (very) large. This basically means that it will take more time to deploy. In our specific scenario, the following happens on every deployment:

  • Reset to baseline: compare the baseline package to the target and remove everything that is not in there. With 25000+ files, this is quite a lot and takes a lot of time
  • Add project + baseline: add every file in these packages.

This happens over and over again on every deployment, while the actual changeset (the delta in green and red) might be very, very small.

The mono package approach

“Normal” projects have a mono-package approach by default: a single web deployment package is created after the build and it can be deployed. In environments such as Sitecore, this cannot be done in a convenient way (yet). Due to their architecture (Helix), installation (a single WDP) and module approach, we end up with at least two packages, but it isn’t uncommon to have five, six or even more packages. But what would the world look like if we could have a mono-package?

The mono package world

When talking about mono-packages, we should be able to merge all the different packages into one. This makes the deployment much, much faster. The source and target will look a lot like each other and only the differences (the delta) have to be deployed.


This brings deployments down from 7-8 minutes to just 1 minute, when talking about 25000+ files. The number of file updates is greatly reduced (just 213 in my case).


Conclusion & Challenges

Moving from a multi-layered deployment to a mono-package reduces the deployment time. Based on different factors, such as vendors and architecture, it might be hard to achieve this. The following blogpost will explain the main challenges for Sitecore and show how to solve them.

How to use the Nuget / Azure Artifact credential provider with a .net docker build container to connect to authenticated Azure DevOps feeds

This blogpost describes how to add the Azure Artifact nuget credential provider to a windows based docker container for building .Net (full framework) solutions, using authenticated Azure DevOps artifacts feeds. As I couldn’t find a feasible solution, I decided to write a quick guide on how to set this up. This blogpost makes use of the provided Dockerfile structure that Sitecore provides, but the learnings can be applied in any solution. In other words: this post is not tied to the Sitecore ecosystem. To skip immediately to the instructions, click this link

Note: It has been a while since I was really, really, really enthusiastic about a new release of Sitecore, but this Sitecore 10 release is just: WOW. Sitecore has finally put an enormous effort into making new(ish) technology, such as containers, .NET Core, real CI/CD and command line automation, available to their developers. Together with the new, supported serialization solution, Sitecore has made a giant leap towards a complete, modern developer experience. This blogpost describes how a private Azure DevOps Artifacts NuGet feed can be used in conjunction with the Sitecore Docker setup.

Within our organization, we only use private NuGet feeds which are hosted on Azure DevOps, as I previously explained here. These feeds contain not only open source packages which are approved for use by our team, but also specific packages which are widely used within the organization. Think about generic (organization-specific) cross-cutting packages for logging, abstractions and configuration, and, from a Sitecore perspective, generic packages such as (security) baselines, identity management, security, serialization, ORM and other reusable stuff. We add the Sitecore packages to this feed as well. Instructions on how to download these packages really quickly can be found on this blog, written by my colleague Alex van Wolferen.

A very, very short story on the Sitecore docker setup

Sitecore delivered quite a lot of Dockerfiles, one of them being a container which is used to actually build the .NET solution. In this image all required build tools are installed. One of these tools is NuGet.


What is the problem with non-public Azure DevOps Artifacts?

When a feed needs authentication, it is very likely that you’ll see the following error:

[CredentialProvider]Device flow authentication failed. User was presented with device flow, but didn't react within 90 seconds.

It’s clear that this feed needs credentials, and in this case, NuGet tries to open up a window to enter your credentials, which is, in Docker, not possible. The required credentials can be provided in various ways. One option is to put user credentials directly into the nuget.config, which is not a recommended approach, as the credentials would a) be stored in git and b) be available on the filesystem of your container. A recommended solution is to make use of the Azure Artifacts NuGet Credential Provider – this provider automates the acquisition of the credentials needed to restore NuGet packages as part of the .NET development flow.

How to use the Nuget / Azure Artifact Credential Provider

First of all, the Azure Artifacts Credential Provider needs to be installed. It will be installed in a plugin directory and will automatically be detected by NuGet. In order to use this credential provider for unattended use of an Azure Artifacts feed, the VSS_NUGET_EXTERNAL_FEED_ENDPOINTS environment variable needs to be set to contain the following data:
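The structure looks like this; the endpoint URL, username and password below are placeholders for your own organization, feed and personal access token:

{
  "endpointCredentials": [
    {
      "endpoint": "https://pkgs.dev.azure.com/<your-organization>/_packaging/<your-feed>/nuget/v3/index.json",
      "username": "docker",
      "password": "<personal access token>"
    }
  ]
}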


The endpointCredentials contains an array with a structure of “endpoint, username and password”. If the endpoint matches a source in the nuget.config, the password will be used as access token to authenticate against that source. The username can be anything and will not be used.

How to obtain an accesstoken

This piece of documentation describes how to obtain a Personal Access token. Make sure not to select Full Access and to select the correct custom defined scope:

First, make sure to show all scopes:


After this action, select “Read” under Packaging. Read & write could be selected if you’d like to push packages as well, but that is out of scope for this post.


How to configure your Docker files in order to get this all to work in a secure way

The following actions need to be taken:

  • Download and Install the Azure Artifacts Credential Provider
    • Make sure it works with .Net full framework!
  • Configure your solution to work with the environment variable
  • Make the access token configurable

Install the Azure Artifacts Credential Provider

Note: this step can be skipped if you are using the mcr.microsoft.com/dotnet/framework/sdk:4.8 image as your build tool. For Sitecore users: this image is used within the Sitecore dotnetsdk image. The plugin can be found at ‘C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\Common7\IDE\CommonExtensions\Microsoft\NuGet\Plugins\’. Please continue here.

Open up the dockerfile for the dotnetsdk (under docker/build/dotnetsdk). Add the following lines after the installation of nuget:
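A sketch of those lines is shown below. It assumes the Dockerfile already uses PowerShell as its shell and the backtick escape directive (as the Sitecore dotnetsdk Dockerfile does), and uses the install helper script from the microsoft/artifacts-credprovider repository:

RUN Invoke-WebRequest https://raw.githubusercontent.com/microsoft/artifacts-credprovider/master/helpers/installcredprovider.ps1 -OutFile installcredprovider.ps1; `
    .\installcredprovider.ps1 -AddNetfx; `
    Remove-Item installcredprovider.ps1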

PLEASE make sure that the -AddNetfx switch is NOT omitted. It cost me 4 hours of my valuable life: when this switch is omitted, the provider is installed for .NET Core only, and your nuget restore will ask for a device login over and over again.

Configure your solution to work with the environment variable and make it configurable

The dockerfile for your solution is probably in the root of your project. Add the following two lines after “FROM ${BUILD_IMAGE} AS builder”:
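A sketch of those two lines is shown below; the endpoint URL is a placeholder, and the backtick-escaped quotes assume the “escape=`” directive used by the Sitecore Dockerfiles:

ARG FEED_ACCESSTOKEN
ENV VSS_NUGET_EXTERNAL_FEED_ENDPOINTS="{`"endpointCredentials`": [{`"endpoint`":`"https://pkgs.dev.azure.com/<your-organization>/_packaging/<your-feed>/nuget/v3/index.json`", `"username`":`"docker`", `"password`":`"${FEED_ACCESSTOKEN}`"}]}"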

These two lines accept an argument from the docker-compose file (FEED_ACCESSTOKEN) and add that argument to the environment variable. Please take note of the backticks as escape characters: if the value were placed between single quotes, the environment variable would not be replaced. As this is just an intermediate layer, the environment variable will not end up in the actual image.

In order to finalize all actions, just two more modifications are needed:

1- In the docker-compose.override.yml, the FEED_ACCESSTOKEN needs to be added for the solution:
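A fragment along these lines does the job, assuming your solution service is called “solution” as in the default Sitecore setup:

solution:
  build:
    args:
      FEED_ACCESSTOKEN: ${FEED_ACCESSTOKEN}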


2- add the FEED_ACCESSTOKEN (your PAT) to the .env file.

After executing these steps, execute the following command:

docker-compose build

If your builder image was changed, that one will be rebuilt; this should not be the case for Sitecore users. Your solution image will be rebuilt, as changes were made to supply the FEED_ACCESSTOKEN and the VSS_NUGET_EXTERNAL_FEED_ENDPOINTS.

After the rebuild, your solution should have retrieved the nuget packages from your private azure artifacts repository!


Conclusion

When downloading NuGet packages from an authenticated Azure DevOps Artifacts feed, you need to supply credentials. When making use of the default Microsoft .NET SDK image, the Azure Artifacts Credential Provider has already been installed; otherwise this package can easily be installed using two lines of code. All you need to do is provide an environment variable with the feed and the credentials, and you’re good to go! Happy dockering!

A quick guide on reloading your Sitecore xDB contact on (or after) every request

In our road towards realtime personalization, we were in need of reloading our xDB contact on every request, as external systems might have updated several facets with information that could or should be used within Sitecore. Out of the box, this does not happen.

Why changes to the contact xDB do not reflect to the contact in Sitecore

The problem within Sitecore lies in how and when xDB contacts are retrieved from the xDB. Let’s take a look at the diagram below:

[Sequence diagram: a contact is only loaded from xDB at the start of a session]

The sequence diagram makes clear that a contact is retrieved from the xDB right after the start of the session. This is the only time in the default lifecycle of a session that this happens, which means that whenever an update to the xDB contact is written to the xDB, this change is not reflected in the state within the private session database. In order to reflect those changes, the contact needs to be reloaded. This can be done using the following code:
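A sketch along these lines captures the three steps, based on the documented ContactManager pattern for Sitecore 9.x; verify the member names against your version:

using Sitecore.Analytics;
using Sitecore.Analytics.Tracking;
using Sitecore.Configuration;

public static class ContactReloader
{
    public static void ReloadCurrentContact()
    {
        var manager = Factory.CreateObject("tracking/contactManager", true) as ContactManager;
        var contact = Tracker.Current?.Contact;
        if (manager == null || contact == null)
        {
            return;
        }

        // 1. Ensure the contact exists in xDB; a brand new contact only lives in the session.
        if (contact.IsNew)
        {
            manager.SaveContactToCollectionDb(contact);
        }

        // 2. Remove the contact from the private session database.
        manager.RemoveFromSession(contact.ContactId);

        // 3. Explicitly reload the contact, so the latest facets written by external systems are picked up.
        Tracker.Current.Session.Contact = manager.LoadContact(contact.ContactId);
    }
}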


The code consists of three parts:

Ensure that the contact exists.

When the “IsNew” property is set to true, the contact only exists in the Sitecore environment. An explicit save is needed before the contact can be reloaded. This is only the case when the visitor doesn’t send an SC_ANALYTICS_GLOBAL_COOKIE – a persistent cookie which is stored across sessions and contains an identifier which can be used to identify a user in the xDB. When this information is not available, the contact is marked as “IsNew”. Whenever a user leaves information which can be used to identify them, a merge of contacts can be executed.

Remove the contact from the current session

By removing a contact entirely from the current session, his interactions and goals will be saved, but the contact details and its facets will be reloaded upon the next request.

Explicitly reload the contact

When the contact has been removed from the session, it can be reloaded explicitly. By removing the contact from the session at the start of the request and reloading that same contact immediately, all the latest, fresh information for this contact, with its facets, is made available to Sitecore.

Summary

By default, Sitecore loads a contact into the session but does not immediately sync updates made in the xDB back to Sitecore. By explicitly removing and reloading the contact at the start of a request, all the latest changes to a contact can be made available to Sitecore. This data can be used for, for example, smart personalization.


Sitecore 10 on docker – Help to understand the composition of the configuration

After following the “getting started” guide by Nick Wesselman, I had my first Sitecore 10 environment up and running, so there is no need to write about the convenient installation. But being new to Docker and (thus) new to the approach that Sitecore uses for these development environments, I struggled a little bit to understand how everything worked together. I wanted to know about the structure and the dependencies. As I couldn’t find any blogpost on the new structure/setup, how all the roles correlate to each other and how the dependencies work, I decided to dive into it and share it. Note: there is a lot of information on the Sitecore DevEx Containers documentation site and it explains how things can/should be achieved; I can really recommend this site.

When taking a look at all the Docker containers that are running after completing Nick’s tutorial, we see that 10 containers were started for the “xp0” topology: one “traefik” container and 9 Sitecore containers.


You might recognize the different sitecore roles, but how do they depend on each other? Where do they come from? How do you customize the solution? What does the network look like? What is exposed and what not? In order to understand how everything works together, I tried to visualize the composition and explain it. I did this for the XP0 setup.

The role decomposition

Aside from the sitecore roles, the only role which might not be familiar to you, is traefik. According to their website:

Traefik is an open-source Edge Router that makes publishing your services a fun and easy experience. It receives requests on behalf of your system and finds out which components are responsible for handling them.

It basically listens to requests on a certain port and redirects them to the correct role.

The docker composition visualized – identify the important configuration assets

To gain all the insights, I started to visualize the docker-compose files. For the XP0 I ended up with the following composition:

[Diagram: the XP0 docker composition]

The XP1 composition has the following composition (with 19 roles):

[Diagram: the XP1 docker composition]

In one blink of an eye, a few important configuration settings can be identified:

  • Open ports
  • Port mappings
  • Volume mapping
  • Dependencies between roles
  • Containers that do not show up in docker

Important configuration that this diagram doesn’t tell us

  • Hostnames

Open Ports

The exposed ports are the 5 ports which are encircled in the diagram:

  • 443 (Traefik)
  • 8079 (Traefik)
  • 14330 (SQL)
  • 8081 (xConnect)
  • 8984 (Solr)

The Traefik role exposes ports 443 and 8079. When navigating to https://localhost:443, nothing is presented.

However, when navigating to http://localhost:8079, an interesting site shows up: the Traefik dashboard.

Something remarkable shows up in the entrypoint configuration: apparently, Traefik exposes ports 443 and 8080, but port 8080 is not in the list of exposed ports.

Port mapping

The reason behind this difference is that Traefik itself is configured to listen on ports 443 and 8080. Docker is configured to listen on port 8079 and redirects that traffic to port 8080 internally. The same applies to port 443, but there it is internally mapped to the same port number.

The same method is used to map port 14330 to 1433 for the sql role, 8081 to 80 for xConnect and 8984 to 8983 for Solr.
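In docker-compose terms this is just a port mapping on the traefik service, roughly like the fragment below (the service name and ports follow the mappings described above):

traefik:
  ports:
    - "443:443"
    - "8079:8080"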

Volume mapping

Another important piece of configuration is the volume mapping. These volume mappings are indicated by the folder icons in the diagram.

The following folders are mounted:

  • Traefik: ./docker/traefik to c
  • Rendering: ./ to C
  • CM: named: ${LOCAL_DATA_PATH}\cm to C
  • SQL: .\docker\data\sql to c:\data
  • SOLR: .\docker\data\solr to c:\data
  • Multiple roles – named: ${HOST_LICENSE_FOLDER}

Some folders appear to be mounted to “C”. This is not actually the case, but probably a problem in my visualization software.

Dependencies between roles

The thin dotted lines between the rectangles mark the dependencies (one of these dependencies is marked by the red arrow). For the purpose of convenience, the port and volume mappings are omitted in this visualization.

[Diagram: the role dependencies, with port and volume mappings omitted]

These dependencies might have a small note (as seen in the red bounded box). This note indicates the required state of the role it depends on before it can be started. In this case, Traefik has a dependency on identity, rendering and content management, where the required status for both cm and identity is marked as “service_healthy”. In other words: if the CM or identity server doesn’t come up healthy, Traefik does not start either. This requirement is not present for the ASP.NET Core rendering host.
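Expressed in docker-compose, such a dependency looks roughly like the fragment below. The service names follow the default Sitecore compose files, and the condition syntax assumes compose file format 2.x, which supports it:

traefik:
  depends_on:
    id:
      condition: service_healthy
    cm:
      condition: service_healthy
    rendering:
      condition: service_started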

Note: In my humble opinion, this requirement is a small drawback for a development environment: a lot of debugging and analysis happens in these environments. Whenever you make a configuration error, introduce a bug or whatever, Traefik will not start and thus you will not be able to access your CM environment to see what error happened.

Note 2: please take note of the fact that there isn’t a direct dependency of CM on the SQL database. As CM requires xConnect to start healthy, which in turn depends on SQL, this dependency is implicit.

Containers that do not show up in docker

You might wonder why there are just 10 containers up and running in an XP0 environment, while the dependency diagram shows 12 different roles. The 2 roles which are “extra” are “solution” and “dotnetsdk”; especially the “solution” role is special.

This solution role is actually a builder role and gathers all available source code. Within this Docker container, the complete solution is built, using a specially crafted “dotnet sdk” container (which contains a nifty solution to cache the NuGet layer – more on this can be read here). A future blogpost will elaborate on building the solution.

Missing insights in this overview

The one piece of configuration that is really missing is the set of hostnames which Traefik listens to. This is not too strange: the software visualizes the docker-compose dependencies, volume mappings and port mappings. Docker is not configured to listen to a specific hostname, but the Traefik role is. As this is software-specific configuration, the visualization software is not able to show it.

Where can the configuration be found

To make things easy (or flexible) multiple files can be used to configure docker. In a default situation (which is the case as well for this getting started example), the configuration will be stored in docker-compose.yml and docker-compose.override.yml – these can be found in the root of your workspace.

To clarify what is being configured in which file, I created two specific diagrams: one for the standard docker-compose.yml file, and one for the docker-compose.override.yml.

Docker-compose.yml

[Diagram: the services defined in docker-compose.yml]

Docker-compose.override.yml

[Diagram: the services defined in docker-compose.override.yml]

Summary

Starting with Sitecore on Docker might be overwhelming, but having a clear view of all the roles and the corresponding configuration will probably help you to understand the docker composition.

How to visualize your docker composition

After my previous blogpost, I received several questions how I visualized that Docker architecture. This (very short) blogpost will explain how to do this.

The software that I use is a docker-container that is called “docker-compose-viz”, which is (mainly) maintained by Julien Bianchi.

In order to visualize your composition, do the following:

  • Navigate  to your directory, containing your docker-compose.yml
  • Switch to linux mode
  • Run the following command:

docker run --rm -it --name dcv -v ${PWD}:/input pmsipilot/docker-compose-viz render -m image docker-compose.yml --output-file=achmea.techday.png --force

It pulls the latest version of the docker-compose-viz image and runs it against your docker-compose.yml. With the --output-file parameter, any filename and image type can be set.

The following parameters are also of much interest:

--override=OVERRIDE: Tag of the override file to use
--no-volumes: Omit the volume mapping
--no-ports: omit the external exposed ports and their mappings

Happy visualizing!

Test and demo environments in an instant: How to pre-provision content to the master and web database in Sitecore containers in 5 simple steps

In our company, we use Unicorn for content serialization, in order to be able to deploy “applicative” content like templates across our environments. For dev and test, we also provide content that we use for regression testing in these environments; we don’t (want to) sync our production content to these environments. We also had the wish to spin up environments upon request, with all of this content available in an instant, for example to validate pull requests. With 20000 yml files, the synchronization process takes at least 45 minutes: this takes way too long for a fast regression test and doesn’t fit a fast “shift left” strategy. With the introduction of containers, things have changed, as fully pre-provisioned environments can be spun up in literally minutes.

Note 1: My current opinion is that this is not a feasible way to deploy content into production!
Note 2: I recently found out that this is the same approach as the demo team uses to provide their Lightroom demo

Summary

To increase the speed of spinning up a new environment with content, the Sitecore database image needs to be preprovisioned with all custom content that is available. These are the steps to achieve this result:

  1. Create a new dacpac of your serialization folders using Sitecore Courier
  2. Update your mssql dockerfile to deploy this database
  3. Push your image
  4. Pull the image and run your environment using docker-compose on any place

I omitted several steps in this blogpost, as this information can be found in a lot of places on the world wide web: pushing images is a one-liner and pulling images shouldn’t be too hard either 😉

Provisioning a new environment with content within 3 minutes – how does it work

I got the inspiration from how Sitecore is providing their modules to the community: they provide a very basic Sitecore environment, which only contains the XM/XP content and roles. As an “a la carte” menu, they offer different Sitecore modules which can be included in your Docker container. For example, when there is a requirement for the headless services (THSPKAJ – The Headless Service Previously Known As JSS), this Sitecore image can be used as an artifact: by copying the binaries/assets from these images into your CM and CD dockerfiles and applying the database asset to your SQL image, the headless service magically becomes available. Using this approach, starting a “vanilla” Sitecore environment with JSS is done within moments.

Provisioning environments the old way: a new environment with unicorn content serialization takes a long time

This “a la carte” way of working was already something that we incorporate in our solutions: with 150 Sitecore instances, a manual installation of modules/update packages is very time consuming, repetitive and error-prone. That’s the reason we created NuGet packages for every module – Sitecore PowerShell Extensions, JSS, Coveo and a lot of internal modules. The required content is provided as yml, using Unicorn. By just referencing the NuGet packages, we were able to incorporate these modules into our solutions. Our Sitecore environment blueprint and custom applications are built as web deploy packages, hence deploying one or two web deployment packages is all it takes to provision an environment. The drawback of this approach is that new, temporary environments take almost an hour to provision, as the Unicorn sync really takes a long time. Transparent sync could be used of course, but this isn’t always an option. Apart from that: using containers, our local testers don’t require a local setup of Visual Studio, SQL, Sitecore, IIS and whatnot; in other words, developers need less time to support these testers!

A detailed look into Sitecore’s approach with pre-provisioned content and the missing link

By taking a look at the Docker build files for the CM, CD and MSSQL roles, Sitecore’s approach can be seen.

In the first step, a simple copy action takes place from the Sitecore “asset” image to the wwwroot. This is just a set of binaries and assets, just like the composition of your custom solution.

It gets interesting in the second step: first, a database asset is copied, and the next command deploys the databases from the directory to which that asset was just copied (the directory gets removed afterwards).

It’s very likely that in this step, the JSS-specific database content was provisioned into some existing databases. But which databases, and how?


Uncovering the magic behind DeployDatabases.ps1

DeployDatabases.ps1 is a tool which resides in the mssql database image provided by Sitecore. I wasn’t able to find the sources behind this file; however, using VS Code or Visual Studio, it is possible to attach to a running container and see the complete filesystem behind it.


This uncovered the secrets behind DeployDatabases.ps1 (more secrets are revealed in a follow-up blogpost!). The most interesting part is the deployment loop:


The script iterates through all *.dacpac files in the specified Resources directory, the one that is specified as resourcesDirectory in the RUN command in your Dockerfile.

For each dacpac, the full name (sourceFile) and the base name are extracted. The interesting part is the sqlPackage call: the sourceFile is applied to the already existing database whose name matches the name of your dacpac file. What does this mean? Let’s say you have a dacpac located at “c:\jss_data\Sitecore.Master.dacpac”; this file will then be applied to the already existing “Sitecore.Master” database.
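To illustrate the mechanism, here is a simplified sketch of what such a deployment loop boils down to. This is my own reconstruction, not Sitecore’s script; the resources directory, server name and credentials are assumptions:

# Simplified reconstruction of the dacpac deployment loop (assumed paths and credentials)
$resourcesDirectory = "C:\resources"

Get-ChildItem -Path $resourcesDirectory -Filter "*.dacpac" | ForEach-Object {
    $sourceFile   = $_.FullName   # e.g. C:\resources\Sitecore.Master.dacpac
    $databaseName = $_.BaseName   # e.g. Sitecore.Master

    # Apply the dacpac to the already existing database with the matching name
    & SqlPackage.exe /Action:Publish `
        /SourceFile:"$sourceFile" `
        /TargetServerName:"(local)" `
        /TargetDatabaseName:"$databaseName" `
        /TargetUser:"sa" `
        /TargetPassword:"$env:SA_PASSWORD"
}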

Of course, this technique of using dacpacs is not new, as it has been used for ages by Sitecore in their Web Deployment Packages (for use in conjunction with, for example, Azure), but there is one difference: in the “Azure” or “on premise” scenario, the prerequisite is that there is already a running infrastructure: at least a webserver needs to be configured and SQL needs to be set up as well. If you would spin up a new, fresh environment, this would require setting up a fresh webserver and a fresh set of databases, which costs (a lot of) time. With containers, the actual data gets embedded into the docker image; the only thing that is required is that the image is pulled and you’re good to go.

Packaging serialized content into a dacpac file

This is where things became more complicated. My first approach was to generate an update package using Sitecore Courier and to use the Sitecore Azure Toolkit to generate a web deployment package from that update package. The dacpacs could then be extracted from the WDP. This approach would take quite some time, is complex (as in: it takes a lot of steps), and I had no idea if it would even work!

Purely by luck, I discovered that the Sitecore Courier PowerShell cmdlet includes a parameter to generate a dacpac directly!


A day later, I discovered that the Sitecore demo team uses this approach as well. They use the following script, which can be found here in their repository. The only modification I made was adding line 21, which copies Sitecore.Master.dacpac to Sitecore.Web.dacpac. As the deploy databases script tries to deploy any dacpac based on the naming convention Sitecore.<sitecoredatabasename>.dacpac, the web dacpac is automatically applied to the web database.

Param(
  [string]$target,
  [string]$output
)

Write-Host "Installing Module"
Install-Module -Name Sitecore.Courier -Repository PSGallery -Force -Confirm:$False  -RequiredVersion 1.4.3

Write-Host "Importing Module"
Import-Module Sitecore.Courier -Force -Verbose

Write-Host "Creating Update Packages"
New-CourierPackage -Target $target -Output "$output/data" -SerializationProvider "Rainbow" -IncludeFiles $false -EnsureRevision $true -DacPac $true
Write-Host "Created Update Package" -ForegroundColor Green

New-Item -ItemType Directory -Path "$output/security"
New-CourierSecurityPackage -items $target -output "$output/security/Sitecore.Core.dacpac"
Write-Host "Created Security Update Package" -ForegroundColor Green

Rename-Item  -Verbose -Path "$output/data/master.dacpac" -NewName "Sitecore.Master.dacpac"
Copy-Item -Verbose -Path "$output/data/Sitecore.Master.dacpac" -Destination "$output/data/Sitecore.Web.dacpac"

Rename-Item -Verbose -Path "$output/data/core.dacpac" -NewName "Sitecore.Core.dacpac"

Write-Host "Renaming dacpacs" -ForegroundColor Green

Building the SQL image

The large project that was used as a crash test dummy has the following structure:


The server folder contains all the Sitecore code and serialized content; the docker folder contains the Dockerfiles for every role. Hence, the “./server” folder was used as the target and “./docker/build/mssql/data/” as the output folder.
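Assuming the script above is saved as, for example, New-CourierDacpac.ps1 (the file name is hypothetical), the invocation for this project structure would look roughly like this:

# Generate the dacpacs from the serialized content and drop them in the mssql build folder
.\New-CourierDacpac.ps1 -target ".\server" -output ".\docker\build\mssql\data"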

Running the script resulted in the renamed data and security dacpacs in the output folder.


All that is left to do is to modify the Docker build file to deploy these databases (as discussed in one of the previous paragraphs). After the deployment of JSS, the custom content will be deployed to the database. This is done in two steps:

  1. The Security package will be deployed
  2. The “regular” content will be deployed.

Whereas the JSS module is copied from the headless image provided and pulled from Sitecore, the custom content is, in this case, copied from the data directory in the build folder.


It is possible to incorporate the generation of the dacpacs in your solution image (or in any other image). When planning to go this route, please note that this action might be very time-consuming; in our specific case it adds 5 minutes of build time. As this step is not required for your production workload, I’d recommend creating a separate builder image for these dacpacs, so it won’t “pollute” your solution image. I’ll go in depth into structuring your dockerfiles for CI speed in a later blogpost!

When pulling your generated dacpacs from another image, the copy action in the mssql dockerfile would copy the data from the solution image:


Building the new SQL image

All that is left to do is to build the new SQL image:

docker-compose build mssql

After several minutes, the new image is ready for use: by explicitly bringing all roles down and recreating them, this environment gets started from scratch within 3 minutes. Publishing content is not needed, as the web database was already pre-provisioned as well. The official publishing mechanism hasn’t been used, but for regression testing this is an acceptable approach!
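Bringing the roles down and recreating them is, in a standard docker-compose setup, a matter of two commands; a minimal sketch (service names and flags may differ in your solution):

# Bring all roles down and recreate them from the freshly built images
docker-compose down
docker-compose up -d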

To test this approach, the docker-compose.yml and docker-compose.override.yml were modified in such a way that all volumes were removed!


Conclusion

When you need to start an environment with pre-provisioned content within minutes, for example for demos, regression testing or automated testing, our old friend Sitecore Courier is of (very) much help. The required changes are not complex and there aren’t many of them, which made this approach very easy to test. Of course, Unicorn transparent sync could be used as well, but I haven’t tried that approach yet. I would NOT recommend using this approach in production, as you wouldn’t be able to edit and store content. Also, do not include the generation of dacpacs in your solution image, as it increases the build time for assets that are not required to run in production.

How to use Application Insights and your visitors to detect when your site is offline

When hosting high-traffic websites, it’s important to keep them up and running at all times. The moment one of them goes down, it might lead to conversion loss or a decrease in NPS. Detection of unplanned downtime is very important in these cases. In some cases there isn’t even downtime, but *something* in the infrastructure prevents the website from loading (I’ll explain a few cases after the break). This blogpost will teach you how to use your visitors as a continuous monitoring beacon. Code can be found here. Also a small shoutout to my colleague Marten Bonnema, who created an AI plugin which *does* work with service workers.

The case: multiple complaints that the site didn’t load, while our monitoring software “proved” different

The sole reason I started to investigate this way of monitoring was that our monitoring software didn’t see any outage, while our internal call center users, testers and external customers were randomly, but often, mentioning that the website wasn’t able to load. The hard part: we weren’t able to find any notable events in our logging and monitoring software, while there were numerous complaints. It was also very hard to reproduce. Something about a needle and a haystack.

By accident, the issue became much easier to reproduce. After injecting new (corporate) monitoring software, we found out that the issue started to occur almost all of the time. Whenever the issue occurred, it was only temporarily solved after explicitly clearing the cookies. When accessing the website directly on an internal URL, the issue was not reproducible. The Application Insights availability tests didn’t show any outage, and other monitoring software didn’t register any error either. It had to be a combination of client-side code (JavaScript) and a piece of the infrastructure. In the end, it turned out that we were exceeding a header limit (which was set on a reverse proxy). This blogpost will not go any deeper into whether or not this setting was a good idea, but it will explain why we weren’t able to monitor this and how we are going to change that.

How to monitor this random behaviour

As the request from the client did not arrive at our website (the reverse proxy had issues) and this issue caused connection resets/HTTP2 compression errors, it was *very* hard to monitor this behaviour and take the appropriate actions. We weren’t able to tell how many users were affected, as nothing was registered. So how is monitoring possible if even the initial connection fails? The solution is, in fact, quite simple and elegant!

The challenge

The challenge lies in how “the web” works. When loading a webpage, the very first thing that happens is requesting an HTML document. Within this document resides HTML, which tells the browser what additional dependencies (such as CSS and fonts) should be loaded and what JavaScript code should be executed, and when. At some point, there is a cue to load Application Insights, which automatically takes care of logging. IF the document load and the loading of the Application Insights SDK succeed, the magic starts.

Basic working of “the web” – only after an initial download of the HTML markup is it possible to log to Application Insights.

Unfortunately, in our case, this Application Insights execution comes too late in the lifecycle: as the initial document load failed, there is no cue to load the Application Insights SDK and start logging. Something was needed that could log to Application Insights, even when the website was down. Installing a logging client on a client’s machine wasn’t an option. Or was it?

Serviceworkers to the rescue!

A service worker is a script that runs in the background of the browser. It can, for example, handle push notifications, sync data, or handle (intercept) network requests. This worker must be installed (and updated) by the website in your browser before it can be executed, but after that installation, it can be used to do whatever you want.

Let’s assume that the service worker has been installed. Within the service worker, all network events (the fetch event) are intercepted and can be manipulated or enriched as needed.


A common scenario is to look up resources in a local cache and return them directly, even before an HTTP request is made. When this request bugs out (in other words: there is a network error, protocol error, network reset, you name it), an exception will be thrown. Just catch the exception and start logging. Note: although promises are very convenient, I didn’t make use of them in this example. As most readers of my blog are .NET (Sitecore) developers, this notation is a bit more accessible to read.


The first line of code makes sure that only document requests are handled. All other requests (resources, API calls) are handled via the regular Application Insights SDK. We only want to send an extra event at the moment the webpage is not accessible.

The fourth line re-fetches exactly the same request, with the difference that we can catch any exception. When no exception occurs, the response is returned to the browser. In the network tab, the actual document request can be seen being intercepted and handled by the service worker: the initiator is sw.js (my service worker), preceded by a small gear icon.

Please note: a response with an HTTP error code is still a *successful* request; something ‘just’ went wrong on the other side. Whenever the ‘other side’ does not respond, there is no network connectivity, or something else happens, the HTTP request is not successful and an exception will be thrown.


Whenever an error occurs, the actual document request and the intercepted fetch fail. A custom request to Application Insights will then be made to log a dependency error. The network tab of the developer tools shows us that the initial request fails (as expected) and that a successful request to Application Insights has been made.


These errors show up in the failures blade of Application Insights when selecting the browser view.


Caveats and drawbacks

First time visitors

Of course, this solution has a few flaws/drawbacks. When a visitor visits the site for the very first time, the service worker hasn’t been installed yet: IF that visitor faces a problem, there is no way that this issue can be logged.

Explicit call to the AI REST endpoint: the SDK cannot be used

There are two main reasons for this:

  1. Service workers have a problem with importing modules, so the AI SDK module cannot be used.
  2. The Application Insights script has references to the window object. As this is unavailable in the service worker, this code fails.

This is just a caveat, not a major problem, as there is just a single event that needs to be logged.

The logged event is not really a dependency

This event is not really a remote dependency, but an exception. For convenience reasons, we made the choice to mark it as a remote dependency. This way, it shows up in the browser view, which makes it very convenient to scope down issues whenever required.

Conclusion

Adding “offline” logging to a website is not hard. By making use of a service worker, intercepting events and logging these events to Application Insights whenever a page cannot be requested, an interesting extra layer of monitoring/logging can be added to your toolkit.

Once bitten, twice shy: Why my docker-compose build keeps having problems with my Azure DevOps Artifacts feed

In a previous blogpost I explained how to set up a Docker build which can connect to an authorized Azure DevOps Artifacts feed. I often use this feed, as it contains packages which we don’t want to share in public. However, almost every single time I start fiddling around with my private feed, things break. Badly:

Response status code does not indicate success: 401 (Unauthorized).

Although I was pretty sure that the FEED_ACCESSTOKEN, which is required for correct authentication, was correctly set in my environment file, the Docker build still fell back to an old value.


Emptying the cache, deleting images: nothing helped. It appeared that I had set the environment variable for this same FEED_ACCESSTOKEN at system level as well. Apparently, the global environment variable takes precedence over the locally set variable.


Two solutions are possible here:

  • run $env:FEED_ACCESSTOKEN = "" before you run your actual build (see the sketch below)
  • simply delete the FEED_ACCESSTOKEN from your system environment variables.
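A small PowerShell sketch of how to check which value wins and how to apply both fixes; the variable name matches the one above, and removing the machine-level variable requires an elevated prompt:

# Inspect the different scopes of the variable
[Environment]::GetEnvironmentVariable("FEED_ACCESSTOKEN", "Machine")   # system-level value
[Environment]::GetEnvironmentVariable("FEED_ACCESSTOKEN", "User")      # user-level value
$env:FEED_ACCESSTOKEN                                                  # value in the current session

# Option 1: clear the variable for the current session, right before the build
$env:FEED_ACCESSTOKEN = ""

# Option 2: remove the system-level variable permanently (elevated prompt required)
[Environment]::SetEnvironmentVariable("FEED_ACCESSTOKEN", $null, "Machine")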

Thanks for reading another episode of “Once bitten, twice shy.”

How to use the new dotnet NuGet security vulnerability scanning for packages.config and .NET Full Framework in 3 simple steps

A few days ago, Microsoft explained on their devblog how to scan NuGet packages for security vulnerabilities. This feature was recently released, but has been on the GitHub issue list for quite some time. Microsoft uses the GitHub Advisory Database to identify vulnerabilities in NuGet packages; click here for more information. Microsoft added the vulnerability check to their dotnet tooling. Just run dotnet list package --vulnerable (make sure to update Visual Studio or .NET 5.0!) and a nice overview of vulnerable packages is shown. However, this only works with the PackageReference format. In our situation, we are still using the old packages.config format in hundreds of projects, as we cannot migrate to the PackageReference format yet. This old format can’t benefit from this lovely gem; that’s why I decided to create a little script to get an overview of (possible) vulnerabilities in our code bases. The script can be found here.

The solution is quite simple:

  • Run the PowerShell script as provided (a minimal sketch of this approach follows below):
    • find all packages.config files
    • get all packages with their version and framework information
    • create a new (temporary) project and insert all the packages using the PackageReference format
    • note: I used this script as inspiration
  • run ‘dotnet restore’
  • run ‘dotnet list package --vulnerable’
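The sketch below shows the idea, not the actual script linked above: it collects every package from all packages.config files and writes a throwaway SDK-style project that references them, so the dotnet tooling can scan them. The project file name and target framework are assumptions; the real script also takes the framework per package into account.

# Collect every unique package (id + version) from all packages.config files
$packages = Get-ChildItem -Path . -Recurse -Filter "packages.config" |
    ForEach-Object { ([xml](Get-Content $_.FullName)).packages.package } |
    Select-Object id, version -Unique

# Turn them into PackageReference lines
$references = ($packages | ForEach-Object {
        "    <PackageReference Include=""$($_.id)"" Version=""$($_.version)"" />"
    }) -join "`r`n"

# Write a temporary SDK-style project that contains all references
$project = @"
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net48</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
$references
  </ItemGroup>
</Project>
"@
Set-Content -Path ".\VulnerabilityScan.csproj" -Value $project

# Restore and scan, including transitive packages
dotnet restore .\VulnerabilityScan.csproj
dotnet list .\VulnerabilityScan.csproj package --vulnerable --include-transitive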

dotnet restore

The ‘dotnet restore’ command could give some errors and warnings (in our case, it did), as every unique NuGet package (determined by id, version and framework) is collected and inserted as a package reference. Such errors are not uncommon, but they are not relevant for the objective that needs to be achieved.


dotnet list <projectname> package --vulnerable

This command will query the GitHub Advisory Database and report any direct reference that has an issue. When using ‘dotnet list <projectname> package --vulnerable --include-transitive’, even the indirectly used packages will be displayed.


Summary

Getting information about security vulnerabilities has become very convenient with the new dotnet list package --vulnerable option. However, this doesn’t work with the classic packages.config format. A solution is to generate a small temporary project file which includes all the packages as a PackageReference.

I’ll present at the SUGNL: 3 lessons I’ve learned with Sitecore on Docker (and does Sitecore require you to migrate?)

On April 8th, the first virtual Sitecore User Group for the Netherlands will be organized. I am proud that Achmea, my employer, gets to host this first virtual meetup.

In my presentation, I’ll talk about 3 lessons (or maybe more) that I’ve learned when migrating from Sitecore 9 on Azure PaaS to Sitecore 10 on containers. Each phase (my first experience, optimizing the Docker strategy and making the most out of the platform) will be part of this presentation. I’ll give you the answer to the popular question: “does Sitecore require me to migrate to containers?”

This will NOT be a “getting started” session, but it isn’t an “advanced concepts” session either. Just a fun session from a guy who had fun migrating from one technology to another.

