Disclaimer
This article reflects my personal interpretations. In many places I built mental bridges for myself, and these are often not backed by any official Microsoft statement. Where one exists, I will reference it.
Summary
Over the years (15+ now) my team and I have developed a deeper understanding of what Azure is, how one should use it and how it differs from its competitors. Part of this journey was, and still is, experiencing the many issues the platform has, understanding the reasons behind them and learning how to tackle them. You won’t find any real solutions in this post. It is meant to share my thoughts so that readers can relate or disagree, but at least get a picture of someone’s view. Hopefully it helps you understand how things could be approached and gives you more confidence when you run into issues - which you certainly will.
The origins matter
Being a little bit late to the game (as usual), Microsoft had to come up with a cloud that serves purposes AWS did not. Analyzing this should already clear up some things. First of all, I was surprised by some of the people at Microsoft who were involved.
The head of this branch was (I think from the beginning) a guy named Scott Guthrie. He is part of a group of Scotts at Microsoft which old guys like me know as “The Lesser Scotts” (Scott Guthrie, Scott Hunter and Scott Hanselman). All of them have a background in development, not in datacenters or anything like that. Scott Guthrie was well known in the MS dev community as the ASP.NET guy. I think this single decision is visible in a lot of the things that make Azure different. But more on that later.
The second person who surprised me was Mark Russinovich, who was named CTO of Azure. He was most famous (or maybe notorious, if you ask former CEO Steve Ballmer) for creating Sysinternals, a suite of tools for Windows admins which were (and still are) actually far better than whatever Microsoft provided. Sysinternals is now maintained by Microsoft and Mark is an employee now.
But think about it: You start a multi-billion operation, plan to buy properties, lay undersea cables, design data centers and your own hardware - all of this. And who is responsible? The head of a dev section. In my opinion this is visible in every corner of Azure to this day. For instance, a lot of admins have told me that they are confused by the way Azure names and handles things. Devs, on the other hand, usually don’t have this issue. I think this is on purpose, so that devs can relate to the stuff more easily.
Another interesting point about the origins is the simple fact that Amazon is mainly a retailer whereas Microsoft is a software vendor. Having its own products in line and being deeply integrated in nearly every company in the world naturally leads to developing a cloud that tries to conveniently connect those two areas - on-prem and cloud, that is. That is what Microsoft did from the beginning. So if you compare products between hyperscalers, Microsoft very often looks bad. Most complaints are about high costs in comparison. I think this is because Azure mainly targets B2B, and here total cost of ownership (TCO) is very important. If the cheapest service in some area starts at 70 bucks a month but frees me of a person I would otherwise need to pay, I will take it as a company. If my licensing model is covered, I’ll take the SQL offer from Azure instead of the cheaper one from Amazon. But this is not seen by people asking questions on Reddit, of course.
Bad decisions
So far so good, and the world could be very simple. Go AWS if you are a startup, a company that has low demands on-prem anyway or if you can simply afford to make decisions based on likes or dislikes. If you are a company with existing regulatory demands, MS licenses or skills, probably go Azure, and that’s it.
I think Microsoft made a lot of promises they did not keep and started to diverge from their community, if you like. There was a time when they announced internal quality initiatives, for instance. Reading about this seems funny today. Instead of being developed seriously, some products on the Azure platform are poorly designed and maintained. A few have not seen serious updates for long periods. The focus almost always shifts to wherever the revenue potential for Microsoft is.
Examples from the past years, many of them hypes, include:
- Big data tools which are now completely overhauled
- The failure to provide managed blockchain products
- Starting Bicep and not pushing it seriously
- Trying to convince people that Bastion is the way to interact with your VMs
- Connecting Windows with Azure no matter what
- Bringing in MariaDB and MySQL and leaving them in a poor state
- Not fixing serious issues with the backplane, like stupid naming conventions and network issues
- Setting marketing-driven defaults in the portal, like GRS for storage
- The focus on low-code solutions like Function Apps
This list could go on for some time. The current hype being paraded around is obviously AI. We experience network issues, backplane overloads and highly unstable APIs all over the place. It is not proven, but it is plausible to assume scaling issues in the data centers as a cause.
I know that a company like Microsoft has to follow the markets to some degree, but outsourcing a lot of the development to open source communities and being that ignorant about issues is not a good strategy.
A good example of the misguided Azure development, if you ask me, is the amount of time Microsoft has given those issues at the Build conference and similar events in the past years. One of the worst ideas they had, IMHO, is the “.NET Aspire” program. I’ll come back to that a little later, but I guess that if, let’s say, an Aspire guy met someone responsible for governance in customer clouds at the cafeteria, the two would probably get into a fistfight. The contradiction between those two areas alone is so huge that I watch this unfold shaking my head in disbelief.
Anyway: this is not about all the problems in Azure. What I want to say here is that Azure started with good intentions and has since drifted away from them, leaving a lot of customers confused.
For developers
Azure is not an infrastructure cloud and should not be handled as one, if you ask me. The process called “lift and shift”, where you move networks and VMs and simply migrate from on-prem to Azure, does not really help at all. It is possible and works very well in certain situations, but this is something you could do with AWS, GCP and even a lot of hosters.
What you really should do is ask: “What is the VM doing?” Instead of carrying over the all too often bad architectures and running them in Azure, you should finally get rid of them and rebuild natively in Azure. Azure can be so nice, cost-effective and still usable if you follow things like the Cloud Adoption Framework (CAF). But this demands planning, knowledge and, first and foremost, a certain culture in a company.
The only way to get real benefits out of Azure is to enable developers to quickly ship features to a broad internal or external audience. Doing this in a secure and controlled way without stopping the devs from being fast and productive is the challenge ops should take on.
If you don’t use Azure as your VM outsourcer, it is basically empty unless you hire devs to put workloads in it. It turns out that most devs hate Azure with a passion because it demands a lot of tedious steps from them (Entra ID and other stuff). This is not because Azure is bad but because nothing is prepared in the way a developer would expect it to be. This is where AWS and GCP, which are heavily promoted at universities, can shine.
Now customers tell me that they would love to do something in Azure based on their already trusted relationship with Microsoft, but simply can’t because they cannot find people who know anything about it.
Inversion of control
There is this pattern in programming called inversion of control (IoC), where certain things, like instantiating objects, are no longer done by your own code but handed over to a framework. I think this IoC should be applied to Azure environments as well. What a lot of customers do (because they even get advised in that direction by MS employees) is basically lock down their tenant and whitelist allowed resources, operations and so on. This does not give you, as a developer, the potential of Azure. You will still get silly processes to follow, and in the end your time-to-market might be even longer. It also rules out trial-and-error approaches, and this limits the experience you gain in Azure.
Operators now claim that they must keep control over what is happening in Azure. Hearing this, I can instantly tell that it originates from a lack of knowledge. Azure has a lot of control mechanisms in place which have no counterpart on premises. One good example is policies which, if applied correctly, can give a dev a lot of freedom without putting governance and compliance at risk. In fact, this would free admins from a lot of the tasks they have in Azure today. They are just not aware of this or do not get the time to apply policies correctly.
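To make the policy idea a bit more tangible, here is a minimal sketch in Python that builds a custom Azure Policy definition as JSON. It is only an illustration of a guardrail, not something taken from a real tenant: the function name, the display name and the parameter name `allowedLocations` are my assumptions. The rule denies any resource whose location is not on the list and leaves everything else to the dev.

```python
import json


def allowed_locations_policy(display_name: str) -> dict:
    """Build a custom policy definition that denies resources outside allowed regions.

    The function and the parameter name 'allowedLocations' are hypothetical;
    the if/then structure follows the Azure Policy definition format.
    """
    return {
        "properties": {
            "displayName": display_name,
            "policyType": "Custom",
            "mode": "Indexed",  # only evaluate resource types that support location and tags
            "parameters": {
                "allowedLocations": {
                    "type": "Array",
                    "metadata": {
                        "displayName": "Allowed locations",
                        "strongType": "location",
                    },
                }
            },
            "policyRule": {
                # Deny everything that is NOT in the list of allowed regions.
                "if": {
                    "not": {
                        "field": "location",
                        "in": "[parameters('allowedLocations')]",
                    }
                },
                "then": {"effect": "deny"},
            },
        }
    }


def to_json(definition: dict) -> str:
    """Serialize the definition so it can be stored next to the IaC templates."""
    return json.dumps(definition, indent=2)
```

Calling `to_json(allowed_locations_policy("Only allowed regions"))` gives you a document you could keep in source control and assign at subscription or management group level. The point of the sketch is the direction of control: ops define the boundary once, and inside it devs can create what they need without a ticket.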
Microsoft, on the other hand, makes contradictory claims and often makes it hard to reach this point.
Efforts like Aspire, in my opinion, come directly from devs telling Microsoft that deploying and monitoring architectures in Azure is so hard to do correctly that their admins can’t do it and will also not give them the freedom to do it themselves. “So let’s put everything in containers and make Azure Container Apps aware of this specific trick, and here we go!”, says Microsoft.
I’m here to tell them “No”, because it leads to quick-and-dirty solutions which violate CAF principles like naming conventions and will probably fail instantly if a decent number of policies is applied to the subscription. I literally heard the sentence “Just turn off the policies then.” from MS support when Aspire did not work on a tenant I managed.
Another issue is that CAF tells people to automate everything and use infrastructure-as-code (Terraform, Bicep). Then people look at demos during conferences or even at the MS docs, which often tell them to simply go here and there in the portal and happily click buttons. This creates wrong expectations and makes people underestimate the amount of work needed to get reliable results.
“Shadow Cloud”
When cloud started, everybody was kind of hyped about the chance to get rid of a huge problem called “Shadow IT”. Tools like Excel, Access, SharePoint and so on had created so much freedom for people that all of a sudden a whole enterprise would rely on this single Excel sheet to operate. I’m overstating a little bit, but not that much. People started to use cloud-born tools like Zoom, Dropbox and so on against the will of IT, and now we had a darknet in every company.
Putting effort into cloud in theory gives you a controlled environment which might prevent this from happening in the first place. But here comes the punch line. Ready? It didn’t! The reasons are lack of knowledge, bringing over old processes and thinking, and also laziness. Some studies now estimate that about 25% of workloads in the clouds are running in an ungoverned state. This does not surprise me at all.
The product “Azure Function App” alone is something I can’t appreciate. It can run arbitrary code inside something that is maintained by operators who are not able to check what it is doing. So what do you expect to happen? Some now say that you could use pipelines to check those functions. But for a lot of devs the very reason to use them was not having to bother with those things and having something that can be changed directly in the portal. You wouldn’t believe how many times I saw somebody copy-pasting important code from VS Code into the function app in the portal and happily hitting the save button.
The idea was that DevOps (another myth that never really made it to “normal” companies) would prevent us from getting into this situation. It tells people to come together (devs and ops, that is), but nobody told decision makers how to do this and that it is important. At first it costs money, and only then comes the chance that it may pay off. This is not as sexy as it sounds in neat PowerPoint slides.
But that’s my point: I think that you cannot cherry-pick stuff and expect the total result to run smoothly. Either you commit to the idea of a cloud-first approach, and then you also need to invest in your company culture, or you don’t, and then you are going to inherit the same problems all over again.
This is not because Azure sucks or AWS is strange in some areas or whatever. It is because you are not investing seriously in it.
Solutions
But it is what it is, right? People run stuff on clouds and get attacked or face other kinds of problems. So what to do?
My company DEVDEER was facing this situation while we were developing and deploying solutions in Azure. After we went into production, we were literally searching for people to manage those environments. We couldn’t find them, and 4 years ago we decided to tackle this problem ourselves.
So after we thought we understood Enterprise Hub & Spoke, compliance, governance and other buzzwords around cloud architectures, we started to build solutions for this. This is something which is opinionated and, surprisingly, not covered so well by the docs. It required a lot of testing, failing, support tickets with an often clueless first-level support and many more fun parts.
We now have a lot of tools and patterns which allow us to do the job IT should be doing together with us (being the devs). We have developed CAF-compliant Bicep templates, a PowerShell module which allows us to automate compliance checks and preparations, and default NuGet packages which bake CAF into the source code as well.
We also provide our own managed service now, giving our customers assessments of the state of their projects. We tell them about outdated packages and underlying frameworks, and we watch the behavior of systems instead of looking at availability and costs only (which we also do). In short: We took the job and now generate revenue with it.
This is not to advertise my stuff! It is mentioned here to show that even a small company like us can handle it if the will to do it exists.
It did take us a substantial amount of investment in terms of money and time. So the one thing I can now say with confidence is: Azure works if you are willing to invest in it.
Yes, Microsoft could stop certain initiatives and put more effort into fixing annoying instabilities and mystical limits (why am I not able to put dashes in storage account names, and why am I limited to two dozen characters, for instance?). But time will tell, and you can deal with those things. Just lay a solid foundation and be ready to adapt, because changes are coming quicker in the modern world.
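Dealing with such a limit can be as simple as baking it into a helper once. The following Python sketch is just that, a sketch: it assumes the documented storage account rule (3 to 24 characters, lowercase letters and digits only), and the function name, the name parts and the `st` prefix are my own illustrative convention, not an official API.

```python
import re

# Assumed rule for storage account names: 3-24 characters,
# lowercase letters and digits only (which is why dashes are rejected).
_STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if the name satisfies the length and character rule."""
    return _STORAGE_NAME.fullmatch(name) is not None


def storage_account_name(workload: str, env: str, region: str) -> str:
    """Derive a storage account name from hypothetical naming parts.

    Everything that is not a lowercase letter or digit is dropped and the
    result is cut to 24 characters, so very long inputs can collide.
    """
    raw = f"st{workload}{env}{region}".lower()
    name = re.sub(r"[^a-z0-9]", "", raw)[:24]
    if not is_valid_storage_account_name(name):
        raise ValueError(f"cannot derive a valid storage account name from {raw!r}")
    return name
```

With this, `storage_account_name("web-shop", "prod", "weu")` yields `stwebshopprodweu`, while a hand-written `my-storage` is rejected before any deployment runs. Putting a check like this into templates or pipelines means nobody has to remember the limit.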
So cloud with Azure demands exactly the same as - let’s say - buildings. You need planning, you need to maintain them and you need specialized people. It is about the amount of investment one is willing to put into those things. In the end the CEO can touch a building, see whether it is doing ok and maybe brag a little about it. Those are things he or she cannot do so easily with Azure (or any IT system), and this is what needs to be addressed. The decisions are very often not made on a rational basis.
Conclusion
Azure is complicated in detail because it has to be. Microsoft has to give us control and power over those things, and with that comes a lot of work and responsibility. Azure is a cloud made for developers. Enabling those people is crucial to really gain benefits from it. The one thing it requires is the readiness for a shift in mindset. It starts with the amount of time I give my technicians and ends with the open-mindedness of admins and devs. So: Business as usual.