i 20x my azure bill (21k€) – emotional intelligence, crisis management, war mode and communication
(reading time 25 minutes, writing time 6 hours)
recently, on a tuesday night, i was finishing up my work. we’re building this internal tool, easytimetracking, to automate the timetracking entries of most of our consultants.
it’s still in the early stages, the mvp phase. so we’re testing assumptions and that night i was deploying a few openai o3-mini models across availability zones on azure to make sure that we have enough capacity for the upcoming pilot phase.
so i’m asking cursor to apply a few updates here. make sure that the deployments are working!
routine stuff.
then came wednesday morning. at 10 or so i received an email in my inbox. from IT. cost alert. your azure subscriptions have exceeded 10k€ in costs.
hm weird… we usually spend just 1k per month
let me open the dashboards, check it out. alright, yeah, the costs are here

let’s figure out what’s going on here…
i text IT that i’ll call them during lunch, after my meetings are over.
i went on the call and together we’re browsing the azure cost dashboards.
the worst shit btw! you can’t imagine how much i hate azure and its boundless complexity! but i work for a microsoft partner…
so i’m checking out all the main metrics. token usage. number of requests. deployments across regions.
switching through all sorts of dashboards. from main azure to azure ml studio to azure openai studio (what a mess…)
trying to find something… something that seems out of the ordinary.
but we couldn’t find anything. so we open a ticket: billing issue.
we believe there’s a bug in charging here.
we submit the ticket and i got back to work!
btw, i’m nicolai. i’m the head of engineering for apex innovation. and i’m sure you can feel it, but there’s something bad about to happen! i’m sharing my experience, how i dealt with it, what went wrong and what to improve with you, so you can learn from my mistakes! if you’re curious for more, subscribe!
anyways, i’m back to work and just waiting for the ticket to be handled.
but before i sign off work, i get another text: nicolai, the costs have surged to 20k!
WTF!! this seems a lot more serious than i thought
it’s just going straight up. linear growth, no end in sight. i need to take this a lot more seriously…

i need to switch into war mode!
war mode
now in moments like this, it’s very easy to panic, lose focus and lose control.
the numbers are frightening. the continuous growth. it’s just a lot to deal with!
but, it’s also the moment when it’s crucial how you’re reacting, what character you show and how seriously you take the situation!
you need to shift into war mode.
war mode isn’t about fighting, about being aggressive or about being intense.
it’s counter-intuitive. war is chaotic. everything outside of you might feel upside down. you’re not in control of the external situation.
so you need to be even more diligent about keeping yourself in control!
so my reaction wasn’t panic, it was a surge of intense focus and calm.
how to reach this state of calm and focus? this is my manual for managing emotional crises. it doesn’t just apply to business situations, but also personal situations, where calm and straightforwardness are the real strength!
First Thing: acknowledge your emotions. don’t suppress them. it’s like keeping a ball under water. feel your emotions!
then you need to step up, get a bird’s eye perspective, stop your emotions from being at the steering wheel and shift into the “observer role.”
this is my internal dialog in these situations: “Okay, part of me is feeling anxious right now. part of me is stressed and worried about this number. i’m worried about judgement and doing something wrong”
“But I’m observing that feeling. I’m distinct from it. It’s part of me, but it’s not all of me.”
Attach your anxiety only to part of you. it doesn’t define you. distance yourself from the raw emotion.
if you find your emotions to be too overwhelming and too difficult to control, use your body as a tool. do breathing exercises. simple, deep breaths. you control your destiny if you can control your breath.
Next, try to feel every part of your body consciously. scan downwards. i can feel my head, i can feel my neck… oh it’s tense, loosen the shoulders. i can feel my chest, my arms, my hands, my stomach, my legs, my feet!
this will firmly put you in the present moment. it breaks the panic loop and takes you back to consciousness. when you feel yourself slipping back into anxiety, repeat the exercises.
Many traditions, from Kriya Yoga to Buddhist mindfulness and Christian contemplative practices, use breath and body awareness to cultivate this observation of self and stillness.
Now, with emotional freedom and grounding... analyze clearly.
Yes, I have emotions. acknowledge them. but what you need to do now, is to figure out, what is actually going on here!
if you don’t analyze clearly, you’re flying blind. the emotional separation you get from compartmentalizing gives you the mental clarity to analyze effectively.
the main benefit of crisis: it eliminates any doubt or distractions. only this matters now! you get immense clarity. this is actually a good life lesson! always strive to be in a position where you’re 100% clear on what you need to do. no more doubts!
in this specific moment (seeing the 20k) my task became utterly clear: i have to deal with this problem. right now!
root cause
when you start to analyze, it’s good to brainstorm all the different reasons that may have caused this issue.
one of the first things to check that came to my mind was the used token counts of the llm deployments.
i’ve read far too often about some unintended loop that kept requesting more and more tokens, all of which had to be paid for!
i’m not super deep into ai deployments anymore… i mostly manage engineers and do some coding in between. otherwise the mistake probably wouldn’t have happened in the first place. so this seemed like the obvious cost vector.
but i was wrong. the low token count on the dashboard showed that the high costs were not caused by a lot of requests.
in the meantime IT was digging deeper while the costs kept climbing… they went through the costs page of azure and filtered to find which costs could be the main cause here.
after hours they figured it out: it was a special provisioned deployment type, measured in provisioned throughput units (PTUs), that i had somehow configured using pulumi.
that type of deployment was fucking expensive! 1000€/hour.
insane when you think about it. only customers who have extreme traffic and need reliable responses should even think about configuring this type of deployment.
my mistake earlier was looking at the wrong metrics. tokens vs provisioned time. and the azure dashboards were not super well set-up where i could drill down in the costs view and specifically see what the cause was. it was too high level.
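here’s a minimal sketch of the drill-down i wish i’d had: group the cost rows by deployment and meter, then sort by total. the rows, deployment names and meter names below are made up for illustration (this is not a real azure export format or api), but the idea is the same: a provisioned deployment can cost a fortune while its token count stays at zero.

```python
from collections import defaultdict

# hypothetical cost export rows: (deployment, meter, cost in EUR).
# names and numbers are invented for illustration only.
rows = [
    ("o3-mini-standard", "tokens", 12.40),
    ("o3-mini-standard", "tokens", 9.10),
    ("o3-mini-provisioned", "provisioned hours", 1000.00),
    ("o3-mini-provisioned", "provisioned hours", 1000.00),
    ("o3-mini-provisioned", "tokens", 0.00),
]


def cost_by_meter(rows):
    """Sum cost per (deployment, meter) pair, most expensive first."""
    totals = defaultdict(float)
    for deployment, meter, cost in rows:
        totals[(deployment, meter)] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranked = cost_by_meter(rows)
for (deployment, meter), total in ranked:
    print(f"{deployment:22} {meter:18} {total:10.2f} EUR")
```

the top row of the output is the thing to look at first. in this made-up data it’s the provisioned hours, not the tokens.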
this root cause, the “provisioned” type, was identified by a colleague at IT. another colleague of mine had the courage to just delete the deployments to stop the costs!
the lesson in this is to have a team of experts in your corner so you have full power to respond to everything coming at you!
you don't have to be the sole expert on every technical detail, especially as a manager. you just need to have a circle of experts around you to get to the solution. don’t wait too long before activating them.
now, finally it was time to:
stop the bleeding
once we had the cause, the action became clear! delete the 1,000€/hour service!
so we did! we immediately deleted all deployments!
earlier, when the cause wasn’t clear to me yet, i was afraid to delete any of the deployments, worrying that our customers and users would be impacted.
i don’t think i made the right call. our cosma bot is pre-revenue.
lesson: in any scenario where you’re losing a bunch of money, it’s better to cut the costs first to make sure you’re on safe ground!
but after removing the deployments, the graph looked like peace again! :D

the bleeding had stopped, the fire was choked. but the real problems were only just starting!
brothers (male/female) in crime
now, without help, this wouldn’t have been possible.
€21k of company money was wasted, due to my mistake.
before communicating this to my superiors and budget owners, i went to get help from my brothers in crime.
how to even approach this kind of conversation?? what's the right strategy?
you should never think you’re the smartest person here. that you know everything. anyone can benefit from advice and feedback of others!
the easiest is to get the help of someone you know, outside of this full mess!
someone who understands high-pressure situations, tech, communication, someone who’s maybe seen it before
i talked to 2 different people about this crisis (you know who you are and i appreciate you!)
they were there to listen, and once i’d finished letting out my stress, they gave me good advice on how to approach this situation.
they helped me strategize, they helped me communicate and they kept me sane!
unfortunately, this is only something you can do if you’ve built these connections beforehand. it’s like insurance: you can only sign up for it before the crisis happens. luckily you’re probably reading this when nothing major is going on, so call up people you know, build connections and build trust!
don’t isolate yourself in a crisis. seek help from your brotherhood and get support wherever you can find it.
it would be stupid to think you need to handle this on your own…
communication
now, actually the hardest part. talking to the people in charge!
how you communicate determines whether you will build trust or destroy it. finding the right balance between mitigating/containing the issue and communicating it is very important.
your timing depends on the situation, the magnitude, your stakeholders and your experience.
depending on how confident you are about fixing the problem after the fact, not just containing and reducing the exposure, you should reach out sooner or later.
i was pretty confident, i can handle this with the help of my team, and getting clear understanding and clear data on what was going on was more important than immediately involving everyone else.
so i called my superior the day after it happened and the problem was extinguished. of course it was still a lot of money and i needed to own up to it!
regardless of timing it all comes down to trust though. how do you build or maintain trust in this situation?
my manual:
Step 1: initiate the contact with urgency and gravity
i texted my manager: I have something very important and rather urgent to discuss with you!
this signals: i need your attention here, it will show after the fact, that i took the issue seriously and it shows that i’m conscious of the magnitude here
Step 2: anchor high and go 100% vulnerable
when you are in the conversation, overindex on being conscious, vulnerable, taking ownership and understanding the depth of the issue. it’s about conveying genuine regret, remorse and taking things super super seriously.
you can’t take it too seriously. always do more than you think is enough!
make yourself the biggest idiot ever to have lived on this earth. it sounds dramatic but it signals sincerity.
my choice of words: hi fabian, i made a really really terrible mistake!
i’m almost ashamed of myself for letting this happen
i feel really really guilty having to tell you this
then, just say it upfront and directly: i accidentally spent 21k€ on azure
this vulnerability earns you candour trust. it shows you aren’t being casual or cocky about this. you deeply acknowledge the magnitude of the mistake. candour trust is about not holding back on criticizing yourself.
taking 120% ownership of this mistake.
avoid any hint of pointing fingers: oh this happened because of X person, or that bug, or the confusing ui.
NO!
make sure there’s absolutely no doubt in their mind that this mistake wouldn’t have happened if it weren’t for your actions.
people get suspicious of those who try to shift blame. if it’s undeniably my mistake, i take that proactively, there’s a lot less room for others to criticise or doubt my accountability.
next step is to switch to
Step 3: tactical trust
once you’ve established that you take full ownership of this issue, you need to show that you can also handle it. tactical trust is about fixing your own mess!
immediately show everything you’ve already done.
i’ve: stopped the bleeding. things are deleted, the meter is frozen, the costs are contained
mitigation initiated: i already filed a ticket with azure for a refund.
prevention started: i have already deployed a policy that will prevent me or anyone from making the same mistake again and i will make sure it’s deployed across the organization.
if you can show that you will be able to fix this, and you showed the range of things you’ve already done, you’ve won half the battle for trust!
mistakes happen (NEVER EVER EVER say this out loud in this moment. let the other person say or think it for you)
but mistakes also need to be managed. here, you show you’re someone they can go to war with! someone who can manage around mistakes effectively
Step 4: ask for advice (carefully timed)
after presenting what you’ve done, offer the chance for their input: what else do you think i should do here. did i overlook anything?
a lot of superiors like to be in control and provide direction, proactively showing your completed steps and then opening the door for their guidance will make them feel at ease, that this situation is not gonna get out of hand!
timing is crucial here.
if you ask for their advice too early, it can seem like you just want them to fix it for you.
it depends on your seniority, the magnitude of the issue, and the stakeholders involved. you’re more junior: communicate earlier. within minutes of discovery, but not without a plan. show your initial list of ideas, of what to do and get them involved quickly!
a good question to start the conversation here is: help me with tactics here, on how to fix this!
for my 21k mistake, i communicated later. about a day after discovery, after i already applied initial fixes, because i was confident i could handle finishing it up and would be able to present a clearer picture.
rule of thumb for juniors or very complex issues: after only 30 minutes of “fixing mode” have a list of things you need to try. when you start reaching out, make sure the easy and small stuff is already done and tried.
if your list is short (nonetheless, you should always have something you came up with) present the list and use candour again: i’ve come up with this, but honestly, i feel like it won’t help address the issue here. please help me with tactics!
judgement
now, while you’re deep in war mode, you might notice judgement coming from either inside or outside of you.
in this context, judgement is unproductive. it serves no purpose in solving the actual problem. so you need to reject it.
either you’re judging yourself: how could i let this happen, how could i be so stupid, i’m the worst…
or one of your superiors is shaming you…
in that case, it might be time for you to find a new job. again, mistakes happen, but if you made an honest mistake, there’s no reason they should judge you.
if you notice self-judgement, here’s my manual for letting it go for the moment, so you can focus on fixing the issue at hand:
acknowledge the feeling. it’s likely, a part of your personality that formed when you were younger. the part was born to protect you from the judgment of your parents or maybe your teachers. but since nobody is saying anything to you right now, it’s over-acting.
open a conversation, saying: i appreciate you looking out for me!
that may sound weird to you. but if you say these words out loud, while feeling into your body, you will start a conversation with your part that you appreciate their help, and that you need to focus on the issue now, but will get back to them later on.
it’s unlikely though, that you will immediately get through and get their trust. i’ve been practicing for more than two years at this point. so you need to move to step two:
mentally separate.
this is a part of me that wants to help me, but i need to focus on fixing the problem now. if you create a separation inside of you, if you disconnect yourself, make yourself distinct from the feeling, you will be able to let go of your racing thoughts and emotions.
this frees up your focus. it’s what you need in this moment!
how your manager responds in this situation is a good measure of their crisis management skills.
judging you for the mistake is not fixing the problem for the company. it’s a failure of crisis management on their part. they’re only focused on the blame, not finding solutions. 1
it’s almost fractal: they “made a mistake” by having a system or process that allowed a 21k blunder to happen and their job as manager is to help you become a better crisis manager for the future!
if they respond with judgement during the crisis, they’re not being crisis managers themselves. they’re adding noise and reducing effectiveness.
a good manager will respond with support for you and a focus on solutions.
ideally they will say things like:
“i appreciate your candour, nicolai”
“i feel you, that sounds very stressful, but we’re gonna fix it”
“i’m glad you took these steps immediately to fix this”
“honestly, i’m a little stressed right now, thinking about the cost, but we’ll figure this out together” (this shows empathy, shared reality, co-ownership and reduces stress by taking some on themselves)
same applies not just to judgement but to fear. fear is fine. it’s a normal emotion. practice acknowledging fear:
(say out loud)
i’m scared right now about the consequences. that’s okay, it’s super normal! it’s a big number and i made a big mistake. but this too shall pass!
then pivot to action and self-trust: but i’ll handle this and i’ll fix it!!
Letting go of judgment allows you to "love yourself" in this context – not as a sappy platitude. It means allowing yourself to be human (capable of errors, capable of fear) and still trusting your inherent capability to act, learn, and improve.
finishing the job
the initial crisis phase – stopping the bleeding, communicating transparently, handling the initial emotional wave…
that’s intense, high-pressure, war mode stuff. but it’s just the beginning.
the really hard part comes in finishing the job! seeing it through to full resolution. getting refunds, making sure everything is permanently fixed and clean.
this phase takes 10x longer than the initial crisis. and that makes it harder!
it starts when the emotions have faded, when the immediate pressure is relieved. but you’re not done! the 21k is still potentially gone from the budget.
this is where you need discipline!
it’s important to keep your goals clear. you will have discussed them with your boss.
keep yourself accountable and finish the job.
for my azure mistake: i need to get a full refund!
i wasn’t done after filing the ticket. i followed up twice a week. when they weren’t responding, i reached out to people inside of cosmo, people who have connections to microsoft. i talked to IT again.
don’t. lose. momentum.
this is where you can really show your professionalism. most people will somehow survive the intense 30 minutes of firefighting. but consistently pursuing resolution over weeks and months? that shows true reliability and dedication.
if you drop the issue, if you stop following up, if you’re not finishing the job, all the trust you built earlier through candour and initial action will flip into distrust!
you go into “talker” mode in people’s minds: he talks a big game when things go wrong, but can’t deliver on the follow-through!
it’s 10x the effort and requires sustained focus beyond the initial adrenaline. but this is where the biggest payoff happens, both in terms of reputation and trust, as well as in actual money.
if you deliver here, if you do see it through, secure the refund, implement permanent fixes… then, despite spending 21k on azure, you will likely come out of this more respected and more trusted than before!
handling crises, and especially spending a lot on cloud seems like a weird rite of passage for people working in software…
not a week goes by, without some post on hackernews or twitter, on how they accidentally spent a bunch of money…
you almost have to screw up once somewhere big to get the battle scars and truly demonstrate your resilience and crisis management.2
getting through a big screw-up builds immense credibility in the long run if you handle the follow-through well. it shows maturity, reliability, and diligence.
retro
if everything is well and good and done, what is the single most important thing to do now?
you shouldn’t let a good crisis go to waste!
learn something from it!
so, what are my hard-earned lessons here?
technical lessons
add cost limits. Everywhere possible. For critical services or new deployments, especially for non-standard billing models. Set alerts much, much lower than €10k!
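to make “set alerts much, much lower” concrete, here’s a rough sketch of two checks i’d want on every subscription: one on actual spend, one on the projected spend if the current burn rate continues. the function name, the 50% threshold and the numbers are my own assumptions for illustration. this is plain python logic, not an azure api.

```python
def check_spend(spend_so_far, hours_elapsed, hours_in_month,
                monthly_budget, alert_fraction=0.5):
    """Return alert messages for actual and projected spend (all values in EUR).

    alert_fraction is a hypothetical threshold: alert once actual spend
    reaches that share of the monthly budget.
    """
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")

    alerts = []

    # check 1: actual spend has crossed the threshold
    if spend_so_far >= alert_fraction * monthly_budget:
        alerts.append(
            f"actual spend {spend_so_far:.0f} EUR is at or above "
            f"{alert_fraction:.0%} of the {monthly_budget:.0f} EUR budget"
        )

    # check 2: naive linear projection of the current burn rate
    burn_rate = spend_so_far / hours_elapsed
    projected = burn_rate * hours_in_month
    if projected > monthly_budget:
        alerts.append(
            f"projected month-end spend {projected:.0f} EUR exceeds "
            f"the {monthly_budget:.0f} EUR budget"
        )

    return alerts


# a normal month for us: roughly 1k EUR, 10 days in, 300 EUR spent
print(check_spend(300, 240, 720, 1000))

# three hours of a 1,000 EUR/hour deployment
for message in check_spend(3000, 3, 720, 1000):
    print(message)
```

the linear projection is deliberately simple and will be noisy in the first hours of a month, but with a 1k budget it would have fired within the first hour of a 1,000€/hour deployment, long before a 10k alert.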
procedural lessons
go deep! amazon (and toyota much earlier) have a system of five WHYs: don’t stop investigating until you’ve reached five levels deep! my initial investigation was insufficient. i needed to go deeper.
don’t ignore the costs. every €/$ you spend is worth something. and if the costs keep climbing continually, you’re not just spending the present euros, but also the ones that will keep coming if you don’t fix it!
eq lessons
practice shifting into "war mode" – controlling your internal state when the external is chaotic
use mindfulness techniques to get your focus back, manage your panic and keep a straight head
active compartmentalization: set aside unproductive self-judgment and fear during the crisis phase to free up mental resources
externals
communicate proactively and with candour (120% ownership of the mistake)
anchor high with vulnerability, then immediately follow with demonstrated tactical action (what you've done to fix/prevent)
trust is built on both owning the mistake and demonstrating competence and reliability in handling it
follow-through
the job isn't done when the fire is out. persistence in the long tail of resolution builds immense trust and professionalism. see it through to the very end, no matter how long it takes
career
pay close attention to how your employer and manager react to significant mistakes
a supportive, learning-oriented response focused on solutions, growth, and shared reality is a crucial green flag for a healthy workplace
judgment, blame, or lack of support are red flags... showing you it's time to consider a different environment where mistakes are treated as learning opportunities, not career-enders
conclusion
like i said earlier, i feel like i got my rite of passage in this experience. it was stressful, it was fun, it was intense.
in the end, we got an offer from microsoft for a 75% refund. i’m not super happy with it.
they initially promised 100%, but since i cross-deployed the models in multiple zones, they could only offer 75% due to some “internal policy”…
but i learned a lot. i got one more notch on my belt.
i have one more story to tell.
i know what to avoid and i feel more mature.
i’m glad i handled the situation well. in the end life can be a lot about regrets. and letting the opportunity pass to do a good job, that’s something that future nicolai would definitely regret…
regardless of where you’re at in life, remember that all experiences, good or bad, are really just experiences. things that you went through. and the way you handle them, shows your character.
live life through action, not through words.
with that, thanks for reading and see you in the next post!
~Nicolai ✌️
PS: if you’d like my help in speeding up your career, going to a senior in just 6 months (including the salary bump), check out my coaching programme at nicolaischmid.de/coaching
1 i subscribe to the model of spiral dynamics. companies used to be at the orange stage, think: achievement, progress, dominance (Bill Gates’s Microsoft, Oracle, …) then there was a wave of companies which switched to the green stage: unity, one big family, the bigger goal (a lot of NGOs, Google, Silicon Valley Startups). I think both are mistakes. We need an integrative approach of achievement through unity. also known as the yellow stage. let me know whether you’re curious to learn more!
2 Matt Levine likes to say, that in some ways, it’s more impressive to say you lost a billion dollars than not. it means someone trusted you with a billion and that must mean something