23 July 2024
Can you optimize your cloud costs too much? An interview with Mark Yudin
What does it mean to optimize cloud costs? Is it about hunting for discounts, reworking the architecture, or focusing developers on business value delivery? Mark Yudin believes the definition doesn’t matter as much. What matters is that you find a sweet spot that makes the ROI on cloud optimization worth it.
The CTO vs Status Quo series studies how CTOs challenge the current state of affairs at their company to push it toward a new height … or to save it from doom.
“You are already ahead of the curve simply by doing ROI calculations”
☝️ is what Mark Yudin, VP of Engineering at Insify, believes about cloud cost optimization. The bar is not set too high, is it?
It may seem so, but then again, there must be a reason why. According to Mark, most cloud companies either over-optimize their cloud infrastructure or disregard this subject altogether.
He believes that if your company doesn’t want to be in that category, it should:
- understand that the indirect way the cloud infrastructure affects business spending is more important than direct cloud costs,
- come up with a scalable plan for optimizing cloud costs over time, adjusted to your company’s current situation,
- avoid common pitfalls of cloud cost optimization.
If you do just that, you can develop a system that is efficient and flexible enough not to restrict product development. Don’t believe it? Mark has all the details.
About Mark and Insify
Bio
Mark Yudin has walked a path from software developer to leader, eventually becoming the VP of Engineering at Insify in March 2024. Before he achieved that milestone, he saw success as an Engineering Manager at eBay, Software Development Manager at Amazon, co-founder of the fintech startup Biller, and Director of Engineering at Juni.
Expertise
Product development, team management, IT strategy, software architecture
Insify
Founded in 2020 in Amsterdam, Insify offers digital insurance to thousands of freelancers and small businesses. It fills the gap in the market that causes many small to medium companies to remain underinsured due to factors such as limited understanding of insurance requirements. With Insify, they can get a quote directly from the company’s website. In 2023, Insify completed a Series A funding round of €25 million.
Mark on his role at Insify
Arkadiusz Kowalski: Hello, Mark. Thanks for being here today to talk about cloud costs. Before we get to the main course, tell me something about your new position as VP of Engineering at Insify.
Thanks for having me.
I joined Insify only a few months ago. I am impressed with the determination and intensity of the culture here. I also really like the leadership team.
I looked for an organization where my extra effort could really help push a business forward. In some places, you can put in more effort and won’t notice any result of that. You can be satisfied with your work individually, but it won’t amount to a change in the organization itself. Such organizations simply move too slowly.
At Insify, I saw the opportunity to impact a fast-moving organization. And I haven’t been disappointed. There’s a lot to do, but that’s what I came here for.
What Mark learned about cloud cost optimization in 8.5 years
With your diverse experience in the startup world, you could easily compare different organizations.
Startups tend to be cost-conscious. I’m sure that investors don’t like the idea of wasting their money on redundant cloud resources. How important is the issue of cloud costs to Insify?
Being frugal is very important to us. By definition, a startup tries to challenge competitors with a smaller team, fewer resources, and a smaller contact network and customer base.
The salaries and benefits of the engineering team and other fixed expenses are generally the biggest part of a startup’s overall cost. Engineers alone may account for 50-60% of the payroll in more technically-minded startups.
But cloud costs can also make a difference. And I think that there are two variables worth discussing in their context.
First, you’ve got direct cloud cost, which is the cost of running software. That’s where you look for a sublinear scale. You don’t want to scale linearly with the number of customers. The cost of running software per customer should decrease over time.
The total cost of running your software, especially in the early stages, is usually not big compared to salaries. But as you scale up, it may increase rapidly if you ignore it — especially when you use a lot of tools for hosting, managed services, and observability, like Datadog.
The second cloud cost variable is the employee time cost, which is less visible because you pay for it anyway. Depending on the quality of your architecture, it can dwarf the direct cost. Suboptimal architecture may force you to use 30-40% more of your people’s time just to stay afloat. What’s more, when your architecture is deficient, the cost of each new feature may continue to increase. As a result, you go slower as you grow, which is what all startups want to avoid.
Insify is mid-size when compared to other startups you worked for. You’ve been a Director of Engineering at Juni, which had a team of 150. You also founded a small startup called Biller and had it acquired in six months. How different were these companies in terms of cloud cost optimization?
I mostly worked for B2B startups, and for those, the direct cost of running software for clients has always been rather small.
Every transaction we facilitated at Biller was essentially a buy-now-pay-later type of deal for businesses. We received a percentage of each transaction, which more than covered the incremental cost of running our software.
In this tiny startup called Biller, the biggest costs were the opportunity cost and the fixed cost of engineering salaries. On the bright side, we had the money for all the fancy cloud software solutions. It gave us a lot more speed in the beginning.
At Juni, we also catered to small businesses. It was also easy to ensure that each incremental customer more than pays for your software running costs.
But we took special care to ensure that the second variable, the hidden cost related to architecture, didn’t get out of hand. If you build your platform in a non-standard ad hoc manner, you may face extremely painful consequences in the future. That’s one of the reasons we followed the AWS Well-Architected Framework.
An interesting example from outside the startup world was my work at eBay. It had different brands in different countries. We had our own data center, which made cloud cost optimization even more important. The indirect running costs were still quite high, which is probably why they are moving to the cloud now.
I also worked at AWS, which naturally uses its own infrastructure products. The Cloud9 IDE product I worked on was for developers, who would pay only for the compute and storage used. That way, Cloud9 IDE brought in revenue indirectly.
Since you mentioned your work as a Software Development Manager at AWS, I wonder if you thought about the issue of cloud costs differently back then. Did the startup experience change your perspective somehow?
It gave me much insight into the long-term implications of using different external tools.
Cloud costs are always something that is at least one step away from your first problem, which is how to ship more with fewer people. But at the same time, taking care of your architecture can eventually have a big effect on your costs.
Amazon doesn’t use a lot of external tooling to avoid having too many dependencies. They are cost-efficient by constraining themselves to using internal AWS tools only. That lowers their direct cloud cost. The incremental cost per customer and the ability to operate in a stable way were Amazon’s strengths.
On the other hand, for startups in the B2B space, the cost of additional tooling can easily overtake the cost of running the base software. You can easily spend more on CRM and observability software such as Datadog than you do on AWS even when you extensively use managed services and large RDS instances.
I’m not saying that you should stop using external tools completely. Some are very valuable to a business. But be aware of the trade-offs. You need to decide when to use a third-party tool and when to build it in-house. A custom solution may cost more initially, but it may avoid certain problems, such as vendor lock-in. A major price increase when you already built your whole platform around third-party software can be troublesome. It’s a matter of figuring out the ROI.
Insify’s 3-level cloud optimization strategy
You view cloud optimization as something more than just a cost-cutting measure. A CTO we talked to once told us that it is actually a reflection of the state of the architecture or perhaps even their entire organization. He said: “If I pay more than I should, I must have a sloppy approach to architecture or maybe even my whole business.”
What is cloud cost optimization to you?
It’s about the relative cost of the infrastructure to your business and how it scales with each incremental customer or additional engineer. By examining cloud costs this way, you can find out how expensive it is to add new features.
Generally speaking, if your incremental costs increase faster than your number of customers, you’re doing something wrong. It’s very atypical for a technological business to be that way. What you should see instead is the effect of the economy of scale, where with each incremental customer, the additional cost is close to zero.
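To make the idea of incremental cost per customer concrete, here is a minimal Python sketch. The function names, the snapshot format, and all the numbers are illustrative assumptions, not figures from Insify or from the interview.

```python
from typing import List, Tuple

# Each snapshot is (customer_count, monthly_cost), in chronological order.
# This format is a hypothetical choice for the example.
Snapshot = Tuple[int, float]


def incremental_cost_per_customer(history: List[Snapshot]) -> List[float]:
    """Cost added per newly acquired customer between consecutive snapshots."""
    marginal = []
    for (c0, cost0), (c1, cost1) in zip(history, history[1:]):
        added = c1 - c0
        if added <= 0:
            continue  # skip periods without customer growth
        marginal.append((cost1 - cost0) / added)
    return marginal


def scales_sublinearly(history: List[Snapshot]) -> bool:
    """True when cost grew by a smaller factor than the customer base."""
    (c_first, cost_first), (c_last, cost_last) = history[0], history[-1]
    return cost_last / cost_first < c_last / c_first


# Made-up numbers: customers quadruple while cost only doubles.
history = [(100, 2000.0), (200, 3000.0), (400, 4000.0)]
print(incremental_cost_per_customer(history))  # [10.0, 5.0]
print(scales_sublinearly(history))             # True
```

A falling marginal cost, as in this example, is the economy-of-scale pattern described above. A rising one is the signal that something is wrong.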
I also agree with your analogy between architectural problems and organizational problems. Organizations spend a lot on tools per number of engineers. When each incremental customer gets more expensive, you begin to notice issues across the business. You’ll see poorly managed projects, misused resources, and a lack of discipline.
Let’s consider a scenario. You take over an AWS infrastructure that appears to generate many needless costs.
Naturally, every case is different. But maybe you can devise a generalized roadmap to help this example company control its cloud costs.
Before you get to the optimization, you need to take care of some things first, including the company’s data analysis capabilities. If your costs rise while customer growth metrics don’t, and you don’t know why, there’s not much you can do.
But if your data analysis works correctly and you find what’s wrong, you can dedicate engineering resources to correct it.
ROI is another preliminary consideration that gets overlooked, surprisingly. You don’t want to over-optimize. If your infrastructure is tiny and you spend 100 times more on engineer salaries than on cloud costs, optimization won’t pay off. For large systems, optimization can be worthwhile. Also, if your system has never been optimized before, chances are that your potential ROI will be good. In fact, you could cut your cost in half.
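The ROI check described here can be sketched as a back-of-the-envelope calculation. The formula below is one simple way to frame it, and every input (savings rate, hours, hourly rate, time horizon) is a hypothetical assumption chosen for illustration.

```python
def optimization_roi(
    monthly_cloud_cost: float,
    expected_savings_rate: float,  # 0.5 means "cut the bill in half"
    engineer_hours: float,
    hourly_rate: float,
    horizon_months: int = 12,
) -> float:
    """Return ROI as (savings - effort cost) / effort cost over the horizon."""
    savings = monthly_cloud_cost * expected_savings_rate * horizon_months
    effort_cost = engineer_hours * hourly_rate
    return (savings - effort_cost) / effort_cost


# Tiny infrastructure: the engineering effort costs more than it saves.
small = optimization_roi(500.0, 0.5, engineer_hours=80, hourly_rate=75.0)

# Large, never-optimized system: the same effort pays off many times over.
large = optimization_roi(50_000.0, 0.5, engineer_hours=80, hourly_rate=75.0)

print(small)  # -0.5
print(large)  # 49.0
```

The same 80 hours of work produce a loss on a small bill and a large return on a big one, which is why the calculation comes before any optimization.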
After going through that prep phase, you can start optimizing. There are three levels to it.
At level one, you don’t look at anything beyond what you actually spend. Instead, you focus on possible discounts of various kinds, which many providers offer. That kind of optimization takes the least effort and can often be done by non-technical people with access to the account manager.
The second level requires more expertise but provides more optimization opportunities. Now, you handle things such as capacity provisioning, autoscaling, or using spot instances.
Finally, the third level is about changing the architecture and application behavior. Work on this level creates the most benefits. Typically, that’s where the biggest cost optimization opportunities can be found. It is prudent to verify that your architecture can support longer-term growth even before you have to optimize costs because it is much harder to change it later.
Should all companies go through all the stages when they decide to optimize their cloud infrastructure? When should they do this? After all, we all know how difficult it is to dedicate resources to activities that aren’t directly related to creating business value, especially for startups.
A lot can be saved if you don’t have to go to bucket number 3 and re-architect. To do that, you need to build your architecture in a way that scales from the start.
Unfortunately, startups face a lot of uncertainty about what kind of product they want to deliver. A lot can go wrong. Sometimes, figuring out your business model may force you to modify your architecture. You may even want to pivot your entire product. That also has many implications for technology.
I’m a big proponent of not overworking your architecture at the beginning. Your company may go under before you reach that scale. You may end up with an amazing architecture that runs smoothly and a product no one wants to buy. It’s difficult to avoid having a non-optimized architecture if you actually grow quickly enough as a startup.
Generally speaking, I think that architectural cost is far from the main problem for a pre-seed level company. There is no fixed generic path, but you should definitely monitor how much you spend – in general as a business, for salaries, or operation costs. If the cost rises more sharply than your number of customers, that’s where you need to consider optimization.
Of course, as you continue to scale, you exceed your ability to address the situation with easy stage-one solutions. You need to think about re-architecting as you go through further stages of growth. But you must also calculate how many resources you’ll spend on stage two and three practices before you over-optimize.
Finding the sweet spot of optimization is hard. But simply doing ROI calculations puts you ahead of the curve. Many companies today either over-optimize because they think they will grow 1000 times, or they don’t put any thought into this at all.
The kind of over-optimization you talk about seems like an example of a continuous improvement strategy going wrong. But what if we wanted to create a CI plan for cloud cost optimization that is actually feasible and worth following? A common scheme AWS recommends is the monitor–analyze–act loop. What would it look like for you?
The initial CI process for cloud cost optimization can be very lightweight and scalable. It could be as simple as periodically checking how your metrics look now and how they compare to their previous state.
You can use mechanisms, such as built-in tools offered by cloud providers, to track your situation from the beginning. Use them and make your progress measurable. If the cost unexpectedly increases 30% month over month, investigate. A sudden rapid increase can be dangerous for a very early-stage startup.
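A lightweight version of this check could look like the Python sketch below. It assumes you already export monthly totals from your provider's billing tools. The dictionary format and the sample figures are hypothetical, and the 30% threshold mirrors the number mentioned above.

```python
from typing import Dict, List, Tuple


def month_over_month_alerts(
    monthly_costs: Dict[str, float], threshold: float = 0.30
) -> List[Tuple[str, float]]:
    """Return (month, relative_increase) for months exceeding the threshold.

    monthly_costs must be in chronological order; Python dicts keep
    insertion order.
    """
    alerts = []
    months = list(monthly_costs.items())
    for (_, prev_cost), (month, cost) in zip(months, months[1:]):
        if prev_cost <= 0:
            continue  # cannot compute a relative change from zero
        change = (cost - prev_cost) / prev_cost
        if change >= threshold:
            alerts.append((month, round(change, 2)))
    return alerts


# Made-up bill: March jumps by roughly 36% compared to February.
costs = {"2024-01": 1000.0, "2024-02": 1100.0, "2024-03": 1500.0, "2024-04": 1550.0}
print(month_over_month_alerts(costs))  # [('2024-03', 0.36)]
```

An alert here is only a prompt to investigate. The jump may be explained by customer growth, in which case no action is needed.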
At later stages, you need more complex mechanisms. Once you have several teams, you must bring in more people to determine your spending for all your accounts, split by service type or even different cloud providers.
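As a rough illustration of splitting spend by account, service type, or provider, here is a small Python sketch. The record layout is a hypothetical simplification of what a billing export might contain, and all values are invented.

```python
from collections import defaultdict
from typing import Dict, List


def spend_by(records: List[dict], key: str) -> Dict[str, float]:
    """Sum the 'cost' field per value of `key`, largest total first."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost"]
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical line items from two providers and three accounts.
records = [
    {"provider": "aws", "account": "prod", "service": "rds", "cost": 1200.0},
    {"provider": "aws", "account": "prod", "service": "ec2", "cost": 800.0},
    {"provider": "aws", "account": "staging", "service": "ec2", "cost": 300.0},
    {"provider": "gcp", "account": "analytics", "service": "bigquery", "cost": 450.0},
]

print(spend_by(records, "service"))   # {'rds': 1200.0, 'ec2': 1100.0, 'bigquery': 450.0}
print(spend_by(records, "provider"))  # {'aws': 2300.0, 'gcp': 450.0}
```

Grouping the same records by different keys shows each team or account owner where their share of the bill comes from.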
Let’s wrap this part up. We talked about the ROI of cloud cost optimization itself so far.
But how important is optimization for ensuring the ROI of the entire cloud operation? At the end of the day, should companies really focus on that a lot?
They should definitely give it a lot of thought, but instead of prioritizing direct cloud costs, which are usually not a big percentage of the overall spending, companies should focus on the indirect cost of running software in the cloud.
This indirect cost is about how much effort you need to put into managing something that’s poorly designed. If you have a lot of incidents due to poor infrastructure or capacity, you can destroy your team’s productivity. Startups are especially vulnerable to this because they are usually less mature about handling their operations. Needless spending on rollbacks or bug fixing can eat up to 40-50% of your total operational capacity.
What to watch out for before optimizing costs
I wanted to finish with a quick overview of the most common optimization mistakes companies make. Then, we can talk about some overlooked but helpful practices that can ensure optimization succeeds. Let’s start with questionable choices and attitudes. What comes to your mind?
It’s probably the knee-jerk reaction that pushes companies to build in-house as much as possible. In other words, it’s choosing to build a whole new solution for an existing common problem by default without analyzing other options.
Building in-house can be appropriate, even required, in some cases, depending on how critical a feature is to your business. But what I’ve noticed, especially in early-stage startups, is that if your engineer or co-founder sees a project as an opportunity to build a greenfield technological solution, it rarely produces good business results.
Technology is easier to fix than people, culture, or business. That’s why you can afford to be flexible about it early on. You may think that building something for 10 years rather than six months will save you effort in the long run. But the long run doesn’t exist yet. If you spend a lot of time polishing your technical solution from the start, and it takes time away from your product development, you run the risk of burning through your cash. If you don’t attract customers, you may find that you have nice, polished software nobody cares about.
I would advise against overthinking technology. Make sure that you have a proper time horizon tailored to your efforts. Don’t optimize for the future because you may be optimizing something that you shouldn’t be building in the first place. Start with the basics instead—keep track of your cloud costs and overhead and adjust as you go.
Turning to external partners can be a good way to supplement the in-house know-how. Few businesses can truly specialize in cloud cost optimization.
What should be considered when a company chooses its technology partner?
I don’t necessarily think there’s one key factor to watch out for. You just need to try things out.
Paying someone to do something they do all the time is usually more effective than doing it yourself. It also has a much lower overhead. Therefore, working with an external vendor is extremely attractive, especially from the opportunity cost and engineering perspective.
You want your engineers to focus on solving your customers’ problems. An external party can handle the optimization instead, especially the first two levels and even a big part of level three. Early on in your startup life, your young team probably hasn’t done this kind of cost optimization before. You don’t want them to spend time learning that. That would be prohibitively expensive.
Instead of interviewing a million different vendors and searching for all the references you can get, try to do a very low-risk test. Get into a limited engagement with a vendor. See what you can get out of that.
In other words – actions speak louder than words?
Definitely. You can go through a regular framework of references, track records, reputation, cost comparison, and so on, but you will never be completely sure you’re making the right move just based on that.
There’s something I feel strongly about.
In the startup world, or maybe in business in general, a quick, bad decision is better than a very long period of indecision or even a good decision that took way too much time to materialize. You realize faster whether something works or not by simply trying it out instead of pondering it. You may be tempted to delay something until you have all the data. Except you never have all the data. You end up making no decision.
Resources
Which learning resources would you recommend to leaders who want to improve their cloud cost optimization strategy?
There are excellent free resources available at every stage of your optimization effort.
Start with all the stuff that you can get for free from your cloud provider. Both AWS and GCP have a lot of easily accessible resources. There’s usually a very simple framework for designing your cloud architecture for cost efficiency and how it’s supposed to work. Any deviation from that creates pain.
As you’re going towards the later stages, you should use premium support from your cloud provider. Those people are paid to help you. They have deep experience with your problem area. They can greatly help, especially when you hit a roadblock and can’t find a similar scenario online.
If you need more intensive support, turn to a company specializing in cloud architecture and cost optimization. External advice is always a good idea for issues related purely to technology rather than to your customer’s problems.
What’s next? Three actions for CTOs to take
So, what do you think? Are you ready to own your company’s cloud cost optimization initiative?
If you present it as an effort to make your architecture and entire system economically viable, your stakeholders will definitely support you.
There are a bunch of steps to approach optimization properly:
Prioritize
- focus on the indirect costs of a suboptimal cloud infrastructure setup rather than direct cloud costs,
- consider how your costs scale with each incremental customer,
- weigh the usefulness of your tools against their price tag – don’t let the cost of tooling get out of control.
Act
- try to calculate the ROI of your optimization plan,
- measure your progress,
- follow Mark’s three-level plan.
Beware
- don’t over-optimize your architecture – you may go under before it ever pays off,
- avoid focusing on in-house development in response to any challenge,
- be decisive. It’s better to simply try things out rather than think about the best way forward for too long. That also applies to finding your next cloud development partner.
Good luck!
Do you want to find out more about Insify?
Check out Insify’s LinkedIn profile to learn more about how the company supports freelancers and entrepreneurs through fair, fast, and flexible insurance.