Capacity planning is an iterative methodology that lets companies estimate the storage, computer hardware, software, and connection infrastructure resources they will require over some future window of time. Doing this diligently saves companies time and money and reduces the risk of server failure when application usage spikes. Capacity planning is needed to mitigate risk given the unpredictable aspects of application scaling. Clouds scale at an exponential rate in some cases, so the needs of future users have to be anticipated.
Since we are discussing private clouds, we have access to the underlying infrastructure, which lets us focus on the major users that run large workloads on it. Companies can then use forecasting and demand planning to figure out how user behavior will drive resource provisioning on the back end. As with Google Analytics, having access to key infrastructure utilization metrics makes capacity planning much simpler. When this visibility isn’t available, CPU or RAM generally ends up over-committed, which is a major mistake on the part of cloud operators. While we could use host aggregates to move workloads or create availability zones as a fail-safe, it’s better to understand the elements that make up capacity planning so that such measures are the exception instead of the rule.
While users can dynamically spin resources up or down depending on workload, we as cloud operators can take other factors into account to avoid issues. First, we must take into account the scale of the cloud we are planning for. In very small clouds (tiny pods, few nodes), a spike in user workloads can break the model entirely. If more users are pinging the application, the cloud’s available capacity could disappear completely.
This problem isn’t as prevalent in larger clouds because a single workload is less likely to consume all of the available capacity. To address it, cloud operators need a good picture of the patterns in application utilization. We can start by consistently monitoring usage to identify patterns in workload. Tracking aggregate instance counts and how many users are driving spikes in cloud usage also helps to paint this picture.
We must also consider end-to-end lead times. This means accounting for how long it takes manufacturers to get servers to your door, which can depend on the quality of the manufacturer and the type of hardware your company has standardized on (not including provisioning). If the hardware is internal, we must factor in burn-in testing. End-to-end lead time also includes rack/stack and cabling time in the data center, which can be significant if you’re doing this at scale, as well as the time it takes to provision the hardware and pull it into the cloud. We do have automated tools that can expose underlying infrastructure problems (such as improperly cabled boxes or servers with different hard drive sizes). Of these components, manufacturer lead times are among the most volatile. If you need 200 servers or network components, build padded buffer times into your capacity plan in case of natural disasters or other disruptions.
Understanding how clouds grow at scale is crucial for capacity planning. Having too many cores in a server is undesirable because the failure of any one such server takes down many instances at once and can generate a ‘boot storm’ when they all restart, which works against fault tolerance. We would mostly aim for a core count somewhere in the middle and set random access memory (RAM) according to server size. We can look at the instance sizes clients actually run to estimate how much RAM should be allocated per physical core, and adjust those parameters accordingly.
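As a rough illustration of the RAM-per-core estimate described above, the sketch below derives a ratio from an observed instance mix. The `Flavor` type, the instance counts, the 2:1 CPU over-commit ratio, and the 32-core server are all hypothetical values chosen for the example, not figures from any real cloud.

```python
from dataclasses import dataclass


@dataclass
class Flavor:
    """One instance size observed in the cloud (hypothetical data model)."""
    name: str
    vcpus: int
    ram_gb: float
    count: int  # number of running instances of this size


def ram_per_physical_core(flavors, cpu_overcommit=1.0):
    """Estimate GB of RAM needed per physical core for an instance mix.

    cpu_overcommit is the assumed vCPU-to-physical-core ratio
    (1.0 means no over-commit).
    """
    total_vcpus = sum(f.vcpus * f.count for f in flavors)
    total_ram_gb = sum(f.ram_gb * f.count for f in flavors)
    if total_vcpus == 0:
        raise ValueError("no instances to size against")
    physical_cores = total_vcpus / cpu_overcommit
    return total_ram_gb / physical_cores


# Hypothetical instance mix, for illustration only.
observed = [
    Flavor("small", vcpus=1, ram_gb=2, count=120),
    Flavor("medium", vcpus=2, ram_gb=8, count=60),
    Flavor("large", vcpus=8, ram_gb=32, count=10),
]

ratio = ram_per_physical_core(observed, cpu_overcommit=2.0)
cores_per_server = 32  # assumed mid-sized server
print(f"RAM per physical core: {ratio:.2f} GB")
print(f"RAM for a {cores_per_server}-core server: {ratio * cores_per_server:.0f} GB")
```

Re-running the estimate as the instance mix changes shows whether the RAM configured on the standard server still matches what clients actually deploy.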
We can take snapshots of server and instance growth over time to project what resources will be needed. Server count usually follows a linear projection over time, but instances in a quickly growing application follow an exponential trajectory in some cases. We can take into account the inflection point where cloud operators become more comfortable, stop ‘lifting and shifting’ off VMware or bare-metal hypervisors, and start using auto-scaling groups. Capacity planning is used to understand how such inflection points and triggers bring capacity into the cloud. We can use a certain number of instances as a standard for capacity utilization.
A solid process for capacity planning is required as well. The plan will be shaped by the kind of workloads you run, the scale of the application, and the architecture. For this process, we must obtain numbers based on the observation of numerous customer clouds, which can vary depending on the workload types each cloud hosts. We must also ensure there is a viable way to gather and store historical data about the usage of cloud resources.
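The projection from historical snapshots mentioned above could look something like the following minimal sketch. It fits a straight line to server counts and an exponential curve (a least-squares fit on the logarithm of the counts) to instance counts. The monthly figures are invented for illustration, and the function names are not taken from any particular tool.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def project_linear(xs, ys, x_future):
    """Extrapolate a linear trend to x_future."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_future + intercept


def project_exponential(xs, ys, x_future):
    """Fit y = a * exp(b * x) via a linear fit on log(y); ys must be positive."""
    b, log_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a + b * x_future)


# Hypothetical monthly snapshots (month index, count).
months = [0, 1, 2, 3, 4, 5]
servers = [40, 44, 48, 52, 56, 60]            # steady, linear growth
instances = [100, 150, 225, 338, 506, 759]    # roughly 1.5x per month

print("Servers at month 12:", round(project_linear(months, servers, 12)))
print("Instances at month 12:", round(project_exponential(months, instances, 12)))
```

Comparing the two projections is one way to spot the point where instance demand outruns a linear hardware plan. A real forecast would also need to account for seasonality and for the inflection points described above.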
Additionally, we would have to pull out physical infrastructure information (such as the number of servers) and fit this capacity planning process to each piece of hardware differently. We could also set automated triggers that kick off new hardware provisioning processes when capacity usage reaches a predetermined threshold. These thresholds would be determined based on cloud size. Together, these points constitute an iterative process, or methodology, for effective cloud capacity planning that saves resources, time, and money.
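A size-dependent provisioning trigger of the kind described above might be sketched as follows. The threshold values and the server-count cut-offs are assumptions made up for this example. The only idea carried over from the text is that the trigger point depends on cloud size, with smaller clouds given less headroom to absorb a single workload spike.

```python
def capacity_threshold(server_count):
    """Return the utilization fraction at which new hardware is ordered.

    Assumed policy: smaller clouds trigger earlier, because one workload
    spike consumes a larger share of their total capacity.
    """
    if server_count < 20:
        return 0.60
    if server_count < 200:
        return 0.70
    return 0.80


def should_provision(used_vcpus, total_vcpus, server_count):
    """True when current utilization has reached the cloud's threshold."""
    if total_vcpus <= 0:
        raise ValueError("total_vcpus must be positive")
    utilization = used_vcpus / total_vcpus
    return utilization >= capacity_threshold(server_count)


# Hypothetical readings: the same 65% utilization on two cloud sizes.
print(should_provision(used_vcpus=650, total_vcpus=1000, server_count=10))
print(should_provision(used_vcpus=650, total_vcpus=1000, server_count=100))
```

In practice the check would run against the stored historical usage data on a schedule, and the threshold would be tuned so that the remaining headroom covers the end-to-end hardware lead time.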