Keeping All the Servers Busy

Vito Leung
3 min read · Jan 4, 2021

Once upon a time, at a company I worked for a long, long time ago, there was an initiative to reclaim servers with low CPU usage. It was obviously someone’s KPI, so I decided to meet them halfway.

A Morning Conversation

I woke up one morning and my ops person told me that the company was doing a sweep and would reclaim servers with low CPU usage within the next two weeks. My initial reply was that this was ridiculous: not all servers live and die by CPU (e.g. storage, cold standbys, etc.). My ops person was very frank and said to either turn off the servers for the next two weeks or spin up the CPU usage. My initial reaction was, does it really matter if I turn up the CPU usage now? They can look at historical data, right? Then I thought, if the powers that be have a year-end KPI tied to this, let me help them with their bonus by meeting them halfway.

It wasn’t the first time the company I was working for had pushed an initiative to reclaim servers based on CPU usage. That morning I just got irked enough to finally write something that could be reused time and time again in future years.

Drawing the X’s and O’s

I quickly drew up something on my notepad during breakfast and then started to hammer away. I sent my scribble to a buddy, and of course he couldn’t make out what I was trying to do until I told him about my morning conversation. He asked, wouldn’t a while loop do what I need? Besides being irritated by this initiative, I just felt like doing something different.

Requirements

If I were to spend time on this, then it must satisfy these requirements:

  • CPU usage must vary like real traffic based on time of day.
  • In addition to CPU, let’s bring up the usage of the hard drive and traffic at the network level.
  • All these calls must start and tear down at different intervals.
  • Since I have more than one server that needs this remedy, let’s have all my servers send traffic to each other: an N-to-M client-server ecosystem.
  • Have a fail-safe kill-all in case things get out of control.
  • Log all this to somewhere and clean up the logs daily.
  • Automate the deployment and administration work for changes.
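The first and third requirements above (load that follows the time of day, with runs that start and stop at different intervals) can be sketched roughly as follows. This is a Python illustration, not the original bash wrapper; the function names, the cosine-shaped curve, the 14:00 peak, and all the numeric ranges are assumptions made up for the example.

```python
import math
import random
from typing import Tuple


def target_cpu_fraction(hour: float, low: float = 0.15, high: float = 0.75,
                        peak_hour: float = 14.0) -> float:
    """Smooth daily curve: `high` at peak_hour, `low` twelve hours away from it.

    The shape and the default values are assumptions, not measured traffic.
    """
    phase = 2 * math.pi * (hour - peak_hour) / 24.0
    # cos(phase) is 1 at the peak and -1 twelve hours later.
    return low + (high - low) * (1 + math.cos(phase)) / 2


def workers_for(hour: float, cores: int, rng: random.Random,
                jitter: float = 0.1) -> int:
    """Number of CPU workers to run this round, with a little random noise."""
    fraction = target_cpu_fraction(hour) + rng.uniform(-jitter, jitter)
    fraction = min(1.0, max(0.0, fraction))  # clamp to a valid fraction
    return max(1, round(fraction * cores))   # always keep at least one worker


def next_interval(rng: random.Random) -> Tuple[int, int]:
    """Pick (run_seconds, pause_seconds) so runs start and stop unevenly."""
    run_seconds = rng.randint(60, 600)
    pause_seconds = rng.randint(30, 300)
    return run_seconds, pause_seconds
```

Seeding each server's `random.Random` differently would keep the fleet from ramping up and down in lockstep, which is the point of the "different intervals" requirement.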

Solution

  • stress: I looked around on the web and there are a lot of tools for adding different kinds of load to a server. In the end I settled on stress, which had most of what I needed.
  • bash script: Then I wrote a nifty wrapper shell script to take care of all the randomizing, server calling, and cleanup work.
  • ansible: I whipped up a simple ansible playbook so I could deploy any changes at will. This was very useful initially when tuning the correct number of processes to run, etc.
  • rename: For the cherry on top, stress was renamed to something else to add a bit of fuzz in case someone runs top on the server.
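To give a feel for what the wrapper around stress might assemble, here is a minimal Python sketch (the original was a bash script, which is not shown in this post). It only builds the argument list and does not run anything. The `--cpu`, `--io`, `--hdd`, and `--timeout` options are real stress flags; the function name, the worker counts, the timeout range, and the renamed binary `batchd` are hypothetical.

```python
import random
from typing import List


def build_stress_command(cpu_workers: int, rng: random.Random,
                         binary: str = "stress") -> List[str]:
    """Build one randomized stress invocation as an argv list.

    `binary` lets the caller point at a renamed copy of stress; the
    ranges below are illustrative guesses, not the author's settings.
    """
    timeout_seconds = rng.randint(60, 600)  # each run ends on its own schedule
    io_workers = rng.randint(1, 2)          # workers spinning on sync()
    return [
        binary,
        "--cpu", str(cpu_workers),
        "--io", str(io_workers),
        "--hdd", "1",                       # one worker writing and unlinking files
        "--timeout", f"{timeout_seconds}s",
    ]


# Hypothetical usage: 'batchd' stands in for whatever stress was renamed to.
command = build_stress_command(4, random.Random(1), binary="batchd")
```

The resulting list could be handed to `subprocess.run(command)` from a loop. stress does not generate network traffic on its own, so the server-to-server calls would need a separate tool.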

Conclusion

When I was done, I sent this pic to my buddy and he couldn’t believe I had actually spent the time to do it. My buddy asked, wouldn’t they know if someone logged on to the servers and ran a few simple commands to check? My train of thought was, if their criterion for whether a server is being used is CPU usage, then they are probably not smart enough to actually check beyond looking at this graph.
