Read-only forum test link

Site Outage continues

Synopsis:
The site went down for an extended period; the root cause remains unknown, though it doesn't look like it was an attack; it has not been fixed and it is not over; a string of techs at the web host has made minimal progress towards a diagnosis; and I need to move us to another server & upgrade vBulletin, a long & painful process whose first step has been severely crippled by the current server's condition.

Full story:
On November 19th, URC suddenly and without warning began to experience intermittent outages -- periods where pages would not load.  This rapidly worsened to the point where the server was at a complete standstill.

Traffic was normal and there was no evidence of any sort of malicious activity, but the server's CPU load was unimaginably high.  I worked for hours on the backend, repeatedly restarting every service I could to open brief windows of access to diagnostic tools.

By late afternoon (still the 19th) I felt I had ruled out everything I had control over, and I sought help from level 1 technical support at our web hosting provider (the company we lease the server from).  After a long chat, the tech concluded that we were not under attack and that no hardware had failed, but that we were seeing a lot of search engine crawler traffic that could be directly tied to the processes causing load spikes.

I reduced searchbot traffic significantly, a change that is quick to make but slow to show results, and nothing about our server load changed.
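For the curious, the usual way to throttle searchbot traffic is a robots.txt file at the site root.  The rules below are purely illustrative (the paths and delay value are examples, not our actual configuration, and some crawlers ignore Crawl-delay entirely), but they show the general shape of the change:

    User-agent: *
    Crawl-delay: 10
    Disallow: /search.php
    Disallow: /calendar.php
    Disallow: /memberlist.php

The catch is that crawlers only re-read robots.txt every so often, which is why a change like this takes a while to have any visible effect on server load.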

What followed was weeks of the most exasperating interactions I have ever had with any sort of technical or customer support.  Three different techs in a row, on a seemingly endless loop impenetrable to logic, reasoning, and facts, told me the same two things: that the problem must be our forum software, and that everything on the server looked normal.

During this time, here were the actual facts of the matter:

At several points, techs refused to provide further service on the issue with CPU load at "only" 10x to 12x normal, because the server we're on is so overpowered for the application that 10x to 12x normal is within what they consider to be "acceptable limits."

With near-zero *nix knowledge, I have struggled to navigate the command line, Google open in another window, to discover and use tools that the techs who are paid to do nothing but deal with these exact servers all day seemed to know nothing about.  I distilled the situation down to the most basic and irrefutable facts that the server itself could directly report, yet I still struggled to get past the walls of "it's your forum software" and "everything looks normal."  Meanwhile, I had begun taking a fresh local backup of the entire site to my desktop.  Due to the ongoing problems with the server, that download has been repeatedly interrupted, timed out, and errored out.
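To give a sense of what "facts that the server itself could directly report" means in practice: the evidence boils down to the load averages and a list of which processes are eating the CPU.  Here is a minimal sketch of that kind of check, assuming a typical Linux server with Python 3 on it; it illustrates the sort of evidence involved, not the exact steps I took:

    import os
    import subprocess

    # 1-, 5-, and 15-minute load averages, straight from the kernel
    one, five, fifteen = os.getloadavg()
    print(f"load averages: {one:.2f} {five:.2f} {fifteen:.2f}")

    # Ask ps for the top CPU consumers, highest first (header line + top 10)
    ps = subprocess.run(
        ["ps", "-eo", "pcpu,pid,user,comm", "--sort=-pcpu"],
        capture_output=True, text=True)
    print("\n".join(ps.stdout.splitlines()[:11]))

Numbers like the 10x to 12x figure above come straight from output like this, which is why I keep calling them irrefutable.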

November 25th update: The latest tech working on the problem claimed to have restarted a misbehaving service, and logs show that when he did this, load returned to normal.  It took a full day of turnaround for me to hear back exactly what he restarted, and I immediately requested that this auxiliary service, which is absolutely non-essential and has nothing to do with the forums or database, be permanently terminated. 

November 26th update: Response on the ticket remains very slow, and there has been more finger-pointing at our forum as the source of all evil ever since I momentarily re-enabled it and, because the underlying problem has not been solved, it didn't work.  Hours after that brief test (followed by restarting/resetting every service I have access to), server load continued to spike to extremely unhealthy and irrefutably abnormal levels (more than 50x normal), with the forums nowhere to be found.  I am now waiting for a tech to remove the aforementioned misbehaving service, which remains active and is a top contributor to moment-to-moment CPU load even though it is only supposed to run once a day for a maintenance task.  The great news is that, at long last, my latest local backup of the site is complete.

December 7th update: The daily struggle with the techs continues.  I personally need to see this through, and I need to keep the domain pointing at this broken server until it is solved.  The inescapable facts will prevail; we will prevail; URC will be back.

As you can imagine, words alone cannot express the level of aggravation & disgust I have experienced, so I will keep the depths of my rage on this side of the keyboard.


Here is what is going to happen: now that I finally have a complete local backup of the site in hand, I will move us to another server and upgrade vBulletin as part of the move, the long & painful process described in the synopsis above.

My sincerest apologies to everyone for all of this inconvenience and mess.  I've been doing my best to fix this, but my best is not nearly enough, and clearly the expert technicians' best has been still worse for most of this life-shortening ordeal.