Product Performance

Comments

12 comments

  • Eric Jacobs

    "More details will be available once a complete root cause analysis has been completed."

    It's been several days, and there has been no follow-up from MemberClicks. I avoided using MemberClicks most of yesterday, because I didn't know whether the problems had been fixed or not -- I was waiting for the promised follow-up before wasting any more time as I did throughout Tuesday.

    Today, all appeared to be fine when I first logged in and started doing things -- but then for a period of about 10 minutes shortly after 3 p.m. Eastern, the ENTIRE application appeared to be completely down, back end AND front end. Now, it's back up, but I just don't know what is going on. (That is, perhaps it was some other Internet routing problem, and not the MemberClicks app, although I had no problems getting to various other sites on the Internet during this outage.) I urge MemberClicks to do a better job of communicating with its customers!

  • Scott McLeod

    We apologize again for any inconvenience that the issues on Tuesday caused.  The root cause is still being explored so that it does not happen again. 

    We had no issues with the product yesterday at all.  

    There was a brief slowdown just after 3:00 pm EST today which lasted for a matter of minutes and was not related to the issues on Tuesday.

  • Duncan McCreery

    Hi Everyone,

    We wanted to take a moment to recap what happened last Tuesday and the steps that were taken to remedy the problem.  

    A few minutes after 11 a.m., we saw a few issues that affected the legacy CMS, advanced search, and a few other areas of the application. Shortly after noticing the problems, we addressed the problematic web server, which returned everything to normal. However, as we moved into the afternoon, the same issues reappeared and began affecting the entire application. At this time, we began working with the team at our data center, as there weren't any load average or memory issues, which would normally be the culprit for this type of problem.

    As the team investigated and troubleshot various potential causes, application performance would recover and then revert to sluggishness throughout the afternoon. From this analysis, we discovered that Tomcat worker threads (the functions within a server that execute tasks) on two of the application servers were at their maximum. The Tomcat web server configuration was adjusted to address the max threads problem, but by this time the load on the main Apache server could no longer be remedied by a restart.
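
    For readers unfamiliar with worker threads, the following is a minimal sketch of why a pool that is "at its maximum" produces slowness even when load average and memory look healthy. It is a toy model only, not MemberClicks's actual Tomcat configuration: the function name `simulate_pool` and the numbers used are hypothetical, and every request is assumed to take the same fixed time.

```python
import heapq


def simulate_pool(arrivals, service_time, max_threads):
    """Toy model of a fixed-size worker pool with a first-come, first-served queue.

    arrivals:     request arrival times, in seconds
    service_time: seconds each request occupies a worker (assumed constant)
    max_threads:  number of worker threads in the pool
    Returns the latency (finish time minus arrival time) of each request,
    in arrival order.
    """
    # Each entry is the time at which one worker next becomes free.
    free_at = [0.0] * max_threads
    heapq.heapify(free_at)

    latencies = []
    for arrival in sorted(arrivals):
        worker_free = heapq.heappop(free_at)   # earliest available worker
        start = max(arrival, worker_free)      # wait if every worker is busy
        finish = start + service_time
        heapq.heappush(free_at, finish)
        latencies.append(finish - arrival)
    return latencies


# Four requests arrive at once and each needs one second of work.
saturated = simulate_pool([0, 0, 0, 0], service_time=1.0, max_threads=2)
roomy = simulate_pool([0, 0, 0, 0], service_time=1.0, max_threads=4)
print("2 threads:", saturated)
print("4 threads:", roomy)
```

    In this model the requests that arrive after the pool is full spend their extra time waiting in the queue, not using CPU, which is consistent with slowness appearing without an obvious load or memory spike.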

    Ultimately, we added more CPU and memory to one of the web servers, restructured the SAN configuration, and executed a full restart of the virtual machine, all of which was completed shortly after 6 p.m. ET and returned performance to normal.

    In a broader sense, the problem last Tuesday was a result of growth and has expedited plans to bolster our infrastructure for our anticipated growth through 2013. Unfortunately, these updates were not in place in time to prevent the incident last Tuesday. However, over the next few weeks, the team will be rolling out infrastructure updates which will guard against similar interruptions in the future.

    We take our responsibility as your technology provider very seriously and we will continue to make reliability, stability and transparency our top priorities.  Thank you all for taking the time to read this and we are very sorry for the troubles caused by these problems.  

  • Bob Schilmoeller

    Thanks, Duncan, for the great detail. In a weird way it is comforting to know the problem was due to growth and not breakage. Growing pains can be difficult sometimes! I do want to add that I had some users contact me on Monday night saying that they were getting timeouts and were not able to submit forms. I don't know if our site issues were related to the ones you described above, but I wanted to pass that information along.

    Thanks again for the update!

  • Nichole Eichelberger

    Our engineers have made a series of configuration changes to our servers in the data center. We are seeing improvements in performance, but we will continue to monitor the system to make sure the product remains stable.

    More details will be available once a complete root cause analysis has been completed. 

    We can't apologize enough for the inconvenience this has caused you. We appreciate your patience. 

  • Eric Jacobs

    Is there any update on this? I've been trying to use the MemberClicks application all day, and performance has gone from passably slow to painfully slow to, right at this moment, virtually unusable. In the last 10 minutes, it's forgotten search results before it could fully load the next screen, and the home page now loads either incompletely or as text only -- and as I'm writing this, not at all. It would really help to get updates from the MemberClicks team until the problems are resolved.

    (I've suggested before, and reiterate now, that MemberClicks should have a separate simple website just to display whether the service is operating normally or provide status updates when it isn't.)

  • Nichole Eichelberger

    Our engineers are still working with our data center to get this issue resolved. We apologize for the inconvenience this is causing and we will continue to keep you updated when more information becomes available.

  • Bob Schilmoeller

    A big ME TOO on what Eric said! The detailed explanation at the end was good, but I want the constant barometer so I can determine if it is a known problem, etc.

  • Eric Jacobs

    Ditto on the thanks for the detailed follow-up -- and particularly the assurance that your engineering team has figured out what caused the problems and that you're making tangible changes to prevent recurrences.

    I do still think it would be helpful, in the event of a major problem like the one last week or smaller ones that develop and are resolved more quickly, to have a separate, simple website running on a different hosting service simply to display system performance/operation status. Normally, it might display nothing more detailed than "All MemberClicks systems are running normally". But when a general slowdown occurs, or a specific piece of your services (e.g. emails) goes south, someone would be tasked with posting a short explanatory message and updating it periodically until the end of the incident. The goal would not be a comprehensive report like Duncan's above -- that takes time, and might pull an engineer off diagnosing or solving the problem, which wouldn't be smart.

    I just want something to let me know, if I'm experiencing sluggishness with MemberClicks, that (a) it's not just me, (b) MemberClicks knows it and is working on it, and (c) if/when possible, how long the problem may persist. Without this, I waste time in a variety of ways: waiting and trying again; restarting my browser or even my computer; checking other websites to see if it seems to be an Internet connectivity issue; asking someone else to check it, ideally from a different office and network, to see if it's isolated to my location or not; and debating with myself whether to submit a ticket to the Help desk, knowing the odds are the problem will be resolved before anyone even reads it. Just this afternoon, the system was again performing sluggishly and I wrestled with trying to get something done for about 10 minutes while all these thoughts went through my mind; I stopped using MemberClicks for a bit, and when I came back to it later in the day, all was normal.

    I know that sometimes a traffic storm might cause a momentary slowdown that quickly resolves itself, but for anything of more than a few minutes' duration, a place to check the system status would be a great aid.
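
    To make the suggestion concrete, here is a rough sketch of how small such a status site could be, using only Python's standard library. Everything in it is an assumption for illustration: the names `set_status`, `render_status`, `StatusHandler`, and `make_server`, the in-memory storage, and the port are all hypothetical, and a real deployment would live on separate hosting as described above.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_MESSAGE = "All MemberClicks systems are running normally"

# Hypothetical in-memory status. Whoever is tasked with updates during an
# incident would call set_status() with a short explanatory message.
current_status = {"message": DEFAULT_MESSAGE, "updated": None}


def set_status(message, now=None):
    """Replace the displayed message and record when it was changed."""
    current_status["message"] = message
    current_status["updated"] = now or datetime.now(timezone.utc)


def render_status():
    """Build the plain-text page body from the current status."""
    lines = [current_status["message"]]
    if current_status["updated"] is not None:
        stamp = current_status["updated"].strftime("%Y-%m-%d %H:%M UTC")
        lines.append("Last updated: " + stamp)
    return "\n".join(lines) + "\n"


class StatusHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the current status as plain text."""

    def do_GET(self):
        body = render_status().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=8080):
    """Create (but do not start) the status server on the given port."""
    return HTTPServer(("", port), StatusHandler)


# To run it: make_server(8080).serve_forever()
```

    The page is deliberately plain text so it stays readable even when everything else is struggling, and updating it is a one-line call such as `set_status("Slowdown under investigation")`, which should not pull an engineer away from the actual fix.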

  • Bob Schilmoeller

    Did this issue exist last evening? I had reports from users that last night they got server timeouts and forms that would not submit...

  • Bob Schilmoeller

    +1.  Please don't keep us in the dark....

  • Scott McLeod

    Our engineers are still diligently working on the slowness issues that we have been experiencing this afternoon.

    We apologize for the inconvenience that this is causing and appreciate your patience as we work towards a solution.  
