Post Outage Discussion


1. What can you do to improve?

1.1. Work with SysAid to increase Server Monitoring to aid in quicker response

1.2. Develop an alert system. Notifications for critical server problems could be sent to our cell phones.

1.2.1. Great idea. Can someone look into which options are available to accomplish this? Compare tool types, features, options, verticals, educational pricing, etc.

1.3. Should all production servers be separated from development?

1.4. Create a protocol for critical situations

1.4.1. Create official procedures and rules

1.4.2. Who backs up whom/what

1.4.2.1. Cross-training
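
A minimal sketch of the alerting idea in 1.2, assuming a plain TCP reachability check is an acceptable first signal that a server is down. The names `is_reachable` and `check_targets`, and the `notify` callback, are hypothetical and do not come from SysAid or any other tool mentioned here. How the alert reaches a cell phone (SMS gateway, paging service, email-to-text) is deliberately left to whatever `notify` function is passed in.

```python
import socket
from typing import Callable, Iterable, List, Tuple

Target = Tuple[str, int]  # (host, TCP port); hypothetical naming


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_targets(
    targets: Iterable[Target],
    notify: Callable[[str], None],
    probe: Callable[[str, int], bool] = is_reachable,
) -> List[Target]:
    """Probe each target, call notify once per failure, return the failures."""
    down: List[Target] = []
    for host, port in targets:
        if not probe(host, port):
            down.append((host, port))
            notify(f"CRITICAL: {host}:{port} is not responding")
    return down
```

Run on a schedule (for example every minute from cron or Task Scheduler) with a list of critical servers and a `notify` function that sends the message to on-call phones. A real monitoring product would add retries, escalation, and alert de-duplication, which this sketch omits.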

2. How well did we do as a team?

2.1. The team did as well as possible. I felt extremely overwhelmed during the outage, but the team stepped up and helped.

2.2. Internal communication within the IT team was good. I believe all teammates received updates on what was going on during the outage.

2.3. Managed frustrations and stress very well

3. Where can we improve?

3.1. More communication to the Help Desk team, especially during the initial outage, to let them know what is down.

3.2. Add system monitoring to alert us to outages like this one more quickly

3.3. Customer alerting. We need a faster mechanism for notifying customers... sorry, partners.

3.4. Send a few updates to customers during the outage

3.5. Possibly split up file server into smaller VMs for easier backup, recovery, and migration

3.6. Look into VM Replication for critical services if possible

3.7. We need a good local onsite repository for backups. I was using cluster storage for backup data, which proved to be unsound.

4. How well did Juan perform as a leader?

4.1. Great as far as I'm concerned. He was there when I needed help.

4.2. We received clear and correct direction to resolve this issue.


5. How can Juan help you?

5.1. Keep the dew a flowin'

5.2. Be clear about roles during the issue

5.2.1. For example: who is fixing the server, who will be sending out updates, who will be on standby for testing, etc.

5.2.1.1. Good point. We can work on this.

5.2.2. Assign or clarify tasks

5.3. Decision-making

5.3.1. This is a general point. When we have several options for fixing an issue and cannot decide which one is best practice, your decision will be helpful.