When a DS project fails


1. Intro

1.1. Most DS projects are high-risk ventures

1.2. At its core, data science is R&D

1.2.1. Impossible for these tasks to always succeed

1.2.2. New trends and signals are very rarely found in any field

1.3. You're trying to predict something no one has predicted before

1.4. We must grapple with our ideas not succeeding

2. What is failure

2.1. When the project doesn't meet its objective

2.2. Note: It happens extremely often

3. What to do when a project fails

3.1. Document lessons learned

3.1.1. Why did it fail?

3.1.2. What could have been done to prevent failure?

3.1.3. What did you learn about the data and the problem?

3.2. Consider pivoting the project

3.2.1. Requires a lot of communication with stakeholders and customers

3.2.2. Essentially back at the beginning of product design process

3.3. End the project (cut and run)

3.3.1. If you can't pivot, end it

3.3.2. Allow yourself and your team to move on to new, more promising work

3.3.3. Don't keep working on a project forever in the hope that someday it'll work

3.4. Communicate with your stakeholders

3.4.1. Increase the amount of communication if the project is failing

3.4.2. Be transparent; this will inspire trust

3.4.3. After communicating, you may feel relief, not suffering

4. Handling negative emotions

4.1. Failure often isn't a sign that you're a bad data scientist

4.2. Most DS projects fail because data science is inherently based on trying things that might never work

4.3. Use the right mental model: You're like a treasure hunter

4.3.1. Once in a while, models and analyses work

4.3.2. Everyone is continuously failing; it's just part of the job

5. Why projects fail

5.1. Data isn't what you wanted

5.1.1. Many ways a dataset could have problems

5.1.1.1. Extremely difficult to check them all before starting project

5.1.2. Avoiding this error

5.1.2.1. Best-case scenario: get samples of data before starting project

5.1.2.2. Next-best scenario: Design the project timeline to allow for the data turning out to be poor

5.1.3. What to do if this happens

5.1.3.1. Limited options

5.1.3.2. Alternative data sources may exist

5.1.3.3. But these are usually different enough to cause real analysis problems

5.1.3.4. You may want to try to engineer around data holes

5.1.3.4.1. This sometimes works, but the alternative solutions aren't always adequate
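
As a rough illustration of checking a data sample before committing to a project (5.1.2.1), the sketch below counts how often required fields are missing. The function name `audit_sample`, the field names, and the sample rows are all hypothetical, and a real audit would cover many more problems than missing values.

```python
from collections import Counter


def audit_sample(rows, required_fields):
    """Return, per required field, the fraction of rows where it is missing or empty."""
    missing = Counter()
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            if value is None or value == "":
                missing[field] += 1
    total = len(rows)
    # With no rows at all, treat every field as fully missing.
    return {
        field: (missing[field] / total if total else 1.0)
        for field in required_fields
    }


# Hypothetical sample of customer records, invented for illustration only.
sample = [
    {"customer_id": 1, "signup_date": "2021-01-04", "churned": 0},
    {"customer_id": 2, "signup_date": "", "churned": 1},
    {"customer_id": 3, "signup_date": "2021-02-11", "churned": None},
    {"customer_id": 4, "signup_date": None, "churned": 0},
]

report = audit_sample(sample, ["customer_id", "signup_date", "churned"])
for field, rate in report.items():
    print(f"{field}: {rate:.0%} missing")
```

A high missing rate on a field the project depends on is an early warning that the timeline or the scope may need to change.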

5.2. Data doesn't have a signal

5.2.1. Example: a recording of 10,000 rolls of a die in a casino (past rolls carry no information about the next one)

5.2.2. Extremely common in DS

5.2.3. No way to know before starting a project

5.2.4. Can be the end of the project

5.2.5. Possible ways out

5.2.5.1. Reframe the problem to see whether a different signal exists

5.2.5.2. Try changing the data source

5.2.5.2.1. Limited odds of this succeeding

5.2.6. Don't try a complex method

5.2.6.1. If the simplest method cannot detect signal, the more complex ones won't be able to either

5.2.6.2. They can't make something out of nothing
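
The die example in 5.2.1 can be simulated to show what "no signal" looks like in practice. This is a minimal sketch that assumes a fair die and uses a very simple predictor (guess the roll that most often followed the previous roll in the training half). The function names are hypothetical. If even this baseline lands at chance level on held-out rolls, that is a hint to reframe the problem rather than reach for a more complex model.

```python
import random
from collections import Counter, defaultdict


def simulate_rolls(n, seed=0):
    """Simulate n rolls of a fair six-sided die (an assumption of this sketch)."""
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


def fit_successor_table(rolls):
    """For each face, record the face that most often followed it."""
    successors = defaultdict(Counter)
    for prev, nxt in zip(rolls, rolls[1:]):
        successors[prev][nxt] += 1
    return {prev: counts.most_common(1)[0][0] for prev, counts in successors.items()}


def accuracy(table, rolls, default=1):
    """Fraction of rolls correctly predicted from the roll before them."""
    pairs = list(zip(rolls, rolls[1:]))
    hits = sum(1 for prev, nxt in pairs if table.get(prev, default) == nxt)
    return hits / len(pairs)


rolls = simulate_rolls(10_000)
train, test = rolls[:5_000], rolls[5_000:]

table = fit_successor_table(train)
baseline_accuracy = accuracy(table, test)
chance = 1 / 6

print(f"baseline accuracy on held-out rolls: {baseline_accuracy:.3f}")
print(f"chance level: {chance:.3f}")
```

The held-out accuracy should sit close to the chance level of 1/6, which is the expected result when the data contains nothing to learn.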

5.3. Customer didn't end up wanting it

5.3.1. Spend lots of time talking to and working with customers

5.3.2. Understand their true problems

5.3.3. Don't jump into building interesting models and exploring data

5.3.4. Have an MVP and iterate to refine

6. Managing risk

6.1. Some projects are riskier than others

6.1.1. Known data, building a dashboard: Less risky

6.1.2. New dataset, real-time ML model, new UI: Riskier

6.2. Mitigating it

6.2.1. Have a basket of projects in flight

6.2.1.1. If one fails, you can fall back on others

6.2.2. Bake in early stopping points

6.2.2.1. Set expectation that if by a certain point it isn't successful, it'll be cut off

6.2.2.2. Ending the project this way is less surprising and less costly
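
The two mitigations above can be made concrete with a little arithmetic. The sketch below assumes projects succeed independently of each other, which is a simplification, and uses made-up success probabilities. Under that assumption, the chance that at least one project in the basket succeeds is 1 minus the product of the individual failure probabilities. The `should_stop` helper is a hypothetical way to encode a stopping point agreed with stakeholders in advance.

```python
from math import prod


def prob_at_least_one_success(success_probs):
    """P(at least one success), assuming the projects are independent."""
    return 1 - prod(1 - p for p in success_probs)


def should_stop(weeks_elapsed, metric, checkpoint_week, target):
    """True once the agreed checkpoint has passed and the target is still unmet."""
    return weeks_elapsed >= checkpoint_week and metric < target


# Hypothetical numbers for illustration only.
single_project = prob_at_least_one_success([0.3])
basket = prob_at_least_one_success([0.3, 0.3, 0.3])
print(f"one project: {single_project:.3f}, basket of three: {basket:.3f}")

# Hypothetical checkpoint: by week 6 the model should reach 0.70 on the agreed metric.
print(should_stop(weeks_elapsed=6, metric=0.62, checkpoint_week=6, target=0.70))
print(should_stop(weeks_elapsed=4, metric=0.62, checkpoint_week=6, target=0.70))
```

Writing the checkpoint down as an explicit rule is what makes ending the project unsurprising: everyone agreed to the threshold before the results came in.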

6.3. But don't stop taking risks

6.3.1. DS teams can fail in aggregate by not taking enough risks