Visualising Unstructured Data

1. Gary Lau

1.1. Senior Business Analyst MQ

1.2. Analytics

1.3. 10+ years dealing with Business Questions

1.4. "Mental Models are made to be Broken"

1.5. "Bringing Meaning to Madness"

2. Talk

2.1. MQ Analytics Journey

2.2. BUZZ ME!

3. Resources

3.1. Blogs

3.1.1. Gary's Work in Process

3.1.2. Kalev

3.1.3. Data Scientist Blog

3.1.4. Guardian Data Blog

3.2. Unstructured Data

3.2.1. Semantic Web

3.2.2. Sound To Text

3.2.3. Book: Search User Interfaces

3.2.4. Quora

3.2.5. Natural Language Toolkit

3.3. Visualisation

3.3.1. GNUPlot

3.3.2. Snake Oil

3.3.3. Tools for Visualisation

3.3.4. Stanford Visualisation Group

3.3.5. Stanford University

3.3.6. Tools for Data Visualisation

3.3.7. Touch Graph

3.3.8. Designing Data Visualizations

3.3.9. Gephi

3.3.10. R Resource

3.3.11. Flowing Data American Spending

3.3.12. SBS ABS 2012 Census Visualisation

3.3.13. Financial Markets

3.3.14. Free Visualisations

3.3.15. ITO

3.3.16. Jeff

4. How

4.1. "Computers can only return results, since they have lots of memory and no imagination." - Gary Lau

4.2. Computer Processing

4.2.1. Statistics - Insight Without Understanding

4.2.1.1. Word Cloud Generator: word frequency

4.2.1.2. Tree Cloud: word proximity

4.2.1.3. Word Tree: theme

4.2.1.4. Leximancer: Portal, Company, Technology Evaluation
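
A minimal sketch of the two statistics named above: word frequency (what a word cloud sizes its words by) and word proximity (what a tree cloud arranges its words by). The sample text, the stop-word list, and the window size are assumptions chosen for illustration; the tools listed here may compute these measures differently.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "was", "but"}

def tokens(text):
    """Lowercase the text and keep words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def word_frequency(text):
    """Count how often each word occurs (the basis of a word cloud)."""
    return Counter(tokens(text))

def word_proximity(text, window=2):
    """Count unordered word pairs that occur within `window` words of each other."""
    words = tokens(text)
    pairs = Counter()
    for i, left in enumerate(words):
        for right in words[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs

# Made-up survey comment, used only to exercise the functions.
sample = "The lecture was great but the parking was bad. Great lecture, bad parking."
print(word_frequency(sample).most_common(3))
print(word_proximity(sample).most_common(3))
```

Both functions return counts only. Neither knows that "great" is praise and "bad" is a complaint, which is the sense in which statistics give insight without understanding.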

4.2.2. NLP - Natural Language Processing

4.2.2.1. Semantics: Meaning, typically contextually dependent and hinted at by symbols

4.2.2.2. Terminology

4.2.2.2.1. Tokenization: Identification of distinct elements, e.g., words, punctuation marks

4.2.2.2.2. Stemming: Reducing word variants (conjugation, declension, case, pluralization) to bases

4.2.2.2.3. Named entities: people, companies, places, etc.

4.2.2.2.4. Co-reference: Multiple expressions that describe the same thing

4.2.2.2.5. Pattern-based entities: e-mail addresses, phone numbers, etc.

4.2.2.2.6. Concepts: abstractions of entities

4.2.2.2.7. Concrete and abstract attributes (e.g., 10-year, expensive, comfortable)

4.2.2.2.8. Taxonomy: An exhaustive, hierarchical categorization of entities and concepts, either specified or generated by clustering

4.2.2.2.9. Ontology: In practice, a classification of a set of items in a way that represents knowledge

4.2.2.2.10. Subjectivity in the forms of opinions, sentiments, and emotions: attitudinal data

4.2.2.3. Limitations

4.2.2.3.1. Ambiguity

4.2.2.3.2. Data preparation
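
Three of the terms above (tokenization, stemming, pattern-based entities) can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not how any listed product works: the suffix list is a crude stand-in for a real stemmer, the e-mail and phone patterns are deliberately simple, and the sample sentence is made up.

```python
import re

def tokenize(text):
    """Tokenization: split text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

# Hypothetical suffix list; a real stemmer applies many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Stemming: strip one common suffix, keeping at least three characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower

# Simplified patterns (assumptions); they will miss some valid real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s-]{6,}\d")

def pattern_entities(text):
    """Pattern-based entities: pull e-mail addresses and phone numbers out of text."""
    return {"emails": EMAIL.findall(text), "phones": PHONE.findall(text)}

# Made-up sentence, used only to exercise the functions.
sample = "Students emailed help@example.edu or called 02 5550 1234 about enrolling."
print(tokenize("Enrolling was easy, thanks!"))
print([naive_stem(t) for t in ["enrolling", "emailed", "courses"]])
print(pattern_entities(sample))
```

The naive stemmer also shows the limitations listed above: it turns "courses" into "cours", and it cannot tell which sense of an ambiguous word is meant.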

5. What

5.1. Unstructured Data

5.1.1. What does it look like?

5.1.1.1. Square

5.1.1.2. Spider web

5.1.2. Largely Synonymous with "Text Data"

5.1.2.1. Sensations

5.1.2.2. Thoughts

5.1.2.3. Feelings

5.1.2.4. Actions

5.1.2.5. Sound & Images, i.e. videos

5.1.3. Key difference

5.1.3.1. Dimensionality

5.1.3.2. 80% of all data

5.1.4. Examples

5.1.4.1. Flesh Map

5.1.4.2. Visuwords

5.1.4.3. Tweeterverse

5.1.4.4. Anymails

5.2. Visualisation

5.2.1. Visual Cognitive Psychology

5.2.1.1. information overload

5.2.1.2. natural ability to group, compare, correlate, detect patterns and outliers

5.2.2. Examples US Debt Dashboard Madness Visual Context Sentiment Exploration Explanation and Action Data Driven Documents

5.3. 1+1= Power to Fight Madness

5.3.1. US Supermarket Sentiment

5.3.2. Combined with data mining, it is 1+1+1

6. Who

6.1. "Text analytics-based searches for information provide non-definitive results. I don't know what to do" - Boss?

6.2. Decision makers

6.2.1. Internal

6.2.1.1. Existing Students

6.2.1.2. Existing Staff

6.2.1.3. Research

6.2.2. External

6.2.2.1. Prospective Students

6.2.2.2. Alumni

6.3. Data Scientists

6.3.1. Qualities

6.3.1.1. no substitute for verbal communication and good old-fashioned face time

6.3.1.2. training to reduce "alienation"

6.3.2. Environment

6.3.2.1. culture that both thrives on and acts on customer feedback

6.4. Vendors

6.4.1. Evaluate

6.4.1.1. Support and Training

6.4.1.2. Level of accuracy

6.4.1.3. Integration into BI Reporting

6.4.1.4. Integration into Data Mining

6.4.1.5. Ingestion Capability

6.4.1.6. Language Support

6.4.1.7. Visualisation

6.4.2. Market

6.4.2.1. Academia: Leximancer, NVivo, RapidMiner

6.4.2.2. Industrial: SAS Enterprise Miner, Attivio, IBM Content Analytics, IBM SPSS Modeler, Oracle Endeca Information Discovery, Cognito, DiscoverText API

6.4.2.3. Open Source: Processing, GATE (General Architecture for Text Engineering), R, MALLET (machine learning toolkit)

6.5. Commentators

6.5.1. Seth Grimes

6.5.1.1. Introducing Text Analytics

6.5.1.2. “Large companies lack the agility to respond effectively and in a timely manner to the social media challenge. There is lots of room for innovation from small players.”

6.5.1.3. "Text analytics may move into databases and ETL tools"

6.5.1.4. Published: Text Analytics in 2009 (Application, Data Source, Information To Extract, Satisfaction)

7. When

7.1. Early Stages

7.1.1. Leverage internal skills

7.1.2. Tools

7.1.2.1. Easy to Use, Interpret and Trust

7.1.2.2. Low start-up cost

7.1.3. Sweet spot

7.1.4. Business Champions

7.1.5. Quick wins

7.1.6. No formal project or budget - raise CAPEX 2013

7.2. Late Stages

7.2.1. Wall between structured and unstructured data crumbling

7.2.1.1. Integration into BI and Data Mining

7.2.2. Automatic ingestion and reporting

7.2.3. Build domain expertise

7.2.4. Custom build Visualisations

8. Why

8.1. Voice of Customer / Student / Staff

8.1.1. Competitive Advantage

8.1.2. Research

8.1.3. Wellness Engine

8.1.3.1. Student Success and Risk Factors

8.1.4. Virtuous Circle

8.2. Government Funding Criteria

8.2.1. 2012

8.2.1.1. plan to introduce performance-based funding as part of quality assurance arrangements

8.2.1.2. shift from voluntary to mandatory use of surveys, with the results used to assess and reward academic staff

8.2.1.3. performance measures: student attainment of generic skills and learning outcomes; student retention and progression; student satisfaction

8.2.2. delay

8.2.2.1. did not occur due to budgetary constraints

8.2.2.2. concerns that performance-based funding has discouraged diversity, student access and opportunity

8.3. Failure

8.3.1. Vodafail The Musical

8.3.1.1. ROI: Reduce Churn

8.3.2. Intuit: accounting software website

8.3.2.1. 50% increase in user self-service

8.3.2.2. shrank call center support requests

8.3.2.3. ROI: < 1 year

8.4. "Insanity: doing the same thing over and over again and expecting different results."

8.4.1. Albert Einstein

8.4.2. "Love: doing disappointing things and having the same feeling" - Gary Lau

9. Where

9.1. Data

9.1.1. Student Survey

9.1.1.1. First Year Experience Survey (FYES)

9.1.1.2. Course Experience Questionnaire (CEQ)

9.1.1.3. University Experience Survey (UES)

9.1.1.4. Higher Degree Research Students Annual Survey (MUSEQ-R)

9.1.1.5. Collegiate Learning Assessment (CLA): may supplement CEQ

9.1.1.6. Australian Graduate Survey (AGS): Best Aspects, Needs Improving

9.1.1.7. Beyond Graduation Survey (BGS)

9.1.1.8. CeQuery Tool: access database last updated in 2005

9.1.2. Staff

9.1.2.1. Personal Development Review (PDR)

9.1.3. Research Social Sciences Publications

9.1.4. Content Management Systems

9.1.5. Legal Contracts

9.1.5.1. When do they expire?

9.1.5.2. Which are the high-value contracts?