All Activity Data Projects
by Ona Sumner
1. Data Release / Open Data
1.1. Open release of search log data (we hope) (RISE)
1.2. An aggregation of shared OpenURL router data for others to use (Shared OpenURL Router)
1.3. Ontologies ? and for recognising ? ? ? (UCIAD)
2. Anonymity
2.1. IP address encryption - symmetric shared key on servers (AEIOU)
2.2. Use of MOSAIC scripts to anonymise data (RISE)
2.3. How to anonymise (MOSAIC)
2.4. Exposing IP addresses in activity data (AEIOU)
2.5. Need to build code in a way that enables others to reuse (RISE)
3. Authentication and Tracking Users
3.1. Tracking users for recommendations across institutional repositories (AEIOU)
3.2. Authentication within Google gadget (RISE)
3.3. Tools to use for storing and indexing (MOSAIC)
3.4. How to create a system based on links within sessions that are meaningful enough for recommendations - many small clusters of links may not then link up (Shared OpenURL Router)
3.5. How long is a session, so that activity data can be analysed across that timescale? (Shared OpenURL Router)
3.6. How to identify "sessions" in a large body of data in the log for a) non-proxy data b) proxy data (Shared OpenURL Router)
3.7. Experience of outsourced developments (STAR_TRAK)
3.8. Tracking users across repositories - cookies (AEIOU)
4. APIs
4.1. APIs / web services (STAR_TRAK)
4.2. API onto JRUL activity data (SALT)
5. Data Capture
5.1. Experience in making activity types / patterns and identification patterns emerge from heterogeneous logs
5.2. Sharing of EZproxy & OpenURL data with EDINA & Huddersfield + possible interchange standard coming out of it (RISE)
5.3. Getting the data from each partner, e.g. from their systems (LIDP)
5.4. How to get data from others a) from their OpenURL Resolver logs? b) without being illegal? (Shared OpenURL Router)
5.5. Metadata standard (schema) for activity data? (AEIOU)
5.6. Data schema (understand): what does it tell us? (STAR_TRAK_NG)
5.7. Accurately recognising users across logs - is IP plus user agent sufficient? (UCIAD)
5.8. Data extraction from LMS (SALT)
6. Synthesis
6.1. Sharing best practice (LIDP)
6.2. Technical how-tos (STAR_TRAK)
7. Tools
7.1. Google gadget to search EBSCO discovery services (RISE)
8. Fun / engaging outputs
8.1. Produce many 'fun' interactive web-based information visualisations inc. evolved systems (AGtivity)
8.2. Visualisation tools (recommendations) (STAR_TRAK)
8.3. Finding good visualisations for the data available (EVAD)
9. Scalability
9.1. Solutions to managing large amounts of data. (Well, not large compared to Physics!) (Shared OpenURL Router)
9.2. Experience in loading huge amounts of activity data onto a triple store and reasoning with it (UCIAD)
9.3. Scalability of storing, analysing and implementing large amounts of traces of activity (in our case, as RDF) (UCIAD)
9.4. Potential performance constraints and growth of database (RISE)
9.5. Scale - too much data for the length of the project (LIDP)
10. Algorithms
10.1. Pluggable architecture to handle different log formats into a ? scheme / ontology
10.2. Analysis models for other institutions (EVAD)
10.3. A recommender service prototype (Shared OpenURL Router)
10.4. Faceted search for loans by people like me (MOSAIC)
10.5. Potential solution to 'this article is like that article' - linking up with EDINA work (RISE)
10.6. 'This article is like that article' (RISE)
10.7. Identifying 'related' items in recommendation service (AEIOU)
10.8. Data capture (attendance) RFID/OCR (STAR_TRAK)
10.9. How to compile traces of activities that are the same but are captured from different systems (UCIAD)
10.10. Aggregating data from various sources (SALT)
10.11. Uncertainty in classifying tests / trivial events (activities) with logs, i.e. removing 'outliers' (AGtivity)
10.12. Can ontology alone ? recognise ? (unlikely) and what more is needed? (UCIAD)
10.13. Dealing with missing data for parts of logs (AGtivity)
10.14. Correlating various time stamps between machine ? - issues in faulty time stamps (AGtivity)