1. Boundaries
1.1. Predict Boundaries
1.1.1. Machine Learning
1.1.1.1. BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End
1.1.2. Syntax
1.1.2.1. Influence of syntax on prosodic boundary prediction
1.1.2.2. A Computational Grammar of Discourse-Neutral Prosodic Phrasing in English
1.1.3. Length of Phrases
1.1.3.1. Tracking the what and why of speakers’ choices: Prosodic boundaries and the length of constituents
1.2. Psycholinguistic Effects
1.2.1. Recycling Prosodic Boundaries
1.2.2. Prosodic cues enhance rule learning by changing speech segmentation mechanisms
1.3. Function
1.3.1. Prosodic phrasing is central to language comprehension
1.3.2. Informative Prosodic Boundaries
2. General Prosody Modelling - Boundaries and Prominence, Intonational Contours, etc.
2.1. Deriving and utilizing rich representations in spoken language translation
2.2. The Tilt Intonation Model
2.3. SLAM
2.3.1. Automatic Modelling and Labelling of Speech Prosody: What’s New with SLAM+?
2.4. Cross-phrase Modeling
2.4.1. Fluent speech prosody: Framework and modeling
2.5. Detect prosodic events
2.5.1. Joint detection of sentence stress and phrase boundary for prosody
2.5.2. On the acoustic correlates of high and low nuclear pitch accents in American English
2.5.3. Automatic Prosodic Event Detection Using Acoustic, Lexical, and Syntactic Evidence
2.6. Effects
2.6.1. How prosody influences sentence comprehension
3. Machine Prosody Challenges
3.1. Speech, Prosody, and Machines: Nine Challenges for Prosody Research
4. Discourse/Larger Context
4.1. Using Previous Acoustic Context to Improve Text-to-Speech Synthesis
5. Subjective Differences in Prosody Interpretation
5.1. Empathy influences how listeners interpret intonation and meaning when words are ambiguous
6. Prominence
6.1. Predict Prominence
6.1.1. Informativity
6.1.1.1. Strongly dependent on context: "Accent is predictable (if you are a mind reader)"
6.1.1.2. Compound Nouns
6.1.1.3. Frequency Measures
6.1.1.4. Exploring features from natural language generation for prosody modeling
6.1.2. Lexical Features
6.1.2.1. To Memorize or to Predict: Prominence Labeling in Conversational Speech
6.1.2.1.1. Ani Nenkova: Prominence in conversational speech: pitch accent, contrast and givenness
6.1.3. Machine Learning
6.1.3.1. Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations
6.1.3.2. Detecting Pitch Accents at the Word, Syllable and Vowel Level
6.1.4. Constructions
6.1.4.1. The Stress and Structure of Modified Noun Phrases in English
6.2. Function
6.2.1. Accent and reference resolution in spoken-language comprehension
7. Language Models - Do They Improve Prosody in TTS?
7.1. Pre-trained Text Embeddings for Enhanced Text-to-Speech Synthesis - BERT
7.1.1. Samples
7.2. Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models
7.3. Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
7.4. What do they encode?
7.4.1. Similarity Analysis of Contextual Word Representation Models
7.4.2. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings
7.4.3. Discourse Probing of Pretrained Language Models
8. Cross-Language/Cross-Modality Prosody
8.1. Speech-to-speech translation
8.1.1. Preserving Word-level Emphasis in Speech-to-speech Translation using Linear Regression HSMMs
8.1.2. End-to-End Model for Cross-Lingual Transformation of Paralinguistic Information
8.2. Keystroke Prosody
8.2.1. Prosodic Boundaries in Writing: Evidence from a Keystroke Analysis
8.3. Gestures
8.3.1. Gestural Coordination at prosodic boundaries
8.4. Language Comparison
8.4.1. How Linguistic Prosody Shapes Meaning