Mind Map: Corpus DWDS

1. Copyright issues

1.1. text selection = independent of copyright status

1.1.1. most of the important text = copyrighted

1.1.1.1. convince authors and copyright holders

1.2. committee of prominent public personalities

1.2.1. lent authority to the project's negotiations with publishing houses

1.3. acquisition of copyright-protected texts for the DWDS Kerncorpus = several levels

1.3.1. permission to use texts project-internally for lexicographic work

1.3.1.1. DWDS Kerncorpus = publicly available as a resource

1.4. compromise

1.4.1. Kerncorpus -> published on the web

1.4.1.1. precious resource for a wide spectrum of linguistic and lexicographic research

1.4.2. project has to ensure that no copyright is violated

1.4.3. technical devices = basis for negotiations

1.4.3.1. procedures applied

1.4.3.1.1. DWDS Kerncorpus = only selected samples

1.4.3.1.2. flexible mechanism to display query results

1.4.3.1.3. password protection for copyrighted texts

1.4.3.1.4. anonymization of named entities
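
The display-related procedures above (flexible display of query results, password protection, anonymization of named entities) can be illustrated with a minimal sketch. It is not the DWDS implementation: the function names, the `authorized` flag standing in for a password check, the character-based context window, the `[NAME]` placeholder, and the assumption that named entities arrive as a ready-made list are all hypothetical choices made for illustration.

```python
import re


def anonymize(text, entities, placeholder="[NAME]"):
    """Replace every listed named entity with a placeholder (hypothetical scheme)."""
    # Longest names first, so "Anna Schmidt" is replaced before "Anna".
    for name in sorted(entities, key=len, reverse=True):
        text = re.sub(re.escape(name), placeholder, text)
    return text


def render_hit(text, keyword, copyrighted, authorized, entities=(), window=15):
    """Return the form in which a query hit is shown to the user.

    Assumption: free texts and authorized (password-holding) users get the
    full sentence; everyone else gets an anonymized snippet of at most
    `window` characters on each side of the keyword.
    """
    if not copyrighted or authorized:
        return text
    masked = anonymize(text, entities)
    pos = masked.find(keyword)
    if pos < 0:
        # The keyword itself was anonymized away: show nothing.
        return ""
    lo = max(0, pos - window)
    hi = min(len(masked), pos + len(keyword) + window)
    return masked[lo:hi]


sentence = "Anna Schmidt kaufte gestern ein neues Haus in Berlin."
names = ("Anna Schmidt", "Berlin")
print(render_hit(sentence, "Haus", copyrighted=True, authorized=False,
                 entities=names, window=10))
```

Anonymizing before cutting the snippet keeps a name that straddles the window boundary from leaking in truncated form.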

1.5. copyright holders for electronic rights

1.5.1. not the publisher but the author

1.5.1.1. publishers have to ask the author

1.6. Negotiations = slow

1.7. 1st breakthrough = agreement reached with Suhrkamp

1.7.1. disseminate texts freely

1.7.2. granted permission to use the texts of 22 authors

1.8. January 2006

1.8.1. 15 publishing houses

1.8.1.1. 71% of the texts of the DWDS Kerncorpus publicly available via DWDS web site

1.8.2. remaining 29% = available only for internal use

2. Cynthia LAROQUE - L3 SDL

3. Berlin-Brandenburg Academy of Sciences (BBAW)

4. Motivations

4.1. no recent dictionary of the German language

4.1.1. add recent developments in the current language

4.2. traditional print dictionaries = alphabetical order, structured into individual and discrete entries

4.2.1. DWDS corpus new order = lexical categories, types of syntactic and lexical fields

4.3. traditional dictionaries do not form a balanced corpus of German

4.3.1. challenge of DWDS corpus = filter out interesting words and word senses from a large mass of data

5. Constructed between 2000 and 2003

5.1. still an activity at BBAW

5.2. two corpora were compiled at the BBAW

5.2.1. a balanced core corpus = Kerncorpus

5.2.1.1. reference for the German language of the 20th century

5.2.2. an opportunistic supplementary corpus = Ergänzungscorpus