You can also download data for offline use: full-text, word frequency

Overview of the corpus (PDF) (ES)

This corpus contains about two billion words of Spanish, taken from about two million web pages published in 21 different Spanish-speaking countries over the past three to four years. The corpus was funded by the US National Endowment for the Humanities (NEH), and it has allowed us to update the original Corpus del Español (2002), which was also funded by the NEH.

There are five main ways to search the corpus:

First, you can browse a frequency list of the top 40,000 words in the corpus, including searches by word form, part of speech, frequency ranges in the word list, and English translation. This should be particularly useful for language learners and teachers.

Second, you can search by individual word, and see definitions, synonyms, collocates, topics, concordance lines, and links to external resources for each of these words.

Third, you can input entire texts and then use data from the corpus to get detailed information on the words and phrases in the text.

Fourth, you can search for phrases and strings, using words, substrings, parts of speech, and even synonyms. Because the corpus is optimized for speed, searches for substrings (*ismo, des*r) and phrases are very fast, e.g.: se VERB (pret), COMPRAR * NOUN ADJ, NOUN "bonito". The same is true even of high-frequency phrases like: de NOUN a NOUN, VERB * NOUN, or NOUN de NOUN.
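To make the substring searches concrete, here is a minimal sketch of how a pattern such as *ismo or des*r selects word forms. It is only an illustration, not the corpus's own search engine: the function names and the sample word list are hypothetical, and it assumes that * stands for any run of zero or more letters inside a single word.

```python
import re

def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Turn a substring pattern like '*ismo' into a compiled regex.

    Assumption (not taken from the corpus documentation): '*' matches
    zero or more word characters within one word.
    """
    # Escape the literal pieces, then join them with r"\w*" for each '*'.
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(r"\w*".join(literal_parts), re.IGNORECASE)

def match_words(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole pattern, in input order."""
    regex = wildcard_to_regex(pattern)
    return [word for word in words if regex.fullmatch(word)]

# Hypothetical sample vocabulary, for illustration only.
sample = ["turismo", "realismo", "descubrir", "destruir", "desde", "mismo", "casa"]

print(match_words("*ismo", sample))   # ['turismo', 'realismo', 'mismo']
print(match_words("des*r", sample))   # ['descubrir', 'destruir']
```

Note that "desde" is not returned for des*r because the pattern must match the entire word, including the final r.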

Finally, you can find random words and browse through randomly selected "Words of the Day", then save new words and come back to review them later.


Click on any of the links in the search form on the search page for context-sensitive help, and to see the range of queries that the corpus offers. You might pay special attention to the comparisons between dialects, and to the virtual corpora, which allow you to create personalized collections of texts related to a particular area of interest.


Detailed help files:
Examining variation between the dialects
Corpus size (100x as much data for Modern Spanish as the original CdE)
Comparison to other corpora: CORPES (RAE) and larger corpora