When we collected the two million web pages, we relied on Google's identification of the country that each page came from. This identification is harder when a site uses a generic domain such as .com (e.g. www.felicidad.com) rather than a country domain, and one might wonder how Google could tell which country such a page was from. To test how well Google did, we looked up a number of words and constructions that John Lipski's Latin American Spanish (supplemented by other resources on the Web) describes as more common in a particular country or region. These words and phrases do in fact appear much more frequently in the expected country, which suggests that Google's country classification is quite good. In the lists below, a parenthetical such as (+DR) marks an item that is also common in another country or region (here the Dominican Republic).
Lexical
Caribbean
Puerto Rico: ay bendito, chavos, chiringa, mahones, habichuela (+DR), zafacón (+DR)
Cuba: guajiro, jimaguas, babalao, bitongo, pedir botella
Rep Dom: mangú, fucú, tutumpote, mangulina, mofongo (+PR)

México and Central America
México: ándale, híjole, órale, güero, (muy) padre, chamaco (CAm/Car), pinche (NOUN), popote, charola
Guatemala: huipil, canche, muchá, patojo, chafa (+HN), chirmol
El Salvador: cipote, chero, pupusa, cuilio, bayunco, piscucha
Honduras: catracho, papada
Nicaragua: chavalo, maje (+CAm), pinol, pinolillo, chigüín, vigorón, gallo pinto (+CR), idiay (+CR)
Panamá: fulo, chombo, guandul
Costa Rica: chinear, guila, chunche

South America
Colombia: cachaco, cachifo, verraquera, estar mamado, guandoca, biche
Venezuela: bojote, coroto, catire, gafo, macundales, arepa, cachapa, cambur, caraotas, jojoto
Ecuador: chumar, chulla, montuvio, omoto
Perú: anticucho, jebe, chupe, pisco, jora, chompa (+CL/EC), choclo (+CL/EC)
Bolivia: opa, colla, chuño, lagua
Chile: pololo*, pololear, achuntar, bencina, bacán, fome, huaso
Paraguay: ñembo, ñanduti, karai, yopará, mitai
Uruguay: tropero, hacer * sota, con fritas
Argentina: pibe, fiaca, morfar, falopa, sobre el pucho, falluto, cafishio

España: ordenador, aparcar, enfadar, gafas, zumo, chulo, guay, coger, bolígrafo, patata, melocotón, echar de menos, vale
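One simple way to make "appear much more frequently in that country" concrete is sketched below. Because the country sub-corpora differ in size, raw counts would favor the largest ones, so the sketch first normalizes each count to occurrences per million words and then compares the top country with the mean rate of all the others. The function names, the ratio measure, and the counts in the usage example are hypothetical illustrations, not the procedure or data actually used for this corpus.

```python
from typing import Dict, Tuple


def per_million(hits: int, corpus_size: int) -> float:
    """Normalize a raw hit count to occurrences per million words."""
    return hits * 1_000_000 / corpus_size


def peak_country(hits: Dict[str, int], sizes: Dict[str, int]) -> Tuple[str, float]:
    """Hypothetical helper: return the country where the word is most frequent
    (per million words) and how many times higher that rate is than the mean
    rate of all other countries."""
    rates = {c: per_million(hits.get(c, 0), size) for c, size in sizes.items()}
    best = max(rates, key=rates.get)
    others = [r for c, r in rates.items() if c != best]
    mean_others = sum(others) / len(others) if others else 0.0
    ratio = rates[best] / mean_others if mean_others > 0 else float("inf")
    return best, ratio


# Invented sub-corpus sizes and counts for one word, for illustration only.
# MX has the most raw hits, but PR has by far the highest normalized rate.
sizes = {"PR": 1_000_000, "MX": 20_000_000, "AR": 5_000_000}
hits = {"PR": 150, "MX": 200, "AR": 25}
print(peak_country(hits, sizes))  # ('PR', 20.0)
```

The mean-of-others baseline is only one choice; a significance test such as chi-square or log-likelihood would be a more principled way to decide whether a difference is real.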
Note that the corpus often shows a word or phrase to be more common across an entire region rather than in just one country. For example, the following words are more frequent in Central America as a whole: chele, guaro, estar bolo, chimar, chingo, chompipe, tiste, molote, chichipate, barrilete, pisto (+HN/SV). The following are more frequent in Argentina and Uruguay: che !, laburo, lunfardo.
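Region-level patterns like these can be checked in the same way by pooling the country sub-corpora before normalizing. The sketch below assumes a hypothetical mapping from country codes to regions (only a few countries are shown) and uses invented counts. Pooling weights each country by the size of its sub-corpus; averaging the per-country rates instead would give every country equal weight.

```python
from collections import defaultdict
from typing import Dict, Mapping

# Hypothetical country-to-region mapping; only a few countries are shown.
REGION_OF: Dict[str, str] = {
    "GT": "Central America", "SV": "Central America", "HN": "Central America",
    "NI": "Central America", "CR": "Central America", "PA": "Central America",
    "AR": "Argentina/Uruguay", "UY": "Argentina/Uruguay",
    "MX": "Mexico", "ES": "Spain",
}


def regional_rates(
    hits: Mapping[str, int],
    sizes: Mapping[str, int],
    region_of: Mapping[str, str] = REGION_OF,
) -> Dict[str, float]:
    """Hypothetical helper: pool hits and sub-corpus sizes by region,
    then return each region's frequency in hits per million words."""
    pooled_hits: Dict[str, int] = defaultdict(int)
    pooled_size: Dict[str, int] = defaultdict(int)
    for country, size in sizes.items():
        region = region_of[country]
        pooled_hits[region] += hits.get(country, 0)
        pooled_size[region] += size
    return {r: pooled_hits[r] * 1_000_000 / pooled_size[r] for r in pooled_size}


# Invented counts, for illustration only.
sizes = {"GT": 1_000_000, "SV": 1_000_000, "MX": 4_000_000}
hits = {"GT": 40, "SV": 60, "MX": 8}
print(regional_rates(hits, sizes))  # {'Central America': 50.0, 'Mexico': 2.0}
```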
Syntactic and morphological
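The corpus can of course also be used to look at syntactic and morphological differences between dialects. In the patterns below, VERB, NOUN, ADJ, ART, POSS, PREP, and SUBJ stand for any word of that category, GUSTAR stands for any form of gustar, and | separates alternatives. The following are just a few examples, each with a short sample and the country or zone in which it is most common: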
qué tú VERB (¿qué tú quieres?): Carib
PREP SUBJ VERB (para ella entender): Carib
más nada .|, : Carib
ART POSS NOUN (una mi amiga): GT
mero VERB: GT
te [v*2s*] tu NOUN (te rompiste tu pierna): MX
vos sos (voseo): Cono Sur, CAm
tenéis (vosotros): ES
la|las GUSTAR (laísmo; la gusta el chocolate): ES
qué tan ADJ (¿qué tan importante es eso?): not ES
cuanto más VERB (ES) /
por más que VERB /
entre más VERB /
mientras más VERB |
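The patterns in this list combine literal words, category placeholders such as VERB, and alternatives separated by |. As a rough illustration of how such a pattern could be matched against a part-of-speech-tagged sentence, here is a minimal sketch. The tagset, the CATEGORIES set, and the function names are hypothetical; this is not the corpus's own search engine, and lemma placeholders such as GUSTAR and bracketed tags such as [v*2s*] are not handled.

```python
from typing import List, Sequence, Tuple

# (word, part-of-speech tag); the tag names used here are hypothetical.
Token = Tuple[str, str]

# Hypothetical category placeholders, modeled on the patterns above.
CATEGORIES = {"VERB", "NOUN", "ADJ", "ART", "POSS", "PREP", "SUBJ"}


def element_matches(element: str, token: Token) -> bool:
    """Check one pattern element against one tagged token."""
    word, pos = token
    if element in CATEGORIES:
        # Category placeholder such as VERB: compare with the token's tag.
        return pos == element
    # Literal word, or alternatives separated by "|" (e.g. "la|las", ".|,").
    return word.lower() in element.split("|")


def find_matches(pattern: str, tokens: Sequence[Token]) -> List[int]:
    """Return the start index of every span of tokens that matches the pattern."""
    elements = pattern.split()
    n = len(elements)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(element_matches(e, tokens[i + j]) for j, e in enumerate(elements))
    ]


tokens = [("¿", "PUNCT"), ("qué", "PRON"), ("tú", "SUBJ"), ("quieres", "VERB"), ("?", "PUNCT")]
print(find_matches("qué tú VERB", tokens))  # [1]
```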