LEXGUIDE-2000


        A GUIDE TO THE LEXICAL ANALYSIS
        OF NATURAL TEXTS WITH QLEX

        Donald P. Hayes

        SOCIOLOGY TECHNICAL REPORT SERIES
        #99-7
        (DECEMBER 1999)

                       TABLE OF CONTENTS

I.     LEX--a scientific measure of a text's lexical difficulty
           Table 1--the LEX spectrum of natural texts
II.    LEX is based on Herdan's theoretical model of word choice
           Figure 1--word choice in 63 international newspapers
           Figure 2--word choice in diverse texts
III.   LEX uses: theoretical and applied
IV.    Conditions for using QLEX
V.     Text preparation for QLEX analysis
        A. Converting text files from X.TXT to X.ASCII
              IRS1040.ASC--sample text ready for QLEX analysis
        B. Bare-minimum text preparation for QLEX analysis
        C. NORMAL text preparation for QLEX analysis
VI.    Editing rules
        A. Quick and dirty text editing
        B. NORMAL text editing: general rules
        C. Print's distorting effects on the REFERENCE LEXICON
        D. LEXEDIT--semi-automatic text editing
VII.   Performing a QLEX analysis
        A. Installing QLEX and MIGNON
        B. Step-by-step procedures
VIII.  MIGNON calculates the LEX statistics
        A. Procedures for running MIGNON
        B. Variable identity in x.LEX output files
IX.    Interpreting the LEX statistic
        A. Word difficulty and regularities in word choice
        B. The REFERENCE LEXICON and word choice in newspapers
        C. A more general model for word choice
        D. LEX's stability over the past 334 years
        E. LEX's validity
        F. LEX validation using the 5000+-text Cornell Corpus
        G. LEX validation using basal readers for grades 1-8
X.     Interpreting QLEX's other statistics
        A. Sentences: mean length and distributions
        B. Tokens, types and their frequencies
        C. Table of Residuals
        D. Cumulative proportion distribution
XI.    Recommended measures of lexical difficulty
XII.   References
XIII.  Obtaining a copy of QLEX and this LEXGUIDE
XIV.   Sample QLEX output: IRS1040.OUT
XV.    Variable identity in x.321 files
XVI.   Credits: software and professional

I.  LEX--a natural science measure of any English text's
accessibility/lexical difficulty/comprehensibility.

LEX scores describe how the 10,000 most common grammatical and
content words in the English language were used in a specific
text.  That text's word use is always compared to a reference
lexicon (Carroll, Davies and Richman, Word Frequency Book,
1971).  In their reference lexicon, word frequencies ranged
from 'the' (which occurred ~73,000 times per million words in
natural texts) to the 10,000th-ranked word 'tournament' (which
occurred an estimated 2.8 times per million).

LEX is a natural science measure--which meets all requirements
for such measures.  First, it is based on a formal mathematical
model of word choice (Herdan, 1966).  Herdan's model has been
supported abundantly by research in psycholinguistics and
cognitive psychology, and by decades of experience with
standardized 'word knowledge' tests--around the world in many
languages.  The LEX statistic also has ratio-scale measurement
properties (including a true zero).  Furthermore, LEX satisfies
Louis Guttman's formal requirements for cumulative scales.  LEX
is a rare measure for the human sciences because it is also a
stable measure--English newspaper texts have been written at
virtually the same LEX level, all over the world, since 1665.

Every LEX score has both a sign and a magnitude.  If the words
of a text are skewed toward common words, then its LEX score is
negative, i.e., the text is simpler than the average newspaper--
which serves as the common reference standard.  Examples of
texts with negative LEX scores include mothers' talk to their
children in the home.  When word choice is skewed toward the rare
words of English, LEX signs are positive, i.e., such texts are
more difficult than the average newspaper.  An example would be
research articles published by Nature in 1994.  The magnitude of
the LEX score describes the degree to which word choice is
skewed, when compared with (a) Herdan's theoretical linear
pattern of word choice, (b) the highly correlated Carroll et
al. corpus, and (c) the linear pattern of word choice found
empirically in English-language newspapers since 1665.

LEX's precision is conditional on the sample text's size and on
how the sample was derived: by stratified simple random sampling
or by some less systematic sampling procedure.  LEX is suitable
for analyzing both old and modern texts and should be suitable
for analyzing English texts well into the next century.  For
texts built from 24 stratified simple random sub-samples of more
than 150 words each, the standard error of measurement is 1.3
LEX--i.e., 1.3 units in a measure whose empirical range to date
runs from LEX = +58 to LEX = -81.  Normally, LEX scores are based
on text samples of 1,000 or more words, in ten or more
sub-samples of 100 or more consecutive words, in full sentences,
taken by stratified random sampling methods.  In 1000-word
samples, the standard error is closer to 2 LEX.  While not yet
proven, the substitution of French, German or any other
Reference Lexicon for the English lexicon should yield equally
strong measures of a text's accessibility and lexical
difficulty.

LEX and readability scores are related (r = ~+.70), but
'readability' scores are atheoretical, have not been validated
to the same degree, and do not correlate as highly with multiple
criteria as LEX does [LEX is grounded on a very large, carefully
constructed, comparative data base--the Cornell Corpus-2000].
By contrast, LEX is a well-validated natural science measure
interpreted as measuring a text's 'accessibility/lexical
difficulty/comprehensibility'.

LEX is wrongly linked to 'readability' measures (e.g., the
Flesch-Kincaid or Gunning Fog indices).  Flesch developed his
pragmatic tool to aid primary school teachers in making quick
decisions on the suitability of books for their students--taking
into consideration their reading skills, level of comprehension
and interests.  His 'readability' measure is a composite of two
measures: (a) the fraction of words with six or more letters,
and (b) the text's average sentence length, in words.

One reason LEX and readability scores produce different
estimates of the same text's difficulty is that 'readability'
texts are not edited.  All LEX texts are edited according to a
comprehensive set of transcription rules--a necessary step if
two or more texts are to be compared and their differences
interpreted correctly.

'Text difficulty' is a multi-dimensional concept with important
grammatical, semantic and lexical components.  LEX does not
measure some of these components.  It has been difficult to
compare their relative contributions to overall text
comprehension.

The spectrum of lexical difficulty in English-language texts is
shown in TABLE 1 according to LEX and related scores.


                              TABLE 1
 SPECTRUM OF NATURAL TEXTS AND THE LEVELS AT WHICH THEY WERE PITCHED

Source of text                         Date/N     LEX  MeanU*  MedianU*  %Rare**

Nature--transhydrogenase article         1960    58.6      44        ~1     37.8
Cell--research articles                  1989    41.1      91         6     27.6
Nature--main research articles           1990    34.7     107         8     25.2
J. Amer. Chem. Assoc.--articles          1990    33.3      94        13     20.5
New Engl. J. of Medicine--articles       1990    27.0     126        14     18.8
Science--main research articles          1990    26.9     120        17     19.1
Scientific American--articles            1991    14.3     158        28     14.4
Watson-Crick DNA model in Nature         1958    11.1     164        35     10.7
Black Scholar--articles                  1994    10.3     171        32     12.2
New Scientist--articles                  1990     7.2     191        32     11.3
Popular Science--articles                1994     4.6     197        39     11.5
IRS--Instructions for Form 1040          1988     3.1     208        40     10.0
Time--articles                           1994     1.6     211        41     11.4
Economist--articles                      1990     0.9     210        47     10.4

NEWSPAPERS (English-language, Intl)      N=61     0.0     216        51      9.2

Nature (1900)--res. articles             1900    -0.5     226        46      8.9
National Geographic--articles            1984    -0.6     199        53      8.7
Discover--articles                       1990    -2.6     229        48     10.2
The Times (London--1791)                         -3.0     233        50      8.3
New Yorker--articles                     1994    -3.9     231        62     10.2
Colonial American newspapers        1720-1730    -4.3     236        53      8.5
Modern Maturity--articles                1985    -5.0     230        58      8.8
Smithsonian--articles                    1988    -9.1     264        88      9.1
Sports Illustrated--articles             1994   -10.3     257        89      8.4
Sport sections from newspapers                  -12.3     264        74      7.6
Adult books--fiction, USA               N= 34   -15.8     282       101      7.4
Ranger Rick--science for children               -18.4     291       120      6.1
Newspaper funnies                               -21.6     309       112      5.6
Nancy Drew mystery series               N= 69   -23.4     291       129      4.2
Comicbooks--GB & USA                    N= 37   -23.7     318       133      6.0
Books read at age 10-14, GB             N=261   -24.3     312       136      5.0
TV--cartoon shows                       N= 26   -28.6     339       152      5.0
Books read at age 9-12, USA             N= 94   -29.0     325       160      3.9
TV--reruns--popular w/ children         N= 33   -35.3     359       189      3.3
TV--primetime shows                     N= 44   -36.4     371       194      3.3
Pre-school books for children           N= 31   -37.0     360       181      2.6
Adult-to-adult conversations            N= 68   -37.5     375       211      2.5
Legal wiretaps on cocaine dealer         1990   -42.2     387       222      2.7
Winnie the Pooh--Milne                          -43.3     381       210      1.8
TV--Sesame Street & Mr. Rogers'                 -44.1     397       228      1.0
Mothers' talk to children, age 5        N= 32   -45.8     394       222      1.5
Farmer talking to his dairy cows                -56.0     520       292      3.1
Pre-Primer--Scott-Foresman               1956   -80.5     640       646      0.0

*  U = Frequency per million tokens (Carroll, Davies & Richman, 1971)
** Percent of tokens ranked outside the first, most common 10,000


    LEX has been validated in numerous ways: by experiments in
which predictions as to the direction of changes in LEX levels
under variable conditions were tested; by comparison with other
less well-validated measures; by its compliance with Herdan's
(1966) general theoretical model of word choice; and by several
kinds of substantive research.  These include: (a) the 'dumbing
down' of American basal readers, beginning in 1947, when compared
with pre-World War II basals; (b) the growing inaccessibility of
science journals to the educated reader, since World War II; (c)
the comparative richness of the natural language experiences of
children growing up in underclass through upper-middle class
families; and (d) predictions of the same witness's LEX level
when testifying under direct and cross-examination.

    LEX's interpretation as a measure of a text's 'accessibility,
lexical difficulty and comprehensibility' has also been validated
by internal comparisons among the 5000+ texts edited and analyzed
in a file named Cornell Corpus-2000.  This Corpus includes every
major text category: broadcasts, print, and conversation, formal
and informal.

II. LEX is based on Herdan's theoretical model of word choice.

    The abstract generalizations formulated in mathematics, in
physics or in syntax are among mankind's most powerful
intellectual achievements.  The power of those generalizations is
due, in part, to their essential independence from specific
content.  For Galileo's laws of motion, it does not matter
(theoretically) whether the falling body is a planet, feather or
cannonball, so long as it is in a vacuum.  In mathematics, the
numeric units may be dollars, weights or distances--which
specific unit does not matter, since the fundamental relations
refer to a higher order of abstraction.  Similarly in syntax,
which specific noun is used in a phrase or sentence is
unimportant--what does matter for syntax is that word's
grammatical role in the sentence.

    The model underlying the LEX statistics similarly describes an
abstract feature of every text: its pattern of word choice.  One
cannot detect these patterns by simply reading a text because the
order in which words appear, so important to syntax and
semantics, is irrelevant to the text's pattern of word choice.
The words of any large text may be randomized in a thousand
different ways, yet the pattern of word choice and the LEX scores
will remain the same.

    Herdan (1966) was among the first to identify a pattern of
word choice--he proposed that word choice is a special case of
the lognormal statistical distribution.  By that time, it was
already known that the word types in the major word corpuses
(e.g., Thorndike & Lorge; Dale; and Brown University) have quite
similar rankings.  Herdan's formal theoretical model for word
choice states that the probability of a word's choice in a text
is conditional on the log of that word's general frequency of
usage.  That implies
that personal lexicons reflect differential experience with the
English lexicon.

    A package of programs named QLEX identifies these patterns of
word choice and produces the LEX statistics from a cumulative
proportion distribution.  This is done by first determining what
fraction of a text's words is 'the'--English's most common word.
To that proportion is added the proportionate use made of
English's second most common type--'of'; then the fraction which
is the third type, 'and', is added to the cumulative proportion
distribution; then the fourth most common word--'a'; the
fifth--'to', and so on through all the 10,000 most commonly used
word-types in English.  The resulting cumulative proportion
distribution supplies the text's pattern of word choice.

LEX and several other lexical measures describe those patterns.
In every QLEX analysis, the ranking of word types in the
Reference Lexicon (which is always represented along the X-axis
by the log of each word's rank) is based on two bodies of
evidence: (a) the largest and most diverse corpus of English word
usage, Carroll, Davies and Richman's The Word Frequency Book
(1971); and (b) the pattern of word choice in newspapers (an
early finding in the course of this line of research).  These
first 10,000 most commonly used word types are a constant in
every QLEX analysis, i.e., their
rank among the 10,000 most common English word-types serves as
the Reference Corpus against which every sample text's actual
choice of words is compared.  When a new, contemporary, far
larger but equally diverse and well-designed corpus becomes
available, Carroll et al.'s lexicon will be replaced and this new
Corpus will serve as Reference Lexicon.  Presumably its ranking
and estimated frequency of usage per million words will be more
precise--though the correlation of word ranks in the new
Reference Lexicon with Carroll et al. and the older corpuses
should still remain well above r = .95.

    Figure 1 describes the empirical pattern of word choice in 63
newspaper samples drawn from all over the world between 1665 and
1992.  The patterns of word choice in these newspapers were
compared against Herdan's theoretical model--which is
approximately linear within this range.  Each newspaper sample
consisted of ten or more sub-samples of 100 or more words.
Newspaper word choice distributions generally fit the linear
pattern predicted by Herdan's model.  Aside from this fit, it is
notable that the distributions for the Q1, Q2 and Q3 newspapers
are parallel.  That implies that it was their differential use of
a single word--'the'--that accounts for the minor differences in
their patterns of word choice, even in samples as small as 1000+
words.  If one equates those newspapers' use of 'the', the Q1, Q2
and Q3 newspapers' use of the first 10,000 most common word types
of English is similar.

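    As an illustration only, the cumulative proportion
distribution described in this section might be sketched as
follows.  This is not QLEX's own code: the function name, the
five-word toy lexicon (standing in for the 10,000-type Reference
Lexicon), the sample sentence and the use of base-10 logs for the
X-axis are all assumptions made for the example.  The sketch also
checks that randomizing the order of a text's words leaves its
distribution unchanged.

```python
import math
import random
from collections import Counter


def cumulative_proportions(tokens, ranked_lexicon):
    """Return (log10(rank), cumulative proportion) pairs for a text.

    ranked_lexicon lists word types from most to least common.
    Tokens that are not in the lexicon still count toward the total,
    so the curve may level off below 1.0.
    """
    if not tokens:
        raise ValueError("the text contains no tokens")
    counts = Counter(token.lower() for token in tokens)
    total = len(tokens)
    running = 0  # number of tokens accounted for so far
    points = []
    for rank, word in enumerate(ranked_lexicon, start=1):
        running += counts[word]
        points.append((math.log10(rank), running / total))
    return points


# Hypothetical stand-in for the Reference Lexicon: only the five
# most common types named in the text, in rank order.
TOY_LEXICON = ["the", "of", "and", "a", "to"]

sample = "the cat and the dog went to a corner of the yard".split()
curve = cumulative_proportions(sample, TOY_LEXICON)

# Word order is irrelevant to the pattern of word choice.
shuffled = sample[:]
random.Random(0).shuffle(shuffled)
assert cumulative_proportions(shuffled, TOY_LEXICON) == curve

for log_rank, proportion in curve:
    print(f"{log_rank:.3f}  {proportion:.3f}")
```

    With a full Reference Lexicon in place of the toy list, the
same loop would run through all 10,000 ranks.  How QLEX converts
the resulting curve into a LEX score is not shown in this sketch.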
                        Figure 1 about here


    The standardized educational and occupational testing industry
early capitalized on these powerful regularities in word choice
by constructing 'word knowledge' tests.  Empirically, children
were found to rank the words of the English lexicon in largely
the same order, which coincides with the order in the Thorndike
and Lorge, Dale, Brown University and Carroll, Davies and Richman
corpuses.  Independent tests showed that a test word's
'difficulty' was highly correlated with its general frequency of
usage--'hard' words were usually uncommon or rare, 'easy' words
were nearly always common.  The standardized tests were validated
by the discovery that students who scored well on word knowledge
tests generally knew more of the uncommon words of the lexicon,
had higher levels of reading comprehension, performed well
academically, and achieved at higher levels on other tests of
verbal skills.  Research by cognitive psychologists also found
that acquisition of, access to, and retrieval times for words
from a lexicon were strongly related to a word's general
frequency of usage.  While the specific neural mechanisms
involved are only now beginning to be worked out and much remains
unknown, it is clear that the choice of words in texts and one's
word knowledge follow strong mathematical regularities.

    Figure 2 describes word choice patterns which represent much
of the effective range of LEX.  The topmost cumulative proportion
distribution represents the pattern of word choice of 32 mothers
while speaking with their child at home (all the children were 30
months of age at the time).  The mothers' mean distribution is
strongly skewed toward common words (relative to the theoretical
linear distribution in Herdan's model and in newspapers).  The
middle distribution describes a single 1000+ word sample from the
New York
Times (1987).  It too approximates the average newspaper's linear
pattern.  The bottom of the three distributions is the pattern of
word choice in the main research articles in Nature (1994).
Every article in that general scientific journal is skewed toward
rare words, and the degree of that skew has grown in each decade
since World War II.


                        Figure 2 about here


III.  LEX uses: theoretical and applied.

    The first theoretical use to which LEX was put was to test a
niche-like model of how text difficulty shaped the dynamics of
the science publishing industry since World War II.  An explosive
growth in the number of scientists was associated with a growth
in science journals and magazines.  The principal hypothesis
tested was that science magazines must disperse themselves
(lexically) so as to minimize competition for readers and
advertisers, while maximizing the number of subscribers.  When
magazines sought to occupy the same lexical niche, two outcomes
were apparent: either the magazine shut down or the publisher
moved into a new lexical niche by changing the mean LEX level of
its principal articles.

    For example, a series of musical chairs-like moves was
provoked in the science journal industry when, in 1947, Nature
vacated its historical niche (LEX = ~0.0)--the mean level of a
newspaper.  By 1950, Nature's major articles had risen to LEX =
+17, reflecting a decision to allow authors to write specifically
for scientists and to ignore the educated person.  Science held
to its traditional niche (LEX = +6) until 1960, then followed
Nature's lead and began publishing major articles at higher LEX
levels, by allowing scientists to write at more difficult levels.
Science's time series had reached LEX = +15 by 1970 [by which
time, Nature's articles averaged LEX = +25].  Sensing Science's
now-vacated niche, the publisher and editors of Scientific
American changed its LEX level (from ~0.0 to LEX = +5 by 1970),
effectively moving into that niche.  By 1980, Scientific
American's average article had drifted to LEX = +10, which must
have exceeded many of its subscribers' ability to understand the
articles, because within a short period well over a hundred
thousand subscribers failed to renew their subscriptions.  That,
in turn, induced many advertisers to withdraw as well.  That loss
of support weakened the magazine's financial position and made it
a target of a corporate acquisition.

    Scientific American's departure from its traditional LEX = 0.0
niche (which it had long shared with Popular Science) was viewed
as an opportunity by the editors of four new science magazines,
who felt there was now room for a magazine designed for the
educated reader interested in science.  These four were Science
Digest, SciQuest, Science-80 and Discover.  Each sought to occupy
that same LEX = 0.0 niche, but there were insufficient
subscribers and advertisers, and the first three had ceased
operations by 1986.  Their vacating that niche encouraged the
startup of still another science magazine, New Scientist--whose
texts rose to LEX = +7 by 1990 (Hayes, 1991).  Similar changes
and outcomes were experienced by several other science magazines,
including the weekly Science News.

    While there was competition for the 0.0 niche, the time series
for every major professional science journal shows a rapid
expansion of specialized, technical language and higher positive
LEX scores.

    In consequence, professional science is largely inaccessible
to the general college-educated reader, which increases our
reliance on intermediaries who select and translate developments
in science for the general reader (e.g., the New York Times
Science section and the rapid growth in the coverage of science
and medicine by television networks).

    Finally, high school science textbooks have grown far more
difficult than textbooks for the rest of the 12th grade
curriculum, e.g., history, English and social science texts.  The
relatively high LEX levels are probably contributing to student
avoidance of the non-required advanced science courses and the
declining fraction of American scientists, engineers and software
specialists, and the growth of dependence upon the
foreign-trained.

    A second illustrative theoretical use of LEX measurements was
to resolve an empirical dispute between behavior geneticists (BG)
and most social and developmental scientists.  At issue is the
relative importance of children's environments and experiences
(particularly their natural language experiences) in shaping
children's verbal achievement.  Recent BG research on adopted
children in families with other siblings finds that such language
experiences are 'essentially' the same for children growing up in
well-off and relatively poor families.  In contrast, social and
developmental scientists point to their own voluminous research
showing language experiences are different for children growing
up in underclass or upper-middle class families.

    LEX was used to measure the difficulty of the natural language
experiences children have with: (a) the texts of the television
programs they chose to watch 'regularly'; (b) the texts of the
books and magazines they chose to read for themselves; and (c)
the texts of household conversations children have with their
mothers.  Exceptionally for this kind of research, and unlike the
samples used in the behavior genetics research, the children in
these studies were statistically representative samples of all
British children, all public school California 6th graders, and
children in families in the Bristol, England metropolitan area
born within a week of one another.

    We found differences in the richness of children's language
experience among those growing up in different social class
backgrounds (consistent with social and developmental research),
but those differences were small (several LEX only), as were the
differences in the quantity of those experiences.  However, those
small differences were pervasive and persisted throughout
childhood in our cross-sectional and time-series data.
Differential quantity and difficulty of language experience was
found at every age from 30 months through 14 years of age.  One
implication is that such small, persistent differences in
language experience may produce cumulative effects on the
children's general domain knowledge (e.g., their knowledge of
baseball, genetics or finance), which affects their verbal skills
(which include the extent of their conceptual knowledge and
lexical development).  Such differences would doubtless
contribute to their reading comprehension and academic
performance and may help to account for the world-wide pattern of
higher verbal skills among children with richer language
experience (Hayes, Wolfer, Roschke, Tsay and Ahrens, 1999).

    Another example of LEX's theoretical use is in testing
possible mechanisms affecting a person's pattern of word choice.
Two have been explored--personal stress and lack of preparedness.
For example, in courtroom testimony, witnesses often must answer
questions put to them by two attorneys: one under direct and the
other under cross-examination.  LEX analyses document that
virtually every witness in the Patricia Hearst trial gave
lexically simpler testimony in response to cross-examination,
where the interrogator is the opponent's attorney, whose job it
is to challenge the witness' credibility and, if possible,
discredit that testimony with the jury and judge (Hayes and
Spivey, 1999).  Both cognitive scientists and trial lawyers
predict this finding and attribute it to the distracting effect
of high stress and lack of preparation on the witness' word
choice under the pressure of realtime text production.

    Finally, LEX has also been used in applied research.  One
study sought to answer this question: Why did nationwide
SAT-verbal scores remain at near-constant levels from the
mid-1950s through the early 1960s (the highest mean level
recorded was in 1963), but then decline in each of the next 16
consecutive years (until they more or less stabilized in 1978)?
Mean SAT-verbal scores have remained at that low level ever
since.  To explain this SAT-verbal time series, Chall (1967,
1977) hypothesized that one major contributing factor was the
lowering of academic standards--including the 'dumbing down' of
the schools' curriculum, and reductions in homework and in the
difficulty of the principal school reading materials used by
students throughout the nation.

    In this application, LEX was used to measure the difficulty of
nearly 800 basal readers used across the United States during
three time periods: 1919-1945, 1946-1962, and 1963-1991.  We
found that LEX levels of basal readers published in the US in the
1946-1962 period declined sharply from their pre-World War II
levels.  (They were not simplified in the UK--so the American
decline in level was a choice made by educational leaders and
their school publishing colleagues.)  Since the major schoolbook
publishers marketed their basal series nationwide, one effect of
these new basals may have been to reduce the range and depth of
domain and conceptual knowledge which students derive from their
basic texts.  One of the primary tasks of schooling is to broaden
the range of a child's domain knowledge.  Simplifying texts
narrowed the breadth and depth of conceptual and lexical
resources available to students, which may have contributed to
their reduced verbal achievement when they took the SAT-verbal
test years later, near the end of their high school experience.

    The conventional wisdom in education attributes this decline
to changes in the composition of those taking the SAT--many more
and less-select students began to take the SAT.  That explanation
fails to explain why the SAT scores began to fall in 1963 and not
earlier, and why the scores declined for those 16 consecutive
years, since the number taking the SAT had stabilized around
1963.  Nor can it explain the persistent low level of mean verbal
achievement from 1978 to the present.  Nor can it explain why
verbal achievement declined particularly among those at the
highest levels of verbal achievement.  Despite many more students
taking the SAT, there was an absolute decline in the number
scoring over
600, and an even greater proportional decline in those scoring over 700. Something affected the achievement level of even the most able students--those who had always taken the SAT.

   LEX analyses supply the evidence compatible with Chall's hypothesis: schoolbooks became much less difficult after 1946, and a side-effect (doubtless unintended) of that dumbing down of textbooks was to lower the students' domain/conceptual knowledge and skills. When, years later, the students took the SAT, those cohorts whose verbal SAT scores declined were the ones who first encountered the simplified texts (Hayes, Wolfer & Wolfe, 1996). Cross-lagged correlation of basal reader levels throughout their elementary and secondary schooling with the national SAT-verbal scores is generally consistent with this hypothesis. Since LEX levels of contemporary basal readers and major textbooks have remained unchanged since 1978, there are no grounds to expect verbal scores to rise without a fundamental increase in the texts' LEX levels. In fact, scores have not risen for 30 years. When complaints about the low verbal scores grew, ETS ultimately re-calibrated the verbal scores upwards, especially for those at the upper range, despite ample evidence that modern tests are comparable in difficulty to those throughout the late 1950s and early 1960s.

IV. CONDITIONS FOR USING QLEX

   The QLEX package of programs provides researchers with tools for measuring important facets of any English text: its accessibility, comprehensibility or lexical difficulty, and its use of grammatical/closed class terms.

   There are few conditions on QLEX's use. QLEX was developed across several generations of mini-computers and PCs, using different operating systems, several programming languages, and several file formats. Most of this work was done under the DOS operating system.

   You are free to use QLEX so long as you agree to these conditions: the programs must not be copyrighted for commercial purposes, under this or any other name. The programs were designed to provide scientists with a major tool for text analysis, so they are part of the general scientific measurement system, not a commercial product. There is no guarantee the programs will run on your machine; nor are they guaranteed to be free of errors (though they have been tested tens of thousands of times). Scientists are encouraged to continue LEX development, as with any scientific tool. I would like to be notified of your changes, the justification for them, and the effects those changes have on LEX measurements. My e-mail address is dph1@cornell.edu. QLEX and other files may eventually be downloadable from my Web site, but for now they can be obtained from the author.

V. TEXT PREPARATION FOR QLEX ANALYSIS

   QLEX, the name for the software package which carries out a systematic lexical analysis, may be used on any English text: spoken, broadcast or printed. First developed and used in 1980, QLEX continues to evolve as more is learned about the underlying theoretical models, the mechanisms, the many LEX measurements and their interpretation. This is a work in progress. First reports on LEX's uses appeared in technical papers in the Cornell Sociology Technical Reports series, catalogued in the computer data base for research universities called RLIN; in multiple presentations at the AAAS annual meetings; and at other psychological and sociological professional meetings. Recent descriptions of models and research are reported in Hayes (1988), Hayes and Ahrens (1988), Hayes (1992), and Hayes, Wolfer & Wolfe (1996).

   A. Converting text files from x.TXT to x.ASCII. Some text files are saved using the file extension x.TXT (where x stands for a filename), but if the text is to be analyzed by QLEX, it must be converted into a standard ASCII file. In WordPerfect, this conversion is done when you 'Close' your file. Choose the option which converts the text to an ASCII file. If you forget, your text will contain odd-looking characters and QLEX will not produce a LEX analysis.

   What should a text file look like after it has been edited? Examine the following edited text sample taken from an Internal Revenue Service publication:

&&000                                           IRS1040.ASC
This publication was sent by the IRS to all taxpayers who
had filled out Form 1040 in 1986.  It describes some
changes in the tax law which affect the 1987 tax.  A
stratified SRS from the first 19 pages is shown.

&&111 This year for the first time the =TaxReformAct of
=1986 will have major impact on the preparation of your tax
return.  This has important consequences for you and for
us. Changes made by the new =Act are summarized on the next
page. You can learn more about the ones that affect you by
getting one of the publications listed near the end of this
booklet. Learning about the changes now will make it easier
for you to prepare your return when you start working on
it.

Increased standard deduction.  The standard deduction
(formerly the zero bracket amount), has increased for most
individuals.

Alternative minimum tax.  The tax rate has been increased
to =21 percent and several tax preferences have been added
or deleted.

Allocation of interest expense.  Whether your interest
expense is subject to the new limits that apply to personal
and investment interest depends on how and when the loan
proceeds were used.  Special rules apply in determining the
type of interest on loan proceeds deposited in a personal
account, such as a checking account.

Joint or separate returns. Generally, married couples will
pay less tax if they file a joint return because the tax
rate for married persons filing jointly is lower than the
tax rate for married persons filing separately.  However,
as a result of some of the changes in the tax law, such as
the increased income limit that applies to medical and
dental expenses and the new individual retirement
arrangement deduction rules that apply to certain
individuals, you may want to figure your tax both ways to
see which filing status is to your tax benefit.

Married persons who live apart.  Some married persons who
have a child and who do not live with their spouse may file
as head of household and use tax rates that are lower than
the rates for single or for married filing a separate
return.  This also means that you can take the standard
deduction even if your spouse itemizes deductions.  You may
also be able to claim the earned income credit.  Children
of Divorced or Separated Parents.  The parent who has
custody of a child for most of the year, the custodial
parent, can generally take the exemption for that child if
the child's parents together paid more than half of the
child's support.  This general rule also applies to parents
who do not live together at any time during the last six
months of the year.

   B. Bare-minimum text preparation for QLEX analysis. If you are in a hurry and need a rough estimate of a text's main LEX scores:

    (1) convert the text into ASCII (i.e. give the file the extension x.ASC, as in IRS1040.ASC), to remove all word-processor symbols which QLEX does not understand;
    (2) place the required identification symbol (&&000) at the top, left edge;
    (3) put the file's name in the upper far right hand corner (as in the sample above);
    (4) use this identification code--ID (&&111)--at the beginning of the first line of text which QLEX is to analyze.

        Note the ID distinction: the symbol &&000 is used for the header and comments--all words after that are ignored by QLEX; the symbol &&111 is put before any text you want QLEX to analyze.

    That's it!  Proceed to Section VII, B. to run QLEX.

   C. NORMAL text preparation for QLEX analysis--these rules look imposing but it takes less time to edit most files than it does to read or describe them.

   1. ID (Identification) Codes. Text prepared for QLEX analyses is divided into two types: (a) text QLEX is to analyze, and (b) text QLEX is not to analyze. To distinguish these two types, two kinds of ID are used: &&000 is placed at the beginning of any passage QLEX is to ignore (e.g., information about this file, the date, how the text was produced and sampled, its context, etc.); and &&??? (any number
between 1 and 9, but not zero, for each of the three digits) for any text you want QLEX to analyze--cf. IRS1040.ASC above. &&111 is the ID number used in most text analyses.

If the expression &&000 begins a line or passage, QLEX will ignore everything thereafter until it finds another ID symbol. Conversely, QLEX assumes that all text after an &&xxx symbol should be analyzed, until it comes across the &&000 symbol.

   2. Header and Comments. The ID code &&000 must appear at the top left edge of the text, as shown in IRS1040.ASC. Normally, a text file is described by identifying its source (e.g. stratified simple random sample taken from the New York Times, July 5, 1998) or, if it is a conversation, the cast of characters (e.g. Mother and her daughter, Jennifer, age 4 years, 3 months, in their home). The context of that conversation could also be described (Jennifer is tired after having had no nap. This conversation took place in the kitchen. No one else was present). You may place comments anywhere you like in your text, so long as those comments appear after the &&000 symbol, which is always placed at the far left of a line, followed by a space. After you complete a comment, don't forget to use the ID &&xxx at the front of the line where QLEX is to resume its analysis. For example, &&121 could be a mother (identified as #1), talking to her child (identified as #2), as in:

     &&121 Bring me some butter from the refrigerator.
           You have to have yummy ingredients when you make
           cookies.
     &&211 Do you want the whole box or just one of the pieces?
           Oh, there's only one piece left in the box!

The last of an ID's three digits may be used for designating the context for the conversation (e.g. #1 is mother-child talk while in the yard; #2 is while they are talking at the grocery; #3 is while they waited for a brother to get out of school), or this third digit may be used to keep track of the different segments in a time-series. For example, in that mother-child conversation, the third digit could represent segment 3 of a multi-part conversation. For analyzing segment #3, one would use ID &&123 to instruct QLEX to search for text spoken by the mother to her child, in segment #3 only; &&215 would ask QLEX to search for the child talking with her mother, in segment #5 only.

Most analyses of conversation do not distinguish between the several speakers of a conversation or different segments of the same text. In that case, simply put &&111 in front of the first word of the conversation--that will suffice for the entire text (as in the IRS sample). In some conversations, however, you may wish to distinguish what each person said to another, as in what President Nixon said to John Dean as opposed to what he said to his principal adviser, Robert Haldeman.

VI.  EDITING RULES

   A. Quick-and-dirty text editing. Use any transcription rules, including none--so long as you use the Bare-Minimum editing rules (cf. Section V., B. above). QLEX will run on texts with bare minimum editing; you will have output and can get the LEX statistics. Those values will be in the general ballpark of genuine LEX scores calculated with a fully edited text. LEX scores from unedited text will not, however, be comparable to the LEX values of the 5000+ texts in the Cornell Corpus-2000 used for interpreting your findings. If comparability and precision are needed, you must use the NORMAL text editing procedures below. Editing takes time, requires close attention, and can be lugubrious--that is a cost of obtaining a scientific measurement.

   B. NORMAL text editing: general rules. Texts should be
transcribed according to normal conventions of spelling and orthography, as found in an unabridged dictionary. There are two exceptions in QLEX analyses: (1) QLEX analyses are carried out on word-types, i.e. every uniquely spelled variant of a term is treated as a distinct type in the English lexicon (e.g., while 'boat' and 'boats' share a common stem, they are different word-types in lexical analysis); and (2) the distinction between capital and lower case is not retained in QLEX analyses (so 'the', 'The' and 'THE' are combined).

      All terms in a text should be transcribed, even if they cannot be found in an unabridged dictionary, e.g. new words, word-fragments and filled pauses. Word-fragments and filled pauses are often disregarded by transcribers, so their transcription is typically unreliable unless done by trained, conscientious workers. Some substantive areas of psychology, linguistics and sociology consider such information useful for some analytic and interpretive purposes.

   PROPER NAMES, PLACES, PRODUCTS AND SCIENCE TERMS. Dictionaries are not consistent in their treatment of proper names, so it is necessary to impose a common practice for all such terms. The equal sign is inserted before every proper name, place, or product (e.g. =Mary, =FBI, =Chicago, =Coke). Multi-term names (e.g. =RiodeJaneiro) should be run together since they form a single unit; otherwise the name would be treated as three 'words'. Care must be taken with such terms as '=honey', '=love', '=darling' and '=dear', which take the equal sign only when the name refers to a specific individual.

      Science makes heavy use of contractions and technical terms
to avoid the repeated use of longer phrases. Chemical compounds, e.g. NaCl, should be =NaCl, but Na by itself has no = sign because it is a recognized dictionary entry.

      Biology and chemistry texts pose complex transcription problems with their vast sets of technical terms and names. The general rule is: if it is a proper name (often a Latinate species or family name), place the equal sign before it. Normal dictionary entries, like hormones or proteins, however, do not have the = sign before them.

   NUMBERS. Some numbers (e.g. 'one' through 'ten') are dictionary entries, so the equal sign is not used for them. Arbitrarily, Arabic and roman numbers (except when written out as one, two, ..., ten) do have an equal sign placed before them (as in =forty, =1492, =$28, =43 years old). If the number is an entry in a dictionary, no = sign is used.

   DECIMALS, LARGE NUMBERS, EQUATIONS AND COLONS. Dictionaries are inconsistent in their handling of numbers, so they require special handling. The number 9.95 is transcribed as =9'95 because periods are reserved as the sentence-ending symbol in QLEX's sentence sub-routine. The comma in =360,000 is converted to an apostrophe (=360'000); otherwise the comma would divide that number into two terms, 360 and 000.

      Equations do not appear as entries in most dictionaries. They are transcribed with the expression =equa. The colon with time (e.g. 11:23) is dropped, for the same reasons.

   SOUNDS AND SPECIAL COLLOQUIAL EXPRESSIONS. Many common terms used to communicate a message/mood are not included as 'words' in unabridged dictionaries, and so they are given the equal sign. Examples include =meow, =Phew!, =Zot!, =Gosh,
=bang, =pop, =Ha, =Ouch!, =Whee, and =Hurray!

   PUNCTUATION. Use normal punctuation for printed texts. Punctuation of natural conversation is more difficult, since run-on sentences are common and one often needs a good recording to detect the shifts of intonation at the end of a sentence. Punctuate texts from intonation, pauses, substance and common sense. The ability to go over the same passage again and again is essential to reduce the arbitrariness of these decisions. Since reliability of punctuation from spontaneous speech is not as high as from print, statistical differences between two texts' MLU (mean length of utterance, based on words) may be due, in part, to unreliability and arbitrariness in punctuation.

   CONTRACTIONS. As a general rule, type words as they appear in print or as spoken (unless otherwise specified above or below). The exceptions include such terms as 'cause, which dictionaries treat as because, and terms which are contracted in casual speech: 'whatever's', for example, meaning whatever is, should be decomposed into 'whatever is'. There are many such contractions, and all should be treated this way.

   REGIONAL EXPRESSIONS. Regional accents may alter a word's form or phonology to such an extent that the referent may not be clear to persons from outside the region. In such cases, use the common referential term to replace the exaggerated or local form. For example, working-class Scots use English but substitute Scottish terms, which require transcription to their near-equivalent in English (e.g. the word 'fit' in Scottish is normally translated as 'make' in English). Extreme working-class accents in London
and the southern United States require similar treatment.

   HYPHENATION. "Cost-of-living" and "first-aid" are complex expressions which appear in dictionaries in that form. They should be transcribed with their hyphens in place. Hyphenation is sometimes missing in similar expressions, such as 'thank you' and 'ice cream'. Transcribe these expressions with the missing hyphen: as thank-you and ice-cream. This prevents QLEX from treating such expressions as two (or more) separate words. Since language is dynamic, the time may come when this and other rules must be changed.

   SPACING. There must be a space between every word, unless there is (or should be) a hyphen, an apostrophe, etc. A space separates an end-of-sentence symbol (period, question mark, exclamation mark or the special interruption symbol [@]) from the first word of the next sentence. For example: "Stop doing that!You know that!" is wrong because 'You' was not separated from the previous exclamation mark. In QLEX, the colon and semi-colon are not considered sentence ending marks.

   SPLITTING WORDS AT THE END OF LINES. QLEX is a capable set of programs but it does not handle wrap-arounds and split words at the end of lines. Be sure that words and expressions are not broken at the end of a line.

   QUOTES. Use the symbol ", not the symbol ', for quotes, because ' is reserved in QLEX for contractions and numerical expressions. Quotation marks are ignored by QLEX programs.

   INTERRUPTIONS. Aural interruption of one person's speech by another has been handled in many ways in psycholinguistics. Under QLEX, when one speaker is interrupted by another,
instead of ending that person's passage with the usual sentence ending (the period, question mark or !), use the special symbol (@), meaning that this is the final word of a passage spoken by a person who had been interrupted. It is not uncommon for the interrupter, in turn, to be interrupted. If that happens, the interrupter's turn should also be ended by the @ symbol. QLEX can determine who was interrupted and how often. A new ID code must be used before the interrupter's first word to show that the person holding the 'floor' has shifted to a new speaker. This convention does not show the precise point in the text where the interruption took place, nor how many words were spoken by the two speakers simultaneously.

   PERIODS. Under QLEX, periods are reserved exclusively to mark the end of sentences. In the case of abbreviations like Mrs., Dr., i.e. and e.g., the periods must be omitted; otherwise the sentence length measure is invalid.

      Also, in some printed texts, unfinished sentences are sometimes expressed by a string of periods, a convention to suggest that the voice tailed off. Each period would be incorrectly interpreted as a sentence of zero word length unless it is omitted.

   MISSING OR UNDECIPHERABLE WORDS AND THE =ZZZZ SYMBOL. In spontaneous conversations recorded in their natural contexts, background noise or poor recording quality can make it difficult or impossible to make out a word or phrase. A missing word is transcribed by the unique expression: =zzzz. When a whole passage is missing, use the Comment symbol (&&000) at the beginning of a line to indicate the nature of the problem and its approximate
length. One indicator of a text's quality is the frequency of the =zzzz symbol's use.

   BRITISH vs AMERICAN ENGLISH. Numerous words are spelled differently in the UK and USA, e.g. grey and gray; practise and practice. Use the American spelling convention, since the Reference Lexicon used for these analyses (Carroll, Richman and Davies, 1971) comes from American English texts.

   FILLED-PAUSE SPELLINGS. Use these conventions for filled pauses:

      =uh-huh -- I acknowledge, agree (usually spoken with rising inflection);
      =un-huh -- I disagree, no (with falling inflection);
      =um -- I follow you, I'm listening;
      =uh -- hold it, I'm groping for the right word or what to say next;
      =oh-oh -- a problem.

   FALSE STARTS and FRAGMENTS. All false starts, incomplete phrases and repetitions should be typed, as produced. Word fragments should be preceded by the equal sign since they are not recognized as words in dictionaries.

   PRINT CONVENTIONS REPRESENTING CONVERSATION. There is a publishing convention about conversation which requires a change to avoid invalid sentence measures. Publishers do this: "Where is it, Mom?" said Jane. The problem is that there are two sentence endings in that one sentence. Transcribed, it should be: "Where is it, =Mom," said =Jane? A comma is substituted and the question mark shifted to the end of the sentence.

   FOREIGN LANGUAGE TERMS AND EXPRESSIONS. Use comparable Americanisms where possible, unless there is no suitable expression--in which case, type the foreign expression with the words run together with one equal sign in front.

   MONEY. Convert money (e.g. pounds, marks, yen, and francs) into dollars. The numeric values will be wrong but the
concept is correct.

   TECHNICAL TERMS. Type as shown or spoken.

   RARE WORD REPETITIONS. On occasion, a rare term is repeated many times. For example, in the 1,000+ word text sample taken from several Woody Woodpecker cartoon shows, the word 'woodpecker' appeared 18 times. That one rare word had the effect of making the text's statistical description appear more difficult than it is. To prevent rare word repetitions from giving false estimates for a text's lexical difficulty, an arbitrary rule was adopted: no single rare word (i.e. a term not listed among the 10,000 most common word-types in English) may appear more than five times per thousand tokens. Every additional instance of that term gets an equal sign before it. Use a Comment header (&&000) to record that this rare-word rule was invoked, how often the = sign was used, and why it was necessary.

   DIRTY WORDS. Modern unabridged dictionaries contain some 'dirty' words, but most are omitted. All such terms should be included but most will have the = sign.

   WORDS REQUIRING SPECIAL TREATMENT. To reduce the most serious distortions of word use, my colleague Margaret Ahrens has compiled a list of words which occur principally in conversation. For each, the transcription convention is:

      1.  Words without equal signs:

          holidays (e.g. Christmas); bye; bye-bye (a hyphenated word); good-bye (a hyphenated word). Carroll, et al. report that the 'good-bye' form is 3 times more common than 'goodbye'. gonna becomes going to; and wanna becomes want to.

      2.  Words requiring the = sign:

          =Mom, =Dad, =honey, =sweetie, =darling (when referring
only to a specific person), letters of the alphabet, standing alone. Important exceptions: I and a.

      3. Special contractions: All of these terms are used in informal speech, but rarely in formal print. The preferred solution is to decompose them.

          =how'd    =that'd    =there'd   =what'd    =when's
          =how's    =that'll   =there're  =what'll   =where'd
          =how're   =there'll  =what're   =who'd     =there've
          =what've  and  =why's

      4. Terms generally omitted by dictionaries but whose informal use is common. Such terms get the = sign.

          For example:  =eh  =ouch  =wow  =yuck
                        =yup  =ick  =ow  =yep  =yucky

   C. Print's distorting effects on the Reference Lexicon. Carroll, Richman and Davies' Word Frequency Book is based on word use in printed texts, which are normally written in formal style. Printed texts distort the relative frequency of certain words, and particularly under-represent words used in casual conversation (Hayes, 1988). Household words are especially under-represented; consequently, terms which are commonplace in a pre-schooler's experience (e.g. 'pajamas', 'diaper', 'potty' and 'bottle') or informal words like 'gonna' and 'where'd' appear as relatively rare words in the Carroll, et al. corpus, the Reference Lexicon common to all QLEX analyses.

      The most under-represented words in print include 'I' and 'good-bye'. While these words are pervasive in actual use, the Carroll, et
al. list reports frequencies much lower than those found in natural conversation, e.g. 'good-bye' occurs only 4 times per million tokens in print, far more rarely than in the Cornell Corpus of natural conversations.

      The most over-represented word in print is 'said'. This term appears in print to help the reader keep track of the speaker. So common is the use of 'said' that it is the 43rd most common word on Carroll, et al.'s list, placing it amidst all the highest frequency function words of English. 'Said' rarely appears in the natural conversations in the Cornell Corpus.

   D. LEXEDIT--semi-automatic text editing. The LEXEDIT utility program was designed to do much of the text editing, faithfully, comprehensively and automatically, but it does not make all the editing corrections. Furthermore, in fixing some editing problems, LEXEDIT sometimes introduces errors of its own. LEXEDIT saves time in preparing a text for QLEX analysis, but all texts must be examined, word for word, to ensure compliance with the transcription rules--a lugubrious but necessary process. This is the price for having a scientific tool, comparability and interpretation.

      LEXEDIT is a file in the C:\QLEX directory. To use it:

           type: LEXEDIT x.ASC REPLACE.LST

      where:

           (a) LEXEDIT is the utility's name;
           (b) x.ASC is the name of the file (prepared according to the Bare-minimum or NORMAL editing rules above);
           (c) REPLACE.LST is the list of 'fixes' which LEXEDIT searches for and makes to the text file.
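
      The kind of mechanical substitution LEXEDIT performs can be illustrated with a short Python sketch. This is not LEXEDIT itself, and it does not read REPLACE.LST, whose format is not described in this guide; the function name apply_fixes and the two small tables of abbreviations and contractions are assumptions made purely for illustration. The sketch applies three of the Section VI rules: periods are dropped from abbreviations, 'gonna' and 'wanna' are decomposed, and numbers receive the equal sign, with internal periods and commas turned into apostrophes.

```python
import re

# Illustrative tables only; the real REPLACE.LST contents are not shown in this guide.
ABBREVIATIONS = {"Mrs.": "Mrs", "Dr.": "Dr", "i.e.": "ie", "e.g.": "eg"}
CONTRACTIONS = {"gonna": "going to", "wanna": "want to"}

# A number not already marked with '=': an optional '$', digits, and any
# further digit groups joined by '.' or ','.
NUMBER = re.compile(r"(?<![=\w'$])\$?\d+(?:[.,]\d+)*")


def _mark_number(match):
    """Prefix '=' and turn internal '.' or ',' into apostrophes."""
    return "=" + re.sub(r"[.,]", "'", match.group(0))


def apply_fixes(text):
    """Apply a few Section VI transcription rules to one passage (hypothetical helper)."""
    # PERIODS rule: abbreviations lose their periods.
    for abbr, bare in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), bare, text)
    # WORDS REQUIRING SPECIAL TREATMENT: decompose gonna / wanna.
    for word, expansion in CONTRACTIONS.items():
        text = re.sub(r"\b" + word + r"\b", expansion, text)
    # NUMBERS and DECIMALS rules: 9.95 -> =9'95, 360,000 -> =360'000.
    return NUMBER.sub(_mark_number, text)


if __name__ == "__main__":
    print(apply_fixes("Mrs. Lee paid 9.95 and 360,000 more."))
    print(apply_fixes("I'm gonna pay =$28."))
```

      Numbers that already carry the equal sign are left alone, so running the sketch twice on the same text changes nothing further. As with LEXEDIT, the output of any such pass would still have to be read word for word against the transcription rules.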

      REPLACE.LST can be examined by using the utility SEE.

      When LEXEDIT completes its job (which takes a second), review the text to correct any mistakes which LEXEDIT might have omitted or introduced. After QLEX is run on this edited file, one may find these errors in the alphabetical or frequency listing of all a text's words, or in the list of Residuals (words QLEX did not find in its REFERENCE LEXICON of the 10,000 most common English types). Make the repairs and then rerun QLEX.

      MODIFYING THE REPLACE.LST UTILITY. This utility is needed for LEXEDIT. It may be modified, by adding or removing words, to suit your own needs. Use your word-processor to get into REPLACE.LST, make your changes and then convert that file back into an ASCII file. Keep a backup of REPLACE.LST since your changes may not work as planned.

      No set of transcription rules could serve all purposes. For this reason, one design goal was to keep QLEX texts as close to their original form as possible. It is relatively simple to delete most LEXEDIT amendments to the texts, e.g. a 'macro' can be written to eliminate all the = signs before proper names, etc.; or to insert periods for abbreviations (like Mrs and Dr); or to remove other of these QLEX editing amendments.

VII. PERFORMING A QLEX ANALYSIS

   INSTALLING QLEX. The QLEX programs, the 10,000-word Reference Lexicon, this LEXGUIDE.2k and several utilities are stored on the LEX CD.

      To install QLEX, first get into the DOS operating system (available in Windows 95/98 after clicking on START).
Next, make a new Directory on your C:\ drive--naming it C:\QLEX. Then copy all the files from the CD to this new directory:

        Type: md c:\qlex
    then:
        Type: cd c:\qlex
    then go to your CD DRIVE
        Type: copy *.* c:\QLEX

LEXGUIDE.2K can be printed from C:\QLEX.

QLEX will complete its analysis of a ~1,000-word text in less than one second.

STEP-BY-STEP PROCEDURES.

1. While under DOS, and in the C:\QLEX directory:

        Type: QLEX

        [To exit QLEX, strike the ESC key].

2. In large letters, the screen will show: "QUICK LEX"

        Strike any key to proceed.

3. "What drive can be used for temporary files: C--F?"
   You will be working in the c:\qlex directory

   [You can use other directories for texts and output files if you designate them when asked].

        Strike: ENTER (to move to the next step)

4. "Which files do you want to process? (Up to 80 characters, wild cards are OK)"

        Type: IRS1040.ASC  (this sample IRS text is contained in your QLEX directory)

        i.e. file should be on C:\QLEX:
             its file name IRS1040; and
             its file extension .ASC (short for ASCII)

5. "You've specified 1 (or more) file(s). Do you want to specify more?"

        Type: N (for No, if IRS1040.ASC is the only file
you want QLEX to analyze. Later you may choose to analyze many texts at once).

        Type: Y (for Yes, if you want to add additional files already in the C:\QLEX directory)

        i.e. QLEX will run a series of files if they are specified at this point. The program will keep asking you to name more files until you type N.

6. The screen will show a large MENU: these are analysis & print options.

   a. The top of your screen shows the first six lines from the sample file IRS1040.ASC. QLEX lets you examine the text, header information, and the comments there so you can confirm that it is the file you intend to analyze and that the file is in standard ASCII format.

   b. The cursor is in the box: "Are these choices OK?"
        Move the CURSOR arrow key DOWN once.
        (To move around the MENU, use the UP and DOWN cursor keys as you make your choices).

   c. The next box shows three possible formats for your file: Free, DEC, IBM.

        This choice is made for you by QLEX from its examination of your file. There is nothing for you to do here. Note that the IRS1040.ASC file is identified as an IBM file. To move to the next decision box:

            Strike the CURSOR key DOWN once.

        To change an earlier decision, move the CURSOR UP or DOWN to go to that option, then make the change.

   d. The next box asks which ID code is
to be analyzed.
        111 is the default (the program provides ampersands)

        Special case: conversational texts. QLEX can analyze a text one speaker at a time. QLEX needs to know who is speaking, to whom the speaker was speaking, and in what context--before proceeding.

        Suppose you are analyzing a conversation between President Nixon and his two closest staff associates: Haldeman and Ehrlichman, on Watergate.

        With two or more parties in a conversation, the first of the three digits in an ID refers to the speaker's identity. President Nixon could be given speaker ID #1; Haldeman, #2; Ehrlichman, #3.

        The second of the three digits in the ID number represents the person to whom the speaker was talking.

        The third of the three digits in the ID number may be used for any purpose. Often, this number refers to the context in which this conversation took place (in the Oval Office--here assigned ID #1).

            Thus--&&121--means the speaker was Nixon; he was talking to Haldeman; in the Oval Office.

        Investigators make these ID code assignments.

        Wildcards and ID's. When Mr. Nixon spoke to both Haldeman and Ehrlichman at the same time, use 1?1--meaning the speaker was Nixon, but all the persons to whom he was speaking would be included in the analysis. The ? symbol is the DOS wildcard for any single character, here any digit from 0 through 9. Do not assign ID code 0--these are
reserved strictly for Comments--whose texts are ignored in the QLEX analysis.

   e. The next prompt is: "Where to START/STOP?"

        QLEX allows you to begin your analysis wherever you want and end wherever you want.

        START @: 1 (the first word in the text is where one ordinarily starts, but it could be anywhere you designate. The number refers to the location in the string of word types, ignoring words with equal signs before them and other speakers).

        ENDING @: 1000 (or if you do not know how many words are in the text, put a number well in excess of what you suspect the file contains). The output will tell how many words are in your chosen text.

        Most files in the Cornell Corpus are based on 1,000+ words but analyses can be carried out on as few as 250 and as many as 64,000 words. The lower limit is arbitrary, but function words account for about half of all a text's words, so in a 500 word text, only about 250 words are available on which to construct the necessary cumulative frequency distribution from which LEX is determined.

        SO--small n texts increase LEX's standard error.

        At the top of QLEX's principal output file is a report on how many words are in the file. From that information you may divide the text into as many sub-files as you want.

        That same output file tells you where you ended in the text--if you specified a shorter
file than it proves to be.

        For example: in the IRS1040.ASC sample text, after the 1000th word, the next words were: 'owe or get a refund even'.

   f. The next box in the Menu, OUTPUT OPTIONS, allows you to decide what is to be included in your output. There is a designated default (Yes or No) for each option. Your choice is to use the default or change it to the opposite option. As you move the cursor down, either hit the CURSOR DOWN key (opting for the designated option) or strike the key (Y or N) to give the opposite option. The Y or N symbol will change reflecting your new choice.

        You can examine the IRS1040.OUT file to see if you have made the choices you want by:

            Type: see irs1040.out

Option 1. SENTENCE ANALYSIS. The default option is Y (yes)--QLEX will print out the full sentence analysis in QLEX's analysis of the IRS1040.ASC file. I urge you NOT to request this output since it takes many minutes to print out. Rather, view the output from the monitor.

To get this sentence analysis output, simply leave the default option in place and skip to the next option. If you do not want the sentence analysis output, strike N, which also moves the cursor to the next option.

NOTE! Once you change one of these options, that choice will remain in effect for all successive QLEX analyses until you exit from QLEX (using the ESCAPE key). If you are doing many QLEX analyses at once, it
is convenient not to have to change these options.

You may change back to the default or choose some other option before beginning any new QLEX analysis.

Option 2. HISTOGRAM (and table) of SENTENCES. The default is Y. You must type N if you do not want the graph and table printed out. This information is shown as page 1 of the sample text output--part of IRS1040.OUT (cf p 41).

Option 3. WORD LIST. In any 1,000-word text, there will ordinarily be between 350 and 550 word-types. The default is Y, i.e. the output will show each of these words, listed twice: alphabetically and by frequency of usage.

This word list requires many pages of output--but it is useful for finding typos and editing errors.

In a separate section of the output is a list of the words you marked as names, places, numbers, nonsense words (=zzzz), and other terms specified in the transcription rules. If this word-list is of no interest to you, type N.

Option 4. RESIDUALS. These are the text's uncommon and rare words, i.e., words which occur fewer than three times per million in Carroll, et al's corpus. None are among the 10,000 most common English words. QLEX could not find these terms in its Reference Lexicon. Your option is to print or not to print this list of Residual Words. You will want to examine this list since typos turn up here, but you can do it from your monitor more easily than from paper.

This RESIDUAL list is useful because it shows which words among the 10,000 most common English word-types are closest in spelling to each residual term. Many residuals are mere derivatives or inflections of common word-types.

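The idea behind the Residuals list can be sketched in a few lines of Python. This is only an illustration, not QLEX's own code: the tiny `REFERENCE_LEXICON` set stands in for the 10,000-type Reference Lexicon, `find_residuals` is a hypothetical helper, and using Python's `difflib` to approximate "closest in spelling" is an assumption about how such a lookup might work.

```python
import difflib

# Toy stand-in for the 10,000-type Reference Lexicon (hypothetical contents).
REFERENCE_LEXICON = {"the", "tax", "form", "income", "refund", "file", "return"}

def find_residuals(tokens, lexicon=REFERENCE_LEXICON):
    """Map each token missing from the lexicon to its closest lexicon spelling (or None)."""
    residuals = {}
    for token in tokens:
        word = token.lower()  # upper and lower case forms are combined
        if word in lexicon or word in residuals:
            continue
        # get_close_matches returns the best candidates above the similarity cutoff.
        close = difflib.get_close_matches(word, sorted(lexicon), n=1, cutoff=0.6)
        residuals[word] = close[0] if close else None
    return residuals

print(find_residuals(["The", "refunds", "form", "taxx", "zygote"]))
```

An inflection such as 'refunds' and a typo such as 'taxx' both surface as residuals next to a near neighbour in the lexicon, which is why the list is a convenient place to look for editing errors.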
Texts from newspapers and popular magazines may have 100+ residuals/1,000 word sample. The default is Y.

One of the experiments in validating the LEX measures explored the relative contribution made by words at different ranks on Carroll, et al's list of the 10,000 most common words to identifying the topic or subject of the text passage. There is a powerful relationship between a word's frequency of use and the information that word conveys. Statistically, uncommon and rare words convey far more information than common words. When this finding is combined with our research on word polysemy, not only do uncommon and rare words rarely have multiple meanings (i.e. their meaning is less conditional on word context and syntax than common words), they convey more information regarding the substance of the passage than do common words.

QLEX's 10,000 word Reference Lexicon. In Carroll, et al's Reference Lexicon, word-types 'THE', 'The', and 'the' are considered three distinct word-types, i.e. upper and lower cases of the same 'word' are different word-types. QLEX combines such terms, with the result that the first 10,000 types in the Reference Lexicon do not precisely match those in the Carroll, et al list.

In combining frequencies of upper and lower case terms, QLEX's first 10,000 terms are roughly equivalent to the first 10,500 on the Carroll, et al. list, and the combined values slightly rearrange the rankings of terms--relative to those shown in Carroll, et al. These are very minor matters.

How to examine this 10,000 type Reference Lexicon? QLEX's 10,000 REFERENCE LEXICON is a readable ASCII file named
ASCIDICT. A compressed version of that file can be found on the QLEX diskette under the name DICTION.QLX. You can examine this 10,000-type lexicon while in C:\QLEX by:

        Type: DODICT and use the arrow keys to move about

Option 5. DISTRIBUTION BY FREQUENCY. This 1-page analysis is the most important table produced by QLEX. Columns on the left describe the frequencies, the proportions and the cumulative proportions for words from that text. It shows how the speaker or author drew upon all those 10,000 most common word-types. Texts are compared by their relative use of these cumulative proportion distributions, and by their use of the ~600,000 other rarer content terms. These numbers are later used by the MIGNON program to calculate LEX.

This table shows how often each of the ten most common grammatical words in English was used in that text (i.e., the, of, and, a, in, etc), singly and cumulatively. This table shows that in the 1000 word IRS1040.ASC text, 'the' was used 71 times, i.e. 'the' alone accounted for 7.1% of all its terms. The first ten most common English terms (all grammatical) accounted for 24.5% of its terms. Ten percent of that text's words were rare words (i.e., they occur fewer than 3 times/million in general usage).

Following this one-page distribution table is another table showing the frequency per million of that content or open class word in your text which fell at the 10th, 25th, 50th, 75th and 90th percentile ranks. The default is Y, since virtually all lexical analyses include this table.

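The three left-hand columns of this table (frequency, proportion, cumulative proportion) can be sketched in Python as follows. This is an illustration only: the sample tokens and the `distribution_table` helper are hypothetical, and the sketch orders words by their frequency within the text, whereas QLEX's table is organized by rank in its Reference Lexicon.

```python
from collections import Counter

def distribution_table(tokens):
    """Return (word, frequency, proportion, cumulative proportion) rows, most frequent first."""
    counts = Counter(word.lower() for word in tokens)
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    for word, freq in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        proportion = freq / total
        cumulative += proportion
        rows.append((word, freq, proportion, cumulative))
    return rows

sample = "the form of the return and the refund of the tax".split()
for word, freq, prop, cum in distribution_table(sample):
    print(f"{word:<8}{freq:>4}{prop:>8.3f}{cum:>8.3f}")
```

By the same arithmetic, 71 occurrences of 'the' in a 1,000-token text give a proportion of 71/1000 = 0.071, the 7.1% reported for the IRS1040.ASC sample.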
Remember, a LEX score excludes the text's use of all 75 most common function or closed class English terms.

Option 6. The 1-2-3 OUTPUT FILE. You have the option here of having the main lexical and sentence measures from the QLEX analysis sent to the printer or sent to a computer file. This information is needed by MIGNON to calculate LEX--the principal statistic from QLEX. You will also want to combine this file with others in statistical analyses. This file may then be imported into LOTUS 1-2-3 (development of these programs goes back 19 years) or to any other spreadsheet and then on to your statistical package. The default is N.

    NOTE. There is a quirk in QLEX. Before you call for QLEX to produce this 1-2-3 file, your text must have run successfully.

        First run QLEX without asking for the x.123 output file.
        If it ran successfully:
            (a) delete the x.out file, then
            (b) rerun QLEX asking for both the x.out and x.123 files.

    Only on the first run of a text with QLEX should you not ask for this 1-2-3 output file. Thereafter, it can be produced the first time you run any QLEX job, so long as there are no editing errors in the text.

Option 7. CALCULATIONS ON PERCENTILES vs RANK CURVE. Ignore all these numbers--in developmental work on QLEX, all sorts of special analyses were tried, but not erased from the program--so ignore them.

Option 8. SEND TO PRINTER. This is important--normally one
cannot take the time to print out these long outputs nor would you want them--they are voluminous. Save output files as computer files and then examine them from your monitor. The default is Y. By typing N, the entire analysis output will be put into a computer file (whose name you supply) which can then be examined at leisure on the screen. You can later copy or print that file.

PRINTING OUTPUT. Printing output from a >1,000-word text takes about twelve pages (cf. IRS1040.out); consequently users are advised to store both QLEX output files (x.out and x.123) on a hard drive, diskette or ZIP drive and inspect the contents from the monitor.

PRINTER WRAP-AROUND--The formatted output fits on 8.5"-wide paper but there may be a wrap-around problem unless you specify a smaller-than-usual type font for your printer.

Option 9. SEND TO FILE. If you specified N to option 8, then you must send the QLEX output to a computer file whose name you supply. Type Y for this option. Since the default to option 8 is Y, the default for option 9 MUST BE N (No).

Option 10. ADD COMMENTARY. If you type Y, this option allows you to add commentary to your output files--e.g. why was this text examined? The default is N. Strike the CURSOR DOWN once, bringing you to the next box on the menu. Normally no commentary is added so type N.

g. Supply NAMES for these OUTPUT FILES.

The QLEX OUTPUT FILE. If you specified Y to a printer output, and N to the statistics output file (i.e. the
1-2-3 output file), then you may skip over these options.

IF you chose Y to Option 9 (i.e. you want to create an output computer file--the normal case), then here is where you supply the needed File Name:

        Type: x.out  ('x' stands for the text's file name--up to eight characters/numbers).
        This file's extension must be .OUT

QLEX puts everything it would have printed into this file, and stores it on the QLEX directory. If you anticipate storing a great many QLEX outputs, you can designate some other drive and directory for these files, in front of the file name (e.g. c:\LEXFILES). QLEX will put the output file (x.out) there.

To examine the contents of this major output file:

        Type: SEE x.out  use the cursor to move about.

        The See utility is useful for examining any ASCII file. Exit from SEE by striking the ESC key.

To print this output file:

        Type: PRINT x.OUT
        then, in response to the prompt: type PRN

The 1-2-3 Statistics File Name. This option supplies the necessary data on the cumulative proportion distribution which another program called MIGNON uses to calculate LEX--the statistic which describes the direction of skew and the magnitude of that text's skew in word choice.

        Type: x.123 using the same filename as in the ASCII file but .123 as the file's extension.

To print the contents of the file:

        Type: PRINT x.123
        then, in response to the prompt:
        Type: PRN

h. SKIP THIS FILE? The default for this Menu option is N. Since you can use wildcards (e.g. * or ?) to identify a whole series of texts to analyze in a single QLEX run, you have the option of skipping a particular file within that set of files.

To skip the file shown at the top of the menu screen:

        Type: Y  (skip that text)

        Remember! if you want to analyze the very next text, change this option back to N (meaning--I don't want to skip this next text).

i. ARE THESE CHOICES OK? After making so many choices, the program gives you a chance to review your decisions before the QLEX analysis of texts begins. The default is Y. Before striking ENTER (which starts the analysis), check to see if your choices are exactly what you want. If not, use the cursor key to move about the Menu, make the necessary correction, then return to this point.

        Strike ENTER -- and QLEX begins the analysis

When you analyze many texts in one QLEX run, this process is repeated for each new text displayed at the top of the screen until you have gone through the entire set. If all the analyses involve the same Menu options, all you need do is give the output files their names and strike ENTER after each file. Not until the last sample text's choices have been made will the ENTER key start QLEX.

HOW LONG DOES A QLEX JOB TAKE? Typically, under 1 SECOND

VIII. MIGNON: the program which calculates the final LEX measures

The C:\QLEX directory contains MIGNON programs. MIGNON, a subset of QLEX's programs, calculates LEX and the other
statistics, and produces two new files, x.LEX and x.321. To run MIGNON, first run QLEX to produce the cumulative frequency distribution, and then check to confirm that its two output files (x.out and x.123) were saved.

MIGNON operates on a text's x.123 file. It calculates LEX by integrating the AREA beneath a text's cumulative proportion distribution--produced during the QLEX analysis (e.g., this can be seen in IRS1040.OUT--attached to LEXGUIDE).

Next, MIGNON compares two areas--the size of this text's AREA is contrasted with a constant area--the integrated AREA beneath Herdan's theoretical lognormal model of word choice--which we now know is closely approximated (empirically) by word choice in the world sample of newspapers (mean LEX = 0.0).

LEX is the difference between those two AREAS:
    (1) the area under a text's cum. prop. distribution, and
    (2) the area under Herdan's theoretical lognormal model of word choice.

    [- LEX] texts:
        these natural texts have larger areas than does Herdan's linear model, i.e. their word choice is skewed toward common words. These texts are 'lexically simpler' or more accessible than the average newspaper. For that reason, such texts are given the (-) LEX sign.

    [+ LEX] texts:
        these natural texts have smaller areas than that under Herdan's linear model, i.e. their word choice is skewed toward rare words. These texts are more 'difficult' lexically (less accessible) than the average
newspaper. For that reason, such texts are given the (+) sign.

    [LEX magnitude]:
        this is the quantitative difference in area beneath the two distributions (the empirical versus the theoretical). The larger this number, the more accessible (for - LEX) or the less accessible (for + LEX) the text.

A. Procedures for running MIGNON. It takes MIGNON less than a second to: (a) calculate the LEX statistics and (b) produce the two new computer files: x.LEX and x.321. First try MIGNON on the IRS1040.123 sample output.

        Type: MIGNON

        Then, answer the query giving the file's name:

        Type: irs1040

    Note--(a) use only the file name;
          (b) omit the period and its extension

MIGNON calculates and produces the LEX and other lexical statistics, and two new files:
        (a) IRS1040.321
        (b) IRS1040.LEX

When you run MIGNON on the IRS1040.LEX file, the results appear on your monitor. That analysis produces eight statistics--beginning on the leftmost: 'LEX1'--the best, well-validated measure of a text's accessibility, difficulty or comprehensibility.

LEX (open class words only) and LEX1 are one and the same.

What then is LEX2? Our research has established that grammatical, function or closed-class terms contribute virtually no increment in predictive power to a text's 'lexical difficulty'--beyond that produced by the text's choice among the 10,000 most common content or open class types.

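The area comparison that defines LEX, described earlier in this section, can be sketched in Python. This is an illustration under stated assumptions, not MIGNON's actual computation: the x-axis is taken to be log10 of word rank, the theoretical model is taken to be a straight line rising from 0 at rank 1 to 1 at rank 10,000, areas are integrated with the trapezoid rule, and no scaling is applied, so the results are not comparable to real LEX values. The function names and the sample proportions are hypothetical.

```python
import math

def trapezoid_area(xs, ys):
    """Integrate y over x with the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def area_difference(ranks, cumulative_props, max_rank=10_000):
    """Model area minus text area over log10(rank); negative when the text favours common words."""
    xs = [math.log10(r) for r in ranks]
    text_area = trapezoid_area(xs, cumulative_props)
    # ASSUMPTION: the theoretical curve rises linearly from 0 at rank 1 to 1 at max_rank.
    model = [x / math.log10(max_rank) for x in xs]
    model_area = trapezoid_area(xs, model)
    return model_area - text_area

ranks = [1, 10, 100, 1000, 10000]
common_heavy = [0.10, 0.45, 0.70, 0.90, 1.00]  # made-up cumulative proportions
rare_heavy = [0.00, 0.15, 0.35, 0.65, 1.00]    # made-up cumulative proportions
print(area_difference(ranks, common_heavy))  # negative: larger area than the model
print(area_difference(ranks, rare_heavy))    # positive: smaller area than the model
```

The sign convention follows the guide: a text whose curve encloses more area than the model (word choice skewed toward common words) comes out negative, and one that encloses less area comes out positive.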
Much effort went into finding the minimax solution for where to partition closed from open-class terms (since they overlap and some words' grammatical role is conditional).

    LEX1 partitions this array between word ranks 75 and 76--meaning that LEX1 includes a text's use of all types ranked between 76 and 10,000.

    LEX2 sets the partition point between word ranks 35 and 36, so it includes all types ranked from 36 through 10,000. An empirical case can be made for partitioning the first and most common 10,000 word types at an even lower rank.

What is contained in this new IRS1040.321 file?
    IRS1040.321 contains the percentages on which its cumulative proportion distribution is produced, and several new exploratory measures created by MIGNON. The identity of the selected set of 99 variables plus the file's name is described in Section XV. In text research, one normally compares sets of texts with one another (e.g. 17th & 18th C. newspapers versus 19th C. newspapers versus 20th C. newspapers). That requires the outputs of many individual texts be aggregated. You may combine these x.321 files with others to form large data matrices on any spreadsheet software.

    In Corel Quattro-Pro 8, this is done as in this example:

        Type: copy *.321 c:\newspapr\newspapr.wb3

    All x.321 files produced by MIGNON will be
joined into a single file known as newspapr.wb3 on the directory Newspapr on the c:\ drive

    From Quattro Pro 8 or Excel or Lotus 1-2-3, one can retrieve this newspapr.wb3 file and enter it into a spreadsheet. From there, one can then import the spreadsheet file into a statpak (such as Minitab) and carry out more elaborate analyses on the full set of newspapers.

    IMPORTANT: the MIGNON program will perform its analysis of x.123 files singly or can carry out its analysis and produce the x.321 and x.LEX output files for all *.123 files on a directory, sequentially. This saves a lot of keystrokes and time when there are many QLEX analyses.

B. Variable identity in MIGNON's x.LEX output file.

        variable #10:
            LEX2 (the difference in two Areas under the cumulative proportion curves: the empirical and the theoretical (the Carroll, et al, and newspaper linear distributions) for those words ranked between 36 and 10,000).

        variable #11:
            N--the number of tokens found in this text

        variable #15:
            the text's mean open-class type, expressed by its U value (i.e. its freq. per million in Carroll, et al's Reference Lexicon).

        variable #67:
            the % of all a text's tokens which appear among the 10,000 most common types of the Carroll,
et al. Reference Lexicon

        variable #72:
            the text's median (Q2) open class word, expressed by its U value (freq./million).

        variable #88:
            the size of the Area beneath the cumulative proportion distribution for the text's CLOSED CLASS types (types ranked 1 through 75).

        variable #98:
            the size of the Area beneath the cumulative proportion distribution for CLOSED CLASS types.

        variable #99:
            LEX1 (for this text, the difference in two Areas: the empirical and theoretical distributions, for all OPEN CLASS word types ranked between 76 and 10,000). LEX1 is the principal LEX measure of a text's lexical difficulty/accessibility.

IX. INTERPRETING THE LEX STATISTIC.

LEX1 (open class words only) is interpreted as measuring a text's accessibility/lexical difficulty/comprehensibility. This interpretation is based on a large body of research on the 'difficulty' and recognition of single words.

A. Word 'difficulty' and regularities in word choice. Several decades of research have shown that text difficulty is a concept of considerable complexity (Carr and Levy, 1990; Levy and Carr, 1990; Holland, 1981). This section describes what is probably the single most powerful component of a text's difficulty--its lexical difficulty, as represented by LEX1 and by its related statistics.

Powerful statistical regularities in word choice were summarized and formed the basis for Herdan's (1960, 1966)
 àà ø àÔ_ÔlognormalÔ_Ô€model€for€word€choice:€when€the€types€of€a€lexiconÌà  àà ø àare€ranked€by€their€frequency€in€general€use,€and€thoseÌà  àà ø àranks€are€expressed€by€their€common€log,€he€proposedÌà  àà ø àthat€the€resulting€word€choice€distribution€fits€theÌà  àà ø àÔ_ÔlognormalÔ_Ô€statistical€distribution€(next€to€the€normalÌà  àà ø àdistribution„„one€of€the€most€common€distributions€found€inÏà  àà ø ànature).€€When€represented€as€shown€in€Figure€1,€thatÌà  àà ø àdistribution€is€essentially€linear€across€the€range€fromÌà  àà ø àrank€1€through€10,000.ÌÌÐ .Ø'3 Ðà  àà ø àPut€another€way,€the€probability€of€a€word's€choice€isÌà  àà ø àbased€on€the€log€of€its€frequency€in€general€use.€€CarrollÏà  àà ø à(1971),€using€a€specially„designed€five€million€word€corpusÏà  àà ø àof€texts€sampled€from€school€books€for€children€age€9€to€15,Ìà  àà ø àfound€that€English€word€choice€fits€Ô_ÔHerdanÔ_Ô's€model€well€forÌà  àà ø àmost€of€its€distribution,€but€the€test€was€constrained€byÌà  àà ø àthe€fact€that€his€five€million€word€sample€contained€onlyÌà  àà ø à86,000+€of€the€estimated€609,000€(as€estimated€by€Carroll,Ìà  àà ø à1971)€word€types€in€the€English€lexicon.ÌÌà  àà ø àChoice€of€word,€conditional€on€its€frequency€in€generalÌà  àà ø àusage.€€The€rarer€a€term,€the€longer€the€period€required€toÌà  àà ø àretrieve€it€from€memory,€and€the€longer€the€time€requiredÌà  àà ø àfor€its€recognition€when€presented€Ô_ÔtachistoscopicallyÔ_Ô€(JustÌà  àà ø à&€Carpenter,€1987).€€Consequently,€the€rarer€the€word,€theÌà  à€à ø àless€likely€it€is€to€be€used€òòwhen€the€text€is€produced€óóà @À! àà @À! 
in realtime. Consequently, spontaneous conversations should be (and are) less difficult than rehearsed text and most printed texts (typically written off-line).

Despite the approximately 609,000 word-types in English, slightly more than 100 are classed as function or closed-class, yet that small set of grammatical terms accounts for about half of all the words in texts. The other half are from the many open-class types.

B. The REFERENCE LEXICON: word choice in newspapers. An empirical (as opposed to Herdan's theoretical) model for all text comparisons is the pattern of word choice found in newspapers. Their mean is set at LEX = 0.0 because their pattern of word choice is linear for 1,000-word samples from major English language newspapers published in Africa, Asia, Australia, Europe, India and in North America, as far back as 1665. LEX's interquartile range in these 63 newspaper samples is relatively narrow (LEX = -3.8 to +2.6).

Each feature section of a newspaper, however, has a characteristic pattern of word choice and difficulty: our samples of comics were written at LEX = -25; advice columns at LEX = -21; sports at -13.8; science and medicine at +3.7; and business news at +4.7.

Several factors shape a text's pattern of word choice. These include the target audience's effect on the choice of message, and on the choice of domain (technical vs mundane). Texts addressed to foreigners barely acquainted with English, to young children and to animals are generally heavily skewed toward common terms, showing that speakers and writers self-consciously manipulate LEX levels--though seldom with much accuracy according to our research. In experiments designed to measure subjects' success at hitting specific LEX targets, few could consistently hit near an intended level of text difficulty.

Several of these determinants of the pattern of word choice may occur simultaneously. Texts written by scientists for fellow specialists in their sub-field are heavily skewed toward rare terms--a combined audience and domain selection effect.

Other determinants of the pattern of word choice include realtime versus offline text production, and the level of personal stress/distraction while producing texts (as when witnesses testify under high or low stress levels).

C. A general model for word choice. Texts in widely diverse domains may have the same LEX score, and texts in the same domain may nonetheless have a wide range of LEX scores. A more general model for word choice than Herdan's is the theoretical spectrum of LEX values found in the CORNELL CORPUS-2000. Every LEX score represents a unique pattern of word choice. Theoretically, no LEX pattern overlaps that of another, at any point along their distributions.

Speech and writing clearly violate Herdan's lognormal pattern of word choice, but little is known about the specific mechanism(s) which allow the lognormal pattern to be overridden. Speaking to knowledgeable or ill-informed audiences induces complex adjustments of domain, topic and lexical choice (cf. Cornell PhD dissertation, Margaret G Ahrens). Witness testimony under direct and cross-examination also suggests that immediate personal stress and the necessity to produce answers in realtime are among the
many constraints which shape LEX levels.

D. LEX's stability over the past 334 years. While the English lexicon is growing rapidly with new words, and words take on new meanings and lose others, and still other words become archaic, the pattern of word choice (particularly the use of the 10,000 most common types) has changed little over the past 334 years. How do we know that? When this software (with its built-in modern lexicon) was applied to sample English and colonial American newspapers published in the middle 1600's and 1700's (in London, Ipswich, Philadelphia, Richmond, and Charleston), their level (LEX = -3.5) was well within the range of contemporary newspapers. The 1791 Times (London) was written at LEX = -3.1; in 1850 it was at +3; and in 1992 at -1.7. The New York Times was written at LEX = +2.0 in its first year (1852) and at -0.85 in 1987. Overall, newspaper levels appear to have risen at the rate of slightly more than 1 LEX per century.

E. LEX Validity. There are multiple grounds for believing that LEX validly estimates a text's accessibility and difficulty. Decades of research on reading ability show that knowledge of the meanings and uses of uncommon words strongly predicts reading comprehension levels (Thorndike, 1973; Saarnio, et al, 1990). LEX's validity has also been confirmed in a series of experiments in which speakers address different audiences on different topics. Their spontaneous speech and writing for these audiences on those topics contained the predicted changes in LEX levels.

F. LEX validation using the 5000+ text Cornell Corpus-2000. Another way to validate the interpretation that LEX measures
the lexical difficulty of a text is to compare texts against one another. Table 1 contains LEX and three related lexical statistics on a spectrum of texts sampled from the 5000+ texts in the Cornell Corpus (Hayes, 1988; Hayes and Ahrens, 1988).

LEX validation using US basal readers: Grades 1 to 8. Still another standard for validating LEX scores is the level of American basal readers used as the principal school texts between 1919 and 1991. These basal readers were used in first grade through middle-school. LEX levels were higher before 1945 but after WW II, all publishers of schoolbooks simplified their basals series (this was the era of the controversial Scott, Foresman Dick, Jane and Spot basal readers). Strong parent and teacher reaction to that simplification forced the publishers to restore LEX levels, but the restoration to their pre-World War II levels was only for grades 1 through 3. The LEX levels for the remaining 4th through 8th grades remained lower than basals used between World Wars I and II (cf. page two of the
file named CUCORP99.wb3 for the full results).

In simplifying these texts the publishers also drastically shortened sentence lengths (e.g. these declined from an average of 20 down to 12 words per sentence for 5th grade basals published after 1946). Table 2 gives LEX levels for USA and British basal readers. Note the British publishers did not simplify their first grade basals after WW II, despite the fact that some of those publishers were operating in both countries.


X. INTERPRETING QLEX'S OTHER STATISTICS.

If every analysis and printing option is chosen, QLEX output can be a bewildering mass of words and numbers--initially. This section identifies and interprets these sentence and lexical measures.

A. SENTENCES. At the top of each text's output file is the name of the file (e.g., IRS1040.out) and certain information about this analysis, including where in the text the analysis began and ended, and the ID code used in that analysis (e.g. 111). Beneath those statistics is a HISTOGRAM of the text's sentence lengths. A vast literature about sentence lengths serves as a framework for interpreting these numbers. In the Cornell Corpus-2000, the mean sentence length (MLU, in words) of spontaneous conversations between adults and school-age children is 6.6 words, but lower (5.1) when adults talk with pre-school children. MLU in popular TV show conversations (usually scripted) is comparable in length to parent-older child speech (6.6).
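The sentence statistics described above (number of sentences, MLU, median length, SD) can be illustrated with a minimal Python sketch. It is not QLEX's own code: the function names `sentence_lengths` and `mlu_summary` are hypothetical, and the sentence splitter is an assumption (it breaks on '.', '!' and '?' only, so abbreviations such as "e.g." would be miscounted in real text).

```python
import re
import statistics


def sentence_lengths(text):
    """Return the length, in space-separated words, of each sentence.

    Assumption: a sentence ends at one or more of '.', '!' or '?'.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def mlu_summary(text):
    """Summarize sentence lengths: count, mean (MLU), median and SD."""
    lengths = sentence_lengths(text)
    if not lengths:
        raise ValueError("text contains no sentences")
    return {
        "sentences": len(lengths),
        "mlu": statistics.mean(lengths),
        "median": statistics.median(lengths),
        # The sample SD needs at least two sentences.
        "sd": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,
    }


sample = ("Enter your income. "
          "Subtract the standard deduction from that amount. "
          "Sign here.")
summary = mlu_summary(sample)
print(summary)
```

For the three sample sentences (3, 7 and 2 words) the MLU is 4 and the median is 3; a real analysis would of course use the full 1,000+ word sample.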
In the Cornell Corpus-2000, the most widely read texts (e.g. newspapers) have MLU's around 22 words, but magazines vary widely (the top ten magazines have MLU's around 16). Research articles in Nature have MLU's exceeding 27 words, while books chosen and read by elementary school children have mean MLU's of 11.3. Books read to pre-school children average 12.2 words.

The first page of the output (e.g. IRS1040.OUT) shows that the IRS text contained 56 full sentences, and 1028 tokens of which 41 were coded with the equal sign, leaving 987 non-name tokens for that full sentence analysis. The median length of sentence was 16 words, the MLU was 18.4 (with a large SD--12.7 words). There was one one-word sentence (2% of all sentences), and all sentences in this sample ended with a period.

In a designed sample of 101 texts representing every major class of text in the Cornell Corpus, the linear correlation between text MLU (in words) and LEX was r = .763, but there are numerous anomalies where the association is much lower, leaving the interpretation open. The correlations within the 200+ sub-directories of the CORNELL CORPUS-2000 are normally much smaller.

It should be noted that sentence length measures based on spontaneous conversations of several participants are normally untrustworthy since such transcriptions require extraordinary care, and excellent recordings--a situation seldom found in the general literature.

B. TOKENS AND THEIR FREQUENCIES. 'Tokens' refer to the 'words' in a text, i.e. terms separated by spaces. 'Word-types' refer to uniquely spelled tokens. Note that word-types ignore polysemy (which is highest among common terms, but polysemy falls off rapidly as the type becomes less common--Hayes, 1988). Terms ranked beyond type 1,000 have few alternative meanings or major differences in senses.
As a general rule, the more tokens in a text, the smaller the proportion of word-types in that text. In their five million word sample corpus, Carroll, Richman and Davies (1971) found only 86,741 word-types--the rest were duplicates. For this reason, the ratio of word-types to tokens (TTR) in two texts cannot be compared--unless both texts have exactly the same number of tokens. All printed text samples and samples of television show transcripts in the Cornell Corpus are based on 1,000+-word samples (ten stratified sub-samples), so their TTRs are only roughly comparable. They would be exactly comparable only if one designates exactly 1000 tokens in carrying out a LEX analysis.

The second page of the IRS1040.ASC output is the listing of all that text's word-types, alphabetically (on the left) and by their frequency of occurrence (on the right). This output file may be examined from your monitor (cf. Section XIV). The full listing may not interest most investigators. In this case, note that the term 'deduction' is the highest ranked (in frequency of use) of the words indicating something of the content in the IRS sample text. The eleven most commonly used word-types in the IRS1040.ASC text convey virtually nothing about its content, yet those eleven accounted for 280 of the 1000 tokens.

Among the many uses for these word lists, a child's use of specific grammatical or self/other reference terms during its first months and years of speech can be of use to theories of a child's syntactic development, lexical development and self development.
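The token, word-type and TTR definitions in this section can be made concrete with a short Python sketch. It is an illustration only, not QLEX's procedure: the function name `type_token_stats` is hypothetical, and lowercasing tokens and stripping the punctuation listed in `PUNCT` are assumptions about how "uniquely spelled tokens" are identified. The optional `n_tokens` argument truncates the sample so that two texts can be compared on the same number of tokens, as the text requires.

```python
from collections import Counter

# Assumption: punctuation stripped from the edges of each token.
PUNCT = ".,;:!?\"'()"


def type_token_stats(text, n_tokens=None):
    """Count tokens and word-types and compute the type/token ratio."""
    tokens = [t.strip(PUNCT).lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if n_tokens is not None:
        # Use a fixed sample size so TTRs are comparable across texts.
        tokens = tokens[:n_tokens]
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": len(counts) / len(tokens) if tokens else 0.0,
        # Word-types listed by frequency of occurrence, highest first.
        "by_frequency": counts.most_common(),
        # Word-types listed alphabetically.
        "alphabetical": sorted(counts),
    }


stats = type_token_stats("The cat saw the dog, and the dog saw the cat.")
first_four = type_token_stats(
    "The cat saw the dog, and the dog saw the cat.", n_tokens=4)
print(stats["tokens"], stats["types"], round(stats["ttr"], 3))
```

In the toy sentence there are 11 tokens and 5 word-types; truncating the same sentence to its first 4 tokens gives a much higher TTR, which is why TTRs from samples of different sizes should not be compared.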
C. TABLE OF RESIDUALS. Page 8 of the IRS1040.out file contains all the uncommon and rare words used in that text, i.e. those which QLEX did not find among the 10,000 most common word-types of English. A few are uncommon inflections of common words, but most are simply uncommon or rare names for objects, places, events and relations. Most typographical errors turn up in this list, making it essential that you examine this list and correct the text before running QLEX for the final time. To help determine whether the word is a genuine uncommon or rare word-type (not an inflection or derivation of a common word), the closest preceding and following words in QLEX's 10,000 word REFERENCE LEXICON are printed to the right of each residual term.

D. THE CUMULATIVE PROPORTION DISTRIBUTION. This is the most important table in a QLEX analysis. This Table contains the information necessary for calculating the text's lexical difficulty. It is the author or speaker's pattern of word choice--a concept closely related to the text's 'accessibility' or 'difficulty'.

The IRS1040.ASC output (cf. Section XIV) shows that the lexical analysis was based on exactly 1,000 tokens; the number of types in those 1000 terms was 340, making the ratio of types to tokens (the TTR) .340; and its mean content word (after excluding instances of the first 75 most common, function terms) occurs with a frequency of 206.7 per million in the Carroll, et al Reference Lexicon. The words at the 10%, Q1, Q2, Q3 and 90% positions in the distribution of usage and their word ranks in the Reference Lexicon are also included as indicators of the text's dispersion in word choice.
The left hand column of this table represents the first and most common 10,000 word-types in English. In this IRS publication (IRS1040.ASC) 'the' alone accounted for 7.1 percent of all the text's 1,000 words. In Carroll, et al.'s Reference Corpus, 'the' alone accounted for 7.2% of all terms used. When combined with the second most common word in English ('of'), those two words account for 10.1 percent of all words in the IRS text. The first ten most common word-types account for 24.5 percent of all tokens used in the IRS text; the first 75 (nearly all function words) accounted for 46.5 percent of this sample text's tokens. The first 1,000 types on Carroll's list account for 65.8%; the first 5,000 account for 84.7%; and the first 10,000 most common English words account for 90.0% of the terms in the IRS text, leaving exactly 100 rare words from the original 1,000 as RESIDUAL words, i.e., the words QLEX could not find in its Reference Lexicon--these are the text's uncommon/rare words.

In the spontaneous conversations recorded in their natural contexts of the Cornell Corpus, 'the' alone accounts for between 2 and 3 percent of all tokens--not the 7 percent in print. The most common 75 words (virtually all grammatical words) account for 45% of newspaper words, 44% in popular magazines, and 43% in adult books, but 51% of all words in adult-adult conversations--and a bit less when adults talk with children.

The top 1,000 most common word-types account for only 55% of all words in abstracts of articles in Science; 68% of all words in newspaper texts; 69% of all words in general magazines; but 84% of all words used in adult speech to children under age 2; 85% to children between age 2 and 6, and 85% for children of primary school age.
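The cumulative proportions quoted in this section (the share of a text's tokens covered by the first 1, 2, 10, 75, 1,000 ... reference ranks, with the remainder left as residuals) can be sketched in Python as follows. This is a hedged illustration, not QLEX itself: `cumulative_proportions` is a hypothetical name, and `toy_rank_of` is a tiny made-up stand-in for the 10,000-word Reference Lexicon (only the ranks of 'the', 'of', 'and', 'a' and 'to' follow this guide's variable list; the rank given to 'tax' is invented for the example).

```python
def cumulative_proportions(tokens, rank_of, cutoffs=(1, 2, 10, 75, 1000)):
    """Return ({cutoff: share of tokens ranked <= cutoff}, residual tokens).

    rank_of maps a word-type to its rank in a reference lexicon; tokens
    missing from it are treated as residual (uncommon/rare) words.
    """
    if not tokens:
        raise ValueError("no tokens to analyze")
    ranks = [rank_of.get(token) for token in tokens]
    total = len(tokens)
    table = {
        cutoff: sum(1 for r in ranks if r is not None and r <= cutoff) / total
        for cutoff in cutoffs
    }
    residuals = [t for t, r in zip(tokens, ranks) if r is None]
    return table, residuals


# Hypothetical miniature reference lexicon (word-type -> rank).
toy_rank_of = {"the": 1, "of": 2, "and": 3, "a": 4, "to": 5, "tax": 900}

toy_tokens = "the deduction of the tax and the amortization".split()
table, residuals = cumulative_proportions(toy_tokens, toy_rank_of)
print(table)
print(residuals)
```

With the eight toy tokens, 'the' alone covers 3/8 of the text, 'the' plus 'of' cover 4/8, and the two words absent from the toy lexicon ('deduction', 'amortization') are returned as residuals, mirroring the way the 100 residual words of the IRS sample are the tokens not found among the first 10,000 ranks.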
The top 5,000 word-types account for 74% of the words in scientific abstracts, 84% of words in newspapers, 85% in popular magazines but 94% in adult-to-adult conversations, and nearly 96% in adult-with-child texts.

Again, the ranking of those 10,000 most commonly used words in English comes from Carroll, et al's (1971) analysis. While their corpus is the most modern, largest, and most comprehensive of its kind, it would be desirable to have a REFERENCE LEXICON based on a far larger and more diverse data base. When one becomes available, it will be substituted for the Carroll, et al list. The pattern of word choice in Carroll's corpus is virtually identical to that in newspapers, general news magazines and encyclopedias. The Carroll corpus is representative of adult word use, in print, because the publishers of those schoolbooks used the first 10,000 most common word-types in virtually the same way as do authors writing for adults. The Carroll corpus is dissimilar in pronounced ways from word use in spontaneous conversation between family members, formal conversations and technical writing (cf. Hayes, J. of Memory and Language, 1988).

'ANALYSIS OF CUMULATIVE PERCENTAGE VS RANK'--IGNORE these materials--they were once part of QLEX's development but that tack was later abandoned.


XI. RECOMMENDED MEASURES OF LEXICAL DIFFICULTY.

Work on scientific measures for assessing a text's accessibility, difficulty and comprehensibility is an on-going process, just as the fundamental measures of science continue to be refined and developed. So far as is known, the most comprehensive and best validated measure of a text's difficulty is LEX1 (open class word-types only)--based on QLEX's analysis of edited natural texts and calculated by MIGNON from that text's x.123 output file.
Nearly as good a measure of a text's difficulty is the 'MeanU' statistic--i.e., the frequency per million tokens (in the Carroll, et al Corpus) of that word which lies at the mean of all open class words in a sample text. The larger that word's MeanU value, the more common the word choice in that text. MeanU is the statistic of choice when text samples are small (e.g. <500 words). An advantage of MeanU is its interpretation--the higher the value, the simpler the text; the smaller the value, the less accessible the text.

LEX and MeanU are closely related (r = -.976) in the 101 texts which form the Universal Sample of texts from the Cornell Corpus (cf. also Table 1). One reason LEX is considered the better measure of a text's lexical difficulty is that the entire cumulative proportion distribution can be closely approximated when one knows a text's LEX scores--something which cannot be done with that text's 'MeanU' statistic. LEX supplies far more information about the full pattern of word choice at all points from word rank 76 through 10,000 than does 'MeanU'.

A text's 'MedianU' statistic has proved to be a less valid measure than either LEX or MeanU because it bears a nonlinear relation to them both.


XII. REFERENCES.

Carr, T. H. and B. A. Levy. (Eds.) Reading and Its Development
          1990, Academic Press, San Diego.
Carroll, J. B. in Word Frequency Book, (J. B. Carroll, P. Davies
          and B. Richman 1971, Houghton Mifflin, Boston).
Hayes, D. P. Polysemy in the lexicon, 1987 (unpublished ms.)
Hayes, D. P. Speaking and writing: distinct patterns of word
          choice. J. of Memory and Language, 1988, 27, 572-585.
Hayes, D. P. and M. G. Ahrens, Vocabulary simplification for
          children: a special case of 'motherese'. J. of Child
          Language, 1988a, 15, 395-410.
Hayes, D. P. The Growing Inaccessibility of Science. Technical
          Report Series, Department of Sociology, Cornell
          University, Ithaca, NY. (1991); also in Nature, 1992,
          356, 739-40.
Hayes, D. P., Wolfer, L. T. and M. F. Wolfe, Schoolbook
          simplification and its relation to the decline in SAT-
          verbal scores. Amer. Educational Research Journal, 1996,
          33, 489-508.
Hayes, D. P. and M. Spivey. A general model for word choice:
          behavior under stress. Sociology Technical Report #4,
          Cornell University, 1999.
Herdan, G., Type-token mathematics 1960, Mouton, The Hague.
Herdan, G., The Advanced Theory of Language as Choice and
          Chance. 1966, Springer-Verlag, Berlin.
Holland, M. V., Technical Report No. 12, 1981, American
          Institutes for Research, Washington D. C.
Just, M. A. and P. A. Carpenter, The Psychology of Reading
          and Language Comprehension. 1987, Boston, Allyn and
          Bacon, Inc.
Levy, B. A. and T. H. Carr, in Carr, T. H. and B. A. Levy (Eds.)
          Reading and Its Development 1990, Academic Press, New
          York.
Saarnio, D. A., Okla, E. R. and S. G. Paris, in Carr, T. H. and
          B. A. Levy, 1990, Reading and Its Development
          Academic Press, New York, 57-79.
Thorndike, R. L. Reading comprehension in 15 countries in
          International Studies in Education, III. 1973, Halstead
          Press, New York. 1-179.
Watson, J. D. and F. H. C. Crick, Nature, 1953, 171, 737-8.
Wells, G. Language Development in the pre-school years. 1985
          Cambridge University Press, Cambridge.


XIII. TO OBTAIN A COPY OF THE QLEX PROGRAMS AND THIS LEXGUIDE

Copies of a CD containing the entire Cornell Corpus of 5000+ already-edited texts, divided into more than 200 sub-directories (each with its summary statistics), and the LEXGUIDE.2K to provide evidence for LEX's validation, precision and interpretation can be obtained from the author: Donald P. Hayes, 382 Uris Hall, Department of Sociology, Cornell University, Ithaca, New York, 14853. My phone is 607-255-1425, e-mail: dph1@cornell.edu.

LEXGUIDE can be read from or copied from this CD.

To cite LEX, its applications or use, the reference should be to Donald P. Hayes, "The Growing Inaccessibility of Science", Nature, (1992) 356, pp 739-40.


XIV. SAMPLE OUTPUT---IRS1040.OUT


XV.  VARIABLE IDENTITY IN X.321 FILES (as of 6-23-1992)

This file (x.321) is produced by the MIGNON program. Its 99 variables have the following identities:

VAR.  NAME         IDENTITY
  1   '01   MLU'   mean length of sentence (in words).
  2   '02S36-1k'   'S' stands for the Area beneath the REFERENCE
                   LEXICON (newspaper) model's cumulative
                   proportion Distribution. 36-1K means the Area
                   under the curve when the partition is set
                   between the words ranked 35 and 36.
  3   '03S1K10K'   the Area beneath the standard newspaper
                   model's cumulative proportion distribution,
                   for words ranked from 1000 through 10,000
  4   '04S3610K'   same as the standard model, but the Area
                   beneath the cumulative proportion distribution
                   for words ranked from 36 to 10000. Used to
                   calculate the LEX2 statistic
  5   '05A36-1K'   the Area between words ranked 36 to 1,000 in
                   this text.
  6   '06A1K10K'   same as variable 5--but Area between words
                   ranked 1,000 through 10,000
  7   '07A3610K'   same as var. 5 but Area between 36 and 10,000--
                   used in calculating LEX2
  8   '08N36-1K'   same as var. 5 but NET Area between words
                   ranked 36 through 1,000, i.e. after subtracting
                   this text's Area under the curve for words
                   ranked from 36 to 1000 from the Standard
                   Model's Area for those same words.
  9   '09N1k10K'   the NET Area under the curve for words ranked
                   between 1,000 and 10,000.
 10   '10N3610K'   same as #9, except NET Area for words ranked
                   from 36 to 10,000--this is LEX2.
 11   '11 N all'   Number of tokens in this text
 12   '12Ntypes'   Number of types in this text
 13   '13   TTR'   Type/Token ratio for this text
 14   '14 Ncont'   Number of content terms (ranked 76 - 10,000)
 15   '15 MeanU'   Text's MeanU for content words, i.e. the
                   frequency/million of the mean content word
                   in this text.
 16   '16   the'   Proportion 'the' is of all tokens in this text
 17   '17 2  of'   Cumulative Proportion--'the' and 'of' together
                   is of all tokens in the text
 18   '18 3 and'   Cumulative proportion--'the', 'of' 'and'
                   together of all tokens in this text
 19   '19 4   a'   Cumulative proportion for 'the' 'of' 'and'
                   and 'a' together of all tokens in this text.
 20   '20 5  to'   [same pattern--for variables 20 through 67]
 21   '21 6  in'
 22   '22 7  is'
 23   '23 8 you'
 24   '24 9that'
 25   '25 10 it'
 26   '26    15'
 27   '27    20'
 28   '28  I 22'
 29   '29    23'
 30   '30    25'
 31   '31    35'
 32   '32    50'
 33   '33    60'
 34   '34    75'
 35   '35   100'
 36   '36   150'
 37   '37   200'
 38   '38   250'
 39   '39   300'
 40   '40   400'
 41   '41   500'
 42   '42   600'
 43   '43   700'
 44   '44   800'
 45   '45   900'
 46   '46  1000'
 47   '47  1200'
 48   '48  1400'
 49   '49  1600'
 50   '50  1800'
 51   '51  2000'
 52   '52  2500'
 53   '53  3000'
 54   '54  3500'
 55   '55  4000'
 56   '56  4500'
 57   '57  5000'
 58   '58  5500'
 59   '59  6000'
 60   '60  6500'
 61   '61  7000'
 62   '62  7500'
 63   '63  8000'
 64   '64  8500'
 65   '65  9000'
 66   '66  9500'
 67   '67 10000'
 68   '68  10%U'   Freq./million text's 10th%ile word
 69   '69  10rk'   Rank of the text's 10th%ile word
 70   '70   Q1U'   Freq./million of text's Q1 word
 71   '71  Q1rk'   Rank of this text's Q1 word
 72   '72   Q2U'   Freq./million of text's Q2 word
 73   '73  Q2rk'   Rank of this text's Q2 word
 74   '74   Q3U'   Freq./million of text's Q3 word
 75   '75  Q3rk'   Rank of this text's Q3 word
 76   '76   90U'   Freq./million of text's 90th% word
 77   '77  90rk'   Rank of this text's 90th% word
 78   '78 Stand'   Standard Model (i.e. newspapers)--Area under
                   curve for words ranked 1 through 10,000
 79   '79St1-35'   Standard Model (i.e. newspapers)--Area under
                   the curve for words ranked 1 through 35
 80   '80St361k'   Standard Model (i.e. newspapers)--Area under
                   the curve for words ranked 36 to 1000
 81   '81St110k'   Standard Model--Area under the curve for words
                   ranked 1 through 10,000
 82   '82St1-75'   Standard Model--Area beneath the curve for the
                   words ranked 1 through 75--'closed class'
                   types.
 83   '83St7610'   Standard Model--Area under the curve for words
                   ranked 76 through 10,000--the 'Open class'
                   types--used in calculating LEX1.
 84   '84ALLAre'   Combined Area under words ranked 1--10,000
 85   '85A 1-35'   Area between word ranked 1 through word rank 35
 86   '86A36-1K'
                   Area between words ranked 36-1,000
 87   '87A1K10k'   Area between words 1,000 and 10,000
 88   '88ACLOSD'   Area of CLOSED CLASS types--1 to 75
 89   '89A7610K'   Area of OPEN CLASS TYPES--76 to 10K
 90   '90NetUns'   Net Area between words ranked 1-35
 91   '91N 1-35'   Number of tokens from word ranks 1-35
 92   '92N36-1K'   Number tokens from word ranks 36 to 1,000
 93   '93N1K10K'   Number tokens from word ranks 1K to 10K
 94   '94N 1-75'   Number tokens from word ranks 1 to 75
 95   '95N7610K'   Number tokens from word ranks 76 to 10K
 96   '96N3610K'   Number tokens from word ranks 36 to 10K
 97   '97S OPEN'   Standard Model's OPEN class Area
 98   '98ArOPEN'   This text's OPEN class Area
 99   '99   LEX'   The text's NET OPEN class Area--LEX1
100    100         The name of this file, e.g. x.ASC


XVI. CREDITS: software, and others' assistance

Version 1.0 of QLEX was written by Peter Bond in the spring of 1980 in BASIC for a DEC PDP-11-34 mini-computer. Version 2.0 through version 5.0 were written to run under DOS. Version 2.0 was written by Scott McAllister in 1982. Version 3.0 was written in PASCAL for IBM PCs by David Post in 1985. The current version--the one most extensively tested and used--was written in TurboPASCAL by Domingo Bernardo in 1988. Mignon Belongie wrote several major utilities for the QLEX system in the early 1990s. The most recent version (6.0) was written by Charles and Michael Mikolajczak in 1997 and 1998, in C++, to run under the NT operating system. It remains in the testing stage, but when ready it will be installed on the Web for anyone to download along with this LEXGUIDE, the CORNELL CORPUS-2000 and the other files on the CD. All these programmers were or became interested in text analysis, and each contributed ideas beyond simply writing the programs. Their contributions continue to be appreciated.

While many Cornell undergraduates assisted in this line of research--too many to name--Margaret G Ahrens for several years devoted her considerable talents as a graduate student as my collaborator and is the co-author of several papers. Aside from her skill as an experimenter and analyst in over a dozen validation experiments (some reported in her dissertation), she made countless intellectual contributions and proposed numerous alternative ways of editing and analyzing these texts which strengthened the analyses. Those efforts are most appreciated.

Credit is also due to the many researchers, librarians, public officials and student subjects of experiments who generously allowed me to use their data (especially Gordon Wells at the Univ. of Toronto, Frank Whitehead of Sheffield University in the UK, and Brian MacWhinney and Catherine Snow at the Carnegie-Mellon consortium on child language development--CHILDES), who gave me access to their primary data, provided access to subjects or participated in the many experiments--to all--many thanks.


Filename: LEXGUIDE.2K on MICRON c:\cd1; c:\e; ZIP f:\ and floppy
12-2-99