LEXGUIDE-2000

A GUIDE TO THE LEXICAL ANALYSIS
OF NATURAL TEXTS WITH QLEX

Donald P. Hayes

SOCIOLOGY TECHNICAL REPORT SERIES
#99-7
(DECEMBER 1999)

TABLE OF CONTENTS

I.    LEX--a scientific measure of a text's lexical difficulty
           Table 1--the LEX spectrum of natural texts
II.   LEX is based on Herdan's theoretical model of word choice
           Figure 1--word choice in 63 international newspapers
           Figure 2--word choice in diverse texts
III.  LEX uses: theoretical and applied
IV.   Conditions for using QLEX
V.    Text preparation for QLEX analysis
        A. Converting text files from X.TXT to X.ASCII
              IRS1040.ASC--sample text ready for QLEX analysis
        B. Bare-minimum text preparation for QLEX analysis
        C. NORMAL text preparation for QLEX analysis
VI.   Editing rules
        A. Quick and dirty text editing
        B. NORMAL text editing: general rules
        C. Print's distorting effects on the REFERENCE LEXICON
        D. LEXEDIT--semi-automatic text editing
VII.  Performing a QLEX analysis
        A. Installing QLEX and MIGNON
        B. Step-by-step procedures
VIII. MIGNON calculates the LEX statistics
        A. Procedures for running MIGNON
        B. Variable identity in x.LEX output files
IX.   Interpreting the LEX statistic
        A. Word difficulty and regularities in word choice
        B. The REFERENCE LEXICON and word choice in newspapers
        C. A more general model for word choice
        D. LEX's stability over the past 334 years
        E. LEX's validity
        F. LEX validation using the 5000+-text Cornell Corpus
        G. LEX validation using basal readers for grades 1-8
X.    Interpreting QLEX's other statistics
        A. Sentences: mean length and distributions
        B. Tokens, types and their frequencies
        C. Table of Residuals
        D. Cumulative proportion distribution
XI.   Recommended measures of lexical difficulty
XII.  References
XIII. Obtaining a copy of QLEX and this LEXGUIDE
XIV.  Sample QLEX output: IRS1040.OUT
XV.   Variable identity in x.321 files
XVI.  Credits: software and professional

I.  LEX--a natural science measure of any English text's
accessibility/lexical difficulty/comprehensibility.

LEX scores describe how the 10,000 most common grammatical and
content words in the English language were used in a specific
text. That text's word use is always compared to a reference
lexicon (Carroll, Davies and Richman, Word Frequency Book,
1971). In their reference lexicon, the range of word choice
was: 'the' (which occurred ~73,000 times per million words in
natural texts) to the 10,000th ranked word 'tournament' (which
occurred an estimated 2.8 times per million).

LEX is a natural science measure--which meets all requirements
for such measures. First, it is based on a formal mathematical
model of word choice (Herdan, 1966). Herdan's model has been
supported abundantly by research in psycholinguistics, cognitive
psychology, and by decades of experience with standardized 'word
knowledge' tests--around the world in many languages. The LEX
statistic also has ratio-scale measurement properties (including
a true zero). Furthermore, LEX satisfies Louis Guttman's formal
requirements for cumulative scales. LEX is a rare measure for
the human sciences because it is also a stable measure--English
newspaper texts have been written at virtually the same LEX
level, all over the world, since 1665.

Every LEX score has both a sign and a magnitude. If the words
of a text are skewed toward common words, then its LEX score is
negative, i.e., the text is simpler than the average newspaper--
which serves as the common reference standard. Examples of
texts with negative LEX scores include mothers' talk to their
children in the home. When word choice is skewed toward the rare
words of English, LEX signs are positive, i.e., such texts are
more difficult than the average newspaper. An example would be
research articles published by Nature in 1994. The magnitude of
the LEX score describes the degree to which word choice is
skewed, when compared with (a) Herdan's theoretical linear
pattern of word choice, (b) the highly correlated Carroll et
al. corpus, and (c) the linear pattern of word choice found
empirically in English-language newspapers since 1665.

LEX's precision is conditional on the sample text's size and on how
the sample was derived: by stratified simple random sampling or by
some less systematic sampling procedure. LEX is suitable for
analyzing both old and modern texts and should be suitable for
analyzing English texts well into the next century. For texts derived
by 24 stratified SRS of >150 words each, the standard error of
measurement is 1.3 LEX--i.e., 1.3 units in a measure whose empirical
range to date runs from LEX = +58 to LEX = -81. Normally, LEX scores
are based on text samples of 1,000 or more words, in ten or more
sub-samples of 100 or more consecutive words, in full sentences,
taken by stratified random sampling methods. In 1000-word samples,
the standard error is closer to 2 LEX. While not yet proven, the
substitution of French, German or any other Reference Lexicon for the
English lexicon should yield equally strong measures of a text's
accessibility and lexical difficulty.

LEX and readability scores are related (r = ~+.70), but 'readability'
scores are atheoretical, have not been validated to the same degree,
and do not correlate as highly with multiple criteria as LEX does
[LEX is grounded on a very large, carefully constructed, comparative
data base--the Cornell Corpus-2000]. By contrast, LEX is a
well-validated natural science measure interpreted as measuring a
text's 'accessibility/lexical difficulty/comprehensibility'.

LEX is wrongly linked to 'readability' measures (e.g., the
Flesch-Kincaid, or Gunning Fog indices). Flesch developed his
pragmatic tool to aid primary school teachers in making quick
decisions on the suitability of books for their students--taking into
consideration their reading skills, level of comprehension and
interests. His 'readability' measure is a composite of two measures:
(a) the fraction of words with six or more letters, and (b) the
text's average length of sentence--in words.

One reason LEX and readability scores produce different estimates of
the same text's difficulty is that 'readability' texts are not edited.
All LEX texts are edited according to a comprehensive set of
transcription rules--a necessary step if two or more texts are to be
compared and their differences interpreted correctly.

'Text difficulty' is a multi-dimensional concept with important
grammatical, semantic and lexical components. LEX does not measure
some of these components. It has been difficult to compare their
relative contributions to overall text comprehension.

The spectrum of lexical difficulty in English-language texts is
shown in TABLE 1 according to LEX and related scores.


                              TABLE 1
 SPECTRUM OF NATURAL TEXTS AND THE LEVELS AT WHICH THEY WERE PITCHED

Source of text                         Date/N     LEX  MeanU* MedianU* %Rare**

Nature--transhydrogenase article         1960    58.6      44       ~1    37.8
Cell--research articles                  1989    41.1      91        6    27.6
Nature--main research articles           1990    34.7     107        8    25.2
J. Amer. Chem. Assoc.--articles          1990    33.3      94       13    20.5
New Engl. J. of Medicine--articles       1990    27.0     126       14    18.8
Science--main research articles          1990    26.9     120       17    19.1
Scientific American--articles            1991    14.3     158       28    14.4
Watson-Crick DNA model in Nature         1958    11.1     164       35    10.7
Black Scholar--articles                  1994    10.3     171       32    12.2
New Scientist--articles                  1990     7.2     191       32    11.3
Popular Science--articles                1994     4.6     197       39    11.5
IRS--Instructions for Form 1040          1988     3.1     208       40    10.0
Time--articles                           1994     1.6     211       41    11.4
Economist--articles                      1990     0.9     210       47    10.4

NEWSPAPERS (English-language, Intl)      N=61     0.0     216       51     9.2

Nature (1900)--res. articles             1900    -0.5     226       46     8.9
National Geographic--articles            1984    -0.6     199       53     8.7
Discover--articles                       1990    -2.6     229       48    10.2
The Times (London--1791)                         -3.0     233       50     8.3
New Yorker--articles                     1994    -3.9     231       62    10.2
Colonial American newspapers        1720-1730    -4.3     236       53     8.5
Modern Maturity--articles                1985    -5.0     230       58     8.8
Smithsonian--articles                    1988    -9.1     264       88     9.1
Sports Illustrated--articles             1994   -10.3     257       89     8.4
Sport sections from newspapers                  -12.3     264       74     7.6
Adult books--fiction, USA                N=34   -15.8     282      101     7.4
Ranger Rick--science for children               -18.4     291      120     6.1
Newspaper funnies                               -21.6     309      112     5.6
Nancy Drew mystery series                N=69   -23.4     291      129     4.2
Comic books--GB & USA                    N=37   -23.7     318      133     6.0
Books read at age 10-14, GB             N=261   -24.3     312      136     5.0
TV--cartoon shows                        N=26   -28.6     339      152     5.0
Books read at age 9-12, USA              N=94   -29.0     325      160     3.9
TV--reruns--popular w/ children          N=33   -35.3     359      189     3.3
TV--primetime shows                      N=44   -36.4     371      194     3.3
Pre-school books for children            N=31   -37.0     360      181     2.6
Adult-to-adult conversations             N=68   -37.5     375      211     2.5
Legal wiretaps on cocaine dealer         1990   -42.2     387      222     2.7
Winnie the Pooh--Milne                          -43.3     381      210     1.8
TV--Sesame Street & Mr. Rogers'                 -44.1     397      228     1.0
Mothers' talk to children, age 5         N=32   -45.8     394      222     1.5
Farmer talking to his dairy cows                -56.0     520      292     3.1
Pre-Primer--Scott-Foresman               1956   -80.5     640      646     0.0

*  U = Frequency per million tokens (Carroll, Davies & Richman, 1971)
** Percent of tokens ranked outside the first, most common 10,000


LEX has been validated in numerous ways: by experiments in
which predictions as to the direction of changes in LEX levels
under variable conditions were tested; by comparison with other
less well-validated measures; by its compliance with Herdan's
(1966) general theoretical model of word choice; and by several
kinds of substantive research. These include: (a) the 'dumbing
down' of American basal readers, beginning in 1947, when compared
with pre-World War II basals; (b) the growing inaccessibility of
science journals to the educated reader, since World War II; (c)
the comparative richness of the natural language experiences of
children growing up in underclass through upper-middle class
families; and (d) predictions of the same witness's LEX level
when testifying under direct and cross-examination.

LEX's interpretation as a measure of a text's 'accessibility,
lexical difficulty and comprehensibility' has also been validated
by internal comparisons among the 5000+ texts edited and analyzed
in a file named Cornell Corpus-2000. This Corpus includes every
major text category: broadcasts, print, and conversation, formal
and informal.

II. LEX is based on Herdan's theoretical model of word choice.

The abstract generalizations formulated in mathematics, in
physics or in syntax are among mankind's most powerful intellectual
achievements. The power of those generalizations is due, in part,
to their essential independence from specific content. For
Galileo's laws of motion, it does not matter (theoretically)
whether the falling body is a planet, a feather or a cannonball, so
long as they are in a vacuum. In mathematics, the numeric units may
be dollars, weights or distances--which specific unit does
not matter, since the fundamental relations refer to a higher order
of abstraction. Similarly in syntax, which specific noun is used
in a phrase or sentence is unimportant--what does matter for syntax
is that word's grammatical role in the sentence.

The model underlying the LEX statistics similarly describes an
abstract feature of every text: its pattern of word choice. One
cannot detect these patterns by simply reading a text because the
order in which words appear, so important to syntax and semantics,
is irrelevant to the text's pattern of word choice. The words of
any large text may be randomized in a thousand different ways, yet
the pattern of word choice and the LEX scores will remain the same.

Herdan (1966) was among the first to identify a pattern of word
choice--he proposed that word choice is a special case of the
lognormal statistical distribution. By that time, it was already
known that the word types in the major word corpuses (e.g.,
Thorndike & Lorge; Dale; and Brown University) have quite similar
rankings. Herdan's formal theoretical model for word choice states
that the probability of a word's choice in a text is conditional on
the log of that word's general frequency of usage. That implies
that personal lexicons reflect differential experience with the
English lexicon.

A package of programs named QLEX identifies these patterns of
word choice and produces the LEX statistics from a cumulative
proportion distribution. This is done by first determining what
fraction of a text's words is 'the'--English's most common word.
To that proportion is added the proportionate use made of English's
second most common type--'of'; then the fraction which is the third
type 'and' is added to the cumulative proportion distribution; then
the fourth most common word--'a'; the fifth--'to', and so on
through all the 10,000 most commonly used word-types in English.
The resulting cumulative proportion distribution supplies the
text's pattern of word choice.

LEX and several other lexical measures describe those patterns. In
every QLEX analysis, the ranking of word types in the Reference
Lexicon (which is always represented along the X-axis by the log of
each word's rank) is based on two bodies of evidence: (a) the
largest and most diverse corpus of English word usage, Carroll,
Davies and Richman's The Word Frequency Book (1971); and (b) the
pattern of word choice in newspapers (an early finding in the
course of this line of research). These first 10,000 most commonly
used word types are a constant in every QLEX analysis, i.e., their
rank in the first 10,000 most common English word-types serves as
the Reference Corpus against which every sample text's actual
choice of words is compared. When a new, contemporary, far larger
but equally diverse and well-designed corpus becomes available,
Carroll et al.'s lexicon will be replaced and this new Corpus will
serve as Reference Lexicon. Presumably its ranking and estimated
frequency of usage per million words will be more precise--though
the correlation of word ranks in the new Reference Lexicon with
Carroll et al. and the older corpuses should still remain well
above r=.95.

Figure 1 describes the empirical pattern of word choice in 63
newspaper samples drawn from all over the world between 1665 and
1992. The patterns of word choice in these newspapers were compared
against Herdan's theoretical model--which is approximately linear
within this range. Each newspaper sample consisted of ten or more
sub-samples of 100 or more words. Newspaper word choice
distributions generally fit the linear pattern predicted by
Herdan's model. Aside from this fit, it is notable that the
distributions for the Q1, Q2 and Q3 newspapers are parallel. That
implies that it was their differential use of a single word--'the'--
that accounts for the minor differences in their patterns of word
choice, even in samples as small as 1000+ words. If one equates
those newspapers' use of 'the', the Q1, Q2 and Q3 newspapers' use
of the first 10,000 most common word types of English is similar.

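The cumulative proportion distribution described in this section can
be sketched in a few lines of Python. This is only an illustration
under stated assumptions, not QLEX's own code: the function name
cumulative_proportions, the five-word TOY_LEXICON (standing in for the
10,000-type Reference Lexicon), the sample sentence, the lower-casing
of tokens and the choice of log base 10 for the rank axis are all
hypothetical. The sketch also checks the claim that randomizing word
order leaves the pattern of word choice unchanged.

```python
import math
import random
from collections import Counter


def cumulative_proportions(tokens, ranked_lexicon):
    """Cumulative share of `tokens` covered by each successive lexicon rank.

    `ranked_lexicon` lists word types from most to least common.
    Token order plays no role: only the counts are used.
    """
    if not tokens:
        raise ValueError("the sample text contains no tokens")
    counts = Counter(token.lower() for token in tokens)
    total = len(tokens)
    running = 0
    curve = []
    for word in ranked_lexicon:
        running += counts[word]  # a Counter returns 0 for absent words
        curve.append(running / total)
    return curve


# Hypothetical five-type stand-in for the 10,000-type Reference Lexicon,
# in the rank order quoted in the text: the, of, and, a, to.
TOY_LEXICON = ["the", "of", "and", "a", "to"]

sample = "the cat sat on the mat and the dog went to a corner of the room"
tokens = sample.split()
curve = cumulative_proportions(tokens, TOY_LEXICON)

# X-axis positions: the log of each word type's rank (base 10 assumed).
log_ranks = [math.log10(rank) for rank in range(1, len(TOY_LEXICON) + 1)]

# Share of tokens that fall outside the lexicon (compare %Rare in Table 1).
rare_share = 1.0 - curve[-1]

# Randomizing the word order leaves the distribution unchanged.
shuffled = tokens[:]
random.shuffle(shuffled)
assert cumulative_proportions(shuffled, TOY_LEXICON) == curve

for word, x, y in zip(TOY_LEXICON, log_ranks, curve):
    print(f"{word:>4}  log10(rank)={x:.3f}  cumulative={y:.4f}")
print(f"share outside the toy lexicon: {rare_share:.2f}")
```

With the full Reference Lexicon in place of the toy list, plotting
curve against log_ranks would give the kind of distribution that
Figures 1 and 2 describe. How QLEX turns that distribution into a LEX
score is not shown here.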
          Figure 1--word choice in 63 international newspapers

The standardized educational and occupational testing industry
early capitalized on these powerful regularities in word choice by
constructing 'word knowledge' tests. Empirically, children were
found to rank the words of the English lexicon in largely the same
order, which coincides with the order in the Thorndike and Lorge,
Dale, Brown University and Carroll, Davies and Richman corpuses.
Independent tests showed that a test word's 'difficulty' was highly
correlated with its general frequency of usage--'hard' words were
usually uncommon or rare, 'easy' words were nearly always common.
The standardized tests were validated by the discovery that
students who scored well on word knowledge tests generally knew
more of the uncommon words of the lexicon, had higher levels of
reading comprehension, performed well academically, and achieved at
higher levels on other tests of verbal skills. Research by
cognitive psychologists also found that acquisition of, access to,
and retrieval times for words from a lexicon were strongly related
to a word's general frequency of usage. While the specific neural
mechanisms involved are only now beginning to be worked out and
much remains unknown, it is clear that the choice of words in
texts and one's word knowledge follow strong mathematical
regularities.

Figure 2 describes word choice patterns which represent much of
the effective range of LEX. The topmost cumulative proportion
distribution represents the pattern of word choice of 32 mothers
while speaking with their child at home (all the children were 30
months of age at the time). The mothers' mean distribution is
strongly skewed toward common words (relative to the theoretical
linear distribution in Herdan's model and in newspapers). The middle
distribution describes a single 1000+ word sample from the New York
Times (1987). It approximates the average newspaper's linear
pattern. The bottom of the three distributions represented is the
pattern of word choice in the main research articles in Nature
(1994). Every article in that general scientific journal is skewed
toward rare words and the degree of that skew has grown in each
decade since World War II.

          Figure 2--word choice in diverse texts

III.  LEX uses: theoretical and applied.

The first theoretical use to which LEX was put was to test a
niche-like model of how text difficulty shaped the dynamics of the
science publishing industry since World War II. An explosive
growth in the number of scientists was associated with a growth in
science journals and magazines. The principal hypothesis tested
was that science magazines must disperse themselves (lexically) so
as to minimize competition for readers and advertisers, while
maximizing the number of subscribers. When magazines sought to
occupy the same lexical niche, two outcomes were apparent: either
the magazine shut down or the publisher moved into a new lexical
niche by changing the mean LEX level of its principal articles.

For example, a series of musical chairs-like moves was
provoked in the science journal industry when, in 1947, Nature
vacated its historical niche (LEX = ~0.0)--the mean level of a
newspaper. By 1950, Nature's major articles had risen to LEX =
+17, reflecting a decision to allow authors to write specifically
for scientists and to ignore the educated person. Science held to
its traditional niche (LEX = +6) until 1960, then it followed
Nature's lead and began publishing major articles at higher LEX
levels, by allowing scientists to write at more difficult levels.
Science's time series had reached LEX = +15 by 1970 [by which time,
Nature's articles averaged LEX = +25]. Sensing Science's
now-vacated niche, the publisher and editors of Scientific American
changed its LEX level (from ~0.0 to LEX = +5 by 1970), effectively
moving into that niche. By 1980, Scientific American's average
article had drifted to LEX = +10, which must have exceeded many of
its subscribers' ability to understand the articles, because within
a short period well over a hundred thousand subscribers failed to
renew their subscriptions. That, in turn, induced many advertisers
to withdraw as well. That loss of support weakened the magazine's
financial position, and made it a target of a corporate
acquisition.

When Scientific American vacated its traditional LEX = 0.0 niche
(which it had long shared with Popular Science), that niche was
viewed as an opportunity by the editors of four new science
magazines, who felt there was now room for a magazine designed
for the educated reader interested in science. These four were
Science Digest, SciQuest, Science-80 and Discover. Each sought to
occupy that same LEX = 0.0 niche, but there were insufficient
subscribers and advertisers and the first three had ceased
operations by 1986. Their vacating that niche encouraged the
startup of still another science magazine, New Scientist--whose
texts rose to LEX = +7 by 1990 (Hayes, 1991). Similar changes and
outcomes were experienced by several other science magazines,
including the weekly Science News.

While there was competition for the 0.0 niche, the time series
for every major professional science journal shows a rapid
expansion of specialized, technical language and higher positive
LEX scores.

In consequence, professional science is largely inaccessible
to the general college-educated reader, which increases our
reliance on intermediaries who select and translate developments in
science for the general reader (e.g., the New York Times Science
section and the rapid growth in the coverage of science and
medicine by television networks).

Finally, high school science textbooks have grown far more
difficult than textbooks for the rest of the 12th grade curriculum,
e.g., history, English and social science texts. The relatively
high LEX levels are probably contributing to student avoidance of
the non-required advanced science courses and the declining
fraction of American scientists, engineers and software
specialists, and the growth of dependence upon the foreign-trained.

A second illustrative theoretical use of LEX measurements was
to resolve an empirical dispute between behavior geneticists (BG)
and most social and developmental scientists. At issue is the
relative importance of children's environments and experiences
(particularly their natural language experiences) in shaping
children's verbal achievement. Recent BG research on adopted
children in families with other siblings finds that such language
experiences are 'essentially' the same for children growing up in
well-off and relatively poor families. In contrast, social and
developmental scientists point to their own voluminous research
showing language experiences are different for children growing up
in underclass or upper-middle class families.

LEX was used to measure the difficulty of the natural language
experiences children have with: (a) the texts of the television
programs they chose to watch 'regularly'; (b) the texts of the
books and magazines they chose to read for themselves, and (c) the
texts of household conversations children have with their mothers.
Exceptional for this kind of research, and unlike the samples used
in the behavior genetics research, the children in these studies
were statistically representative samples of all British children,
all public school California 6th graders, and children of families
in the Bristol, England metropolitan area born within a week of one
another.

We found differences in the richness of the language experience
of children growing up in different social class backgrounds
(consistent with social and developmental research), but those
differences were small (only several LEX units), as were the
differences in the quantity of those experiences. However, those
small differences were pervasive and persisted throughout childhood
in our cross-sectional and time-series data. Differential quantity
and difficulty of language experience were found at every age from
30 months through 14 years of age. One implication is that such
small, persistent differences in language experience may produce
cumulative effects on the children's general domain knowledge
(e.g., their knowledge of baseball, genetics or finance), which
affects their verbal skills (which include the extent of their
conceptual knowledge and lexical development). Such differences
would doubtless contribute to their reading comprehension and
academic performance and may help to account for the world-wide
pattern of higher verbal skills among children with richer language
experience (Hayes, Wolfer, Roschke, Tsay and Ahrens, 1999).

Another example of LEX's theoretical use is in testing possible
mechanisms affecting a person's pattern of word choice. Two have
been explored--personal stress and lack of preparedness. For
example, in courtroom testimony, witnesses often must
answer questions put to them by two attorneys: one under direct and
the other under cross-examination. LEX analyses document that
virtually every witness in the Patricia Hearst trial gave lexically
simpler testimony in response to cross-examination (where the
interrogator is the opponent's attorney, whose job it is to
challenge the witness's credibility and, if possible, discredit that
testimony with the jury and judge) (Hayes and Spivey, 1999). Both
cognitive scientists and trial lawyers predict this finding and
attribute it to the distracting effect of high stress and lack of
preparation on the witness's word choice under the pressure of
realtime text production.

Finally, LEX has also been used in applied research. One study
sought to answer this question: Why did nationwide SAT-verbal
scores remain at near-constant levels from the mid-1950s through
the early 1960s (the highest mean level recorded was in 1963), but
then decline in each of the next 16 consecutive years (until they
more or less stabilized in 1978)? Mean SAT-verbal scores have
remained at that low level ever since. To explain this SAT-verbal
time series, Chall (1967, 1977) hypothesized that one major
contributing factor was the lowering of academic standards--
including the 'dumbing down' of the schools' curriculum, reductions
in homework and in the difficulty of the principal school reading
materials used by students throughout the nation.

In this application, LEX was used to measure the difficulty of
nearly 800 basal readers used across the United States during three
time periods: 1919-1945, 1946-1962, and 1963-1991. We found that
LEX levels of basal readers published in the US in the 1946-1962
period declined sharply from their pre-World War II levels. (They
were not simplified in the UK--so the American decline in level
was a choice made by educational leaders and their school
publishing colleagues.) Since the major schoolbook publishers
marketed their basal series nationwide, one effect of these new
basals may have been to reduce the range and depth of domain and
conceptual knowledge which students derive from their basic texts.
One of the primary tasks of schooling is to broaden the range of a
child's domain knowledge. Simplifying texts narrowed the breadth
and depth of conceptual and lexical resources available to students,
which may have contributed to their reduced verbal achievement when
they took the SAT-verbal test years later, near the end of their
high school experience.

The conventional wisdom in education attributes this decline to
changes in the composition of those taking the SAT--many more
and less-select students began to take the SAT. That explanation
fails to explain why the SAT scores began to fall in 1963 and not
earlier, and why the scores declined for those 16 consecutive
years, since the number taking the SAT had stabilized around 1963.
Nor can it explain the persistent low level of mean verbal
achievement from 1978 to the present. Nor can it explain why
verbal achievement declined particularly among those at the highest
levels of verbal achievement. Despite many more students taking
the SAT, there was an absolute decline in the number scoring over
600, and an even greater proportional decline in those scoring over 700. Something affected the achievement level of even the most able students--those who had always taken the SAT.

LEX analyses supply the evidence compatible with Chall's hypothesis: schoolbooks became much less difficult after 1946 and a side-effect (doubtless unintended) of that dumbing down of textbooks was to lower the students' domain/conceptual knowledge and skills. When, years later, the students took the SAT, those cohorts whose verbal SAT scores declined were the ones who first encountered the simplified texts (Hayes, Wolfer & Wolfe, 1996). Cross-lagged correlation of basal reader levels throughout their elementary and secondary schooling with the national SAT-verbal scores is generally consistent with this hypothesis. Since LEX levels of contemporary basal readers and major textbooks have remained unchanged since 1978, there are no grounds to expect verbal scores to rise without a fundamental increase in the texts' LEX levels. In fact, scores have not risen for 30 years. When complaints about the low verbal scores grew, ETS ultimately re-calibrated the verbal scores upwards, especially for those at the upper range, despite ample evidence that modern tests are comparable in difficulty to those throughout the late 1950s and early 1960s.

IV. CONDITIONS FOR USING QLEX

The QLEX package of programs provides researchers with tools for measuring important facets of any English text--its accessibility, comprehensibility or lexical difficulty and its use of grammatical/closed class terms.

There are few conditions on QLEX's use. QLEX was developed across several generations of mini-computers and PCs, using different operating systems, several programming languages, and file formats. Most of this work was done under the DOS operating system.

You are free to use QLEX so long as you agree to these conditions: the programs must not be copyrighted for commercial purposes, under this or any other name. The programs were designed to provide scientists with a major tool for text analysis--so they are part of the general scientific measurement system, not a commercial product. There is no guarantee the programs will run on your machine; nor are they guaranteed to be free of errors (though they have been tested tens of thousands of times). Scientists are encouraged to continue LEX development as a scientific tool, as with any such tool. I would like to be notified of your changes, the justification for them, and the effects those changes have on LEX measurements. My e-mail address is dph1@cornell.edu. QLEX and other files may eventually be downloaded from my Web site, but for now they can be obtained from the author.

V. TEXT PREPARATION FOR QLEX ANALYSIS

QLEX, the name for the software package which carries out a systematic lexical analysis on any English text, may be used on any text: spoken, broadcast or printed. First developed and used in 1980, QLEX continues to evolve as more is learned about the underlying theoretical models, the mechanisms, the many LEX measurements and their interpretation. This is a work in progress. First reports on LEX's uses appeared in technical papers in the Cornell Sociology Technical Reports series, catalogued in the computer data base for research universities called RLIN; in multiple presentations at the AAAS annual meetings; and at other psychological and sociological professional meetings. Recent descriptions of models and research are reported in Hayes (1988); Hayes and Ahrens (1988); Hayes (1992); and Hayes, Wolfer & Wolfe (1996).

A. Converting text files from x.TXT to x.ASCII.  Some text files are saved using the file extension x.TXT (where x stands for a filename), but if the text is to be analyzed by QLEX, the text must be converted into an ASCII (standard) file. In WordPerfect, this conversion is done when you 'Close' your file. Choose the option which converts the text to an ASCII file. If you forget, your text will contain odd-looking characters and QLEX will not produce a LEX analysis.

What should a text file look like after it has been edited? Examine the following edited text sample taken from an Internal Revenue Service publication:

&&000                                           IRS1040.ASC
This publication was sent by the IRS to all taxpayers who
had filled out Form 1040 in 1986.  It describes some
changes in the tax law which affect the 1987 tax.  A
stratified SRS from the first 19 pages is shown.

&&111 This year for the first time the =TaxReformAct of
=1986 will have major impact on the preparation of your tax
return.  This has important consequences for you and for
us. Changes made by the new =Act are summarized on the next
page. You can learn more about the ones that affect you by
getting one of the publications listed near the end of this
booklet. Learning about the changes now will make it easier
for you to prepare your return when you start working on
it.

Increased standard deduction.  The standard deduction
(formerly the zero bracket amount), has increased for most
individuals.

Alternative minimum tax.  The tax rate has been increased
to =21 percent and several tax preferences have been added
or deleted.

Allocation of interest expense.  Whether your interest
expense is subject to the new limits that apply to personal
and investment interest depends on how and when the loan
proceeds were used.  Special rules apply in determining the
type of interest on loan proceeds deposited in a personal
account, such as a checking account.

Joint or separate returns. Generally, married couples will
pay less tax if they file a joint return because the tax
rate for married persons filing jointly is lower than the
tax rate for married persons filing separately.  However,
as a result of some of the changes in the tax law, such as
the increased income limit that applies to medical and
dental expenses and the new individual retirement
arrangement deduction rules that apply to certain
individuals, you may want to figure your tax both ways to
see which filing status is to your tax benefit.

Married persons who live apart.  Some married persons who
have a child and who do not live with their spouse may file
as head of household and use tax rates that are lower than
the rates for single or for married filing a separate
return.  This also means that you can take the standard
deduction even if your spouse itemizes deductions.  You may
also be able to claim the earned income credit.  Children
of Divorced or Separated Parents.  The parent who has
custody of a child for most of the year, the custodial
parent, can generally take the exemption for that child if
the child's parents together paid more than half of the
child's support.  This general rule also applies to parents
who do not live together at any time during the last six
months of the year.

B. Bare-minimum text preparation for QLEX analysis.  If you are in a hurry and need a rough estimate of a text's main LEX scores:

    (1) convert the text into ASCII (i.e. make the file's extension x.ASC, as in IRS1040.ASC)--to remove all word-processor symbols which QLEX does not understand;
    (2) place the required identification symbol (&&000) at the top, left edge;
    (3) assign a filename to the upper far right hand corner;
    (4) use this identification code--ID (&&111)--at the beginning of the first line of text which QLEX is to analyze.

    Note the ID distinction: the symbol &&000 is used for the header and comments--all words after that are ignored by QLEX; the symbol &&111 is put before any text you want QLEX to analyze.

    That's it!  Proceed to Section VII, B. to run QLEX.

C. NORMAL text preparation for QLEX analysis--these rules look imposing but it takes less time to edit most files than it does to read or describe them.

1. ID (Identification) Codes.  Text prepared for QLEX analyses is divided into two types: (a) text QLEX is to analyze, and (b) text QLEX is not to analyze. To distinguish these two ID types: &&000 is placed at the beginning of any passage QLEX is to ignore (e.g., information about this file, the date, how the text was produced and sampled, its context, etc.); and &&??? (any number
between 1 and 9, but not zero) for any text you want QLEX to analyze--cf. IRS1040.ASC above. &&111 is the ID number used in most text analyses.

If the expression &&000 begins a line or passage, QLEX will continue to ignore everything thereafter until it finds another ID symbol, i.e., QLEX assumes that all text after the &&xxx symbol should be analyzed--until it comes across the &&000 symbol.

2. Header and Comments.  The ID code &&000 must appear at the top left edge of the text, as shown in IRS1040.ASC. Normally, a text file is described by identifying its source (e.g. stratified simple random sample taken from the New York Times, July 5, 1998) or, if it is a conversation, the cast of characters (e.g. Mother and her daughter, Jennifer, age 4 years, 3 months, in their home). The context of that conversation could also be described (Jennifer is tired after having had no nap. This conversation took place in the kitchen. No one else was present). You may place comments anywhere you like in your text, so long as those comments appear after the &&000 symbol, which is always placed at the far left of a line, followed by a space. After you complete a comment, don't forget to use the ID &&xxx at the front of the line where QLEX is to resume its analysis. For example, &&121 could be a mother (identified as #1), talking to her child (identified as #2), as in:

     &&121 Bring me some butter from the refrigerator.
           You have to have yummy ingredients when you make
           cookies.
     &&211 Do you want the whole box or just one of the pieces?
           Oh, there's only one piece left in the box!

The last of an ID's three digits may be used for designating the context for the conversation (e.g. #1 is mother-child talk while in the yard; #2 is while they are talking at the grocery; #3 is while they waited for a brother to get out of school), or this third digit may be used to keep track of the different segments in a time-series. For example, in that mother-child conversation, the third digit could represent segment 3 of a multi-part conversation. For analyzing segment #3, one would use ID &&123 to instruct QLEX to search for text spoken by the mother to her child, in segment #3 only; &&215 would ask QLEX to search for the child talking with her mother, in segment #5 only.

Most analyses of conversation do not distinguish between the several speakers of a conversation or different segments of the same text. In that case, simply put &&111 in front of the first word of their conversation--that will suffice for the entire text (as in the IRS sample). In some conversations, however, you may wish to distinguish what each person said to another, as in what President Nixon said to John Dean as opposed to his principal adviser, Robert Haldeman.

VI. EDITING RULES

A. Quick-and-dirty text editing.  Use any transcription rules, including none--so long as you use the Bare-Minimum editing rules (cf. Section V., B. above). QLEX will run on texts with bare minimum editing; you will have output, and can get the LEX statistics. Those values will be in the general ballpark of genuine LEX scores calculated with a fully edited text. Unedited text LEX scores will not be comparable to the LEX values of the 5000+ texts in the Cornell Corpus-2000 used for interpreting your findings. If comparability and precision are needed, you must use the NORMAL text editing procedures below. Editing takes time, requires close attention, and can be lugubrious--that is a cost of obtaining a scientific measurement.

B. NORMAL text editing: general rules.  Texts should be
transcribed according to normal conventions of spelling and orthography--as found in an unabridged dictionary. There are two exceptions in QLEX analyses: (1) QLEX analyses are carried out on word-types, i.e. every uniquely spelled variant of a term is treated as a distinct type in the English lexicon (e.g., while 'boat' and 'boats' share a common stem, they are different word-types in lexical analysis); and (2) the distinction between capital and lower case is not retained in QLEX analyses (so 'the', 'The' and 'THE' are combined).

All terms in a text should be transcribed, even if they cannot be found in an unabridged dictionary, e.g. new words, word-fragments and filled pauses. Word-fragments and filled pauses are often disregarded by transcribers, so their transcription is typically unreliable unless done by trained, conscientious workers. Some substantive areas of psychology, linguistics and sociology consider such information useful for some analytic and interpretive purposes.

PROPER NAMES, PLACES, PRODUCTS AND SCIENCE TERMS.  Dictionaries are not consistent in their treatment of proper names, so it is necessary to impose a common practice for all such terms. Before every proper name, place, or product is inserted the equal sign (e.g. =Mary, =FBI, =Chicago, =Coke). Multi-term names (e.g. =RiodeJaneiro) should be run together since they form a single unit--otherwise the name would be treated as three 'words'. Care must be taken with such terms as '=honey', '=love', '=darling' and '=dear' (when the name refers only to a specific individual).

Science makes heavy use of contractions and technical terms
to avoid the repeated use of longer phrases. Chemical compounds, e.g. NaCl, should be =NaCl, but Na by itself has no = sign because it is a recognized dictionary entry.

Biology and chemistry texts pose complex transcription problems with their vast sets of technical terms and names. The general rule is: if it is a proper name (often a Latinate species or family name), place the equal sign before such terms. Normal dictionary entries, like hormones or proteins, however, do not have the = sign before them.

NUMBERS.  Some numbers (e.g. 'one' through 'ten') are dictionary entries, so the equal sign is not used for them. Arbitrarily, Arabic and roman numbers (except when written out as one, two, .., ten) do have an equal sign placed before them (as in =forty, =1492, =$28, =43 years old). If the number is an entry in a dictionary, no = sign is used.

DECIMALS, LARGE NUMBERS, EQUATIONS AND COLONS.  Dictionaries are inconsistent in their handling of numbers, so they require special handling. The number 9.95 is transcribed as =9'95 because periods are reserved as the sentence-ending symbol in QLEX's sentence sub-routine. The comma in =360,000 is converted to =360'000; otherwise the comma would divide that number into two terms, 360 and 000.

Equations do not appear as entries in most dictionaries. They are transcribed with the expression =equa. The colon with time (e.g. 11:23) is dropped, for the same reasons.

SOUNDS AND SPECIAL COLLOQUIAL EXPRESSIONS.  Many common terms used to communicate a message/mood are not included as 'words' in unabridged dictionaries, and so they are given the equal sign. Examples include =meow, =Phew!, =Zot!, =Gosh,
=bang, =pop, =Ha, =Ouch!, =Whee, and =Hurray!

PUNCTUATION.  Use normal punctuation for printed texts, but punctuation of natural conversation is more difficult since run-on sentences are common and one often needs a good recording to detect the shifts of intonation at the end of a sentence. Punctuate texts from intonation, pauses, substance and common sense. The ability to go over the same passage again and again is essential to reduce the arbitrariness of these decisions. Since reliability of punctuation from spontaneous speech is not as high as from print, statistical differences between two texts' MLU (mean length of utterance--based on words) may be due, in part, to unreliability and arbitrariness in punctuation.

CONTRACTIONS.  As a general rule, type words as they appear in print or as spoken (unless otherwise specified above or below). The exceptions include such terms as 'cause, which dictionaries treat as because, and terms which are contracted in casual speech, e.g. 'whatever's', meaning whatever is, which should be decomposed into 'whatever is'. There are many such contractions, and all should be treated this way.

REGIONAL EXPRESSIONS.  Regional accents may alter a word's form or phonology to such an extent that the referent may not be clear to persons from outside the region. In such cases, use the common referential term to replace the exaggerated or local form. For example, working-class Scots use English but substitute Scottish terms, which require transcription to their near-equivalent in English (e.g. the word 'fit' in Scottish is normally translated as 'make' in English). Extreme working-class accents in London
and the southern United States require similar treatment.

HYPHENATION.  "Cost-of-living" and "first-aid" are complex expressions which appear in dictionaries in that form. They should be transcribed with their hyphens in place. Hyphenation is sometimes missing in similar expressions, such as 'thank you' and 'ice cream'. Transcribe these terms/expressions with the missing hyphen: as thank-you and ice-cream. This prevents QLEX from treating such expressions as two (or more) separate words. Since language is dynamic, the time may come when this and other rules must be changed.

SPACING.  There must be a space between every word, unless there is (or should be) a hyphen, an apostrophe, etc. A space separates an end-of-sentence symbol (period, question mark, exclamation mark and the special interruption symbol [@]) from the first word of the next sentence. For example: "Stop doing that!You know that!" is wrong because 'You' was not separated from the previous exclamation mark. In QLEX, the colon and semi-colon are not considered sentence-ending marks.

SPLITTING WORDS AT THE END OF LINES.  QLEX is a capable set of programs but it does not handle instances of wrap-arounds and split words at the end of lines. Be sure that words and expressions are not broken at the end of a line.

QUOTES.  Use the symbol ", not the symbol ', for quotes, because ' is reserved in QLEX for contractions and numerical expressions. Quotation marks are ignored by QLEX programs.

INTERRUPTIONS.  Aural interruption of one person's speech by another has been handled in many ways in psycholinguistics. Under QLEX, when one speaker is interrupted by another,
instead of ending that person's passage with the usual sentence ending, use the special symbol (@), meaning that this is the final word of a passage spoken by a person who had been interrupted--in place of the period, question mark or !. It is not uncommon for the interrupter, in turn, to be interrupted. If that happens, the interrupter's turn should also be ended by the @ symbol. QLEX can determine who was interrupted and how often. A new ID code must be used before the interrupter's first word to show the person holding the 'floor' has shifted to a new speaker. This convention does not show the precise point in the text where the interruption took place, nor how many words were spoken by the two speakers simultaneously.

PERIODS.  Under QLEX, periods are reserved exclusively to mark the end of sentences. In the case of abbreviations like Mrs., Dr., or in i.e., and e.g., the periods must be omitted, otherwise the sentence length measure is invalid.

Also, in some printed texts, unfinished sentences are sometimes expressed by a string of periods--a convention to suggest that the voice tailed off. Each period would be incorrectly interpreted as a sentence of zero word length, unless it is omitted.

MISSING OR UNDECIPHERABLE WORDS AND THE =ZZZZ SYMBOL.  In spontaneous conversations recorded in their natural contexts, background noise or poor recording quality can make it difficult or impossible to detect a missing word or phrase. A missing word is transcribed by the unique expression: =zzzz. When a whole passage is missing, use the Comment symbol (&&000) at the beginning of a line to indicate the nature of the problem and its approximate
length. One indicator of a text's quality is the frequency of the =zzzz symbol's use.

BRITISH vs AMERICAN ENGLISH.  Numerous words are spelled differently in the UK and USA, e.g. grey and gray; practise and practice. Use the American spelling convention, since the Reference Lexicon used for these analyses (Carroll, Richman and Davies, 1971) comes from American English texts.

FILLED-PAUSE SPELLINGS.  Use these conventions for filled pauses:

     =uh-huh -- I acknowledge, agree (usually spoken with rising inflection);
     =un-huh -- I disagree, no (with falling inflection);
     =um -- I follow you, I'm listening;
     =uh -- hold it, I'm groping for the right word or what to say next;
     =oh-oh -- a problem.

FALSE STARTS and FRAGMENTS.  All false starts, incomplete phrases and repetitions should be typed, as produced. Word fragments should be preceded by the equal sign since they are not recognized as words in dictionaries.

PRINT CONVENTIONS REPRESENTING CONVERSATION.  There is a publishing convention about conversation which requires a change to avoid invalid sentence measures. Publishers do this: "Where is it, Mom?" said Jane. The problem is that there are two sentence endings in that one sentence. Transcribed, it should be: "Where is it, =Mom," said =Jane? A comma is substituted and the question mark shifted to the end of the sentence.

FOREIGN LANGUAGE TERMS AND EXPRESSIONS.  Use comparable Americanisms where possible, unless there is no suitable expression--in which case, type the foreign expression with the words run together with one equal sign in front.

MONEY.  Convert money (e.g. pounds, marks, yen, and francs) into dollars. The numeric values will be wrong but the
concept is correct.

TECHNICAL TERMS.  Type--as shown or spoken.

RARE WORD REPETITIONS.  On occasion, a rare term is repeated many times. For example, in the 1,000+ word text sample taken from several Woody Woodpecker cartoon shows, the word 'woodpecker' appeared 18 times. That one rare word had the effect of making the text's statistical description appear more difficult than it is. To prevent rare word repetitions from giving false estimates for a text's lexical difficulty, an arbitrary rule was adopted--no single rare word (i.e. a term not listed among the 10,000 most common word-types in English) may appear more than five times per thousand tokens. Every additional instance of that term gets an equal sign before it. Use a Comment header (&&000) to record that this rare-word rule was invoked, how often the = sign was used, and why it was necessary.

DIRTY WORDS.  Modern unabridged dictionaries contain some 'dirty' words, but most are omitted. All such terms should be included but most will have the = sign.

WORDS REQUIRING SPECIAL TREATMENT.  To reduce the most serious distortions of word use, my colleague Margaret Ahrens has compiled a list of words which occur principally in conversation. For each, the transcription convention is:

     1.  Words without equal signs:

          holidays (e.g. Christmas); bye; bye-bye (a hyphenated word); good-bye (a hyphenated word). Carroll, et al. report that the 'good-bye' form is 3 times more common than 'goodbye'. gonna becomes going to; and wanna becomes want to.

     2.  Words requiring the = sign:

          =Mom, =Dad, =honey, =sweetie, =darling (when referring
          only to a specific person), letters of the alphabet, standing alone.  Important exceptions: I and a.

     3.  Special contractions:  All of these terms are used in informal speech, but rarely in formal print. The preferred solution is to decompose them.

          =how'd    =that'd     =there'd    =what'd    =when's
          =how's    =that'll    =there're   =what'll   =where'd
          =how're   =there'll   =what're    =who'd     =there've
          =what've  and  =why's

     4.  Terms generally omitted by dictionaries but whose informal use is common.  Such terms get the = sign.

          For example:  =eh  =ouch  =wow  =yuck
                        =yup  =ick  =ow  =yep  =yucky

C. Print's distorting effects on the Reference Lexicon.  Carroll, Richman and Davies' Word Frequency Book is based on word use in printed texts--which are normally written in formal style. Printed texts distort the relative frequency of certain words, and particularly under-represent words used in casual conversation (Hayes, 1988). Household words are especially under-represented. Consequently, terms which are commonplace in a pre-schooler's experience (e.g. 'pajamas', 'diaper', 'potty' and 'bottle') or informal words like 'gonna' and 'where'd' appear as relatively rare words in the Carroll, et al. corpus--the Reference Lexicon common to all QLEX analyses.

The most under-represented words in print include 'I' and 'good-bye'. While pervasive in actual use, the Carroll, et
al. list reports frequencies much lower than they appear in natural conversation, e.g. 'good-bye' in print occurs only 4 times per million tokens, far rarer than in the Cornell Corpus of natural conversations.

The most over-represented word in print is 'said'. This term appears in print to help the reader keep track of the speaker. So common is the use of 'said' that it is the 43rd most common word on Carroll, et al.'s list--placing it amidst all the highest frequency function words of English. 'Said' rarely appears in the natural conversations in the Cornell Corpus.

D. LEXEDIT--semi-automatic text editing.  The LEXEDIT utility program was designed to do much of the text editing, faithfully, comprehensively and automatically--but it does not make all the editing corrections. Furthermore, in fixing some editing problems, LEXEDIT sometimes introduces errors of its own. LEXEDIT saves time in preparing a text for QLEX analysis, but all texts must be examined, word for word, to ensure compliance with the transcription rules--a lugubrious but necessary process. This is the price for having a scientific tool, comparability and interpretation.

LEXEDIT is a file in the C:\QLEX directory. To use it:

     type: LEXEDIT x.ASC REPLACE.LST     where:

          (a) LEXEDIT is the utility's name;
          (b) x.ASC is the name of the file (prepared according to the Minimal or Normal editing rules above);
          (c) REPLACE.LST is the list of 'fixes' which LEXEDIT searches for and makes to the text file.
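
The kind of search-and-replace pass that a 'fixes' list drives can be sketched in a few lines of Python. This is only an illustration and not LEXEDIT itself: the layout of REPLACE.LST is not described in this guide, so the FIXES list, the helper _mark_number and the function apply_fixes below are hypothetical names invented for the example. The four rules shown are taken from the transcription rules in Section VI (gonna and wanna are decomposed, periods are dropped from Mrs. and Dr., and decimals or large numbers become =9'95 and =360'000).

```python
import re

def _mark_number(match):
    # Hypothetical helper: 9.95 -> =9'95 and 360,000 -> =360'000
    return "=" + re.sub(r"[.,]", "'", match.group(0))

# Hypothetical stand-in for REPLACE.LST: (pattern, replacement) pairs.
# The real file's format is not documented here; these rules only
# mirror a few of the transcription rules in Section VI.
FIXES = [
    (r"\bgonna\b", "going to"),   # decompose informal forms
    (r"\bwanna\b", "want to"),
    (r"\b(Mrs|Dr)\.", r"\1"),     # periods are reserved for sentence ends
    # digits joined by '.' or ',' that are not already marked with '='
    (r"(?<![=\w'])\d+(?:[.,]\d+)+", _mark_number),
]

def apply_fixes(text, fixes=FIXES):
    """Apply each fix in order and return the edited text."""
    for pattern, replacement in fixes:
        text = re.sub(pattern, replacement, text)
    return text

sample = "Dr. Lee said it's gonna cost 9.95, not 360,000."
print(apply_fixes(sample))
# -> Dr Lee said it's going to cost =9'95, not =360'000.
```

As with LEXEDIT, a pass like this can miss cases or introduce errors of its own (for example, it would not catch 'Gonna' at the start of a sentence), so the edited text still has to be checked word for word.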

REPLACE.LST can be examined by using the utility SEE.

When LEXEDIT completes its job (which takes a second), review the text to correct any mistakes which LEXEDIT might have omitted or introduced. After QLEX is run on this edited file, one may find these errors in the alphabetical or frequency listing of all a text's words, or in the list of Residuals (words QLEX did not find in its REFERENCE LEXICON of the 10,000 most common English types). Make the repairs and then rerun QLEX.

MODIFYING THE REPLACE.LST UTILITY.  This utility is needed for LEXEDIT. It may be modified--by adding or removing words to suit your own needs. Use your word-processor to get into REPLACE.LST, make your changes and then convert that file back into an ASCII file. Keep a backup of REPLACE.LST since your changes may not work as planned.

No set of transcription rules could serve all purposes. For this reason, one design goal was to keep QLEX texts as close to their original form as possible. It is relatively simple to delete most LEXEDIT amendments to the texts, e.g. a 'macro' can be written to eliminate all the = signs before proper names, etc.; or to insert periods for contractions (like Mrs and Dr); or to remove other QLEX editing amendments.

VII. PERFORMING A QLEX ANALYSIS

INSTALLING QLEX.  The QLEX programs, the 10,000-word Reference Lexicon, this LEXGUIDE.2K and several utilities are stored on the LEX CD.

To install QLEX, first get into the DOS operating system (available in Windows 95/98 after clicking on START).  Next,
    Next,
    make a new Directory on your C:\ drive--naming it C:\QLEX.
    Then copy all the files from the CD to this new directory:

        Type: md c:\qlex
            then:
        Type: cd c:\qlex
            then go to your CD DRIVE
        Type: copy *.* c:\QLEX

        LEXGUIDE.2K can be printed from C:\QLEX.

        QLEX will complete its analysis of a ~1,000-word text in
        less than one second.

  STEP-BY-STEP PROCEDURES.

    1. While under DOS, and in the C:\QLEX directory:

        Type: QLEX

            [To exit QLEX, strike the ESC key].

    2. In large letters, the screen will show: "QUICK LEX"

            Strike any key to proceed.

    3.  "What drive can be used for temporary files: C--F?"
        You will be working in the c:\qlex directory.

        [You can use other directories for texts and output
        files if you designate them when asked].

            Strike: ENTER (to move to the next step)

    4. "Which files do you want to process? (Up to 80
        characters, wild cards are OK)"

            Type: IRS1040.ASC  (this sample IRS text is
                contained in your QLEX directory)

                i.e. the file should be on C:\QLEX:
                     its file name is IRS1040; and
                     its file extension is .ASC (short for ASCII)

    5. "You've specified 1 (or more) file(s). Do you want to
        specify more?"

            Type: N (for No, if IRS1040.ASC is the only file
                you want QLEX to analyze. Later you may
                choose to analyze many texts at once).

            Type: Y (for Yes, if you want to add additional
                files already in the C:\QLEX directory),

            i.e. QLEX will run a series of files if they are
                 specified at this point. The program will
                keep asking you to name more files until you
                type N.

  6. The screen will show a large MENU: these are analysis &
            print options.

        a.  The top of your screen shows the first six lines from
            the sample file IRS1040.ASC. QLEX lets you examine
            the text, header information, and the comments there
            so you can confirm that it is the file you intend to
            analyze and that the file is in standard ASCII format.

        b.  The cursor is in the box: "Are these choices OK?"
            Move the CURSOR arrow key DOWN once.
            (To move around the MENU, use the UP and DOWN
            cursor keys as you make your choices).

        c.  The next box shows three possible formats for
            your file: Free, DEC, IBM.

            This choice is made for you by QLEX from its
            examination of your file. There is nothing
            for you to do here. Note that the IRS1040.ASC
            file is identified as an IBM file. To move to the
            next decision box:

                Strike the CURSOR key DOWN once.

            To change an earlier decision, move the CURSOR
            UP or DOWN to go to that option, then make the
            change.

        d.  The next box asks which ID code is
            to be analyzed.
            111 is the default (the program provides ampersands).

            Special case: conversational texts. QLEX can
            analyze a text one speaker at a time. QLEX needs
            to know who is speaking, to whom the speaker was
            speaking, and in what context--before proceeding.

            Suppose you are analyzing a conversation between
            President Nixon and his two closest staff
            associates, Haldeman and Ehrlichman, on Watergate.

            With two or more parties in a conversation, the
            first of the three digits in an ID refers to
            the speaker's identity. President Nixon could be
            given speaker ID #1; Haldeman, #2; Ehrlichman, #3.

            The second of the three digits in the ID represents
            the person to whom the speaker was talking.

            The third of the three digits in the ID may be
            used for any purpose. Often, this number refers to
            the context in which this conversation took place
            (in the Oval Office--here assigned ID #1).

                Thus--&&121--means the speaker was Nixon;
                he was talking to Haldeman; in the Oval Office.

            Investigators make these ID code assignments.

            Wildcards and ID's. When Mr. Nixon spoke to both
            Haldeman and Ehrlichman at the same time, use 1?1--
            meaning the speaker was Nixon, but all the persons
            to whom he was speaking would be included in the
            analysis. The ? symbol is the DOS wildcard for any
            single character, here any digit from 0 to 9.
            Do not assign ID code 0--these are
            reserved strictly for Comments--whose texts are
            ignored in the QLEX analysis.

        e.  The next prompt is: "Where to START/STOP?"

            QLEX allows you to begin your analysis wherever you
            want and end wherever you want.

            START @: 1 (the first word in the text is where
            one starts ordinarily but it could be anywhere
            you designate. The number refers to the location
            in the string of word types, ignoring words
            with equal signs before them and other speakers).

            ENDING @: 1000 (or if you do not know how many
            words are in the text, put a number well in excess
            of what you suspect the file contains). The output
            will tell how many words are in your chosen text.

            Most files in the Cornell Corpus are based on
            1,000+ words but analyses can be carried out on
            as few as 250 and as many as 64,000 words. The
            lower limit is arbitrary, but function words
            account for about half of all a text's words,
            so in a 500 word text, only about 250 words
            are available on which to construct the necessary
            cumulative frequency distribution from which LEX is
            determined.

            SO--small n texts increase LEX's standard error.

            At the top of QLEX's principal output file
            is a report on how many words are in the file.
            From that information you may divide the text
            into as many sub-files as you want.

            That same output file tells you where you
            ended in the text--if you specified a shorter
            file than it proves to be.

            For example: in the IRS1040.ASC sample text, after
            the 1000th word, the next words were: 'owe or
            get a refund even'.

        f.  The next box in the Menu, OUTPUT OPTIONS, allows
            you to decide what is to be included in your
            output. There is a designated default (Yes or
            No) for each option. Your choice is to use the
            default or change it to the opposite option. As
            you move the cursor down, either hit the CURSOR
            DOWN key (accepting the designated option) or
            strike the key (Y or N) for the opposite
            option. The Y or N symbol will change to reflect
            your new choice.

            You can examine the IRS1040.OUT file to see if
            you have made the choices you want:

                Type: see irs1040.out

  Option 1. SENTENCE ANALYSIS. The default option is Y
    (yes)--QLEX will print out the full sentence analysis
    in QLEX's analysis of the IRS1040.ASC file. I urge you
    NOT to request this output since it takes many minutes to
    print out. Rather, view the output from the monitor.

    To get this sentence analysis output, simply leave
    the default option in place and skip to the next
    option. If you do not want the sentence analysis
    output, strike N, which also moves the cursor to
    the next option.

    NOTE! Once you change one of these options, that
    choice will remain in effect for all successive QLEX
    analyses until you exit from QLEX (using the ESCAPE
    key). If you are doing many QLEX analyses at once, it
    is convenient not to have to change these options.

    You may change back to the default or choose some other
    option before beginning any new QLEX analysis.

  Option 2. HISTOGRAM (and table) of SENTENCES. The default is
    Y. You must type N if you do not want the graph and
    table printed out. This information is shown as page 1
    of the sample text output--part of IRS1040.OUT (cf p 41).

  Option 3. WORD LIST. In any 1,000-word text, there will
    ordinarily be between 350 and 550 word-types. The default
    is Y, i.e. the output will show each of these words, listed
    twice: alphabetically and by frequency of usage.

    This word list requires many pages of output--but it is
    useful for finding typos and editing errors.

    In a separate section of the output is a list of the terms
    you marked as names, places, numbers, nonsense words,
    =zzzz, and other terms specified in the transcription rules.
    If this word-list is of no interest to you, type N.

  Option 4. RESIDUALS. These are the text's uncommon and rare
    words, i.e., words which occur fewer than three times per
    million in Carroll, et al's corpus. None are among the
    10,000 most common English words. QLEX could not find these
    terms in its Reference Lexicon. Your option is to print or
    not to print this list of Residual Words. You will want to
    examine this list since typos turn up here, but you can do
    it from your monitor more easily than from paper.

    This RESIDUAL list is useful because it shows which words
    among the 10,000 most common English word-types are
    closest in spelling to each residual term. Many residuals
    are mere derivatives or inflections of common word-types.

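    The idea of pairing each residual with its closest-spelled
    common type can be illustrated with a short Python sketch.
    This is not QLEX's own routine, which this guide does not
    describe: the function name, the toy lexicon and the use of
    Python's difflib similarity ratio are all assumptions made
    for illustration.

```python
from difflib import get_close_matches

def nearest_common_types(tokens, lexicon):
    """Map each residual token to its closest-spelled lexicon entry (or None)."""
    known = set(lexicon)
    nearest = {}
    for token in tokens:
        word = token.lower()
        if word in known or word in nearest:
            continue  # common type, or a residual already handled
        # Assumption: difflib's similarity ratio stands in for "closest in spelling".
        match = get_close_matches(word, lexicon, n=1, cutoff=0.6)
        nearest[word] = match[0] if match else None
    return nearest

# Toy stand-in for the 10,000-type Reference Lexicon (illustrative only).
lexicon = ["deduct", "deduction", "income", "refund"]
tokens = ["Deductions", "refund", "refundable", "income"]
for residual, neighbour in nearest_common_types(tokens, lexicon).items():
    print(f"{residual:<12} -> {neighbour}")
```

    In this toy run the two residuals are inflections or
    derivatives of common types, which is the pattern described
    above; a genuine typo would show up the same way, next to
    the word it was probably meant to be.
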
    Texts from newspapers and popular magazines may have 100+
    residuals per 1,000 word sample. The default is Y.

    One of the experiments in validating the LEX measures
    explored the relative contribution made by words at
    different ranks on Carroll, et al's list of the 10,000
    most common words to identifying the topic or subject of the
    text passage. There is a powerful relationship between a
    word's frequency of use and the information that word
    conveys. Statistically, uncommon and rare words convey far
    more information than common words. Combined with our
    research on word polysemy, this finding shows that
    uncommon and rare words not only rarely have multiple
    meanings (i.e. their meaning is less conditional on word
    context and syntax than that of common words), they also
    convey more information regarding the substance of the
    passage than do common words.

    QLEX's 10,000 word Reference Lexicon. In Carroll, et
    al's Reference Lexicon, word-types 'THE', 'The', and 'the'
    are considered three distinct word-types, i.e. upper and
    lower cases of the same 'word' are different word-types.
    QLEX combines such terms, with the result that the
    first 10,000 types in the Reference Lexicon do not precisely
    match those in the Carroll, et al list.

    In combining frequencies of upper and lower case terms,
    QLEX's first 10,000 terms are roughly equivalent to the
    first 10,500 on the Carroll, et al. list, and the combined
    values slightly rearrange the rankings of terms--relative to
    those shown in Carroll, et al. These are very minor matters.

    How to examine this 10,000 type Reference Lexicon? QLEX's
    10,000 REFERENCE LEXICON is a readable ASCII file named
    ASCIDICT. A compressed version of that file can be found on
    the QLEX diskette under the name DICTION.QLX. You can
    examine this 10,000-type lexicon while in C:\QLEX by:

        Type: DODICT and use the arrow keys to move about

  Option 5. DISTRIBUTION BY FREQUENCY. This 1-page analysis is
    the most important table produced by QLEX. Columns on the
    left describe the frequencies, the proportions and the
    cumulative proportions for words from that text. It shows
    how the speaker or author drew upon all those 10,000 most
    common word-types. Texts are compared by their relative use
    of these cumulative proportion distributions, and by their
    use of the ~600,000 other rarer content terms. These numbers
    are later used by the MIGNON program to calculate LEX.

    This table shows how often each of the ten most common
    grammatical words in English was used in that text (i.e.,
    the, of, and, a, in, etc), singly and cumulatively.
    This table shows that in the 1000 word IRS1040.ASC text,
    'the' was used 71 times, i.e. 'the' alone accounted for 7.1%
    of all its terms. The first ten most common English terms
    (all grammatical) accounted for 24.5% of its terms. Ten
    percent of that text's words were rare words (i.e., they
    occur fewer than 3 times/million in general usage).

    Following this one-page distribution table is another
    table showing the frequency per million of the content or
    open class word in your text which fell at each of the 10th,
    25th, 50th, 75th and 90th percentile ranks. The default is Y,
    since virtually all lexical analyses include this table.

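    The arithmetic behind the frequency, proportion and
    cumulative proportion columns can be sketched in a few lines
    of Python. Only the count for 'the' (71 of 1,000 tokens) comes
    from this guide; the other counts, the function name and the
    print layout are invented for illustration and do not
    reproduce QLEX's actual table.

```python
from itertools import accumulate

def distribution(counts, total_tokens):
    """Return (word, frequency, proportion, cumulative proportion) rows."""
    proportions = [n / total_tokens for _, n in counts]
    cumulative = list(accumulate(proportions))
    return [(w, n, p, c) for (w, n), p, c in zip(counts, proportions, cumulative)]

# 'the' = 71 of 1,000 tokens is from the guide; the other counts are invented.
counts = [("the", 71), ("of", 30), ("and", 25), ("a", 24)]
rows = distribution(counts, 1000)
for word, n, p, c in rows:
    print(f"{word:<5}{n:>5}{p:>8.1%}{c:>8.1%}")
```

    Each row's cumulative proportion is the running sum of the
    proportions above it, which is the quantity MIGNON later
    integrates.
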
        Remember, a LEX score excludes the text's use of all
        75 most common function or closed class English terms.

  Option 6. The 1-2-3 OUTPUT FILE. You have the option here of
    having the main lexical and sentence measures from the QLEX
    analysis sent to the printer or sent to a computer file.
    This information is needed by MIGNON to calculate LEX--the
    principal statistic from QLEX. You will also want to
    combine this file with others in statistical analyses.
    This file may then be imported into LOTUS 1-2-3 (development
    of these programs goes back 19 years) or into any other
    spreadsheet and then on to your statistical package. The
    default is N.

        NOTE. There is a quirk in QLEX. Before you call for
        QLEX to produce this 1-2-3 file, your text must have
        run successfully.

            First run QLEX without asking for the x.123
                  output file.
            If it ran successfully:
                (a) delete the x.out file, then
                (b) rerun QLEX asking for both the x.out and
                    x.123 files.

        Only the first time you run a text with QLEX must you
        refrain from asking for this 1-2-3 output file.
        Thereafter, it can be produced the first time you run
        any QLEX job, so long as there are no editing errors in
        the text.

  Option 7. CALCULATIONS ON PERCENTILES vs RANK CURVE.
    Ignore all these numbers--in developmental work on QLEX, all
    sorts of special analyses were tried, but not erased from
    the program--so ignore them.

  Option 8. SEND TO PRINTER. This is important--normally one
    cannot take the time to print out these long outputs
    nor would you want them--they are voluminous. Save output
    files as computer files and then examine them from your
    monitor. The default is Y. By typing N, the entire
    analysis output will be put into a computer file (whose
    name you supply) which can then be examined at leisure
    on the screen. You can later copy or print that file.

    PRINTING OUTPUT. Printing output from a text of more than
    1,000 words takes about twelve pages (cf. IRS1040.out);
    consequently users are advised to store both QLEX output
    files (x.out and x.123) on a hard drive, diskette or ZIP
    drive and inspect the contents from the monitor.

    PRINTER WRAP-AROUND--The formatted output fits on 8.5"-wide
    paper but there may be a wrap-around problem unless you
    specify a smaller-than-usual type font for your printer.

  Option 9. SEND TO FILE. If you specified N to option 8, then
    you must send the QLEX output to a computer file whose name
    you supply. Type Y for this option. Since the default for
    option 8 is Y, the default for option 9 MUST BE N (No).

  Option 10. ADD COMMENTARY. If you type Y, this option allows
    you to add commentary to your output files--e.g. why was
    this text examined? The default is N. Strike the CURSOR
    DOWN once, bringing you to the next box on the menu. Normally
    no commentary is added so type N.

  g. Supply NAMES for these OUTPUT FILES.

    The QLEX OUTPUT FILE. If you specified Y to a printer
    output, and N to the statistics output file (i.e. the
    1-2-3 output file), then you may skip over these options.

    IF you chose Y to Option 9 (i.e. you want to create
    an output computer file--the normal case), then here is
    where you supply the needed File Name:

        Type: x.out  ('x' stands for the text's file name
                      --up to eight characters/numbers).
             This file's extension must be .OUT

    QLEX puts everything it would have printed into this
    file, and stores it in the QLEX directory. If you
    anticipate storing a great many QLEX outputs, you can
    designate some other drive and directory for these files,
    in front of the file name (e.g. c:\LEXFILES). QLEX will put
    the output file (x.out) there.

    To examine the contents of this major output file:

        Type: SEE x.out  (use the cursor to move about).

        The SEE utility is useful for examining any ASCII
        file. Exit from SEE by striking the ESC key.

    To print this output file:

        Type: PRINT x.OUT
                 then, in response to the prompt: type PRN

    The 1-2-3 Statistics File Name. This option supplies the
    necessary data on the cumulative proportion distribution
    which another program called MIGNON uses to calculate
    LEX--the statistic which describes the direction of skew
    and the magnitude of that text's skew in word choice.

        Type: x.123  using the same filename as in the
          ASCII file but .123 as the file's extension.

    To print the contents of the file:

        Type: PRINT x.123
        then, in response to the prompt:
        Type: PRN

  h. SKIP THIS FILE? The default for this Menu option is N.
    Since you can use wildcards (e.g. * or ?) to identify a
    whole series of texts to analyze in a single QLEX run,
    you have the option of skipping a particular file within
    that set of files.

    To skip the file shown at the top of the menu screen:

        Type: Y  (skip that text)

        Remember! If you want to analyze the very next
        text, change this option back to N (meaning--I
        don't want to skip this next text).

  i. ARE THESE CHOICES OK? After you have made so many choices,
    the program gives you a chance to review your decisions
    before the QLEX analysis of texts begins. The default is
    Y. Before striking ENTER (which starts the analysis),
    check to see if your choices are exactly what you want.
    If not, use the cursor key to move about the Menu, make
    the necessary correction, then return to this point.

        Strike ENTER -- and QLEX begins the analysis

    When you analyze many texts in one QLEX run, this
    process is repeated for each new text displayed
    at the top of the screen until you have gone through the
    entire set. If all the analyses involve the same Menu
    options, all you need do is give the output files their
    names and strike ENTER after each file. Not until the
    last sample text's choices have been made will the ENTER
    key start QLEX.

  HOW LONG DOES A QLEX JOB TAKE?  Typically, under 1 SECOND

VIII.  MIGNON: the program which calculates the final LEX measures

    The C:\QLEX directory contains the MIGNON programs. MIGNON, a
    subset of QLEX's programs, calculates LEX and the other
    statistics, and produces two new files, x.LEX and x.321.
    To run MIGNON, first run QLEX to produce the cumulative
    frequency distribution, and then check to confirm that its
    two output files (x.out and x.123) were saved.

    MIGNON operates on a text's x.123 file. It calculates LEX
    by integrating the AREA beneath a text's cumulative
    proportion distribution--produced during the QLEX analysis
    (e.g., this can be seen in IRS1040.OUT--attached to LEXGUIDE).

    Next, MIGNON compares two areas--the size of this text's
    AREA is contrasted with a constant area--the integrated AREA
    beneath Herdan's theoretical lognormal model of word choice--
    which we now know is closely approximated (empirically) by
    word choice in the world sample of newspapers (mean LEX =
    0.0).

    LEX is the difference between those two AREAS:
        (1) the area under a text's cum. prop. distribution, and
        (2) the area under Herdan's theoretical lognormal model of
            word choice.

        [- LEX] texts:
            these natural texts have larger areas than
            does Herdan's linear model, i.e. their
            word choice is skewed toward common
            words. These texts are 'lexically simpler'
            or more accessible than the average newspaper.
            For that reason, such texts are given the (-) LEX
            sign.

        [+ LEX] texts:
            these natural texts have smaller areas than
            that under Herdan's linear model, i.e. their
            word choice is skewed toward rare
            words. These texts are more 'difficult'
            lexically (less accessible) than the average
            newspaper. For that reason, such texts are given
            the (+) sign.

        [LEX magnitude]:
            this is the quantitative difference in area beneath
            the two distributions (the empirical versus the
            theoretical). The larger the magnitude, the more
            accessible (- LEX) or less accessible (+ LEX)
            the text.

  A. Procedures for running MIGNON. It takes MIGNON less than
    a second to: (a) calculate the LEX statistics and (b)
    produce the two new computer files: x.LEX and x.321. First
    try MIGNON on the IRS1040.123 sample output.

        Type: MIGNON

        Then, answer the query by giving the file's name:

        Type: irs1040

    Note--(a) use only the file name;
          (b) omit the period and its extension

    MIGNON calculates and produces the LEX and other lexical
        statistics, and two new files:
            (a) IRS1040.321
            (b) IRS1040.LEX

    When you run MIGNON on the IRS1040.LEX file, the results
    appear on your monitor. That analysis produces eight
    statistics--beginning on the leftmost: 'LEX1'--the best,
    well-validated measure of a text's accessibility,
    difficulty or comprehensibility.

    LEX (open class words only) and LEX1 are one and the same.

    What then is LEX2? Our research has established that
    grammatical, function or closed-class terms contribute
    virtually no increment in predictive power to a text's
    'lexical difficulty'--beyond that produced by the text's
    choice among the 10,000 most common content or open class
    types.

    Much effort went into finding the minimax solution for
    where to partition closed from open-class terms (since
    they overlap and some words' grammatical role is
    conditional).

        LEX1 partitions this array between word ranks 75 and
        76--meaning that LEX1 includes a text's use of all
        types ranked between 76 and 10,000.

        LEX2 sets the partition point between word ranks
        35 and 36, so it includes all types ranked from 36
        through 10,000. An empirical case can be made for
        partitioning the first and most common 10,000 word types
        at an even lower rank.

    What is contained in this new IRS1040.321 file?
        IRS1040.321 contains the percentages on which its
        cumulative proportion distribution is produced, and
        several new exploratory measures created by MIGNON.
        The identity of the selected set of 99 variables plus
        the file's name is described in Section XV. In text
        research, one normally compares sets of texts with one
        another (e.g. 17th & 18th C. newspapers versus 19th C.
        newspapers versus 20th C. newspapers). That requires
        the outputs of many individual texts be aggregated. You
        may combine these x.321 files with others to form large
        data matrices on any spreadsheet software.

        In Corel Quattro-Pro 8, this is done as in this example:

           Type: copy *.321 c:\newspapr\newspapr.wb3

           All x.321 files produced by MIGNON will be
           joined into a single file known as newspapr.wb3 on
           the directory Newspapr on the c:\ drive.

           From Quattro Pro 8 or Excel or Lotus 1-2-3, one can
           retrieve this newspapr.wb3 file and enter it into a
           spreadsheet. From there, one can then import the
           spreadsheet file into a statpak (such as Minitab)
           and carry out more elaborate analyses on the full set
           of newspapers.

        IMPORTANT: the MIGNON program will perform its analysis
           of x.123 files singly or can carry out its analysis
           and produce the x.321 and x.LEX output files for
           all *.123 files on a directory, sequentially. This
           saves a lot of keystrokes and time when there are
           many QLEX analyses.

  B. Variable identity in MIGNON's x.LEX output file.

            variable #10:
                LEX2 (the difference in two Areas under the
                cumulative proportion curves: the empirical
                and the theoretical (the Carroll, et al,
                and newspaper linear distributions) for those
                words ranked between 36 and 10,000).

            variable #11:
                N--the number of tokens found in this text

            variable #15:
                the text's mean open-class type, expressed by
                its U value (i.e. its freq. per million in
                Carroll, et al's Reference Lexicon).

            variable #67:
                the % of all a text's tokens which appear among
                the 10,000 most common types of the Carroll,
                et al. Reference Lexicon

            variable #72:
                the text's median (Q2) open class word, expressed
                by its U value (freq./million).

            variable #88:
                the size of the Area beneath the cumulative
                proportion distribution for the text's
                CLOSED CLASS types (types ranked 1 through 75).

            variable #98:
                the size of the Area beneath the cumulative
                proportion distribution for CLOSED CLAS