Subject: IP: Frequency of top 1,000 USENET words
From: Mike Radow <mradow@inx.inx.net>
-
It is hoped that this will be useful to others...
In building "word-to-token" compressed files of technical text, we've had
good experience with this file.
We've used it for several years, and its word distribution is a good fit for
that of our text.
Unlike other "general text" frequencies, this list was generated from
USENET traffic.
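The post does not say how its "word-to-token" files are built. As a hedged illustration only, here is a minimal sketch of one simple scheme: read a list in the "percentage word" format shown below, give the most frequent word token 0, the next token 1, and so on, and leave any unlisted word as a literal string. The names `build_token_table`, `encode`, and `decode` are hypothetical, and a real compressor would also need a byte-level encoding with an escape for literals.

```python
# Hypothetical word-to-token scheme (an illustration, not the method used
# in the post): words from a "percentage word" frequency list get integer
# token IDs, most frequent first; unlisted words stay as literal strings.
from typing import Dict, List, Union

# First ten entries of the USENET list, in its "percentage word" format.
SAMPLE = """\
4.01838 the
2.43805 to
2.05957 of
1.95582 a
1.70176 I
1.68549 and
1.32531 is
1.23345 in
1.14749 that
0.811128 it
"""

Token = Union[int, str]


def build_token_table(freq_text: str) -> Dict[str, int]:
    """Map each listed word to a token ID; ID 0 is the most frequent word."""
    entries = []
    for line in freq_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and elision markers such as ".."
        entries.append((float(parts[0]), parts[1]))
    entries.sort(key=lambda e: e[0], reverse=True)
    return {word: i for i, (_, word) in enumerate(entries)}


def encode(text: str, table: Dict[str, int]) -> List[Token]:
    """Replace listed words with token IDs (case-sensitive, like the list)."""
    return [table.get(w, w) for w in text.split()]


def decode(tokens: List[Token], table: Dict[str, int]) -> str:
    """Invert encode(): token IDs back to words, literals passed through."""
    inverse = {i: w for w, i in table.items()}
    return " ".join(inverse[t] if isinstance(t, int) else t for t in tokens)


if __name__ == "__main__":
    table = build_token_table(SAMPLE)
    tokens = encode("the interface of the parser", table)
    print(tokens)  # [0, 'interface', 2, 0, 'parser']
    print(decode(tokens, table))
```

Loading the full 1,000-word list instead of `SAMPLE` would let the most common words of the text shrink to small integers, which is where such a scheme gets its savings. The lookup is case-sensitive because the list itself is (it has separate entries such as "I" and "Has").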
My sincere thanks to Lee Maixner for locating this URL:
Linkname: top1000.use
URL: http://wiretap.spies.com/Gopher/Library/Article/Language/top1000.use
/\/\...snipped...
Date: Tue, 19 Jan 1993 20:43:44 GMT
Subject: Re: Top 1000 English words ...
Top 1000 English words
Culled from one year of USENET traffic, here is my list of the top 1000
words, along with their percentage of occurrence (this is from a database
of 343,945,617 total scanned words).
--
Rick Walker
4.01838 the
2.43805 to
2.05957 of
1.95582 a
1.70176 I
1.68549 and
1.32531 is
1.23345 in
1.14749 that
0.811128 it
..
0.0109892 science
0.0109852 interface
0.010977 Americans
0.0109578 action
0.0109552 entire
0.0109494 below
0.0109288 Has
\/\/
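Because the list gives percentages rather than raw counts, a rough count for any word can be recovered from the stated total of 343,945,617 scanned words. A minimal sketch of that arithmetic, using the first ten entries quoted above; `estimated_count` and `cumulative_share` are hypothetical helper names, and the results are only approximate because the published percentages are themselves rounded.

```python
# Sketch: recover approximate word counts from the list's percentages,
# using the total of 343,945,617 scanned words stated in the post.
TOTAL_WORDS = 343_945_617

# The first ten (percentage, word) pairs from the list.
TOP = [
    (4.01838, "the"),
    (2.43805, "to"),
    (2.05957, "of"),
    (1.95582, "a"),
    (1.70176, "I"),
    (1.68549, "and"),
    (1.32531, "is"),
    (1.23345, "in"),
    (1.14749, "that"),
    (0.811128, "it"),
]


def estimated_count(pct: float, total: int = TOTAL_WORDS) -> int:
    """Approximate occurrences: pct is a percentage, so divide by 100."""
    return round(pct / 100 * total)


def cumulative_share(entries) -> float:
    """Percentage of all scanned words covered by the given entries."""
    return sum(pct for pct, _ in entries)


if __name__ == "__main__":
    for pct, word in TOP:
        print(f"{word:>5}  ~{estimated_count(pct):,}")
    print(f"top {len(TOP)} words cover about {cumulative_share(TOP):.1f}% of the text")
```

By this arithmetic "the" alone appears roughly 13.8 million times, and the ten words shown cover about 18.4% of all scanned words, which suggests why a short token table built from the top of such a list can pay off in compression.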
Mike
-
Mike Radow <---> mradow@inx.net