
How to Generate VMP2.2s
(using a revised formula for generating Vocabulary-Management Profiles)

The VMP1 formula scores each new word (type) as 1.0 and each repeated word as 0.0, then computes the ratio of new types to tokens over a moving interval of 35 words, 55 words, or a similar length. However, as long texts unfold, repetitions increase and new vocabulary becomes rarer. Consequently, VMP1s may bottom out at zero for long stretches, yielding no useful signals. VMP2.1 solves this problem by computing a ratio greater than 0.0 for repeated words, based on how recently the word occurred in the text. The VMP2.1 formula for repeated words is (Number of Current Word - Number of Previous Occurrence - 1)/(Total Tokens in the Text - 1), where a word's "Number" is its position (token number) in the text. Like VMP1, VMP2.1 starts out high and drops off quickly at the beginnings of texts. In some respects, this mirrors the first reading of a text: everything seems new at first, then more familiar as the text unfolds. However, unlike VMP1, VMP2.1 never bottoms out at zero even for long texts; instead, it continues to give useful signals throughout the text.
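As a rough illustration of the two scoring rules described above, here is a minimal Python sketch. The function names `vmp1_scores` and `vmp21_ratios` are hypothetical, and the sketch assumes the text has already been split into lower-case words; it is not the code used by this web site.

```python
def vmp1_scores(tokens):
    """Score each token 1.0 if it is new to the text so far, else 0.0."""
    seen = set()
    scores = []
    for word in tokens:
        scores.append(0.0 if word in seen else 1.0)
        seen.add(word)
    return scores


def vmp21_ratios(tokens):
    """Score new words 1.0; score repeats by the scaled gap since the previous occurrence."""
    total = len(tokens)
    last_position = {}  # word -> token number of its most recent occurrence
    ratios = []
    for position, word in enumerate(tokens, start=1):
        if word not in last_position:
            ratios.append(1.0)
        else:
            # (current - previous - 1) / (total tokens - 1)
            ratios.append((position - last_position[word] - 1) / (total - 1))
        last_position[word] = position
    return ratios


tokens = "the cat saw the dog and the cat ran".split()
print(vmp1_scores(tokens))
print(vmp21_ratios(tokens))
```

In this nine-word example, VMP1 scores every repetition as 0.0, whereas VMP2.1 gives the second "cat" (five words intervening) a higher ratio than either repeated "the" (two words intervening).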

VMP2.2 uses the same formula for computing ratios as VMP2.1, except that VMP2.2 calculates the ratios for the second pass through the text rather than the first pass. For VMP2.1, the first occurrence of any word in a text is assigned the maximum ratio of 1.0 (which is averaged with other ratios over the moving interval). Thus, even common words such as "the", "of", "and", and "a" are assigned ratios of 1.0 at the beginnings of texts. By contrast, VMP2.2 computes ratios wrap-around style, as though the text were being read a second time. The first occurrence of a word such as "the" (near the beginning of a text) therefore follows shortly after its last occurrence (near the end of the text), so its ratio is nearer to 0.0 than to 1.0. The same is true for all other repeated words: their first occurrences are assigned ratios greater than 0.0 and less than 1.0. Words that appear only once in the text are assigned a ratio of 1.0. Unlike VMP2.1, VMP2.2 shows no rapid downtrend at the beginning of a text. VMP2.2s mirror our second readings of texts, when the beginnings are as familiar to us as the ends. Because we normally associate rhetorical structure with second (and subsequent) readings rather than first readings, VMP2.2 is the default program selected for this web site.
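The wrap-around computation can be sketched in Python as follows. This is one reading of the description above, not the site's own code: the hypothetical function `vmp22_ratios` assumes that, on the second pass, the "previous occurrence" of a word's first appearance is its last appearance in the text, shifted back by one full text length.

```python
def vmp22_ratios(tokens):
    """Second-pass ratios: every token has a previous occurrence, possibly in the prior pass."""
    total = len(tokens)
    if total < 2:
        return [1.0] * total  # assumption: a one-word text gets the maximum ratio

    # Pretend a first pass has just ended: record each word's last position,
    # moved back by one text length so it lies before token 1.
    last_position = {}
    for position, word in enumerate(tokens, start=1):
        last_position[word] = position - total

    ratios = []
    for position, word in enumerate(tokens, start=1):
        ratios.append((position - last_position[word] - 1) / (total - 1))
        last_position[word] = position
    return ratios


tokens = "the cat saw the dog and the cat ran".split()
print(vmp22_ratios(tokens))
```

Under this reading, a word that appears only once is separated from itself by every other token in the text, so its ratio works out to exactly 1.0, which agrees with the description above.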

How to Generate VMP2.2s

1. From the "Analysis Method" drop-down menu, select "VMP2.2 (2nd pass thru text)."
2. Select an odd-numbered interval greater than 1 and less than the length of your text.


Note: This number will be the interval used to compute a moving average of the number of new types/tokens over that interval. Choose shorter intervals if you want your VMPs to be sensitive to short-term fluctuations in new vocabulary. Choose longer intervals if you prefer smoother VMPs, tracking longer-term trends. The default is 35, which is a rather short interval suitable for tracking short-term changes. For example, Youmans 1991 found a strong correlation between paragraph boundaries and valleys on VMP1s that were constructed with 35-word moving intervals. Shorter intervals can be used to highlight variations within shorter constituents of discourse, for example sentences rather than paragraphs. Longer intervals can be used to highlight longer constituents. For example, Youmans 1994 used a 55-word interval to investigate the correlation between VMP1 and the boundaries between numbered sections in two short stories by William Faulkner.


3. If you don't want to see a plot of your type-token curves displayed on your screen, then uncheck the box to the right of the prompt "Include graph with your output?"

Note: This online program can generate VMP statistics and their accompanying graphs for novella-length texts and shorter. An error message may result for longer texts unless the graphing option is clicked off. If no graph is requested, the program can generate VMP statistics for very long texts. These statistics can be converted into graphical form by any standard spreadsheet/plotting program such as Microsoft Excel or Corel Quattro Pro.

4. Select the text file you want to analyze. (For instructions: see How to Upload Your Text File.)
5. Click the button labeled "Upload & process".
6. If all goes well, progress messages similar to the following should appear:

Uploading file C:\My Documents\MyTextFile.txt for analysis...
File upload complete. Processing file . . .
Processing of file MyTextFile.txt complete...

7. If you have chosen to include a graph with your output, then a plot of VMP2.2 should appear on your screen: the ratio types/tokens (y-axis) vs. tokens (x-axis). If you receive an error message instead, then your text may be too long for this online program to generate a graph. Try running the program again with the graphical option turned off.
8. If you wish to view your VMP2.2 statistics on your screen, click the button labeled "Download Output".
9. If you wish to save your VMP2.2 statistics, follow the instructions described in How to Download Your Data.



VMP2.2 Statistics: William Faulkner's short story "A Rose for Emily" (for a 55-word moving interval)

VMP2.2: "Corrected Rose.text 1"
Interval: 55 Types=1071 Tokens=3693 Types/Tokens=0.2900
AvgR = the ratio of Types/Tokens over the moving interval.
Average of the avgR = 0.28995
Standard Deviation = 0.06650

Midpoint, AvgR, Last Word in Interval, Token Number of Last Word
1, 0.30038, fallen, 28
2, 0.31574, monument, 29
3, 0.31511, the, 30
4, 0.32049, women, 31
5, 0.33845, mostly, 32
6, 0.34081, out, 33
7, 0.34012, of, 34
8, 0.34298, curiosity, 35
9, 0.32489, to, 36
10, 0.32809, see, 37
11, 0.31251, the, 38
12, 0.33068, inside, 39
13, 0.31592, of, 40
14, 0.31590, her, 41
15, 0.30356, house, 42


Types = the total vocabulary (graphically distinct words) used in the story
Tokens = the total number of words in the story
Types/Tokens = the ratio of types divided by tokens
Average of the avgR = the mean of all the moving interval averages in the text
Standard deviation = the standard deviation of the average ratio of types/tokens over the moving interval
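If you save the statistics for use in another program, rows like those above can be read with a few lines of Python. This sketch assumes each data row has four comma-separated fields (midpoint, AvgR, last word, and what appears to be the token number of the last word); the names `StatRow` and `parse_rows` are hypothetical.

```python
from typing import NamedTuple


class StatRow(NamedTuple):
    midpoint: int
    avg_r: float
    last_word: str
    last_token: int  # assumed meaning of the fourth field


def parse_rows(lines):
    """Parse data rows, skipping headers and anything that is not four typed fields."""
    rows = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue
        try:
            rows.append(StatRow(int(fields[0]), float(fields[1]), fields[2], int(fields[3])))
        except ValueError:
            continue  # a header or malformed line
    return rows


sample = [
    "Midpoint AvgR Last Word in Interval",
    "1, 0.30038, fallen, 28",
    "2, 0.31574, monument, 29",
    "3, 0.31511, the, 30",
]
rows = parse_rows(sample)
peak = max(rows, key=lambda row: row.avg_r)
print(peak.last_word, peak.avg_r)
```

The parsed rows can then be written to a spreadsheet file or passed to a plotting program.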

VMP2.2 plots the average ratios (between 0.0 and 1.0) for each word in a preset moving interval. Unlike VMP2.1, VMP2.2 computes these ratios wrap-around style, as though the text were concatenated with itself. Hence, the first ratio, 0.30038, plotted at token #1, is computed for a 55-word interval that begins with the 27th word from the end of the story and extends through the 28th word from the beginning of the story. The next ratio is computed for the 26th word from the end through the 29th word from the beginning. This procedure is repeated throughout the text, generating a moving average of ratios, much like a moving average of stock market prices. Note that the last word in the interval has a crucial effect upon the average ratios.
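The wrap-around interval described above can be sketched in Python. The function names are hypothetical, and the sketch assumes the interval is centered on the plotted token, with token numbers wrapping from the end of the text back to the beginning.

```python
def check_interval(interval, total_tokens):
    """Require an odd interval greater than 1 and less than the text length."""
    if interval % 2 == 0 or not (1 < interval < total_tokens):
        raise ValueError("interval must be odd, > 1, and < the number of tokens")


def wraparound_window(midpoint, interval, total_tokens):
    """Return the 1-based token numbers in the interval centered on midpoint."""
    check_interval(interval, total_tokens)
    start = midpoint - interval // 2  # may be zero or negative before wrapping
    return [((start - 1 + k) % total_tokens) + 1 for k in range(interval)]


def moving_average(ratios, interval):
    """Average the per-token ratios over a wrap-around interval at every token."""
    total = len(ratios)
    averages = []
    for midpoint in range(1, total + 1):
        window = wraparound_window(midpoint, interval, total)
        averages.append(sum(ratios[t - 1] for t in window) / interval)
    return averages


window = wraparound_window(1, 55, 3693)
print(window[0], window[-1])  # 3667 28
```

For the 3693-token story, token 3667 is the 27th word from the end, so the first interval runs from there through token 28, matching the description above.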

The ratio for this text is 0.30038 for the 55-word interval ending with the 28th word "fallen", which occurs 3 times in the story. The ratio increases to 0.31574 for the interval ending with the next word "monument", which occurs only once in the story. Then the ratio declines to 0.31511 for the interval ending with the next word "the", which occurs 257 times in the story. Hence, upturns in VMP2.2s occur only when less-recently-used words are added to the end of the interval, and downturns occur when more-recently-used words are added to the end of the interval. This is why the program lists the last words in the interval. Less-recently-used vocabulary at the ends of moving intervals tends to correlate with a change in topics, whereas more-recently-used vocabulary tends to correlate with a continuation of the same topic. Hence, VMP2.2s are surprisingly sensitive indicators of the ebb and flow of new topics in discourse.
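The effect of the last word can be made concrete with a little arithmetic. When a moving interval advances by one word, one ratio enters at the end and one leaves at the start, so the average changes by their difference divided by the interval length. The Python sketch below illustrates this under that assumption; the function names are hypothetical, and because the AvgR values listed above are rounded to five decimal places, the result is only approximate.

```python
def next_average(current_average, entering_ratio, leaving_ratio, interval):
    """Update a moving average when the interval advances by one word."""
    return current_average + (entering_ratio - leaving_ratio) / interval


def implied_leaving_ratio(old_average, new_average, entering_ratio, interval):
    """Work backward from two successive averages to the ratio that left the interval."""
    return entering_ratio - interval * (new_average - old_average)


# "monument" occurs only once, so its ratio is 1.0.
leaving = implied_leaving_ratio(0.30038, 0.31574, 1.0, 55)
print(round(leaving, 4))
```

On these figures, the word dropped from the start of the interval carried a ratio of roughly 0.155, far below the 1.0 contributed by "monument". In this sketch the average rises whenever the entering ratio exceeds the leaving one, which is why a rarely repeated last word tends to produce an upturn.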

Note that the program changes upper case ("Emily") to lower case ("emily"). This can result in occasional processing errors; for example, "Will" (proper name), "will" (auxiliary verb), and "will" (noun: 'volition') are all treated as being identical graphical words. If you want to distinguish among homographs such as these, you will have to recode your text, for example, by replacing "Will/will" with "will1/will2" or something similar. (For further discussion, see How to Prepare Your Text File for Analysis.)
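The following Python sketch shows why lower-casing merges homographs and how recoding a text beforehand keeps them apart. The tokenizing rule here (runs of letters, digits, and apostrophes) is an assumption made for illustration, since the program's actual rule is not described on this page; `tokenize` and `recode_name` are hypothetical names.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into graphical words (assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def recode_name(text, name, replacement):
    """Replace whole-word, case-sensitive occurrences of name before tokenizing."""
    return re.sub(r"\b" + re.escape(name) + r"\b", replacement, text)


text = "I asked Will whether he will go."
print(tokenize(text))
print(tokenize(recode_name(text, "Will", "will1")))
```

Without recoding, both words become "will"; after recoding, the proper name becomes "will1". A capitalized auxiliary at the start of a sentence would be recoded as well, so automatic replacements of this kind still need to be checked by hand.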

If single words such as "emily" are broken into two parts, "e" and "mily", this probably means that the text file was stored with "soft returns" rather than "hard returns" for line breaks. Cure: store your text file with "hard returns" to indicate line breaks.


Youmans, Gilbert. 1991. "A New Tool for Discourse Analysis: the Vocabulary-Management Profile." Language 67.4, 763-789.

Youmans, Gilbert. 1994. "The Vocabulary-Management-Profile: Two Stories by William Faulkner." Empirical Studies of the Arts 12.2, 113-130.

Youmans, Gilbert. 2001. Manuscript. "The Hierarchical Structure of Discourse: A New, Improved Vocabulary-Management Profile."

