public final class Scoring
extends java.lang.Object
| Modifier and Type | Method and Description |
|---|---|
| static double | and(double o, double n) - Combines two scoring values. |
| static double | intersect(double w1, double w2) - Returns the scoring value for a phrase. |
| static double | let(double s, int c) - Returns a score for the let clause. |
| static double | not(double d) - Inverts the scoring value for FTNot. |
| static double | or(double o, double n) - Combines two scoring values. |
| static double | step(double sc) - Returns a score for a single step. |
| static double | textNode(int npv, int is, int tokl, int tl) - Calculates the score for a text node. |
| static int | tfIDF(double freq, double mfreq, double docs, double tokens) - Calculates a TF-IDF value for the specified values. |
| static double | union(double w1, double w2) - Returns the union value. |
| static double | word(int tl, double l) - Calculates a score value, based on the token length and complete text length. |

public static double word(int tl, double l)

Parameters:
- tl - token length
- l - complete length

public static double and(double o, double n)

Parameters:
- o - old value
- n - new value

public static double or(double o, double n)

Parameters:
- o - old value
- n - new value

public static double not(double d)

Parameters:
- d - scoring value

public static double let(double s, int c)

Parameters:
- s - summed up scoring values
- c - number of values

public static int tfIDF(double freq, double mfreq, double docs, double tokens)

Calculates a TF-IDF value for the specified values. Used definition:

freq(i, j) / max(l, freq(l, j)) * log(1 + N / n(i))

The result is multiplied by the MP constant to yield integer values. The value 2 is used as the minimum score, since the total minimum value will be reduced by 1; this avoids scores of 0.

Parameters:
- freq - frequency of the token. TF: freq(i, j)
- mfreq - maximum occurrence of a token. TF: max(l, freq(l, j))
- docs - number of documents in the collection. IDF: N
- tokens - number of documents containing the token. IDF: n(i)

public static double textNode(int npv, int is, int tokl, int tl)

Parameters:
- npv - number of pos values
- is - total number of index entries
- tokl - token length
- tl - text length

public static double intersect(double w1, double w2)

Parameters:
- w1 - score of word1
- w2 - score of word2

public static double union(double w1, double w2)

Parameters:
- w1 - score of word1
- w2 - score of word2

public static double step(double sc)

Parameters:
- sc - current score value
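
The TF-IDF definition given for tfIDF above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the library's implementation: the class name TfIdfSketch is hypothetical, MP is set to a placeholder of 1000 because its real value is not documented here, log is taken to be the natural logarithm, and the minimum score of 2 is applied as a simple clamp.

```java
public final class TfIdfSketch {
    /** Hypothetical stand-in for the MP constant; the real value is not given in this documentation. */
    private static final int MP = 1000;

    private TfIdfSketch() { }

    /**
     * Computes freq / mfreq * log(1 + docs / tokens), scales the result by MP
     * and clamps it to a minimum of 2.
     *
     * @param freq   frequency of the token, freq(i, j)
     * @param mfreq  maximum occurrence of a token, max(l, freq(l, j))
     * @param docs   number of documents in the collection, N
     * @param tokens number of documents containing the token, n(i)
     * @return scaled integer score, never smaller than 2
     */
    public static int tfIdf(double freq, double mfreq, double docs, double tokens) {
        double tf = freq / mfreq;
        // Assumption: "log" in the definition is the natural logarithm.
        double idf = Math.log(1 + docs / tokens);
        return (int) Math.max(2, MP * tf * idf);
    }

    public static void main(String[] args) {
        // A token occurring 3 times (most frequent token: 10 times), found in 5 of 100 documents.
        System.out.println(tfIdf(3, 10, 100, 5));
        // A token that never occurs is clamped to the minimum score.
        System.out.println(tfIdf(0, 10, 100, 5));
    }
}
```

Under these assumptions, a token with a frequency of 0 receives the minimum score of 2 rather than 0, which matches the intent described for tfIDF.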