Searching for articles
To search for articles in Event Registry, we provide two classes: QueryArticles and QueryArticlesIter. Both classes can be used to find articles using a set of various types of search conditions.
The QueryArticlesIter class provides an iterator that makes it easy to iterate over all articles that match the search conditions. Alternatively, the QueryArticles class can be used to obtain a broader range of information about the matching articles in various forms. With QueryArticles, the results can include not only the list of articles but also the time distribution of when the articles were published, the distribution of top news sources that wrote the matching articles, the top concepts mentioned in the articles, etc.
The returned information about articles follows the Article data model.
Example of usage
Before describing the class, here is a simple full example that prints the list of all articles that mention George Clooney:
import {EventRegistry, QueryArticlesIter} from "eventregistry";
const er = new EventRegistry({apiKey: "YOUR_API_KEY"});
er.getConceptUri("George Clooney").then((conceptUri) => {
    // the EventRegistry instance is the first constructor argument
    const q = new QueryArticlesIter(er, {conceptUri: conceptUri});
    q.execQuery((article) => {
        console.info(article);
    });
});
Constructor
QueryArticlesIter is a class derived from QueryArticles. Its constructor can accept the following arguments:
new QueryArticlesIter(er, {
sortBy = "rel",
sortByAsc = false,
returnInfo = new ReturnInfo(),
maxItems = -1,
keywords = undefined,
conceptUri = undefined,
categoryUri = undefined,
sourceUri = undefined,
sourceLocationUri = undefined,
sourceGroupUri = undefined,
authorUri = undefined,
locationUri = undefined,
lang = undefined,
dateStart = undefined,
dateEnd = undefined,
dateMentionStart = undefined,
dateMentionEnd = undefined,
keywordsLoc = "body",
ignoreKeywords = undefined,
ignoreConceptUri = undefined,
ignoreCategoryUri = undefined,
ignoreSourceUri = undefined,
ignoreSourceLocationUri = undefined,
ignoreSourceGroupUri = undefined,
ignoreAuthorUri = undefined,
ignoreLocationUri = undefined,
ignoreLang = undefined,
ignoreKeywordsLoc = "body",
isDuplicateFilter = "keepAll",
hasDuplicateFilter = "keepAll",
eventFilter = "keepAll",
startSourceRankPercentile = 0,
endSourceRankPercentile = 100,
dataType = "news"
} = {});
The parameters for which you don't specify a value will be ignored. For the query to be valid (i.e. executable by Event Registry), it has to have at least one positive condition (conditions that start with ignore* do not count as positive conditions). The meaning of the arguments is as follows:
- er: instance of the EventRegistry class
- sortBy: sets the order in which the resulting articles are sorted before returning. Options: id (internal id), date (publishing date), cosSim (closeness to the centroid of the associated event), rel (relevance to the query), sourceImportance (manually curated score of source importance - high value, high importance), sourceImportanceRank (reverse of sourceImportance), sourceAlexaGlobalRank (global rank of the news source), sourceAlexaCountryRank (country rank of the news source), socialScore (total shares on social media), facebookShares (shares on Facebook only).
- sortByAsc: whether the results should be sorted in ascending order
- returnInfo: sets the properties of various types of data that is returned (events, concepts, categories, ...). See details.
- maxItems: max number of articles to return by the iterator. Use the default (-1) to simply return all the articles.
- keywords: find articles that mention the specified keywords. A single keyword/phrase can be provided as a string, multiple keywords/phrases can be provided as a list of strings. Use QueryItems.AND() if all provided keywords/phrases should be mentioned, or QueryItems.OR() if any of the keywords/phrases should be mentioned.
- conceptUri: find articles where the concept with the given concept URI is mentioned. A single concept URI can be provided as a string, multiple concept URIs can be provided as a list of strings. Use QueryItems.AND() if all provided concepts should be mentioned, or QueryItems.OR() if any of the concepts should be mentioned. To obtain a concept URI based on a (partial) concept label use EventRegistry.getConceptUri().
- categoryUri: find articles that are assigned to a particular category. A single category URI can be provided as a string, multiple category URIs can be provided as a list of strings. Use QueryItems.AND() if all provided categories should be mentioned, or QueryItems.OR() if any of the categories should be mentioned. A category URI can be obtained based on a (partial) category name using EventRegistry.getCategoryUri().
- sourceUri: find articles that were written by the news source sourceUri. If multiple sources should be considered, use QueryItems.OR() to provide the list of sources. The source URI for a given (partial) news source name can be obtained using EventRegistry.getNewsSourceUri().
- sourceLocationUri: find articles that were written by news sources located in the given geographic location. If multiple source locations are provided, then put them into a list inside QueryItems.OR(). The location URI can either be a city or a country. The location URI for a given (partial) name can be obtained using EventRegistry.getLocationUri().
- sourceGroupUri: find articles that were written by news sources that are assigned to the specified source group(s). If multiple source groups are provided, then put them into a list inside QueryItems.OR(). The source group URI for a given name can be obtained using EventRegistry.getSourceGroupUri().
- authorUri: find articles that were written by a specific author. If multiple authors should be considered, use QueryItems.AND() if the provided authors should be joint authors of the same articles, or QueryItems.OR() if you want to find articles that were written by any of the provided authors. To obtain the author URI based on a (partial) author name and potentially the source domain name use EventRegistry.getAuthorUri().
- locationUri: find articles that describe an event that occurred at a particular location. An article will be associated with that location if it's mentioned in the dateline. The location URI can either be a city or a country. If multiple locations are provided, the resulting articles have to match any of the locations. The location URI for a given name can be obtained using EventRegistry.getLocationUri().
- lang: find articles that are written in the specified language. If more than one language is specified, the resulting articles have to be written in any of the languages. Specify the value as a string or a list in QueryItems.OR(). See supported languages for the list of language codes to use.
- dateStart: find articles that were written on or after dateStart. The date should be provided in YYYY-MM-DD format.
- dateEnd: find articles that were written on or before dateEnd. The date should be provided in YYYY-MM-DD format.
- dateMentionStart: find articles that explicitly mention a date that is equal to or greater than dateMentionStart.
- dateMentionEnd: find articles that explicitly mention a date that is lower than or equal to dateMentionEnd.
- keywordsLoc: where to look when searching for the keywords provided by the keywords parameter. "body" (default), "title", or "body,title"
- ignoreKeywords: ignore articles that mention the provided keywords. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreConceptUri: ignore articles that mention the provided concepts. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreCategoryUri: ignore articles that are about the provided set of categories. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreSourceUri: ignore articles that were written by the specified list of news sources. Specify the value as a string or a list in QueryItems.OR().
- ignoreSourceLocationUri: ignore articles that were written by news sources located at the specified geographic location(s). Specify the value as a string or a list in QueryItems.OR().
- ignoreSourceGroupUri: ignore articles that were written by the news sources assigned to the specified source groups. Specify the value as a string or a list in QueryItems.OR().
- ignoreAuthorUri: ignore articles that were written by one or more of the provided authors. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreLocationUri: ignore articles that describe an event that occurred in any of the provided locations. A location can be a city or a place. Specify the value as a string or a list in QueryItems.OR().
- ignoreLang: ignore articles that are written in any of the provided languages. See supported languages for the list of language codes to use.
- ignoreKeywordsLoc: where to look when searching for the keywords provided by the ignoreKeywords parameter. "body" (default), "title", or "body,title"
- isDuplicateFilter: some articles can be duplicates of other articles. This sets what should be done with them. Possible values are: "skipDuplicates" (skip the resulting articles that are duplicates of other articles); "keepOnlyDuplicates" (return only the duplicate articles); "keepAll" (no filtering, default).
- hasDuplicateFilter: some articles are later copied by others. This sets what should be done with such articles. Possible values are: "skipHasDuplicates" (skip the resulting articles that have been later copied by others); "keepOnlyHasDuplicates" (return only the articles that have been later copied by others); "keepAll" (no filtering, default).
- eventFilter: some articles describe a known event and some don't. This filter allows you to filter the resulting articles based on this criterion. Possible values are: "skipArticlesWithoutEvent" (skip articles that are not describing any known event in ER); "keepOnlyArticlesWithoutEvent" (return only the articles that are not describing any known event in ER); "keepAll" (no filtering, default).
- startSourceRankPercentile and endSourceRankPercentile: these parameters can be used to filter the returned articles to include only those from news sources of a certain ranking. Sources are ranked according to the global Alexa site ranking. Setting startSourceRankPercentile to 0 and endSourceRankPercentile to 20 would, for example, return only articles from top-ranked news sources that amount to approximately 20% of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value is used to identify the subset of news sources that generate approximately 20% of our collected news content.
- dataType: what data types should be searched. "news" (news content, default), "pr" (press releases), or "blog". If you want to use multiple data types, put them in an array (e.g. ["news", "pr"]).
When two or more parameters are specified in the constructor, the results will be computed so that all conditions are met. For example, if you specify new QueryArticlesIter(er, {keywords: "Barack Obama", conceptUri: "http://en.wikipedia.org/wiki/White_House"}) then the resulting articles will mention the phrase Barack Obama and will be annotated with the concept White House.
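The rule that a query needs at least one positive condition can be checked before sending anything. The following is a minimal sketch, not part of the SDK: hasPositiveCondition is a hypothetical helper, and the list of names it treats as positive conditions is an assumption based on the constructor signature above (filters, percentiles, keywordsLoc and dataType are assumed not to count).

```javascript
// Assumed set of positive (non-ignore) search conditions, copied from the constructor signature.
const POSITIVE_CONDITIONS = [
    "keywords", "conceptUri", "categoryUri", "sourceUri", "sourceLocationUri",
    "sourceGroupUri", "authorUri", "locationUri", "lang",
    "dateStart", "dateEnd", "dateMentionStart", "dateMentionEnd",
];

// Hypothetical helper (not part of the SDK): true if at least one positive condition is set.
function hasPositiveCondition(params) {
    return POSITIVE_CONDITIONS.some((name) => params[name] !== undefined);
}

// One positive condition plus an ignore* condition: acceptable.
const valid = { keywords: "Barack Obama", ignoreLang: "deu" };
// Only an ignore* condition and a data type: no positive condition.
const invalid = { ignoreKeywords: "football", dataType: "news" };

console.log(hasPositiveCondition(valid));   // true
console.log(hasPositiveCondition(invalid)); // false
```

An object that passes this check could then be given as the second constructor argument of QueryArticlesIter.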
Methods
The QueryArticlesIter class has one main method, execQuery(), which has the following format:
execQuery(callback, doneCallback);
The first argument is a callback function that is called repeatedly: each time a new page of results is reached, until all matching articles have been fetched or maxItems is reached (by default maxItems is set to -1, so all articles are fetched). Whenever the package acquires a new set of articles in the background, the function is called again with the new data. Inside this function you can either analyze the data or iterate through it. There is no need to call execQuery() multiple times.
The class also implements the async iterator protocol, so alternatively you can use it in the following way:
(async () => {
    const er = new EventRegistry();
    const articles = new QueryArticlesIter(er, {/** Apply any conditions based on the aforementioned documentation */});
    for await (const article of articles) {
        console.info(article);
    }
})();
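Because the class can be consumed with for await, generic async-iterator helpers can be applied to it. The sketch below shows one way to stop after a fixed number of items. To keep it runnable without an API key, fakeArticles is a hypothetical stand-in for a QueryArticlesIter instance, and the uri and title fields are illustrative only.

```javascript
// Hypothetical stand-in for QueryArticlesIter: any async iterable behaves the same in `for await`.
async function* fakeArticles() {
    for (let i = 1; i <= 5; i++) {
        yield { uri: String(i), title: `Article ${i}` };
    }
}

// Collect at most `limit` items from an async iterable, then stop iterating.
async function take(iterable, limit) {
    const items = [];
    if (limit <= 0) return items;
    for await (const item of iterable) {
        items.push(item);
        if (items.length >= limit) break; // leaving the loop early ends the iteration
    }
    return items;
}

const firstThree = take(fakeArticles(), 3);
firstThree.then((articles) => console.log(articles.map((a) => a.title)));
```

With a real iterator, the maxItems constructor argument described above achieves a similar cap on the library side.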
Constructor
new QueryArticles({
keywords = undefined,
conceptUri = undefined,
categoryUri = undefined,
sourceUri = undefined,
sourceLocationUri = undefined,
sourceGroupUri = undefined,
authorUri = undefined,
locationUri = undefined,
lang = undefined,
dateStart = undefined,
dateEnd = undefined,
dateMentionStart = undefined,
dateMentionEnd = undefined,
keywordsLoc = "body",
ignoreKeywords = undefined,
ignoreConceptUri = undefined,
ignoreCategoryUri = undefined,
ignoreSourceUri = undefined,
ignoreSourceLocationUri = undefined,
ignoreSourceGroupUri = undefined,
ignoreAuthorUri = undefined,
ignoreLocationUri = undefined,
ignoreLang = undefined,
ignoreKeywordsLoc = "body",
isDuplicateFilter = "keepAll",
hasDuplicateFilter = "keepAll",
eventFilter = "keepAll",
startSourceRankPercentile = 0,
endSourceRankPercentile = 100,
dataType = "news",
requestedResult = new RequestArticlesInfo(),
} = {});
The parameters for which you don't specify a value will be ignored. For the query to be valid (i.e. executable by Event Registry), it has to have at least one positive condition (conditions that start with ignore* do not count as positive conditions). The meaning of the arguments is as follows:
- keywords: find articles that mention the specified keywords. A single keyword/phrase can be provided as a string, multiple keywords/phrases can be provided as a list of strings. Use QueryItems.AND() if all provided keywords/phrases should be mentioned, or QueryItems.OR() if any of the keywords/phrases should be mentioned.
- conceptUri: find articles where the concept with the given concept URI is mentioned. A single concept URI can be provided as a string, multiple concept URIs can be provided as a list of strings. Use QueryItems.AND() if all provided concepts should be mentioned, or QueryItems.OR() if any of the concepts should be mentioned. To obtain a concept URI based on a (partial) concept label use EventRegistry.getConceptUri().
- categoryUri: find articles that are assigned to a particular category. A single category URI can be provided as a string, multiple category URIs can be provided as a list of strings. Use QueryItems.AND() if all provided categories should be mentioned, or QueryItems.OR() if any of the categories should be mentioned. A category URI can be obtained based on a (partial) category name using EventRegistry.getCategoryUri().
- sourceUri: find articles that were written by the news source sourceUri. If multiple sources should be considered, use QueryItems.OR() to provide the list of sources. The source URI for a given (partial) news source name can be obtained using EventRegistry.getNewsSourceUri().
- sourceLocationUri: find articles that were written by news sources located in the given geographic location. If multiple source locations are provided, then put them into a list inside QueryItems.OR(). The location URI can either be a city or a country. The location URI for a given (partial) name can be obtained using EventRegistry.getLocationUri().
- sourceGroupUri: find articles that were written by news sources that are assigned to the specified source group(s). If multiple source groups are provided, then put them into a list inside QueryItems.OR(). The source group URI for a given name can be obtained using EventRegistry.getSourceGroupUri().
- authorUri: find articles that were written by a specific author. If multiple authors should be considered, use QueryItems.AND() if the provided authors should be joint authors of the same articles, or QueryItems.OR() if you want to find articles that were written by any of the provided authors. To obtain the author URI based on a (partial) author name and potentially the source domain name use EventRegistry.getAuthorUri().
- locationUri: find articles that describe an event that occurred at a particular location. An article will be associated with that location if it's mentioned in the dateline. The location URI can either be a city or a country. If multiple locations are provided, the resulting articles have to match any of the locations. The location URI for a given name can be obtained using EventRegistry.getLocationUri().
- lang: find articles that are written in the specified language. If more than one language is specified, the resulting articles have to be written in any of the languages. Specify the value as a string or a list in QueryItems.OR(). See supported languages for the list of language codes to use.
- dateStart: find articles that were written on or after dateStart. The date should be provided in YYYY-MM-DD format.
- dateEnd: find articles that were written on or before dateEnd. The date should be provided in YYYY-MM-DD format.
- dateMentionStart: find articles that explicitly mention a date that is equal to or greater than dateMentionStart.
- dateMentionEnd: find articles that explicitly mention a date that is lower than or equal to dateMentionEnd.
- keywordsLoc: where to look when searching for the keywords provided by the keywords parameter. "body" (default), "title", or "body,title"
- ignoreKeywords: ignore articles that mention the provided keywords. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreConceptUri: ignore articles that mention the provided concepts. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreCategoryUri: ignore articles that are about the provided set of categories. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreSourceUri: ignore articles that were written by the specified list of news sources. Specify the value as a string or a list in QueryItems.OR().
- ignoreSourceLocationUri: ignore articles that were written by news sources located at the specified geographic location(s). Specify the value as a string or a list in QueryItems.OR().
- ignoreSourceGroupUri: ignore articles that were written by the news sources assigned to the specified source groups. Specify the value as a string or a list in QueryItems.OR().
- ignoreAuthorUri: ignore articles that were written by one or more of the provided authors. Specify the value as a string or a list in QueryItems.OR() or QueryItems.AND().
- ignoreLocationUri: ignore articles that describe an event that occurred in any of the provided locations. A location can be a city or a place. Specify the value as a string or a list in QueryItems.OR().
- ignoreLang: ignore articles that are written in any of the provided languages. See supported languages for the list of language codes to use.
- ignoreKeywordsLoc: where to look when searching for the keywords provided by the ignoreKeywords parameter. "body" (default), "title", or "body,title"
- isDuplicateFilter: some articles can be duplicates of other articles. This sets what should be done with them. Possible values are: "skipDuplicates" (skip the resulting articles that are duplicates of other articles); "keepOnlyDuplicates" (return only the duplicate articles); "keepAll" (no filtering, default).
- hasDuplicateFilter: some articles are later copied by others. This sets what should be done with such articles. Possible values are: "skipHasDuplicates" (skip the resulting articles that have been later copied by others); "keepOnlyHasDuplicates" (return only the articles that have been later copied by others); "keepAll" (no filtering, default).
- eventFilter: some articles describe a known event and some don't. This filter allows you to filter the resulting articles based on this criterion. Possible values are: "skipArticlesWithoutEvent" (skip articles that are not describing any known event in ER); "keepOnlyArticlesWithoutEvent" (return only the articles that are not describing any known event in ER); "keepAll" (no filtering, default).
- startSourceRankPercentile and endSourceRankPercentile: these parameters can be used to filter the returned articles to include only those from news sources of a certain ranking. Sources are ranked according to the global Alexa site ranking. Setting startSourceRankPercentile to 0 and endSourceRankPercentile to 20 would, for example, return only articles from top-ranked news sources that amount to approximately 20% of all matching content. Note: 20 percentiles do not represent 20% of all top sources. The value is used to identify the subset of news sources that generate approximately 20% of our collected news content.
- dataType: what data types should be searched. "news" (news content, default), "pr" (press releases), or "blog". If you want to use multiple data types, put them in an array (e.g. ["news", "pr"]).
- requestedResult: the information that should be returned as the result of the query. If not specified, then by default we set RequestArticlesInfo().
Example of usage
Let's look at an example of its usage:
import {EventRegistry, QueryArticles, RequestArticlesInfo, ReturnInfo, ArticleInfoFlags} from "eventregistry";
const er = new EventRegistry({apiKey: "YOUR_API_KEY"});
er.getConceptUri("Apple").then((appleUri) => {
    const q = new QueryArticles({
        dateStart: "2014-04-16", // set the date limit of interest
        dateEnd: "2014-04-28",
        conceptUri: appleUri, // find articles mentioning the company Apple
    });
    // request concepts, categories and the image for each returned article
    const articleInfo = new ArticleInfoFlags({concepts: true, categories: true, image: true});
    const returnInfo = new ReturnInfo({articleInfo: articleInfo});
    // return the first page with up to 30 articles
    const requestArticlesInfo = new RequestArticlesInfo({page: 1, count: 30, returnInfo: returnInfo});
    q.setRequestedResult(requestArticlesInfo);
    return er.execQuery(q);
});
The returned information about articles follows the Article data model.
Creating QueryArticles using static methods
The QueryArticles class can also be initialized in two other ways:
QueryArticles.initWithArticleUriList() is a static method that can be used to specify the set of article URIs that you want to use as the result. In this case, no query conditions are used and this set is used as the resulting set. All the return information about the articles will be based on this set of articles.
QueryArticles.initWithComplexQuery() is another static method that can be used to create a complex query based on the advanced query language. You can call the method by providing an instance of the ComplexArticleQuery class. Alternatively, you can also call the method with an object or a string containing the JSON object matching the language (see the examples).
When executing the query, there will be a set of articles that match the specified criteria. What information about these articles is to be returned, however, still needs to be determined. Do you want to get the details about these articles? Are you interested in the top concepts mentioned in them? Maybe news sources?
The information to be returned about the matching articles is set by calling the setRequestedResult() method. setRequestedResult() accepts as an argument an instance of a class that has the base class RequestArticles. By calling the addRequestedResult() method multiple times on a QueryArticles instance you can retrieve multiple results with a single query. Free users are only allowed one requested result per call and should instead use the setRequestedResult() method. Below are the classes that can be specified in the addRequestedResult() and setRequestedResult() calls:
RequestArticlesInfo
new RequestArticlesInfo({
page = 1,
count = 20,
sortBy = "date",
sortByAsc = false,
returnInfo = new ReturnInfo()
} = {});
The RequestArticlesInfo class provides detailed information about the resulting articles.
- page: determines the page of the results to return (starting from 1)
- count: determines the number of articles to return. The maximum number of articles that can be returned per call is 100.
- sortBy: sets the order in which the resulting articles are sorted before returning. Options: id (internal id), date (publishing date), cosSim (closeness to the centroid of the associated event), rel (relevance to the query), sourceImportance (manually curated score of source importance - high value, high importance), sourceImportanceRank (reverse of sourceImportance), sourceAlexaGlobalRank (global rank of the news source), sourceAlexaCountryRank (country rank of the news source), socialScore (total shares on social media), facebookShares (shares on Facebook only).
- sortByAsc: whether the results should be sorted in ascending order
- returnInfo: sets the properties of various types of data that is returned (concepts, categories, news sources, ...). See details.
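Since a single call returns at most 100 articles and pages start from 1, retrieving a larger result set means issuing one request per page. The arithmetic can be sketched as follows; pageOptions is a hypothetical helper (not part of the SDK) that only builds the option objects one might pass to new RequestArticlesInfo(...).

```javascript
const MAX_COUNT = 100; // per-call limit stated in the parameter description

// Hypothetical helper: option objects for the pages needed to cover `total` articles.
function pageOptions(total, count = MAX_COUNT) {
    const perPage = Math.min(Math.max(1, count), MAX_COUNT); // clamp to 1..100
    const pages = Math.ceil(total / perPage);
    const options = [];
    for (let page = 1; page <= pages; page++) { // pages start from 1
        options.push({ page, count: perPage, sortBy: "date", sortByAsc: false });
    }
    return options;
}

const plan = pageOptions(250);
console.log(plan.length);  // 3
console.log(plan[2].page); // 3
```

For simply walking through all matching articles, QueryArticlesIter handles this paging internally, so a helper like this is only relevant when QueryArticles is used directly.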
RequestArticlesUriList
RequestArticlesUriList returns a simple list of article URIs that match the criteria. Useful if you wish to obtain the full list in a single query.
RequestArticlesTimeAggr
RequestArticlesTimeAggr returns information about how the resulting articles are distributed over time.
RequestArticlesConceptAggr
RequestArticlesConceptAggr returns a list of the top concepts that are mentioned the most in the resulting articles.
RequestArticlesSourceAggr
RequestArticlesSourceAggr provides a list of the top news sources that have written the most articles in the results.
RequestArticlesCategoryAggr
RequestArticlesCategoryAggr returns information about which categories the resulting articles are about.
RequestArticlesKeywordAggr
new RequestArticlesKeywordAggr({ lang = "eng", articlesSampleSize = 500} = {});
RequestArticlesKeywordAggr returns the keywords that best summarize the resulting articles.
- lang: determines the language for which to compute the keywords. Articles in other languages will be ignored.
- articlesSampleSize: the sample size of articles on which to compute the keywords.
RequestArticlesConceptGraph
new RequestArticlesConceptGraph({
conceptCount = 25,
linkCount = 50,
sampleSize = 500,
returnInfo = new ReturnInfo(),
} = {});
RequestArticlesConceptGraph returns a graph of concepts. Concepts are connected if they frequently occur in the same articles.
- conceptCount: number of top concepts (nodes) to return
- linkCount: number of edges in the graph
- sampleSize: the sample of articles on which the graph should be computed
- returnInfo: the details about the types of return data to include. See details.
RequestArticlesConceptMatrix
new RequestArticlesConceptMatrix({
conceptCount = 25,
measure = "pmi",
sampleSize = 500,
returnInfo = new ReturnInfo(),
} = {});
RequestArticlesConceptMatrix computes a matrix of concepts and their dependencies. For individual concept pairs, it returns how frequently they co-occur in the resulting articles and how "surprising" this is, based on the frequency of individual concepts.
- conceptCount: the number of concepts on which to compute the matrix
- measure: the measure to be used for computing the "surprise factor". Options: pmi (pointwise mutual information), pairTfIdf (pair frequency * IDF of individual concepts), chiSquare.
- sampleSize: the sample of articles on which the matrix should be computed
- returnInfo: the details about the types of return data to include. See details.
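To make the "surprise factor" more concrete, here is a small worked example of pointwise mutual information in its standard textbook form, computed from article counts. This is only an illustration: the exact formula, logarithm base and smoothing that Event Registry applies are not described here, and the pmi function below is not part of the SDK.

```javascript
// Pointwise mutual information of a concept pair, from raw article counts:
//   PMI = log2( p(a,b) / (p(a) * p(b)) ) = log2( countBoth * total / (countA * countB) )
// countA, countB: articles mentioning each concept; countBoth: articles mentioning both;
// total: number of articles in the sample.
function pmi(countA, countB, countBoth, total) {
    return Math.log2((countBoth * total) / (countA * countB));
}

// In a sample of 500 articles, two concepts each appear in 50 articles and together in 25.
// They co-occur 5 times more often than independence would predict.
const surprising = pmi(50, 50, 25, 500);
console.log(surprising.toFixed(2)); // "2.32"

// Two concepts that each appear in 100 articles and together in 20 co-occur exactly
// as often as independent concepts would, so the PMI is 0.
const independent = pmi(100, 100, 20, 500);
console.log(independent); // 0
```

A positive value means the pair appears together more often than the individual frequencies suggest, which is the sense of "surprising" used above.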
RequestArticlesConceptTrends
new RequestArticlesConceptTrends({conceptCount = 10, returnInfo = new ReturnInfo()} = {});
RequestArticlesConceptTrends provides a list of the most popular concepts in the results and how they trend daily over time.
- conceptCount: number of top concepts to return
- returnInfo: the details about the types of return data to include. See details.
RequestArticlesDateMentionAggr
RequestArticlesDateMentionAggr provides information about the dates that have been found mentioned in the resulting articles.
For many users, simply providing a list of concepts, keywords, sources, etc. is not sufficient and a more complex way of specifying a query is required. For such purposes, we provide a query language in which conditions are specified in a JSON object that resembles the query language used by MongoDB. The grammar for the language is as follows:
ComplexArticleQuery {
"$query": CombinedQuery | BaseQuery,
"isDuplicateFilter": null | "keepAll" | "skipDuplicates" | "keepOnlyDuplicates",
"hasDuplicateFilter": null | "keepAll" | "skipHasDuplicates" | "keepOnlyHasDuplicates",
"eventFilter": null | "keepAll" | "skipArticlesWithoutEvent" | "keepOnlyArticlesWithoutEvent"
}
CombinedQuery {
"$or": [ CombinedQuery | BaseQuery, ... ],
"$not": null | CombinedQuery | BaseQuery
}
CombinedQuery {
"$and": [ CombinedQuery | BaseQuery, ... ],
"$not": null | CombinedQuery | BaseQuery
}
BaseQuery {
"conceptUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"keyword": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"categoryUri": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"lang": null | string | { "$or": [ string, ... ]} | { "$and": [ string, ... ]},
"sourceUri": null | string | { "$or": [ string, ... ]},
"sourceLocationUri": null | string | { "$or": [ string, ... ]},
"sourceGroupUri": null | string | { "$or": [ string, ... ]},
"locationUri": null | string | { "$or": [ string, ... ]},
"dateStart": null | string,
"dateEnd": null | string,
"dateMention": null | [string, ... ],
"keywordLoc": null | "body" | "title" | "title,body",
"minArticlesInEvent": null | int,
"maxArticlesInEvent": null | int,
"$not": null | CombinedQuery | BaseQuery
}
Explanation: Each complex article query needs to be a JSON object that has a $query key. The $query key must contain another JSON object that should be parsable as a CombinedQuery or a BaseQuery. A CombinedQuery can be used to specify a list of conditions, where all ($and) or any ($or) conditions should hold. The CombinedQuery can also contain a $not key containing another CombinedQuery or BaseQuery defining the results that should be excluded from the results computed by the $and or $or conditions. The BaseQuery represents a JSON object with actual conditions to search for. These (positive) conditions can include concepts, keywords, categories, sources, etc. to search for. If multiple conditions are specified, for example, a conceptUri as well as a sourceUri, then results will have to match all the conditions. The BaseQuery can also contain the $not key specifying results to exclude from the results matching the positive conditions of the BaseQuery. A BaseQuery containing only the $not key is not a valid query (since it has no positive conditions).
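The validity rule above can be expressed as a short recursive check over the JSON structure. This is a sketch under simplifying assumptions, not the validation Event Registry performs: isValidQuery is a hypothetical helper, it treats any non-null key other than $not as a positive condition, and it does not inspect the contents of $not.

```javascript
// Hypothetical validator (not part of the SDK) for the CombinedQuery / BaseQuery grammar.
function isValidQuery(q) {
    if (q === null || typeof q !== "object") return false;
    const combined = q.$and || q.$or;
    if (Array.isArray(combined)) {
        // CombinedQuery: needs at least one sub-query, and each one must itself be valid
        return combined.length > 0 && combined.every(isValidQuery);
    }
    // BaseQuery: needs at least one non-null key other than $not (simplifying assumption)
    return Object.keys(q).some((key) => key !== "$not" && q[key] != null);
}

const withPositive = { $or: [{ keyword: "deep learning" }, { lang: "eng" }] };
const onlyNot = { $not: { keyword: "football" } };

console.log(isValidQuery(withPositive)); // true
console.log(isValidQuery(onlyNot));      // false
```

The object checked here corresponds to the value of the $query key, not to the outer ComplexArticleQuery object.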
Using this language you can specify queries that cannot be expressed using the constructor parameters of QueryArticles or QueryArticlesIter. Here are some examples of queries and what they would return:
A query that would return the list of articles that mention AI or deep learning or machine learning:
{
    "$query": {
        "$or": [
            { "conceptUri": "http://en.wikipedia.org/wiki/Artificial_Intelligence" },
            {
                "keyword": {
                    "$or": [ "deep learning", "machine learning" ]
                }
            }
        ]
    }
}
A query that would return the list of politics related articles about Donald Trump or Hillary Clinton, or business related news that mention Elon Musk:
{
    "$query": {
        "$or": [
            {
                "conceptUri": {
                    "$or": [
                        "http://en.wikipedia.org/wiki/Donald_Trump",
                        "http://en.wikipedia.org/wiki/Hillary_Rodham_Clinton"
                    ]
                },
                "categoryUri": "dmoz/Society/Politics"
            },
            {
                "conceptUri": "http://en.wikipedia.org/wiki/Elon_Musk",
                "categoryUri": "dmoz/Business"
            }
        ]
    }
}
Depending on your preference, you can build the JSON for these complex queries yourself or you can use the associated classes such as ComplexArticleQuery(), CombinedQuery() and BaseQuery(). Below is an example where we search for articles that are either about Donald Trump or are in the Politics category, but that were not published in February 2017 and do not mention Barack Obama:
import {EventRegistry, QueryArticles, RequestArticlesInfo, ComplexArticleQuery, CombinedQuery, BaseQuery} from "eventregistry";
const er = new EventRegistry();
Promise.all([
    er.getConceptUri("Trump"),
    er.getConceptUri("Obama"),
    er.getCategoryUri("politics"),
]).then(([trumpUri, obamaUri, politicsUri]) => {
    const cq = new ComplexArticleQuery(
        CombinedQuery.OR(
            [
                // positive conditions: about Trump or in the politics category
                new BaseQuery({conceptUri: trumpUri}),
                new BaseQuery({categoryUri: politicsUri}),
            ],
            // second argument: results to exclude
            CombinedQuery.OR([
                new BaseQuery({dateStart: "2017-02-01", dateEnd: "2017-02-28"}),
                new BaseQuery({conceptUri: obamaUri}),
            ]),
        ),
    );
    const q = QueryArticles.initWithComplexQuery(cq);
    q.setRequestedResult(new RequestArticlesInfo());
    return er.execQuery(q);
});
If you've built the JSON query yourself, you can also use it like this:
const er = new EventRegistry();
const q = QueryArticles.initWithComplexQuery('{ "$query": { ... } }');
q.setRequestedResult(new RequestArticlesInfo());
er.execQuery(q);
In this case you need to make sure you're providing a valid query in the JSON.
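One way to reduce the risk of an invalid JSON string is to build the query as a plain JavaScript object and serialize it with JSON.stringify, rather than writing the string by hand. The sketch below reuses the AI / deep learning / machine learning query from the examples above; it does not call the SDK, and the final comment only indicates where the string would be used.

```javascript
// Build the query as a plain object first, so quoting and nesting are handled by the language.
const query = {
    $query: {
        $or: [
            { conceptUri: "http://en.wikipedia.org/wiki/Artificial_Intelligence" },
            { keyword: { $or: ["deep learning", "machine learning"] } },
        ],
    },
};

// JSON.stringify emits valid JSON: keys and strings are double-quoted.
const queryString = JSON.stringify(query);
console.log(queryString);

// The string (or the object itself) can then be passed on,
// e.g. QueryArticles.initWithComplexQuery(queryString).
```

Parsing the string back with JSON.parse is a cheap sanity check that it is syntactically valid, although it says nothing about whether the query follows the grammar.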
If you would like to simply iterate through the results that match the query, you can also use QueryArticlesIter.initWithComplexQuery() instead of QueryArticles.initWithComplexQuery().