Text analytics
The EventRegistry package also has an Analytics class that can be used to perform various text analytics. The class will be extended with additional functionality, but for now it allows you to:
- semantically annotate your documents with entities and non-entities mentioned in the document,
- categorize the document into a list of predefined categories based on the DMOZ.org taxonomy,
- compute the sentiment of the document,
- determine the language of the document.
To visually test different methods please visit our demo pages.
In order to semantically annotate a given document use code such as:
import {EventRegistry, Analytics} from "eventregistry";
const er = new EventRegistry();
const analytics = new Analytics(er);
analytics.annotate("Microsoft released a new version of Windows OS.").then((ann) => {
    console.info(ann);
});
Categorization is currently supported only for English. To categorize the document into a predefined set of categories and identify the top related keywords, use code such as:
import {EventRegistry, Analytics} from "eventregistry";
const er = new EventRegistry();
const analytics = new Analytics(er);
analytics.categorize("Microsoft released a new version of Windows OS.").then((cat) => {
    console.info(cat);
});
Here is sample code to detect the sentiment expressed in the document:
import {EventRegistry, Analytics} from "eventregistry";
const er = new EventRegistry();
const analytics = new Analytics(er);
analytics.sentiment("Microsoft released a new version of Windows OS.").then((sent) => {
    console.info(sent);
});
Here is sample code to detect the language of the document:
import {EventRegistry, Analytics} from "eventregistry";
const er = new EventRegistry();
const analytics = new Analytics(er);
analytics.detectLanguage("Microsoft released a new version of Windows OS.").then((lang) => {
    console.info(lang);
});
The categorization response has the following structure:
{
"dmoz": {
// top categories associated with the text
"categories": [
{
// category ID
"label": "dmoz/Computers/Companies/Microsoft_Corporation",
// relevance of the category to the document
"score": 0.456
},
...
],
// top keywords that summarize the document and their weights
"keywords": [
{
"keyword": "Computers",
"wgt": 0.160
}
...
]
}
}
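As a rough illustration of how the categorization response could be consumed, the sketch below keeps only the categories above a relevance threshold. It runs on a hard-coded sample shaped like the response above rather than on a live API call. The helper name topCategories, the 0.3 threshold, and the second sample category are assumptions made for this example and are not part of the library.

```javascript
// Sample object shaped like the categorization response documented above.
// The second category is hypothetical and only included to show filtering.
const sampleCategorization = {
    dmoz: {
        categories: [
            { label: "dmoz/Computers/Companies/Microsoft_Corporation", score: 0.456 },
            { label: "dmoz/Computers/Software", score: 0.12 },
        ],
        keywords: [{ keyword: "Computers", wgt: 0.160 }],
    },
};

// Hypothetical helper: return the labels of categories whose score is at
// least minScore, ordered from most to least relevant.
function topCategories(response, minScore = 0.3) {
    const categories = (response.dmoz && response.dmoz.categories) || [];
    return categories
        .filter((c) => c.score >= minScore)
        .sort((a, b) => b.score - a.score)
        .map((c) => c.label);
}

console.info(topCategories(sampleCategorization));
```

In real use, the object passed to topCategories would be the value the categorize promise resolves to.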
The language detection response has the following structure:
{
"reliable": true,
"textBytes": 32,
// the language candidates for the document
"languages": [
{
"name": "ENGLISH",
// ISO2 code of the language
"code": "en",
// probability of the document being in this language
"percent": 96,
"score": 1321
},
...
]
}
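A minimal sketch of picking a single language code out of the detection response follows. It works on a hard-coded sample shaped like the response above. The helper name bestLanguage is hypothetical, and treating a response with "reliable" set to false as undetermined is an assumption of this example, not documented library behavior.

```javascript
// Sample object shaped like the language detection response documented above.
const sampleDetection = {
    reliable: true,
    textBytes: 32,
    languages: [
        { name: "ENGLISH", code: "en", percent: 96, score: 1321 },
    ],
};

// Hypothetical helper: return the code of the candidate with the highest
// "percent" value, or null when the detection is not marked as reliable
// or no candidates were returned.
function bestLanguage(response) {
    const candidates = response.languages || [];
    if (!response.reliable || candidates.length === 0) {
        return null;
    }
    const best = candidates.reduce((a, b) => (b.percent > a.percent ? b : a));
    return best.code;
}

console.info(bestLanguage(sampleDetection));
```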
The annotation response has the following structure:
{
// the list of annotations
"annotations": [
{
// the URL that uniquely identifies the concept represented by the annotation
"url": "http://en.wikipedia.org/wiki/Microsoft",
// the label that can be used to represent the annotation (in the language of the document)
"title": "Microsoft",
// the input language
"lang": "en",
// secondary URL that uniquely identifies the concept as a concept on English wikipedia
"secUrl": "http://en.wikipedia.org/wiki/Microsoft",
// label that can represent the concept in English language
"secTitle": "Microsoft",
"secLang": "en",
// dbpedia URI of the concept
"dbPediaIri": "http://dbpedia.org/resource/Microsoft",
// dbpedia types for the concept
"dbPediaTypes": [
"Agent",
"Organisation",
"Company"
],
// general categorization of the concept (person, org or loc)
"type": "org",
// importance of the concept for the whole document
"wgt": 0.6666,
// mentions of the concept in the document
"support": [
{
// character positions in text
"chFrom": 0,
"chTo": 8,
// based on the word(s) mentioned in the text, how likely it is that this is the correct annotation
"pMentionGivenSurface": 0.253001126280801,
"pageRank": 0.03690052603740375,
// the word/phrase that is used to mention the concept in the text
"text": "Microsoft",
// word indices
"wFrom": 0,
"wTo": 0,
"wikiLang": "en"
}
],
"pageRank": 0.2520778231483313,
// wikidata id for the concept
"wikiDataItemId": "Q2283",
// wikidata class ids for the concept
"wikiDataClassIds": [
"Q891723",
"Q1058914",
"Q4830453",
"Q43229",
"Q874405",
"Q24229398",
"Q16334295",
"Q58778",
"Q35120",
"Q16334298",
"Q286583",
"Q17519152",
"Q517966",
"Q223557",
"Q16889133",
"Q18844919",
"Q488383",
"Q5127848"
],
// wikidata class ids and names
"wikiDataClasses": [
{
"enLabel": "public company",
"itemId": "Q891723"
},
{
"enLabel": "software house",
"itemId": "Q1058914"
},
...
]
},
...
],
// list of nouns identified in the document
"nouns": [
{
// starting and ending indices of the noun
"iFrom": 25,
"iTo": 31,
// normalized form of the text
"normForm": "version",
// list of Wordnet synset IDs for the word
"synsetIds": [
"101267901",
"105840650",
"105928513",
"106408779",
"106536389",
"107173585"
]
},
...
],
// list of adjectives found in the document
"adjectives": [
{
// position in the document
"iFrom": 21,
"iTo": 23,
// normalized form of the adjective
"normForm": "new",
// wordnet synset ids
"synsetIds": [
"300024996",
"300128733",
"300818008",
"300937186",
"301640850",
"301687167",
"301687965",
"302070491",
"302584699"
]
},
...
],
// list of verbs identified in the document
"verbs": [
{
// text positions
"iFrom": 10,
"iTo": 17,
// normalized form of the verb
"normForm": "release",
// wordnet synsets
"synsetIds": [
"200069295",
"200104868",
"200269682",
"200967625",
"201436518",
"201474550",
"201757994",
"202316304",
"202421374",
"202494047"
]
},
...
],
// list of adverbs
"adverbs": [
],
// there are other returned properties that don't have significant importance for the user
}
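The positions in the annotation response can be mapped back to the input text. In the sample above, the end offsets appear to be inclusive: "Microsoft" spans chFrom 0 to chTo 8, and "version" spans iFrom 25 to iTo 31. The sketch below relies on that reading, which is inferred from the sample values rather than stated by the documentation. The helper names span and listMentions are hypothetical, and the sample object is a trimmed copy of the response above.

```javascript
const text = "Microsoft released a new version of Windows OS.";

// Trimmed sample shaped like the annotation response documented above.
const sampleAnnotation = {
    annotations: [
        {
            title: "Microsoft",
            type: "org",
            wgt: 0.6666,
            support: [{ chFrom: 0, chTo: 8, text: "Microsoft" }],
        },
    ],
    nouns: [{ iFrom: 25, iTo: 31, normForm: "version" }],
    adjectives: [{ iFrom: 21, iTo: 23, normForm: "new" }],
    verbs: [{ iFrom: 10, iTo: 17, normForm: "release" }],
};

// Hypothetical helper: cut a substring using inclusive end offsets
// (assumption based on the sample values).
function span(source, from, to) {
    return source.slice(from, to + 1);
}

// Hypothetical helper: list every mention of every annotated concept,
// together with the surface text recovered from the character offsets.
function listMentions(source, response) {
    const mentions = [];
    for (const ann of response.annotations || []) {
        for (const s of ann.support || []) {
            mentions.push({
                title: ann.title,
                type: ann.type,
                surface: span(source, s.chFrom, s.chTo),
            });
        }
    }
    return mentions;
}

console.info(listMentions(text, sampleAnnotation));
for (const verb of sampleAnnotation.verbs) {
    // Surface form in the text next to its normalized form.
    console.info(span(text, verb.iFrom, verb.iTo), "->", verb.normForm);
}
```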