TV-Series-Classifier

Arabic TV Series Detection App trained on TV web pages.

Fig.1 - WEB DEMO

Abstract

This repository contains an experiment from a project on classifying TV series from any given web page.
The model currently recognizes web pages related to the TV series Ra7eem.

Feature Engineering

From my experience building models for topic modeling problems, I prefer TF-IDF feature vectorization to CountVectorizer (plain bag-of-words) because it gives more weight to the features that are relevant to the context of the web pages used to build the model, which in my experiments led to better performance.
Many features are not very important, as shown in the following captions. They can be treated like stopwords so that the remaining features describe the pages better.
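
The difference between the two vectorizers can be sketched on a toy corpus. This is a minimal illustration and not the repository's training code: the `pages` list and its tokens are made up, and scikit-learn's default settings are assumed. A word that appears on every page (`watch`) and a word that appears on one page (`ra7eem`) get the same raw count, but TF-IDF gives the rarer word the larger weight.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus standing in for text scraped from web pages.
pages = [
    "watch series episode online",
    "watch series ra7eem episode",
    "watch football match online",
]

count_vec = CountVectorizer()
tfidf_vec = TfidfVectorizer()

counts = count_vec.fit_transform(pages).toarray()
tfidf = tfidf_vec.fit_transform(pages).toarray()

row = 1  # the page that mentions "ra7eem"

# Raw counts: both words occur once on this page, so they look equally important.
count_watch = counts[row][count_vec.vocabulary_["watch"]]
count_ra7eem = counts[row][count_vec.vocabulary_["ra7eem"]]

# TF-IDF: "watch" occurs on every page, so it is down-weighted relative to "ra7eem".
tfidf_watch = tfidf[row][tfidf_vec.vocabulary_["watch"]]
tfidf_ra7eem = tfidf[row][tfidf_vec.vocabulary_["ra7eem"]]

print("counts:", count_watch, count_ra7eem)
print("tf-idf:", round(tfidf_watch, 3), round(tfidf_ra7eem, 3))
```

On real pages the vocabulary is much larger, but the same effect applies: terms shared by almost all pages contribute less to the feature vector than terms specific to the target series.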
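
One way to treat unimportant features like stopwords is to pass a custom list to the vectorizer. The sketch below assumes scikit-learn's `TfidfVectorizer`; the `boilerplate_words` list and the sample pages are hypothetical and are not the words actually removed in this experiment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical page texts; real input would be text scraped from web pages.
pages = [
    "watch ra7eem episode online download",
    "watch football match online",
    "download ra7eem finale episode",
]

# Hypothetical words that appear on most streaming pages and carry little signal.
boilerplate_words = ["watch", "online", "download"]

# stop_words accepts a list; listed tokens are dropped before weighting.
vectorizer = TfidfVectorizer(stop_words=boilerplate_words)
vectorizer.fit(pages)

# Remaining vocabulary, sorted for readability.
features = sorted(vectorizer.vocabulary_)
print(features)
```

An alternative under the same assumption is the `max_df` parameter, which drops terms whose document frequency exceeds a threshold without needing a hand-written list.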

Classifier

Fig.2 - Training performance

Fig.3 - Test performance

Examples you can test with that were not in the training set:
https://mzarita.tv/watch.php?vid=dfa569e72
https://moviz4u.tv/
https://www.elcinema.com/work/2048748
http://www.masrawy.com/ramadan/Tag/797735/%D8%B1%D8%AD%D9%8A%D9%85
http://www.masrawy.com/ramadan/drama-news/details/2018/6/16/1376870/%D8%A8%D8%A7%D9%84%D9%81%D9%8A%D8%AF%D9%8A%D9%88-%D9%83%D9%88%D8%A7%D9%84%D9%8A%D8%B3-%D9%82%D8%AA%D9%84-%D8%B1%D8%AD%D9%8A%D9%85-%D9%81%D9%8A-%D8%A7%D9%84%D8%AD%D9%84%D9%82%D8%A9-%D8%A7%D9%84%D8%A3%D8%AE%D9%8A%D8%B1%D8%A9-#keyword

Data

Data fetching

The data is obtained from any web page through a scraper library. The scraper I used in my experiment is BeautifulSoup4.
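
A minimal sketch of extracting the visible text from a page with BeautifulSoup4 is shown here. The `page_text` helper and the sample HTML are hypothetical, and the exact extraction rules used in the experiment are not documented, so dropping `script` and `style` tags is an assumption. Downloading the HTML (for example with an HTTP client) is left out so the example runs offline.

```python
from bs4 import BeautifulSoup


def page_text(html: str) -> str:
    """Return the visible text of an HTML document as one string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: script and style contents are noise for the classifier, so remove them.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text nodes with spaces and trim surrounding whitespace.
    return soup.get_text(separator=" ", strip=True)


# Made-up page standing in for a downloaded web page.
sample_html = """
<html><head><title>Ra7eem - Episode 1</title>
<style>body {color: red;}</style></head>
<body><h1>Ra7eem</h1><script>var x = 1;</script>
<p>Watch the first episode.</p></body></html>
"""

text = page_text(sample_html)
print(text)
```

The resulting string is the kind of input a text vectorizer expects for each page.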

Missing data

There are many approaches to handling missing data. Good practice recommends not ignoring rows with missing values, since dropping them affects the parameters learned during optimization. In my case the missing data was 1.12% of the whole dataset, so for the sake of simplicity I dropped those rows. In other cases, the missing entries could be handled differently, for example by filling them with random numbers that represent the features.
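
The drop-the-missing-rows choice can be sketched with pandas. The column names `text` and `label` and the toy rows are assumptions, and the missing share of this toy frame is unrelated to the 1.12% reported above; the dataset's real schema is not described in this README.

```python
import pandas as pd

# Hypothetical dataset: one row per scraped page, with a missing text entry.
df = pd.DataFrame({
    "text": ["ra7eem episode one", None, "football match tonight", "ra7eem finale"],
    "label": [1, 1, 0, 1],
})

# Share of rows whose text is missing; worth checking before deciding to drop.
missing_share = df["text"].isna().mean()

# Drop rows with missing text and renumber the index.
clean = df.dropna(subset=["text"]).reset_index(drop=True)

print(f"missing share: {missing_share:.2%}, rows kept: {len(clean)}")
```

Checking the missing share first makes the trade-off explicit: dropping is cheap when only a small fraction of rows is affected, while a larger share would call for one of the filling strategies instead.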

USAGE

Navigate to the live app to test.

ahmednabil950/TV-Series-Classifier
