All posts in 'data_science'

Introducing vnnews package

I’ve recently had to crawl the web to collect a lot of data for studying machine learning. Writing the same code again and again to perform the same task made me decide to write a separate Python package fo...     Read more »

Movie Review - Sentiment Analysis using Bag of Words

This post contains my practice code, which follows the Kaggle tutorial on the Bag of Words model for Natural Language Processing. Please refer to the following link to read the notebook. Click here to view the noteb...     Read more »

World Population Analysis

In this post, we will crawl world population data from Wikipedia and do some analysis. The data, taken from this page, gives the population details for every country on Earth every 5 years, from 1955 u...     Read more »

Vietnam IT Jobs Analysis

Introduction I’ve recently been looking for a Data Analysis job, without success. I actually got several offers for Python Developer positions, but the work at these companies was quite boring, so I left. T...     Read more »

Scraping Vietnamworks job

Introduction This document shows some basic web scraping steps to crawl Vietnamworks and get the 50 newest jobs. Below are some of the libraries used in this doc: BeautifulSoup 4: worker for all scraping activi...     Read more »

Using KMeans to cluster 1D data

Problem: Given the following set: { 2, 4, 10, 12, 3, 20, 30, 11, 25 }. Write pseudocode for the k-means clustering algorithm to partition the above set into 2 clusters. Then implement code that does the same. ...     Read more »
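
The excerpt above stops before any solution, so here is a minimal sketch of how the problem could be approached in plain Python. It is not the post's own implementation: the function name `kmeans_1d` is hypothetical, and picking evenly spaced values from the sorted data as starting centroids is an assumption made to keep the result deterministic.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Cluster 1D numbers into k groups with plain k-means (Lloyd's algorithm)."""
    points = sorted(values)
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Assumption: deterministic initialisation with k evenly spaced values
    # from the sorted data (for k=2 this is simply the min and the max).
    step = (len(points) - 1) / max(k - 1, 1)
    centroids = [float(points[round(i * step)]) for i in range(k)]

    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, clusters


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centroids, clusters = kmeans_1d(data, k=2)
print(centroids)  # [7.0, 25.0]
print(clusters)   # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
```

With this starting point the loop converges after two passes. A different initialisation, such as random starting centroids, could in general settle on a different split, which is why library implementations usually run several restarts.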

(VN) Implicit and Explicit rating

Theory Implicit and Explicit rating are two terms used in data mining. While using the internet (browsing the web, shopping online, watching movies, listening to music, …), users perform a great many...     Read more »