What are the data mining techniques used in search engines like Google?
What is Data Mining and Why is it Important?
Looking for gold in a mine of information? Data mining can help.
Just as mining for gold means digging through earth and rock to reach the valuable bits, data mining means sorting through large datasets to find valuable information. The process uses software algorithms and statistical methods to identify patterns in data, which help answer business questions and predict future trends and behavior.
Data mining techniques are used in business and research, including:
- marketing
- risk management
- fraud detection
- cybersecurity
- medical diagnosis
- mathematics
- research disciplines
- cybernetics
- genetics.
Data mining is a means to drive increased efficiency in business operations, but it can also set a business apart from the competition in combination with predictive analytics, machine learning, and other aspects of advanced analytics.
Data mining is sometimes used interchangeably with data analytics, but it's really a component of the overall data science and analytics process. Data Mining focuses on finding relevant information and data sets, which can then be used for analytics and predictive modeling.
There are five primary steps to data mining:
- Identification of the business issues to analyze and of the data sources, such as databases or operational systems.
- Data collection and exploration, including the sampling and profiling of data sets.
- Data preparation and transformation to filter, cleanse, and structure data for analysis.
- Modeling, in which data scientists and other users create, test, and evaluate data mining models.
- Deployment of the models for analytics use cases.
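The steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the transaction amounts are made up, the "model" is a simple median-based rule rather than a real data mining model, and the function names (`prepare`, `fit`, `is_unusual`) are hypothetical.

```python
import statistics

# Step 2: data collection. A toy, made-up sample of transaction amounts
# standing in for rows pulled from a database or operational system.
raw_amounts = ["12.50", "9.99", "11.25", " 10.75", "n/a", "250.00", "13.10", "12.00"]


def prepare(values):
    """Step 3: cleanse the raw strings and keep only valid numbers."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value.strip()))
        except ValueError:
            continue  # drop entries that cannot be parsed
    return cleaned


def fit(amounts):
    """Step 4: 'model' the data as a median and a typical deviation from it."""
    center = statistics.median(amounts)
    spread = statistics.median(abs(a - center) for a in amounts)
    return center, spread


def is_unusual(amount, model, threshold=5.0):
    """Step 5: apply the model to flag amounts far from the median."""
    center, spread = model
    return abs(amount - center) > threshold * spread
```

Calling `fit(prepare(raw_amounts))` once and then `is_unusual(...)` on each new amount mirrors the split between modeling and deployment.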
Web Scraping - Data Mining
Web scraping is one of the most common methods of collecting data. Although most people consider it a last resort, it is still one of the most commonly used methods of data mining.
I will be using the lxml library (its documentation has more information), and I'll be importing pandas, NumPy, and Matplotlib. Finally, I'm configuring pandas to display a maximum of 50 columns.
The data we'll be working with is about the Nobel Prize. I'll be getting it from Wikipedia, from a page that contains the table we want to scrape and get information from.
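As a rough sketch of what extracting such a table involves, the example below parses a small HTML table into records. It makes two assumptions: it uses the standard library's `html.parser` in place of lxml so that it runs without extra packages, and `SAMPLE_HTML` is a hand-written stand-in, not the actual Wikipedia markup. The names `TableParser` and `parse_table` are hypothetical.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a page; the real Wikipedia markup is more complex.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Laureate</th><th>Category</th></tr>
  <tr><td>1903</td><td>Marie Curie</td><td>Physics</td></tr>
  <tr><td>1921</td><td>Albert Einstein</td><td>Physics</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return one dict per body row, keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = parse_table(SAMPLE_HTML)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(records)` for further analysis.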
The basic rules of web scraping:
- Check if there is an API and use it; it will make your life easier.
- Don't make a lot of requests to a single server in a very short time. It will slow down that server, and it might get you banned from the website.
- Never scrape anything that is not public. If you reach a document that's not supposed to be public, you're not supposed to get any data from it.
- Lastly, check robots.txt. It usually guides you on how to use the website in an automated fashion.
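The robots.txt and rate-limiting rules can be checked in code. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content is made up for illustration, and `make_checker`, `polite_fetch_plan`, and `crawl` are hypothetical helper names.

```python
import time
from urllib.robotparser import RobotFileParser

# Example robots.txt content (made up for illustration). For a live site you
# would call parser.set_url(".../robots.txt") followed by parser.read().
EXAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
""".splitlines()


def make_checker(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def polite_fetch_plan(parser, urls, user_agent="*", default_delay=1.0):
    """Return (url, delay_seconds) pairs for the URLs robots.txt allows."""
    delay = parser.crawl_delay(user_agent) or default_delay
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]


def crawl(plan, fetch):
    """Call the caller-supplied fetch(url) for each URL, pausing in between."""
    for url, delay in plan:
        fetch(url)
        time.sleep(delay)
```

Here `fetch` is whatever download function you use; the pause between calls keeps the request rate low enough not to burden the server.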
YouTube Data API - Data Mining
Before you start with any Google API, you have to register with the Google API console and get an API key. To get there, search for "Google API console"; it's the first link in the results.
Google API console: https://console.cloud.google.com/
If you have a project, the console opens on your active project. If you don't, you'll get a screen where you can create a new project. I will go to my active project.
- First, you can monitor your API usage here, shown in requests per second, so I can see my usage over the last few days.
- Second, go to APIs and enable YouTube Data API v3. If it's not among your enabled APIs, scroll down and click Enable on it.
- Last, go to Credentials, create new public API access, choose the server key, and click Create. You will get an API key; copy it.
This is the API key that we will use next. I will be using two functions from the YouTube Data API. The first is youtube.search.list, which searches for videos using a keyword. The second is youtube.videos.list, which retrieves statistics such as likes, dislikes, favorites, and view count for one or more videos. Make sure you put your API key in place before you proceed.
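A minimal sketch of the two calls, assuming plain HTTPS requests against the YouTube Data API v3 REST endpoints rather than any particular client library. `API_KEY` is a placeholder, and the helper names `search_url`, `videos_url`, and `fetch_json` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://www.googleapis.com/youtube/v3"
API_KEY = "YOUR_API_KEY"  # placeholder: paste the key from the console


def search_url(keyword, max_results=5, api_key=API_KEY):
    """Build a search.list request for videos matching a keyword."""
    params = {"part": "snippet", "q": keyword, "type": "video",
              "maxResults": max_results, "key": api_key}
    return f"{API_BASE}/search?{urlencode(params)}"


def videos_url(video_ids, api_key=API_KEY):
    """Build a videos.list request for the statistics of one or more videos."""
    params = {"part": "statistics", "id": ",".join(video_ids), "key": api_key}
    return f"{API_BASE}/videos?{urlencode(params)}"


def fetch_json(url):
    """Send the request and decode the JSON response (needs a valid key)."""
    with urlopen(url) as response:
        return json.load(response)
```

A typical flow would call `fetch_json(search_url("data mining"))`, collect each item's `id.videoId`, and pass those IDs to `videos_url` to read the `statistics` field of each video.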
Twitter API - Data Mining
We'll be using the Twitter API with the Tweepy library to perform basic data mining operations on Twitter.
We use Tweepy to search for tweets and process them. We also cover the Cursor object for iteration, which helps in retrieving a large amount of data from Twitter.
More information about the Twitter API:
https://developer.twitter.com/en/products/twitter-api
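A hedged sketch of searching and processing tweets with Tweepy's Cursor. It assumes Tweepy 4.x, where the search method is `API.search_tweets` (in Tweepy 3.x it was `API.search`), and an already authenticated `tweepy.API` object; whether your account may call the search endpoint depends on your Twitter API access level. The helper names `search_tweets` and `top_hashtags` are hypothetical.

```python
from collections import Counter


def search_tweets(api, query, limit=100):
    """Yield the text of up to `limit` tweets matching `query`.

    `api` is assumed to be an authenticated tweepy.API instance.
    """
    import tweepy  # imported here so the helper below works without it

    # Cursor follows the result pages so we do not page by hand.
    for status in tweepy.Cursor(api.search_tweets, q=query).items(limit):
        yield status.text


def top_hashtags(texts, n=3):
    """Process tweet texts: count the most frequent hashtags."""
    tags = Counter(
        word.lower().rstrip(".,!?:;")
        for text in texts
        for word in text.split()
        if word.startswith("#") and len(word) > 1
    )
    return tags.most_common(n)
```

With credentials in place, `top_hashtags(search_tweets(api, "data mining"))` would combine the two steps.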
Google Search - Data Mining
We will be covering the basics of using the Google Custom Search API to search the Internet. Three important links when you're dealing with this API are the Google Custom Search website, the Google Developer Console, and the Google Search API documentation.
Important links:
Google Custom Search: https://www.google.com/cse/
Google Developer Console: https://console.developers.google.com/
Custom Search Documentation: https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
To search the Internet using the Google Custom Search API, start by setting up a Google Custom Search to search the entire web. Then set up the Custom Search API to access the search service from within your code.
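Once both pieces are set up, a request from code might look like the sketch below. It assumes direct HTTPS calls to the Custom Search JSON API endpoint, with an API key and a search engine ID (`cx`) that you supply. The function names `custom_search_url` and `search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def custom_search_url(query, api_key, engine_id, start=1, num=10):
    """Build a cse.list request; `num` results starting at index `start`."""
    params = {"key": api_key, "cx": engine_id, "q": query,
              "start": start, "num": num}
    return f"{ENDPOINT}?{urlencode(params)}"


def search(query, api_key, engine_id, **kwargs):
    """Run the search and return (title, link) pairs (needs valid credentials)."""
    url = custom_search_url(query, api_key, engine_id, **kwargs)
    with urlopen(url) as response:
        payload = json.load(response)
    return [(item["title"], item["link"]) for item in payload.get("items", [])]
```

Increasing `start` in steps of `num` pages through further results.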
MongoDB - Data Mining
In many cases you don't have to go and collect data, because you already have data in a local NoSQL database that requires analysis. This could be a URL audit trail from your website, where you want to study visitors and trends.
For example, one common NoSQL database is MongoDB, which is the subject of this tutorial. We'll be talking about why we use NoSQL, common NoSQL database systems, PyMongo (a Python driver for MongoDB), single documents with nested documents, and working with multiple documents.
Sometimes you don't have to look far for data to work with. If you have a NoSQL DB in your organization that holds huge amounts of data, it might be worth looking there for interesting observations.
It is usually log data for a URL audit trail or social media data collected over some time. In this tutorial, you will learn the basics of working with MongoDB using the PyMongo library.
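As a small sketch of nested documents and multi-document operations with PyMongo, consider the example below. It assumes a MongoDB server on the default local port; the database, collection, and field names (`analytics`, `visits`, `visitor.country`) are hypothetical, as are the helper function names.

```python
# Aggregation pipeline: count visits per country, most frequent first.
# "$visitor.country" uses dot notation to reach into the nested document.
COUNTRY_PIPELINE = [
    {"$group": {"_id": "$visitor.country", "visits": {"$sum": 1}}},
    {"$sort": {"visits": -1}},
]


def connect(uri="mongodb://localhost:27017", db_name="analytics", coll_name="visits"):
    """Return a collection handle (requires pymongo and a running server)."""
    from pymongo import MongoClient
    return MongoClient(uri)[db_name][coll_name]


def make_visit(url, country, browser):
    """Build a single audit-trail document with a nested visitor document."""
    return {"url": url, "visitor": {"country": country, "browser": browser}}


def store(collection, docs):
    """Insert multiple documents in one call."""
    return collection.insert_many(docs)


def visits_per_country(collection):
    """Run the aggregation over all stored documents."""
    return list(collection.aggregate(COUNTRY_PIPELINE))
```

The same dot notation works in queries, for example `collection.find({"visitor.country": "DE"})`.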
Twitter Streaming API - Data Mining
This part gives an outline of data mining with Twitter's Streaming API using Tweepy.
Twitter provides two kinds of API to access its data:
- RESTful API: Used to get data about existing data objects like statuses, users, and so forth
- Streaming API: Used to get live statuses as they are sent
Reasons why you might want to use the Streaming API:
- Capture large amounts of data, because the RESTful API has limited access to older data
- Real-time analysis, like monitoring social discussion about a live event
- In-house archives, like archiving social discussion about your brand(s)
- AI response system for a Twitter account, like an automated reply and archiving questions or providing answers
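A rough sketch of the real-time analysis use case follows. It assumes a recent Tweepy 4.x release, which provides `tweepy.StreamingClient` and `tweepy.StreamRule`, plus a bearer token whose access level permits streaming. The `MentionCounter` class and `run_stream` function are hypothetical names introduced for illustration.

```python
from collections import Counter


class MentionCounter:
    """Tally how often each tracked keyword appears in incoming tweet texts."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]
        self.counts = Counter()

    def handle(self, text):
        lowered = text.lower()
        for keyword in self.keywords:
            if keyword in lowered:
                self.counts[keyword] += 1


def run_stream(bearer_token, keywords, counter):
    """Feed live tweets matching any keyword into `counter` (blocks)."""
    import tweepy  # imported here so MentionCounter works without it

    class Listener(tweepy.StreamingClient):
        def on_tweet(self, tweet):
            counter.handle(tweet.text)

    stream = Listener(bearer_token)
    stream.add_rules(tweepy.StreamRule(" OR ".join(keywords)))
    stream.filter()  # runs until the connection is closed
```

Keeping the counting logic separate from the stream listener makes it possible to try the processing step on saved tweets before connecting to the live stream.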