Data Science: Techniques for Data Reduction in Data Pre-processing

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold, RFE, SelectFromModel, SelectKBest, f_classif, chi2, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.datasets import load_iris
  • In univariate feature selection, the best features are chosen using univariate statistical tests.
  • Each feature is tested against the target variable to check whether there is a statistically significant relationship between them.
  • The other features are ignored while analyzing the relationship between one feature and the target variable; that is why it is called ‘univariate’.
  • Each feature gets its own test score.
  • Finally, all of the test scores are compared, and the features with the highest scores are selected.
  • These objects accept a scoring function that returns univariate scores and p-values (or only scores for SelectKBest and SelectPercentile):
  1. Classification: f_classif, chi2, mutual_info_classif
  2. Regression: f_regression, mutual_info_regression
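As a minimal sketch of the workflow described above, SelectKBest can score every feature of the iris dataset (loaded via load_iris, as imported earlier) with the f_classif ANOVA F-test and keep only the top k features. The choice of k=2 here is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Load the iris data: 150 samples, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Score each feature independently against the target with the
# ANOVA F-test, then keep the 2 highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(X.shape)          # (150, 4)
print(X_reduced.shape)  # (150, 2)
print(selector.scores_) # one F-score per original feature
```

Note that each score is computed from one feature and the target alone, which is exactly what makes the method univariate: a feature that is only informative in combination with others may still be discarded.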
Before using Feature Selection
