Starter steps for regression (it's not only code)

1- Clean the data: drop the nulls (dropna) and the columns that are not important. You will learn this in more detail in the next video.
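A minimal sketch of step 1 on a small made-up table. Only late_aircraft_ct is a column name from these notes; carrier_name, arr_flights and all the values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; only 'late_aircraft_ct' is a name from these notes.
data = pd.DataFrame({
    'carrier_name': ['A', 'B', 'C', 'D'],          # hypothetical unimportant column
    'arr_flights': [100.0, np.nan, 250.0, 80.0],   # hypothetical feature with a null
    'late_aircraft_ct': [12.0, 7.0, np.nan, 3.0],
})

print(data.isnull().sum())                  # count the nulls in each column

data = data.drop(['carrier_name'], axis=1)  # drop a column that is not needed
data = data.dropna()                        # drop every row that still has a null

print(data.shape)                           # rows and columns left after cleaning
```

Dropping rows is the simplest choice, not the only one; filling the nulls is another option when too many rows would be lost.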

2- Prepare X (the features) and y (the output).

X = data.drop(['late_aircraft_ct'], axis=1, inplace=False)  # every column except the output column

y = data['late_aircraft_ct']  # the selected column is the output I need (overwrite with your own output column)
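A small sketch of what the X and y preparation produces, on made-up data. The feature columns arr_flights and carrier_ct are hypothetical names used only for this example.

```python
import pandas as pd

# Made-up data; 'arr_flights' and 'carrier_ct' are hypothetical feature columns.
data = pd.DataFrame({
    'arr_flights': [100, 250, 80],
    'carrier_ct': [10.0, 31.5, 6.2],
    'late_aircraft_ct': [12.0, 30.0, 3.0],
})

# inplace=False returns a new table and leaves 'data' itself untouched.
X = data.drop(['late_aircraft_ct'], axis=1, inplace=False)
y = data['late_aircraft_ct']

print(X.columns.tolist())     # the features only
print(data.columns.tolist())  # still contains the output column
print(X.shape, y.shape)
```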

3- Split the data into a training part and a test part.

from sklearn.model_selection import train_test_split  # splits the data into a part for training and a part for testing after training

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle=True)

# The data is now split. shape = (number of rows, number of columns) of each part; y_train and y_test have rows only.

print('X_train shape is ' , X_train.shape)

print('X_test shape is ' , X_test.shape)

print('y_train shape is ' , y_train.shape)

print('y_test shape is ' , y_test.shape)

-X_train = the data used to train the model; X_test = the data used to test the accuracy of the model after training.

-y_train = the output values the model is trained on, matching X_train; y_test = the output values used to check the accuracy of the model's predictions after training.

-test_size=0.25 = the share of the data held out for testing the model after training: 25% of the total data.

-shuffle=True = mixes the rows before splitting, so that training and testing each get a sample from all of the data.

-random_state=44 = fixes the randomization of the shuffle, so the split is the same every time the code runs.
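The parameters above can be checked on a small made-up array. This is only a sketch; the 100 rows and 3 columns are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 3 feature columns.
X = np.arange(300).reshape(100, 3)
y = np.arange(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=44, shuffle=True)

print(X_train.shape, X_test.shape)  # 75 rows for training, 25 rows for testing
print(y_train.shape, y_test.shape)

# Same random_state: a second call gives exactly the same split.
X_train2, X_test2, _, _ = train_test_split(
    X, y, test_size=0.25, random_state=44, shuffle=True)
print(np.array_equal(X_test, X_test2))
```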

4- Choose a model. You should know that there are other models too.

from sklearn.linear_model import LinearRegression  # LinearRegression is one model; there are many others.

from sklearn.preprocessing import StandardScaler  # puts all the features on the same scale (mean 0, standard deviation 1).

from sklearn.pipeline import make_pipeline  # chains a sequence of operations into one object, so there is no need to write a separate step for each one.

LinearRegressionModel = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True, copy_X=True))

print(LinearRegressionModel)
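A sketch of what the scaler and the pipeline do, on made-up data with two features on very different scales. The data and the names X_scaled, pipe and manual are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data: one feature around 1000, the other around 0.5.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(1000, 200, 50), rng.normal(0.5, 0.1, 50)])
y = 3.0 * X[:, 0] + 40.0 * X[:, 1] + rng.normal(0, 1, 50)

# StandardScaler: each column ends up with mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))

# The pipeline does the scaling and the regression in one object ...
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# ... and gives the same predictions as doing the two steps by hand.
manual = LinearRegression().fit(X_scaled, y)
print(np.allclose(pipe.predict(X), manual.predict(X_scaled)))
```

The benefit of the pipeline shows up at prediction time: new data is scaled with the values learned from the training data, without an extra line of code.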

5- Train the model.

LinearRegressionModel.fit(X_train, y_train)  # the model learns the relationship between X_train and y_train.
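To see what "learning the relationship" means, here is a sketch on made-up data where the true relationship is known. The data and the names model and linreg are assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data with a known relationship: y = 2*x1 - 3*x2 + 5, no noise.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 2))
y_train = 2.0 * X_train[:, 0] - 3.0 * X_train[:, 1] + 5.0

model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# make_pipeline names each step after its class, in lowercase.
linreg = model.named_steps['linearregression']
print(linreg.coef_)       # one weight per (scaled) feature
print(linreg.intercept_)  # the constant term

# y has no noise here, so the fitted model reproduces the training outputs.
print(np.allclose(model.predict(X_train), y_train))
```

The weights belong to the scaled features, so they are not equal to 2 and -3 even though the predictions match.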

6- Predict the output for the test data.

y_pred = LinearRegressionModel.predict(X_test)

print(y_pred)

# Shows the predicted values (y-hat) for X_test, the 25% of the data that was not used in training.

7- Measure the error of the predictions.

from sklearn.metrics import mean_squared_error  # one tool for calculating the cost function; there are other metrics worth searching for.

MSEValue = mean_squared_error(y_test, y_pred, multioutput='uniform_average')  # multioutput can also be 'raw_values'

print('Mean Squared Error Value is : ', MSEValue)
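The mean squared error is the average of the squared differences between the true outputs and the predictions. A sketch with made-up numbers (the values and the names mse_by_hand and mse_sklearn are assumptions for this example):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true values and predictions.
y_test = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE by hand: the average of the squared differences.
mse_by_hand = np.mean((y_test - y_pred) ** 2)

# The same value from scikit-learn.
mse_sklearn = mean_squared_error(y_test, y_pred)

print(mse_by_hand, mse_sklearn)
```

Taking the square root of the MSE gives an error in the same units as y, which is often easier to read.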

Converting string columns to numbers (LabelEncoder gives integer codes, not floats):

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()

for col in ['Brand', 'Model', 'Fuel_Type', 'Transmission']:
    data[col] = encoder.fit_transform(data[col])  # each column is refitted, so the encoder only remembers the last one

print(data.head())  # you will see that the text values have turned into numbers
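A sketch of the same idea that keeps one encoder per column, so the mapping from numbers back to text is not lost. The rows are made up and the encoders dictionary is an assumption for this example. Note that the scikit-learn documentation describes LabelEncoder as meant for the output y; OrdinalEncoder is the one intended for feature columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows for illustration; only the column names come from the notes.
data = pd.DataFrame({
    'Brand': ['Toyota', 'BMW', 'Toyota'],
    'Fuel_Type': ['Petrol', 'Diesel', 'Petrol'],
})

encoders = {}  # one fitted encoder per column, so each mapping can be looked up later
for col in ['Brand', 'Fuel_Type']:
    encoders[col] = LabelEncoder()
    data[col] = encoders[col].fit_transform(data[col])

print(data)
print(encoders['Brand'].classes_)                           # position in this array = the code
print(encoders['Brand'].inverse_transform(data['Brand']))   # back to the original text
```

The codes follow the sorted order of the text values, so they carry no real meaning of "bigger" or "smaller".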