Top 5 unknown ML libraries to help you through your day-to-day tasks



Increase your coding speed with these 5 must-have libraries

Artyom Kulakov

I have always noticed that top experts in any field are able to create awesome things much faster than most people, not because they are smarter, but because they can iterate on their ideas much more quickly. One essential ingredient of fast iteration is having code snippets and libraries on hand that make building complex models easier. In this post, I would like to share my must-have collection of Machine Learning libraries, which you might not have heard about yet.

The number one guest on my list today is the awesome PyTorch Forecasting Python library. With the help of this tool, I am able to test even more approaches when creating a model for Time Series forecasting. The library contains multiple models, such as NBeats and TemporalFusionTransformer, which, as the authors claim, outperform Amazon's DeepAR algorithm. To train and predict with those models, you first have to prepare your dataset; then you can jump into the fit/predict phase with your new model.

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor

from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer
from pytorch_forecasting.metrics import QuantileLoss

# load data
data = ...

# define dataset
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = "YYYY-MM-DD"  # day for cutoff

training = TimeSeriesDataSet(
    data[lambda x: x.date < training_cutoff],
    time_idx= ...,
    target= ...,
    # weight="weight",
    group_ids=[ ... ],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
    static_categoricals=[ ... ],
    static_reals=[ ... ],
    time_varying_known_categoricals=[ ... ],
    time_varying_known_reals=[ ... ],
    time_varying_unknown_categoricals=[ ... ],
    time_varying_unknown_reals=[ ... ],
)

# create the validation dataset and convert both datasets to dataloaders
validation = TimeSeriesDataSet.from_dataset(
    training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True
)
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=2)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=2)

# define trainer with early stopping
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=1, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
trainer = pl.Trainer(
    max_epochs=100,
    gpus=0,
    gradient_clip_val=0.1,
    limit_train_batches=30,
    callbacks=[lr_logger, early_stop_callback],
)

# create the model
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=32,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=16,
    output_size=7,  # QuantileLoss predicts 7 quantiles by default
    loss=QuantileLoss(),
    log_interval=2,
    reduce_on_plateau_patience=4,
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# find optimal learning rate (set limit_train_batches to 1.0 and log_interval = -1)
res = trainer.tuner.lr_find(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
    early_stop_threshold=1000.0,
    max_lr=0.3,
)

print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

# fit the model
trainer.fit(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
)

Moreover, you can easily build more complex things on top of PyTorch Forecasting. As always, PyTorch is awesome and simple to extend.
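To round the example off, here is a minimal sketch of generating forecasts after training, following the workflow from the library's tutorials (it assumes checkpointing was left enabled, which is the PyTorch Lightning default):

# load the best model found during training
best_model_path = trainer.checkpoint_callback.best_model_path
best_tft = TemporalFusionTransformer.load_from_checkpoint(best_model_path)

# forecast on the validation data
predictions = best_tft.predict(val_dataloader)

From there, the library also provides built-in plotting and interpretation helpers, which is a big part of its appeal.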

Since the paper TabNet: Attentive Interpretable Tabular Learning came out, the TabNet model has become a superstar for many tabular data tasks. For example, TabNet outperformed tree-based models in the recent Kaggle challenge Mechanisms of Action (MoA) Prediction. TabNet will certainly not win on every kind of tabular data task, but I would definitely give it a try, especially because it has a really simple API.

from pytorch_tabnet.tab_model import TabNetClassifier, TabNetRegressor

clf = TabNetClassifier()  # or TabNetRegressor() for regression tasks
clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)]
)
preds = clf.predict(X_test)

As you can see, the interface hides all the PyTorch boilerplate behind familiar scikit-learn-style fit/predict calls.
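A couple of extras worth knowing about: the classifier also outputs class probabilities, and after fitting it exposes the global feature importances derived from its attention masks. A short sketch, using the interface as documented in pytorch-tabnet:

# scikit-learn-style class probabilities
probs = clf.predict_proba(X_test)

# global feature importances learned by the attentive masks
print(clf.feature_importances_)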

Google's MediaPipe library can save you a significant amount of time when you create Computer Vision models like Face Detection, Pose Estimation, Hair Segmentation, and more! The MediaPipe API is really simple. Here is the full code to run the Pose Estimation model on your desktop via a Python script.

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_pose = mp.solutions.pose

# For webcam input:
pose = mp_pose.Pose(
    min_detection_confidence=0.5, min_tracking_confidence=0.5)
cap = cv2.VideoCapture(0)
while cap.isOpened():
    success, image = cap.read()
    if not success:
        print("Ignoring empty camera frame.")
        continue

    # flip the frame horizontally and convert BGR -> RGB for MediaPipe
    image = cv2.cvtColor(cv2.flip(image, 1), cv2.COLOR_BGR2RGB)
    # mark the image as read-only so MediaPipe can avoid a copy
    image.flags.writeable = False
    results = pose.process(image)
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    mp_drawing.draw_landmarks(
        image, results.pose_landmarks, mp_pose.POSE_CONNECTIONS)
    cv2.imshow('MediaPipe Pose', image)
    if cv2.waitKey(5) & 0xFF == 27:  # press Esc to quit
        break
pose.close()
cap.release()

Not only can you create and deploy these models enormously fast, but they also work much better than their analogs. For example, the MediaPipe hand landmark model runs much faster on an iPhone than the comparable model the platform provides by default. If you want to try these models in Python, only the Hand Landmarks, Pose Landmarks, Face Mesh, and Holistic models are currently available, but as the developers say, more Python models are coming really soon.
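Since the desktop example above uses the webcam, here is a hedged sketch of the same Pose solution applied to a single image file instead (the file name person.jpg is just a placeholder):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# static_image_mode=True treats every input as an unrelated still image
with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    image = cv2.imread("person.jpg")  # placeholder path
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # landmarks come back with normalized x/y coordinates
        nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
        print(f"Nose at ({nose.x:.2f}, {nose.y:.2f})")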

Have you ever spent hours training a KNN model, then lost your patience and just stopped the process? Well, I have… The good news is that you will not experience that again thanks to the awesome RAPIDS suite. It lets you run a lot of standard ML algorithms on a GPU, or on a cluster of GPUs. Not only that, it also lets you run Pandas-style DataFrame calculations on the GPU, which might be handy from time to time (a short sketch of that follows the KNN example below). The API for model fitting is yet again really simple: just a few lines of code and you are done!


from cuml.neighbors import NearestNeighbors

KNN = 5  # number of neighbors to retrieve; pick whatever fits your task

model = NearestNeighbors(n_neighbors=KNN)
model.fit(X_train)
distances, indices = model.kneighbors(X_test)

You can see the full KNN fitting tutorial here.
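And to back up the claim about DataFrames on the GPU: here is a minimal sketch using cuDF, the RAPIDS DataFrame library (the toy data is made up for illustration):

import cudf

# cuDF mirrors the Pandas API but runs on the GPU
gdf = cudf.DataFrame({"group": [1, 2, 1, 2], "value": [10.0, 20.0, 30.0, 40.0]})
print(gdf.groupby("group").value.mean())  # exactly the call you would write in Pandas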

Last but definitely not least on my list today is Dask. I used to struggle with the PySpark API a lot and never really managed to learn it fully. Dask has similar functionality, but it is much closer to the default NumPy/Pandas API; in fact, it even has methods with the same names, which I find very useful.

df[['x', 'y']].rolling(window='24h').mean().head()

Would you ever guess that it is Dask, not Pandas, performing this operation? The syntax is exactly the same!
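For context, here is a small self-contained sketch (the column names and data are invented) showing how that rolling call fits into a Dask workflow:

import dask.dataframe as dd
import pandas as pd

# build a Pandas frame with a datetime index, then partition it with Dask
pdf = pd.DataFrame(
    {"x": range(100), "y": range(100)},
    index=pd.date_range("2021-01-01", periods=100, freq="H"),
)
df = dd.from_pandas(pdf, npartitions=4)

# identical syntax to Pandas; .head() triggers the lazy computation
print(df[['x', 'y']].rolling(window='24h').mean().head())

Dask builds its task graph lazily and only executes it when you ask for concrete results, which is what lets it scale past memory while keeping the familiar API.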

Final words

Thank you for taking the time to read this material! I hope it broadened your horizons a bit, and maybe it will even help with your future projects. I am planning to write a few more helpful posts like this one, so if you liked it, you can follow my profile so as not to miss them.

If you are interested in who I am and in my projects, you can visit my personal website to learn more: artkulakov.com
