Balancing RNN sequence data – Deep Learning w/ Python, TensorFlow and Keras p.10




Welcome to the next part of our Deep Learning with Python, TensorFlow, and Keras tutorial series. In this tutorial, we’re going to continue building our cryptocurrency-price-predicting Recurrent Neural Network.

Text tutorials and sample code: https://pythonprogramming.net/balancing-rnn-data-deep-learning-python-tensorflow-keras/

Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex


Comment List

  • sentdex
    December 3, 2020

    I want to do anomaly detection as a classification task using an RNN in tf.keras, but I have a problem: the accuracy value increases while the val_accuracy value does not change and stays constant at 50%. My complete code is available on Google Colab: https://colab.research.google.com/drive/1saoNuCxj08JCxZ_7taIjhp8sJEIV-T5U?usp=sharing

  • sentdex
    December 3, 2020

    Wouldn't doing that affect the temporal sequence of the data? and hence the accuracy of the model in predicting future sequences of the data?

  • sentdex
    December 3, 2020

    Still don't understand why we shuffle the data…

  • sentdex
    December 3, 2020

    I'm not so sure about balancing buys and sells. For example, if you look at BTC on a major timeframe the right decision 90% of the time would be to buy. So it wouldn't be correct to train it to buy/sell 50% of the time.

  • sentdex
    December 3, 2020

    Why is target coming only from "LTC_USD"?

  • sentdex
    December 3, 2020

    If anyone got "ValueError: Input contains infinity or a value too large for dtype('float64')" while using BTC instead of LTC, I found there's an issue with some infinite values when scaling.
    To fix it, add the following after dropping NAs and before scaling:
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    df = df[indices_to_keep].astype(np.float64)
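
A minimal sketch of the fix described in the comment above, on made-up data. The column names and values are assumptions for illustration only, and `np.isfinite` is swapped in for the `isin` check; it should drop the same rows for purely numeric frames. One likely source of the infinite values is `pct_change()` dividing by a zero in the previous row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a zero volume followed by a non-zero one
# makes pct_change() divide by zero and produce inf.
df = pd.DataFrame({
    "close": [1.0, 1.1, 1.2, 1.3],
    "volume": [5.0, 0.0, 3.0, 4.0],
})

df = df.pct_change()
df = df.dropna()  # removes the leading NaN row, but leaves inf in place

# Keep only rows in which every value is finite (drops NaN, inf and -inf).
finite_rows = np.isfinite(df.to_numpy()).all(axis=1)
df = df[finite_rows].astype(np.float64)

print(df)
```

After this step the frame contains only finite numbers, so a scaling function has nothing to reject.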

  • sentdex
    December 3, 2020

    Hi,
    In a previous video we split off the last 5% for validation_data, so how does sequential_data have 60 entries? It should be 5% fewer, right?
    Sorry if my question is stupid; maybe I am confused about the initial dataset's value count. Kindly enlighten me, thanks.

  • sentdex
    December 3, 2020

    Probably we should create a third option, "hold" when the future price is the same as the current price. When I do that, the number of buys and sells is almost equal.
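
The three-way labelling suggested above could be sketched as follows. The label values (1 for buy, 0 for sell, 2 for hold) and the sample prices are assumptions chosen for illustration, not something taken from the video.

```python
def classify(current, future):
    """Return 1 (buy) if the price rises, 0 (sell) if it falls, 2 (hold) if unchanged."""
    if future > current:
        return 1
    if future < current:
        return 0
    return 2


# Hypothetical current prices and the prices a few periods later.
current_prices = [10.0, 10.5, 10.5, 10.2]
future_prices = [10.5, 10.5, 10.2, 10.4]

targets = [classify(c, f) for c, f in zip(current_prices, future_prices)]
print(targets)  # [1, 2, 0, 1]
```

With a third label, the model's output layer would presumably need three units instead of two.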

  • sentdex
    December 3, 2020

    A small piece of advice for others: never use random.shuffle on a numpy.array with more than one dimension. It corrupts your data. Use numpy.random.shuffle instead.

    Note: In the video, random.shuffle was applied to a list, which is OK.
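
A small sketch of the safe variant mentioned above, on a made-up array. The likely cause of the problem is that `random.shuffle` swaps elements pairwise, and indexing a row of a multi-dimensional NumPy array returns a view, so a swap can overwrite one row with the other. `numpy.random.shuffle` reorders rows along the first axis without touching their contents.

```python
import numpy as np

np.random.seed(0)  # seed chosen only to make the sketch repeatable

data = np.arange(12).reshape(6, 2)  # six rows, each a distinct pair
original_rows = {tuple(row) for row in data}

# Shuffles along the first axis in place: rows are reordered, contents untouched.
np.random.shuffle(data)

shuffled_rows = {tuple(row) for row in data}
print(data)
print(shuffled_rows == original_rows)  # True: no row was lost or duplicated
```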

  • sentdex
    December 3, 2020

    I'm done and have used model.predict successfully on live data from an exchange, but how do I know which of the two floats corresponds to 1 or 0?
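
One way to read the two floats, sketched under an assumption the comment does not confirm: the model ends in a two-unit softmax layer and was trained on integer labels 0 and 1. In that setup the column index of each output row lines up with the label of the same number. The `predictions` array below is hypothetical and stands in for the result of `model.predict`.

```python
import numpy as np

# Hypothetical output of model.predict() for three sequences, one row per sequence.
predictions = np.array([
    [0.80, 0.20],
    [0.35, 0.65],
    [0.51, 0.49],
])

# Column 0 holds the score for label 0, column 1 the score for label 1,
# so the index of the larger value is the predicted label.
predicted_labels = predictions.argmax(axis=1)
print(predicted_labels)  # [0 1 0]
```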

  • sentdex
    December 3, 2020

    What's wrong?
    train data: 34594 validation: 1531
    Dont buys: 34594, buys: 0
    VALIDATION Dont buys: 1531, buys: 0

  • sentdex
    December 3, 2020

    I have a question about the target column: the target is the label for the LTC-USD column, so what happens with the other columns?

  • sentdex
    December 3, 2020

    You probably don't need the classify function to generate your targets. main_df['target'] = main_df[f'{RATIO_TO_PREDICT}_close'] < main_df['future_price'] should do the same thing. Thanks for the video, very good starter on the basics.
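
The vectorised version suggested above might look like this on toy data. The value of `RATIO_TO_PREDICT`, the look-ahead `FUTURE_PERIOD_PREDICT`, and the prices are assumptions for illustration; the `.astype(int)` cast is added so the column holds 1/0 rather than True/False.

```python
import pandas as pd

RATIO_TO_PREDICT = "LTC-USD"  # assumed pair name
FUTURE_PERIOD_PREDICT = 3     # assumed look-ahead, in rows

main_df = pd.DataFrame(
    {f"{RATIO_TO_PREDICT}_close": [10.0, 10.5, 10.1, 10.4, 10.3, 10.6]}
)

# Price FUTURE_PERIOD_PREDICT rows ahead; the last rows have no future value (NaN).
main_df["future_price"] = main_df[f"{RATIO_TO_PREDICT}_close"].shift(-FUTURE_PERIOD_PREDICT)

# True where the future price is higher; cast to int to get 1/0 labels.
main_df["target"] = (
    main_df[f"{RATIO_TO_PREDICT}_close"] < main_df["future_price"]
).astype(int)

print(main_df)
```

Comparisons against NaN evaluate to False, so the final `FUTURE_PERIOD_PREDICT` rows get a target of 0 even though their future price is unknown; dropping those rows before training avoids mislabelled samples.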

  • sentdex
    December 3, 2020

    Great video. Still, I don't get the reason behind the random shuffling. Doesn't this make us lose the sequence information embedded inside the data? Can you please comment on that? And can you give a single real-world application scenario?

  • sentdex
    December 3, 2020

    No one:
    Literally no one:
    Sentdex: random.shuffle()

  • sentdex
    December 3, 2020

    I am trying to implement RNN for batch control. I have 1 input and 4 outputs.
    All of them are in 50 batches each of length 600 each.
    So input A (temperature) has data of 50 batches with 600 values for each of the 50 batches.
    The outputs B,C,D also have the same dimension.

    Could you tell me how do I go about preparing appropriate shape/structure of this dataset to implement RNN?

  • sentdex
    December 3, 2020

    Can't figure out how to use SMOTE to balance the type of data @sentdex provided. In the case of daily crypto data, brutal downsampling will be a big issue. Can anyone post a code snippet for this series' specific data generation and preparation so that everyone can use it? sentdex, you would be the best person to help us here. Thanks

  • sentdex
    December 3, 2020

    This series has great videos. I have a question. How do you train (or retrain) with new training data? I did not see any example in this series. Thanks!

  • sentdex
    December 3, 2020

    "Hopefully you're using jupyter notebook to spot the following error early."

    "ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required by the scale function."
    if anyone gets this error, just remove the first row (timestamp=1528968660) in all the csv files because the ETC file starts at the next time-stamp

  • sentdex
    December 3, 2020

    Great video, thanks. To all the commenters saying, "this isn't correct, it shouldn't be done like this because…". Ya, no kidding, it's a seven minute video folks, what did you expect? This is just a bare-bones example of how you might go about trying this sort of thing – it's not perfect, but the ingredients are there, and they are nicely explained.

  • sentdex
    December 3, 2020

    Could someone please explain why we do the DEQUE part.

    I understand that the deque is there to remember SEQ_LEN of past data.

    However, I thought LSTMs are there to do that?
    I have not seen any other LSTM (RNN) tutorial where data would be constructed into packs of arrays and then shuffled.

    The shuffling part confuses me even more.

    I would expect to simply pass in data in order and LSTMs will take care of the rest?
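
A rough sketch of what the deque step appears to do, on made-up rows. `SEQ_LEN = 3` and the feature values are assumptions chosen to keep the output short. Each training sample is a window of the last `SEQ_LEN` feature rows paired with the current target; the order inside each window is preserved, so a later shuffle only reorders whole windows.

```python
from collections import deque

SEQ_LEN = 3  # assumed small window; another comment mentions 60 entries

# Hypothetical rows in time order: (feature_1, feature_2, target)
rows = [
    (0.1, 1.0, 0),
    (0.2, 1.1, 1),
    (0.3, 1.2, 1),
    (0.4, 1.3, 0),
    (0.5, 1.4, 1),
]

sequential_data = []
prev_days = deque(maxlen=SEQ_LEN)  # the oldest row drops out automatically

for *features, target in rows:
    prev_days.append(features)
    if len(prev_days) == SEQ_LEN:
        # One sample = the last SEQ_LEN feature rows, labelled with the current target.
        sequential_data.append((list(prev_days), target))

for sequence, target in sequential_data:
    print(sequence, "->", target)
```

Five rows with a window of three yield three samples, and the recurrent layer still sees the time steps within each sample in their original order.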

  • sentdex
    December 3, 2020

    Can I ask a question, sir? It may not be related to this video: how do we feed JSON-like data into TensorFlow? I found something on the tensorflow.org page that says to convert JSON data into binary protocol buffers, but I didn't understand what that means.

  • sentdex
    December 3, 2020

    Naive question here, as I'm just starting with machine learning: what is the difference between normalization and scaling? And why do we need to scale the data after normalization?

  • sentdex
    December 3, 2020

    Can you recommend a good forum for Python design questions? Reddit Python forum has not been very helpful.

  • sentdex
    December 3, 2020

    Hey Sentdex, I love your tutorials and the way you code, but I think the way you are balancing the dataset is not the right way to do it. If there's 90% no buys and 10% buys, then by your method you are removing 80% of the data from the nobuy class. So you are basically losing significant data which could be used to train the model. Balancing with the SMOTE method is way better since it upsamples the class with less data, and the imbalanced-learn package (built to work with sklearn) has a SMOTE implementation 🙂
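
For reference, the undersampling that this and other comments describe could be sketched roughly as below. This is a reconstruction from the comments rather than the video's exact code, and the sample data is made up: both classes are truncated to the size of the smaller one, which is why the majority class loses most of its samples.

```python
import random

random.seed(0)  # only to make the sketch repeatable

# Hypothetical samples: (sequence_id, target), nine "don't buy" to three "buy".
samples = [(i, 0) for i in range(9)] + [(i, 1) for i in range(9, 12)]

buys = [s for s in samples if s[1] == 1]
sells = [s for s in samples if s[1] == 0]

random.shuffle(buys)
random.shuffle(sells)

# Truncate both classes to the size of the smaller one.
lower = min(len(buys), len(sells))
balanced = buys[:lower] + sells[:lower]
random.shuffle(balanced)  # avoid all buys followed by all sells

print(len(buys), len(sells), "->", len(balanced))  # 3 9 -> 6
```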

  • sentdex
    December 3, 2020

    why did we shuffle the sequential data?

  • sentdex
    December 3, 2020

    If I understand this correctly, this is so wrong. I mean, in theory, undersampling to balance the classes and random sampling might improve results, but when you leave out some cases of buys/longs you don't have the time series anymore. You might think that you will be evaluating your model on what happens the next day, but you won't, because you might have left out the next day. This will be a huge problem when there is a trend and a currency moves in one direction (like the whole crypto market last year), because you will leave out most of the buy positions. And then there is the random shuffling. So you randomly shuffle through time and then divide into train/test? Then examples from the future might be in the train sample and examples from the past might be in the test sample. You mustn't shuffle your data; it has to be ordered by time, so that you learn on the past and evaluate on the future. I still hope that I just didn't get it right…
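
Another comment mentions that an earlier video splits off the last 5% of the data for validation. Assuming that split happens before any shuffling, the ordering concern raised above can be illustrated with a small sketch; the integer samples are hypothetical stand-ins for time-ordered rows.

```python
import random

random.seed(0)  # only to make the sketch repeatable

# Hypothetical time-ordered samples; each integer stands in for a timestamp.
samples = list(range(100))

# Split by time first: the most recent 5% become the validation set.
split_index = int(len(samples) * 0.95)
train = samples[:split_index]
validation = samples[split_index:]

# Shuffling afterwards only reorders samples within each set.
random.shuffle(train)
random.shuffle(validation)

print(max(train) < min(validation))  # True: every training sample predates validation
```

Shuffling within each set changes the order in which samples are presented, but no sample from the validation period ends up in the training set.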

  • sentdex
    December 3, 2020

    Please do some videos on Transfer Learning

  • sentdex
    December 3, 2020

    Can't we use train_test_split in sklearn to split the data?

  • sentdex
    December 3, 2020

    Reveal the suspense @sentdex, what happened at 6:30 ?? Who is the ladderman ? Uncover the mysteryMan. Subscribers want the answer. We demand the Answer!!!

  • sentdex
    December 3, 2020

    When would target be anything other than 0 or 1?

  • sentdex
    December 3, 2020

    Harrison, you are doing an amazing job, but please read this article: https://hackernoon.com/dont-be-fooled-deceptive-cryptocurrency-price-predictions-using-deep-learning-bf27e4837151. I hope you can prove that it is wrong :-). Looking forward to seeing your next video

  • sentdex
    December 3, 2020

    The shuffles might be a bit excessive… one shuffle at the end would actually do.

  • sentdex
    December 3, 2020

    Can't understand the creation of sequential_data from prev_days, can someone explain it please? The way I see it, the closing prices and volumes of each currency are pushed into the prev_days array one after another, and this repeats for the following days (rows of df). Is this correct? If so, then how and why does this work? Also, by shuffling, aren't we losing the time series? One last question: why aren't we trying to guess the actual future price of the selected currency (i.e. regression) instead of buy/sell classification?

  • sentdex
    December 3, 2020

    What's up sentdex, great vid as ever!
    I was wondering if, at the end of this series, we'll be able to change some stuff and get this algo to work on Binance, for example. Will we? I mean, train this model offline with a CSV and then keep it working live from API requests to really buy on the exchange. Is this possible?
    Thank you so much, cheers from Brasil.

  • sentdex
    December 3, 2020

    Why do you use a capital X but a lowercase y for the data?

  • sentdex
    December 3, 2020

    You are balancing the data with severe undersampling. I am sure that this works in some cases, but your model might lose 90% of the information with a highly unbalanced dataset. Isn't there a better way to balance the data?

  • sentdex
    December 3, 2020

    x is an np.array()
    y is a list
    shouldn't y also be an np.array() or just y as a list is fine?

  • sentdex
    December 3, 2020

    What's up Harrison?
    I have a question which has been on my mind for days.
    Do you really believe one can make money in forex using AI-based methods (like Deep Learning, Reinforcement Learning, you name it)?
    How come we don't see or hear anywhere that some guy has made a fortune in forex using these methods?
    Why didn't you make good money this way? (Or did ya?! 😀)
    I see everywhere that DL or RL has beaten this game or that game and everybody is like wow! Imagine using this method to beat forex!
    But I personally think games have been made to lose against humans at some level. That doesn't necessarily apply to the forex market. Forex has not been made this way.
    I just don't wanna waste my time concentrating on this and then realize it wasn't as practical as I thought it was gonna be!
    And on the other hand, I'm so tired of using technical and other methods to analyze the forex market.
    Great job, by the way. Because of you I started learning Python, and it is kind of fun!
