Normalizing and creating sequences Crypto RNN – Deep Learning w/ Python, TensorFlow and Keras p.9


Leading up to this tutorial, we’ve learned about recurrent neural networks, deployed one on a simpler dataset, and now we are working on doing it with a more realistic dataset to try to predict cryptocurrency pricing movements.

Text tutorials and sample code: https://pythonprogramming.net/normalizing-sequences-deep-learning-python-tensorflow-keras/
Discord: https://discord.gg/sentdex
Support the content: https://pythonprogramming.net/support-donate/
Twitter: https://twitter.com/sentdex
Facebook: https://www.facebook.com/pythonprogramming.net/
Twitch: https://www.twitch.tv/sentdex
G+: https://plus.google.com/+sentdex


Comment List

  • sentdex
    November 27, 2020

    I want to do anomaly-detection classification with an RNN in keras.tf, but I have a problem where the accuracy value increases while val_accuracy does not change and just stays constant at 50%. This is my complete code, available on Google Colab: https://colab.research.google.com/drive/1saoNuCxj08JCxZ_7taIjhp8sJEIV-T5U?usp=sharing

  • sentdex
    November 27, 2020

    But why are you scaling across main_df? At real-use time you won't be loading years of data just to get your 60 minutes scaled. Each sample and its future sequence should be scaled together, and separately for each example.
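    A quick sketch of that idea (`scale_window` is a hypothetical helper, not from the tutorial): min-max scale each window on its own, so no statistics from outside the window leak in:

```python
import numpy as np

def scale_window(window):
    """Min-max scale one price window independently, using only
    statistics from inside the window (no lookahead)."""
    window = np.asarray(window, dtype=float)
    lo, hi = window.min(), window.max()
    if hi == lo:                      # flat window: avoid division by zero
        return np.zeros_like(window)
    return (window - lo) / (hi - lo)

print(scale_window([100.0, 102.0, 101.0, 104.0]))  # scales to 0, 0.5, 0.25, 1
```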

  • sentdex
    November 27, 2020

    Is there a way we can have the neural network recognize the time? I mean, when we do technical trading the pattern really matters, so we should not shuffle the data. Perhaps we could even add a timestamp, say 1-60, as an index for each deque so the network understands its place in time. What do you think?

  • sentdex
    November 27, 2020

    Interesting tutorial, but I have a question on data normalization: once you have your own dataset, how can you normalize it when it consists of RGB images?

  • sentdex
    November 27, 2020

    Use `scikit-learn` instead. The PyPI `sklearn` package itself states to use `scikit-learn` instead.

  • sentdex
    November 27, 2020

    for scaling: (value - min) / (max - min). This will always give you a number between 0 and 1.
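    A plain-Python sketch of that formula (the `min_max` name is just for illustration):

```python
def min_max(values):
    """Min-max normalisation: maps values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max([2, 4, 6, 10]))  # [0.0, 0.25, 0.5, 1.0]
```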

  • sentdex
    November 27, 2020

    deque is pronounced "deck" lol. At least that's how my professor pronounced it.

  • sentdex
    November 27, 2020

    Why is the target based only on "LTC-USD"?

  • sentdex
    November 27, 2020

    Thanks a lot for what you are doing, you're my number one reference.
    I have a question, please:
    why didn't you try to predict the price of the coin instead of whether it's going up or down?
    Again, thanks a lot for your efforts.

  • sentdex
    November 27, 2020

    At 17:28 you define the sequential_data list with sequential_data.append([np.array(prev_days), i[-1]]), which builds a list of sequences of SEQ_LEN length. For each sequence you use a single target, which is the last value in the sequence of data points. Why only the last one? What does it have to do with all the other points in the SEQ_LEN-point sequence? Would the majority of 1s or 0s in the window be a more appropriate final target?
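    To make the question concrete, here is roughly what that append builds (a minimal sketch with SEQ_LEN = 3 instead of 60; `build_sequences` is an illustrative name, not the tutorial's function):

```python
from collections import deque
import numpy as np

SEQ_LEN = 3  # 60 in the tutorial; 3 here to keep it readable

def build_sequences(rows):
    """Each row is (features..., target). A window of SEQ_LEN feature
    vectors is paired with the target of the *last* row in the window,
    since that target already encodes the future price move."""
    sequential_data = []
    prev_days = deque(maxlen=SEQ_LEN)
    for *features, target in rows:
        prev_days.append(features)
        if len(prev_days) == SEQ_LEN:
            sequential_data.append((np.array(prev_days), target))
    return sequential_data

seqs = build_sequences([(0.1, 0), (0.2, 1), (0.3, 1), (0.4, 0)])
print(len(seqs))    # 2
print(seqs[0][1])   # 1 (label of the last row in the first window)
```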

  • sentdex
    November 27, 2020

    I have a question regarding the preprocessing. If I got it right, after the preprocessing the values should be between 0 and 1. However, using preprocessing.scale() I got values less than 0 and greater than 1. Should I go for MinMaxScaler() instead?

  • sentdex
    November 27, 2020

    Amazing video, thank you very much!

    Just a small question: why do you use just one coin to calculate the target? Shouldn't there be multiple targets, one for each coin? Normally these coins follow BTC, but that is not always true, especially on minute charts! I also don't understand (maybe conceptually) why we should shuffle data for series-based training; if you could say a bit about it, that would be amazing!

    Again, the series is amazing and I'm looking forward to finishing the entire website =D

  • sentdex
    November 27, 2020

    Thanks, really helpful tutorials. Just one question: you wrote df[col] = preprocessing.scale(df[col].values) # scale between 0 and 1,
    but as we can see, there are negative values after running the code. From what I've read, preprocessing.scale just centers to the mean and scales component-wise to unit variance; it does not scale between 0 and 1. Am I right?
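    The difference is easy to see with the equivalent numpy arithmetic (plain numpy here, since the point is only what each transform computes):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0])

# What preprocessing.scale does: centre to mean 0, scale to unit variance.
standardised = (data - data.mean()) / data.std()

# What MinMaxScaler does: squeeze values linearly into [0, 1].
minmaxed = (data - data.min()) / (data.max() - data.min())

print(standardised)  # roughly [-1.34, -0.45, 0.45, 1.34], not inside [0, 1]
print(minmaxed)      # [0, 1/3, 2/3, 1]
```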

  • sentdex
    November 27, 2020

    Amazing video! I know this is a stupid question, but I did the exact same thing and somehow the preprocess_df function is not working (I copied it word for word): I can see neither the normalization happening nor the future column being removed. Is there anything I am missing here? Please let me know. Thanks.

  • sentdex
    November 27, 2020

    I will try this with the MetaTrader library for Python that I came across last year.

  • sentdex
    November 27, 2020

    validation_main_df_2 = main_df[main_df.index.get_loc(last_5_pct):]
    A slightly cleaner way to get the validation slice of the DataFrame than using a comparison operator.

  • sentdex
    November 27, 2020

    Thank you !

  • sentdex
    November 27, 2020

    Just how much coffee do you drink and how many weird-shaped cups do you have? 😛

  • sentdex
    November 27, 2020

    So each column of sequential_data[i][0] is a time series for a different attribute for the past 60 time steps, and sequential_data[i][1] is the trend of the time series from the last time step to 3 time steps in the future?

  • sentdex
    November 27, 2020

    FYI deque is pronounced "DECK" not "day-q".

  • sentdex
    November 27, 2020

    Let's collect all the reviews of the tutorials made by Harrison, and by applying LSTM based sequential time-series analysis, detect whether his next tutorial might be another excellent one or not!

  • sentdex
    November 27, 2020

    Thanks for the Tutorials, they are amazing and easy to understand ….

  • sentdex
    November 27, 2020

    Why use pct_change when you can normalize using preprocessing.scale?

  • sentdex
    November 27, 2020

    This is the most beneficial youtube video I've ever seen. Thank you!

  • sentdex
    November 27, 2020

    Sentdex, wonderful series! But there is a problem in the preprocess_df function: the first dropna needs to be removed. Each pct_change() on a single column generates a NaN in the first position, and that dropna then deletes an entire row of real data from the subsequent columns on each iteration. The next iteration creates a new NaN on a new row, that row is deleted too, and so on, so the columns get shorter and shorter.

    Commenting out that first dropna increased my final accuracy by about 5%. It's not easy to spot, as no error is generated here; I only hit errors with differently shaped feature sets (my own, with thousands of columns). I've added print(df[col]) to show the difference in the columns being pct_change'd. The dropna outside the loop catches all NaNs but keeps all the other data. I'll take a 5% increase in accuracy. Thanks, great series!

    def preprocess_df(df):  # balance data between currencies, normalise and scale columns
        df = df.drop(columns=['future'])
        for col in df.columns:
            if col != 'target':
                df[col] = df[col].pct_change()  # normalises all columns except target
                # no dropna here: pct_change leaves a NaN in the first row, and
                # dropping inside the loop would shorten every later column
                print(df[col])  # verify each column
        df.dropna(inplace=True)  # catch all NaNs once, after all pct_change calls
        for col in df.columns:
            if col != 'target':
                df[col] = preprocessing.scale(df[col].values)  # centre to mean 0, unit variance
        return df

  • sentdex
    November 27, 2020

    Where is the order of the videos?

  • sentdex
    November 27, 2020

    You have lookahead bias: when you scale using statistics that include future prices, information leaks in. You need to scale within each sequence instead. Your accuracy will drop once you fix it.

  • sentdex
    November 27, 2020

    Great video, many thanks! Can the target be a list of the future 3 minutes of prices, i.e. can I predict the whole future 3 minutes of prices instead of just a binary value?

  • sentdex
    November 27, 2020

    Why wouldn't you just split the full arrays at array length * 0.95 instead of doing the whole timestamp thing?

  • sentdex
    November 27, 2020

    I am trying to implement an RNN for batch control. I have 1 input and 4 outputs.
    All of them come in 50 batches, each of length 600.
    So input A (temperature) has 50 batches with 600 values each.
    The outputs B, C and D have the same dimensions.

    Could you tell me how to prepare the appropriate shape/structure of this dataset to implement the RNN?
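    Keras recurrent layers expect input shaped (samples, timesteps, features), so one possible layout for the data described above (assumed shapes, with random placeholder values standing in for the real measurements):

```python
import numpy as np

# 50 batches, 600 time steps, 1 input feature (temperature A)
X = np.random.rand(50, 600, 1)

# 50 batches, 600 time steps, 4 output signals
Y = np.random.rand(50, 600, 4)

print(X.shape)  # (50, 600, 1)
print(Y.shape)  # (50, 600, 4)
```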

  • sentdex
    November 27, 2020

    I am not sure about the deque. For example, does the first deque give you the first 60 numbers starting from the first number and the second one the 60 numbers starting from the 61st number, or does the first deque give you the first 60 numbers starting from the first number and the second one the 60 numbers starting from the second number?
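    For what it's worth, a deque with maxlen slides by one element per append, so consecutive windows overlap in all but one value. With a window of 3:

```python
from collections import deque

window = deque(maxlen=3)
snapshots = []
for value in [1, 2, 3, 4, 5]:
    window.append(value)
    if len(window) == 3:          # window is full: record one sequence
        snapshots.append(list(window))

print(snapshots)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```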

  • sentdex
    November 27, 2020

    Why use preprocessing.scale? Can I use MinMaxScaler instead?

  • sentdex
    November 27, 2020

    The code doesn't work as written: inside preprocess_df() you have to return something; you cannot leave it blank.

  • sentdex
    November 27, 2020

    Hi Sentdex, another great video. I'm trying to apply this to stock data but I keep getting ValueError: Input contains infinity or a value too large for dtype('float64')
    when trying to preprocess my df. Any pointers?

  • sentdex
    November 27, 2020

    I might be wrong, but isn't the 'target' column just based on LTC values? Don't we need to make 3 more targets for all the others?
