## Normalizing and creating sequences Crypto RNN – Deep Learning w/ Python, TensorFlow and Keras p.9


Leading up to this tutorial, we’ve learned about recurrent neural networks, deployed one on a simpler dataset, and now we are working on doing it with a more realistic dataset to try to predict cryptocurrency pricing movements.

Text tutorials and sample code: https://pythonprogramming.net/normalizing-sequences-deep-learning-python-tensorflow-keras/

Discord: https://discord.gg/sentdex

Support the content: https://pythonprogramming.net/support-donate/

Twitter: https://twitter.com/sentdex

Facebook: https://www.facebook.com/pythonprogramming.net/

Twitch: https://www.twitch.tv/sentdex

G+: https://plus.google.com/+sentdex



I want to do anomaly-detection classification with an RNN in tf.keras, but I have a problem where the accuracy value increases while the val_accuracy value does not change and just stays constant at 50%. This is my complete code, available on Google Colab: https://colab.research.google.com/drive/1saoNuCxj08JCxZ_7taIjhp8sJEIV-T5U?usp=sharing

But why are you scaling across main_df? At the time of real use you won't be loading years of data just to get your 60 minutes scaled. Each sample and its future sequence should be scaled together, and separately for each example.
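For what it's worth, per-sequence scaling can be sketched like this with plain NumPy (a minimal illustration, not the tutorial's code; the function name and window values are made up):

```python
import numpy as np

def scale_sequence(window):
    """Scale one price window independently, using only the values
    inside the window -- no information leaks in from outside data."""
    window = np.asarray(window, dtype=float)
    pct = window[1:] / window[:-1] - 1.0  # percent change within the window
    lo, hi = pct.min(), pct.max()
    if hi == lo:                          # flat window: avoid divide-by-zero
        return np.zeros_like(pct)
    return (pct - lo) / (hi - lo)         # min-max scale to [0, 1]

seq = scale_sequence([100, 101, 99, 102, 102])
print(seq.min(), seq.max())  # 0.0 1.0
```

At inference time the same function works on a single 60-minute window, which is the point of the comment above.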

Is there a way we can have the neural network recognize the time? I mean, when we do technical trading the pattern really matters, so we should not shuffle the data. And perhaps we could even add a timestamp, say 1–60, as an index for each deque so the network understands its place in time. What do you think?

Interesting tutorial, bro, but I have a question on data normalization: once you have your own dataset, how can you normalize it when it consists of RGB images?

Use `scikit-learn` instead. The PyPI `sklearn` package states to use `scikit-learn` instead.

For scaling: (value − min) / (max − min) will always give you a number between 0 and 1.

Deque is pronounced "deck" lol. At least that's how my professor pronounced it.

Why is the target based only on "LTC-USD"?

Thanks a lot for what you are doing, you're my number one reference.

I have a question, please: why didn't you try to predict the price of the coin instead of whether it's going up or down?

Again, thanks a lot for your efforts.

At 17:28 you are building the sequential_data list with sequential_data.append([np.array(prev_days), i[-1]]), which is a list of sequences of SEQ_LEN length. For each sequence you use a single target, which is the last value in the sequence of data points. Why only the last one? What does it have to do with all the other points in the sequence of SEQ_LEN points? Would taking whichever of 1 or 0 appears most often be more appropriate as the final target?

Got a question regarding the preprocessing. If I got it right, after the preprocessing the values should be between 0 and 1. However, using preprocessing.scale(), I get values less than 0 and more than 1. Should I go for MinMaxScaler() instead?

amazing video man, thank you very much!

Just a small question: why do you use just one coin to calculate the target? Shouldn't it have multiple targets, one for each coin? Normally these coins follow BTC, but that is not always true, especially on minute charts! I also don't understand (maybe conceptually) why we should shuffle data for a series-based training run; if you could say a bit about it, that would be amazing!

Again, the series is amazing and I'm looking forward to finishing the entire website =D

Thanks, really helpful tutorials. Just one question: you wrote df[col] = preprocessing.scale(df[col].values) # scale between 0 and 1, but as we can see, there are negative values after running the code. As I understand it, preprocessing.scale just centers to the mean and scales component-wise to unit variance; it does not scale between 0 and 1. Am I right?
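The commenter is right: `preprocessing.scale` standardizes to zero mean and unit variance, while `MinMaxScaler` maps into [0, 1]. The difference can be sketched with plain NumPy (the array `x` is made up; NumPy's default `std` is the population standard deviation, which matches what scikit-learn uses):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# What preprocessing.scale does: subtract the mean, divide by the std.
standardized = (x - x.mean()) / x.std()

# What MinMaxScaler does: map the smallest value to 0 and the largest to 1.
minmaxed = (x - x.min()) / (x.max() - x.min())

print(standardized)  # has negative entries and is not bounded by [0, 1]
print(minmaxed)      # always inside [0, 1]
```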

Amazing video! I know this is a stupid question, but I did the exact same thing and somehow the preprocess_df function is not working (I copied it word for word): I can see neither the normalization nor the future column being removed. Is there anything I am missing here? Please let me know. Thanks.

I will try this with the MetaTrader library for Python that I came across last year.

A slightly cleaner way to get the slice of a DataFrame, instead of using a comparison operator:

```python
validation_main_df_2 = main_df[main_df.index.get_loc(last_5_pct):]
```

Thank you !

Just how much coffee do you drink and how many weird-shaped cups do you have? 😛

So each column of sequential_data[i][0] is a time series for a different attribute for the past 60 time steps, and sequential_data[i][1] is the trend of the time series from the last time step to 3 time steps in the future?

FYI deque is pronounced "DECK" not "day-q".

Let's collect all the reviews of the tutorials made by Harrison, and by applying LSTM based sequential time-series analysis, detect whether his next tutorial might be another excellent one or not!

Thanks for the Tutorials, they are amazing and easy to understand ….

Why use pct_change when you can normalize using preprocessing.scale?

This is the most beneficial youtube video I've ever seen. Thank you!

Sentdex — wonderful series! But there's a problem here in the preprocess_df function. The first dropna needs to be removed: each pct_change() on a single column generates a NaN in the first position, so dropna then deletes an entire row of real data from the subsequent columns on every iteration. The next iteration creates a new NaN on a new row, that row is deleted too, and so on, so the columns get shorter and shorter. Commenting out the first dropna increased my final accuracy by c. 5%. It's not easy to spot, as no error is generated here; I only got errors with differently shaped feature sets (my own, with thousands of columns). I've added print(df[col]) to show the difference in the columns being pct_change'd. The dropna outside the loop catches all the NaNs but keeps all the other data. I'll take a 5% increase in accuracy… Thx, great series!

```python
def preprocess_df(df):  # balance data between currencies, then scale it
    df = df.drop(columns=['future'])

    for col in df.columns:
        if col != 'target':
            df[col] = df[col].pct_change()  # normalises all columns except target
            # df.dropna(inplace=True)  # PROBLEM: pct_change generates at least 1 NaN,
            #                          # so dropping here deletes real data each iteration
            print(df[col])  # verify each column
            df[col] = preprocessing.scale(df[col].values)  # zero mean, unit variance

    df.dropna(inplace=True)  # one dropna outside the loop catches all NaNs
```

Where is the order of the videos?

You have lookahead bias when you scale with regard to future prices; you need to scale within sequences instead. Your accuracy will drop once you fix it.

Great video, many thanks! Can the target be a list of the future 3 minutes of prices? I.e., I want to predict the whole next 3 minutes of prices instead of just a binary value.

Why wouldn't you just chop the full arrays at array length * 0.95 instead of doing the whole timestamp thing?

I am trying to implement an RNN for batch control. I have 1 input and 4 outputs.

All of them come in 50 batches, each of length 600.

So input A (temperature) has data for 50 batches, with 600 values for each of the 50 batches. The outputs B, C, D also have the same dimensions.

Could you tell me how to prepare the appropriate shape/structure of this dataset to implement the RNN?
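One way to lay that data out, assuming a Keras-style RNN that expects `(samples, timesteps, features)` — the array names and the random placeholder data below are made up for illustration:

```python
import numpy as np

n_batches, seq_len = 50, 600

# Hypothetical raw data: one input series (temperature) and four output
# series, each recorded per batch as (n_batches, seq_len).
temperature = np.random.rand(n_batches, seq_len)

# Keras recurrent layers expect (samples, timesteps, features),
# so add a trailing feature axis for the single input...
X = temperature.reshape(n_batches, seq_len, 1)

# ...and stack the four output series along the feature axis.
outputs = [np.random.rand(n_batches, seq_len) for _ in range(4)]
y = np.stack(outputs, axis=-1)

print(X.shape, y.shape)  # (50, 600, 1) (50, 600, 4)
```

Each of the 50 batches then becomes one training sample of 600 timesteps, with 1 input feature and 4 output features per timestep.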

I am not sure about the deque. For example, does the first deque give you the first 60 numbers starting from the first number and the second deque the 60 numbers starting from the 61st number, or does the first deque give you the first 60 numbers starting from the first number and the second deque the 60 numbers starting from the second number?

Why use preprocessing.scale? Can I use MinMaxScaler instead?

The code doesn't work: inside preprocess_df() you have to return something; you cannot leave it blank.

Hi Sentdex, another great video. I'm trying to apply this to stock data, but I keep getting ValueError: Input contains infinity or a value too large for dtype('float64') when trying to preprocess my df. Any pointers?
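A common cause (a guess, not a diagnosis of this exact dataset): pct_change produces inf whenever the previous value is 0, which happens often in stock volume columns. One possible fix is to convert infs to NaN and drop them before scaling; the toy `volume` column below is made up:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'volume': [0.0, 5.0, 10.0, 0.0, 3.0]})

pct = df['volume'].pct_change()               # (5 - 0) / 0 -> inf, etc.
pct = pct.replace([np.inf, -np.inf], np.nan)  # turn infs into NaN
pct = pct.dropna()                            # drop NaNs before scaling

print(pct.tolist())  # [1.0, -1.0]
```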

I might be wrong, but isn't the "target" column based only on LTC values? Don't we need to make 3 more targets for all the others?