Building Database – Creating a Chatbot with Deep Learning, Python, and TensorFlow p.5
Welcome to part 5 of the chatbot with Python and TensorFlow tutorial series. Leading up to this tutorial, we've been working with our data and preparing the logic for how we want to insert it; now we're ready to start inserting.
Text tutorials and sample code: https://pythonprogramming.net/
https://pythonprogramming.net/support-donate/
https://twitter.com/sentdex
https://www.facebook.com/pythonprogramming.net/
https://www.twitch.tv/sentdex
https://plus.google.com/+sentdex
If you have the parent_id, why can't you look up the parent data? I didn't understand this part. Can someone explain, please?
I'm following along exactly as you're showing (I even copy-pasted the code to prove it wasn't just my own version, since I did my own thing while following along, that was having the problem), but no matter what I do I keep getting "Invalid control character at: line 1 column 497 (char 496)". I've been trying to figure it out for hours.
I'm getting OSError: [Errno 22] Invalid argument: 'C:\Chatboxdatareddit_data/2015/RC_2015-07'. Help, please!
My total rows read is already past 5 million. How did his run end at 100,000 when he is using a much bigger RC file?
If anyone's using 2019-12 data,
parent_id = row['parent_id'][3:]
comment_id = row['id'] #Not sure if this is right!
Check sentdex's pythonprogramming.net website for the complete code. He did not write everything the same way in the video.
parent_id = row['parent_id'].split('_')[1]
comment_id = row['id']
This is my part of the code, and it gives paired_rows: 0.
How do I fix this?
I'm using RC_2019-12.
I am getting 'Error: duplicate key value violates unique constraint "parent_reply_pkey"'. Why are you inserting duplicate parent_ids?
The 2015-01 dump has no matching parent and comment IDs.
I got paired_rows: 0, and I don't know why.
I have inserted around 30 million rows into the db. Should I continue, or is that enough?
16:52 lmao xD xD
Hey, if you are using 2018 or more recent data, you have to change the comment id to:
comment_id = 't1_' + row['id']
For 2011-08 data, what should comment_id be?
I get False for all comments and parents! What should I do?
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0). Any help?
My problem is the weirdest: everything runs correctly, but my parent column seems to be empty. It took me 19 hours to build the database. Please help me.
"Getting through all of the data will depend on the size of the starting file. Inserting will slow down the larger the database gets. To do the entire May 2015 file, it will probably take 5-10 hrs."
Should I wait until the entire May 2015 file is loaded, or something?
I'm also getting paired rows = 0.
Why is my data not getting loaded into the db file? I get no errors either!
Please do reply.
Why am I getting ('s0 insertion', "'ascii' codec can't encode character u'\xe9' in position 53: ordinal not in range(128)") when I run this? :/
The code returns an error saying the charmap codec can't decode byte 0x90.
After putting encoding='utf8' inside the file open, it throws: 'utf8' codec can't decode byte 0xb5.
Then, changing the encoding to latin1, it throws json.decoder.JSONDecodeError.
I'm getting 0 paired rows with 2015-05. I don't understand why, though.
Hey, I'm still trying to get the hang of all of this. How would you process all of the files into one database?
Like timeframe = '2007-10', '2007-11' and so on: a list that reads all the files and builds everything into a single database file. Yes, it might take a long time to complete,
but I'm a little confused about how to actually put it into code.
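One way to sketch the multi-month idea above: loop a list of timeframes and run the tutorial's per-month insertion loop once per dump, all against one connection. The directory layout (year/RC_year-month) follows the tutorial; the db filename and the process stub are illustrative, not sentdex's actual code:

```python
import sqlite3

# Months to merge into one database; extend the list as needed.
timeframes = ['2007-10', '2007-11', '2007-12']

def dump_path(timeframe, root='reddit_data'):
    """Path for one monthly dump, following the tutorial's layout:
    <root>/<year>/RC_<year-month>."""
    return '{}/{}/RC_{}'.format(root, timeframe.split('-')[0], timeframe)

def build_database(db_file, timeframes):
    """Run the per-month insertion loop once per dump, into a single db file."""
    connection = sqlite3.connect(db_file)
    for timeframe in timeframes:
        path = dump_path(timeframe)
        # ... open `path` and run the tutorial's row loop against `connection` ...
        print('would process', path)
    connection.close()
```

Processing the months one after another keeps it simple; it will be slow, but SQLite only allows one writer anyway.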
Traceback (most recent call last):
File "C:/Users/Maniech/Desktop/chatbot/b.py", line 93, in <module>
with open("C:/Users/Maniech/Desktop/chatbot/{}/RC_{}".format(timeframe.split('-')[0],timeframe), buffering=1000) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Maniech/Desktop/chatbot/2015/RC_2015-01'
Can anyone tell me what this error is?
Every time I run the code I get this error:
Traceback (most recent call last):
File "C:/Desktop/coding/AI/chatbot/database.py", line 77, in <module>
row=json.loads(row)
File "C:\Users\sebas\Anaconda3\envs\tensor\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "C:\Users\sebas\Anaconda3\envs\tensor\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\sebas\Anaconda3\envs\tensor\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 374 (char 373)
Before that, it prints out a large number of messages like "find_parent no such column: t1_cnapz1h".
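That "find_parent no such column: t1_..." spam usually means the comment id is being .format()-ed into the SQL string without quotes, so SQLite parses it as a column name. A minimal sketch of the fix using sqlite3's ? placeholders (the table and column names here are assumptions based on the error text; match them to your schema):

```python
import sqlite3

# In-memory stand-in for the tutorial's database, just to demonstrate the query.
connection = sqlite3.connect(':memory:')
c = connection.cursor()
c.execute('CREATE TABLE parent_reply (comment_id TEXT, comment TEXT)')
c.execute("INSERT INTO parent_reply VALUES ('t1_cnapz1h', 'hello')")

def find_parent(pid):
    """Fetch a parent comment body by id. Passing the id as a ? parameter,
    instead of formatting it unquoted into the SQL string, stops SQLite from
    reading t1_cnapz1h as a column name (the 'no such column' message above)."""
    try:
        c.execute("SELECT comment FROM parent_reply WHERE comment_id = ? LIMIT 1", (pid,))
        result = c.fetchone()
        return result[0] if result else False
    except Exception:
        return False
```

Placeholders also sidestep quoting bugs when a comment body contains an apostrophe.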
Hi, a bit of help here please.
It's been 12 hours and my process is not yet completed.
I am using the 2015-01 data as in the tutorial. The .db file has reached over 6.5 GB in size. Is it supposed to take this long?
For those who are getting paired == 0 you can try
parent_id = row['parent_id'].split("_")[1]
comment_id = "t1_" + row['id']
This worked for the 2019-07 dump.
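Putting the suggestions above together, a helper that guesses the right fields per dump vintage might look like this. Which months carry which fields is an assumption here; print one decoded row from your own dump to confirm before trusting it:

```python
def extract_ids(row):
    """Pull (parent_id, comment_id) out of one decoded row.
    Older dumps (around 2015) carry a fullname-prefixed 'name' field;
    newer ones only have a bare 'id'. Returning bare ids for both keeps
    parent lookups consistent, which is what makes rows pair up."""
    parent_id = row['parent_id'].split('_')[1]   # strip the t1_/t3_ prefix
    if 'name' in row:
        comment_id = row['name'].split('_')[1]
    else:
        comment_id = row['id'].split('_')[-1]    # works with or without a prefix
    return parent_id, comment_id
```

The key point is consistency: paired_rows stays at 0 whenever comment_id and the parent_id being looked up use different prefixes.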
Can anyone help me? I am a beginner. In this tutorial he does not write the SQL queries; he just copy-pastes them and says they are in a link in the description, but I don't see them. Please help.
Harrison! Too many nested ifs! There's an operator called and in Python. Use it, my son!
Has anybody attempted this on Kaggle? There is a dataset for the month of May there, but somehow I can't open it; there is some zip error.
I have a 5 GB compressed file of Jan 2015. Could anyone tell me the exact size of the unzipped file? I am having space issues.
Is it possible to make this code run faster on an 8-core, 16-thread processor? I mean, make it use all the threads possible?
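SQLite only allows one writer, so parallelizing the inserts won't help, but the JSON decoding (the CPU-heavy part) can be fanned out across processes. A rough sketch; the worker count and chunk size are guesses to tune for your machine:

```python
import json
from multiprocessing import Pool

def parse(line):
    """Worker function: decode one line of the dump."""
    return json.loads(line)

def parsed_rows(path, workers=16, chunksize=1000):
    """Stream decoded rows, parsing across `workers` processes.
    Keep the SQLite inserts in the parent: one writer per database."""
    with open(path, buffering=1000) as f:
        with Pool(workers) as pool:
            for row in pool.imap(parse, f, chunksize=chunksize):
                yield row  # do the db insert here, in the single parent process
```

pool.imap keeps the rows in file order, so the parent/comment pairing logic behaves the same as the single-process version.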
I'm getting the parent column completely NULL. I copied the same code from his website and the same file too, but the issue persists.
I'm getting constant rows of "Expecting value: line 1 column 1 (char 0)". What do I do?
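"Expecting value: line 1 column 1 (char 0)" usually means json.loads was handed a blank line, or the dump file was truncated or incompletely extracted. A defensive wrapper is one option (a sketch; silently skipping bad lines is a choice, and you may prefer to count them and check the total afterwards):

```python
import json

def parse_line(line):
    """Parse one line of the dump; return None for blank or broken lines
    instead of crashing the whole run."""
    line = line.strip()
    if not line:
        return None
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None
```

If nearly every line comes back None, the file itself is the problem: re-extract the archive and check its size against the published one.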
Hi @sentdex, could you please help me out? I am getting an error on comment_id = row['data']:
KeyError: 'data'
Paired rows = 0 for the RC_2015-01 data. I tried all the possible options: parent_id = row['parent_id'].split('_')[1],
comment_id = row['name'].split('_')[1],
comment_id = row['id']. No luck. Has anyone tried this data?
The same mistake I make my whole life: following outdated descriptions.
Edit: It works! Shame on me! Thanks Sentdex and Daniel Kukila, if that was his name 👍🏾🤪🤙🏾
I don't know why, but the database isn't being created for me. It runs fine and it's reading the data too, but no database. Can someone please help me out?
Can anyone share the code? It seems that the code has been removed from the website.
!! EDIT !!
For anyone having the same issue as I had: I fixed it by using the code from the website and changing comment_id = row['name'] to comment_id = row['id'], and changing parent_id to parent_id = row['parent_id'].split('_')[1].
Also don't forget to change the timeframe at the top. I changed nothing else, and I was still able to generate the database despite the error message I got in the replace_comment function, where most of the comments tell you to change the ? to {}. You don't need to do that for it to work! I hope this helps someone in the future.
For some reason I'm not getting any paired rows; the total rows read counts up just fine. Does anyone know what might cause this? I'm using dataset RC_2019-02.
The database is growing, but my command prompt is showing this:
('s-PARENT insertion', "'ascii' codec can't encode character u'\u2019' in position 10: ordinal not in range(128)")
('s-NO_PARENT insertion', "'ascii' codec can't encode character u'\u2019' in position 100: ordinal not in range(128)")
('s-NO_PARENT insertion', "'ascii' codec can't encode character u'\u2019' in position 112: ordinal not in range(128)")
Total Rows Read: 900000, Paired Rows: 35575, Time: 2019-05-17 18:47:16.282000
('s-NO_PARENT insertion', "'ascii' codec can't encode character u'\xe9' in position 64: ordinal not in range(128)")
('s-NO_PARENT insertion', "'ascii' codec can't encode character u'\xe9' in po
Is this a problem? Please help.
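Those "'ascii' codec can't encode character" errors come from Python 2, where str defaults to the ASCII codec; the dumps are UTF-8. A sketch that reads the file explicitly as UTF-8 on either Python version (io.open is the Python-2-compatible spelling of Python 3's open):

```python
import io
import json

def read_rows(path):
    """Yield decoded rows, reading the dump explicitly as UTF-8 so accented
    characters like u'\\xe9' never reach the ASCII codec."""
    with io.open(path, encoding='utf-8', buffering=1000) as f:
        for line in f:
            yield json.loads(line)
```

On Python 2, also avoid calling str() on comment bodies before inserting; keep them as unicode and let sqlite3 store them, since str() there is what raises "ordinal not in range(128)".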
I got the error "Unterminated string starting at: line 1 column 274 (char 273)" in my code. Has anyone else seen it?
How do I fix it?
OK, I'm about to throw my PC against the wall. I'm getting a loop printing "s-NO_PARENT insertion name 'transaction_bldr' is not defined." I'm a beginner and I don't know what's happening; I did it step by step, but now what?
Does the script close or stop when the Reddit data is finished being paired, or when it is "cleanin up"?
For what it's worth, using global and pass in a run-once import script isn't really a sin … well, either that or I'm seriously doomed.
with open('E:/reddit_data/{}/RC_{}'.format(timeframe.split('-')[0], timeframe), buffering=1000) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'E:/reddit_data/2015/RC_2015-01'