25 Questions to Ask as You Clean Data | by Rose Day | Oct, 2020


The next thing I look for is text fields. When I use text fields, I either treat them as categories, discussed next, or as plain text that will be displayed or used for additional information. But what happens if you are using your text fields for modeling? You may need to consider different kinds of cleaning or natural language processing techniques to work with your data, such as stemming, lemmatization, or removing filler words. I find text fields harder to work with because there can be variations in spelling, acronyms, mistyped information, and more.
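
As a rough illustration of the cleaning steps mentioned above, here is a minimal sketch that lowercases a text value, strips punctuation, collapses white space, and drops filler words. The `clean_text` function and the `FILLER_WORDS` list are hypothetical names made up for this example, and the word list is deliberately tiny. Stemming and lemmatization are left out because they usually rely on a dedicated NLP library.

```python
import re

# Hypothetical, deliberately tiny filler-word list for illustration only;
# a real project would likely use a fuller list from an NLP library.
FILLER_WORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is"}


def clean_text(value: str) -> list:
    """Lowercase, strip punctuation, collapse white space, drop filler words."""
    lowered = value.lower()
    # Replace anything that is not a letter, digit, or white space with a space.
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # split() with no arguments also collapses repeated white space.
    tokens = no_punct.split()
    return [token for token in tokens if token not in FILLER_WORDS]


print(clean_text("  The  quick, brown fox is in THE garden. "))
# ['quick', 'brown', 'fox', 'garden']
```

Returning a list of tokens rather than a rejoined string keeps the result easy to count, compare, or pass to a later modeling step.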

11. Are there spelling errors in the column that you need to consider?
12. Can a word or abbreviation be spelled multiple ways?
13. Could there be more than one abbreviation for the same thing?
14. Do you need to extract data from the text field? If so, will you use regular expressions, stemming, lemmatization, filler-word removal, white space trimming, etc.?
15. Do you have timestamp or numeric columns that are read as strings but should be another data type?
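
The questions above can be sketched in code using only the Python standard library. This is one possible approach under stated assumptions, not a definitive method: the `CANONICAL` and `ABBREVIATIONS` values, the function names, the similarity cutoff of 0.8, and the list of accepted date formats are all hypothetical choices for illustration. Misspellings are caught with fuzzy matching from `difflib`, known abbreviations with a lookup table, and string columns are converted with explicit parsing that returns `None` instead of raising on bad input.

```python
import difflib
from datetime import datetime

# Hypothetical canonical values and known abbreviations for a "state" column.
CANONICAL = ["california", "new york", "texas"]
ABBREVIATIONS = {
    "ca": "california",
    "calif": "california",
    "ny": "new york",
    "tx": "texas",
}


def canonicalize(value, cutoff=0.8):
    """Map an abbreviation or misspelling to a canonical value, else None."""
    key = value.strip().lower().rstrip(".")
    if key in ABBREVIATIONS:
        return ABBREVIATIONS[key]
    # Fuzzy match against the canonical list to catch spelling errors.
    matches = difflib.get_close_matches(key, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def parse_timestamp(value, formats=("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%m/%d/%Y")):
    """Try each assumed format in turn; return None if none of them fit."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
    return None


def parse_number(value):
    """Convert a string such as '1,234.50' to a float, or None on failure."""
    try:
        return float(value.replace(",", "").strip())
    except ValueError:
        return None


print(canonicalize("Calif."))         # california
print(canonicalize("Califronia"))     # california
print(canonicalize("Florida"))        # None
print(parse_timestamp("10/05/2020"))  # 2020-10-05 00:00:00
print(parse_number("1,234.50"))       # 1234.5
```

Values that come back as `None` are worth reviewing by hand, since they often point to a new abbreviation or an unexpected format rather than plain noise.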

