How do you manage large text datasets for preprocessing? Submitted by [deleted] t3_10j6jpl on January 23, 2023 at 7:12 AM in deeplearning [deleted] 8 comments 1
[deleted] OP t1_j5ivup4 wrote on January 23, 2023 at 8:53 AM #1,448,414 Replying to timelyparadox (#1,448,295) No, I’m not. Can you share any reference material? 1
timelyparadox t1_j5iw0j4 wrote on January 23, 2023 at 8:55 AM #1,448,430 Replying to [deleted] (#1,448,414) Medium article: https://medium.com/analytics-vidhya/how-to-pre-process-large-datasets-for-machine-learning-using-spark-19500155b521 Not a perfect example, but it might help point you in the right direction. 1
developers_hutt t1_j5kwuq5 wrote on January 23, 2023 at 7:06 PM #1,455,334 Use your PC. You can use libraries like Ray or PySpark. 1
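To make the "use your PC" idea concrete, here is a minimal sketch of streaming a large text file through a cleaning step in fixed-size batches, so the whole dataset never has to fit in memory. It uses only the Python standard library, not Ray or PySpark; the cleaning rule (lowercase, collapse whitespace), the function names, and the batch size are all hypothetical choices for illustration.

```python
import itertools
import re
from typing import Iterable, Iterator, List

_WHITESPACE = re.compile(r"\s+")


def clean_line(line: str) -> str:
    """Hypothetical cleaning step: collapse whitespace, trim, lowercase."""
    return _WHITESPACE.sub(" ", line).strip().lower()


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Yield lists of at most batch_size items, reading the input lazily."""
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def preprocess(lines: Iterable[str], batch_size: int = 10_000) -> Iterator[str]:
    """Clean lines one batch at a time, dropping lines that end up empty."""
    for batch in batched(lines, batch_size):
        for line in batch:
            cleaned = clean_line(line)
            if cleaned:
                yield cleaned


def preprocess_file(src_path: str, dst_path: str, batch_size: int = 10_000) -> None:
    """Stream src_path through preprocess() and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # A file object iterates line by line, so only one batch is held at a time.
        for line in preprocess(src, batch_size):
            dst.write(line + "\n")
```

If a single process turns out to be too slow, the per-batch work is the natural unit to hand to a parallel or distributed library later, which is roughly where Ray or PySpark would come in.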
[deleted] OP t1_j5kwzqk wrote on January 23, 2023 at 7:07 PM #1,455,350 Replying to developers_hutt (#1,455,334) I think I just need to start working with Spark. 1
[deleted] OP t1_j5kxa5u wrote on January 23, 2023 at 7:09 PM #1,455,382 Replying to [deleted] (#1,455,375) Yes, Python. 2
developers_hutt t1_j5kxaun wrote on January 23, 2023 at 7:09 PM #1,455,386 Replying to [deleted] (#1,455,350) Are you using Python? Then PySpark will be easy for you. 1
timelyparadox t1_j5iu8hi wrote
Are you using Spark?