python - Pickle dump with progress bar


I have a big JSON object that I want to dump to a pickle file. Is there a way to display a progress bar while using pickle.dump?

The only way I know of is to define __getstate__/__setstate__ methods that return "sub-objects", which can refresh the GUI when they are pickled/unpickled. For example, if your object is a list, you could use this:

import pickle


class sublist:
    on_pickling = None  # class-level callback, called once per chunk

    def __init__(self, sublist):
        print('sublist', sublist)
        self.data = sublist

    def __getstate__(self):
        if sublist.on_pickling is not None:
            print('sublist pickle state fetch: calling sub callback')
            sublist.on_pickling()
        return self.data

    def __setstate__(self, obj):
        if sublist.on_pickling is not None:
            print('sublist pickle state restore: calling sub callback')
            sublist.on_pickling()
        self.data = obj


class listsubpickler:
    def __init__(self, data: list):
        self.data = data

    def __getstate__(self):
        print('creating sublists for pickling long list')
        num_chunks = 10
        # at least 1, so range() never gets a zero step for short lists
        span = max(1, len(self.data) // num_chunks)
        sublists = [sublist(self.data[i:(i + span)]) for i in range(0, len(self.data), span)]
        return sublists

    def __setstate__(self, subpickles):
        self.data = []
        print('restoring pickleable(list)')
        for subpickle in subpickles:
            self.data.extend(subpickle.data)
        print('final', self.data)


def refresh():
    # do something: refresh the GUI (for example, qApp.processEvents() for Qt), show progress, etc.
    print('refreshed')

If you run the following in your script,

data = list(range(100))  # large data object
list_pickler = listsubpickler(data)
sublist.on_pickling = refresh

print('\ndumping pickle of', list_pickler)
pickled = pickle.dumps(list_pickler)

print('\nloading pickle')
new_list_pickler = pickle.loads(pickled)
assert new_list_pickler.data == data

print('\nloading pickle, without on_pickling')
sublist.on_pickling = None
new_list_pickler = pickle.loads(pickled)
assert new_list_pickler.data == data

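you will see that the refresh callback gets called 10 times, once per chunk, for each dump or load. So if you have a 2 GB list to dump, and it takes 1 minute to dump, you'd want about 10 GUI refreshes per second, i.e. 60*10 = 600 refreshes in total, so you would set the number of chunks to 600.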
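To make the chunk-count reasoning concrete, here is a rough sketch of turning the callback into a text progress bar. The names Chunk, chunks_needed, make_progress_callback and dump_with_progress are made up for this illustration, and the 20-character bar and the default of 10 refreshes per second are arbitrary assumptions. In a GUI you would update a real progress widget inside the callback instead of printing.

```python
import pickle


class Chunk:
    """Hypothetical sub-object wrapper; fires a callback when it is pickled."""
    on_pickling = None  # class-level callback, called once per chunk

    def __init__(self, items):
        self.data = items

    def __getstate__(self):
        if Chunk.on_pickling is not None:
            Chunk.on_pickling()
        return self.data

    def __setstate__(self, state):
        self.data = state


def chunks_needed(estimated_seconds, refreshes_per_second=10):
    # e.g. a 60 second dump at 10 refreshes per second needs 600 chunks
    return max(1, int(estimated_seconds * refreshes_per_second))


def make_progress_callback(total, report=print):
    """Return a callable that reports one more step of a text bar per call."""
    done = 0

    def callback():
        nonlocal done
        done += 1
        filled = 20 * done // total
        report('[%s%s] %d/%d' % ('#' * filled, '-' * (20 - filled), done, total))

    return callback


def dump_with_progress(data, num_chunks, report=print):
    # span is at least 1 so that range() never gets a zero step
    span = max(1, len(data) // num_chunks)
    chunks = [Chunk(data[i:i + span]) for i in range(0, len(data), span)]
    # use the real chunk count, which can exceed num_chunks by one
    Chunk.on_pickling = make_progress_callback(len(chunks), report)
    try:
        return pickle.dumps(chunks)
    finally:
        Chunk.on_pickling = None  # do not leave the callback installed


if __name__ == '__main__':
    blob = dump_with_progress(list(range(100)), num_chunks=10)
```

Note that the bar advances per chunk, not per byte, so it is only as smooth as the chunks are similar in size.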

This code could be modified to work with a dict, a NumPy array, etc.
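As one possible adaptation for a dict, the sketch below splits the (key, value) pairs into chunks instead of slicing a list. DictChunk and DictSubPickler are hypothetical names chosen for this example, and making num_chunks a constructor argument is an assumption, not part of the code above.

```python
import pickle
from itertools import islice


class DictChunk:
    """Hypothetical wrapper around a list of (key, value) pairs."""
    on_pickling = None  # class-level callback, called once per chunk

    def __init__(self, pairs):
        self.pairs = pairs

    def __getstate__(self):
        if DictChunk.on_pickling is not None:
            DictChunk.on_pickling()
        return self.pairs

    def __setstate__(self, state):
        if DictChunk.on_pickling is not None:
            DictChunk.on_pickling()
        self.pairs = state


class DictSubPickler:
    def __init__(self, data, num_chunks=10):
        self.data = data
        self.num_chunks = num_chunks

    def __getstate__(self):
        # span is at least 1 so that small dicts still produce chunks
        span = max(1, len(self.data) // self.num_chunks)
        pairs_iter = iter(self.data.items())
        chunks = []
        while True:
            pairs = list(islice(pairs_iter, span))
            if not pairs:
                break
            chunks.append(DictChunk(pairs))
        # a dict state is never empty here, so __setstate__ always runs on load
        return {'num_chunks': self.num_chunks, 'chunks': chunks}

    def __setstate__(self, state):
        self.num_chunks = state['num_chunks']
        self.data = {}
        for chunk in state['chunks']:
            self.data.update(chunk.pairs)
```

Usage is the same as for the list version: assign a callback to DictChunk.on_pickling, then call pickle.dumps(DictSubPickler(my_dict)) and read .data from the object returned by pickle.loads.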

