python pandas dataframe to dictionary
I have a two-column DataFrame and want to convert it to a Python dictionary: the first column should be the key and the second the value. Thank you in advance.
Dataframe:
   id  value
0   0   10.2
1   1    5.7
2   2    7.4
See the docs for to_dict. You can use it like this:
df.set_index('id').to_dict()
If you have only one value column, select it first so that the column name does not become an extra level in the dict (in this case you are actually using Series.to_dict()):
df.set_index('id')['value'].to_dict()
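To make the difference between the two calls concrete, here is a minimal sketch that rebuilds the question's sample data (the variable names `nested` and `flat` are only illustrative):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# DataFrame.to_dict() keeps the column name as the outer key
nested = df.set_index("id").to_dict()
print(nested)  # {'value': {0: 10.2, 1: 5.7, 2: 7.4}}

# Selecting the column first gives a Series, whose to_dict() is flat
flat = df.set_index("id")["value"].to_dict()
print(flat)  # {0: 10.2, 1: 5.7, 2: 7.4}
```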
mydict = dict(zip(df.id, df.value))
If you want a simple way to preserve duplicates, you could use groupby:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
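The same result can probably be reached without an explicit comprehension by collecting each group into a list and converting the resulting Series. This is a sketch of an alternative, not the answer's own code; the name `grouped` is illustrative:

```python
import pandas as pd

ptest = pd.DataFrame([["a", 1], ["a", 2], ["b", 3]], columns=["id", "value"])

# Collect the values of each id into a list, then turn the Series into a dict
grouped = ptest.groupby("id")["value"].apply(list).to_dict()
print(grouped)  # {'a': [1, 2], 'b': [3]}
```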
The answers by joris in this thread and by punchagan in the duplicate thread are very elegant; however, they will not give correct results if the column used for the keys contains any duplicated values.
For example:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}
If you have duplicated entries and do not want to lose them, you can use this ugly but working code:
>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
...
>>> mydict
{'a': [1, 2], 'b': [3]}
In some versions the code below might not work:
mydict = dict(zip(df.id, df.value))
so make it explicit:
id_ = df.id.values
value = df.value.values
mydict = dict(zip(id_, value))
Note: I used id_ because id is the name of a Python built-in function. It is not a reserved word, but shadowing it is best avoided.
You can use a dict comprehension:
my_dict = {row[0]: row[1] for row in df.values}
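One thing to watch with df.values: it returns a single NumPy array with one common dtype, so with an integer id column and a float value column the keys come out as floats. A minimal sketch, assuming the question's sample data, that compares it with itertuples (a swapped-in alternative, not part of the answer above; the names `via_values` and `via_tuples` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "value": [10.2, 5.7, 7.4]})

# df.values upcasts both columns to float64, so the keys become floats
via_values = {row[0]: row[1] for row in df.values}
print([type(k).__name__ for k in via_values])

# itertuples iterates column by column, so the integer ids stay integers
via_tuples = {k: v for k, v in df.itertuples(index=False)}
print([type(k).__name__ for k in via_tuples])
```

Because 0 and 0.0 hash and compare equal, lookups still work either way; the difference matters mainly when the keys are printed or serialized.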
Another (slightly shorter) solution for not losing duplicate entries:
>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...
>>> pdict
{'b': [3], 'a': [1, 2]}
You need a list as a dictionary value. This code will do the trick.
from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values, df.value.values):
    mydict[k].append(v)
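Simplest solution (to_dict('records') returns a list holding a single dict, so index it with [0] to get the dict itself):
df.set_index('id').T.to_dict('records')
Example:
df = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')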
If each id has several value columns (val1, val2, val3, etc.) and you want them as lists, use the code below:
df.set_index('id').T.to_dict('list')
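As a rough illustration of the multi-column case, the sketch below uses a small made-up frame; the column names val1 and val2 and the data are assumptions chosen for the example, with unique ids:

```python
import pandas as pd

# Hypothetical data: one row per id, several value columns
df = pd.DataFrame({"id": ["a", "b"], "val1": [1, 3], "val2": [2, 4]})

# After transposing, each id is a column; 'list' maps it to its values
as_lists = df.set_index("id").T.to_dict("list")
print(as_lists)  # {'a': [1, 2], 'b': [3, 4]}
```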
Reference URL: https://stackoverflow.com/questions/18695605/python-pandas-dataframe-to-dictionary