Basic usage

The getting started with cellpy tutorial (opinionated version)

This tutorial will help you get started with cellpy and tries to give you a step-by-step recipe. The information in this tutorial can also (most likely) be found elsewhere. If you are a novice user, jump directly to chapter 1.2.

How to install cellpy - the minimalistic explanation

If you know what you are doing, and only need the most basic features of cellpy, you should be able to get things up and running by issuing a simple

pip install cellpy

It is recommended that you use a Python environment (or conda environment) and give it an easy-to-remember name, e.g. cellpy.

You also need the typical scientific Python stack, including numpy, scipy, and pandas. It is recommended that you at least install scipy before you install cellpy (the main benefit being that you can use conda, so that you don’t have to hassle with missing C compilers if you are on a Windows machine).

Install a couple of other dependencies

You should also install some additional dependencies:

pytables is needed for working with the hdf5 files (the cellpy-files):

conda install -c conda-forge pytables

If you would like to use some of the fitting routines in cellpy, you will need to install lmfit:

conda install -c conda-forge lmfit

Two other tools that are really handy are Jupyter and the plotting library bundle holoviz. You might already have them installed. If not, I recommend that you look at their documentation (google it) and install them. You can most likely use the same method as for pytables etc.

Note! In addition to the requirements set in the setup.py file, you will also need a Python ODBC bridge for loading .res-files from Arbin testers, and possibly also for other ‘to-be-implemented’ file formats. I recommend pyodbc, which can be installed from conda-forge or using pip.

conda install -c conda-forge pyodbc

For reading .res-files (which actually are in a Microsoft Access format) you also need a driver or similar to help your ODBC bridge access them. A small hint for Windows users: if you don’t have one of the most recent Office versions, you might not be allowed to install a driver with a different bitness than the one your Office version is using (the installers are available from Microsoft). Also note that the driver needs to have the same bitness as your Python (so, if you are using 32-bit Python, you will need the 32-bit driver).
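If you are unsure which bitness your Python has, you can ask Python itself. The snippet below is a small sketch: it reports the pointer size of the running interpreter and, if pyodbc happens to be installed, lists the ODBC drivers it can see. The function names are my own and not part of cellpy.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64 depending on the pointer size of the running Python."""
    return struct.calcsize("P") * 8


def list_odbc_drivers():
    """Return the ODBC driver names pyodbc can see, or None if pyodbc is missing."""
    try:
        import pyodbc  # optional dependency, only needed for .res-files
    except ImportError:
        return None
    return pyodbc.drivers()


print(f"Python is {python_bitness()}-bit ({platform.machine()})")

drivers = list_odbc_drivers()
if drivers is None:
    print("pyodbc is not installed")
else:
    # On Windows, look for an entry that mentions the Microsoft Access driver.
    print("ODBC drivers:", drivers)
```

If the reported bitness differs from the bitness of your Access driver, that mismatch is the first thing to fix.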

For POSIX systems, I have not found any suitable drivers. Instead, cellpy will try to use mdbtools to first export the data to temporary csv-files, and then import from those csv-files (using the pandas library). You can install mdbtools using your system’s preferred package manager (e.g. apt-get install mdbtools).
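A quick way to see whether mdbtools is reachable from Python is to look for one of its executables on the PATH. This sketch assumes that the mdb-export program (part of mdbtools) is the one that matters; how cellpy itself locates the tool is not shown here.

```python
import shutil


def find_mdb_export():
    """Return the full path to mdb-export if it is on the PATH, else None."""
    return shutil.which("mdb-export")


location = find_mdb_export()
if location is None:
    print("mdb-export not found; install mdbtools with your package manager")
else:
    print("mdb-export found at", location)
```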

The tea spoon explanation

If you are used to installing stuff from the command line (or shell), then things might very well run smoothly. However, a considerable percentage of us don’t feel exceedingly comfortable installing things by writing commands inside a small black window. Let’s face it; we belong to the point-and-click (or double-click) generation, not the write-cryptic-commands generation. So, hopefully without insulting the savvy, here is a “tea-spoon explanation”.

Install a scientific stack of python 3.x

If the words “virtual environment” or “miniconda” don’t ring any bells, you should install the Anaconda scientific Python distribution. Go to www.anaconda.com and select the Anaconda distribution (press the Download Now button). And no, don’t select Python 2.7. Use at least Python 3.6. And select the 64-bit version (if you fail at installing the 64-bit version, then you can try the weaker 32-bit version). Download it and let it install.

Create a virtual environment

This step can be omitted (but it’s not necessarily very smart to do so). Create a virtual conda environment called my_cellpy (the name is not important, but it should be a name you are able to remember).

Open up a command window (you can find a command window on Windows by e.g. pressing the Windows button + r and typing cmd.exe), or even better, open up “anaconda prompt”. Then type

conda create -n my_cellpy

Then activate your environment:

conda activate my_cellpy

If you get an error message, then it could be that your Python installation is not available to you (maybe you installed as root?). If you were using the command window on Windows, try to locate the “anaconda prompt” program and run that instead.
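If you are not sure whether the activation worked, you can ask Python which environment it is running in. The sketch below relies on the CONDA_DEFAULT_ENV environment variable that conda sets on activation; the function name is just something I made up for this tutorial.

```python
import os
import sys


def describe_environment():
    """Collect a few facts that reveal which Python environment is active."""
    return {
        # Set by "conda activate"; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root folder of the environment the running interpreter belongs to.
        "prefix": sys.prefix,
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If conda_env shows my_cellpy (or whatever name you chose), you are in the right place.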

Install cellpy

conda install -c conda-forge cellpy

Note that the bit version sometimes matters, so try to make a mental note of what you selected (for example, if you plan to use the Microsoft Access odbc driver, and it is 32-bit, you should probably choose to install a 32-bit Python version (see next sub-chapter)).

If you don’t have the newest Office suite, you might need to install the Microsoft Access odbc driver, which can be downloaded from Microsoft.

Check your installation

The easiest way to check if cellpy has been installed, is to issue the command for printing the version number to the screen

cellpy info --version

If the program prints the expected version number, you probably succeeded. If it crashes, then you will have to retrace your steps, redo stuff and hope for the best. If it prints an older (lower) version number than you expect, there is a good chance that you have installed it earlier, and what you would like to do is an upgrade instead of an install:

pip install --upgrade cellpy

It could also be that you want to install a pre-release (a version that is so bleeding edge that it ends with an alpha or beta release identification, e.g. ends with .b2). Then you will need to add the --pre modifier

pip install --pre cellpy

To run a more complete check of your installation, there is a cellpy sub-command that can be helpful

cellpy info --check
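The version number can also be looked up from within Python without importing the package. This is a minimal sketch using the standard library module importlib.metadata (which needs Python 3.8 or newer); the helper name installed_version is my own.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("cellpy")
if version is None:
    print("cellpy does not seem to be installed in this environment")
else:
    print("cellpy version:", version)
```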

The cellpy command to your rescue

To help you install and control your cellpy installation, a CLI is provided with several commands, including info for getting information about your installation, and setup for helping you to set up your installation and write a configuration file.

To get more information, you can issue

cellpy --help

This will output some (hopefully) helpful text

Usage: cellpy [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
   edit   Edit your cellpy config file.
   info   This will give you some valuable information about your cellpy.
   new    Will in the future be used for setting up a batch experiment.
   pull   Download examples or tests from the big internet.
   run    Run a cellpy process.
   serve  Start a Jupyter server
   setup  This will help you to setup cellpy.

You can also get information about the sub-commands by issuing --help after them. For example, issuing

cellpy info --help

gives

Usage: cellpy info [OPTIONS]

Options:
  -v, --version    Print version information.
  -l, --configloc  Print full path to the config file.
  -p, --params     Dump all parameters to screen.
  -c, --check      Do a sanity check to see if things works as they should.
  --help           Show this message and exit.

Using the cellpy command for your first time setup

After you have installed cellpy it is highly recommended that you create an appropriate configuration file and create folders for raw data, cellpy-files, logs, databases and output data (and inform cellpy about it)

cellpy setup -i

The -i option makes sure that the setup is done interactively. The program will ask you about where specific folders are, e.g. where you would like to put your outputs and where your cell data files are located. If the folders don’t exist, cellpy will try to create them.

If you want to specify a root folder different from the default (your HOME folder), you can use the -d option e.g. cellpy setup -i -d /Users/kingkong/cellpydir
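To give an idea of what “creating the folders” amounts to, here is a rough sketch that builds a folder tree under a root directory. The sub-folder names are hypothetical placeholders chosen for this example; cellpy setup decides on (or asks you for) its own names, and this is not its actual implementation.

```python
import tempfile
from pathlib import Path

# Hypothetical sub-folder names, for illustration only.
SUBFOLDERS = ["raw", "cellpyfiles", "logs", "db", "out"]


def create_project_folders(root):
    """Create the sub-folders under root (if missing) and return their paths."""
    root = Path(root)
    created = {}
    for name in SUBFOLDERS:
        folder = root / name
        # exist_ok=True means that re-running does no harm.
        folder.mkdir(parents=True, exist_ok=True)
        created[name] = folder
    return created


# Try it out in a throw-away directory instead of your real HOME folder.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_project_folders(tmp)
    all_exist = all(path.is_dir() for path in folders.values())

print("all folders created:", all_exist)
```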

Note

If you don’t choose the -i option, you can always edit your configurations directly in the cellpy configuration file (that should be located inside your home directory on POSIX or in the Documents folder on Windows).

When you have answered all the questions, a configuration file will be made and saved to your home directory. You can always issue cellpy info -l to find out where your configuration file is located (it’s written in YAML format and it should be relatively easy to edit it in a text editor).

Running your first script

As with most software, you are encouraged to play a little with it. I hope there is some useful stuff in the code repository (for example in the examples folder).

Note

The cellpy pull command can assist in downloading both examples and tests.

Let’s start by trying to import cellpy in an interactive Python session. If you have an icon to press to start up Python in interactive mode, do that (it could also be for example an ipython console or a Jupyter Notebook). You can also start an interactive Python session if you are in your terminal window or command window by just writing python and pressing enter.

Once inside Python, try issuing import cellpy. Hopefully you should not see any error-messages.

Python 3.6.7 |Anaconda, Inc.| (default, Oct 23 2018, 14:01:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cellpy
>>>

Nothing bad happened this time. If you got an error message, try to interpret it and check if you have skipped any steps in this tutorial. Maybe you are missing the box package? If so, go out of the Python interpreter if you started it in your command window, or open another command window and write

pip install python-box

and try again.
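If you would rather find out up front which packages are missing, Python can check that without actually importing them. A small sketch follows; the list of module names is my own pick based on the packages mentioned in this tutorial (note that the python-box package is imported as box).

```python
import importlib.util


def missing_modules(module_names):
    """Return the (top-level) module names that cannot be found here."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


wanted = ["cellpy", "box", "numpy", "scipy", "pandas"]
missing = missing_modules(wanted)
if missing:
    print("missing:", ", ".join(missing))
else:
    print("all modules found")
```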

Now let’s try to be a bit more ambitious. Start up Python again if you are not still running it and try this:

>>> from cellpy import prmreader
>>> prmreader.info()

The prmreader.info() command should print out information about your cellpy settings. For example where you selected to look for your input raw files (prms.Paths.rawdatadir).

Try scrolling to find your own prms.Paths.rawdatadir. Does it look right? These settings can be changed either by re-running the cellpy setup -i command (not in Python, but in the command window / terminal window) or by editing the configuration file directly. If you re-run the setup, you probably need to use the --reset flag this time since it is not your first time running it.

What next?

For example: If you want to use the highly popular cellpy.utils.batch utility, you need to make (or copy from a friend) the “database” (an excel-file with appropriate headers in the first row) and make sure that all the paths are set up correctly in your cellpy configuration file.

Or, for example: If you would like to do some interactive plotting of your data, try to install holoviz and use Jupyter Lab to make some fancy plots and dash-boards.

And why not: make a script that goes through all your thousands of measured cells, extracts the life-time (e.g. the number of cycles until the capacity has dropped below 80% of the average of the three first cycles), and plots this versus the time the cell was put on test. And maybe color the data-points based on who was doing the experiment?
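The life-time criterion mentioned above is easy to express in plain Python. The sketch below works on a simple list of capacities (one value per cycle) with made-up numbers; getting that list out of your cellpy summaries is left out, and the function name is my own.

```python
def cycle_life(capacities, fraction=0.8, n_reference=3):
    """Return the first cycle number (1-based) where the capacity is below
    fraction * (average of the first n_reference cycles), or None if the
    capacity never drops that far."""
    if len(capacities) < n_reference:
        raise ValueError("not enough cycles to form the reference")
    reference = sum(capacities[:n_reference]) / n_reference
    limit = fraction * reference
    for cycle_number, capacity in enumerate(capacities, start=1):
        if capacity < limit:
            return cycle_number
    return None


# Made-up capacities in mAh/g, one value per cycle.
example = [100, 100, 100, 95, 90, 85, 79, 70]
print(cycle_life(example))  # prints 7
```

Here the reference is 100, so the limit is 80, and cycle 7 (79) is the first one below it.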

Configuring cellpy

How the configuration parameters are set and read

When cellpy is imported, it sets a default set of parameters. Then it tries to read the parameters from your .conf-file (located in your user directory). If it is successful, the parameters set in your .conf-file will override the default ones.

The parameters are stored in the module cellpy.parameters.prms.
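The “defaults first, then override” idea can be sketched with plain dictionaries. This is only an illustration of the principle and not how cellpy does it internally; the dictionaries stand in for the prms module and for the content of a .conf-file.

```python
import copy

# Illustrative defaults; the real ones live in cellpy.parameters.prms.
DEFAULTS = {
    "Reader": {"cycle_mode": "anode", "sep": ";"},
    "Paths": {"outdatadir": "out"},
}


def apply_overrides(defaults, overrides):
    """Return a copy of defaults where values found in overrides win."""
    merged = copy.deepcopy(defaults)
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged


# Pretend this dictionary was read from the configuration file.
from_file = {"Reader": {"cycle_mode": "cathode"}}

params = apply_overrides(DEFAULTS, from_file)
print(params["Reader"])  # prints {'cycle_mode': 'cathode', 'sep': ';'}
```

Settings that are not mentioned in the file (here sep and outdatadir) keep their default values.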

If you would like to change some of the settings during your script (or in your Jupyter notebook), e.g. if you want to use the cycle_mode option “cathode” instead of the default “anode”, then import the prms module and set new values:

from cellpy.parameters import prms

# Changing cycle_mode to cathode
prms.Reader.cycle_mode = 'cathode'

# Changing delimiter to  ',' (used when saving .csv files)
prms.Reader.sep = ','

# Changing the default folder for processed (output) data
prms.Paths.outdatadir = 'experiment01/processed_data'

The configuration file

cellpy tries to read your .conf-file when imported the first time, and looks in your user directory on POSIX or in the Documents folder on Windows (e.g. C:\Users\USERNAME\Documents on not-too-old versions of Windows) for files named .cellpy_prms_SOMENAME.conf.

If you have run cellpy setup in the cmd window or in the shell, the configuration file will be placed in the appropriate place. It will have the name .cellpy_prms_USERNAME.conf (where USERNAME is your username).
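If you want to look for such files yourself, a glob pattern does the job. The sketch below is my own helper (not cellpy code) and searches a given folder for files that follow the naming scheme described above; it is demonstrated on a temporary folder with a made-up user name.

```python
import tempfile
from pathlib import Path


def find_config_files(folder):
    """Return files in folder that match .cellpy_prms_SOMENAME.conf."""
    return sorted(Path(folder).glob(".cellpy_prms_*.conf"))


# Demonstration in a throw-away folder with one matching file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".cellpy_prms_kingkong.conf").write_text("---\n")
    (Path(tmp) / "notes.txt").write_text("not a config file\n")
    found = [path.name for path in find_config_files(tmp)]

print(found)  # prints ['.cellpy_prms_kingkong.conf']
```

On POSIX you would typically call find_config_files(Path.home()); on Windows, point it at your Documents folder instead.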

The configuration file is a YAML-file and it is reasonably easy to read and edit (but remember that YAML is rather strict with regards to spaces and indentations).

As an example, here are the first lines from one of the authors’ configuration file:

---
Paths:
  outdatadir: C:\scripts\processing_cellpy\out
  rawdatadir: I:\Org\MPT-BAT-LAB\Arbin-data
  cellpydatadir: C:\scripts\processing_cellpy\cellpyfiles
  db_path: C:\scripts\processing_cellpy\db
  filelogdir: C:\scripts\processing_cellpy\logs
  examplesdir: C:\scripts\processing_cellpy\examples
  notebookdir: C:\scripts\processing_cellpy\notebooks
  batchfiledir: C:\scripts\processing_cellpy\batchfiles
  db_filename: 2020_Cell_Analysis_db_001.xlsx

FileNames:
  file_name_format: YYYYMMDD_[NAME]EEE_CC_TT_RR

The first part contains definitions of the different paths, files and file-patterns that cellpy will use. This is probably the place where you most likely will have to do some edits sometime.

Next come the definitions needed when using a db.

# settings related to the db used in the batch routine
Db:
  db_type: simple_excel_reader
  db_table_name: db_table
  db_header_row: 0
  db_unit_row: 1
  db_data_start_row: 2
  db_search_start_row: 2
  db_search_end_row: -1

# definitions of headers for the simple_excel_reader
DbCols:
  id:
  - id
  - int
  exists:
  - exists
  - bol
  batch:
  - batch
  - str
  sub_batch_01:
  - b01
  - str
  .
  .

It’s rather long (since it needs to define the column names used in the db excel sheet). After this come the settings for the datasets and the cellreader, as well as for the different instruments. You will also find the settings for the batch utility at the bottom.

# settings related to your data
DataSet:
  nom_cap: 3579

# settings related to the reader
Reader:
  diagnostics: false
  filestatuschecker: size
  force_step_table_creation: true
  force_all: false
  sep: ;
  cycle_mode: anode
  sorted_data: true
  load_only_summary: false
  select_minimal: false
  limit_loaded_cycles:
  ensure_step_table: false
  daniel_number: 5
  voltage_interpolation_step: 0.01
  time_interpolation_step: 10.0
  capacity_interpolation_step: 2.0
  use_cellpy_stat_file: false
  raw_datadir:
  cellpy_datadir:
  auto_dirs: true
  chunk_size:
  last_chunk:
  max_chunks:
  max_res_filesize: 400000000

# settings related to the instrument loader
# (each instrument can have its own set of settings)
Instruments:
  tester: arbin
  custom_instrument_definitions_file:
  Arbin:
    chunk_size:
    detect_subprocess_need: false
    max_chunks:
    max_res_filesize: 400000000
    odbc_driver:
    office_version: 64bit
    sub_process_path:
    use_subprocess: false

# settings related to running the batch procedure
Batch:
  fig_extension: png
  backend: bokeh
  notebook: true
  dpi: 300
  markersize: 4
  symbol_label: simple
  color_style_label: seaborn-deep
  figure_type: unlimited
  summary_plot_width: 900
  summary_plot_height: 800
  summary_plot_height_fractions:
  - 0.2
  - 0.5
  - 0.3
...

As you can see, the author of this particular file most likely works with silicon as anode material for lithium ion batteries (the nom_cap is set to 3579 mAh/g, i.e. the theoretical gravimetric lithium capacity for silicon at normal temperatures). And he or she is using Windows.

By the way, if you are wondering what the ‘.’ means… it means nothing - it was just something I added in this tutorial text to indicate that there is more stuff in the actual file than what is shown here.

Interacting with your data

Read cell data

We assume that we have cycled a cell and that we have two files with results (we had to stop the experiment and re-start for some reason). The files are in the .res format (Arbin).

First, import modules, including the cellreader-object from cellpy:

import os
from cellpy import cellreader

Then define some settings and variables and create the CellpyData-object:

raw_data_dir = r"C:\raw_data"
out_data_dir = r"C:\processed_data"
cellpy_data_dir = r"C:\CellpyData"
cycle_mode = "anode" # default is usually "anode", but...
# These can also be set in the configuration file

electrode_mass = 0.658 # active mass of electrode in mg

# list of files to read (Arbin .res type):
raw_file = ["20170101_ife01_cc_01.res", "20170101_ife01_cc_02.res"]
# the second file is a 'continuation' of the first file...

# list consisting of file names with full path
raw_files = [os.path.join(raw_data_dir, f) for f in raw_file]

# creating the CellpyData object and setting the cycle mode:
cell_data = cellreader.CellpyData()
cell_data.cycle_mode = cycle_mode

Now we will read the files, merge them, and create a summary:

# if the files are given in a list, they are automatically merged:
cell_data.from_raw([raw_files])
cell_data.set_mass(electrode_mass)
cell_data.make_summary()
# Note: make_summary will automatically run the
# make_step_table function if it does not exist.

And save it:

# defining a name for the cellpy_file (hdf5-format)
cellpy_file = os.path.join(cellpy_data_dir, "20170101_ife01_cc2.h5")
cell_data.save(cellpy_file)

For convenience, cellpy also has a method that simplifies this process a little bit. Using the loadcell method, you can specify both the raw file name(s) and the cellpy file name, and cellpy will check if the raw file(s) is/are updated since the last time you saved the cellpy file - if not, then it will load the cellpy file instead (this is usually much faster than loading the raw file(s)). You can also input the masses and enforce that it creates a summary automatically.

cell_data.loadcell(raw_files=[raw_files], cellpy_file=cellpy_file,
                       mass=[electrode_mass], summary_on_raw=True,
                       force_raw=False)

if not cell_data.check():
    print("Could not load the data")
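The “is the raw file updated since last save” check can be pictured like this. The sketch compares file modification times, which is an assumption on my part (the filestatuschecker setting in the configuration file suggests that cellpy can also compare other things, such as file size). The function and file names are made up for the example.

```python
import os
import tempfile


def raw_files_are_newer(raw_files, cellpy_file):
    """Return True if the cellpy file is missing or older than any raw file."""
    if not os.path.isfile(cellpy_file):
        return True
    saved_at = os.path.getmtime(cellpy_file)
    return any(os.path.getmtime(name) > saved_at for name in raw_files)


with tempfile.TemporaryDirectory() as tmp:
    raw = os.path.join(tmp, "cell_01.res")
    saved = os.path.join(tmp, "cell.h5")
    for name in (raw, saved):
        with open(name, "w") as handle:
            handle.write("dummy")

    os.utime(raw, (1000, 1000))    # raw file last modified at t = 1000 s
    os.utime(saved, (2000, 2000))  # cellpy file saved later, at t = 2000 s
    reload_when_saved_later = raw_files_are_newer([raw], saved)

    os.utime(raw, (3000, 3000))    # raw file updated after the save
    reload_when_raw_updated = raw_files_are_newer([raw], saved)

print(reload_when_saved_later, reload_when_raw_updated)  # prints False True
```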

Another method has recently appeared in the cellpy universe: the cellpy.get method.

import cellpy

cell_data = cellpy.get(raw_file, mass=0.23)

Extract current-voltage graphs

If you have loaded your data into a CellpyData-object, let’s now consider how to extract current-voltage graphs from your data. We assume that the name of your CellpyData-object is cell_data:

cycle_number = 5
charge_capacity, charge_voltage = cell_data.get_ccap(cycle_number)
discharge_capacity, discharge_voltage = cell_data.get_dcap(cycle_number)

You can also get the capacity-voltage curves with both charge and discharge:

capacity, voltage = cell_data.get_cap(cycle_number)
# the second capacity (charge (delithiation) for typical anode half-cell experiments)
# will be given "in reverse".

The CellpyData object has several get-methods, including getting current, timestamps, etc.

Extract summaries of runs

Summaries of runs include data per cycle for your data set. Examples of summary data are charge- and discharge-values, coulombic efficiencies and internal resistances. These are calculated by the make_summary method.

Note that not all the possible summary statistics are calculated by default. This means that you might have to re-run the make_summary method with appropriate parameters as input (e.g. normalization_cycle, to give the appropriate cycle numbers to use for finding nominal capacity).
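To make one of these summary values concrete, here is a bare-bones sketch of a coulombic efficiency calculation on plain lists with made-up numbers. It is not the code make_summary uses; in particular, which capacity counts as “in” and which as “out” depends on your cell type (cycle_mode), so the argument names are deliberately generic.

```python
def coulombic_efficiency(capacity_in, capacity_out):
    """Return the per-cycle efficiency in percent: 100 * out / in."""
    if len(capacity_in) != len(capacity_out):
        raise ValueError("need one value of each kind per cycle")
    return [
        100.0 * q_out / q_in
        for q_in, q_out in zip(capacity_in, capacity_out)
    ]


# Made-up capacities (mAh/g) for two cycles.
efficiencies = coulombic_efficiency([1000.0, 800.0], [800.0, 792.0])
print(efficiencies)  # prints [80.0, 99.0]
```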

Another method is responsible for investigating the individual steps in the data (make_step_table). It is typically run automatically before creating the summaries (since the summary creation depends on the step_table). This table is interesting in itself since it contains delta, minimum, maximum and average values for the measured values per step. These are used to find out what type of step it is, e.g. a charge-step or maybe an ocv-step. It is possible to provide information to this function if you already know what kind of step each step is. This saves cellpy a lot of work.

Note that the default is to calculate values for each unique (step-number - cycle-number) pair. For some experiments, a step can be repeated many times per cycle. And if you need for example average values of the voltage for each step (for example if you are doing GITT experiments), you would need to tell make_step_table that it should calculate for all the steps (all_steps=True).
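The kind of statistics found in the step table can be illustrated with a few lines of plain Python. The sketch groups some made-up voltage readings by their (cycle-number, step-number) pair and calculates delta, minimum, maximum and average for each group. It only mimics the idea; the real make_step_table works on the raw DataFrame and does much more.

```python
from collections import defaultdict
from statistics import mean

# Made-up raw records: (cycle number, step number, voltage in V).
records = [
    (1, 1, 0.90), (1, 1, 0.50), (1, 1, 0.10),  # voltage falls during the step
    (1, 2, 0.10), (1, 2, 0.60), (1, 2, 1.00),  # voltage rises during the step
]


def step_statistics(rows):
    """Group rows by (cycle, step) and summarise the voltage of each group."""
    groups = defaultdict(list)
    for cycle, step, voltage in rows:
        groups[(cycle, step)].append(voltage)
    return {
        key: {
            "delta": values[-1] - values[0],  # last minus first reading
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
        }
        for key, values in groups.items()
    }


stats = step_statistics(records)
for key, values in stats.items():
    print(key, values)
```

The sign of delta already hints at the step type: it is negative for the first step and positive for the second.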

Create dQ/dV plots

The methods for creating incremental capacity curves are located in the cellpy.utils.ica module.
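To show what an incremental capacity (dQ/dV) curve is, here is a deliberately naive sketch that uses simple finite differences between neighbouring points. This is not the algorithm used in cellpy.utils.ica (real data is noisy and usually needs interpolation and smoothing first); the function name and the numbers are made up.

```python
def incremental_capacity(voltage, capacity):
    """Return (midpoint voltages, dQ/dV) from differences between points."""
    if len(voltage) != len(capacity):
        raise ValueError("voltage and capacity must have the same length")
    v_mid, dq_dv = [], []
    for i in range(1, len(voltage)):
        dv = voltage[i] - voltage[i - 1]
        if dv == 0:
            continue  # skip flat segments to avoid dividing by zero
        v_mid.append((voltage[i] + voltage[i - 1]) / 2)
        dq_dv.append((capacity[i] - capacity[i - 1]) / dv)
    return v_mid, dq_dv


# Made-up curve where the capacity grows linearly with the voltage.
v, dqdv = incremental_capacity([0.0, 0.5, 1.0, 1.5], [0.0, 1.0, 2.0, 3.0])
print(v)     # prints [0.25, 0.75, 1.25]
print(dqdv)  # prints [2.0, 2.0, 2.0]
```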

Save / export data

Saving data to cellpy format is done by the CellpyData.save method. To export data to csv format, CellpyData has a method called to_csv.

# export data to csv
out_data_directory = r"C:\processed_data\csv"
# this exports the summary data to a .csv file:
cell_data.to_csv(out_data_directory, sep=";", cycles=False, raw=False)
# export also the current voltage cycles by setting cycles=True
# export also the raw data by setting raw=True
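Since the files are written with “;” as separator, you have to tell whatever reads them about it. The sketch below reads such a file with the csv module from the standard library. The text and the column names are made-up stand-ins; the actual exported files have their own file names and headers.

```python
import csv
import io

# Made-up stand-in for an exported summary file (real headers differ).
exported_text = (
    "cycle;discharge_capacity;charge_capacity\n"
    "1;1000.0;800.0\n"
    "2;800.0;792.0\n"
)


def read_summary(handle, sep=";"):
    """Read delimiter-separated rows into a list of dicts with float values."""
    reader = csv.DictReader(handle, delimiter=sep)
    return [
        {key: float(value) for key, value in row.items()}
        for row in reader
    ]


rows = read_summary(io.StringIO(exported_text))
print(rows[0]["discharge_capacity"])  # prints 1000.0
```

For a real file, replace io.StringIO(exported_text) with open(your_file_name, newline="").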

Working with the pandas.DataFrame objects directly

Warning

The package authors are seriously considering re-naming several of the classes and DataFrames. The methodology presented below will be the same in spirit, but the actual names will change. Soon. Very soon.

The CellpyData object stores the data in several pandas.DataFrame objects. The easiest way to get to the DataFrames is by the following procedure:

# Assumed name of the CellpyData object: cell_data

# get the 'test':
c = cell_data.cell
# c is now a cellpy Cell object (cellpy.readers.cellreader.Cell)

# pandas.DataFrame with data vs cycle number (e.g. coulombic efficiency):
summary_data = c.summary

# pandas.DataFrame with the raw data:
raw_data = c.raw

# pandas.DataFrame with statistics on each step and info about step type:
step_info = c.steps

You can then manipulate your data with the standard pandas.DataFrame methods (and pandas methods in general).

Note

At the moment, CellpyData objects can store several sets of test-data (several ‘tests’). They are stored in a list. It is not recommended to utilise this ‘possible to store multiple tests’ feature as it might be removed very soon (have not decided upon that yet).

Happy pandas-ing!

Data mining / using a database

One important motivation for developing the cellpy project is to facilitate handling many cell testing experiments within a reasonable time and with a “tunable” degree of automation. It is therefore convenient to be able to couple the meta-data (look-up) to some kind of database, as well as to save the extracted key parameters to either the same or another database (where I recommend the latter). The database(s) will be a valuable asset for further data analyses (either using statistical methods, e.g. Bayesian modelling, or as input to machine learning algorithms, for example deep learning using CNNs).
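As a taste of what saving key parameters to a separate database could look like, here is a small sketch using sqlite3 from the standard library. Note that this is my own illustration: the database used by the cellpy batch utility is an excel-file, and the table layout, column names and numbers below are all made up.

```python
import sqlite3


def store_key_parameters(connection, rows):
    """Create the table if needed and insert (or update) one row per cell."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS key_parameters ("
        "cell_name TEXT PRIMARY KEY, "
        "cycle_life INTEGER, "
        "first_cycle_efficiency REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO key_parameters VALUES (?, ?, ?)", rows
    )
    connection.commit()


# An in-memory database keeps the example free of side effects.
connection = sqlite3.connect(":memory:")
store_key_parameters(
    connection,
    [("20170101_ife01_cc", 250, 80.0), ("20170102_ife02_cc", 310, 82.5)],
)

# Look up all the cells that lasted more than 300 cycles.
long_lived = connection.execute(
    "SELECT cell_name FROM key_parameters "
    "WHERE cycle_life > ? ORDER BY cell_name",
    (300,),
).fetchall()
connection.close()

print(long_lived)  # prints [('20170102_ife02_cc',)]
```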

TODO.


Using some of the cellpy special utilities