With the number of deployed IoT devices and the amount of collected data constantly rising, so is the number of database solutions, analytics, monitoring and visualization tools built to store and make sense of the overwhelming amount of new data. And rightly so – the investment in new smart technology and its integration often fails to meet customer expectations, mainly because the collected data is overwhelming and yields no meaningful result. It is safe to say that not many smart devices, even when working properly, actually empower users with data that is easy to understand and, ultimately, to monetize.
Since we specialize in the development of custom IoT devices, we use data analysis and visualization tools during the development process for our internal purposes. This is done separately from the client-facing IoT management implementation, which is outside our scope.
We find that by combining two open-source tools – InfluxDB, a time-series database, and Grafana, an analytics, monitoring and visualization tool – one can achieve effective and highly adaptive data visualization, log monitoring and alerting from your devices and sensors. While multiple plug-ins allow for endless visualization and data-analysis possibilities, we still see a high demand for customization and optimization of these steps so the outcomes can easily be fed back into the development process.
Examples of effective debugging and testing
Implementation
For that purpose, InfluxDB offers client libraries for multiple languages, including the influxdb-python library hosted on InfluxDB's GitHub account. Combined with other data-analysis packages Python has to offer, such as pandas for handling large data structures, NumPy for numerical features and plotly for fast interactive visualization, one can easily achieve the custom data processing needed for a given application. The most important reasons for choosing Python-based data analysis during development are the high-level library support, the flexibility of fast implementation and the major development-time savings over other language options.
To start working with InfluxDB data in a Python environment, install the influxdb package and import it into your work file. Next, we want to create a new instance of the InfluxDBClient:
from influxdb import InfluxDBClient, DataFrameClient
# Server data
myhost = 'myhost_name'
myuser = 'myuser_name'
mypass = 'mypass_name'
mydatabase = 'mydatabase_name'
client = InfluxDBClient(host=myhost, port=8086, username=myuser, password=mypass, database=mydatabase)
If unsure which database to use, the parameter (default database=None
) can be left out. For more parameter options on all functions, see the documentation. To explore the available databases, use
client.get_list_database()
which returns a list of database dictionaries. To switch between databases, proceed with
client.switch_database(mydatabase)
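The list returned by get_list_database() can be inspected directly in Python. A minimal sketch is below; the database names in the sample response are hypothetical, and the shape (a list of dicts with a 'name' key) matches what the client returns:

```python
# Sketch: get_list_database() returns a list of dicts, one per database,
# each carrying a 'name' key. The response below is a hypothetical example.
databases = [{'name': '_internal'}, {'name': 'sensors'}, {'name': 'testbench'}]

# Extract the plain database names for inspection
names = [db['name'] for db in databases]
print(names)

# Pick a target database if it exists, otherwise fall back to the first one
target = 'sensors' if 'sensors' in names else names[0]
```

From there, target can be passed straight to switch_database().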
Once a new client is created, data can be retrieved using queries built from SELECT
, FROM
and WHERE
statements and complementary clauses. If we have a time-series database database_name
, which includes measurements defined by measurement_name
with different fields labelled by field_keys
, we can retrieve data using a SELECT
statement, always specifying a complementary clause:
SELECT <field_key>
– specify single or multiple field keys
SELECT *
– select all fields
With the FROM
statement we specify the measurements of interest:
FROM <measurement_name>
– specify single or multiple measurements
FROM <database_name>.<measurement_name>
– specify measurements from a specific database
FROM *
– select from all measurements
To further filter the selected data, a WHERE
statement can be used to add conditional expressions combined with AND / OR
. Conditional expressions support the standard comparison operators: =, !=, <, <=, >=, >
. The final query should follow the form
SELECT select_clause FROM from_clause WHERE <conditional_expression> AND/OR <conditional_expression> ...
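The clause grammar above can be sketched as a query string assembled in Python; the field, measurement and condition names here are illustrative, not taken from a real schema:

```python
# Sketch: composing an InfluxQL query string from SELECT / FROM / WHERE clauses.
# 'temperature', 'humidity' and 'datapackets' are hypothetical names.
field_key = 'temperature'
measurement = 'datapackets'
conditions = ['temperature > 20', 'humidity < 80']

# Multiple conditional expressions are chained with AND
query = f"SELECT {field_key} FROM {measurement} WHERE {' AND '.join(conditions)}"
print(query)
# → SELECT temperature FROM datapackets WHERE temperature > 20 AND humidity < 80
```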
A WHERE
statement can be especially useful to limit the time frame of the selected data. Keep in mind that the whole query statement is passed to the InfluxDB query()
function as a string. For example:
results = client.query("SELECT * FROM datapackets WHERE time > '2020-02-07T15:00:00Z' AND time < '2020-02-14T00:00:00Z'")
where the start and end times are provided as single-quoted timestamps in the standard RFC3339 UTC form '2020-02-07T15:00:00Z'
. Since WHERE statements can easily get complicated, employing multiple conditions, a bind_params
parameter, in the form of a dictionary, can be specified in the Python query. The above query example can be rewritten as:
date_parms = {'start_time': '2020-02-07T15:00:00Z', 'end_time': '2020-02-14T00:00:00Z'}
results = client.query('SELECT * FROM datapackets WHERE time > $start_time AND time < $end_time', bind_params=date_parms)
Such an approach is very useful in exploratory analysis, as it allows for fast and, even more importantly, transparent changes to query conditions. A useful feature here is the epoch parameter, particularly in applications where precise measurement timestamps are important; it can be set to 'h'
, 'm'
, 's'
, 'ms'
, 'u'
or 'ns'
precision. To find out more about query construction, look at the InfluxQL data exploration documentation.
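When a numeric epoch precision such as 'ms' is used, timestamps come back as integers rather than RFC3339 strings, and it is often convenient to convert them back to datetimes. A small sketch, with a hypothetical timestamp value:

```python
from datetime import datetime, timezone

# Sketch: with epoch='ms', point timestamps are integer milliseconds since
# the Unix epoch. The value below is a hypothetical example.
ts_ms = 1581087600000  # corresponds to 2020-02-07T15:00:00Z

# Convert back to a timezone-aware datetime for human-readable output
ts = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(ts.isoformat())
# → 2020-02-07T15:00:00+00:00
```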
Once raw results are obtained, you can explore them in the raw JSON format
results.raw
or parse them into a more Python-friendly version by converting them to points – with the option of selecting a specific key – that can be iterated over and used in further analysis.
points = results.get_points()
for point in points:
    print('Time: ', point['time'])
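To see what get_points() is doing, it helps to look at the raw JSON shape: each series carries a 'columns' header and rows of 'values', and a point is one row zipped with that header. The response content below is hypothetical, for illustration only:

```python
# Sketch: the raw JSON result returned by InfluxDB has (roughly) this shape;
# measurement name, columns and values here are hypothetical examples.
raw = {
    'series': [{
        'name': 'datapackets',
        'columns': ['time', 'temperature'],
        'values': [
            ['2020-02-07T15:00:00Z', 21.5],
            ['2020-02-07T15:01:00Z', 21.7],
        ],
    }]
}

# get_points() essentially zips each 'values' row with the 'columns' header
points = [dict(zip(s['columns'], row))
          for s in raw['series'] for row in s['values']]
for point in points:
    print('Time: ', point['time'])
```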
If you are dealing with a large dataset, which is often the case with IoT applications, and fast visualization and exploratory analysis are needed, InfluxDB offers another option: direct data integration with the Python pandas library. Instead of defining an InfluxDBClient, you create an instance of the DataFrameClient, following the same principles as before.
from influxdb import InfluxDBClient, DataFrameClient
# Server data
myhost = 'myhost_name'
myuser = 'myuser_name'
mypass = 'mypass_name'
mydatabase = 'mydatabase_name'
client_df = DataFrameClient(host=myhost, port=8086, username=myuser, password=mypass, database=mydatabase)
Data can be retrieved using the same query forms and functions as in the previous case, with the difference that the obtained data comes as a pandas DataFrame
, convenient for direct analysis and visualization.
results_df = client_df.query("SELECT * FROM datapackets WHERE time > '2020-02-07T15:00:00Z' AND time < '2020-02-14T00:00:00Z'")
From there, fast data manipulation and analysis is possible thanks to the DataFrame's integrated functionality, including handling of missing data, straightforward visualization, correlation calculation, statistical functions and much more.
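A minimal sketch of that workflow, using a synthetic DataFrame that mimics what DataFrameClient might return for a measurement (the column names and values here are made up for illustration):

```python
import numpy as np
import pandas as pd

# Sketch: a synthetic time-indexed DataFrame standing in for a query result;
# 'temperature' and 'humidity' are hypothetical sensor fields.
index = pd.date_range('2020-02-07T15:00:00Z', periods=6, freq='10min')
df = pd.DataFrame({
    'temperature': [21.5, 21.7, np.nan, 22.1, 22.4, 22.2],
    'humidity': [55.0, 54.0, 53.5, np.nan, 52.0, 51.5],
}, index=index)

# Handle missing samples by interpolating between neighbouring readings
df = df.interpolate()

# Downsample the raw series to 30-minute means
half_hourly = df.resample('30min').mean()

# Correlation between the two measured quantities
corr = df['temperature'].corr(df['humidity'])
print(half_hourly)
print(corr)
```

The same handful of lines (interpolate, resample, corr) covers a surprising share of the exploratory analysis we need during device testing.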
About the author

Eva Černčič
Translates numbers into plain English. She manipulates and analyses collected data sets from prototyped systems to evaluate their performance and develop market-oriented solutions.
Partnering with the software development team, she aids in algorithm design and implementation.