idq.io
The io.py module houses our classes that interface with data discovery.
Specifically, we define classes here that abstract various types of data discovery behind a single API with which other objects will interact.
This includes different types of file-system queries and I/O (see idq.io.KWMClassifierData and idq.io.KWSClassifierData) as well as more generalized queries to databases (see idq.io.DruidClassifierData).
In this way, we can fluidly transition between various sources of data without modifying source code down the line.
Class Architecture
Each specific data discovery method is encapsulated in a separate class.
Each of these classes inherits from the parent object (idq.io.ClassifierData), which defines the standardized API.
In particular, care is taken to minimize the memory footprint and query workload here; queries are only performed when necessary and data is cached locally until it is popped.
More detail about the attributes and inheritance of idq.io.ClassifierData objects can be found in the API Reference.
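The pattern described in this section (a parent class that owns the API and the cache, with child classes that supply the actual query) can be sketched as follows. This is a minimal illustration and not idq's implementation: the class names `SketchClassifierData` and `InMemoryClassifierData`, the methods `triggers` and `pop`, the `_query` hook, and the `query_count` counter are all hypothetical. The real attributes and methods are listed in the API Reference.

```python
class SketchClassifierData:
    """Hypothetical parent class: defines the shared API and the cache."""

    def __init__(self, start, end, columns):
        self.start = start
        self.end = end
        self.columns = columns
        self._cache = {}       # channel name -> cached rows
        self.query_count = 0   # bookkeeping for this illustration only

    def _query(self, channels):
        """Child classes implement the actual data discovery."""
        raise NotImplementedError

    def triggers(self, channels):
        """Return data for the requested channels, querying only cache misses."""
        missing = [c for c in channels if c not in self._cache]
        if missing:
            self.query_count += 1
            self._cache.update(self._query(missing))
        return {c: self._cache[c] for c in channels}

    def pop(self, channel):
        """Remove a channel from the cache and return its data."""
        return self._cache.pop(channel)


class InMemoryClassifierData(SketchClassifierData):
    """Hypothetical child class: 'discovers' data in a plain dictionary."""

    def __init__(self, start, end, columns, source):
        super().__init__(start, end, columns)
        self._source = source  # channel name -> list of (time, snr) rows

    def _query(self, channels):
        return {
            c: [row for row in self._source.get(c, [])
                if self.start <= row[0] < self.end]
            for c in channels
        }


source = {"chan_a": [(1.0, 5.0), (12.0, 9.0)], "chan_b": [(3.0, 7.0)]}
data = InMemoryClassifierData(0, 10, ["time", "snr"], source)

first = data.triggers(["chan_a"])            # cache miss: one query
second = data.triggers(["chan_a"])           # cache hit: no new query
both = data.triggers(["chan_a", "chan_b"])   # only chan_b is queried
popped = data.pop("chan_a")                  # chan_a leaves the cache
```

Code that consumes the data only calls `triggers` and `pop`, so swapping `InMemoryClassifierData` for a file-based or database-backed child class would not require changes downstream.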
Data Ranges
Furthermore, each idq.io.ClassifierData object is responsible only for specific, limited data ranges.
These are set at instantiation time and should not be modified; they control which data is queried when requested.
If you modify them after instantiation, the class will no longer guarantee complete coverage for all requested channels.
This is because the object may see that a channel was already queried and skip the query during a second request; if the time periods have changed since the first query, the cached data will not match the new range.
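The mismatch can be reproduced with a small stand-in. The sketch below assumes a hypothetical class `RangeLimitedData` with `start` and `end` attributes and a `times` method; none of these names are taken from idq, and the caching logic is deliberately simplified.

```python
class RangeLimitedData:
    """Hypothetical stand-in for a ClassifierData-like object."""

    def __init__(self, start, end, source):
        self.start = start
        self.end = end
        self._source = source  # channel name -> list of event times
        self._cache = {}

    def times(self, channel):
        # A channel that was already queried is served from the cache,
        # even if start/end have been changed in the meantime.
        if channel not in self._cache:
            self._cache[channel] = [
                t for t in self._source[channel] if self.start <= t < self.end
            ]
        return self._cache[channel]


source = {"chan_a": [1.0, 5.0, 15.0], "chan_b": [2.0, 12.0]}
data = RangeLimitedData(0, 10, source)

before = data.times("chan_a")   # queried over [0, 10)

data.end = 20                   # modifying the range after instantiation

stale = data.times("chan_a")    # served from cache: still covers [0, 10)
fresh = data.times("chan_b")    # first query for chan_b: covers [0, 20)

# Creating a new object for the new range avoids the mismatch.
new_data = RangeLimitedData(0, 20, source)
complete = new_data.times("chan_a")
```

After the range is changed, `chan_a` and `chan_b` cover different time periods within the same object, which is exactly the loss of coverage described above.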
Data Structures
The data structure maintained within idq.io.ClassifierData objects is a dictionary with channel names as keys and numpy structured arrays as values.
Because of the rigidity of the structured array format (i.e., a fixed set of columns), idq.io.ClassifierData objects also declare a set of columns during instantiation that should be immutable throughout the object's lifespan.
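As a rough picture of this layout, the sketch below builds a dictionary of numpy structured arrays by hand. The channel names, the column names `time` and `snr`, and the `float64` types are assumptions chosen for illustration, not the columns idq actually uses.

```python
import numpy as np

columns = ["time", "snr"]                      # hypothetical column names
dtype = [(col, "float64") for col in columns]  # one fixed dtype for every channel

data = {
    "chan_a": np.array([(1.0, 5.5), (4.0, 8.0)], dtype=dtype),
    "chan_b": np.empty(0, dtype=dtype),        # a channel with no entries
}

snr_a = data["chan_a"]["snr"]                  # select one column by name
loud = data["chan_a"][snr_a > 6.0]             # select rows with a boolean mask
```

Because the column set is part of the array's dtype, every channel shares the same columns, and adding a column later means building new arrays rather than modifying the existing ones.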
We note that the objects do not formally make a copy of either the segment lists or the column lists, and therefore these could be accidentally modified via shared references. This is intentional, as we do not want to make an arbitrarily large number of copies of what would essentially be the same specification for many objects. The user is assumed to be responsible enough to handle this properly.
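The shared-reference behaviour is ordinary Python semantics and can be demonstrated without idq. In the sketch below, `ColumnHolder` is a hypothetical class that stores the list it is given without copying it.

```python
class ColumnHolder:
    """Hypothetical object that stores its column list without copying it."""

    def __init__(self, columns):
        self.columns = columns  # stored by reference, not copied


shared_columns = ["time", "snr"]
first = ColumnHolder(shared_columns)
second = ColumnHolder(shared_columns)

same_object = first.columns is second.columns  # one list shared by both holders

# Mutating the shared list in place changes it for every holder.
shared_columns.append("frequency")
leaked = second.columns
```

No copies are made no matter how many objects share the specification, at the cost that an in-place edit made anywhere is visible everywhere.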