xarray_regex.file_finder
Find files using a pre-regex.
Classes
FileFinder | Find files using a regular expression.
-
class xarray_regex.file_finder.FileFinder(root: str, pregex: str, **replacements: str)
Bases: object
Find files using a regular expression.
Provides abilities to ‘fix’ some part of the regular expression, to retrieve values from matches in the expression, and to create an advanced pre-processing function for xarray.open_mfdataset.
- Parameters
root (str) – The root directory of a filetree where all files can be found.
pregex (str) – The pre-regex. A regular expression with added ‘Matchers’. Only the matchers vary from file to file. See documentation for details.
replacements (str, optional) – Matchers to replace by a string: ‘matcher name’ = ‘replacement string’.
-
max_depth_scan
Maximum depth allowed when descending into the filetree to scan files.
- Type
int
-
root
The root directory of the finder.
- Type
str
-
pregex
Pre-regex.
- Type
str
-
regex
Regex obtained from the pre-regex.
- Type
str
-
pattern
Compiled pattern obtained from the regex.
- Type
re.Pattern
-
matchers
List of matchers for this finder, in order.
- Type
list of Matchers
-
segments
Segments of the pre-regex. Used to replace specific matchers. [‘text before matcher 1’, ‘matcher 1’, ‘text before matcher 2’, ‘matcher 2’, …]
- Type
list of str
-
fixed_matchers
Dictionary of matchers with a set value. ‘matcher index’: ‘replacement string’
- Type
dict
-
files
List of scanned files.
- Type
list of str
-
scanned
Whether the finder has scanned files.
- Type
bool
-
create_regex()
Create regex from pre-regex.
-
find_files()
Find files to scan.
Uses os.walk. Limits the search to max_depth_scan directory levels deep. Sorts files alphabetically.
- Raises
AttributeError – If no regex is set.
IndexError – If no files are found in the filetree.
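The depth-limited walk described above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: the function name walk_limited and its max_depth parameter are hypothetical, and "depth" is taken to mean the number of directory levels below the root.

```python
import os


def walk_limited(root, max_depth=3):
    """Return sorted file paths (relative to root) found in directories
    nested at most max_depth levels below root. Hypothetical helper."""
    root = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Depth 0 is the root itself; each nested directory adds one level.
        depth = 0 if rel_dir == os.curdir else rel_dir.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        for name in filenames:
            found.append(os.path.normpath(os.path.join(rel_dir, name)))
    found.sort()
    if not found:
        # Mirrors the documented behaviour of raising IndexError on an empty scan.
        raise IndexError(f"No files found in {root}")
    return found
```

Pruning dirnames in place is the documented way to steer os.walk when it runs top-down, which keeps the scan from visiting directories that would be discarded anyway.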
-
fix_matcher(key: Union[int, str], value: str)
Fix a matcher to a string.
- Parameters
key (int, or str, or tuple of str of length 2) – If int, is matcher index, starts at 0. If str, can be matcher name, or a group and name combination with the syntax ‘group:name’. When using strings, if multiple matchers are found with the same name or group/name combination, all are fixed to the same value.
value (str) – Will replace the match for all files.
- Raises
TypeError – Value must be a string.
TypeError – key is neither int nor str.
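To illustrate the idea of fixing a matcher by index, here is a minimal sketch built on the alternating layout documented for the segments attribute. The segment values, the build_regex helper, and the choice to escape the replacement with re.escape are all assumptions made for this example; the library may insert the value differently.

```python
import re

# Hypothetical segments: literal text and matcher regexes alternate,
# following the layout described for the `segments` attribute.
segments = ["SST_", r"(\d{4})", "-", r"(\d{2})", r"\.nc"]


def build_regex(segments, fixed):
    """Join segments, replacing each fixed matcher (keyed by matcher index)
    with its escaped replacement string. Hypothetical helper."""
    parts = []
    for i, seg in enumerate(segments):
        # Odd positions hold matchers; position 1 is matcher 0, 3 is matcher 1.
        matcher_idx, is_matcher = divmod(i, 2)
        if is_matcher and matcher_idx in fixed:
            parts.append(re.escape(fixed[matcher_idx]))
        else:
            parts.append(seg)
    return "".join(parts)


# Fix the first matcher (index 0) to the string "2007".
regex_2007 = build_regex(segments, {0: "2007"})
```

With the first matcher fixed, only filenames whose year is 2007 can match, while the second matcher still varies from file to file.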
-
fix_matchers(fixes: Optional[Dict[Union[int, str], str]] = None)
Fix multiple values at once.
- Parameters
fixes (dict) – Dictionary of matcher key: value. See fix_matcher() for details. If None, no matcher will be fixed.
-
get_files(relative: bool = False, nested: Optional[List[str]] = None) → List[str]
Return files that match the regex.
Lazily scan files: if files were already scanned, just return the stored list of files.
- Parameters
relative (bool) – If True, filenames are returned relative to the finder root directory. If not, filenames are absolute. Defaults to False.
nested (list of str) – If not None, return a nested list of filenames with each level corresponding to a group in this argument. The last group in the list is at the innermost level. A level specified as None refers to matchers without a group.
- Raises
KeyError – A level in nested is not in the pre-regex groups.
-
get_func_process_filename(func: Callable, relative: bool = True, *args, **kwargs) → Callable
Get a function that can preprocess a dataset.
Written to be used as the ‘preprocess’ argument of xarray.open_mfdataset. Allows using a function with additional arguments that can retrieve information from the filename.
- Parameters
func (Callable) – Input arguments: (xarray.Dataset, filename: str, FileFinder, *args, **kwargs). Should return a Dataset. The filename is retrieved from the dataset encoding attribute.
relative (bool) – If True, the filename is made relative to the finder root. This is necessary to match the filename against the finder regex. Defaults to True.
args (optional) – Passed to func when called.
kwargs (optional) – Passed to func when called.
- Returns
Function with the signature of the ‘preprocess’ argument of xarray.open_mfdataset.
- Return type
Callable
Examples
This retrieves the date from the filename, and adds a time dimension to the dataset with the corresponding value.
>>> import xarray as xr
>>> from xarray_regex import library
>>> def process(ds, filename, finder, default_date=None):
...     matches = finder.get_matches(filename)
...     date = library.get_date(matches, default_date=default_date)
...     ds = ds.assign_coords(time=[date])
...     return ds
...
>>> ds = xr.open_mfdataset(finder.get_files(),
...                        preprocess=finder.get_func_process_filename(
...                            process, default_date={'hour': 12}))
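The wrapping step itself can be pictured as a closure that binds the extra arguments and reads the filename from the dataset encoding. The sketch below is a simplified assumption about how such a wrapper could look, not the library's implementation: make_preprocess and FakeDataset are hypothetical names, the finder argument is reduced to a root path, and the stand-in class only carries the encoding["source"] entry where xarray records the path of the opened file.

```python
import os


def make_preprocess(func, finder_root, *args, **kwargs):
    """Wrap func(ds, filename, *args, **kwargs) into a callable taking only
    the dataset, as expected by a preprocess hook. Hypothetical helper."""
    def preprocess(ds):
        # xarray records the path of the opened file in ds.encoding["source"].
        filename = os.path.relpath(ds.encoding["source"], finder_root)
        return func(ds, filename, *args, **kwargs)
    return preprocess


class FakeDataset:
    """Stand-in for xarray.Dataset, carrying only the encoding dict."""
    def __init__(self, source):
        self.encoding = {"source": source}


seen = []


def tag(ds, filename, suffix=""):
    # Record the relative filename the wrapper extracted, then pass ds through.
    seen.append(filename + suffix)
    return ds


pre = make_preprocess(tag, os.path.join(os.sep, "data"), suffix="!")
ds = FakeDataset(os.path.join(os.sep, "data", "2007", "SST.nc"))
result = pre(ds)
```

Binding the arguments in a closure keeps the returned callable to a single dataset parameter, which is the shape a preprocess hook receives.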
-
get_matchers(key: str) → List[xarray_regex.matcher.Matcher]
Return list of matchers corresponding to key.
- Parameters
key (str) – Can be matcher name, or group+name combination with the syntax: ‘group:name’.
- Raises
KeyError – No matcher found.
-
get_matches(filename: str, relative: bool = True) → Dict[str, Dict]
Get matches for a given filename.
Apply regex to filename and return a dictionary of the results.
- Parameters
filename (str) – Filename to retrieve matches from.
relative (bool) – True if the filename is relative to the finder root directory. If False, the filename is made relative before being matched. Defaults to True.
- Returns
[{‘match’: string matched, ‘start’: start index in filename, ‘end’: end index in filename, ‘matcher’: Matcher object}, …]
- Return type
list of dict
- Raises
AttributeError – The regex is empty.
ValueError – The filename did not match the pattern.
IndexError – Not as many matches as matchers.
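The shape of the returned list can be reproduced with the standard re module. The following is only a sketch under assumptions: the pattern, the matcher names "Y" and "m", and the matches_for helper are invented for illustration, and plain strings stand in for Matcher objects.

```python
import re

# Hypothetical compiled pattern with one capturing group per matcher.
pattern = re.compile(r"SST_(\d{4})-(\d{2})\.nc")
# Hypothetical matcher names, standing in for Matcher objects.
matcher_names = ["Y", "m"]


def matches_for(filename):
    """Return one dict per matcher with the matched text and its position."""
    m = pattern.fullmatch(filename)
    if m is None:
        raise ValueError(f"{filename!r} did not match the pattern")
    if len(m.groups()) != len(matcher_names):
        raise IndexError("Not as many matches as matchers")
    return [
        {
            "match": m.group(i + 1),
            "start": m.start(i + 1),
            "end": m.end(i + 1),
            "matcher": name,
        }
        for i, name in enumerate(matcher_names)
    ]


matches = matches_for("SST_2007-03.nc")
```

As with any Match object, the start and end indices follow slice convention, so filename[start:end] gives back the matched string.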
-
property n_matchers
Number of matchers in pre-regex.
-
scan_pregex()
Scan pregex for matchers.
Add matcher objects to self. Set segments attribute.
-
set_pregex(pregex: str, **replacements: str)
Set pre-regex.
Apply replacements.
-
update_regex()
Update regex.
Set fixed matchers. Re-compile pattern. Discard previous scan results.