New function cheapr_table to create fast frequency tables.
Fixed additional issues flagged by R checks.
Capture of ... in case and val_match has been improved.
The safety checks in val_match have been slightly improved.
get_breaks has been rewritten in C, and its algorithm has been improved at the same time to reduce floating-point error.
For zero-range vectors, the result of get_breaks now matches the breaks generated by cut.
val_rm and na_rm have been sped up.
New functions cheapr_if_else, case and val_match to make vectorised if-else operations much cheaper.
New function with_local_seed to run expressions reproducibly with a local seed, removing the need to set a seed globally. This is especially helpful for small expressions and comparisons, since the global RNG state is left unaffected.
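The idea of a local seed can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the implementation of with_local_seed: the helper name local_seed_sketch and its arguments are hypothetical.

```r
# Hypothetical helper (not part of cheapr): evaluate `expr` under a
# temporary seed and restore the global RNG state afterwards.
local_seed_sketch <- function(expr, seed) {
  # Remember the current global RNG state, if one exists
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL

  # Restore (or remove) the global state once the function exits
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)

  set.seed(seed)
  expr  # lazily evaluated here, after the temporary seed has been set
}

a <- local_seed_sketch(runif(3), seed = 42)
b <- local_seed_sketch(runif(3), seed = 42)
identical(a, b)
```

Because the expression is a lazily evaluated argument, it only runs after the temporary seed is set, and the saved state is put back when the function exits.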
Various internal bug fixes related to the scalar functions.
Fixed a regression where NULL elements were not being correctly dropped in new_df().
New factor functions levels_rename, levels_add, levels_rm, levels_lump and levels_count.
overview columns are now abbreviated to save visual space, and histograms are printed by default.
levels_drop was not working correctly and has been fixed.
New functions cheapr_var and cheapr_rev.
get_breaks has been improved and a few small bugs have been fixed.
as_discrete gains a new argument inf_label.
Safety improvements to as_discrete.
Removed internal C++ functions as package installation was failing for some machines.
New scalar functions have been added and some existing ones renamed. Most are now prefixed with 'val_', or with 'na_' in the case of NA-specific scalar functions.
New cheap functions for binning continuous data into discrete bins. These include get_breaks, as_discrete and bin.
get_breaks finds 'pretty' break-points of numeric data very quickly.
as_discrete converts numeric data to discrete categories as a factor.
bin is a low-level function for binning numeric data into the correct bins. It can also efficiently return the corresponding break values instead of the break indices through codes = FALSE.
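The difference between break indices and break values can be illustrated with base R analogues. This sketch uses findInterval and cut rather than the cheapr functions themselves, and the data and breaks are made up for illustration.

```r
x <- c(0.5, 2.2, 3.7, 7.1, 9.9)
breaks <- seq(0, 10, by = 2)   # 0 2 4 6 8 10

# Break indices: which left-closed interval each value falls into
codes <- findInterval(x, breaks)

# Corresponding break values: the lower bound of each value's bin
values <- breaks[codes]

# Discrete categories as a factor, analogous in spirit to as_discrete
categories <- cut(x, breaks = breaks, right = FALSE)

codes
values
categories
```

Returning break values directly saves the extra indexing step shown in `breaks[codes]`.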
New function na_insert to randomly insert NA values into a vector.
New function vector_length as a hybrid between length and nrow.
gcd and scm now make use of 64-bit integers internally and can accept 'integer64' objects. scm used to return NA once the 32-bit integer limit of 2^31 - 1 was reached if the input was an integer vector. This limit has now been raised to the 64-bit integer limit of approximately 9.223372e+18, and scm now errors if that limit is exceeded.
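To see how quickly a smallest common multiple outgrows the 32-bit limit, here is a small base R sketch using doubles. The helpers gcd2 and lcm2 are hypothetical and are not the cheapr implementations.

```r
# Hypothetical helpers (not part of cheapr), using Euclid's algorithm
gcd2 <- function(a, b) {
  while (b != 0) {
    remainder <- a %% b
    a <- b
    b <- remainder
  }
  a
}
lcm2 <- function(a, b) a / gcd2(a, b) * b

int32_max <- .Machine$integer.max               # 2^31 - 1 = 2147483647
lcm_1_to_25 <- Reduce(lcm2, as.numeric(1:25))   # 26771144400

lcm_1_to_25 > int32_max   # TRUE: already beyond the 32-bit integer range
2^63                      # roughly the 64-bit integer limit, 9.223372e+18
```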
'integer64' objects are now lightly supported. They are not supported in any sequence functions or in the 'set_math' functions.
New functions new_df and named_list.
All factor levels utilities now begin with the prefix 'levels_'.
New cheap factor functions as_factor, levels_add_na, levels_drop_na, levels_drop and levels_reorder.
lag_ now uses memmove where possible.
Fixed an issue where lag_(x) was materialising x twice if x was an ALTREP integer sequence.
Range-based subsetting, e.g. sset(x, 1:10), should now be faster as memmove is used where possible.
New functions val_count and which_val for common scalar operations.
Some functions gain a 'names' argument.
Replaced calls to STRING_PTR with STRING_PTR_RO to satisfy R package checks.
lag_ should now be somewhat faster.
Fixed a small bug in lag2_ that would produce incorrect results when supplying a vector of lags and an order vector.
A signed integer overflow bug in lag2_ has been fixed. This occurred when supplying NA lags.
lag2_ no longer fills the names of named vectors when the fill value is supplied.
New function recycle to help recycle R objects to a common size.
The set functions that update by reference are now ALTREP aware and take a copy when the input is an ALTREP object.
New function lag2_ as a generalised solution for complex lags. It supports dynamic lag vectors, lags using an order vector, and custom run lengths. It doesn't support updating by reference or long vectors.
New function lag_ for very fast lags and leads on vectors and data frames. It includes a set argument allowing users to create a lagged vector by reference without copies.
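For readers unfamiliar with lag semantics, a plain base R sketch is shown here. The helper lag_sketch is hypothetical; unlike the by-reference mode described above, it always allocates a new vector, and it only handles non-negative lags.

```r
# Hypothetical helper (not part of cheapr): shift values forward by n
# positions and pad the start with `fill`.
lag_sketch <- function(x, n = 1L, fill = NA) {
  stopifnot(n >= 0)
  size <- length(x)
  n <- min(n, size)                       # cannot shift by more than the length
  c(rep(fill, n), x[seq_len(size - n)])   # always creates a copy
}

lag_sketch(1:5)                       # NA  1  2  3  4
lag_sketch(1:5, n = 2L, fill = 0L)    #  0  0  1  2  3
```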
set_round has been amended to improve floating point accuracy.
New 'set' Math operations inspired by 'data.table' and 'collapse' that transform data by reference.
Fixed an inconsistency in when sequence_() would error if supplied with a zero-length size argument.
Fixed a protection stack imbalance in count_val(x) when x is NULL.
sset has been optimised for wide data frames with many variables. It is also faster when applied to a data frame with dates, date-times and factors.
In sset, when i is a logical vector, it must match the length of x.
sset can now handle 'ALTREP' compact real sequences as well.
sset is now parallelised when i is an 'ALTREP' compact integer sequence, e.g. sset(x, 1:10).
sset now has an internal range-based subset method for 'ALTREP' integer sequences, for example those created using the : operator.
New function count_val as a cheaper alternative to e.g. sum(x == val).
Negative indexing in sset has been improved. It is also now partially parallelised.
Setting recursive to FALSE should now be faster.
'overview' objects gain an additional list element "print_digits" which is passed to the print method in order to correctly round the summary statistics without affecting the 'cheapr.digits' option globally.
factor_ and na_rm now handle data frames.
A bug in sset.data.table that caused further set calculations to produce warnings has been fixed.
is_na.POSIXlt and sset.POSIXlt have been rewritten to handle unbalanced 'POSIXlt' objects.
New function sset to consistently subset vectors in general as well as data frame rows.
overview now always returns an object of class "overview". It also returns the number of observations instead of rows so that it makes sense for vector summaries as well as data frame summaries.
sequence_ has been optimised and rewritten in C++. It now only checks for integer overflow when both from and by are integer vectors.
The internal function list_as_df has been rewritten in C++.
New function overview as a cheaper alternative to summary.
All of the NA handling functions now fall back to using is.na if an appropriate method cannot be found.
More support has been added for all objects with an is.na method.
is_na has been added as an S3 generic function which is parallelised and internally falls back on is.na if there are no suitable methods.
Additional list utility functions have been added.
Limited support for vctrs_rcrd objects has been added again.
num_na and similar functions no longer treat empty data frame rows as single observations but instead return the total number of NA values in the data frame.
Fixed a bug in row_na_counts and col_na_counts that would cause the session to crash when a column variable was a list.
For the time being, vctrs 'vctrs_rcrd' objects are no longer supported though this support may be re-added in the future.