- Bugfix: `@summary` no longer errors with non-numeric columns. Instead, it only reports non-numeric summary stats on non-numeric columns. Minor changes to summary column names to be snake_case.
- Bugfix: Reverted a bug introduced in v0.13.4, which escaped all macros. Now, string macros remain escaped (i.e., keeping it possible to work with Unitful units, e.g. `u"psi"`), but other macros are not escaped to allow for those macros to refer to column names within arguments.
- Updated documentation on new preferred method of interpolation using `@eval` and `$`
- Added documentation on using other macros inside of TidierData macros
- Bugfix: `@slice_min` and `@slice_max` respect the `n` argument
- Adds `@head`
- Adds `extra` argument for `@separate()` and `remove` argument for `@unite()`
- Adds support for tuples and vectors as arguments to select multiple columns. Prefixing tuples/vectors with a `-` or `!` will exclude the selected columns.
- The `:` selector from Julia is now available and equivalent to `everything()`
- `@pivot_longer()` now pivots all columns if no column selectors are provided
- `unique()`, `mad()`, and `iqr()` are no longer auto-vectorized
- Bugfix: `@ungroup()` now preserves row-ordering (and is faster)
- Bugfix: `slice_sample()` now throws an error if no `n` or `prop` keyword argument is provided
- Bump minimum Julia version to 1.9
- Update Chain.jl dependency version
- Bugfix: `n()` pulls a single value and not a vector of values
- `rand()` is no longer auto-vectorized
- Add support for `begin-end` blocks for all macros accepting multiple expressions
- Bug fix to add support for expressions inside of `@group_by()`, as in `@group_by(b = a + 1)`
- Bug fix to allow `PackageName.function()` within macros to be used without escaping
- Bug fix to ensure that data type constructors are not escaped
- Adds `@relocate()`
- Adds `@unnest_wider()`
- Adds `@unnest_longer()`
- Adds `@nest()`
- Fixes tidy selection in `@unite()`
- Adds support for interpolation and tidy selection in `@fill_missing`
- Fixes tidy selection in `@separate_rows()`
- `@slice()` now supports interpolation and user-defined functions
- Adds `where()`
- Adds `is_number()`
- `@separate()` now supports regular expressions
- Adds `@separate_rows()`
- Update parsing engine so that non-function reserved names from the Base and Core modules (like `missing`, `pi`, and `Real`) are auto-escaped now, with the exception of names in the not_escaped[] array, which are never escaped
- Add `collect()` to not_vectorized[] array
- `@summarize()` and `@summarise()` now perform auto-vectorization in the same way as `@mutate()`, meaning that the top-level macros are now all consistent in their treatment of auto-vectorization.
- Update documentation to describe new auto-vectorization behavior and give an example of how to modify the `TidierData.not_vectorized[]` array.
- Macros used inside of verbs like `@mutate()` are now escaped, making it possible to work with Unitful units (e.g. `u"psi"`)
- `@slice()` now correctly handles `n()` in grouped data frames
- Adds `@anti_join()` and `@semi_join()`
- Adds `@slice_head()` and `@slice_tail()`
- Adds `@slice_min()`, `@slice_max()`, and `@rename_with()`
- Adds `missing_if()` and `replace_missing()`
- Add Statistics version to Project.toml
- Adds support for `everything()` selection helper.
- Adds docstrings for `everything()`, `starts_with()`, `ends_with()`, and `matches()`
- Fixes bug in `@separate()` so that the value of `into` supports interpolation.
- Fixes `!!` interpolation so that it works using normal Julia scoping rules. It no longer uses `Main.eval()` in the implementation. The way interpolation works contains some breaking changes, and the documentation has been updated accordingly.
- Fixes name conflict with `Cleaner.rename()` and `DataFrames.rename()`
- Adds `categorical()` to array of non-vectorized functions.
- Add `@fill_missing()`, `@slice_sample()`, `is_float()`, `is_integer()`, `is_string()`
- Rename `@drop_na()` to `@drop_missing()` to be consistent with Julia data types.
- Added StatsBase.jl dependency for use of `sample()` function within `@slice_sample()`
- Simplified dependency versions to ensure future compatibility with dependency updates
- Refactor macros to make them much faster and memory-efficient.
- `@group_by` no longer automatically sorts by group, which makes it much faster. This is a slight change in behavior from `dplyr` but the speed trade-off is worth it.
- Remove `TidierData_not_vectorized[]` from exports
- Add `TidierCats.jl` functions to `not_vectorized[]` list
- Export `TidierData_not_vectorized[]` to make it easier for other packages to access it
- Exposed `not_vectorized[]` as a package global variable so that the user or other packages can modify it
- Added `@separate`, `@unite`, and `@summary`
- `Tidier.jl` cloned and changed to `TidierData.jl`
- Added documentation on how to interpolate variables inside of `for` loops. Note: `!!` interpolation doesn't work inside of `for` loops because macros are expanded during parsing and not at runtime.
- Fixed bug in `parse_pivot_arg()` to enable interpolation inside of pivoting functions when used inside a `for` loop.
- Added `cumsum()`, `cumprod()`, and `accumulate()` to the do-not-vectorize list.
- Fixed bug to allow multiple columns in `@distinct()` separated by commas or using selection helpers.
- Fixed bug to ensure that `&&` and `||` are auto-vectorized
- Added docstrings and examples to show different ways of filtering by multiple "and" conditions, including `&&`, `&`, and separating multiple expressions with commas.
- Added `as_float()`, `as_integer()`, and `as_string()`
- Added `@glimpse()`
- Moved repo to TidierOrg
- Added `@drop_na()` with optional column selection parameter
- Re-exported `lead()` and `lag()` from ShiftedArrays.jl and added both to the do-not-vectorize list
- Bug fix: Fixed `ntile()` condition for when all elements are missing
- Added `@count()` and `@tally()`
- Added `@bind_rows()` and `@bind_cols()`
- Added `@clean_names()` to mimic R's `janitor::clean_names()` by wrapping the Cleaner.jl package
- Added support for backticks to select columns containing spaces.
- Added support for `ntile()`, which is on the do-not-vectorize list because it takes in a vector and returns a vector.
- Bug fix: removed selection helpers (`startswith`, `contains`, and `endswith`) from the do-not-vectorize list.
- Added `@distinct()`. It behaves slightly differently from dplyr when provided arguments in that it returns all columns, not just the selected ones.
- Added support for `n()` and `row_number()`.
- Added support for negative selection helper functions (e.g., `-contains("a")`).
- Added support for negative selection using `!` (e.g., `!a`, `!(a:b)`, `!contains("a")`).
- In `@pivot_longer()`, the `names_to` and `values_to` arguments now also support strings (in addition to bare unquoted names).
- In `@pivot_wider()`, the `names_from` and `values_from` arguments now also support strings (in addition to bare unquoted names).
- Bug fix: `@mutate(a = 1)` or any scalar previously errored because the `1` was being wrapped inside a `QuoteNode`. Now, 1 is correctly broadcasted.
- Bug fix: `@slice(df, 1,2,1)` previously returned only rows 1 and 2 (and not 1 again). `@slice(df, 1,2,1)` now returns rows 1, 2, and 1 again.
- Bug fix: added `repeat()` to the do-not-vectorize list.
- Added `@pivot_wider()` and `@pivot_longer()`.
- Added `if_else()` and `case_when()`.
- Updated documentation to include `Main.variable` example as an alternative syntax for interpolation.
- Simplified internal use of `subset()` by using keyword argument of `skipmissing = true` instead of using `coalesce(..., false)`.
- For developers: doctests can now be run locally using `runtests.jl`.
- In addition to `in` being auto-vectorized as before, the second argument is automatically wrapped inside of `Ref(Set(arg2))` if not already done to ensure that it is evaluated correctly and fast. See: https://bkamins.github.io/julialang/2023/02/10/in.html for details. This same behavior is also implemented for `∈` and `∉`.
- Added documentation and docstrings for new `in` behavior with `@filter()` and `@mutate()`.
- Improved interpolation to support values and not just column names. Note: there is a change of behavior now for strings, which are treated as values and not as column names. Updated examples in the documentation webpage for interpolation.
- Bug fix: Re-exported `Cols()` because this is required for interpolated columns inside of `across()`. Previously, this was passing tests because `using RDatasets` was exporting `Cols()`.
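
The `in` change noted in this group of entries can be illustrated in plain Julia. This is a minimal sketch of the underlying broadcasting idiom only, not TidierData's internal code; the variable names `xs` and `allowed` are hypothetical and chosen for illustration.

```julia
# Plain Base Julia; TidierData is not required for this sketch.
xs = [1, 2, 3, 4, 5]   # hypothetical column values
allowed = [2, 4]       # hypothetical lookup collection

# Broadcasting `in` over both arguments would try to pair elements
# of `xs` and `allowed` and fail because their lengths differ:
# in.(xs, allowed)     # throws DimensionMismatch

# Wrapping the second argument in Ref makes broadcasting treat it as a
# single collection; converting it to a Set makes each membership check
# a hash lookup instead of a linear scan.
mask = in.(xs, Ref(Set(allowed)))

println(xs[mask])      # [2, 4]
```

Inside `@filter()` or `@mutate()`, the changelog entry indicates that this wrapping is applied automatically, so user code can keep writing the plain `in` form.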
- Rewrote the parsing engine to remove all regular expression and string parsing
- Selection helpers now work within both `@select()` and `across()`.
- `@group_by()` now sorts the groups (similar to `dplyr`) and supports tidy expressions, for example `@group_by(df, d = b + c)`.
- `@slice()` now supports grouped data frames. For example, `@slice(gdf, 1:2)` will slice the first 2 rows from each group if `gdf` is a grouped data frame.
- All functions now work correctly with both grouped and ungrouped data frames following `dplyr` behavior. In other words, all functions retain grouping for grouped data frames (e.g., `ungroup = false`), other than `@summarize()`, which "peels off" one layer of grouping in a similar fashion to `dplyr`.
- Added `@ungroup` to explicitly remove grouping
- Added `@pull` macro to extract vectors
- Added joins: `@left_join()`, `@right_join()`, `@inner_join()`, and `@full_join()`, which support natural joins (i.e., where no `by` argument is given) or explicit joins by providing keys. All join functions ungroup both data frames before joining.
- Added `starts_with()` as an alias for Julia's `startswith()`, `ends_with()` as an alias for Julia's `endswith()`, and `matches()` as an alias for Julia's `Regex()`.
- Enabled interpolation of global user variables using `!!` similar to R's `rlang`.
- Enabled a `~` tilde operator to mark functions (or operators) as unvectorized so that Tidier.jl does not "auto-vectorize" them.
- Disabled `@info` logging of generated `DataFrames.jl` code. This code can be shown by setting an option using the new `Tidier_set()` function.
- Fixed a bug where functions were evaluated inside the module, which meant that user-provided functions would not work.
- `@filter()` now skips rows that evaluate to missing values.
- Re-export a handful of functions from the `DataFrames.jl` package.
- Added doctests to all examples in the docstrings.
- Updated auto-vectorization so that operators are vectorized differently from other types of functions. This leads to nicer printing of the generated DataFrames.jl code. For example, 1 .+ 1 instead of (+).(1,1)
- The generated DataFrames.jl code now prints to the screen
- Updated the ordering of columns when using `across()` so that each column is summarized in consecutive columns (e.g., `Rating_mean`, `Rating_median`, `Budget_mean`, `Budget_median`) instead of being organized by function (e.g. of prior ordering: `Rating_mean`, `Budget_mean`, `Rating_median`, `Budget_median`)
- Added exported functions for `across()` and `desc()` as a placeholder for documentation, though these functions will throw an error if called because they should only be called inside of Tidier macros
- Corrected GitHub actions and added tests (contributed by @rdboyes)
- Bumped version to 0.3.0
- Fixed bug with `@rename()` so that it supports multiple arguments
- Added support for numerical selection (both positive and negative) to `@select()`
- Added support for `@slice()`, including positive and negative indexing
- Added support for `@arrange()`, including the use of `desc()` to specify descending order
- Added support for `across()`, which has been confirmed to work with `@mutate()`, `@summarize()`, and `@summarise()`.
- Updated auto-vectorization so that `@summarize` and `@summarise` do not vectorize any functions
- Re-export `Statistics` and `Chain.jl`
- Bumped version to 0.2.0
- Initial release, version 0.1.0