🚚 Migrations for lamindb 1.0 #2323
base: main
Conversation
…ription field, re-purpose Transform.key as the central semantic identifier, and rename Transform.name to Transform.description
I don't really understand the point, as we have
This is something for the frontend; I shared the Claude discussion with you. After the whole exploration around the block model this will likely make more sense in
I still don't see it.
Ok, that is clearer to me now, but I am still not sure. If we have a path in
This is ready. After another review by at least @sunnyosun, we can merge this to main and then deploy the migrations against our own instances to test for a day or two. On the weekend, we can complete the entire migration. Please be aware that this is the "final core schema". We'll not change it for the year to come unless a mistake happened here. So, it's good to spot anything that looks fishy (@Koncopd already spotted some that I resolved).
FYI: the tests that hit our remote instances are commented out because I'll need to deploy the migrations first; I will only do this once it's reviewed.
Could we rename `lamindb/lamindb/core/_feature_manager.py`, line 1184 in 519b1d7
Note that the docs fail, otherwise looks good to me.
```python
        f"`name` will be removed soon, please pass '{description}' to `description` instead"
    )
else:
    raise ValueError("name doesn't exist anymore `description`")
```
It is unclear to me what this tries to convey.
I have added a few comments about minor things.
Awesome!
- `ln.Artifact.filter(visibility=None).df()` gets changed to `ln.Artifact.filter(_branch_code=None).df()` in the deletion FAQ. I don't like that we're suggesting to filter by an underscore attribute. Should we consider not making this an underscore attribute, given that `visibility` wasn't one either?
- Generally, for all things visibility, I found `visibility` to be much easier to understand than `_branch_code`, although I of course see how much power we gain with it. It reads hacky though for users, in my opinion.
- WDYT about using `obj_type` instead of `otype`? It took me a few seconds too much to understand what you meant.
- I don't get warm with `kind` at all, but well, `type` is a very overloaded term...
```python
parts = field.split("__")
if parts[0] in name_mappings:
    if parts[0] != "transform":
        logger.warning(
```
`DeprecationWarning`?
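The remapping in the snippet above can be sketched end to end. The helper below is a hypothetical, self-contained version of the pattern, not lamindb's actual code (`name_mappings` and `remap_filter_fields` are illustrative names), using `DeprecationWarning` as suggested:

```python
import warnings

# Hypothetical mapping from deprecated filter-field names to their
# replacements; illustrative only, not lamindb's actual table.
name_mappings = {"name": "description", "visibility": "_branch_code"}

def remap_filter_fields(expressions: dict) -> dict:
    """Rewrite Django-style filter kwargs that use deprecated field names."""
    remapped = {}
    for field, value in expressions.items():
        parts = field.split("__")  # "name__contains" -> ["name", "contains"]
        if parts[0] in name_mappings:
            warnings.warn(
                f"`{parts[0]}` is deprecated, use `{name_mappings[parts[0]]}` instead",
                DeprecationWarning,
                stacklevel=2,
            )
            parts[0] = name_mappings[parts[0]]
        remapped["__".join(parts)] = value
    return remapped

# Old-style filters keep working but warn:
remap_filter_fields({"name__contains": "x", "suffix": ".h5ad"})
# -> {"description__contains": "x", "suffix": ".h5ad"}
```

Because only the first `__`-separated segment is remapped, nested lookups like `name__contains` survive the rename unchanged.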
```python
else:
    if description is None:
        description = kwargs.pop("name")
        logger.warning(
```
`FutureWarning`?
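For context, the constructor-level half of this deprecation can be sketched as follows. This is a hypothetical stand-in, not lamindb's actual implementation; it uses `FutureWarning` as suggested and the error/warning messages are illustrative:

```python
import warnings

class Transform:
    """Hypothetical sketch of the `name` -> `description` kwarg deprecation."""

    def __init__(self, description=None, **kwargs):
        if "name" in kwargs:
            if description is None:
                # old usage: accept `name`, but warn that it is going away
                description = kwargs.pop("name")
                warnings.warn(
                    "`name` will be removed soon, pass `description` instead",
                    FutureWarning,
                    stacklevel=2,
                )
            else:
                # both `name` and `description` passed: ambiguous, so refuse
                raise ValueError(
                    "`name` doesn't exist anymore, pass only `description`"
                )
        self.description = description

t = Transform(name="my pipeline")  # still works, emits a FutureWarning
```

Unlike `DeprecationWarning`, a `FutureWarning` is shown to end users by default, which fits a kwarg that user scripts pass directly.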
Thanks, @Zethson. Responses below.
In fact this is backward compatible and people can keep doing
💯
The main reason for But either way I don't like
Yes, I agree. But it's still better than
Other changes

- `db_default` with Python >= 3.10 -- discussion here on GitHub

Schema changes

- Remove `_source_code_artifact` from Transform; it's been deprecated since 0.75 (222931a). If `_source_code_artifact` is populated, populate `source_code`.
- Rename `Transform.name` to `Transform.description`, because it behaves like `Artifact.description` and it's confusing because people expect versioning to happen based on `name` (5d68c5e, Claude chat)
  - In the `Transform` constructor, use `name` to populate `key` in all cases in which only `name` is passed
  - Use `key` in case `source_code is None`, via `._name_field = "key"`
  - Transform had a `description` field that was never exposed on the constructor; to be safe, we concatenated potential data in it onto the new description field (Claude chat)
  - If `key=None` and `name!=None`, use `name` to pre-populate `key`
- Rename `Collection.name` to `Collection.key` for consistency with Artifact & Transform and the high likelihood of users wanting to organize them hierarchically (2bef93a)
- Add a `_branch_code` integer on every record to model pull requests (483fb37)
  - `visibility` is coded within that code; map `visibility=0` as `_branch_code=0` as "archive"
- Add an `.aux` json field on `Record` (Claude thread)
- Add `run._status_code`, which allows writing `finished_at` in clean-up operations so that there is a run time also for aborted runs
- Rename `Run.is_consecutive` to `Run._is_consecutive`
- Add a `_template_id` FK to store the information of the generating template (whether a record is a template is coded via `_branch_code`)
- Rename `_accessor` to `otype` to publicly declare the data format as `suffix, accessor`
- Rename `Artifact.type` to `Artifact.kind` (internal Slack thread, Claude chat)
- Add `run._logfile`, which holds logs
- Add a `hash` field on `ParamValue` and `FeatureValue` to enforce uniqueness without running the danger of failure for large dictionaries (9e4b503, Claude chat)
- Add an `Artifact._curator: dict` field (068bcf7)
- Add `._expect_many` to `Feature`/`Param`, defaulting to `True`/`False`, which indicates whether values for this feature/param are expected to occur a single or multiple times for every single artifact/run (1cc1ca8)
  - If `True` (the `Feature` default), the values come from an observation-level aggregation, and a dtype of `datetime` on the observation level means `set[datetime]` on the artifact level
  - If `False`, it's an artifact-level value and `datetime` means `datetime`; this is an edge case because an arbitrary artifact would always be a set of arbitrary measurements that would need to be aggregated ("one just happens to measure a single cell line in that artifact")
  - If `False` (the `Param` default), the values mean artifact/run-level values and `datetime` means `datetime`
  - If `True`, the values would come from an aggregation; this seems like an edge case, but say when characterizing a model ensemble trained with different parameters, it could be relevant
- Remove the `transform` foreign key from artifact and collection for consistency with all other records; introduce a property and a simple filter statement instead (context: https://github.com/laminlabs/laminhub/issues/1498) (a4e0fb2)
- `TransformULabel`, `RunParamValue`, `ArtifactParamValue` (576d858)
- Rename `Run.parent` to `Run.initiated_by_run` (9fbdd6f)
- Add `_overwrite_versions`, which indicates whether versions are overwritten or stored separately; it defaults to `False` for file-like artifacts and to `True` for folder-like artifacts (see Slack thread)
- Rename `n_objects` to `n_files` for more clarity (laminlabs/lamindb-setup@eba36ba, Claude chat)

TODO:

- A field on `ULabel` called `is_concept` to distinguish ontological concepts from instances of the concepts: labels meant for labeling; for instance, you would never want to label an artifact with a ulabel `Project`, you'll only want to label with actual project values `Project 1`, `Project 2`, etc. (6bd7f62)

Materials
Needs:
Affects:
Triggered these issues:
Did not move forward with these changes
- A `Record._page_md` text field so that every entity can hold markdown content
- An `Event` model with fields
- A `Record._public` boolean that indicates whether a record is public (Claude chat)
- Rename `Feature` to `Dimension` and `FeatureSet` to `DimGroup`
- Given that `feature_sets` is a public attribute, and given that some people might want to query for `feature_values` or `param_values` outside the `.features` and `.params` managers, we should consider making `_feature_values` and `_param_values` public. This would likely also help people better understand the underlying mechanics if they're ever interested. Decided against because then there would be two ways to query by a feature or param. Everything is already supported through the lamindb-native API and is by that considerably simpler than doing this with Django, in particular since "✨ Enable to easily join features onto artifacts via `Artifact.df()`" (#2238). Also see the run docs, where nested queries are exemplified: