The early design of the Categorical type came under strain with our streaming engine. Every data chunk carried its own mapping between categories and their underlying physical values, forcing constant re-encoding. The global StringCache we built to solve this caused lock contention and wasn't designed for a distributed architecture.
The new Categories object, released in 1.31, solves this and gives you:
• Control over the physical type (UInt8/16/32)
• Named categories with namespaces
• Parallel updates without locks
• Automatic garbage collection
When you know the categories up front you can use Enums. They're faster because of their immutability and allow you to define the sorting order of values.
The StringCache is now a no-op, but existing code will keep working as before (using global Categories). You can also migrate by replacing it with explicit Categories where needed.
The result is a Categorical data type that works well on the streaming engine without performance degradation, and is compatible with a distributed architecture.
Read the full deep dive: pola.rs/posts/categori…
pandas 3 has been released and marks the most significant evolution of #pandas in over ten years.
No more `copy()` everywhere, and no more `lambda` gymnastics.
Want examples? Read this hands-on article with the main changes: datapythonista.me/blog/whats-new…
#pandas 3 is faster and more intuitive. I just published an article with practical examples explaining the main improvements: datapythonista.me/blog/whats-new…
We're happy to announce the release of #pandas 2.3.0. You can install it with `pip install pandas` or `conda install -c conda-forge pandas`. Thanks to all contributors and sponsors who made this release possible! The release notes can be found at: pandas.pydata.org/docs/whatsnew/…
Today we are launching the first open Crash Course training sessions with a limited time discount. These instructor-led sessions are open to everyone looking to get up and running with Polars.
Find a date and sign up via our Academy: pola.rs/academy/
I've written a blog about Tonbo's research on async Rust and io_uring:
tonbo.io/blog/async-rus…
We need to be careful to avoid the cancellation problem when using async Rust and io_uring together.