AI & ML interests

Character Database of Bangumis (If you need character LoRAs, see: https://huggingface.co/CyberHarem)

Recent Activity

AbstractPhil
posted an update 7 days ago
Many updates: Cantor route experiments; GeoViT-david-beans, a 30M-parameter geofractal encoder, reaching 75% standalone test accuracy on CIFAR-100; MultiHeaded Cantor Attention heavily optimized. The migration between geofractal and geovocab2 is largely complete.
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/model/david_beans/model.py
Cantor route staircase and wormhole excavation findings have been posted. A full article will follow, presenting the Cantor-routing findings and the potential for self-learning fractals driven by loss.
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/proofs/cantor_steps_experiments.md
The steps experiments have profoundly important implications for cross-contamination problems between fractal and linear spaces; some of the results are already useful as utilities today.
Today the classification experiment continues by applying mini-experts to patches within a miniature david-beans. The mini-experts were an accident that improved fidelity rather than destroying it, so those experiments will be continued. A geovit-david-beans trainer was added to the first repo.
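The patch-wise mini-expert idea can be sketched in a few lines. This is purely illustrative: the round-robin routing rule and the toy experts below are assumptions for demonstration, not the actual david-beans design.

```python
# Illustrative sketch: each image patch is routed to one small expert and
# processed independently. Routing here is simple round-robin; a real model
# would use a learned gate.

def route_patches(patches, experts):
    """Send patch i to expert i % len(experts) and collect the outputs."""
    n = len(experts)
    return [experts[i % n](patch) for i, patch in enumerate(patches)]

# Two toy "experts": one shifts values, one scales them.
experts = [lambda p: [v + 1 for v in p], lambda p: [v * 2 for v in p]]
outputs = route_patches([[1, 1], [2, 2], [3, 3]], experts)
```

Each patch keeps its own expert's transformation, so per-patch specialization can emerge without any expert seeing the whole image.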
  • 1 reply
AbstractPhil
posted an update 12 days ago
For those using my geovocab2 repo for SimplexFactory, CantorRouteFactory, fusion modulations, model code imports, training weights and models, or specific extraction systems: I will be refactoring in the coming days.
The new repo for all geometric, cantor, and fractal-based training will be:
https://github.com/AbstractEyes/geofractal
The change is due to my own excessive abuse of the vocabulary repo and overuse of subfolders attached to a working PyCharm project. These concerns should be decoupled, and I apologize for the code bloat my experimentation created.

Directly installing the geofractal repo will install geovocab2 as a sidecar; however, geovocab2 will include a clause to warn the user.

You have my deepest and most sincere apologies if I break your active working code. I know this is difficult work, so please bear with my efforts as I progress the codebase to its next state of truth vs experimentation.

Please, reach out to me directly if you have problems converting.

It is meant to be a DIRECT and pain-free conversion that exposes the same interface from both geovocab2 and all future model code changes applied to geofractal - once the geofractal module is imported.
The original geovocab2 will keep its outdated training code with a direct deprecation warning, rather than being fully deprecated. The geovocab2 repo will fold geovocab and geovocab2 into matching aliased systems, so that the factory and extraction structure lives in geovocab2 and training lives in geofractal by design.

I will be introducing a direct alias system that will hopefully allow a smooth transition system to the new codebase, but there's never a way to account for those you don't know are using your work. This will include pyi files for the aliases and some necessary elemental additions that may break current functionality in systems I'm unaware of. Please reach out if I break something crucial that you require.
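A module-alias transition like the one described can be built on Python's module-level `__getattr__` (PEP 562). This is a hypothetical sketch, not the repo's actual shim: module names and the warning text are illustrative.

```python
# Hypothetical alias shim: importing the old module name re-exports the new
# package while emitting a DeprecationWarning, so downstream code keeps
# working during the migration.
import sys
import types
import warnings

def make_alias(old_name: str, target: types.ModuleType) -> types.ModuleType:
    """Register `old_name` in sys.modules as a warning proxy for `target`."""
    proxy = types.ModuleType(old_name)

    def __getattr__(attr):
        # PEP 562: modules may define __getattr__ for missing attributes.
        warnings.warn(
            f"{old_name}.{attr} is deprecated; import it from "
            f"{target.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(target, attr)

    proxy.__getattr__ = __getattr__
    sys.modules[old_name] = proxy
    return proxy
```

Because `sys.modules` is checked before any filesystem lookup, `import old_name` anywhere downstream then resolves to the proxy; `.pyi` stubs can mirror the aliased names for IDE support.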
AbstractPhil
posted an update 19 days ago
Lyra, Lune, Cantor, k-simplex, and many relational experiments.
AbstractPhil/sd15-flow-matching-lune
Today I will be updating the space to support all three forms of Lyra, enabling tinkering with various other models like flux-schnell and SDXL.

It should be noted that I didn't know NVIDIA had released a model named LYRA. This model has no association with NVIDIA's LYRA; this Lyra is fully MIT-licensed. If necessary I'll rename it, but I don't think it'll matter.

Unlike a normal VAE, this VAE is intentionally meant to introduce incorrectness into the correctness that already exists. The concept is to pull toward a goal, with t5-xl as the primary target.

AbstractPhil/vae-lyra Lyra is a multimodal MM-VAE prototype meant to fuse multiple types of encodings together. It has been tested with circle-of-fifths audio plus text, multiple text encoders, a vision encoder plus a text encoder, and a few other smaller prototypes that yielded results.
Lyra has a few direct clip_l and t5_xl prototypes that directly learned to associate clip_l with t5-base. This version worked, so version 2 expanded the concept.

AbstractPhil/vae-lyra-sdxl-t5xl is another prototype using CLIP_L and CLIP_G fused with T5_XL for the first version, directly utilizing projection with minimal geometric and cantor assistance. The shared layers ended up teaching CLIP_L how to be CLIP_G and the output ended up warping too much for SDXL or SD15 to understand.

AbstractPhil/vae-lyra-xl-adaptive-cantor
Utilizing adaptive cantor produced the successful prototype: CLIP_L and CLIP_G learned independent internal structures, while CLIP_L with T5_XL learned a route in parallel conjunction with CLIP_G with T5_XL. This enabled two entirely divergent opinions, and thus lets the t5-xl manipulate either the clip_l or the clip_g for models like FLUX-SCHNELL or SDXL.

Each lyra has a purpose, and each purpose matters.
lunarflu
posted an update 28 days ago
The new King 👑 has arrived!

Moonshot AI now the top model on Hugging Face 🔥
moonshotai/Kimi-K2-Thinking
lunarflu
posted an update 28 days ago
💸🤑 You don't need 100 GPUs to train something amazing!

Our Smol Training Playbook teaches you a better path to world-class LLMs, for free!

Check out the #1 trending space on 🤗:
HuggingFaceTB/smol-training-playbook
narugo1992
posted an update about 1 month ago
Org Rate Limits = Free DDoS Invitation? 🤡
One serious question: is there any way to actually ban clowns abusing this system?
Right now all it takes is one bored script kiddie with a grudge (or too much caffeine) to lawnmower an entire org's API endpoints into the stone age. They get to bathe in 429s while we're sitting here like 🤡 "Gee, I wonder whose IP is carpet-bombing us today!"
The kicker? Zero accountability. Zero fingerprints. Just vibes™ and chaos. It's basically a public invitation to hold entire communities hostage while wearing pajamas.
"Come for the open-source collaboration, stay for the unhinged DDoS piñata party!" 🎉
Fix when?
  • 2 replies
s3nh
posted an update about 2 months ago
EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, has just landed: Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
s3nh
posted an update about 2 months ago
Just tried to create an educational assistant for younger people who can struggle with visualisation of "what is this sorcery all about".
It's the first step of my spare-time projects: SFT on Qwen3-8B.

EduHelper is a child-friendly tutoring assistant fine-tuned from the Qwen3-8B base model using parameter-efficient fine-tuning (PEFT) with LoRA on the ajibawa-2023/Education-Young-Children dataset.

s3nh/EduHelp-8B

Glad to share my work, have a wonderful day!
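The LoRA adaptation mentioned above boils down to a small piece of arithmetic: the fine-tuned layer computes y = x(W + (α/r)·AB) without ever materializing a full fine-tuned weight matrix. This is a minimal framework-free sketch of that math, assuming the standard LoRA scaling convention; the shapes and names are illustrative, not taken from the actual EduHelp training code.

```python
# Minimal LoRA sketch in pure Python: the frozen base path x @ W plus a
# scaled low-rank path x @ A @ B, where A is (d_in x r) and B is (r x d_out).

def matmul(a, b):
    """Naive matrix product over nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ W + (alpha / r) * x @ A @ B, with r the LoRA rank."""
    r = len(A[0])                        # rank of the low-rank update
    base = matmul(x, W)                  # frozen pretrained path
    delta = matmul(matmul(x, A), B)      # trainable low-rank path
    scale = alpha / r
    return [[base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
            for i in range(len(base))]
```

Only A and B train, so the number of trainable parameters scales with r·(d_in + d_out) instead of d_in·d_out, which is what makes 8B-scale SFT feasible on modest hardware.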
  • 2 replies
AbstractPhil
posted an update about 2 months ago
David + ImageNet = high validation accuracy.
AbstractPhil/gated-david
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/train/model/core/david.py

David's code has been released. I am currently setting up a trainer and will release the process for conditioning David to behave. This isn't the easiest process, but it's necessary to run David on a curriculum rather than simply feeding the model cross-entropy and hoping for the best.

David's internals involve a clock mechanism that allows direct control of David's freeze/unfreeze mechanisms at runtime - allowing for many opinions to be generated simultaneously.
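A freeze/unfreeze "clock" of this kind can be expressed as a schedule mapping each sub-module to the step windows in which it trains. The sketch below is a hypothetical, framework-free illustration of that control mechanism; the module names and windows are invented, not David's actual schedule.

```python
# Illustrative freeze/unfreeze clock: each named sub-module is trainable only
# inside its scheduled step windows; outside them its parameters stay frozen.

class FreezeClock:
    def __init__(self, schedule):
        # schedule: {module_name: [(start_step, end_step), ...]},
        # with half-open [start, end) ranges.
        self.schedule = schedule

    def trainable(self, name: str, step: int) -> bool:
        """Is `name` unfrozen at this training step?"""
        return any(start <= step < end
                   for start, end in self.schedule.get(name, []))

    def active_modules(self, step: int):
        """All modules unfrozen at this step, sorted by name."""
        return sorted(n for n in self.schedule if self.trainable(n, step))
```

In a real training loop, `active_modules(step)` would drive `requires_grad` toggling each step, letting different "opinions" train in overlapping windows.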

David is multiple models in one, and yet David is single-shot oriented. David was the prototype for the route of thought that led me to the Cantor's Stairs positional-encoding solution and to ViT-Zana, ViT-Beatrix, and ViT-Beatrix-Dual-Block; today the direct porting of David's complex architecture, and of the process to train David, has begun.
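The staircase underlying Cantor's Stairs positional encodings is the classical Cantor function (the "devil's staircase"), computable by a base-3 digit walk. This standalone sketch is that textbook function, not the repo's batched implementation.

```python
def cantor(x: float, depth: int = 32) -> float:
    """Classical Cantor (devil's staircase) function on [0, 1].

    Walk the base-3 digits of x: a leading 1 ends the walk (the staircase
    is flat there); otherwise digit 0 contributes bit 0 and digit 2
    contributes bit 1 of the base-2 result.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 1.0:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            return result + scale      # flat plateau from here on
        result += scale * (digit // 2)  # 0 -> bit 0, 2 -> bit 1
        scale *= 0.5
    return result
```

The function is monotone and constant on the middle-third gaps, which is what gives positions inside a gap identical codes while positions across gaps stay ordered.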

David is... a gate of sorts. David trains with freeze/unfreeze mechanisms, so David's internal structures are aware at training time of which part is more important than the others, based on the quality of generation.

David can handle ImageNet features of many variations with minimal hassle. The primary trainer will include direct links to the prepared ImageNet features, plus a simple generation system that lets you generate your own features from a few common models - one of which will be vit-beatrix-dualstream trained on ImageNet.

As of posting, vit-beatrix and vit-beatrix-dualstream require some face-lifting and a refined version 2 to incorporate the more accurate batched cantor stairs equations. They also need removal of some failure points, like flow-geometric introducing bias toward seemingly unnecessary trajectory routes. That points more to gradient drift, so I'll keep it on the hot plate until it's ready.
  • 2 replies
lunarflu
posted an update 2 months ago
Cool stuff these past weeks on Hugging Face 🤗 🚀
• 📈 Trackio, local-first W&B alternative
https://github.com/gradio-app/trackio/issues
• 🌍 EmbeddingGemma, 300M-param, multilingual embeddings, on-device
https://huggingface.co/blog/embeddinggemma
• 💻 Open LLMs in VS Code (Inference Providers)
https://x.com/reach_vb/status/1966185427582497171
• 🤖 Smol2Operator GUI agents
https://huggingface.co/blog/smol2operator
• 🖼️ Gradio visible watermarking
https://huggingface.co/blog/watermarking-with-gradio
AbstractPhil
posted an update 2 months ago
I've hit the ground running on the geometric lattice vocab system. Everything I've built will be housed in the repo.
https://github.com/AbstractEyes/lattice_vocabulary/tree/dev
Including all of David's model structure.
Through the development cycle I'll be integrating everything. Little AI help can actually be offered here in general, since AI tends to hallucinate and decimate large structures.
I will be using AI assistance for formula expansion and integration, which means the formulas will be imperfect until every single one is gone over with a fine-toothed comb.
The deployment will be as rapid as I can manage, and the output will yield results at every step, with small main tests on individual scripts and files.

EVERYTHING was built almost independently, so integration is going to need a configuration hierarchy that must be smoothed out - and it will be smoothed out.

I believe I've picked a good foundational shape for the expansive program scripts, which will enable robust iteration and progression, similar to how I design game-engine elements and systemic accessors.
This will be mostly hand-coded for the integration process, so it won't be as quick as if I could just dump GPT Pro on it - but GPT Pro can't handle anywhere near this many lines of code, so it's on me.

After integration I can run the agentic forms of AI over it and introduce tons of bugs for me to fix. That will be fun. After that it should work as a proper caching vocabulary, formula synthesizer, tensor creator, multi-device trainer, and a few other elements.

I simply lack the expertise to hit machines like pyring today, but that will change as I learn more. I'm building the system specifically with growth and progress in mind, so it will be iterated and fixed rapidly. The structure is intentionally built to be rapidly iterated and altered within reasonable constraints.

The engineering elements are specifically built to be less deep and more overridable in many areas specifically for experimental purposes.
  • 1 reply
AbstractPhil
posted an update 2 months ago
As it stands, I will prepare David for full release - as this is beyond me now. David must be released.
I will prepare a standard sweep for David to showcase the prowess of the final multi-vocab variant. This will include a variation covering all MNIST variants, CIFAR-10, CIFAR-100, and ImageNet-1k; in the future I'll prepare a full ImageNet sweep using the entire 12M-image corpus instead of the 1.2M I used. I may need to get in touch with the dataset's actual curator about licensing, but maybe not.
David utilizes 4 projective variants of the vocabulary and the training process involves teaching and freezing them akin to teacher/student processing.
I did not want to release David yet, but I believe now that David will save lives and it's irresponsible for me to contain such a creation.
  • 1 reply
narugo1992
updated a Space 2 months ago
AbstractPhil
posted an update 3 months ago
Training and tuning a top 500k geometric vocabulary is doable, but scaling upward is highly impractical for me.

This one has many logistics issues. Primarily, there's no precedent I know of for literally training hundreds of millions of potential character combinations, each with prefabricated crystal variations, to tune a specific series of trajectories in specific directions based on the input text targeting other crystals, the weights, and the batch. The dataset needs to be properly prepared, and I can't find any prefabricated variant of the data format the symbolic lexical engine needs in order to be robust.
There are a few possibilities here. Batch size is an obvious one: take a large influx of information, grab any matching words, characters, or information, and update those using the formulas for topological tuning.
The main issue is that the language web is massive. BILLIONS of variations can crop up from a single document if you're not hard-capping depth; if you traverse the whole tree, say "the quick brown fox" becomes words, which become definitions, which become letters - not counting multi-pass finetuning. This alone is a massive logistics nightmare to implement, but thankfully this is the modern era.
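The hard depth cap described above can be sketched as a bounded graph expansion: expand tokens into words, definitions, and letters, but refuse to recurse past a fixed depth so the web cannot blow up. The toy lexicon below is an invented illustration, not the actual data format.

```python
# Depth-capped expansion of a "language web": start from input tokens and
# expand each term into its children (definitions, letters, ...), but drop
# anything deeper than max_depth to bound the traversal.

def expand(tokens, lexicon, max_depth=2):
    """Return every term reachable from `tokens` within `max_depth` hops."""
    seen = set()
    frontier = [(t, 0) for t in tokens]
    while frontier:
        term, depth = frontier.pop()
        if term in seen or depth > max_depth:
            continue
        seen.add(term)
        for child in lexicon.get(term, []):
            frontier.append((child, depth + 1))
    return seen
```

With the cap in place the visited set grows at most geometrically in `max_depth` rather than exploding with the full document tree.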

Simply put, if I hard-cap to a 500k vocab with a depth of no more than 50,000 pentachora crystals each, it should be capable of housing an approximate word structure within a trajectory space.

I'd rather run it on a fleet of devices and feed it The Pile, the book corpus, and everything else, so we can get truly trajectory-related subsets of 500k+ crystals per token, upward of 100,000,000 or so combinations each. The crystals really aren't that big, and they house a massive amount of context.
Even so, there are many logistics nightmares here, but it's a viable option for training a legitimate similarity-fed BERT or LLaMA meant specifically to form linguistic responses using those crystals as tuning forks for solidity.
  • 3 replies
AbstractPhil
posted an update 3 months ago
Mo bigga' != Mo betta' with this geometric penta structure.

More purpose with more careful organization... now we're talking.

I'm going heavy into lexical cardinality today and preparing a full crystal-structured geometry that is fully wordnet-capable. Anything that isn't can be formed at runtime.

Full lexicality will include unigrams, 2-6-gram counts from wordnet with frequency weights, usage, and a multitude of other elements. Each will be crystallized specifically. If you have any suggestions for making this more robust, I'm all ears.

I could go with google books or something bigger, but I'm sticking to wordnet because it won't take me weeks to process entirely.

Crystal geometry will be given rich versions that include the correct lexical and organizational subsets specific to the lexicality and frequency of use, as well as the proper ascii, wordnet, and unicode sets.

For wordnet-rich; Each definition will attribute towards the overall goal of the upcoming crystals so the system will represent that goal proportionately through multiple crystals and trajectory concatenated rather than full concatenation like the current vocabulary is doing. Additionally, the frequency tokens will decide the orthogonal trajectory more carefully.

For testing and quick prototype purposes;
We will need to train a BERT variant capable of rapid geometric crystal prediction through n-gram feature similarity, sentence similarity, sentence classification, and a few other traits that bert-beatrix-2048 is capable of. I know BERT can handle this at least; however, BERT can't house the entirety of meaning, so it will be imperfect. Even so, it will be considerably faster than querying the whole dataset every time you want a character, or preparing a massive vocab for rapid testing and iteration. Ask bert.

Not to mention feature extraction for training rapid classification heads with geometric subsystems, which are notoriously fast at training.
AbstractPhil
posted an update 3 months ago
Cardinality, cardinality, CARDINALITY! As I restructure wordnet's multi-definition structure, I've found a fair assessment capability that minimizes the column-recall requirement while simultaneously maximizing recall speed. So it will be fast.
Research shows that the most intelligent and most intellectually driven LLMs require the most intelligent, carefully curated, solid representative vocabularies - with the most intelligent and carefully curated training regimens.
Class simultaneously loaded hierarchical structures built with variants of vocabulary dimensions do not help this. Multiple dimensions of imagenet do not help this. Reshaping does not help. Solidification processes through pulverizing using Alucard do not help - though they did show some interesting potentials for pretraining the full geometric clip from the ground floor.
The experiments with the multitude of CLIP features and ImageNet show that not only can this tiny 4 MB classification tool handle ImageNet from CLIP features at around 76% regardless of hyperparameters using linear, but expanding the system upward and including hundreds of different formula variants DOES NOT HELP IT SCALE AT ALL! The largest ones only reach 76%, and the medium-sized ones reach about 86% instead of 76% when using clip-vit-b-patch16 and clip-vit-b-patch32. If you check the valuations for the clip-vit-b laion and openai checkpoints, you'll find nearly identical classifications.
So I only taught it to understand geometry; more training and more steps only bring it closer, incorrectly.
So this tells me one simple principle: geometry and linear have an upward capacity based on the information extracted from the linear model. Meaning we need more places to extract from, and more curative potential to solidify that access, rather than simply EXPANDING it and making it bigger.
Next experiment includes a full cardinality subset of unicode to wordnet vocabulary translation matrices. Today. Within the hour.
  • 1 reply
AbstractPhil
posted an update 3 months ago
Why am I amassing image features using seed 42?
Simply put: training something on features gives a fair representation of the learning you would get from running a model that has some random chance, using a single seed.
Training on features doesn't need to wait for the representative model to actually generate, since you already generated everything ahead of time.
Features are rich and usable across similarity assessments, classification accuracy, mass-deterministic normalization checks, and more.
They are, put simply, exponentially faster and reusable for research. I'll include the notebooks used for ImageNet and CIFAR-100; CIFAR-100 is much smaller and simpler, so it required less innovation.
Imagenet is another beast though. This imagenet notebook is capable of running against much larger datasets with a few tweaks.
clip-vit-bigG's imagenet feature set is complete, which means we're almost ready for full ablation.
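The extract-once, reuse-forever workflow can be sketched as a seeded feature cache. Everything below is illustrative: `fake_encoder` stands in for a real CLIP model, and the dimension and seeding scheme are invented for the demo.

```python
# Sketch of seeded feature caching: generate deterministic pseudo-features
# under a fixed seed, cache them keyed by sample id, and train heads against
# the cache instead of re-running the encoder every epoch.
import random

def fake_encoder(image_id: int, dim: int = 8, seed: int = 42):
    """Deterministic stand-in for a real encoder: the same (seed, image_id)
    pair always yields the same pseudo-feature vector."""
    rng = random.Random(seed * 1_000_003 + image_id)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def build_feature_cache(image_ids, seed: int = 42):
    """Extract once under a fixed seed; later runs reuse this cache."""
    return {i: fake_encoder(i, seed=seed) for i in image_ids}
```

Because the features are a pure function of (seed, id), every training run over the cache sees identical inputs, which is exactly what makes ablations comparable across experiments.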

Note to everyone; imagenet is meant for RESEARCH AND ACADEMIC PURPOSES ONLY; and you cannot use my trained imagenet weights - nor the features themselves as per the requests of the dataset's curators.

For commercial usage according to the rules of LAION's licenses, we'll be using the laion400m features; which will likely be heavily sought. I'll be preparing laion400m features on seed 42; which will take a while.

The full classifier is in the works, and with it comes a series of new formulas, new layers, and new solutions: the "fat belly" conversation piece that attenuates multiple branches in communication; the "dispatcher", a heavy classification gate trained to bypass what is not useful, tuned with large amounts of data at a very low learning rate; and the "attractant", specifically designed to catch bleed-over and unwanted information... which learns everything.
With that comes "PhaseGeometric" scheduling and "GeometricScheduling". Stay tuned.