PredictorX1
PredictorX1 t1_j38i8bn wrote
Reply to comment by HateRedditCantQuitit in [D] Is it a time to seriously regulate and restrict AI research? by Baturinsky
Bombs require special materials. Suspicious purchases of the precursors of explosives are watched. There are hundreds of millions of PCs on this planet, every one of them capable of being used to develop cryptographic software and every one of them able to execute it.
Bombs are made one at a time. Once encryption software is written, it can be copied endlessly.
PredictorX1 t1_j37nzay wrote
Reply to comment by soraki_soladead in [D] Is it a time to seriously regulate and restrict AI research? by Baturinsky
>cryptography is regulated
In practice, this mainly applies to commercial offerings. If a competent programmer wanted to implement strong encryption, all they would need is the right book.
PredictorX1 t1_j2pdcx4 wrote
Have you tried conventional image registration techniques? One common process is to manually or automatically determine matching pairs of points in the image being adjusted and a reference image, and fit linear or low-order polynomials to map the coordinates of one to the other. I'd imagine that radial basis function neural networks would be quite good at making such a mapping.
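As a concrete sketch of that mapping step (all point coordinates below are invented for illustration), a first-order polynomial (affine) map can be fit to matched control points with ordinary least squares:

```python
import numpy as np

# Hypothetical matched control points (coordinates invented):
# (x, y) in the image being adjusted, and the corresponding
# (x', y') in the reference image.
src = np.array([[10, 12], [200, 15], [190, 180], [20, 170], [100, 90]], float)
dst = np.array([[12, 10], [205, 18], [198, 186], [25, 172], [104, 92]], float)

# Fit an affine (first-order polynomial) map per output coordinate:
#   x' ~ a*x + b*y + c,   y' ~ d*x + e*y + f
A = np.column_stack([src, np.ones(len(src))])
coef_x, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)
coef_y, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)

def warp(points):
    """Map points from the adjusted image into reference coordinates."""
    P = np.column_stack([points, np.ones(len(points))])
    return np.column_stack([P @ coef_x, P @ coef_y])

print(warp(src))  # should land close to dst
```

Higher-order polynomials (or a radial basis function network, as mentioned) just change the basis used to build the design matrix; the least-squares fit works the same way.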
PredictorX1 t1_j2nk8sa wrote
Reply to [P] An old fashioned statistician is looking for other ways to analyse survival data - Is machine learning an option? by lattecoffeegirl
My recollection (feel free to correct me) is that statistical survival models are (can be?) created as a series of logistic regressions, one for each of several forecast horizons. One could keep that same structure, substituting any classifier (induced decision tree; neural network, ...) which produces an estimated probability for those logistic regressions.
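As a sketch of that structure (synthetic data, and plain NumPy gradient descent standing in for a statistics package), one logistic regression is fit per forecast horizon to the indicator "event occurred by time t":

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (invented): one covariate, exponential survival times.
n = 500
x = rng.normal(size=n)
time = rng.exponential(scale=np.exp(1.0 - 0.8 * x))  # higher x -> shorter survival

def fit_logistic(X, y, iters=2000, lr=0.1):
    """Plain gradient-descent logistic regression with an intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

# One classifier per forecast horizon: "has the event occurred by t?"
horizons = [1.0, 2.0, 4.0]
models = {t: fit_logistic(x.reshape(-1, 1), (time <= t).astype(float))
          for t in horizons}

# Estimated P(event by t) at the average covariate (x = 0): sigmoid(intercept).
for t in horizons:
    print(t, 1.0 / (1.0 + np.exp(-models[t][0])))
```

The point of the structure is that `fit_logistic` could be swapped for any classifier producing estimated probabilities, one per horizon.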
PredictorX1 t1_j2n5fy5 wrote
I do the dishes, change batteries in things that need them and waste time on Reddit.
PredictorX1 t1_j1yn6sz wrote
Reply to [D] Protecting your model in a place where models are not intellectual property? by nexflatline
Perhaps the model could be adulterated in some way which requires a reversal that is calculated remotely? My thought is that the model, as stored locally, would be unusable, and the unlocking would be a simple hash function or something similar requiring minimal telecommunications bandwidth. Similarly, if calculation of the model could be divided into parts requiring assembly in a final step, this could be worked the same way.
PredictorX1 t1_j102cz7 wrote
Reply to [D] Why are we stuck with Python for something that require so much speed and parallelism (neural networks)? by vprokopev
>Why are we stuck with Python ...
I can only speak for myself, but I have been working in analytics for a long time and I rarely use Python. Most of my analytical work is done in MATLAB, though I occasionally use machine learning shells or (matrix-capable) compiled BASIC. Since I write nearly all of my own code at the algorithm level, I can generate source code for any deployment system (SAS, SQL, Java, ...even Python!) with no need of libraries, etc.
PredictorX1 t1_j0z7qtr wrote
Reply to How to train a model to distinguish images of class 'A' from images of class 'B'. The model can only be trained on images of class 'A'. by 1kay7
This is known as one-class learning or one-class classification. You could try obtaining "background class" images (images similar to yours in resolution, overall brightness, ...) and training an ordinary classifier on the combination of the two. Obviously, the background class images cannot contain food, but searches for things unrelated to food ("nail", "dancer", "floor", "statue", ...) followed up by quick visual inspection should serve.
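A minimal sketch of the idea (random vectors stand in for image features such as color histograms; every number here is invented): combine the target class with the scraped background class and train an ordinary classifier, here a simple nearest-centroid rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for image feature vectors (e.g. color histograms); values invented.
food = rng.normal(loc=0.8, scale=0.3, size=(200, 16))        # class "A" (food)
background = rng.normal(loc=0.2, scale=0.3, size=(200, 16))  # scraped background

X = np.vstack([food, background])
y = np.array([1] * len(food) + [0] * len(background))

# Minimal stand-in for "an ordinary classifier": nearest class centroid.
centroids = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def predict(v):
    return min(centroids, key=lambda c: np.linalg.norm(v - centroids[c]))
```

Any real classifier (decision tree, neural network, ...) slots in where the centroid rule is; the essential trick is only the construction of the two-class training set.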
PredictorX1 t1_j0dkm9h wrote
Reply to [D] Waveform recognition question by Tavallist
I suggest trying to come up with simple summaries of the data to use as features. Also, I suggest attempting a relatively simple model first, such as linear discriminant or logistic regression. How many examples do you have of each of the 10 waveforms?
PredictorX1 t1_j0cz1pr wrote
The term "neural networks" covers a rather wide collection of techniques. While deep learning models consume astronomical amounts of data, older "shallow" neural networks (a single-hidden-layer MLP, for instance) are often used with observation counts in the thousands.
PredictorX1 t1_j08cat9 wrote
Reply to comment by acardosoj in [D] Industry folks, what kind of development methodology/cycle do you use? by DisWastingMyTime
In my experience, data science costs are relatively stable, and their payment is committed to on an ongoing basis by management as a necessary part of the business. The only time costs come into question is when more people are to be hired on a permanent basis. Tracking the activity itself is handled by the manager of a small team, who periodically presents results to upper management. The only real "project management" I see is done in small teams, when management assigns tasks and deploys or reports results to external entities. Tracking of progress is, again in my experience, a light activity. I just don't perceive the need for excessive formality in the management of data science.
PredictorX1 t1_j082k9y wrote
Reply to comment by acardosoj in [D] Industry folks, what kind of development methodology/cycle do you use? by DisWastingMyTime
>CRISP is not a project management methodology...
That was my point: Data science work needs a technical procedure, not project management.
PredictorX1 t1_iztv3pj wrote
Reply to [D] Industry folks, what kind of development methodology/cycle do you use? by DisWastingMyTime
I've never been at a workplace which used any of the structures you mention. Honestly, model development is fairly straightforward from the project management and software development perspectives. The clever bit is the statistics/machine learning, and the parts requiring the most care are data acquisition (problem definition, statistical sampling, ...), model validation (error resampling, testing for important sub-populations, ...) and deployment (verifying the deployed model, ...). Most serious analysts I know use something that resembles CRISP.
PredictorX1 t1_iyzsby0 wrote
Reply to [D] What is the advantage of multi output regression over doing it individually for each target variable by triary95
For modeling solutions featuring intermediate calculations (such as the hidden layers of multilayer perceptrons), the hope is that what is learned about each target variable might be "shared" with the others. Whether this effect yields a net gain depends on the nature of the data. Outputs in a multiple-output model which is trained iteratively tend to reach their optimum performance at differing numbers of iterations. There is also the logistical benefit of only having to train one, larger model versus several.
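A toy illustration of that sharing (synthetic data, everything invented): a single-hidden-layer network with two outputs, trained jointly by gradient descent, where both targets read from the same hidden representation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: two related targets driven by the same four inputs.
X = rng.normal(size=(300, 4))
Y = np.column_stack([X @ np.array([1.0, -1.0, 0.5, 0.0]),
                     X @ np.array([1.0, -1.0, 0.0, 0.5])])
Y += 0.1 * rng.normal(size=Y.shape)

# One hidden layer shared by both outputs (tanh units, linear outputs).
D, H, K = X.shape[1], 8, Y.shape[1]
W1 = rng.normal(scale=0.5, size=(D, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.5, size=(H, K))
b2 = np.zeros(K)

lr = 0.01
for _ in range(5000):
    Z = np.tanh(X @ W1 + b1)      # shared hidden representation
    P = Z @ W2 + b2               # both outputs read from the same Z
    G = (P - Y) / len(X)          # gradient of mean squared error
    W2 -= lr * Z.T @ G
    b2 -= lr * G.sum(axis=0)
    GZ = (G @ W2.T) * (1.0 - Z ** 2)
    W1 -= lr * X.T @ GZ
    b1 -= lr * GZ.sum(axis=0)

mse = np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2)
print(mse)
```

Whether this joint training beats two separate single-output networks depends, as noted, on how related the targets actually are.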
PredictorX1 t1_iyvvu6k wrote
Reply to comment by TopGun_84 in [D] What methods would you recommend for building an image-stitching AI? by YourBoyZeus
It would depend on too many details to give a simple 'yes' or 'no', but conventional image fusion would take less computation than a CNN or GAN.
PredictorX1 t1_iyr4a7q wrote
You may have reasons for specifically using A.I. to do this, but image fusion is well studied in image processing. If you're interested, look into "image registration" and "image fusion".
PredictorX1 t1_iykbv9p wrote
Broadly, I've seen a number of applications of statistical modeling or machine learning in music, literature, still imagery, and film. Patterns are discovered, permitting the generation of art, determination of authorship, and detection of alterations (areas painted over, written passages modified, ...).
Honestly, I think that simple statistical procedures have been among the most interesting in facilitating critique. How many main characters? How many lines of spoken dialogue? How many special effects? ...
PredictorX1 t1_iyk9mqk wrote
Reply to [p] Really Dumb Idea(bear with me) by poobispoob
How would this work in practice? You pull up to the location, pan a digital camera around the area, the system makes a recommendation and you strip down and change into the selected camouflage?
I think something like this could be done, but I would think that most people would be able to make this judgment upon seeing the physical environment.
Some technical challenges which come to mind:
Any location will likely have varying lighting conditions (bright sunlight out in the open, semi-shaded areas, large shadows from trees and rocks, backlighting, ...) and varying vegetation (sage, nothing, large deciduous trees, pines, ...). The person may be lying down in the bushes, walking down an open path, ...
PredictorX1 t1_ixucpv3 wrote
Reply to comment by [deleted] in [R] Approach to identify clusters on a time series by [deleted]
Usually, the assumption is made that the variables are equally "important", so they are standardized. Most often this is done, for each variable, by subtracting the mean and then dividing by the standard deviation. Then the data is clustered, for instance with k-means. Have you gathered and prepared the data? What clustering algorithms do you have available?
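Concretely, the standardize-then-cluster recipe looks like this (synthetic two-variable data on very different scales, plain NumPy k-means; all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic features for illustration: two variables on very different scales.
X = np.vstack([rng.normal([5.0, 1000.0], [1.0, 200.0], size=(100, 2)),
               rng.normal([9.0, 5000.0], [1.0, 200.0], size=(100, 2))])

# Standardize each variable: subtract its mean, divide by its standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Plain Lloyd's k-means with k = 2 (deterministic far-apart initialization).
centers = np.vstack([Z[0], Z[-1]])
for _ in range(50):
    labels = np.argmin(((Z[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
    centers = np.array([Z[labels == j].mean(axis=0) for j in range(2)])
print(np.bincount(labels))
```

Without the standardization step, the second variable's much larger scale would dominate the distance calculation and effectively decide the clusters on its own.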
PredictorX1 t1_ixu8zje wrote
It sounds like you want a classifier, not clustering. If the data includes the washing programs in use at any given time, then building a classifier is possible. If that information is not available, then the remaining variables ("water consumption, electricity consumption, noise and so on") can be clustered, though the clusters may not correspond exactly to the washing programs. Just determining how many clusters should be used can be a challenge.
PredictorX1 t1_ixlzohu wrote
Reply to comment by jellyfishwhisperer in [D] inference on GNN by Beneficial_Law_5613
Good points! I'd also mention that, if probability estimates are desired, the numeric model outputs could be calibrated as a separate step at the end.
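One common way to do that final calibration step is Platt scaling: fit a one-dimensional logistic map from raw scores to probabilities on held-out data. A sketch with synthetic scores (every value here is invented):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic raw model scores (not yet probabilities) and true labels,
# generated so that the ideal map is sigmoid(2 * score).
scores = rng.normal(size=1000)
labels = (rng.random(1000) < 1.0 / (1.0 + np.exp(-2.0 * scores))).astype(float)

# Platt scaling: fit p = sigmoid(a * score + b) by gradient descent on log loss.
a, b = 0.0, 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
    g = p - labels                     # gradient of the cross-entropy
    a -= 0.1 * (g @ scores) / len(g)
    b -= 0.1 * g.mean()
print(a, b)
```

Isotonic regression is the usual nonparametric alternative when the score-to-probability relationship isn't well described by a sigmoid.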
PredictorX1 t1_ixlvw4d wrote
Reply to comment by Beneficial_Law_5613 in [D] inference on GNN by Beneficial_Law_5613
Even if your model outputs 0.4 for one class and 0.6 for the other at 93% accuracy, it has learned something. The real question is: how do you intend to use this model's output? If you only care about assigning items to these two categories, then you have already achieved 93% accuracy; whether that is high enough is dictated by the larger problem being solved. If, however, you would prefer more specific predicted probabilities, then you should consider other performance measures, such as cross-entropy.
PredictorX1 t1_ixluuih wrote
Reply to [D] inference on GNN by Beneficial_Law_5613
When you say that accuracy is 93%, are you referring to class accuracy? In other words, your model may output a probability, but class accuracy only measures how often that predicted probability falls on the correct side of 50%. If so, then the predicted probabilities of a 93%-accuracy model could certainly often be 0.4 and 0.6 (or even 0.49 and 0.51), since the model is still accurately predicting classes.
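To make that concrete, here is a toy calculation (invented numbers) where every predicted probability is a timid 0.6, yet class accuracy is 93%, while cross-entropy reveals the lack of confidence:

```python
import numpy as np

# 100 hypothetical cases: the model outputs a timid p = 0.6 ("class 1")
# for every one of them, and 93 really are class 1.
labels = np.array([1] * 93 + [0] * 7)
probs = np.full(100, 0.6)

accuracy = ((probs > 0.5).astype(int) == labels).mean()  # 0.93

# Cross-entropy (log loss) penalizes the unconfident probabilities.
log_loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
print(accuracy, log_loss)  # high accuracy, mediocre log loss (~0.54)
```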
PredictorX1 t1_iwiflod wrote
Assuming that you will not change the network architecture, I suggest concatenating both data sets, starting training from the existing weights, and training as long as necessary. I would suggest re-examining the size of the hidden layer, though (which implies starting training over from scratch).
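A minimal sketch of that warm start (a linear model trained by gradient descent stands in for the actual network; all data invented):

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented setup: weights are first trained on data set 1,
# then training resumes on the concatenation of sets 1 and 2.
true_w = np.array([1.0, -2.0, 0.5])
X1 = rng.normal(size=(200, 3))
X2 = rng.normal(size=(200, 3))
y1, y2 = X1 @ true_w, X2 @ true_w

def train(X, y, w, iters=500, lr=0.05):
    """Gradient descent on mean squared error, starting from weights w."""
    for _ in range(iters):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

w = train(X1, y1, np.zeros(3))              # existing weights, first data set only
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])
w = train(X, y, w)                          # warm start on the concatenated data
```

Starting from the existing weights rather than from scratch typically saves most of the training time when the two data sets are drawn from similar populations.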
PredictorX1 t1_j38if43 wrote
Reply to comment by Baturinsky in [D] Is it a time to seriously regulate and restrict AI research? by Baturinsky
>major countries should work on this together, like on limiting nuke spread.
This is a good parallel: See how much cheating goes on with nuclear material and nuclear weapons.